Incident Overview
Between December 29 and December 30, 2025, a subset of Secret Server Cloud (SSC) customers in the US region experienced failures accessing the SSC Service Portal when their traffic was routed through the Chicago, IL data center Point of Presence (PoP). Customers whose traffic was routed through other PoPs were not impacted.
The issue manifested as intermittent portal inaccessibility, increased latency, and reduced application availability. Standard network diagnostics (e.g., ICMP, MTR) often appeared normal, making the issue difficult to detect from a customer perspective.
Impact Windows (EST)
- December 29, 2025: 13:32 – 18:17
- December 30, 2025: 13:17 – 14:29
- December 30, 2025: 16:50 – 18:39
Affected
- SSC tenants in the US region whose traffic was routed via the Chicago PoP
- Customers using outbound IP allow-listing experienced additional connectivity challenges after traffic rerouting
Unaffected
- SSC customers routed through non-Chicago PoPs
- Customers accessing services through alternate network paths or regions
The incident was fully mitigated on December 30, 2025, at 18:39 EST.
Root Cause
The root cause was a software defect within the network vendor’s Chicago (CHI) Point of Presence (PoP).
Between December 29 and December 30, 2025, the network vendor identified an issue where, under specific conditions, a subset of TCP connection attempts failed to complete. While initial packets were sent, connection establishment was not consistently acknowledged by backend systems. This resulted in sporadic application-level failures despite network-level health checks appearing normal.
As mitigation, the network vendor bypassed the affected Chicago PoP and rerouted traffic through alternate PoP locations, restoring service stability. A permanent fix was subsequently implemented by the network vendor, and the environment has remained stable since.
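Because the failure occurred during TCP connection establishment, a probe that attempts full TCP connections can surface it even when ICMP-based checks look healthy. The following is a minimal sketch of such a probe, not the vendor's or SSC's monitoring implementation. The function name `tcp_connect_probe`, its parameters, and the hostname in the usage comment are hypothetical and chosen only for illustration.

```python
import socket
import time


def tcp_connect_probe(host, port=443, attempts=20, timeout=3.0, pause=0.0):
    """Attempt `attempts` full TCP connections and summarize the results.

    Hypothetical helper for illustration only. A connection counts as
    successful when the TCP handshake completes within `timeout` seconds.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    successes = 0
    latencies = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection returns only after the handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
                latencies.append(time.monotonic() - start)
        except OSError:
            # Covers timeouts, refused connections, and resolution failures.
            pass
        if pause:
            time.sleep(pause)

    return {
        "attempts": attempts,
        "successes": successes,
        "failure_rate": 1 - successes / attempts,
        "max_latency_s": max(latencies) if latencies else None,
    }


# Example usage (hostname is a placeholder, not a real tenant):
# result = tcp_connect_probe("tenant.example.com", 443, attempts=20, pause=0.5)
# print(result)
```

A nonzero `failure_rate` alongside clean ICMP results would be consistent with the intermittent handshake failures described above, although a single probe location cannot by itself identify which PoP is at fault.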
Preventive Actions
Actions Implemented by Network Vendor
- A permanent software fix has been deployed by the network vendor to address the underlying defect that could prevent successful TCP connection establishment under certain conditions.
- Enhanced monitoring and alerting have been implemented at the PoP level to enable earlier detection of similar intermittent TCP handshake failures.
- Network Operations Center runbooks have been updated so that traffic can be rerouted rapidly if a similar issue occurs in the future.
Customer Resiliency Recommendations
- SSC customers using outbound IP allow-listing on their network should ensure that all documented IP ranges are permitted, to prevent connectivity issues during traffic rerouting or failover events.
While outbound requests to SSC primarily resolve to six IP addresses in the 45.60.x.x range, we recommend allowing the expanded list documented here on your firewall, because the SSC tenant may resolve to a different IP address during failover events.
- SSC customers are encouraged to deploy Distributed Engine in a secondary region to improve resilience and disaster recovery capabilities.
- SSC customers are advised to maintain a secondary ISP or alternate network path that can be used during upstream network disruptions.
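The allow-listing recommendation above can be spot-checked by comparing the addresses a tenant resolves to against the ranges permitted on the firewall. The sketch below assumes the permitted ranges are available as a list of CIDR strings. The ranges shown are placeholders taken from the RFC 5737 documentation blocks, not the actual SSC ranges, and the function name `uncovered_addresses` is hypothetical.

```python
import ipaddress

# Placeholder CIDR ranges (RFC 5737 documentation blocks), NOT the real SSC
# ranges. Replace them with the ranges from the published SSC IP list.
ALLOWED_RANGES = ["198.51.100.0/24", "203.0.113.0/24"]


def uncovered_addresses(addresses, allowed_ranges):
    """Return the addresses that fall outside every allowed CIDR range."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_ranges]
    missing = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing


# Example with placeholder addresses: the second one is outside both ranges.
print(uncovered_addresses(["198.51.100.7", "192.0.2.10"], ALLOWED_RANGES))
# → ['192.0.2.10']
```

Any address returned by this check would be blocked by the firewall if the tenant resolved to it during a rerouting or failover event.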