Platform: Elevated Gateway Timeout Errors and Slowness

Incident Report for Delinea

Postmortem

Incident Overview

A subset of Platform customers experienced intermittent "504 Gateway Time-out" errors and degraded performance while accessing their Platform tenants, including slower page loads when navigating between pages.

The issue was isolated to customers utilizing the Delinea Connector for secure integration between the Delinea Platform and their on-premises Active Directory (AD). Logs from the Delinea Connector indicated failures to connect to some domain controllers, with the following error:

ADEnvCache: Failed to cache connection to LDAP://{{customer’s domain controller}}: System.Runtime.InteropServices.COMException (0x8007203A): The server is not operational.

Root Cause

On April 2, 2025, at 18:30 UTC, a new version (v6.2.382) of the Delinea Connector was released. This version changed how the connector discovers Active Directory topology, with the goal of better supporting selective domain controller usage.

However, this release introduced issues in AD environments where some domain controllers were unreachable. The new discovery method did not adequately handle these partial outages, causing:

  • Lookup operations (e.g., for user/group resolution) to stall.
  • Eventual API request timeouts, leading to 504 errors.
  • Performance issues when failing over between domain controllers.

Tenants with fully healthy and reachable domain controllers were not impacted.
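
The failure mode above, where lookups stall on unreachable domain controllers, can be illustrated with a minimal Python sketch of bounded-timeout failover. This is not the Delinea Connector's implementation; the function names, the 2-second timeout, and the TCP probe against port 389 are all assumptions made for illustration. The point is that each candidate gets a short, fixed time budget, so one unreachable domain controller cannot hold up the whole lookup.

```python
import socket
from typing import Callable, Iterable, Optional

LDAP_PORT = 389  # standard LDAP port; probing it is an assumption of this sketch


def tcp_probe(host: str, timeout: float) -> bool:
    """Return True if a TCP connection to the host's LDAP port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, LDAP_PORT), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


def first_reachable(
    hosts: Iterable[str],
    timeout: float = 2.0,  # hypothetical per-host budget
    probe: Callable[[str, float], bool] = tcp_probe,
) -> Optional[str]:
    """Return the first host that answers the probe, or None if none do.

    Each host is given at most `timeout` seconds, so the worst-case delay is
    bounded by len(hosts) * timeout instead of stalling indefinitely.
    """
    for host in hosts:
        if probe(host, timeout):
            return host
    return None
```

Because the probe is passed in as a parameter, the failover logic can be exercised in tests with a fake probe that marks chosen hosts as unreachable, without needing a real AD environment.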

Mitigation and Resolution

  • Our support team provided impacted customers with a registry key workaround to bypass unreachable domain controllers, which restored API performance and prevented timeouts.
  • We reverted the v6.2.382 release in our update channel to prevent additional Delinea Connector instances from auto-updating to the problematic version.
  • We released a newer version (v6.2.383) on April 10, which reverts to the previous, more resilient AD topology discovery method. Delinea Connectors with auto-update enabled will update to this version automatically. Please see our documentation to configure the auto-update setting.
  • Additionally, v6.2.383 includes support for a registry-based configuration to whitelist specific domain controllers for environments requiring it.
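
As a rough illustration of what restricting a connector to specific domain controllers involves, the Python sketch below filters a discovered list against a configured list of allowed hosts. It does not reflect the actual registry keys or connector logic, which are described in the Support Article. The function name, the case-insensitive matching, and the rule that an empty list means "no restriction" are assumptions of this sketch.

```python
from typing import Iterable, List


def filter_allowed(discovered: Iterable[str], allowlist: Iterable[str]) -> List[str]:
    """Keep only the discovered domain controllers that appear in the allowlist.

    Hostnames are compared case-insensitively. An empty allowlist is treated
    as "no restriction" (an assumption of this sketch), so every discovered
    domain controller is returned.
    """
    allowed = {name.strip().lower() for name in allowlist if name.strip()}
    if not allowed:
        return list(discovered)
    return [dc for dc in discovered if dc.strip().lower() in allowed]
```

For example, with hypothetical hosts `DC1.corp.example` and `dc2.corp.example` discovered and only `dc1.corp.example` configured, the function returns just `DC1.corp.example`.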

For technical details and configuration steps, refer to our Support Article.

Preventive Actions

  • Expand QA test scenarios to include partially degraded AD environments and simulate unreachable domain controllers.
  • Enhance the Delinea Connector logging to clearly indicate AD topology discovery failures and fallback behavior.
Posted Apr 11, 2025 - 17:05 EDT

Resolved

This incident has been resolved.
Posted Apr 03, 2025 - 15:35 EDT

Identified

We have identified the issue affecting a subset of Platform customers experiencing "504 Gateway Time-out" errors. A fix is available for impacted customers.

Our support team can help implement the fix on your connector machines. To request assistance, please open a support case.

We appreciate your cooperation and will continue to monitor the situation.
Posted Apr 03, 2025 - 14:58 EDT

Update

We are continuing to investigate this issue. If you have an active support case, please share the connector logs with our support team to assist with the investigation.

We will provide further updates as we make progress. Thank you for your patience.
Posted Apr 03, 2025 - 14:10 EDT

Investigating

We are aware that a subset of Platform customers are experiencing "504 Gateway Time-out" errors when accessing their tenants. Some users may also notice slower page load times. Our engineering team is actively investigating, and we have all hands on deck to identify and resolve the issue as quickly as possible.

We will provide an update as soon as we have more information.

Thank you for your patience.
Posted Apr 03, 2025 - 12:09 EDT
This incident affected: US (Platform) and CA (Platform).