Platform: US - Intermittent Login Failures

Incident Report for Delinea

Postmortem

Incident Overview

On May 28, 2025, between 7:30 PM and 7:55 PM Central, a subset of customers in the US region experienced intermittent login issues when accessing their tenants. Affected users may have encountered slow responses or HTTP 504 errors during login attempts. The issue was isolated to the US region; customers in other regions were not affected. Service was fully restored by 7:55 PM Central.

Root Cause

The incident was triggered by a recent change in how Active Directory (AD) user change events are processed. A fix in the latest release corrected a bug that had previously routed AD change events directly through the API. With the fix in place, these events were correctly routed through the message queueing system to be processed by worker services.

However, the volume of AD user changes in the production environment is significantly higher than in testing environments. Once these events were routed through the queue, the resulting surge in messages overwhelmed the message queueing instance, and the instance ran out of disk space. This prevented other platform services from reading from or writing to the shared message bus, leading to request timeouts and intermittent login failures for impacted users.
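
To illustrate the failure mode described above, the following minimal Python sketch estimates how long a queue's disk lasts when messages are published faster than workers consume them. The function name, parameters, and all figures are hypothetical and chosen for illustration only; they are not measurements from the incident.

```python
def seconds_until_full(publish_rate, consume_rate, bytes_per_message, free_bytes):
    """Estimate seconds until the queue's disk fills.

    Rates are in messages per second. Returns None when the backlog
    is not growing (consumers keep up with publishers).
    """
    net_rate = publish_rate - consume_rate  # messages per second added to the backlog
    if net_rate <= 0:
        return None
    return free_bytes / (net_rate * bytes_per_message)


# Hypothetical figures: 5,000 msg/s published, 1,000 msg/s consumed,
# 2 KiB per message, 10 GiB of free disk on the queue instance.
estimate = seconds_until_full(5000, 1000, 2048, 10 * 1024**3)
print(f"Disk full in roughly {estimate / 60:.1f} minutes")
```

With these assumed numbers the disk fills in about 22 minutes, which shows how a sustained gap between publish and consume rates can exhaust storage well before anyone notices without alerting.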

Our operations team resolved the issue by clearing a backlog of system messages on the message queueing instance, which restored normal login behavior. No customer data was lost during this process, as the cleared messages were temporary and used only for background processing.

Preventive Actions

To prevent recurrence and improve resilience, we are taking the following actions:

We have added a configuration setting (off by default) that controls, per region, whether AD user change messages are routed to background workers. This setting will be available in our next release.
Additional changes are being developed to better manage high-volume message loads and to prevent queue overflows. These improvements will be delivered in future releases.
Enhanced monitoring has been put in place to detect and alert on message queue growth and disk space usage before they impact service availability.
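
The per-region setting and the monitoring thresholds listed above could look something like the following Python sketch. It is an illustration under stated assumptions, not the platform's implementation: the names `RoutingConfig`, `route_ad_change`, and `check_queue_health`, the threshold values, and the assumption that "off" means events stay on the direct API path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingConfig:
    # Regions where AD user change events are sent to background workers.
    # Empty by default, mirroring a setting that is "off" everywhere.
    queue_enabled_regions: set = field(default_factory=set)


def route_ad_change(region, config):
    """Return the (hypothetical) path an AD user change event takes in a region."""
    # Assumption: when the setting is off, events are handled on the direct path.
    return "queue" if region in config.queue_enabled_regions else "api"


def check_queue_health(queue_depth, disk_used_fraction,
                       max_depth=100_000, max_disk_fraction=0.8):
    """Return a list of alert strings; empty when both metrics are within limits."""
    alerts = []
    if queue_depth > max_depth:
        alerts.append(f"queue depth {queue_depth} exceeds {max_depth}")
    if disk_used_fraction > max_disk_fraction:
        alerts.append(
            f"disk usage {disk_used_fraction:.0%} exceeds {max_disk_fraction:.0%}"
        )
    return alerts


config = RoutingConfig()
print(route_ad_change("us", config))      # setting off: direct path
config.queue_enabled_regions.add("us")
print(route_ad_change("us", config))      # setting on for this region: queue path
print(check_queue_health(queue_depth=250_000, disk_used_fraction=0.92))
```

Keeping the setting off by default lets the queue-based routing be enabled one region at a time, while the threshold check is meant to raise an alert before disk space is exhausted rather than after.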

We sincerely apologize for the disruption and are committed to continuing to strengthen the reliability of our platform.

Posted Jun 03, 2025 - 13:56 EDT

Resolved

On May 28, between 19:30 and 19:55 Central, a subset of customers in the US region experienced intermittent login failures when accessing their tenants. Affected users may have encountered slow responses or HTTP 504 errors during login attempts.

Customers in other regions were not impacted.
Access has been fully restored and the issue is now resolved. We are continuing to monitor the platform to ensure stability. If you have any questions or require further assistance, please contact our support team at https://support.delinea.com.

Posted May 28, 2025 - 20:30 EDT