Resolved | Feb 26, 2024 | 10:40 GMT+00:00
Incident:
On February 20 (09:21 to 10:41 PM IST) and February 21 (07:26 to 09:11 PM IST), the production Identity Server experienced an outage that disrupted service for multiple customers.
Duration:
The slowness persisted for a total of 185 minutes (80 minutes on February 20 and 105 minutes on February 21), interrupting service availability and connectivity.
Impact:
Customers using the Document360 portal and private projects experienced intermittent timeouts.
Cause:
During peak traffic, system performance degraded: response times increased and requests eventually timed out. Automatic horizontal scaling activated to absorb the increased load on our servers. However, once horizontal scaling reached a critical threshold, it worsened connection issues with the SQL database, which caused further timeouts.
Resolution:
To resolve the issue, we scaled our server infrastructure vertically, moving to machines with more CPU cores that can handle higher loads while remaining highly available. This change improves performance and reliability during peak usage and reduces the risk of disruptions caused by resource limitations.
Thank you for your understanding and continued support.
Should you have any further inquiries or require additional information, please do not hesitate to reach out to our support team.
Monitoring | Feb 22, 2024 | 07:01 GMT+00:00
We have applied mitigation measures for the identity service slowness. The application is now working normally, and we are monitoring the situation.
Investigating | Feb 21, 2024 | 15:25 GMT+00:00
We are experiencing slowness in our identity/authentication servers and are currently investigating the issue.