Document360 Status

Facing slowness in authentication

Resolved | Feb 26, 2024 | 10:40 GMT+00:00

Incident:
Between 09:21 and 10:41 PM IST on February 20, and again between 07:26 and 09:11 PM IST on February 21, a production outage occurred in the Identity Server, disrupting service for multiple customers.

Duration:
The slowness persisted for 185 minutes in total, causing interruptions in service availability and connectivity. 
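
The 185-minute total follows from the two windows listed under Incident. A minimal Python sketch of that arithmetic, with the times written in 24-hour form on the assumption that both ends of each window are PM IST (the helper name `window_minutes` is illustrative only):

```python
from datetime import datetime

def window_minutes(start: str, end: str) -> int:
    """Length in minutes of a same-day window given as 'HH:MM' strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# The two outage windows from the incident summary (PM IST, 24-hour form).
windows = [("21:21", "22:41"), ("19:26", "21:11")]
durations = [window_minutes(start, end) for start, end in windows]
total = sum(durations)
print(durations, total)  # [80, 105] 185
```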

Impact:
Customers using the Document360 portal and private projects experienced intermittent timeouts.

Cause:
During peak traffic periods, the system experienced performance degradation characterized by delayed response times and subsequent timeouts. Automatic horizontal scaling mechanisms were activated to manage the increased load on our servers. However, as the horizontal scaling reached a critical threshold, it exacerbated connection issues with the SQL database, resulting in timeouts. 
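
One common way that scaling out can worsen database connection issues is that each new application instance brings its own connection pool, so the worst-case number of open connections grows with the instance count. The sketch below illustrates that arithmetic only. The pool size and database limit are hypothetical numbers chosen for illustration and are not figures from this incident.

```python
def total_connections(instances: int, pool_size_per_instance: int) -> int:
    """Worst-case connections opened when every instance fills its pool."""
    return instances * pool_size_per_instance

def max_safe_instances(db_connection_limit: int, pool_size_per_instance: int) -> int:
    """Largest instance count whose full pools still fit under the limit."""
    return db_connection_limit // pool_size_per_instance

POOL_SIZE = 100  # hypothetical per-instance pool size
DB_LIMIT = 800   # hypothetical database connection limit

for instances in (4, 8, 12):
    demand = total_connections(instances, POOL_SIZE)
    status = "ok" if demand <= DB_LIMIT else "over limit"
    print(instances, demand, status)
```

Under these assumed numbers, the database limit is reached at 8 instances, and any further scale-out adds demand the database cannot serve. Scaling up instead adds capacity without adding more pools.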

Resolution:
To address this, we vertically scaled our server infrastructure to higher-core machines capable of handling increased loads with high availability. This adjustment improves performance and reliability during peak usage periods, mitigating potential disruptions caused by resource limitations.

Thank you for your understanding and continued support.
Should you have any further inquiries or require additional information, please do not hesitate to reach out to our support team.

Monitoring | Feb 22, 2024 | 07:01 GMT+00:00

We have taken mitigation measures for the identity service slowness. The application is working normally now and we are monitoring the situation.

Investigating | Feb 21, 2024 | 15:25 GMT+00:00

We are facing slowness in our identity/authentication servers and are currently investigating the issue.

Service disruption in private projects KB Site and Portal

Resolved | Feb 26, 2024 | 10:40 GMT+00:00

Incident:
Between 09:21 and 10:41 PM IST on February 20, and again between 07:26 and 09:11 PM IST on February 21, a production outage occurred in the Identity Server, disrupting service for multiple customers.

Duration:
The slowness persisted for 185 minutes in total, causing interruptions in service availability and connectivity. 

Impact:
Customers using the Document360 portal and private projects experienced intermittent timeouts.

Cause:
During peak traffic periods, the system experienced performance degradation characterized by delayed response times and subsequent timeouts. Automatic horizontal scaling mechanisms were activated to manage the increased load on our servers. However, as the horizontal scaling reached a critical threshold, it exacerbated connection issues with the SQL database, resulting in timeouts. 

Resolution:
To address this, we vertically scaled our server infrastructure to higher-core machines capable of handling increased loads with high availability. This adjustment improves performance and reliability during peak usage periods, mitigating potential disruptions caused by resource limitations.

Thank you for your understanding and continued support.
Should you have any further inquiries or require additional information, please do not hesitate to reach out to our support team.

Identified | Feb 20, 2024 | 04:02 GMT+00:00

Between 04:02 PM and 04:57 PM UTC on 20 February 2024, customers who had set up private mode could not access the Knowledgebase and portal site pages, causing a disruption in service. Requests for the home page or portal consistently timed out after approximately 30 seconds.

Root Cause:
After conducting an investigation, it was determined that the identity server encountered a sudden and substantial increase in traffic, characterized by an unusually high volume of requests. This surge led to the database system reaching its maximum concurrent requests limit, resulting in data write issues.

Mitigation:
The delay in the data write process resulted in temporary system slowness, which, in turn, caused a delay in the scaling up process as defined in the system.

Next Steps:
We are currently engaged in a comprehensive analysis to identify potential areas for improvement to prevent similar incidents in the future.

We appreciate the understanding and patience of our customers during this incident, and we are dedicated to continuously enhancing our systems to provide a seamless and reliable service.

False Alert : Knowledgebase (instance 3) is down

Resolved | Dec 26, 2023 | 23:22 GMT+00:00

Earlier today, our monitoring system identified a ping URL as being offline, attributing it to a DNS issue. After a thorough investigation, we have determined that this alert was a false positive.

The ping URL was fully operational throughout the specified period, and there was no actual disruption in service. We sincerely apologize for any confusion or inconvenience this erroneous alert may have caused.

Should you have further inquiries, concerns, or insights pertaining to this incident, please don't hesitate to contact our dedicated support team. Your feedback is of great value to us as we strive to enhance our systems and processes continually.

Service disruption in private projects Portal and KB Site

Resolved | Dec 14, 2023 | 15:24 GMT+00:00

Impact:
Customers who had set up private mode could not access the Knowledgebase and portal site pages, causing a disruption in service. Requests for the home page or portal consistently timed out after approximately 30 seconds.

Cause:
Upon investigation, it was found that the identity server experienced a sudden and significant surge, marked by an unusually high number of requests. This influx, in turn, triggered a timeout situation in the SQL server. Our analysis revealed a notable increase in thread connections, directly attributable to the surge in requests. This surge, coupled with connection pool starvation, led to the inability to establish necessary connections with the database, resulting in the observed timeouts.
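
Connection pool starvation can be pictured as a fixed number of connection slots being fully held while more requests keep arriving. The following is a toy Python sketch of that effect and not a model of the actual identity server. `BoundedPool`, `simulate_surge`, and the numbers used are all hypothetical, and a non-blocking acquire stands in for a request that waits and then times out.

```python
import threading

class BoundedPool:
    """A toy connection pool: a fixed number of slots guarded by a semaphore."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self) -> bool:
        # Non-blocking acquire: returns False immediately when no slot is free.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()

def simulate_surge(pool_size: int, concurrent_requests: int) -> tuple[int, int]:
    """Return (served, timed_out) when every request holds its connection."""
    pool = BoundedPool(pool_size)
    served = timed_out = 0
    for _ in range(concurrent_requests):
        if pool.try_acquire():
            served += 1      # slot stays held for the duration of the surge
        else:
            timed_out += 1   # no free slot: the request cannot reach the database
    return served, timed_out

print(simulate_surge(pool_size=100, concurrent_requests=250))  # (100, 150)
```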

Mitigation:
To address the immediate impact, our auto-heal setup and process efficiently identified the elevated server load and initiated the scaling out of resources. This reactive measure ensured that the system could adapt to the increased demand, mitigating the severity of the issue and restoring accessibility to the Knowledgebase and portal pages for users in private mode.

Next Steps:
We are currently engaged in a comprehensive analysis to identify potential areas for improvement to prevent similar incidents in the future. Our aim is to implement proactive measures that will enhance the system's resilience and responsiveness, ensuring a more robust and reliable experience for our users even during periods of unexpected demand.

We appreciate the understanding and patience of our customers during this incident, and we are dedicated to continuously enhancing our systems to provide a seamless and reliable service.

User authentication service got high response time

Resolved | Nov 15, 2023 | 09:56 GMT+00:00

Root Cause Analysis (RCA) Report
Incident Summary: An unusual surge in traffic was directed towards one instance of our authentication server. The increased traffic load overwhelmed the affected instance, leading to a partial outage and impacting the availability and performance of our authentication services. The incident was identified as an Azure outage in traffic manager services affecting our hosting infrastructure.

Incident Resolution: We have taken comprehensive measures to enhance the reliability and availability of our authentication and other services. This includes strengthening our auto-scaling configurations to better handle future demands and potential challenges. Our team has worked diligently to ensure that these services are not only more resilient but also more efficient in responding to varying loads and usage patterns.

Conclusion: We apologize for any inconvenience caused and are taking steps to prevent future disruptions. Thank you for your understanding and support. For further questions or information, please contact our support team.

Open | Nov 15, 2023 | 09:55 GMT+00:00

Beginning on Friday, November 10, 2023, at 10:34 UTC, the Document360 user authentication service experienced high response times, impacting some of our customers. This issue is impacting the KB site and portal login functionality.
We are actively working on restoring the services. We apologize for any impact this incident has had on your business. We treated the disruption as our highest priority to ensure resolution.

Knowledge base site home page & analytics services down

Resolved | Nov 08, 2023 | 09:27 GMT+00:00

Root Cause Analysis (RCA) Report
Incident Summary: One of the Analytics cluster nodes was identified to be in an unhealthy state. This situation was triggered by an overwhelming surge in the volume of incoming requests and data processing in our analytics servers. Consequently, the delayed data synchronization in the secondary node resulted in timeouts for new requests, leading to the failure of the home page to load when the widget was configured to display on the home page. However, other sections such as the Docs page and articles pages remained unaffected.

Incident Resolution: The database system responded to the issue by initiating an automatic recovery process, scaling up to the next available premium tier once the node's status transitioned back to a healthy state. This process ensured the restoration of the Analytics services, ultimately resolving the issue.

Conclusion: We deeply regret any disruption this incident may have caused to your business operations. We are committed to implementing the necessary measures to prevent such occurrences and ensure the continued seamless functioning of our services. Thank you for your understanding and continued support.
Should you have any further inquiries or require additional information, please do not hesitate to reach out to our support team.

Open | Nov 07, 2023 | 21:22 GMT+00:00

Beginning on Tuesday, November 7, 2023 at 17:02 UTC, Document360 knowledge base site home page and analytics services experienced an outage impacting some of our customers. This issue is impacting KB site home page and analytics related services.

We are actively working to restore the services. We apologize for any impact this incident has had on your business. We treated the disruption as our highest priority to ensure resolution.

False Alert : Knowledgebase (instance 3) is down

Resolved | Aug 11, 2023 | 00:00 GMT+01:00

Earlier today, our monitoring system flagged a ping URL as being down due to a DNS issue. After a thorough investigation, we've concluded that this alert was, in fact, a false positive. The ping URL was fully operational during the specified period, and there was no genuine disruption in service.

We apologize for any confusion or inconvenience this false alert may have caused.

If you have any further questions, concerns, or insights related to this incident, please feel free to reach out to our support team. Your feedback is invaluable as we work to improve our systems and processes.

Document360 API Outage

Resolved | Jul 07, 2023 | 12:08 GMT+01:00

The issue is resolved and the API is stable now.

Monitoring | Jul 07, 2023 | 10:34 GMT+01:00

We have observed a network-level issue in our portal API and are actively investigating the situation.

update: 10 AM UTC
We have taken measures to address the problem and are actively monitoring the situation to ensure stability. Additionally, we are conducting an investigation to determine the underlying cause of the issue.

Portal down - Azure service interruption

Resolved | Jan 25, 2023 | 07:10 GMT+00:00

We are encountering an issue accessing our Document360 portal application due to an outage in Azure Services. We are waiting for an update from the cloud platform. We will provide more updates shortly.

Summary of Impact:
Customers experienced issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources.

Mitigation:
Microsoft Azure identified a recent change to its WAN as the underlying cause and has rolled back that change. The issue is resolved and all services are functioning now.

Portal down - Azure service interruption

Closed | Jul 21, 2022 | 05:15 GMT+01:00

We are encountering an issue accessing our Document360 portal application due to an outage in one of the services in Azure (SQL Database). We are waiting for an update from the cloud platform. We will update once it's resolved.

We observed an incident in the Azure SQL Database West Europe region which affected our EU customers accessing the portal and KB sites.
Azure SQL engineers identified a configuration change on the metadata drop operation that caused the overall issue.

This incident has been resolved.

False Alert : Knowledgebase (instance 3) is down

Resolved | Jul 18, 2022 | 11:11 GMT+01:00

We observed a false alert (malfunction) in our monitoring system, which caused a downtime incident notification to be sent. We have reported this to our monitoring system so that it is resolved at the earliest. All our servers are functioning normally.

Portal and Customer website slowness and downtime

Closed | Feb 02, 2022 | 17:32 GMT+00:00

The situation is now resolved, and all the sites and the portal are up and running. We are continuing to monitor the status closely.

Open | Feb 02, 2022 | 15:46 GMT+00:00

We are currently investigating an issue that's affecting both the portal and customer-facing website. Our team is actively working on resolving this issue.

Certificate issue

Closed | Apr 05, 2021 | 08:34 GMT+01:00

We are experiencing an issue with our SSL certificate that is blocking users from accessing our portal. We are working on resolving it.

update - 05th April, 8:45 GMT
-------------------------------------------
The certificate issue is now resolved and all the services are back to normal.

Document360 DNS issue

Closed | Apr 01, 2021 | 22:21 GMT+01:00

There was a DNS issue with Azure cloud, which caused a short outage of our service. The services are now restored. We are monitoring the situation.

Link to Azure incident page: https://status.azure.com/en-in/status/history/

The issue was resolved by Microsoft Azure.

Knowledge base site rendering issue

Resolved | Mar 28, 2021 | 05:57 GMT+01:00

We have identified an issue affecting our customers' knowledge base sites. The engineering team is currently investigating the issue. We will keep updating the status here.

Update
-----------
We identified that the issue was caused by one of the application instances and affected the customers hosted on that particular instance. The issue is now resolved and everything is back to normal. We are monitoring the environment closely.

Portal down

Resolved | May 19, 2020 | 10:44 GMT+01:00

We are investigating some slowness in our APIs. We will update the progress shortly.

Knowledgebase down

Resolved | Feb 26, 2020 | 21:29 GMT+00:00

We are currently experiencing some outages in our Azure infrastructure. This is being investigated by our technical team. We will keep you posted on the progress here.

Knowledgebase site Down

Resolved | Jan 06, 2020 | 22:39 GMT+00:00

Resolved and root cause being investigated.

Open | Jan 06, 2020 | 21:00 GMT+00:00

We are currently experiencing some outages in our Azure infrastructure. This is being investigated by our technical team. We will keep you posted on the progress here.

Authentication Issues in Portal

Resolved | Dec 19, 2019 | 19:58 GMT+00:00

The Auth0 service outage is now resolved and the Document360 Portal login function is back to normal.

Identified | Dec 19, 2019 | 19:33 GMT+00:00

Our authentication partner Auth0 is experiencing a service outage, causing delayed responses when logging in to the Document360 Portal.

Service disruption in knowledge base

Closed | Nov 07, 2019 | 11:00 GMT+00:00

The outage in Azure cloud is now resolved and all our dependent services are functioning as expected.

Closed | Nov 07, 2019 | 08:00 GMT+00:00

We are facing some downtime as our cloud provider, Microsoft Azure, is facing an outage in its West Europe region.

Link to Azure status page (select West Europe): https://status.azure.com/en-us/status

Public site down

Resolved | Oct 18, 2019 | 02:00 GMT+01:00

Hi there, we are aware of the problem affecting the public websites. We are working on it at the moment. Apologies for the inconvenience.

Document360 API Outage

Resolved | Aug 13, 2019 | 20:10 GMT+01:00

The issue in Document360 APIs causing the disruption is identified and fixed.

Identified | Aug 13, 2019 | 19:40 GMT+01:00

[13th August 2019, 7.40] : We experienced a service disruption in Document360 portal (https://portal.document360.io), caused by higher response times from the APIs. The issue is identified and resolved.

Portal Downtime

Closed | Aug 02, 2019 | 18:00 GMT+01:00

2nd August 2019 | Document360 Portal was down during the release for more than an hour, caused by an unexpected technical issue in the production environment. The release was rolled back and the application is back to normal.

Login page issue

Closed | Nov 28, 2018 | 19:47 GMT+00:00

Auth0 has confirmed that the issue they were facing was resolved.

Open | Nov 28, 2018 | 17:17 GMT+00:00

We are facing an issue with our login page due to an outage at our authentication service provider, Auth0. We are following up with them to get the issue resolved at the earliest.

Link to the incident: https://status.auth0.com/incidents/rjhcwj1d2r61

Down time in knowledge base due to change in license policy

Closed | Oct 03, 2018 | 11:30 GMT+01:00

Issue was identified and resolved immediately.

Open | Oct 03, 2018 | 07:00 GMT+01:00

Certain customers faced downtime in the knowledge base due to a change in license policy.

Search Issue

Closed | Jun 27, 2018 | 15:15 GMT+01:00

Identified the root cause and the issue is resolved.

Open | Jun 27, 2018 | 15:15 GMT+01:00

We are currently experiencing an issue with our backend related to search. The team is working on it.

All systems operational