Previous incidents

December 2024
No incidents reported.
November 2024
November 19, 2024
Resolved

Incident Report on Extended Downtime of UpTrader CRM System

Date: November 19, 2024


1. Introduction

This report provides a detailed account of the incidents that led to extended downtime of our UpTrader CRM system on November 1 and November 19, 2024. These incidents were unprecedented in our seven years of operation, and we are treating them with the utmost seriousness to prevent future occurrences.

2. Incident Description

November 1, 2024

  • Time: At 01:14 UTC, the UpTrader CRM system began experiencing significant performance issues.
  • Issue: A network card on one of our servers started malfunctioning. While it continued to process traffic, it did so at a drastically reduced speed.
  • Diagnosis: The degraded performance made it challenging to promptly identify the root cause.
  • Resolution: Upon determining the issue, the affected server was rebooted, restoring normal operations.
  • Downtime Duration: Approximately 2 hours (01:14 to 03:12 UTC).

Our analysis indicated that the network card malfunctioned after processing an excessive volume of data over a long period without a reboot.

November 19, 2024

  • Time: At 10:39 UTC, a similar issue occurred on another server that had not been rebooted since the previous incident.
  • Issue: The network card failed, leading to service disruption.
  • Resolution: The server was rebooted, and the system was restored after 25 minutes of downtime.

Following this, we decided to proactively reboot the third and final server to prevent further disruptions.

  • Additional Issue: During the reboot of the third server, a system misconfiguration caused the entire production cluster to go down.
  • Resolution: Multiple issues were identified and resolved. The system was fully operational after 1 hour and 58 minutes of intermittent downtime.

3. Timeline of Events

  • November 1, 2024
    • 01:14 UTC: Downtime began due to a malfunctioning network card on Server 1.
    • 03:12 UTC: Server 1 rebooted; system restored.
  • November 19, 2024
    • 10:39 UTC: Network card failure on Server 2 caused service disruption.
    • 11:04 UTC: Server 2 rebooted; system restored after 25 minutes.
    • 11:20 UTC: Initiated reboot of Server 3.
    • 11:20 UTC - 13:18 UTC: System misconfiguration led to intermittent production cluster downtime.
    • 13:18 UTC: System restored after resolving multiple issues.
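The downtime figures quoted in this report can be cross-checked against the timestamps above. The following is a minimal sketch for that arithmetic only; the names `OUTAGES`, `outage_minutes`, and `format_minutes` are illustrative and not part of any UpTrader tooling. Note that the second November 19 window was intermittent, so its figure is an upper bound on actual unavailability.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# (label, start, end) copied from the timeline above; all times are UTC.
OUTAGES = [
    ("November 1", "2024-11-01 01:14", "2024-11-01 03:12"),
    ("November 19 (first incident)", "2024-11-19 10:39", "2024-11-19 11:04"),
    ("November 19 (second incident)", "2024-11-19 11:20", "2024-11-19 13:18"),
]


def outage_minutes(start: str, end: str) -> int:
    """Return the length of an outage window in whole minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def format_minutes(minutes: int) -> str:
    """Render a minute count as hours and minutes."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins:02d} min"


if __name__ == "__main__":
    total = 0
    for label, start, end in OUTAGES:
        minutes = outage_minutes(start, end)
        total += minutes
        print(f"{label}: {format_minutes(minutes)}")
    print(f"Total across all windows: {format_minutes(total)}")
```

By this calculation, the November 1 window is 1 hour and 58 minutes, which this report rounds to 2 hours.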

4. Root Cause Analysis

  • Network Card Malfunctions: The primary cause was the network cards processing excessive data without regular reboots, leading to performance degradation.
  • Delayed Diagnosis: The servers continued to process traffic slowly, complicating timely identification of the issue.
  • System Misconfiguration: An unforeseen misconfiguration in the system settings caused a complete cluster shutdown during the reboot of the third server.
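The delayed diagnosis is easier to see with an example: a check that only asks whether a server is passing traffic will not flag a card that is still working but at a fraction of its usual speed. The sketch below is purely illustrative and does not describe our actual monitoring. It assumes cumulative byte counters sampled at a fixed interval, and the baseline, the `ratio` threshold, and the function names are all hypothetical. A real check would also need a baseline that accounts for normal quiet periods.

```python
from typing import Sequence


def throughput_bps(byte_counters: Sequence[int], interval_s: float) -> list[float]:
    """Convert cumulative byte counters, sampled every interval_s seconds,
    into per-interval throughput in bytes per second."""
    return [(b - a) / interval_s for a, b in zip(byte_counters, byte_counters[1:])]


def is_degraded(
    rates: Sequence[float],
    baseline_bps: float,
    ratio: float = 0.2,          # hypothetical threshold: below 20% of baseline
    min_consecutive: int = 3,    # require a sustained drop, not a single dip
) -> bool:
    """Return True if traffic keeps flowing (rate > 0) but stays below
    ratio * baseline_bps for at least min_consecutive samples in a row."""
    streak = 0
    for rate in rates:
        if 0 < rate < ratio * baseline_bps:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0
    return False


if __name__ == "__main__":
    # Made-up counters: healthy at 1000 B/s, then a sustained drop to 100 B/s.
    counters = [0, 1000, 2000, 2100, 2200, 2300]
    rates = throughput_bps(counters, interval_s=1.0)
    print(is_degraded(rates, baseline_bps=1000.0))
```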

5. Impact Assessment

  • Service Downtime:
    • November 1: Approximately 2 hours of system unavailability.
    • November 19 (First Incident): 25 minutes of downtime.
    • November 19 (Second Incident): 1 hour and 58 minutes of intermittent system downtime.
  • Business Impact: Clients experienced interruptions in accessing the UpTrader CRM system, affecting their business operations.

6. Preventive Measures and Future Actions

To prevent similar incidents, we are implementing the following measures:

  1. Infrastructure Enhancement:
    • Increasing the capacity of our production cluster to provide additional resources and redundancy.
    • Upgrading server hardware to handle higher loads efficiently.
  2. Proactive Maintenance:
    • Introducing scheduled reboots during low-traffic periods.
  3. System Configuration Review:
    • Conducting a comprehensive analysis of current system configurations.
    • Adjusting settings to enhance fault tolerance and prevent misconfigurations.
    • Establishing protocols for configuration changes and system reboots to ensure stability.
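As a rough illustration of the scheduled-reboot measure, the sketch below picks the quietest hour from a set of hourly request counts and staggers reboots so that only one server is restarted per day. It is a simplified example under stated assumptions, not our production procedure: the server names, the traffic figures, and the functions `quietest_hour` and `plan_reboots` are all hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def quietest_hour(hourly_requests: dict[int, int]) -> int:
    """Return the UTC hour with the fewest requests (earliest hour wins ties)."""
    return min(sorted(hourly_requests), key=lambda h: hourly_requests[h])


def plan_reboots(
    servers: list[str],
    hourly_requests: dict[int, int],
    first_day: date,
) -> list[tuple[str, datetime]]:
    """Schedule one reboot per day, each at the quietest hour, so that
    no two servers are ever rebooted in the same window."""
    hour = quietest_hour(hourly_requests)
    plan = []
    for offset, server in enumerate(servers):
        when = datetime.combine(
            first_day + timedelta(days=offset), time(hour), tzinfo=timezone.utc
        )
        plan.append((server, when))
    return plan


if __name__ == "__main__":
    # Made-up traffic profile: requests observed per UTC hour.
    traffic = {0: 500, 3: 120, 12: 900, 18: 700}
    for server, when in plan_reboots(
        ["server-1", "server-2", "server-3"], traffic, date(2024, 12, 2)
    ):
        print(f"{server}: reboot at {when.isoformat()}")
```

Staggering the reboots across separate days is one possible design choice; it keeps the remaining servers available while one restarts, which is the situation the November 19 cluster outage showed must be verified before any reboot.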

7. Conclusion

We deeply regret the inconvenience caused by these incidents. Ensuring the reliability and stability of our services is our top priority. We are committed to taking all necessary steps to prevent future occurrences and to maintain the trust our clients have placed in us over the past seven years.

For any further inquiries or assistance, please contact our support team at support@uptrader.io


Prepared by:

Vasily Alexeev
CEO/CTO
UpTrader

Resolved · 19 Nov at 04:46pm EET
October 2024
No incidents reported.