# Post-Incident Review: OCPP 1.6J Cloud Partial Instability on September 7, 2025
On September 7, 2025, we experienced a partial service disruption that affected our OCPP 1.6J Cloud platform. This incident resulted in delayed status updates and command processing for a portion of connected chargers. We want to provide a transparent overview of what happened, how we responded, and the steps we are taking to prevent this from happening again.
## Timeline of Events
- 09:14: We received the first report that status updates from chargers were not being received.
- 09:18: Our engineering team began investigating the issue.
- 09:38: The cause of the disruption was identified: two message-processing instances had stalled.
- 09:39: Our team restarted the affected instances, and message processing resumed immediately.
- 11:04: After a period of monitoring, we confirmed that all systems were fully operational and stable.
- 12:45: The incident was officially declared resolved.
## Next Steps and Mitigation
Reliability is a top priority for us. To prevent a recurrence of this incident, we are taking the following actions:
- Enhanced Alerting: We are adding real-time alerts on message-processing rates to our monitoring. This will allow us to detect and address similar issues before they affect service.
- System Resilience Review: We are investigating why our automated self-healing systems did not prevent this disruption, and we will strengthen them so that they respond correctly in the future.
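
To illustrate the kind of check a processing-rate alert could be built on, here is a minimal sketch in Python. It is not our production monitoring: the `StallDetector` class, its method names, and the window and threshold values are all hypothetical, chosen only to show the idea of flagging an instance that keeps receiving messages but has stopped completing them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StallDetector:
    """Flags an instance that receives messages but has stopped processing them.

    All names and default values here are illustrative assumptions.
    """

    window_seconds: float = 60.0  # assumed look-back window
    min_received: int = 10        # assumed floor, so quiet periods do not alert
    received: deque = field(default_factory=deque)   # timestamps of inbound messages
    processed: deque = field(default_factory=deque)  # timestamps of completed messages

    def record_received(self, now: float) -> None:
        self.received.append(now)

    def record_processed(self, now: float) -> None:
        self.processed.append(now)

    def _trim(self, now: float) -> None:
        # Drop timestamps that have fallen out of the look-back window.
        cutoff = now - self.window_seconds
        for timestamps in (self.received, self.processed):
            while timestamps and timestamps[0] < cutoff:
                timestamps.popleft()

    def is_stalled(self, now: float) -> bool:
        # Stalled: enough inbound traffic in the window, but nothing completed.
        self._trim(now)
        return len(self.received) >= self.min_received and not self.processed


detector = StallDetector(window_seconds=60.0, min_received=10)
for second in range(100, 120):
    detector.record_received(float(second))
print(detector.is_stalled(now=120.0))  # True: 20 received, none processed

detector.record_processed(121.0)
print(detector.is_stalled(now=121.0))  # False: processing has resumed
```

Requiring a minimum amount of inbound traffic keeps the check from firing when chargers are simply quiet. The same signal could, in principle, feed either an alert or an automated restart.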
We thank you for your patience and understanding. We are committed to learning from this incident and continuously improving the stability and performance of our platform.