At 14:06 CEST, a deployment to our API shipped work on two features. That work inadvertently changed the behaviour of several endpoints used to configure installations and devices:
POST /charger
PUT /charger/{id}
POST /installation
PUT /installation/{id}
POST /installation/{id}/update
With this change, if a request to one of these endpoints had a JSON body that omitted any of the OCPP Bridge configuration parameters (central system URL, default tag ID, or password), the system deleted those parameters from the device or installation configuration. Previously, a missing parameter left the configured value intact.
Since many customers use these endpoints to change the behaviour of their installations and devices, this wiped their OCPP Bridge configuration wherever it had been set. Because these endpoints are called very frequently in certain scenarios (such as setting the available current on an installation), several thousand devices lost their OCPP connection.
The issue surfaced quickly when a partner began experiencing severe problems: they call one of the endpoints very frequently for every installation they manage, so their OCPP settings were wiped. Once aware, Global Support promptly brought the issue to the attention of the developers.
Once the behaviour was noticed, it was correlated with the recent deployment, and a quick analysis revealed the root cause. At 15:54 CEST, the API was reverted to its previous state, and the endpoint behaviour returned to normal. However, the OCPP Bridge configuration had already been wiped for many installations by this point.
An incident response team was set up at 16:54 CEST and immediately started working on strategies to recover the corrupted data. A point-in-time restore of the database was initiated to obtain a copy of the data that could be used to fix affected installations. The team also found that all of the corruption had been recorded in our Change Log for every installation.
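To make the behaviour change described above concrete, the following is a minimal sketch of the two update semantics. It is not the actual API code: the field names (`centralSystemUrl`, `defaultTagId`, `password`, `availableCurrent`), the function names, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Mapping

# Hypothetical names for the three OCPP Bridge parameters; the real
# parameter names used by the API are not stated in this report.
OCPP_FIELDS = ("centralSystemUrl", "defaultTagId", "password")


def apply_update_previous(stored: Mapping[str, Any], body: Mapping[str, Any]) -> Dict[str, Any]:
    """Previous behaviour: a key missing from the body keeps the stored value."""
    updated = dict(stored)
    for key, value in body.items():
        updated[key] = value
    return updated


def apply_update_regressed(stored: Mapping[str, Any], body: Mapping[str, Any]) -> Dict[str, Any]:
    """Regressed behaviour: a missing OCPP key is treated as a request to clear it."""
    updated = dict(stored)
    for key, value in body.items():
        updated[key] = value
    for field in OCPP_FIELDS:
        if field not in body:
            # The stored OCPP value is dropped even though the caller never mentioned it.
            updated.pop(field, None)
    return updated


# Example: a caller only wants to change the available current.
stored = {
    "centralSystemUrl": "wss://ocpp.example.invalid",
    "defaultTagId": "TAG-1",
    "password": "secret",
    "availableCurrent": 16,
}
body = {"availableCurrent": 32}

kept = apply_update_previous(stored, body)
wiped = apply_update_regressed(stored, body)
```

In this sketch, `kept` still contains all three OCPP values alongside the new current, while `wiped` contains only `availableCurrent`. That is the pattern that would affect a caller who frequently updates one unrelated setting.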
Because the Change Log had tracked the corruption, efforts shifted to locating all the damaged data within it, for both Chargers and Installations, which allowed us to identify affected installations as well as individual charging stations. The Change Log was retrieved, and the team focused on creating a script to restore the data. The script’s behaviour was verified in stages, and it was eventually applied to all installations, restoring the data. Data for the individual charging stations was corrected manually by Global Support.
Improvements identified
Timeline (all times CEST)