Swiss tech company Proton, which offers privacy-focused online services, says that a Thursday worldwide outage was caused by an ongoing infrastructure migration to Kubernetes and a software change that triggered an initial load spike.
As the company revealed yesterday in an incident report published on its status page, the outage started around 10:00 AM ET.
Proton users reported that they couldn't connect to their Proton VPN, Proton Mail, Proton Calendar, Proton Drive, Proton Pass, and Proton Wallet accounts.
For instance, when trying to connect to Proton Mail, affected users saw error messages stating, “Something went wrong. We couldn't load this page. Please refresh the page or check your internet connection.”
The issues were fully resolved within about two hours, with Proton Mail and Proton Calendar being the last services brought back online.
“As of 16:15 CET, all services except Mail and Calendar are operating normally. We’re still working on fixing the issue and restoring the rest of the affected services,” the company said.
Today, in an update to the original incident report, Proton revealed that yesterday’s global outage was triggered by a software change identified by the site reliability engineering team.
The change severely limited the number of new connections to Proton’s database servers, causing an initial load spike when the number of users connecting increased sharply around 4 PM Zurich time.
“This overloaded Proton’s infrastructure, and made it impossible for us to serve all customer connections. While Proton VPN, Proton Pass, Proton Drive/Docs, and Proton Wallet were recovered quickly, issues persisted for longer on Proton Mail and Proton Calendar,” the company said.
“For these services, during the incident, approximately 50% of requests failed, leading to intermittent service unavailability for some users (the service would appear to be alternating between up and down from minute to minute).”
While Proton would have had enough extra capacity to handle all the new connections, an ongoing migration to Kubernetes, which required running “two parallel infrastructures at the same time,” made it impossible to balance the load.
“In total, it took us approximately 2 hours to get back to the state where we could service 100% of requests, with users experiencing degraded performance until then. The service was available, but only intermittently, with performance being significantly improved during the second hour of the incident, but requiring an additional hour to fully resolve,” Proton added.
Proton says it has since resolved all connection issues affecting its online services and is currently monitoring for further problems, even though “the situation has been stable for some time.”