Session
The Formula for Faster Outage Recovery
Production outages can be extremely costly. For a global business, the cost of just one minute of downtime can exceed an engineer’s annual compensation. So how can we build a smooth and efficient incident response process?
After more than a decade of building and operating systems used by millions of people, I’ve distilled an approach that helps engineers strengthen their teams’ incident management practices, based on a simple formula:
Outage Duration = Time to Detect + Time to Acknowledge + Time to Repair.
To resolve outages quickly, we need to be efficient in all three stages. But shortening the time of each stage requires a coordinated mix of technical, process, and cultural changes.
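As a rough illustration of the formula, the sketch below splits a single incident into its three stages and sums them. It is not taken from the talk: the `Incident` fields, the function names, and the sample timestamps are all hypothetical, and it assumes Time to Repair is measured from acknowledgement to restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical field names; a real incident tracker may record these differently.
    started_at: datetime       # when the outage began
    detected_at: datetime      # when monitoring raised an alert
    acknowledged_at: datetime  # when a responder took ownership
    resolved_at: datetime      # when service was restored


def stage_durations(incident: Incident) -> dict[str, timedelta]:
    """Split an outage into the three stages named in the formula."""
    return {
        "time_to_detect": incident.detected_at - incident.started_at,
        "time_to_acknowledge": incident.acknowledged_at - incident.detected_at,
        # Assumption: repair time is counted from acknowledgement onward.
        "time_to_repair": incident.resolved_at - incident.acknowledged_at,
    }


def outage_duration(incident: Incident) -> timedelta:
    """Sum the three stages; this equals resolved_at - started_at."""
    return sum(stage_durations(incident).values(), timedelta())


# Made-up timestamps, for illustration only.
example = Incident(
    started_at=datetime(2024, 1, 1, 12, 0),
    detected_at=datetime(2024, 1, 1, 12, 7),
    acknowledged_at=datetime(2024, 1, 1, 12, 12),
    resolved_at=datetime(2024, 1, 1, 12, 45),
)

for stage, duration in stage_durations(example).items():
    print(stage, duration)
print("outage_duration", outage_duration(example))
```

Tracking the stages separately, rather than only the total, shows which of the three is the largest contributor for a given team.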
We’ll unpack each component and examine practical strategies—from tooling investments and observability practices to cultural habits and on-call readiness—that can dramatically shorten outage duration.
By the end of the talk, you’ll know how to set your team up for success when they face an outage.
Maxim Schepelin
Engineering leader at Booking.com
Amsterdam, The Netherlands