How to handle incidents at scale

In this session, I want to guide the audience in applying healthy processes to handle incidents. We have probably all been in a situation where our apps crash unexpectedly, whether because of a bad release, an unexpected backend change, or something else.

In these scenarios, we usually get overwhelmed by messages from different departments (support folks, managers, and so on). This is especially true when operating at a large scale, where things can quickly get out of control.

This talk proposes rules that not only mitigate the scenarios above but also improve the communication flow. It also gives tips on how to set up alerting systems and what to measure, and describes a "post-mortem" process.

This session focuses mainly on processes and refers to some soft tooling (chat services, simple bots, alerting systems), so it doesn't require any particular technical skills. The preferred duration is about 20 minutes, and the target audience is mainly developers (of any platform), DevOps engineers, and (technical) managers.

Alessandro Mautone

Tech Lead | Senior Full Stack Engineer @Aquablu

Amsterdam, The Netherlands
