Speaker

Maxim Schepelin

Engineering leader at Booking.com

Amsterdam, The Netherlands

Maxim Schepelin has been an engineering leader in various industries, including Game Dev, e-commerce, and travel, building teams capable of taking products from idea to world scale. With over a decade of experience leading engineering teams, Maxim has mastered the art and science of management, as well as how to exit both Vim and Emacs.
Maxim continues exploring ways to build effective engineering teams that deliver value at a sustainable pace.

Area of Expertise

  • Business & Management
  • Information & Communications Technology
  • Media & Information
  • Physical & Life Sciences

Topics

  • Software Development
  • Leadership
  • Software Engineering Management
  • Management
  • Site Reliability Engineering (SRE)
  • Agile Methodologies
  • Software Delivery
  • Organizational Design
  • Team Management
  • Engineering Culture & Leadership
  • Software Engineering

How to set SLOs, drive improvements, and make friends with business stakeholders

Make reliability a shared priority, not just tech-speak. This session shows you how to frame SLOs in business terms, engage stakeholders, and use clear metrics to align technical and business priorities.

The outline of the talk:
- Engineers care about reliability, and we have developed a language to talk about it
- We count the nines (9s) of availability, define error budgets, measure MTTR and MTBF
- Why, then, is it so hard to convince our business counterparts to invest in technical improvements?
- Because our language sounds intimidating and disconnected from reality.
- We fail to explain the actual value of reliability
- The key question of reliability:
- How much does it cost when your service is down for one hour?
- Don't ask how many nines of availability a service should have; ask how much cost is acceptable.
- SLO formula
- X must be true Y percent of the time
- X is your definition of success
- Y is your threshold
- Two levels at which you can measure success:
- Technical level: A service is running, DB is working, API returns a 200 status code.
- Business level: The business process is working. 99.9% of transfers are successful, 99% of reports are generated within 30 seconds, etc.
- Aim to define SLOs at the business level.
- From measuring to prioritization
- Benefits of measuring SLOs at the business level:
- You know the costs of outages
- You know the cost of bad architecture
- You know the cost of slow processes
- Your data points are facts from the past
- Business plans and new features are guesswork about the future
- It's easier to talk about priorities when your numbers are solid.
- You're using the same units to compare tech improvements and features.
- Use error budgets to drive improvement
- Review how your systems perform against their SLOs.
- If your SLO is 99.9%, you allow yourself to fail in 0.1% of cases. This is your error budget.
- What do you do when you exceed the budget?
- Code freeze
- Prioritize immediate improvements to recover reliability.
- Conduct postmortems
- "Do better next time" is not a strategy.
- Make reliability a first-class citizen.
- Report SLOs together with business metrics.
- Remind your stakeholders that availability is your most important feature.
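
As a rough illustration of the SLO formula and the error budget from the outline, the Python sketch below checks "X must be true Y percent of the time" against event counts and translates the remaining budget into downtime and money. The function names, the 30-day window, and the cost-per-hour figure are hypothetical choices made for this example and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    compliance: float       # fraction of events where X was true
    allowed_failures: float # error budget, expressed in events
    budget_used: float      # fraction of the error budget consumed
    within_budget: bool


def evaluate_slo(good_events: int, total_events: int, target: float) -> SloReport:
    """Compare observed success rate with an SLO target such as 0.999."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    failures = total_events - good_events
    # The error budget is whatever the target leaves over: (1 - Y) of all events.
    allowed_failures = (1 - target) * total_events
    return SloReport(
        compliance=good_events / total_events,
        allowed_failures=allowed_failures,
        budget_used=failures / allowed_failures,
        within_budget=failures <= allowed_failures,
    )


def allowed_downtime_hours(target: float, window_hours: float) -> float:
    """Downtime the SLO tolerates within a window (assumes time-based availability)."""
    return (1 - target) * window_hours


def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Answer the key question: what does this much downtime cost?"""
    return hours_down * cost_per_hour


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 transfers, 800 of which failed.
    report = evaluate_slo(good_events=999_200, total_events=1_000_000, target=0.999)
    print(f"compliance={report.compliance:.4%}, budget used={report.budget_used:.0%}")

    # Hypothetical figures: 30-day window, 10,000 currency units lost per hour down.
    hours = allowed_downtime_hours(target=0.999, window_hours=30 * 24)
    print(f"allowed downtime: {hours * 60:.1f} min, "
          f"cost if fully spent: {downtime_cost(hours, 10_000):.0f}")
```

Under these assumptions, a 99.9% target over a 30-day window tolerates roughly 43 minutes of downtime, which gives stakeholders a cost figure to weigh against feature work instead of a count of nines.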

So, what is your team going to do in the next six months?

Avoid common objective-setting pitfalls. Learn proven techniques to define clear, impactful goals for the team, align stakeholders, and make an impact.
