How to Run Smarter in Production: Getting Started With Site Reliability Engineering

Site Reliability Engineering (SRE) and the DevOps movement share a similar set of challenges but address each in a different way. SRE got its start at Google in 2003 and, according to Ben Treynor, VP of 24/7 Operations, “SRE is what happens when you ask a software engineer to design an operations team.” In 2016, Google published a book about Site Reliability Engineering principles, practices, and organizational constructs.

The practice of Site Reliability Engineering at Google encompasses more than just managing production systems and responding to emergencies. Applying software engineering in a principled way to operations allows SRE to holistically address the reliability of software applications across the product lifecycle.

Implementing SRE in an organization requires a commitment to supporting some core principles and a fundamental culture shift:
- SRE needs Service Level Objectives, with consequences.
- SREs have time to make tomorrow better than today.
- SRE teams have the ability to regulate their workload.
- SREs and the organization’s leaders remove the word ‘blame’ from their vocabulary.

This talk will highlight key SRE principles and how they map to recognized DevOps focus areas. We’ll also discuss how any organization can adopt SRE, and how our recent work with customers implementing SRE practices has shown that these principles work across organizations of different types and sizes.

Jennifer Petoff

Director, Google Cloud Platform and Technical Infrastructure Education

Lisbon, Portugal
