Speaker

Jennifer Petoff

Director, Google Cloud Platform and Technical Infrastructure Education

Lisbon, Portugal

Jennifer Petoff (she/her) is director of Google Cloud Platform (GCP) & Technical Infrastructure (TI) Education and is based in Lisbon, Portugal. She leads training programs for Google's GCP and TI Engineering teams. Jennifer is one of the co-editors of the best-selling book, Site Reliability Engineering: How Google Runs Production Systems; lead author of Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program; and is a regular speaker at DevOps and Site Reliability Engineering conferences around the world.

Jennifer joined Google 16 years ago after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester in the United States.

Area of Expertise

  • Information & Communications Technology
  • Media & Information

Topics

  • Site Reliability Engineering
  • SRE
  • Getting started with Site Reliability Engineering

Site Reliability Engineering to build high performance software and teams

Site Reliability Engineering (SRE) is a discipline founded at Google that is now widely practiced across the tech industry. SRE represents a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations. In this talk, we will discuss the key principles and practices of SRE, and how they can be used to build high performance software and teams. We’ll explore insights from the State of DevOps Report and how SRE can help foster the type of generative organizational culture that is a hallmark of high-performing organizations.

Swim Don’t Sink: Why Training Matters to an SRE Practice

Do you offer training to the engineers in your organization or do you throw them off the deep end to “sink or swim”? Providing training and education is universally important to set team members up for success in your organization and is critical for establishing a thriving Site Reliability Engineering (SRE) or DevOps practice and culture in the first place.

The specific training needs of each engineer vary depending on several factors, including:

-The maturity of your organization in adopting DevOps / SRE principles, practices, and culture

-The knowledge those individuals have about your organization and infrastructure

-The experience of the individuals being trained, both in terms of technical skill and familiarity with the SRE / DevOps model

This talk will explore the business case for training, the trade-offs between cost and effectiveness, and best practices for training design and deployment depending on where your organization lies on the spectrum of size and maturity.

Learn why training is not about unleashing a fire hose of information upon unsuspecting engineers but about giving those engineers the confidence to run production systems at scale.

Site Reliability Engineering: Anti-patterns in Everyday Life and What They Teach Us

Real-world experience and things that go wrong are two of life’s best teachers. This talk will explore key elements of scalable large-system design and Site Reliability Engineering (SRE) principles* through anti-patterns encountered in real life. Find out what lessons can be gleaned from watching the dynamics in a crowded cafe or dealing with a security issue during a hotel stay. Learn about fundamental Site Reliability Engineering principles and practices including:

-Avoiding cascading failures
-Not feeding the machines with human toil
-Writing blameless postmortems
-Engineering solutions to eliminate classes of errors rather than implementing point fixes

These principles will be framed through a lens of the suboptimal while demonstrating the impact of SRE anti-patterns on user trust.

* SRE is often thought of as a specific implementation of the DevOps interface.

How to Run Smarter in Production: Getting Started With Site Reliability Engineering

Site Reliability Engineering and the DevOps movement share a similar set of challenges but address each in a different way. SRE got its start at Google in 2003. According to Ben Treynor, VP of 24/7 Operations, “SRE is what happens when you ask a software engineer to design an operations team.” In 2016, Google published a book about Site Reliability Engineering principles, practices and organizational constructs.

The practice of Site Reliability Engineering at Google encompasses more than just managing production systems and responding to emergencies. Applying software engineering in a principled way to operations allows SRE to holistically address the reliability of software applications across the product lifecycle.

Implementing SRE in an organization requires a commitment to supporting some core principles and a fundamental culture shift:

-SRE needs Service Level Objectives, with consequences.
-SREs have time to make tomorrow better than today.
-SRE teams have the ability to regulate their workload.
-SREs and the organization’s leaders remove the word ‘blame’ from their vocabulary.

This talk will highlight key SRE principles and how they map to recognized DevOps focus areas. We’ll also discuss how any organization can adopt SRE, and how our recent experience of working with our customers on implementing SRE practices has shown these principles will work across a range of organizations of different types and sizes.

2020 All Day DevOps Sessionize Event

November 2020

All Day DevOps: Spring Break Edition Sessionize Event

April 2020

2019 All Day DevOps Sessionize Event

November 2019
