Mandi Walls

DevOps Advocate

Roselle, New Jersey, United States

Mandi Walls is a DevOps Advocate on the Community and Advocacy Team at PagerDuty. Before joining PagerDuty, Mandi spent a number of years at Chef Software, working with customers and community members in the US and Europe. Originally a large-scale systems administrator, Mandi has focused on IT automation; organizational culture and change; and community.

Awards

  • Most Active Speaker 2023

Area of Expertise

  • Information & Communications Technology

Topics

  • DevOps
  • DevOps & Automation
  • DevOps Journey
  • DevOps Transformation

On-Call Best Practices

The always-on, always-available expectations of digital services have increased the demands on technical teams to provide response and readiness around the clock. For teams new to this concept, introducing on-call can be challenging. There are technical and cultural considerations to keep in mind when adding on-call responsibilities to new teams. In this talk, we’ll look at some of those challenges and provide recommendations for folks who are dreading their new duties.

Blameless Postmortems and Learning from Incidents

So you’ve had an incident. Restoring service is just the first step—your team should also be prepared to learn from incidents and outages. In this workshop, you will learn some best practices around postmortems/incident reviews to help your team and organization see incidents as a learning opportunity and not just a disruption in service.

Improve Your Automation to Reduce Toil

In the course of your day as an SRE, your knowledge and expertise are in high demand. You can’t do every task every person in your org needs from you without the help of comprehensive automation.

Automation can be tricky. Some systems aren’t built with automation in mind, but assume that a human being will be there to keep an eye on things and fix errors on the fly, and we can’t be everywhere when there’s too much to do.

Plus, you want to provide access to automation for the right folks and keep a record of when the tools were used.

In this talk, we’ll cover things to keep in mind when you’re building out your automation library, describe the characteristics of good automation, and give you a look at PagerDuty Rundeck, a platform that will help you share your expertise with other folks in your organization.

Build automation that works for you and gives you your time back!

Plan for Unplanned Work: Game Days with Chaos Engineering

How do you plan for unplanned incidents? You practice with Chaos Engineering. Strong incident response doesn't just happen; you have to build the skills and train your team. Practicing for major incidents gives your team insight into how your applications will behave when something goes wrong, as well as how the team will interact to solve problems. Combining your Incident Response practices with Chaos Engineering roots your response practice in real-world scenarios, helping your team build confidence.

Futuristic Luxury Incident Response

Responding to incidents is work. It’s unplanned, sometimes chaotic, and often stressful. It should be getting better, but many organizations find improving difficult and often backslide into bad practices. Teams tackling too many incidents see more burnout and have less time for work that impacts the bottom line.

Getting better at handling incidents takes practice and resources, changes to culture as well as improvements to tooling. We want to prioritize the most important issues, the problems that impact users, while delegating lower priority issues to automation. In the long term, reducing the number of incidents that responders have to deal with will improve team engagement, reduce burnout, and recapture time to spend on more important tasks.

In this talk, we’ll cover a number of methods that will have a positive impact on incident response, from crafting alerts, to writing automation, to setting good practices to prevent frustration among your team.

Telling Customer Stories with Terraform

Using Terraform as a lingua franca to talk infrastructure as code with our customer teams helps shorten implementation times for new projects and aids in new feature adoption. Terraform is the first stop for our largest customers, since it gives them compliance and comprehensive templates for new teams and services added to their PagerDuty account.

Translating real-world teams, services, and other organizational structures into PagerDuty objects can be challenging. Teams are busy, and the PagerDuty Operations Cloud isn’t their only tool.

To help this effort, and supplement our Terraform provider documentation, we put together a project to help relate the Real World to Terraform to PagerDuty for our users. We created a fictional organization, and described their requirements. We used Terraform as our Rosetta Stone to help guide the fictional team to success with their PagerDuty account, adding the “why” and “when” to the “how” of our regular documentation, along with a rigorous improvement to our sample code. In this session, we’ll take you through our project, the lessons we wanted to teach our users, and the lessons we learned ourselves about our products.

Managing Vendor Incidents

Recognizing, troubleshooting, and remediating incidents on the services your team owns and runs is hard enough; when the incident is actually happening to an upstream vendor, what can you do? Large outages might get attention from mainstream media, or at least be well-recognized among other technology teams. Those large incidents can have catastrophic results for your organization. Smaller or more obscure outages might just be a temporary annoyance.

How your team handles vendor outages is increasingly important as teams come to depend on SaaS and cloud providers for most of their tool chain as well as their production environments. This session will discuss how to plan for vendor incidents and what to have on hand, and will offer suggestions for how your team can cope when someone else is having a very bad day.

Chaos Carnival 2024 Sessionize Event

January 2024

2023 All Day DevOps Sessionize Event

October 2023

DevOpsDays DC 2022 Sessionize Event

September 2022 Washington, D.C., United States

ChefConf '21: Online Sessionize Event

September 2021
