Ana Margarita Medina
Sr. Staff Developer Advocate
San Francisco, California, United States
Ana Margarita Medina is a Sr. Staff Developer Advocate who speaks on all things SRE, DevOps, and Reliability. She is a self-taught engineer with over 14 years of experience, focusing on cloud infrastructure and reliability. She has been part of the Kubernetes Release Team since v1.25, serves on the Kubernetes Code of Conduct Committee, and is on the GC for CNCF's Keptn project. When time permits, she leads efforts to dispel the stigma surrounding mental health and bring more Black and Latinx folks into tech.
Yes, Observability Landscape as Code is a Thing!
Abstract:
=======
I started my own Observability journey a little over a year ago, when I managed the Observability Practices team at Tucows/Wavelo. As part of my journey, I learned about Observability and OpenTelemetry, and dove into what it takes to achieve Observability greatness at an organization-wide scale. Part of that involved understanding my Observability Landscape, and how it can be codified to ensure consistency, maintainability, and reproducibility.
Summary:
========
Observability is about good practices. Good practices are useless unless you have a consistent landscape. To support these practices, a number of foundational pieces must be in place before teams can truly unlock Observability’s powers.
With so many integrations and moving parts, it can be hard to keep track of all the things that you need in order to achieve Observability greatness. This is where Observability-Landscape-as-Code (OLaC) can help. OLaC means supporting Observability by codifying your Observability landscape to ensure:
* Consistency
* Maintainability
* Reproducibility
* Reliability
The Observability Landscape is made up of the following:
* Application instrumentation
* Collecting and storing application telemetry
* An Observability back-end
* A set of meaningful SLOs
* An Incident Response system for alerting on-call Engineers
Observability-Landscape-as-Code makes this possible through the following practices:
1. Instrumenting your code with OpenTelemetry
2. Codifying the deployment of the OTel Collector
3. Using a Terraform Provider to configure your Observability back-end
4. Codifying SLOs using the vendor-neutral OpenSLO specification
5. Using APIs to configure your Incident Management systems
This talk digs into the above practices that support OLaC as part of my personal Observability journey (see journey details below).
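To make practice #1 concrete, here is a minimal, hypothetical sketch (not taken from the talk) of instrumenting code with OpenTelemetry in Python and exporting spans over OTLP to an OTel Collector. The service name, Collector endpoint, and place_order function are illustrative assumptions.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Name the service so telemetry can be attributed in the Observability back-end.
resource = Resource.create({"service.name": "checkout-service"})  # assumed name

# Batch spans and ship them to a local OTel Collector over OTLP/gRPC.
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def place_order(order_id: str) -> None:
    # Wrap the business operation in a span and attach searchable attributes.
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        # ... business logic would go here ...

if __name__ == "__main__":
    place_order("demo-123")

From there, the Collector's deployment manifests (practice #2) and the back-end's Terraform configuration (practice #3) can live alongside this code, keeping the whole landscape versioned and reviewable.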
Chaos Engineering Bootcamp
Chaos engineering is the practice of conducting thoughtful, planned experiments designed to reveal weaknesses in our systems. This hands-on workshop will share how you can get started practicing Chaos Engineering. We will cover the tools and practices needed to implement this in your organization and discover how other companies are using this practice to create reliable distributed systems.
During this workshop, attendees will be broken up into teams of four, and each assigned a role that is critical to the Chaos Engineering Experiment process. Folks will work together as a team to plan and execute various Chaos Engineering experiments with the guidance of the speakers.
Ana will provide cloud infrastructure, a demo environment, access to chaos and monitoring tools, and printed material for designing experiments. She will cover the foundations of chaos engineering, give folks time for hands-on experience, and then discuss how to break through in your practice, along with wins from the industry.
Building Islands and Reliability
We might have just spent the last year mastering the art of having the perfect flower field around our home, keeping the weeds out of our island, or just trying to build relationships with our neighbors. The skills and lessons we’ve mastered while building our islands can also help us in real life, from staying connected to building stronger engineering teams and applications. Let’s take a moment to see the similarities between the work we’ve done on our islands, on ourselves, and in our workplaces, and celebrate all we’ve learned.
Don't Forget the Humans
We spend all day thinking about our technical systems, but we often neglect the needs of our human systems. Ana and Julie will walk attendees through the principles of system reliability and how to apply them not only to their systems but also to their personal lives, to prevent burnout and enjoy their weekends more.
In this talk, attendees will learn how to apply incident response and blameless practices to their everyday activities. Attendees will also walk away knowing how to build reliable socio-technical systems, along with some tips for applying these practices in the workplace.
OKRs with BLOs & SLOs via User Journeys
We hear it in commercials, in job interviews, and in the applications we use. “Users matter!” or “Customer experience is built into our culture and values!” But how are we proactively following through on what our organizations are preaching?
Observability-as-Code in Practice with Terraform
Observability has quickly become a part of the foundation of modern SRE practices. Observability is about good practices, and as any good SRE knows, its codification is crucial to ensure consistency, maintainability, repeatability, and reliability. To support Observability as part of your SRE practice, a number of foundational pieces must be in place before teams can truly unlock Observability’s powers. This is where Observability-as-Code (OaC) can help.
The Evolution of GameDays
How does your team prepare for failure and learn from incidents? GameDays are a time to come together as a team and organization to explore failure and learn. This practice has been done across most industries, from fire departments to technology companies. Sometimes this has meant unplugging data centers, running table-top exercises, or executing chaos engineering experiments. All of these approaches have one thing in common: learning. In this session, I will look back at my SRE experience and at how GameDays have evolved in other industries to share tips, so you can make your teams and companies more reliable.
Continuous Reliability. How?
As engineers we expect our systems and applications to be reliable. And we often test to ensure that at a small scale or in development. But when you scale up and your infrastructure footprint increases, the assumption that conditions will remain stable is wrong. Reliability at scale does not mean eliminating failure; failure is inevitable. How can we get ahead of these failures and ensure we do it in a continuous way?
One of the ways we can go about this is by implementing solutions like CNCF’s sandbox project Keptn. Keptn allows us to leverage the tooling we already use and build pipelines that run chaos engineering experiments and performance tests while evaluating SLOs. Ana will share how you can start simplifying cloud-native application delivery and operations with Keptn to ensure you deploy reliable applications to production.
5 lessons I’ve learned after falling down and getting back up
Over the last few years of working in the DevOps space, I’ve experienced a lot of failures and successes to get where I’m at. I’ve brought down multiple services I’ve worked on, under-provisioned resources, and even burned out. But situations like these allowed me to re-evaluate my engineering processes, implementations, and even work/life balance. Sometimes things need to break or fall apart before they can get better.
I’ll share my journey from self-taught software engineer to site reliability engineer to developer advocate. These ups and downs have constantly reminded me to rethink the way I get things done, so I can get back up and make my processes and systems more reliable. Join me as I share what I’ve learned on my journey, so it can help you on yours.
A Key to Success: Failure with Chaos Engineering
Chaos Engineering is the practice of conducting thoughtful, planned experiments designed to reveal weaknesses in our systems. Ana will discuss how performing Chaos Engineering experiments and celebrating failure helps engineers build muscle memory, spend more time building features, and build more resilient complex systems.
Getting Started with Chaos Engineering
Chaos engineering is the practice of conducting thoughtful, planned experiments designed to reveal weaknesses in our systems; it can be thought of as the facilitation of experiments to uncover systemic weaknesses.
This talk will introduce you to the practice of Chaos Engineering and explain how to get started practicing it in your organization. You will also learn how to plan your first Chaos Day. You've heard of Hack Days, where you focus on feature development; Chaos Days are an opportunity to encourage your whole organization to focus on building more reliable systems.
Brief outline:
- An Introduction to Chaos Engineering
- A guide to getting started with Chaos Engineering in your organization
- Planning your first Chaos Day
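As a rough illustration of the experiment structure described above (verify a steady-state hypothesis, inject a fault, observe, and always roll back), here is a minimal Python sketch. The health-check URL, the 30-second observation window, and the inject/remove callbacks are hypothetical placeholders, not tied to any specific chaos tool.

import time
import urllib.request

SERVICE_URL = "http://localhost:8080/health"  # hypothetical service under test

def steady_state_ok() -> bool:
    # Hypothesis: the service answers its health check with HTTP 200 within 500 ms.
    try:
        with urllib.request.urlopen(SERVICE_URL, timeout=0.5) as resp:
            return resp.status == 200
    except OSError:
        return False

def run_experiment(inject_fault, remove_fault) -> None:
    # Minimal experiment loop: check steady state, inject, observe, always restore.
    if not steady_state_ok():
        raise RuntimeError("Abort: system unhealthy before the experiment started")
    inject_fault()              # e.g. add latency or stop one replica (tool-specific)
    try:
        time.sleep(30)          # observation window: watch dashboards and alerts
        print("Steady state held under fault:", steady_state_ok())
    finally:
        remove_fault()          # roll back the fault even if the check fails

During a Chaos Day, a team would typically run several such experiments with a gradually widening blast radius, recording the results for review afterwards.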
2024 All Day DevOps Sessionize Event
Open Source Summit North America 2024 Sessionize Event
Maintainer Track + ContribFest: KubeCon + CloudNativeCon Europe 2024 Sessionize Event
KubeCon + CloudNativeCon North America 2023 Sessionize Event
DevOpsDays Seattle 2023 Sessionize Event
Devopsdays New York City 2023 Sessionize Event
KubeHuddle Toronto 2023 Sessionize Event
SLOconf 2023 Sessionize Event
DevOpsDays Austin 2023 Sessionize Event
2022 All Day DevOps Sessionize Event
DevopsDays Detroit 2022 Sessionize Event
DevOpsDays Austin 2022 Sessionize Event
2021 All Day DevOps Sessionize Event
Deserted Island DevOps 2021 Sessionize Event
2020 All Day DevOps Sessionize Event
RedisConf 2020 Takeaway (Online) Sessionize Event
DevOpsDays Austin 2020 Sessionize Event
Domain-Driven Design Europe 2019 Sessionize Event
DevOpsDays KC 2018 Sessionize Event
Tech Con 2018 Sessionize Event