Lesley Cordero
Staff Software Engineer, The New York Times
New York City, New York, United States
Lesley Cordero is currently a Staff Software Engineer, Tech Lead at The New York Times. She has spent most of her career as an engineer on edtech teams, including at Google for Education and several edtech startups.
In her current role, she focuses on observability, shared platforms, and building excellent teams by setting reliability vision & strategy across The Times, improving its observability footprint, and cultivating a culture that builds with the most vulnerable employees in mind first. She shows care for others by holding them accountable to the best versions of themselves, and by buying them the occasional bubble tea.
Actionable Observability
While observability is the first step towards building reliable systems, monitoring is what enables us to act on the collected telemetry in a more automated way. Previous approaches to monitoring have relied primarily on infrastructure and service metrics, but modern approaches monitor metrics that reflect the user experience and therefore capture business impact more accurately. This lets us bring clarity to reliability targets, prioritize reliability improvements effectively, and hold ourselves accountable to customers.
This talk will elaborate on those differences and review traditional and new approaches to monitoring, including the role of Service Level Objectives (SLOs). Additionally, we’ll cover the following:
1. Observability vs Monitoring
2. Metrics and the types of Metrics & Monitors
3. The benefits of an SLO-based approach to monitoring
4. Effective Accountability with Error Budget Policies
Takeaways:
1. How observability is more holistic and human-centered, whereas monitoring is about automating the tasks that enable us to debug our applications effectively.
2. Different approaches to monitoring & alerting, e.g. traditional monitor-based vs error-budget-based approaches.
3. How Error Budget Policies clarify reliability expectations for our users and hold us accountable for delivering these experiences.
4. How to acquire buy-in from upper management and across stakeholder groups (product, engineering, business analysts, etc.).
DevOps is for Product Engineers, too.
In this talk, we'll dive into the intersection between product engineering, DevOps, and Site Reliability Engineering (SRE). We'll explore how they can be combined to create a culture of technical excellence and psychological safety, both within a team and across an entire organization.
We'll start by discussing the fundamentals of DevOps and SRE, and then we'll explore how product engineers can use these principles and practices to develop more reliable, scalable, and resilient systems. We'll cover topics such as Service Level Objectives (SLOs) and how to define and use them effectively to manage expectations and prioritize high-impact work.
We'll also touch on how to create feedback loops to continuously improve the quality and performance of your products, and how to get buy-in from key stakeholders. Ultimately, you'll leave this talk with a deeper understanding of how to foster a culture of excellence, accountability, and psychological safety in your product engineering teams, and drive better outcomes for your organization as a whole.
Our target audience is leaders and managers of product engineering teams, though any engineer can benefit from the talk.
I will cover specific practices (toil elimination, SLOs, etc.) and how they apply to product engineering teams, the ideal collaboration model between product and platform teams, and share examples from my time at The New York Times.
Standardizing Observability in Microservice Architectures
Managing microservice architectures requires navigating highly complex systems. This complexity informs how we approach making these systems observable.
We’ll review a standardized platform-focused approach to building effective observable architectures, including how it addresses the new organizational challenges specific to microservices. The platform-focused approach encompasses three parts: the patterns we use, the needed organizational support, and the stack we use.
- Define observability: “The ability to understand what’s happening inside of your software systems to debug problems you've never seen before just using the telemetry (traces, logs, & metrics) emitted by your applications. It's also not just about a specific tool, it's about a team's ability to analyze that telemetry data.” (Liz Fong-Jones, Honeycomb)
- Three new organizational challenges of microservices & how they apply to observability: silos & drift, multiple points of failure, & an inherent lack of certainty.
- Introduce the idea of a standardized observability platform that encompasses patterns, support, and a stack.
- Patterns: three patterns focused on standardized communication protocols.
- Support Strategy: Ties back to the part of our observability definition that is “about a team's ability to analyze that telemetry data.”
- Observability Stack: Review the different parts of the stack, referencing ETL patterns from my data engineering background (telemetry collection, processing, and exporting, which make up our “communication layer”) that enable us to use observability tools like Datadog, Honeycomb, Sumo Logic, etc.
- Communication Layer Tool Principles: Data reliability & richness, followed by best practices.
- Investigation & Analysis Tools: Since most organizations aren’t building their own investigation tools, I focus on effective decision criteria for selecting tools.
Psychologically Safe Reliability Management
Psychological safety is particularly important for teams that manage service reliability. The vulnerability that comes with mitigating failures in production requires principles of trust, transparency, and inclusion that can only come from cultures that minimize harm and enable empowerment.
Cultivating this kind of culture requires leaders to think proactively about how to build processes and systems that enable teams to be healthy, productive, and effective, while being adequately prepared for situations when failure inevitably happens.
We’ll review the cultural consequences of chronic issues and the strategies we can use as leaders to align with our shared goal of building excellent teams. We’ll touch upon themes of privilege, power, and accountability.
I approach this talk through the lens of trauma-informed teaching. I thread its principles throughout the talk, starting with an overview of chronic issues (including a definition), followed by three sets of strategies for addressing chronic problems preventatively, proactively, and reactively. The end of the talk focuses heavily on the responsibility leaders must take for these issues and what that looks like.
Organizational Sustainability with Platform Engineering
Engineering organizations often face the consequences of building software in a way that prioritizes short-term gains over long-term ones. This has many sociotechnical consequences, including tech debt, retention issues, and, ultimately, business risk. This talk focuses on how Platform Engineering can drive sustainability through its DevOps-based principles, strong support system, and standardized shared architecture.
We’ll begin by reviewing what organizational sustainability is and how Platform Engineering can facilitate it. The rest of the talk will be split into three primary sections:
1. The sociotechnical principles provided by DevOps.
2. The robust support structures that enable platform adoption and faster delivery.
3. The platform architecture, its principles, common tensions, and a framework for how to build platform architectures that enable product engineers to do their best work.
By the end, these principles and practices will tie together to form a concrete case study on how organizations can benefit from Platform Engineering teams.
I have worked on platform-focused teams across various organization sizes, including large organizations like Google, medium-sized companies like The New York Times, and small startups earlier in their technical journey.
This talk can be between 30 and 45 minutes.
Takeaways:
1. Gain a new framework for building platforms that improve developer experience in a way that enables organizations to operate sustainably.
2. Understand the impact a community-driven approach to Platform Engineering can have, e.g. technical excellence and faster delivery, and how this ties to business impact.
3. Learn about architectural patterns that address common design tensions, e.g. standardization vs flexibility, simplicity vs complexity, integrations and coupling, and the decision to build vs buy.
I've previously spoken at:
LeadDev 2022, NDC London 2023, DevOpsDays Atlanta, DevOpsDays Chicago, The DEVOPS Conference, and more.
Copenhagen Developers Festival 2023 Sessionize Event
DevOps Enterprise Summit Amsterdam 2023 Sessionize Event
DevOpsDays Zurich 2023 Sessionize Event
NDC London 2023 Sessionize Event
DeveloperWeek Global (Management, Cloud, Enterprise) 2021 Sessionize Event
2021 All Day DevOps Sessionize Event
DevOpsDays Texas 2021 Sessionize Event