Agent P and the Case of the Crashing Cloud: Deploying AI SRE Agents for Root Cause Analysis

Perry the Platypus may appear to be just an ordinary semi-aquatic mammal, but beneath that fedora lies the greatest secret agent in Danville, and your cloud infrastructure's most overbooked defender. With villains multiplying faster than Dr. Doofenshmirtz can build -inators, Perry faces a crisis familiar to every engineering team: too many incidents, too many alerts, and not enough Agent P to go around. The solution? Perry does what any resourceful secret agent would do. He builds a squad of purpose-built, GitHub Copilot-powered AI Site Reliability Engineer (SRE) agents to monitor the cloud, triage alerts, and hunt down root causes across distributed systems while he handles the missions only he can take. Like Perry, today's engineering teams must scale incident response beyond human bandwidth, and AI agents are the new field operatives that make this possible.

In this session, we'll follow Perry as he designs, deploys, and coordinates his AI SRE agent network, with each agent assigned a specialty: log analysis, trace correlation, anomaly detection, or post-mortem drafting. We'll walk through a real-world multi-service failure scenario in which the AI agents triage the incident, surface the most likely root causes, and brief the human Squad in plain language, so engineers can make fast, confident decisions instead of drowning in dashboards. Powered by GitHub Copilot, these agents don't just retrieve data; they reason about it, suggest next investigative steps, and even draft the blameless post-mortem before Perry's had his morning platypus kibble. This talk is equal parts mission briefing and practical blueprint, applicable to any cloud platform, and designed to send you home ready to build your own AI-assisted incident response operation.

Attendees Will Learn

- Why AI SRE agents are the next evolution in cloud incident response and root cause analysis
- How to design purpose-built AI agents for specific RCA tasks: log triage, trace analysis, anomaly detection, and post-mortem generation
- How GitHub Copilot powers agent reasoning that goes beyond data retrieval to actionable insight
- How to orchestrate a human-AI Squad model where engineers make decisions and agents do the heavy lifting
- Techniques for correlating logs, metrics, and distributed traces across any cloud platform
- How to distinguish symptoms from root causes using AI-assisted dependency mapping and timeline analysis
- Best practices for keeping humans in the loop without creating bottlenecks in high-pressure outages
- How to write a blameless post-mortem that actually prevents future incidents — with a little AI help
- Strategies for reducing mean time to resolution (MTTR) by scaling your response capability with AI agents

Sean Whitesell

President of SkyForge Consulting, Chief Cloud Architect, & President of Tulsa .NET User Group

Tulsa, Oklahoma, United States
