Keeping the Pager Human: Balancing AI Self‑Healing Dreams with Operational Duty
During a late‑night outage, hundreds of thousands of users received HTTP 500 errors. Redis logs showed closed sockets, VPC Flow Logs filled with new connections, and CloudWatch warned that the NAT Gateway was at full capacity. An AI chat assistant pointed to a recent third‑party failover change, which led to a small code bug that had triggered a connection storm. Each AI insight handed the investigation back to a human, and each human judgment primed the AI for the next clue, forming an ongoing feedback loop.
Using this real incident as our guide, we will explore today's AI‑powered SRE tools: alert clustering, anomaly detection, automated runbooks, self‑healing platforms, and more. You will see where AI accelerates triage and where an engineer's eye remains essential. You will learn practical techniques to turn raw telemetry into clues, rebuild incident timelines, confirm root causes, and write RCAs your team can trust. We will conclude by mapping the next horizon for AI in SRE and outlining the skills and guardrails that keep humans firmly and strategically in the loop.