Session
Self Healing Productions: Agentic AI for SRE & Support Teams
Production systems today generate thousands of alerts, dashboards, and telemetry signals, but incident response is still largely manual. Engineers spend time correlating metrics, logs, and traces before taking action, which increases MTTR and operational fatigue.
In this session, I introduce a practical approach to designing AI agents that assist in running production systems. Instead of engineers reacting manually to alerts, AI agents can detect anomalies, analyze incidents, estimate impact, and suggest mitigation steps. These agents can work across observability signals and help teams move toward autonomous operations.
I will walk through a simple architecture showing how detection agents, diagnosis agents, and mitigation agents collaborate during incidents such as traffic spikes, latency increases, and service failures. The session will include a short demo video of a prototype AI command center that demonstrates how incidents can be identified and handled with AI-assisted decision-making.
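To make the detection, diagnosis, and mitigation hand-off concrete, here is a minimal Python sketch of how three such agents might be chained. It is an illustration under stated assumptions, not the prototype shown in the demo: the metric names, the z-score threshold, the rule table, and the playbook entries are all hypothetical, and a real system would likely replace the rule lookups with calls to an observability platform and an AI model.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Incident:
    """Shared record that each agent enriches in turn (hypothetical schema)."""
    service: str
    metric: str
    value: float
    zscore: float
    diagnosis: str = ""
    suggestions: list = field(default_factory=list)


def detect(service, metric, history, latest, threshold=3.0):
    """Detection agent: flag the latest sample if it is a z-score outlier."""
    sigma = pstdev(history)
    if sigma == 0:
        return None  # flat history, no baseline to compare against
    z = (latest - mean(history)) / sigma
    if abs(z) < threshold:
        return None
    return Incident(service, metric, latest, z)


def diagnose(incident):
    """Diagnosis agent: map the metric to a coarse incident type.

    The mapping is an assumed stand-in for real log and trace correlation.
    """
    labels = {
        "requests_per_sec": "traffic spike",
        "p99_latency_ms": "latency increase",
        "error_rate": "service failure",
    }
    incident.diagnosis = labels.get(incident.metric, "unknown")
    return incident


def mitigate(incident):
    """Mitigation agent: suggest steps only; a human still approves them."""
    playbook = {
        "traffic spike": ["scale out the service", "enable rate limiting"],
        "latency increase": ["check downstream dependencies", "roll back latest deploy"],
        "service failure": ["restart unhealthy instances", "fail over to standby"],
    }
    incident.suggestions = playbook.get(incident.diagnosis, ["escalate to on-call"])
    return incident


def run_pipeline(service, metric, history, latest):
    """Chain the three agents; returns None when nothing anomalous is found."""
    incident = detect(service, metric, history, latest)
    if incident is None:
        return None
    return mitigate(diagnose(incident))


if __name__ == "__main__":
    result = run_pipeline("checkout", "p99_latency_ms", [100, 102, 98, 101, 99], 180)
    print(result.diagnosis, result.suggestions)
```

Each agent in this sketch only reads and enriches the shared incident record, so any stage can be swapped for a model-backed implementation without touching the others. The mitigation stage returns suggestions rather than executing them, which matches the AI-assisted framing of the session.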
The focus of this talk is not a specific tool, but the idea and design pattern. Attendees will learn how they can build their own autonomous reliability systems using their existing observability platforms, automation workflows, and AI models.
This session is ideal for SREs, DevOps engineers, platform teams, and developers interested in AI-driven observability, AIOps, and self-healing production systems.
Shobhit Verma
SRE Lead | 15 Yrs Infra & Observability (AWS, ELK, Dynatrace) | Career Mentor & Speaker | Filmmaker & Creator (50k+) | aka TheJugaadGuy
Pune, India