Session

Autonomous Incident Lifecycle Management using AI Agents

SRE teams at PayPal have developed a multi-agent orchestration framework that autonomously manages the full incident lifecycle, from detection and intelligent triage through stakeholder communication and automated mitigation to post-incident root cause analysis (RCA). By leveraging agentic AI, the platform reduces end-to-end incident MTTR from ~45 minutes to under 5 minutes.

In this session, we’ll explore the architecture behind the system. We will also cover the frameworks we use, including how agents collaborate, share persistent memory, generate memories, manage context, and safely execute remediation workflows in production environments, including human-in-the-loop (HITL) workflows.

Attendees will leave with a clear understanding of how to leverage AI agents in the SRE space to drive faster incident resolution. They will also learn how PayPal has reduced operational cost per incident, scaled incident triage beyond priority incidents to other indicators, and stopped lower-priority issues before they start to cause financial impact.

Sohil Shah

Staff Software Engineer - Agentic AI @PayPal | Ex-TikTok, Ex-JPMC

San Jose, California, United States
