Your AI Agent is lying to you: Observability for LLM systems in production

You have shipped your LLM-powered agent. Congrats. But do you actually know what it is doing? Most teams are flying blind in production and only discover issues when users complain, by which point the damage is already done. This talk dives deep into the observability gap in GenAI systems: why traditional APM tools were never designed for non-deterministic, multi-step agentic workflows, and why bolting them on creates a false sense of confidence. We will explore what real observability looks like for LLM applications, covering distributed traces across agent hops, evaluation frameworks for output quality, and feedback loops that catch regressions before they reach users. You will leave with a practical framework for debugging, evaluating, and continuously improving AI systems in production, so you are never again the last person to find out something went wrong.


As agentic AI systems move from prototypes into production, the industry is discovering that traditional monitoring tools were never designed for non-deterministic, multi-step workflows. Failures are subtle, regressions are hard to catch, and most teams only find out something is wrong when a user complains. This talk equips the community with a concrete framework for tracing, evaluating, and continuously improving LLM systems in production. The patterns shared are open, tool-agnostic, and immediately applicable regardless of which stack or framework attendees are using.

Siddhant Agarwal

Senior Developer Relations Advocate @ ClickHouse | Google Developer Expert in GenAI
