Evaluating Agentic Systems: What to Measure Before You Trust the Output
Too many teams ship agent experiences with little more than anecdotal validation. This session focuses on how to evaluate enterprise agentic systems using task completion, groundedness, tool success, latency, cost, and safety-oriented measures instead of relying on vibe-based acceptance criteria.
I will show how to build repeatable evaluation datasets, compare prompt and tool changes, and instrument systems for ongoing regression detection. Attendees will leave with a clear view of how evaluation fits into delivery pipelines and why it must be treated as an engineering discipline, not an optional extra.
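To make the regression-detection idea concrete, here is a minimal sketch (not taken from the session itself; all names are illustrative) of a fixed evaluation dataset scored on task-completion rate, with a gate that fails the pipeline when a prompt or tool change drops quality below the stored baseline:

```python
# Hypothetical sketch: a tiny regression gate over a fixed eval dataset.
# `EvalCase`, `task_completion_rate`, and `regression_gate` are illustrative
# names, not an API from the session or any specific library.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str       # input given to the agent
    expected: str     # reference answer used to judge completion

def task_completion_rate(results: list[bool]) -> float:
    """Fraction of eval cases the agent completed successfully."""
    return sum(results) / len(results) if results else 0.0

def regression_gate(baseline: float, candidate: float,
                    tolerance: float = 0.02) -> bool:
    """Pass only if the candidate stays within `tolerance` of the baseline."""
    return candidate >= baseline - tolerance

# Example: a prompt change that loses two previously passing cases.
baseline_results = [True] * 9 + [False]        # completion rate 0.90
candidate_results = [True] * 7 + [False] * 3   # completion rate 0.70
ok = regression_gate(task_completion_rate(baseline_results),
                     task_completion_rate(candidate_results))
print("gate passed" if ok else "regression detected")  # prints "regression detected"
```

The same pattern extends to the other measures the session covers (groundedness, tool success, latency, cost): each becomes a scored column over the same fixed dataset, gated against its own baseline in the delivery pipeline.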
Eric Boyd
Founder & CEO, responsiveX, Azure & AI MVP, Microsoft RD
Chicago, Illinois, United States