Evaluating Agentic Systems: What to Measure Before You Trust the Output
Too many teams ship agent experiences with little more than anecdotal validation. This session focuses on how to evaluate enterprise agentic systems using task completion, groundedness, tool success, latency, cost, and safety-oriented measures instead of relying on vibe-based acceptance criteria.
I will show how to build repeatable evaluation datasets, compare prompt and tool changes, and instrument systems for ongoing regression detection. Attendees will leave with a clear view of how evaluation fits into delivery pipelines and why it must be treated as an engineering discipline, not an optional extra.
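To make the regression-detection idea concrete, here is a minimal sketch (not taken from the session itself; all names are illustrative) of a fixed evaluation dataset scored on task-completion rate, with a gate that fails the pipeline when a prompt or tool change drops quality below the stored baseline:

```python
# Hypothetical sketch: a tiny regression gate over a fixed eval dataset.
# `EvalCase`, `task_completion_rate`, and `regression_gate` are illustrative
# names, not an API from the session or any specific library.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str       # input given to the agent
    expected: str     # reference answer used to judge completion

def task_completion_rate(results: list[bool]) -> float:
    """Fraction of eval cases the agent completed successfully."""
    return sum(results) / len(results) if results else 0.0

def regression_gate(baseline: float, candidate: float,
                    tolerance: float = 0.02) -> bool:
    """Pass only if the candidate stays within `tolerance` of the baseline."""
    return candidate >= baseline - tolerance

# Example: a prompt change that loses two previously passing cases.
baseline_results = [True] * 9 + [False]        # completion rate 0.90
candidate_results = [True] * 7 + [False] * 3   # completion rate 0.70
ok = regression_gate(task_completion_rate(baseline_results),
                     task_completion_rate(candidate_results))
print("gate passed" if ok else "regression detected")  # prints "regression detected"
```

The same pattern extends to the other measures the session covers (groundedness, tool success, latency, cost): each becomes a scored column over the same fixed dataset, gated against its own baseline in the delivery pipeline.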
Eric Boyd
Founder & CEO, responsiveX, Azure & AI MVP, Microsoft RD
Chicago, Illinois, United States