Do You Trust Your AI Agents? Verify with AgentEval
AI agents don’t just generate text — they take actions. They plan steps, call tools, retrieve data, and adapt over multiple turns. That power comes with new failure modes: the right tool with the wrong arguments, skipped approval steps, prompt-injection detours, confident answers without evidence, and costs that quietly explode at scale.
AgentEval is an evaluation toolkit designed to bring real engineering discipline to agentic AI. It lets you test agents the way they actually run in production: end-to-end runs, tool traces, multi-step flows, and multi-turn conversations — with scoring, guardrail checks, and performance gates.
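To make that concrete, here is a minimal sketch of the kind of end-to-end check over an agent's tool trace that the session demonstrates. It uses plain Python with pytest; the `ToolCall`/`AgentRun` shapes and the `run_agent` helper are illustrative stand-ins, not AgentEval's actual API.

```python
# Hypothetical sketch: asserting on an agent's tool trace with pytest.
# The ToolCall/AgentRun shapes and run_agent() are illustrative stand-ins,
# not AgentEval's actual API.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class AgentRun:
    final_answer: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def run_agent(task: str) -> AgentRun:
    # Stand-in for a real agent invocation; replace with your agent runtime.
    return AgentRun(
        final_answer="Refund of CHF 40 issued.",
        tool_calls=[
            ToolCall("lookup_order", {"order_id": "A-1001"}),
            ToolCall("request_approval", {"amount": 40, "currency": "CHF"}),
            ToolCall("issue_refund", {"order_id": "A-1001", "amount": 40}),
        ],
    )


def test_refund_flow_calls_tools_in_order():
    run = run_agent("Refund order A-1001")
    names = [c.name for c in run.tool_calls]

    # Right tools, right order: approval must precede the refund action.
    assert "issue_refund" in names, "agent never issued the refund"
    assert "request_approval" in names, "agent skipped the approval step"
    assert names.index("request_approval") < names.index("issue_refund")

    # Right arguments: the refund must target the order the user named.
    refund = next(c for c in run.tool_calls if c.name == "issue_refund")
    assert refund.args["order_id"] == "A-1001"
```

When a check like this fails, the trace tells you exactly which step broke: a missing approval call, a wrong argument, or tools fired out of order.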
In this session, you’ll see live demos where agent runs fail for the right reasons, with feedback that pinpoints what broke and why. Then we’ll fix the behavior and re-run the same suite to prove the improvement.
You’ll leave with practical patterns to (see the sketch after this list):
- validate tool use and action sequences
- enforce safety and policy rules continuously
- evaluate non-deterministic behavior with repeatable runs
- track quality, latency, and cost so you can ship with confidence
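As one example of the last two bullets, the sketch below repeats a non-deterministic run several times and gates on aggregate quality, latency, and cost rather than a single lucky run. Again, `run_agent`, the `cost_usd` field, and the thresholds are hypothetical stand-ins, not AgentEval's API.

```python
# Hypothetical sketch: repeated runs to score non-deterministic behavior,
# plus latency and cost gates. run_agent() and its cost_usd field are
# illustrative stand-ins, not AgentEval's actual API.
import random
import time


def run_agent(task: str) -> dict:
    # Stand-in for a real agent invocation that reports token cost.
    time.sleep(random.uniform(0.01, 0.05))
    answer = "42" if random.random() < 0.9 else "I am not sure"
    return {"answer": answer, "cost_usd": random.uniform(0.001, 0.004)}


def test_quality_latency_and_cost_over_repeated_runs():
    runs, passes, latencies, costs = 20, 0, [], []
    for _ in range(runs):
        start = time.perf_counter()
        result = run_agent("What is 6 * 7?")
        latencies.append(time.perf_counter() - start)
        costs.append(result["cost_usd"])
        passes += result["answer"] == "42"

    # Gate on aggregates, not a single run of a stochastic system.
    assert passes / runs >= 0.7, f"pass rate {passes / runs:.0%} below 70%"
    assert sorted(latencies)[int(0.95 * runs) - 1] < 1.0  # approx. p95 latency (s)
    assert sum(costs) / runs < 0.01  # mean cost per run gate (USD)
```

Because the suite is deterministic in what it measures, you can re-run it after a fix and prove the improvement with the same thresholds.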
Agentic AI is powerful — AgentEval helps you make it reliable.
Jose Luis Latorre Millas
Agentic & Software Architect at Swiss Life, Microsoft AI MVP, and creator of AgentEval. I help build agentic frameworks and the validation discipline that makes AI agents & workflows reliable.
Zürich, Switzerland