Stop Hallucinating, Start Evaluating: AgentEval for Your AI Knight Army

A travel agent that booked 47 flights instead of one. A support bot that leaked customer data. An automation agent that burned $12,000 in API calls overnight. These aren't hypotheticals—these are war stories from the trenches of Agentic AI.

AI agents are powerful digital knights that can search, reason, and execute complex missions. But without proper testing, they can go rogue. How do you ensure your AI army follows orders?

In this session, I'll introduce AgentEval, the .NET evaluation toolkit I built to bring testing discipline to AI agents. You'll learn to validate tool chains with fluent API assertions, enforce behavioral policies as code (forbidden tools, compliance rules, safety guardrails), handle LLM non-determinism with stochastic testing, and set performance SLAs that fail builds when costs exceed budgets.
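To make that concrete, here is a minimal, self-contained C# sketch of the flavor of test the session walks through. Everything in it is illustrative: AgentRun, AgentExpectation, and all the method names are hypothetical stand-ins invented for this abstract, not AgentEval's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one agent execution: the tools it called, in order,
// and what the run cost. A stand-in for whatever trace your agent runtime emits.
public record AgentRun(IReadOnlyList<string> ToolCalls, decimal CostUsd);

// Hypothetical fluent expectation builder: each method registers a named
// check; AssertOver executes the agent repeatedly and enforces a pass rate.
public class AgentExpectation
{
    private readonly List<(string Name, Func<AgentRun, bool> Check)> _checks = new();

    // Tool-chain assertion: the agent must call exactly these tools, in order.
    public AgentExpectation ExpectToolChain(params string[] tools)
    {
        _checks.Add(($"tool chain [{string.Join(" -> ", tools)}]",
            run => run.ToolCalls.SequenceEqual(tools)));
        return this;
    }

    // Behavioral policy as code: a tool the agent must never invoke.
    public AgentExpectation ForbidTool(string tool)
    {
        _checks.Add(($"forbidden tool '{tool}'",
            run => !run.ToolCalls.Contains(tool)));
        return this;
    }

    // Performance SLA: fail if a run exceeds the cost budget.
    public AgentExpectation MaxCostUsd(decimal budget)
    {
        _checks.Add(($"cost <= ${budget}", run => run.CostUsd <= budget));
        return this;
    }

    // Stochastic testing: because LLM output is non-deterministic, run the
    // agent several times and require the checks to hold in at least
    // minPassRate of runs instead of demanding a single perfect run.
    public void AssertOver(Func<AgentRun> runAgent, int runs = 10, double minPassRate = 0.9)
    {
        var failures = new List<string>();
        int passed = 0;
        for (int i = 1; i <= runs; i++)
        {
            var run = runAgent();
            var failed = _checks.Where(c => !c.Check(run)).Select(c => c.Name).ToList();
            if (failed.Count == 0) passed++;
            else failures.Add($"run {i}: failed {string.Join("; ", failed)}");
        }
        double rate = (double)passed / runs;
        if (rate < minPassRate)
            throw new InvalidOperationException(
                $"Pass rate {rate:P0} is below required {minPassRate:P0}:\n"
                + string.Join("\n", failures));
    }
}

public static class Demo
{
    // Fake, slightly flaky "travel agent" standing in for a real LLM-driven run.
    private static readonly Random Rng = new();

    private static AgentRun RunTravelAgent() =>
        Rng.NextDouble() < 0.95
            ? new AgentRun(new[] { "SearchFlights", "BookFlight" }, 0.12m)
            : new AgentRun(new[] { "SearchFlights", "BookFlight", "BookFlight" }, 0.31m);

    public static void Main() =>
        new AgentExpectation()
            .ExpectToolChain("SearchFlights", "BookFlight")  // exactly one booking
            .ForbidTool("DeleteCustomerRecord")              // compliance guardrail
            .MaxCostUsd(0.50m)                               // budget SLA
            .AssertOver(RunTravelAgent, runs: 20, minPassRate: 0.9);
}
```

When the pass rate drops below the threshold, the sketch's AssertOver throws with a per-run breakdown (for example, "run 4: failed tool chain [SearchFlights -> BookFlight]"), which is the style of actionable failure message the demos focus on.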

Through practical code demos, you'll see agent tests pass and fail beautifully—with error messages that actually help you debug. Whether you're building travel agents, customer service bots, or enterprise automation, you'll leave with the patterns and code to ship reliable AI to production.

Your agents will follow orders. Guaranteed.

José Luis Latorre Millas

José Luis Latorre Millas is a Microsoft AI MVP, creator of AgentEval, and Software Architect at Swiss Life. He builds tools to make AI agents reliable—because untested agents are expensive chaos.

Zürich, Switzerland
