Session

Stop Hallucinating, Start Evaluating: AgentEval for Your AI Squad

A striker who shoots at his own goal. A midfielder who shares your game plan with the opponent. A player who burns through the entire season's budget in one match.

These aren't hypotheticals—these are AI agents in production.

As the manager of your AI squad, you've assembled talented players who can search, reason, and execute complex plays. But without proper coaching, discipline, and testing, they go rogue. How do you build a championship-caliber team you can trust?

In this session, I'll introduce AgentEval—the .NET evaluation toolkit I built to bring testing discipline to AI agents. Think of it as your coaching staff, analytics team, and referee all in one.

You'll learn to:
- Validate formations: Did your agents execute the play in the right order?
- Enforce team rules: Forbidden moves, required approvals, compliance policies
- Scout consistently: Handle AI unpredictability with statistical confidence
- Track performance: Set standards that bench agents who exceed budget or miss SLAs

Through code demos, you'll see agent tests pass and fail—with feedback that actually helps you improve your squad. Whether you're building customer service agents, automation workflows, or enterprise AI, you'll leave ready to coach a reliable team.
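As a taste of the kinds of checks above, here is a minimal sketch written as ordinary xUnit tests over a recorded agent transcript. It is an illustration only: the ToolCall record, tool names, and budget threshold are assumptions invented for this example, not the actual AgentEval API.

```csharp
// Hypothetical illustration only; not the AgentEval API.
// Plain xUnit tests over a recorded agent transcript, checking
// tool-call order, a forbidden action, and a per-run cost budget.
using System.Collections.Generic;
using System.Linq;
using Xunit;

public record ToolCall(string Name, decimal CostUsd);

public class AgentTranscriptTests
{
    // Stand-in transcript captured from a single agent run.
    private static readonly List<ToolCall> Transcript = new()
    {
        new("SearchKnowledgeBase", 0.002m),
        new("DraftReply",          0.010m),
        new("RequestApproval",     0.000m),
        new("SendReply",           0.001m),
    };

    [Fact]
    public void ToolsRunInTheExpectedOrder()
    {
        // "Validate formations": the play must run in this order.
        var expected = new[] { "SearchKnowledgeBase", "DraftReply", "RequestApproval", "SendReply" };
        Assert.Equal(expected, Transcript.Select(c => c.Name));
    }

    [Fact]
    public void ForbiddenToolIsNeverCalled()
    {
        // "Enforce team rules": an action the agent must never take.
        Assert.DoesNotContain(Transcript, c => c.Name == "DeleteCustomerRecord");
    }

    [Fact]
    public void RunStaysWithinBudget()
    {
        // "Track performance": bench agents that blow the budget.
        Assert.True(Transcript.Sum(c => c.CostUsd) <= 0.05m,
            "Agent run exceeded the $0.05 per-run budget.");
    }
}
```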

Your agents will follow tactics. Guaranteed.

José Luis Latorre Millas

José Luis Latorre Millas is a Microsoft AI MVP, creator of AgentEval, and Software Architect at Swiss Life. He builds tools to make AI agents reliable—because untested agents are expensive chaos.

Zürich, Switzerland
