Session
AgentOps for Real: Evals, Tracing, and Regression Tests for AI Agents on Azure
It's easy to build an AI agent demo in Azure. It's much harder to build one that stays reliable once prompt changes, model swaps, tool failures, and real user behavior start hitting it in production.
In this session, I'll show a practical AgentOps approach for Azure-based agents using evaluation datasets, golden tasks, tracing, telemetry, and regression testing. We'll walk through how to instrument an agent with observability using OpenTelemetry, Azure Monitor, and Application Insights; how to score responses with both rule-based checks and model-based judges; and how to catch regressions before they quietly torch trust in production. We'll also look at common failure modes in multi-step agent flows: tool misuse, hallucinated handoffs, brittle prompts, and silent drift after updates.
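To make the instrumentation side concrete, here is a minimal sketch of tracing a single tool call with OpenTelemetry and exporting it to Application Insights via the azure-monitor-opentelemetry distro. The span name, attribute keys, and the run_search_tool wrapper are illustrative assumptions, not material from the session itself.

```python
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

# Routes spans to Application Insights; reads
# APPLICATIONINSIGHTS_CONNECTION_STRING from the environment.
configure_azure_monitor()

tracer = trace.get_tracer("agent")

def run_search_tool(query: str) -> str:
    """Hypothetical tool wrapper: one span per tool call."""
    with tracer.start_as_current_span("agent.tool.search") as span:
        span.set_attribute("agent.tool.name", "search")
        span.set_attribute("agent.tool.input", query)
        try:
            result = f"results for {query!r}"  # real tool call goes here
            span.set_attribute("agent.tool.output_chars", len(result))
            return result
        except Exception as exc:
            # Failed tool calls surface as errored spans, so tool misuse
            # and outages become diagnosable instead of silent.
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR))
            raise
```

With this in place, every tool invocation shows up as a span in Application Insights, and a multi-step agent run becomes a trace you can actually inspect.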
Built on Azure AI Foundry, Microsoft Agent Framework, and Azure Monitor, this session gives you a concrete playbook for moving from "cool prototype" to "production system people can actually depend on."
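On the scoring side, a minimal sketch of the two-layer approach the abstract describes: cheap deterministic rules first, a model-based judge second. This assumes an Azure OpenAI deployment reachable via the openai package; the deployment name, rubric, and function names are illustrative.

```python
import json

from openai import AzureOpenAI

# Reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and
# OPENAI_API_VERSION from the environment.
client = AzureOpenAI()

def rule_score(answer: str, must_contain: list[str]) -> bool:
    """Deterministic checks: fast, free, and easy to debug."""
    return all(term.lower() in answer.lower() for term in must_contain)

JUDGE_PROMPT = (
    "Rate the answer to the question on a 1-5 scale for correctness "
    'and helpfulness. Reply with JSON only: {"score": <int>}.'
)

def judge_score(question: str, answer: str) -> int:
    """Model-based judge: returns a 1-5 quality rating."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed deployment name
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["score"]
```

Rules catch the non-negotiables deterministically; the judge covers the fuzzy quality dimensions rules can't express.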
Three takeaways:
- Build an eval set that measures what your agent actually needs to do
- Add tracing and telemetry that make failures diagnosable
- Turn agent quality into a repeatable regression gate, not a vibe check (sketched below)
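A minimal sketch of what that regression gate can look like: golden tasks replayed through the agent under pytest, so a prompt or model change that drops quality fails CI instead of shipping. run_agent and the scoring helpers are hypothetical imports from your own codebase (the scorers as sketched above), and the tasks and thresholds are examples.

```python
import pytest

from my_agent import run_agent  # hypothetical: your agent's entry point
from my_agent.scoring import rule_score, judge_score  # scorers sketched above

# Golden tasks: small, versioned alongside the code, reviewed like code.
GOLDEN_TASKS = [
    {"question": "How do I reset my password?",
     "must_contain": ["reset", "link"]},
    {"question": "Which plans support SSO?",
     "must_contain": ["SSO"]},
]

@pytest.mark.parametrize("task", GOLDEN_TASKS,
                         ids=lambda t: t["question"][:40])
def test_golden_task(task):
    answer = run_agent(task["question"])
    # Hard requirements: deterministic, zero-flake.
    assert rule_score(answer, task["must_contain"])
    # Quality bar: judge rating of 4+ (example threshold).
    assert judge_score(task["question"], answer) >= 4
```

Run it in CI on every prompt, tool, or model change, and "did we just break the agent?" becomes a test result rather than a guess.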
Brian Haydin
Microsoft Cloud & AI Architect | Azure • GitHub • Power Platform | Demo-first sessions (US + Europe)
Milwaukee, Wisconsin, United States