Your AI Lies. Let's Catch It: Practical LLM Evaluation for Engineers Who Ship

Your AI doesn't crash. It doesn't throw a stack trace. It doesn't fail a build. It just quietly tells your users the wrong thing — with complete confidence, perfect grammar, and zero remorse.
Hallucination isn't a bug you can reproduce on demand. Bias doesn't show up in your logs. Prompt injection won't trigger your monitoring alerts. And unlike the software bugs you're used to hunting, these failures look like success right up until they aren't.
This session is your debugging toolkit for AI systems. We'll cover how evaluation frameworks like DeepEval, RAGAS, and Azure AI Foundry bring the discipline of software testing to language models — from writing your first eval in pytest to red-teaming your system for adversarial attacks before someone else does it for you. You'll learn the metrics that matter (faithfulness, hallucination detection, contextual precision), how to build golden datasets that actually mean something, and how to wire evals into CI/CD so a drop in quality kills the PR — not the product.
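To give you a taste of the starting point, here is a minimal sketch of a first eval using DeepEval's pytest integration. The question, answer, retrieval context, and 0.7 threshold are all hypothetical placeholders for illustration, not recommendations:

```python
# A minimal first eval with DeepEval's pytest integration.
# Assumes `pip install deepeval` and an LLM judge configured
# (e.g. OPENAI_API_KEY in the environment).
from deepeval import assert_test
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

def test_refund_answer_is_faithful():
    test_case = LLMTestCase(
        input="What is your refund window?",
        # What your system actually answered:
        actual_output="You can request a refund within 30 days of purchase.",
        # The chunks your RAG pipeline retrieved for this question:
        retrieval_context=["Refunds are accepted within 30 days of purchase."],
    )
    # Faithfulness checks whether the answer stays grounded in the
    # retrieved context. threshold=0.7 is an illustrative starting point.
    metric = FaithfulnessMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

It runs like any other test, locally with pytest or via DeepEval's own runner, so gating a pull request is a matter of adding the suite to your existing CI job.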
No PhD required. If you've written a unit test, you're already halfway there.
By the end, you'll have a four-phase strategy and a working eval setup you can run this week. Because the question was never whether your AI would lie. The question is whether you'll catch it first.

Ron Dagdag

Microsoft AI MVP and Research Engineering Manager @ Thomson Reuters

Fort Worth, Texas, United States
