The Liar in Your Pipeline: Catching AI Failures Before They Catch You

Picture this: your AI assistant goes live on Monday. By Wednesday, it's hallucinating product prices, citing made-up policies, and your support tickets are climbing. No unit test failed. Your pipeline was green. The liar was already inside.
AI applications break differently from traditional software — and most teams are still testing them like it's 2016. In this session, we close that gap using Microsoft Foundry's evaluation toolkit. You'll watch a complete quality pipeline get built live: an evaluation dataset constructed from real failure modes, automated scoring across groundedness, coherence, and safety metrics, and a CI/CD gate wired to block any deployment that doesn't meet the bar. The demo runs against a RAG-based assistant — the kind of app most of us are actually shipping.
By the end, you'll know how to make your AI pipeline tell the truth — or at least get caught when it doesn't.
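
To make the moving parts concrete, here is a minimal sketch of the kind of gate the session builds. This is a hedged illustration, not the session's demo code: it assumes the azure-ai-evaluation Python SDK that backs Foundry's evaluation toolkit, and the dataset path, judge deployment name, and 4.0 pass bar are placeholders. The Foundry safety evaluators, which need an Azure AI project connection, are left out for brevity.

import os
import sys

from azure.ai.evaluation import CoherenceEvaluator, GroundednessEvaluator, evaluate

# Judge model used by the quality evaluators; credentials come from the
# environment so the script can run unchanged inside a CI job.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",  # assumed deployment name
}

# Score every row of the eval set: a JSONL file with query, context, and
# response columns, built from the app's observed failure modes.
result = evaluate(
    data="eval_dataset.jsonl",  # hypothetical path
    evaluators={
        "groundedness": GroundednessEvaluator(model_config),
        "coherence": CoherenceEvaluator(model_config),
    },
)

# CI/CD gate: these evaluators score on a 1-5 scale, so block the deploy
# if either aggregate mean falls below the bar. Aggregate metrics are
# keyed "<alias>.<metric>", e.g. "groundedness.groundedness".
PASS_BAR = 4.0  # illustrative threshold; tune to your app
failing = {
    name: score
    for name, score in result["metrics"].items()
    if name.endswith((".groundedness", ".coherence")) and score < PASS_BAR
}
if failing:
    print(f"Deployment blocked, metrics below {PASS_BAR}: {failing}")
    sys.exit(1)
print("All evaluation metrics met the bar; safe to deploy.")

Run as a pipeline step, the nonzero exit code is what turns scores into the deployment gate the abstract promises: a red build instead of a Wednesday incident.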

Ron Dagdag

Microsoft AI MVP and Research Engineering Manager @ Thomson Reuters

Fort Worth, Texas, United States
