Your AI Lies. Let's Catch It
Your AI feature passed every test you wrote. It demo'd perfectly. Your team shipped it with confidence.
Then a user asked it something slightly off-script — and it answered with total authority, zero accuracy, and a smile.
AI systems fail differently than software. There's no stack trace for a hallucination. No alert for a biased response. No build failure for a prompt that gets quietly hijacked. The bug looks like a feature until it isn't — and by then, it's already in production, already in front of users.
This session is your toolkit for catching AI before it lies to your users. We'll walk through a practical four-phase eval loop: identify your failure modes, build a small but meaningful golden dataset, write semantic checks that go beyond "does it return something," and automate the whole thing in CI so regressions break the build instead of the product.
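The loop above can be sketched in a few lines of plain Python. Everything here is illustrative: `fake_model` stands in for whatever AI feature you are testing, the golden example is invented, and a real suite would have far more cases — but the shape (dataset, semantic check, CI-friendly failure report) is the point.

```python
def fake_model(question: str) -> str:
    # Placeholder: swap in a call to your actual AI feature.
    answers = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
    }
    return answers.get(question, "I'm not sure.")

# Phase 2: a small golden dataset of inputs and facts the answer must contain.
GOLDEN = [
    {"input": "What is our refund window?", "must_contain": ["30 days"]},
]

# Phase 3: a semantic check that goes beyond "did it return something".
def check(example: dict, output: str) -> bool:
    return all(fact.lower() in output.lower() for fact in example["must_contain"])

def run_evals() -> list[str]:
    # Phase 4: collect failures so CI can break the build on regressions.
    failures = []
    for example in GOLDEN:
        output = fake_model(example["input"])
        if not check(example, output):
            failures.append(f"FAIL: {example['input']!r} -> {output!r}")
    return failures

if __name__ == "__main__":
    failures = run_evals()
    assert not failures, "\n".join(failures)
```

Run it as a pytest test or a plain script in your pipeline; a non-empty failure list exits non-zero and the build stops.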
You'll see a live demo building evals around a real AI feature, deliberately breaking it, and watching the pipeline catch it. Hallucination detection, faithfulness scoring, adversarial red-teaming — all shown with patterns you can drop into your own stack, whatever it is.
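To make "faithfulness scoring" concrete, here is a deliberately crude version: score an answer by what fraction of its content words are grounded in the source context. Real pipelines typically use an NLI model or an LLM-as-judge instead; the tokenizer, the length cutoff, and the example strings below are all arbitrary assumptions for illustration.

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercase content words longer than 3 characters; a stand-in tokenizer.
    return {t for t in re.findall(r"[a-z']+", text.lower()) if len(t) > 3}

def faithfulness(answer: str, context: str) -> float:
    # Fraction of the answer's content words that appear in the context.
    answer_terms = tokens(answer)
    if not answer_terms:
        return 1.0
    return len(answer_terms & tokens(context)) / len(answer_terms)

context = "Refunds are accepted within 30 days of purchase with a receipt."
grounded = "Refunds need a receipt and happen within 30 days of purchase."
hallucinated = "We offer lifetime refunds and free shipping worldwide."

assert faithfulness(grounded, context) > faithfulness(hallucinated, context)
```

Even this toy scorer separates a grounded answer from a confident fabrication; the production version swaps the overlap metric for a stronger judge while keeping the same eval harness around it.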
No research background needed. If you've written a unit test, the mindset already transfers.
Walk away with a four-phase strategy and a working eval template you can run on your own AI feature this week.
Ron Dagdag
Microsoft AI MVP and Research Engineering Manager @ Thomson Reuters
Fort Worth, Texas, United States