
AI Features Don't Crash. They Just Quietly Get Worse.

It worked in the demo. It worked in staging. Your team shipped it, gave each other a nod, and moved on.

Six weeks later, someone changed the prompt. Someone else upgraded the model. A retrieval config got tweaked. Nobody broke anything — but the feature got quietly worse, one response at a time, until a user noticed before you did.

This is the default failure mode of AI features. No exceptions. No alerts. No obvious moment where something went wrong. Just a slow drift away from the thing you tested and toward something nobody owns.

This session gives you the eval loop that catches drift before your users do. Using a realistic knowledge assistant as the running example, we'll expose three real failure modes — unsupported claims, missed key information, and brittleness after a change — then build a lightweight eval suite that turns each one into a repeatable check. You'll see a compact golden dataset, three scoring approaches, and a regression workflow wired into CI.
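To make that concrete, here is a minimal sketch of what one of those checks can look like, written as an ordinary pytest test. The ask_assistant() wrapper, the golden-set entries, and the 0.8 threshold are illustrative assumptions, and the keyword-coverage scorer stands in for just one of the scoring approaches the session compares.

# Minimal sketch of a golden-dataset regression check.
# Assumes a hypothetical ask_assistant(question) -> str wrapper around
# your AI feature; the entries and threshold below are placeholders.

GOLDEN_SET = [
    {
        "question": "What is our refund window for annual plans?",
        "must_mention": ["30 days", "annual"],  # key facts the answer must cover
    },
    {
        "question": "Which regions is the beta available in?",
        "must_mention": ["US", "EU"],
    },
]

def ask_assistant(question: str) -> str:
    """Placeholder for the real call into your knowledge assistant."""
    raise NotImplementedError("wire this to your feature's entry point")

def keyword_coverage(answer: str, must_mention: list[str]) -> float:
    """Fraction of required facts that appear in the answer (case-insensitive)."""
    answer_lower = answer.lower()
    hits = sum(1 for fact in must_mention if fact.lower() in answer_lower)
    return hits / len(must_mention)

def test_golden_set_coverage():
    """Fails CI when a prompt, model, or retrieval change drops key information."""
    threshold = 0.8  # tolerate some variation, catch real regressions
    for case in GOLDEN_SET:
        answer = ask_assistant(case["question"])
        score = keyword_coverage(answer, case["must_mention"])
        assert score >= threshold, (
            f"Coverage {score:.2f} below {threshold} for: {case['question']!r}"
        )

Run it with pytest on every pull request and the "quiet drift" becomes a red build instead of a user complaint.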

No research background required. If you know how to write a test, you already know how to think about this.

Walk away with a practical four-step eval workflow and a template you can drop into whatever AI feature your team is shipping right now.

Ron Dagdag

Microsoft AI MVP and Research Engineering Manager @ Thomson Reuters

Fort Worth, Texas, United States
