Your AI Coding Assistant Customizations Have Zero Tests. Let's Fix That.

You'd never ship a function without a test. But the instruction files and agent definitions shaping every AI response in your codebase? Those get a code review where the best anyone can do is read the markdown and hope.

Teams are shipping these customizations alongside every feature, and reviewers are doing their best. But there's nothing to run. A misconfigured instruction file won't fail a build or throw an exception. It surfaces later, when a prompt returns output that's subtly, consequentially wrong. Same prompt, different customizations, wildly different results. It's "works on my machine" for AI.

This talk introduces practical evaluation methods for the full range of AI assistant customizations: instruction files, agent definitions, skills, and rules. You'll learn how to define what "correct" means for an instruction file or skill, how to choose between deterministic checks, LLM-as-judge rubrics, and human review, and how to wire evaluations into your existing PR workflow, so customizations face the same quality bar as the rest of your codebase.
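
As a taste of what this can look like in practice, here is a minimal pytest-style sketch (not taken from the talk) of the two automated approaches. It assumes an OpenAI-compatible client, a hypothetical run_assistant harness that applies a customization as a system prompt, and an illustrative instruction file at .github/copilot-instructions.md; the rules, prompts, and rubrics would be your own.

```python
# Minimal sketch: evaluating an instruction file two ways.
# Assumptions (illustrative, not from the talk): an OpenAI-compatible client,
# a hypothetical run_assistant() harness, and an example instruction file path.
import json
import re

from openai import OpenAI

client = OpenAI()
INSTRUCTIONS = open(".github/copilot-instructions.md").read()  # example path


def run_assistant(instructions: str, prompt: str) -> str:
    """Hypothetical harness: apply the customization as a system prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content


def test_no_var_keyword():
    # Deterministic check: the rule is mechanically verifiable with a regex.
    answer = run_assistant(INSTRUCTIONS, "Write a TypeScript function that sums a list.")
    assert not re.search(r"\bvar\b", answer), "instruction file failed to ban `var`"


def test_explains_error_handling():
    # LLM-as-judge: the rule is qualitative, so score the answer against a rubric.
    answer = run_assistant(INSTRUCTIONS, "Add a retry wrapper around this HTTP call.")
    rubric = (
        "Score 1-5: does the answer explain its error-handling strategy and "
        'avoid swallowing exceptions? Reply as JSON: {"score": n}.'
    )
    judged = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        response_format={"type": "json_object"},  # ask for parseable output
        messages=[{"role": "user", "content": f"{rubric}\n\nANSWER:\n{answer}"}],
    )
    assert json.loads(judged.choices[0].message.content)["score"] >= 4
```

Deterministic checks are cheap and stable, so they can run on every PR; judge-based rubrics cost tokens and drift with the judge model, so they earn their keep on the rules you can't express as a regex or a parser.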

You'll leave with a concrete framework for making AI customizations something your team can verify, not just eyeball.

Jurre Brandsen

Software Engineer at Info Support

Utrecht, The Netherlands
