Session

Stop Guessing, Start Testing: A Practical Framework for Evaluating LLMs and Prompts

Working with LLMs can feel like trying to communicate in a brand new language. Your prompt says one thing, the model infers another, and the output is not what you expected. It can make you question your communication skills or whether AI is all hype. What if the problem lies somewhere in the middle?

When you write a function, you know what the end result should look like. LLMs introduce variability that can't be resolved by writing the "perfect prompt." A model doesn't know your preferences, your standards, or what "good" means until you define them explicitly and verify them systematically. Without that, it's all guesswork.

Developers know that testing surfaces problems early, yet most guidance focuses on enhancing prompts and maximizing context. In this talk, I'll walk through a practical framework for bringing software testing discipline to working with LLMs: defining what the LLM should do, establishing human-reviewed scoring metrics, and building structured test cases that generate valuable insight rather than just more outputs to scroll through.
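To make the idea concrete (this sketch is not from the talk itself): one way to express "defining what the LLM should do" and "human-reviewed scoring metrics" in code is a structured test case with an explicit scoring rule. The names below (LLMTestCase, score, must_include) are hypothetical, invented for illustration, and the canned output stands in for a real LLM call.

# Illustrative sketch only: a structured test case paired with an explicit,
# human-defined scoring rule, so outputs are checked against stated expectations
# instead of eyeballed.
from dataclasses import dataclass, field

@dataclass
class LLMTestCase:
    prompt: str
    # Human-defined expectations: what "good" means for this prompt.
    must_include: list[str] = field(default_factory=list)
    max_words: int = 200

def score(case: LLMTestCase, output: str) -> float:
    """Return a 0-1 score: coverage of required phrases, penalized if too long."""
    hits = sum(phrase.lower() in output.lower() for phrase in case.must_include)
    coverage = hits / len(case.must_include) if case.must_include else 1.0
    length_ok = len(output.split()) <= case.max_words
    return coverage * (1.0 if length_ok else 0.5)

if __name__ == "__main__":
    case = LLMTestCase(
        prompt="Summarize our refund policy for a customer.",
        must_include=["30 days", "original receipt"],
        max_words=120,
    )
    # In practice this string would come from an LLM call; a fixed value keeps
    # the example runnable.
    output = "Refunds are accepted within 30 days with the original receipt."
    print(f"score = {score(case, output):.2f}")  # -> score = 1.00

Running a table of such cases after every prompt change turns "does this prompt still work?" into a pass/fail report rather than a scroll through outputs.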

Danielle Maxwell

Software Engineer, Rotational Labs

Atlanta, Georgia, United States
