Session

Testing, Evaluating, and Monitoring AI agents using AI Evaluation and OpenTelemetry

In this workshop we'll explore on-device techniques for running and evaluating LLMs with open-source tools, as well as ways of monitoring and improving AI solutions in .NET using OpenTelemetry, Aspire, and Microsoft.Extensions.AI.Evaluation.

Prompt engineering can be surprisingly difficult, but AI can help grade the effectiveness of your AI solutions, and metrics tracked over time can tell an interesting story. We'll see how to do this with the built-in quality, safety, and NLP evaluators, as well as how to write your own evaluator if you need to.
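As a rough illustration of what that looks like in code, here's a minimal sketch of running one of the built-in quality evaluators. The specifics are assumptions rather than anything taken from this session description: a local Ollama model exposed through Microsoft.Extensions.AI's OllamaChatClient, the CoherenceEvaluator from Microsoft.Extensions.AI.Evaluation.Quality, and a simple 1-5 coherence score.

```csharp
// Minimal sketch (assumed setup, not from the session): grade a single response
// with one of the built-in quality evaluators from Microsoft.Extensions.AI.Evaluation.
using System;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

// Assumption: an on-device model served by Ollama at its default endpoint.
IChatClient chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1");

// Quality evaluators use an LLM as the grader, so they need a ChatConfiguration.
var chatConfiguration = new ChatConfiguration(chatClient);

string prompt = "Explain what OpenTelemetry is in one short paragraph.";
ChatResponse response = await chatClient.GetResponseAsync(prompt);

IEvaluator evaluator = new CoherenceEvaluator();
EvaluationResult result = await evaluator.EvaluateAsync(
    new[] { new ChatMessage(ChatRole.User, prompt) },
    response,
    chatConfiguration);

// Each evaluator reports one or more named metrics, typically scored 1-5.
NumericMetric coherence = result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
Console.WriteLine($"Coherence: {coherence.Value}");
```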

Additionally, minor changes such as tweaking the wording of a prompt, changing a model, or even adding a new tool to an AI system can all have surprising impacts. That's why we'll explore how to write integration tests that take advantage of these evaluation libraries to detect drift in your overall AI system's quality.
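To make the drift idea concrete, here's a sketch of what such an integration test might look like. xUnit as the test framework, the reuse of CoherenceEvaluator, the local Ollama client, and the 4.0 pass/fail threshold are all illustrative assumptions, not something prescribed by the session.

```csharp
// Sketch of an integration test that guards against quality drift when prompts,
// models, or tools change. Assumptions: xUnit, a local Ollama model, and an
// arbitrary 4.0 threshold on the 1-5 coherence score.
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;
using Xunit;

public class ReleaseNotesSummarizerTests
{
    [Fact]
    public async Task Summary_remains_coherent()
    {
        IChatClient chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1");
        var chatConfiguration = new ChatConfiguration(chatClient);

        string prompt = "Summarize these release notes in two sentences: ...";
        ChatResponse response = await chatClient.GetResponseAsync(prompt);

        EvaluationResult result = await new CoherenceEvaluator().EvaluateAsync(
            new[] { new ChatMessage(ChatRole.User, prompt) },
            response,
            chatConfiguration);

        // Fail the build if coherence drops below the baseline we expect.
        NumericMetric coherence = result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
        Assert.True(coherence.Value >= 4.0, $"Coherence dropped to {coherence.Value}");
    }
}
```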

Finally, communicating how your software is performing is hard, but tools like Aspire, OpenTelemetry, and AI Evaluation reporting can help you quickly build visuals to share with other team members that show how your system is trending over time.
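As a taste of the monitoring side, the sketch below wraps an IChatClient with Microsoft.Extensions.AI's UseOpenTelemetry pipeline step and exports the resulting traces over OTLP, which the Aspire dashboard (or any OTLP-compatible backend) can chart. The source name, service name, and Ollama client are again illustrative assumptions.

```csharp
// Sketch: emit OpenTelemetry traces for every chat call and export them over OTLP,
// where the Aspire dashboard can visualize them over time.
// Assumptions: the OpenTelemetry + OTLP exporter packages and a local Ollama model.
using System;
using Microsoft.Extensions.AI;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

const string sourceName = "MyApp.Chat"; // ActivitySource name used by the telemetry wrapper

// The OTLP endpoint comes from OTEL_EXPORTER_OTLP_ENDPOINT, which Aspire sets
// automatically when it launches your project alongside its dashboard.
using TracerProvider tracerProvider = Sdk.CreateTracerProviderBuilder()
    .ConfigureResource(r => r.AddService("ai-evaluation-demo"))
    .AddSource(sourceName)
    .AddOtlpExporter()
    .Build();

// Wrap the underlying client so each request is recorded as a span.
IChatClient chatClient = new ChatClientBuilder(
        new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();

ChatResponse response = await chatClient.GetResponseAsync("What does the Aspire dashboard show?");
Console.WriteLine(response.Text);
```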

By the time we're done, you'll have more tools at your disposal to build and consistently deliver high-quality LLM solutions.

Matthew Hope Eland

Wizard at Leading EDJE

Columbus, Ohio, United States
