Choosing the Smallest LLM That Won’t Completely Fail You

Bigger isn’t always better, especially when it comes to running language models locally. In this session, we’ll explore how to evaluate and benchmark Small Language Models (SLMs) using Go, Docker, and Testcontainers.

You’ll learn how to build a framework in Go that leverages Docker’s Model Runner as the inference engine to automatically spin up SLMs, run controlled evaluation scenarios, and collect observability metrics. We’ll define an Evaluator Agent that executes a battery of standard prompts across multiple models, an approach that helps you understand performance, accuracy, and resource trade-offs in practical developer setups.
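To give a taste of what such a harness can look like, here is a minimal sketch of an evaluator loop in Go, assuming Docker Model Runner exposes an OpenAI-compatible chat completions endpoint on a local port. The base URL, model identifiers, and prompts below are placeholders for illustration, not the session’s actual framework.

```go
// evaluator.go: a minimal evaluator loop sketch. It sends each prompt to each
// model through an OpenAI-compatible /chat/completions endpoint and records
// the answer plus wall-clock latency.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

// evaluate sends a single prompt to a single model and returns the answer and latency.
func evaluate(baseURL, model, prompt string) (string, time.Duration, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", 0, err
	}
	start := time.Now()
	resp, err := http.Post(baseURL+"/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", 0, err
	}
	if len(out.Choices) == 0 {
		return "", time.Since(start), fmt.Errorf("empty response from %s", model)
	}
	return out.Choices[0].Message.Content, time.Since(start), nil
}

func main() {
	// Placeholder base URL; point it at wherever your local inference engine listens.
	baseURL := "http://localhost:12434/engines/v1"
	models := []string{"ai/smollm2", "ai/llama3.2"} // hypothetical model identifiers
	prompts := []string{"Summarize what Testcontainers does in one sentence."}

	for _, m := range models {
		for _, p := range prompts {
			answer, latency, err := evaluate(baseURL, m, p)
			if err != nil {
				fmt.Printf("%s: error: %v\n", m, err)
				continue
			}
			fmt.Printf("%s (%v): %s\n", m, latency, answer)
		}
	}
}
```

Because the loop only speaks the OpenAI wire format, the same code runs unchanged against any local engine that exposes that API, which is what makes side-by-side model comparisons cheap.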

We’ll move from building a reusable evaluation harness to defining and orchestrating prompts as tests for different models. You’ll see how to instrument Go benchmarks with metrics and traces to visualize behavior instantly and make informed decisions. And of course, you’ll walk away with practical insights on selecting the smallest model that won’t fail you.
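To turn those runs into comparable numbers, the same evaluation call can be wrapped in a standard Go benchmark and annotated with custom metrics. The sketch below reuses the hypothetical evaluate helper above and reports average latency and a rough tokens-per-second figure via b.ReportMetric; the endpoint and model name are again placeholders.

```go
// evaluator_bench_test.go: a sketch of instrumenting the evaluation as a Go
// benchmark, reporting per-operation latency and an approximate throughput.
package main

import (
	"strings"
	"testing"
)

func BenchmarkSmallModel(b *testing.B) {
	const (
		baseURL = "http://localhost:12434/engines/v1" // placeholder endpoint
		model   = "ai/smollm2"                        // hypothetical model identifier
		prompt  = "Explain what a goroutine is in two sentences."
	)

	var totalTokens, totalSeconds float64
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		answer, latency, err := evaluate(baseURL, model, prompt)
		if err != nil {
			b.Fatalf("evaluation failed: %v", err)
		}
		// Rough token count: whitespace-separated words as a stand-in for real tokenization.
		totalTokens += float64(len(strings.Fields(answer)))
		totalSeconds += latency.Seconds()
	}
	// Custom metrics show up alongside the default ns/op in `go test -bench` output.
	b.ReportMetric(totalSeconds/float64(b.N)*1000, "ms/op")
	if totalSeconds > 0 {
		b.ReportMetric(totalTokens/totalSeconds, "tokens/sec")
	}
}
```

Running `go test -bench BenchmarkSmallModel` per model then yields numbers you can line up in a table, and the same hooks are a natural place to emit traces and metrics to your observability stack.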

By the end, you’ll have a repeatable approach for testing and comparing language models.

Warm up your GPUs, though you may need less of them than you think.

Manuel de la Peña

Docker, Staff Software Engineer

Toledo, Spain
