# Cut Your LLM Bill with OpenTelemetry

Prototypes are cheap. Production is not. LLM bills can turn into a nasty surprise the moment you hit production. This deep-dive workshop shows how to make LLM cost measurable, understandable, and optimisable.

The workshop is for anyone running LLMs in production - data scientists, software engineers, and tech leads. We'll build an end-to-end cost-tracking solution around a sample Python application. You'll instrument the code with an SDK, spin up the OpenTelemetry Collector and storage, and analyse the results in Grafana. We'll go beyond measuring: you'll run experiments with model swaps and prompt-context inflation. Finally, we'll apply optimisations like prompt caching and batch processing to reduce costs while preserving quality.

By the end, you'll know how to control the costs of your LLM applications - you'll be prepared to measure, explain, and optimise your GenAI bills.

### Plan
We'll use a sample Python HTTP app with two endpoints - one for low-latency single-request use cases and one for high-throughput batch processing.
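
As a rough sketch, such an app might look like the FastAPI service below; the endpoint names and payloads are illustrative assumptions, not the workshop's actual code.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CompletionRequest(BaseModel):
    prompt: str


class BatchRequest(BaseModel):
    prompts: list[str]


def call_llm(prompt: str) -> str:
    # Placeholder: the real app calls an LLM provider here (and gets instrumented in step 1).
    return f"echo: {prompt}"


@app.post("/complete")  # low-latency, single-request use case
def complete(req: CompletionRequest) -> dict:
    return {"completion": call_llm(req.prompt)}


@app.post("/batch")  # high-throughput batch processing
def batch(req: BatchRequest) -> dict:
    return {"completions": [call_llm(p) for p in req.prompts]}
```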
1. Setup
   1. Instrument the app to measure LLM cost-related metrics (token counts, costs) with OpenTelemetry; a minimal instrumentation sketch follows this list.
   2. Set up the OpenTelemetry Collector and storage with Docker Compose and send the app's telemetry to them; see the exporter sketch below.
   3. Set up observability tooling (Grafana) to analyse the metrics.
2. Experiment
   1. Switch models to see how the choice affects costs.
   2. Inflate the prompt context to observe the resulting cost spikes.
3. Optimise
   1. Apply prompt caching to reduce cost; see the caching sketch below.
   2. Apply the Batch API to the batch-processing endpoint and measure the cost impact; see the batch sketch below.
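
A minimal instrumentation sketch using the OpenTelemetry metrics API is below. The metric and attribute names loosely follow the GenAI semantic conventions, and the per-token prices are placeholders; the workshop's actual schema and numbers may differ.

```python
from opentelemetry import metrics

meter = metrics.get_meter("llm-cost-workshop")

token_counter = meter.create_counter(
    "gen_ai.client.token.usage", unit="{token}", description="LLM tokens consumed"
)
cost_counter = meter.create_counter(
    "llm.request.cost", unit="USD", description="Estimated LLM spend"
)

# Placeholder prices per one million tokens; check your provider's current price list.
PRICE_PER_1M = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}


def record_usage(model: str, prompt_tokens: int, completion_tokens: int) -> None:
    # Record token counts split by direction, plus an estimated cost per request.
    token_counter.add(prompt_tokens, {"gen_ai.request.model": model, "gen_ai.token.type": "input"})
    token_counter.add(completion_tokens, {"gen_ai.request.model": model, "gen_ai.token.type": "output"})
    price = PRICE_PER_1M[model]
    cost = (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000
    cost_counter.add(cost, {"gen_ai.request.model": model})
```

In the app, `record_usage` would be called after each provider response, with the token counts taken from the response's usage block.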
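To get those metrics out of the app, the SDK is pointed at the Collector. Assuming a Collector started via Docker Compose and listening on the default OTLP/gRPC port, the Python side might be wired up roughly like this (the service name is illustrative):

```python
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

# Export metrics every 10 seconds to a local Collector (default OTLP/gRPC port 4317).
exporter = OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=10_000)
metrics.set_meter_provider(
    MeterProvider(
        resource=Resource.create({"service.name": "llm-cost-workshop"}),
        metric_readers=[reader],
    )
)
```

The Collector then forwards the metrics to whatever storage backend the compose file runs, and Grafana queries that backend.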
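Prompt caching is provider-specific, so the caching sketch below is just one illustration, using the Anthropic SDK to mark a large, stable prefix as cacheable; the model name and context are placeholders. With providers that cache automatically, the optimisation is mainly about keeping the static part of the prompt at the front.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

LONG_STATIC_CONTEXT = "...product docs, few-shot examples, style guide..."

response = client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder model choice
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_STATIC_CONTEXT,
            # Mark the large, stable prefix as cacheable so repeated requests reuse it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarise today's customer feedback."}],
)

# Cache effectiveness shows up in the usage block and can be recorded as metrics too.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```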
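For the batch-processing endpoint, the batch sketch below shows a rough use of OpenAI's Batch API; the file name, IDs, and model are illustrative. Batched requests trade latency (results arrive within a completion window) for a lower per-token price.

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# One JSONL line per request; custom_id lets you match results back to inputs later.
tasks = [
    {
        "custom_id": f"review-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["first review ...", "second review ..."])
]
with open("batch_input.jsonl", "w") as f:
    f.writelines(json.dumps(task) + "\n" for task in tasks)

batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll with client.batches.retrieve(batch.id) until completed
```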

Maciej Rząsa

Senior Software Engineer at Chattermill

Rzeszów, Poland
