AI FinOps: Making Production LLMs Affordable Without Cutting Quality

Production LLM costs spiral quickly. This talk shares the cost-control patterns that work at enterprise scale: model tiering and routing with LiteLLM/Portkey, semantic caching with GPTCache, prompt compression, batch inference where appropriate, eval-driven model downgrades, and the observability discipline (Datadog LLMObs, Langfuse) that surfaces cost regressions before the bill arrives.
Takeaways: a model-routing decision framework, semantic-caching patterns that actually save money, and a cost-observability checklist for production LLMs.
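
As a rough illustration of what model tiering and routing can look like, here is a minimal sketch. The three tiers, their prices, and the complexity heuristic are all hypothetical and are not taken from the talk. A production setup would more likely delegate routing to a gateway such as LiteLLM or Portkey and base the decision on evals rather than keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str                 # hypothetical tier label, not a real model ID
    usd_per_1k_tokens: float  # made-up blended price, for illustration only
    max_complexity: int       # highest complexity score this tier should handle


# Ordered cheapest first so the router picks the least expensive adequate tier.
TIERS = [
    Tier("small", 0.0005, 1),
    Tier("medium", 0.003, 2),
    Tier("large", 0.015, 3),
]

REASONING_HINTS = ("prove", "step by step", "analyze")


def complexity(prompt: str) -> int:
    """Crude placeholder heuristic: long prompts and reasoning keywords score higher."""
    score = 1
    if len(prompt.split()) > 200:
        score += 1
    if any(hint in prompt.lower() for hint in REASONING_HINTS):
        score += 1
    return score


def route(prompt: str) -> Tier:
    """Return the cheapest tier whose ceiling covers the prompt's score."""
    need = complexity(prompt)
    for tier in TIERS:
        if tier.max_complexity >= need:
            return tier
    return TIERS[-1]


def estimated_cost(tier: Tier, tokens: int) -> float:
    """Estimate spend in USD for a request of the given token count."""
    return tier.usd_per_1k_tokens * tokens / 1000


if __name__ == "__main__":
    for prompt in ("What is the capital of France?", "Analyze this contract clause."):
        tier = route(prompt)
        print(prompt, "->", tier.name, estimated_cost(tier, 1000))
```

Logging the chosen tier and estimated cost per request is also the kind of signal a cost-observability tool could aggregate to flag regressions.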

Preferred length: 30 min.
Audience: AI engineers, engineering managers, AI platform leads.
Level: Intermediate.
First public delivery: 2026.

Anwar Khan

Production AI Engineering — Agentic AI · MCP · Knowledge RAG · LLM Engineering | Speaker · Author · Mentor

Moline, Illinois, United States
