Supercharging AI Inference on K8s: Demystifying KV Cache, LM Cache & Smart Routing

Running LLMs in production on Kubernetes is no longer just about deploying containers—it’s about surviving GPU scarcity, unpredictable workloads, and soaring inference bills. For many teams, scaling LLMs still feels like navigating a maze of latency and memory bottlenecks.

But what if you could unlock dramatically higher efficiency using engineering fundamentals rather than exotic hardware?

In this session, I break down how modern architectures (KV Cache, LM Cache, and cache-aware routing) transform Kubernetes into a highly optimized inference engine. The talk includes live demos, illustrations, and sandbox environments that attendees can try hands-on.

Raghavendra Sirigeri

Founder, Questodev

Bengaluru, India
