
Beyond the Monolith: Building Disaggregated LLM Serving Pipelines on K8s

The standard monolithic pattern for deploying LLMs on Kubernetes is hitting a breaking point. As context windows expand, the resource conflict between the compute-intensive "Prefill" phase and the memory-bound "Decode" phase destroys performance and dramatically inflates cloud costs. This session explores the architecture of prefill/decode disaggregation, a method that dismantles the single-pod model in favour of a specialised pipeline where prompt processing and token generation scale independently.
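The split described above can be sketched in a few lines. This is a hypothetical illustration, not vLLM's or LMCache's actual API: the prefill worker runs one compute-bound pass over the whole prompt and emits a KV cache, and the decode worker then generates tokens one step at a time against that cache. In a disaggregated deployment the two functions would live in separate pods and scale independently.

```python
# Minimal sketch of prefill/decode separation (illustrative names, not a real
# serving API). The KV cache is abstracted as a list of token IDs.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request key/value state, abstracted as token IDs here."""
    tokens: list[int] = field(default_factory=list)

def prefill(prompt_tokens: list[int]) -> KVCache:
    # Compute-bound: one large batched pass over the entire prompt.
    return KVCache(tokens=list(prompt_tokens))

def decode(cache: KVCache, steps: int) -> list[int]:
    # Memory-bound: one token per step, each step reading and extending the cache.
    out = []
    for _ in range(steps):
        nxt = sum(cache.tokens) % 50_000  # placeholder for a model forward pass
        cache.tokens.append(nxt)
        out.append(nxt)
    return out

cache = prefill([1, 2, 3])   # runs once, on the prefill pool
tokens = decode(cache, 4)    # runs iteratively, on the decode pool
```

Because `prefill` touches the cache once while `decode` touches it every step, the two phases have different scaling knobs, which is the motivation for giving each its own node pool.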

We take a deep dive into implementing this architecture, using LMCache to create a high-speed, shared KV store across your cluster. We will cover the engineering realities of designing separate node pools backed by multi-tier storage, and the network challenge of moving gigabytes of context data between pods faster than an H100 can recompute it. Attendees will gain a blueprint for a persistence layer that achieves up to 5x throughput gains by reusing computation rather than repeating it.

Arya Soni

DevOps & SRE | Kubernetes & Multi-Cloud Architect (AWS/GCP) | Reduced Cloud Costs by 40% | Infrastructure as Code (Terraform) | CI/CD | MLOps

Gurugram, India


