Scaling Production RAG Systems with Kubernetes

Deploying LLM applications at scale requires reliable cloud-native infrastructure. This talk explores how to run production RAG pipelines on Kubernetes, covering vector search services, scalable inference with vLLM, and distributed embedding pipelines. We will discuss observability, cost optimization, and autoscaling strategies for real-world AI workloads running in containerized environments.

Samir Sengupta

AI/ML ENGINEER, BUILDING AGI

New City, New York, United States
