
Kaushik Mitra
Software Engineer, Google
Kaushik Mitra holds a Ph.D. in Theoretical Physics and transitioned into the exciting world of AI, driven by the deep mathematical foundations common to both fields. Currently a Software Engineer at Google, with previous roles at Meta and Microsoft, he specializes in Kubernetes-based AI inference systems. Kaushik is passionate about foundational AI principles and distributed systems. Outside of technology, he enjoys performing arts, bringing both technical depth and creative energy to his presentations.
Optimizing LLM Inference: Kubernetes-native Gateway for Efficient, Fair AI Serving
As AI engineers, we've experienced firsthand how deploying Large Language Models (LLMs) and other diverse AI workloads introduces unique infrastructure complexities, especially around effective GPU and TPU resource management. Traditional Kubernetes services often fall short, leading to poor utilization, inconsistent performance, and challenging governance.
Enter Inference Gateway, an open-source, Kubernetes-native inference solution specifically engineered for the nuanced demands of AI serving. By incorporating sophisticated load-balancing techniques, fairness-aware queuing mechanisms, and latency-sensitive scheduling algorithms, Inference Gateway significantly enhances resource utilization and ensures equitable distribution of GPU/TPU resources across multiple concurrent AI workloads.
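To make those mechanisms concrete, the sketch below is a minimal, hypothetical Go illustration of how a gateway might combine max-min fairness across tenants with latency-aware endpoint selection. The types, field names, and cost heuristic here are illustrative assumptions for this abstract, not Inference Gateway's actual implementation.

```go
// Conceptual sketch (not the actual Inference Gateway code): pick which tenant
// to admit next by recent usage, then route to the replica with the lowest
// estimated completion time.
package main

import (
	"fmt"
	"math"
)

// Endpoint tracks a hypothetical model-server replica and its observed state.
type Endpoint struct {
	Name         string
	QueueDepth   int     // requests currently waiting on this replica
	AvgLatencyMs float64 // moving average of recent request latency
}

// Scheduler holds per-tenant usage so accelerator time can be shared fairly.
type Scheduler struct {
	Endpoints []Endpoint
	usage     map[string]float64 // tenant -> GPU/TPU-seconds consumed recently
}

// pickTenant admits the waiting tenant with the least recent usage (max-min fairness).
func (s *Scheduler) pickTenant(waiting []string) string {
	best, bestUsage := "", math.MaxFloat64
	for _, t := range waiting {
		if u := s.usage[t]; u < bestUsage {
			best, bestUsage = t, u
		}
	}
	return best
}

// pickEndpoint routes to the replica with the lowest estimated completion time,
// approximated here as (queue depth + 1) * average latency.
func (s *Scheduler) pickEndpoint() *Endpoint {
	var best *Endpoint
	bestCost := math.MaxFloat64
	for i := range s.Endpoints {
		e := &s.Endpoints[i]
		cost := float64(e.QueueDepth+1) * e.AvgLatencyMs
		if cost < bestCost {
			best, bestCost = e, cost
		}
	}
	return best
}

func main() {
	s := &Scheduler{
		Endpoints: []Endpoint{
			{Name: "gpu-pod-a", QueueDepth: 4, AvgLatencyMs: 120},
			{Name: "gpu-pod-b", QueueDepth: 1, AvgLatencyMs: 180},
		},
		usage: map[string]float64{"team-search": 40.0, "team-chat": 12.5},
	}
	tenant := s.pickTenant([]string{"team-search", "team-chat"})
	fmt.Printf("admit %s -> route to %s\n", tenant, s.pickEndpoint().Name)
}
```

In this toy version, fairness decides whose request is admitted next while the latency-aware cost decides where it runs; the session covers how the real gateway implements these ideas.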
Initial production benchmarks are compelling, showing 30-50% latency reductions and higher throughput compared to traditional Kubernetes deployments. Moreover, Inference Gateway simplifies AI governance by providing the predictable, controllable resource allocation that enterprise-scale deployments require.
Join this session for a deep dive into Inference Gateway's architecture: we will discuss advanced resource allocation strategies, explore how the fairness and queuing mechanisms are implemented, and see how this solution sets new standards in AI inference serving.