Session

Open Source to Enterprise: Scaling LLM/Diffusion Model Inference in Kubernetes

Our session shows how Kubernetes-based cloud-native technologies turn cutting-edge LLMs and diffusion models from lab experiments into massively scalable SaaS services. Key highlights include:
1. Cloud-Native Scaling for AI Inference: Containerized deployment, dynamic scaling, and distributed scheduling on Kubernetes support millions of daily inference requests, with GPU utilization boosted by 40% (see the autoscaling sketch after this list);
2. Efficiency Breakthroughs in Inference: Through model quantization, distributed parallelism, and caching strategies, we cut LLM inference latency by 60% and video-generation costs by 35% (see the serving sketch below);
3. SaaS Productization Journey: From API design to billing systems, learn how we packaged complex inference technologies into user-friendly services, driving 300% user growth and serving 500+ global enterprise clients;
4. Battle-Tested Solutions: Lessons from multi-model deployment and multi-tenant isolation scenarios (see the quota sketch below), with open-source toolkits and reusable architecture templates for the community.
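
For the autoscaling piece of highlight 1, here is a minimal sketch using the official Kubernetes Python client to attach a HorizontalPodAutoscaler to an inference Deployment. The names (`llm-inference`, the `inference` namespace), the replica bounds, and the CPU-utilization target are illustrative assumptions, not the talk's actual configuration; a production setup would more likely scale on a custom metric such as request queue depth or GPU utilization.

```python
# Minimal sketch, assuming a Deployment named "llm-inference" already exists
# in an "inference" namespace. All names and thresholds are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="llm-inference"
        ),
        min_replicas=2,
        max_replicas=50,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",  # stand-in for a custom metric like queue depth
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=60
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="inference", body=hpa
)
```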
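For highlight 2, the abstract does not name a serving engine, but the combination of quantized weights, tensor parallelism, and KV-cache reuse can be sketched with vLLM, one common open-source choice. The checkpoint name and parameter values are illustrative assumptions only.

```python
# Illustrative sketch with vLLM; not necessarily the engine used in the talk.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-13B-chat-AWQ",  # example quantized checkpoint
    quantization="awq",                     # load 4-bit AWQ weights
    tensor_parallel_size=2,                 # distributed parallelism across 2 GPUs
    enable_prefix_caching=True,             # reuse KV cache for shared prompt prefixes
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize Kubernetes autoscaling in one paragraph."], params)
print(outputs[0].outputs[0].text)
```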
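For the multi-tenant isolation scenario in highlight 4, one standard Kubernetes pattern is a namespace per tenant with a ResourceQuota capping GPU requests. This sketch again uses the Kubernetes Python client; the tenant name and quota values are assumptions, not the architecture presented in the session.

```python
# Hypothetical sketch: per-tenant namespace plus a GPU-capped ResourceQuota.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

tenant = "tenant-acme"  # hypothetical tenant identifier

core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=tenant))
)
core.create_namespaced_resource_quota(
    namespace=tenant,
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="gpu-quota"),
        spec=client.V1ResourceQuotaSpec(
            # Cap total GPU requests and pod count for this tenant.
            hard={"requests.nvidia.com/gpu": "4", "pods": "20"}
        ),
    ),
)
```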

Samzong Lu

PM at DaoCloud, AI/LLMOps PM Lead, Contributor to Multiple CNCF Projects, Open Source Enthusiast

Shanghai, China
