
Speaker

Samzong Lu

PM at DaoCloud, AI/LLMOps PM Lead, contributor to multiple CNCF projects, open source enthusiast

Shanghai, China

- Samzong is a Product Manager (focused on AI/LLMOps, Multi-Cluster, Cluster LCM, Microservices, Service Mesh)
- Active contributor to Kubernetes / kubernetes-sigs
- Karmada active contributor and member
- Istio active contributor and member
- Contributor to multiple CNCF projects
- Open source enthusiast

Area of Expertise

  • Information & Communications Technology
  • Physical & Life Sciences
  • Transports & Logistics

Topics

  • Karmada project member
  • Istio project member
  • Product Manager
  • Open Source

Consolidating Intelligent Routing at the Gateway: A vLLM Semantic Router Deployment Retrospective

Users habitually pick the largest model and turn on the deepest reasoning, yet most requests don't need anywhere near that much compute. Intelligent routing automatically dispatches each request to an appropriately sized model, keeping the experience intact while lowering latency, and it has become an important direction in LLM inference optimization.

With the traditional gateway approach, however, routing logic ends up scattered across business services; the usual result is that routing policies become unmanageable and the troubleshooting path keeps getting longer.

This talk presents a more robust approach: consolidating the routing decision into Envoy ExtProc on the gateway side.

Using semantic-router, I will walk the full request path: how a request is tagged with routing signals after it enters the gateway, and how Gateway resources steer the traffic to a concrete backend.

I will also compare two common deployment routes: Istio + Gateway API Inference Extension, and Envoy AI Gateway.
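As a hedged illustration of that flow (the ExtProc processor tags the request, then Gateway API resources steer it to a backend), an HTTPRoute could match on the routing header. The header name `x-selected-model` and the backend service names below are assumptions for the sketch, not semantic-router's actual configuration:

```yaml
# Illustrative sketch: route on a header the ExtProc router is assumed to set.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-routes
spec:
  parentRefs:
    - name: llm-gateway
  rules:
    - matches:
        - headers:
            - name: x-selected-model   # assumed header name
              value: small-model
      backendRefs:
        - name: vllm-small             # placeholder Service
          port: 8000
    - backendRefs:                     # default rule: heavy model
        - name: vllm-large
          port: 8000
```

The point of the pattern is that only the gateway knows about model selection; backend services see plain requests.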

LLM-D: A Cloud-Native Framework and Practices for Large Language Model Deployment

LLM-D (Large Language Model Deployment) is a Kubernetes-based deployment framework for large language models, designed to simplify and accelerate full-lifecycle management of LLMs in cloud-native environments. As an AI developer and open-source contributor, I have explored how Kubernetes and its ecosystem tooling can make model deployment more repeatable, scalable, and cost-efficient. This talk covers LLM-D's core practice patterns: model containerization, distributed inference optimization, and automated rollout and governance. Drawing on real cases from DaoCloud, I will show how LLM-D helps teams move quickly from prototype validation to a production-grade LLMOps pipeline, keeping delivery fast while operations stay stable.
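As a rough illustration of the model-containerization step (a minimal sketch of the kind of workload such a framework manages, not LLM-D's actual API), a vLLM serving Deployment might look like:

```yaml
# Minimal sketch: one GPU-backed vLLM replica set serving an open model.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-serving
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llama-serving
  template:
    metadata:
      labels:
        app: llama-serving
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "1"   # one GPU per replica
```

A deployment framework's value is in layering model distribution, scheduling, and rollout policy on top of primitives like this.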

From Hugging Face to Cloud Native: Igniting the LLM Revolution with Kubernetes and Open Source Tools

In the wave of AI, open-source large language models (LLMs) such as LLaMA, Gemma, and DeepSeek are reshaping the technological landscape, but how to efficiently deploy these models from prototype to production remains a challenge for developers. This presentation will share how to build a scalable and efficient LLMOps pipeline using Kubernetes and open-source tools, covering the entire process from model download to inference optimization. Based on my experience at DaoCloud and open-source projects (such as the Hugging Face model download GUI and Kubernetes configuration tool), I will demonstrate how cloud-native technologies can simplify LLM deployment, including practical cases of automated model distribution, dynamic resource scheduling, and inference acceleration.

Exploring and Solving Challenges in Multi-Cloud, Multi-Cluster Environments with Karmada

More and more enterprises face increasingly complex business scenarios, and adopting multi-cluster deployments can greatly improve application stability and security. So how do you manage multiple Kubernetes clusters at once while avoiding vendor lock-in? How do you reduce the extra costs caused by inconsistent application delivery across clusters? And how do you unify multi-cluster deployment, cross-cluster traffic governance, and security governance for your applications?
In this session, we will show how to solve these problems with the Karmada project. You will learn how to achieve consistent application delivery across multiple clusters: unified deployment, automatic distribution, automatic scaling and failure migration of applications, and cross-cluster disaster recovery. During the tutorial, you will use Karmada's functionality to solve challenges drawn from real business scenarios.
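For a concrete taste of Karmada's distribution model, the sketch below propagates an assumed `nginx` Deployment to two member clusters with a PropagationPolicy; the cluster names are placeholders for whatever clusters are registered with the control plane:

```yaml
# Sketch: tell Karmada to distribute the `nginx` Deployment to two clusters.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx          # assumed workload name
  placement:
    clusterAffinity:
      clusterNames:
        - member1          # placeholder member clusters
        - member2
```

The same resource template then yields consistent delivery in every selected cluster, which is the "unified deployment" the session describes.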

Open Source to Enterprise: Scaling LLM/Diffusion Model Inference in Kubernetes

Our session will unveil how Kubernetes-based cloud-native technologies power the transformation of cutting-edge LLMs and diffusion models from lab experiments to massively scalable SaaS services. Key highlights include:
1. Cloud-Native Scaling for AI Inference: Containerized deployment, dynamic scaling, and distributed scheduling on Kubernetes support millions of daily inference requests, with GPU utilization boosted by 40%;
2. Efficiency Breakthroughs in Inference: Through model quantization, distributed parallelism, and caching strategies, we achieved a 60% reduction in LLM inference latency and 35% cost savings for video generation;
3. SaaS Productization Journey: From API design to billing systems, learn how we packaged complex inference technologies into user-friendly services, driving 300% user growth and serving 500+ global enterprise clients;
4. Battle-Tested Solutions: Lessons from multi-model deployment and multi-tenant isolation scenarios, with open-source toolkits and reusable architecture templates for the community.
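As a hedged sketch of the dynamic-scaling point above, a HorizontalPodAutoscaler could scale inference replicas on an in-flight-requests metric; the metric name, target value, and Deployment name are illustrative assumptions and would require a metrics adapter (e.g. a Prometheus adapter) to work:

```yaml
# Sketch: scale inference pods on a custom load metric rather than CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llama-serving                    # assumed inference Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: inference_requests_in_flight  # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "8"                # assumed per-pod target
```

Scaling on request backlog instead of CPU is what keeps GPU replicas matched to actual inference demand.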

KCD Hangzhou + OpenInfra Days China 2025 Sessionize Event

November 2025 Hangzhou, China
