Session
Past, Present, and Future of Cloud Native Model Serving
As AI adoption accelerates, the need for scalable, flexible, and efficient model serving has become critical. This talk explores the evolution of cloud-native model serving platforms, from the bespoke setups of two to three years ago to today's dynamic, Kubernetes-native solutions. We'll examine current challenges in productionizing large models, such as performance, cost, and portability, and highlight how open source innovation is addressing them. Finally, we'll look ahead at emerging trends, including distributed inference, inference orchestration, disaggregated serving, KV-cache management, autoscaling, and hardware acceleration. Attendees will gain a clear picture of how the model serving landscape is evolving and how to prepare for what's next.

Yuan Tang
Senior Principal Software Engineer at Red Hat; Project Lead at Argo, Kubeflow, and KServe
West Lafayette, Indiana, United States