
Yuan Tang
Senior Principal Software Engineer at Red Hat; Project Lead at Argo, Kubeflow, and KServe
West Lafayette, Indiana, United States
Yuan is a Senior Principal Software Engineer at Red Hat AI. Previously, he led AI infrastructure and platform teams at several companies. He holds leadership positions in open source communities, including Argo, Kubeflow, Kubernetes, and CNCF, and is a maintainer of popular open source projects such as Llama Stack and XGBoost. Yuan has authored three technical books and published numerous papers. He is also a regular conference speaker and serves as a technical advisor, leader, and mentor at various organizations.
When Metaflow met Argo: planet-scale, production-ready ML systems
Over the past five years, many new tools have emerged in the field of MLOps, and the existing ones have matured. Yet there is still no clear picture of a canonical stack for productive data science and ML organizations. Through our work on open-source Metaflow, which was started at Netflix in 2017, and the Argo Project, we have had the opportunity to work with hundreds of companies at various levels of ML infrastructure maturity.
We'll introduce the unique challenges that arise when incorporating machine learning, data, and the entropy of the real world into existing software stacks. In particular, we'll reason through what needs to change about how we approach building software when experimentation and data become central to it.
We'll talk about the new full stack of machine learning that emerges and how data scientists and machine learning engineers can interact with it, with a particular focus on how we built the stack with Metaflow, Argo Workflows, and Argo Events.
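To give a concrete flavor of the developer experience described here, below is a minimal Metaflow sketch; the flow and step names are illustrative, not the talk's actual example.

# A minimal Metaflow flow. The same Python file runs locally during
# prototyping and compiles onto Argo Workflows for production.
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):  # hypothetical flow name

    @step
    def start(self):
        # Load or generate training data; kept trivial for illustration.
        self.data = [1, 2, 3]
        self.next(self.train)

    @step
    def train(self):
        # Stand-in for real model training.
        self.model = sum(self.data)
        self.next(self.end)

    @step
    def end(self):
        print(f"trained artifact: {self.model}")

if __name__ == "__main__":
    TrainFlow()

Running python train_flow.py run executes the flow locally, while python train_flow.py argo-workflows create compiles the same code into an Argo Workflows template for production execution.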
WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes
The emergence of Generative AI (GenAI) has introduced new challenges and demands in AI/ML inference, necessitating advanced solutions for efficient serving infrastructure. The recently created Kubernetes Working Group Serving (WG Serving) is dedicated to enhancing serving workloads on Kubernetes, especially hardware-accelerated AI/ML inference. The group prioritizes compute-intensive inference scenarios that use specialized accelerators, while its improvements also benefit other serving workloads such as web services and stateful databases.
This session will dive into WG Serving's initiatives and workstreams, spotlighting the discussions and advancements in each. We are also actively looking for feedback from, and partnership with, model server authors and other practitioners who want to harness the power of K8s for their serving workloads. Join us to gain insight into our work and learn how to contribute to advancing AI/ML inference on K8s.
Unlocking Potential of Large Models in Production
The recent paradigm shift from traditional ML to GenAI and LLMs has brought a new set of non-trivial LLMOps challenges around deployment, scaling, and operations, making it an open problem to build an inference platform that meets all business requirements.
This talk highlights these new challenges along with best practices and solutions for building large, scalable, and reliable inference platforms on top of cloud-native technologies such as Kubernetes, Kubeflow, KServe, and Knative.
Which tools help effectively benchmark and assess the quality of an LLM? What type of storage and caching solutions enable quick auto-scaling and model downloads? How can you ensure your model is optimized for the specialized accelerators running in your cluster? How can A/B testing or rolling upgrades be accomplished with limited compute? What exactly do you monitor in an LLM? In this session we will use KServe as a case study to answer these questions and more.
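As a hedged illustration of the rolling-upgrade question, the sketch below assembles a KServe InferenceService manifest with a canary traffic split as a plain Python dictionary; the service name, model format, and storage URI are placeholders, not prescriptions from the session.

# Sketch of a KServe InferenceService that routes 10% of traffic to a new
# model revision while autoscaling between replica bounds.
import json

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llm-demo"},  # hypothetical service name
    "spec": {
        "predictor": {
            "minReplicas": 1,            # keep one warm replica for latency
            "maxReplicas": 4,            # cap accelerator spend
            "canaryTrafficPercent": 10,  # send 10% of requests to the new revision
            "model": {
                "modelFormat": {"name": "huggingface"},  # placeholder format
                "storageUri": "s3://models/llm/v2",      # placeholder location
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            },
        }
    },
}

# JSON is valid YAML, so the output can be piped to kubectl apply -f -
print(json.dumps(inference_service, indent=2))

KServe then splits traffic between the previous and new revisions according to canaryTrafficPercent, which is one way A/B tests and rolling upgrades can be staged on limited compute.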
Production-ready AI Platform on Kubernetes
In recent years, ML/AI has made tremendous progress, yet designing large-scale data science and machine learning applications remains challenging. The variety of machine learning frameworks, hardware accelerators, and cloud vendors, as well as the complexity of data science workflows, brings new challenges to MLOps. For example, it's non-trivial to build an inference system suitable for models of different sizes, especially LLMs and other large models.
This talk presents best practices for, and challenges of, building large, efficient, scalable, and reliable AI/ML platforms using cloud-native technologies such as Kubernetes, Kubeflow, and KServe. We will take a deep dive into a reference platform for modern cloud-native AI infrastructure.
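One building block of such a reference platform is pipeline orchestration. Below is a minimal Kubeflow Pipelines v2 sketch, assuming hypothetical component names and trivial logic, that chains two lightweight components and compiles them into a package a Kubeflow deployment can run.

# Two lightweight components chained into a pipeline, then compiled
# to an intermediate-representation YAML file.
from kfp import compiler, dsl

@dsl.component
def preprocess(raw: str) -> str:
    # Stand-in for feature engineering.
    return raw.strip().lower()

@dsl.component
def train(features: str) -> str:
    # Stand-in for model training; returns a fake model id.
    return f"model-for-{features}"

@dsl.pipeline(name="demo-training-pipeline")
def training_pipeline(raw: str = "Sample Input"):
    prep_task = preprocess(raw=raw)
    train(features=prep_task.output)

if __name__ == "__main__":
    compiler.Compiler().compile(training_pipeline, "pipeline.yaml")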
Advancements in AI/ML Inference Workloads on Kubernetes from WG Serving and Ecosystem Projects
The emergence of Generative AI (GenAI) has introduced new challenges and demands in AI/ML inference, necessitating advanced solutions for efficient serving infrastructure. The Kubernetes Working Group Serving (WG Serving) is dedicated to enhancing serving workloads on Kubernetes, especially hardware-accelerated AI/ML inference. The group prioritizes compute-intensive inference scenarios that use specialized accelerators, while its improvements also benefit other serving workloads such as web services and stateful databases.
This session will dive into recent progress and updates on WG Serving's initiatives and workstreams, spotlighting the discussions and advancements in each. We are also actively looking for feedback from, and partnership with, model server authors and other practitioners who want to harness the power of K8s for their serving workloads. Join us to gain insight into our work and learn how to contribute to advancing AI/ML inference on K8s.
Past, Present, and Future of Cloud Native Model Serving
As AI adoption accelerates, the need for scalable, flexible, and efficient model serving has become critical. This talk will explore the evolution of cloud-native model serving platforms, from the early bespoke setups of 2-3 years ago to today's dynamic, Kubernetes-native solutions. We'll examine current challenges in productionizing large models, such as performance, cost, and portability, and highlight how open source innovation is addressing them. Finally, we'll look ahead at emerging trends, including technologies for distributed inference, inference orchestration, disaggregated serving, KV-cache management, autoscaling, and hardware acceleration. Attendees will gain a clear picture of how the model serving landscape is evolving and how to prepare for what's next.
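Much of that evolution converges at the API layer: many cloud-native model servers now expose OpenAI-compatible HTTP endpoints. The sketch below probes such an endpoint; the URL and model id are placeholders for whatever your platform serves.

# Minimal client call against an OpenAI-compatible chat completions
# endpoint, using only the Python standard library.
import json
import urllib.request

ENDPOINT = "http://llm.example.internal/v1/chat/completions"  # hypothetical URL
payload = {
    "model": "demo-llm",  # placeholder model id
    "messages": [{"role": "user", "content": "Summarize KV-cache reuse."}],
    "max_tokens": 64,
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    body = json.load(response)
    print(body["choices"][0]["message"]["content"])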
Beyond Prototypes: Production-ready ML systems with Metaflow and Argo Project
The wave of exciting new AI demos and applications over the past year has highlighted a familiar engineering challenge: beyond prototypes, how can we improve our systems rapidly and maintain them without too many headaches, given the amount of real-world complexity involved? While the question is not new, it has been exacerbated by the presence of data and rapidly evolving ML.
Our talk will delve into this, focusing on a unique feature of ML systems: they are not isolated islands but are deeply intertwined with their surrounding environment and responsive to real-time changes. We'll draw from our experiences at Netflix to explore a practical, user-friendly approach to composing and orchestrating advanced ML systems.
We'll discuss our solution: Metaflow, an open-source ML platform we developed at Netflix that now powers thousands of ML projects at 23andMe, Disney, Goldman Sachs, Sanofi, and others, combined with Argo Workflows. We'll also highlight our recent work that builds on top of Argo Events, sketched below.
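As a hedged sketch of what that event-driven integration looks like from the user's side, the example below uses Metaflow's @trigger decorator, which is backed by Argo Events once the flow is deployed to Argo Workflows; the event name is illustrative.

# A flow that starts automatically when an upstream event fires,
# rather than on a fixed schedule.
from metaflow import FlowSpec, step, trigger

@trigger(event="data_updated")  # hypothetical event name
class RefreshFlow(FlowSpec):

    @step
    def start(self):
        # React to the upstream event, e.g. by retraining on fresh data.
        print("new data arrived; kicking off refresh")
        self.next(self.end)

    @step
    def end(self):
        print("refresh complete")

if __name__ == "__main__":
    RefreshFlow()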
Engaging the Kubeflow Community: Building an Enterprise-Ready AI/ML Platform
Organizations building a new solution often ask themselves whether to develop everything from scratch or integrate existing tools into an end-to-end solution. Kubeflow's journey started at exactly this crossroads. A CNCF incubating project, Kubeflow integrates a series of leading open source tools such as Knative, Istio, and KServe, among other AI/ML tools, for both predictive and GenAI/LLM applications.
In this panel we will discuss the trade-offs between building a product based on existing tools and a DIY approach. We will delve into the key considerations for adding new enhancements and components based on developments in the industry and user adoption. The panel will highlight the challenges of being an official distribution of such a product, along with customer use cases and the influence they have had on the project's roadmap. We will talk through the trials and tribulations that paid off in a win-win outcome for the Kubeflow community and our users.
Engaging the KServe community: the impact of integrating solutions with standardized CNCF projects
Building a new solution and contemplating whether the OSS path is right for you? Wondering where to get started with a large cloud initiative and where the pitfalls may lie? Curious about the benefits that await if your organization embraces the rich CNCF ecosystem?
In this talk we will discuss the trade-offs between building a product on a full OSS platform vs. a DIY approach. We will delve into the issues of working with internal stakeholders or partners to embrace an OSS community and will cover the benefits and scaling factors that come when embracing open standards.
We will use the recent integration of NVIDIA NIM into KServe as a case study and talk through the trials and tribulations that paid off in a win-win-win outcome for our solutions, the OSS projects, and our users. We will cover Kubeflow, Knative, Istio, KServe, and WG Serving, as well as a network of companies building enterprise K8s platforms and enterprise AI applications on top of these foundations.
Project Lightning Talk + Maintainer Track + Contribfest: KubeCon + CloudNativeCon Europe 2025
CNCF-hosted Co-located Events Europe 2025
KubeCon + CloudNativeCon North America 2024
Project Lightning Talk + ContribFest + Maintainer Track: KubeCon + CloudNativeCon North America 2024
PlatformCon 2024
Maintainer Track + ContribFest: KubeCon + CloudNativeCon Europe 2024
KubeCon + CloudNativeCon North America 2023
