Yuan Chen
NVIDIA, Software Engineer, Kubernetes, Scheduling, GPU, AI/ML, Resource Management
San Jose, California, United States
Yuan Chen is a Principal Software Engineer at NVIDIA. Before joining NVIDIA, he was a Staff Software Engineer at Apple, where he contributed to the development of Apple's Kubernetes infrastructure beginning in 2019. Yuan is an active contributor to the Kubernetes project and the CNCF community and has delivered over 10 talks at KubeCon. Earlier, he was a Principal Architect at JD.com and a Principal Researcher at Hewlett Packard Labs. He received a Ph.D. in Computer Science from Georgia Tech.
Which GPU sharing strategy is right for you? A Comprehensive Benchmark Study using DRA
Dynamic Resource Allocation (DRA) is one of the most anticipated features to ever make its way into Kubernetes. It promises to revolutionize the way hardware devices are consumed and shared between workloads. In particular, DRA unlocks the ability to manage heterogeneous GPUs in a unified and configurable manner without the need for awkward solutions shoehorned on top of the existing device plugin API.
In this talk, we use DRA to benchmark various GPU sharing strategies, including Multi-Instance GPU (MIG), Multi-Process Service (MPS), and CUDA Time-Slicing. As part of this, we provide guidance on the class of applications that can benefit from each strategy, as well as how to combine different strategies to achieve optimal performance. The talk concludes with a discussion of potential challenges, future enhancements, and a live demo showcasing the use of each GPU sharing strategy with real-world applications.
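To make the DRA workflow concrete, here is a minimal, hypothetical sketch of how a pod might request a single GPU through a ResourceClaimTemplate. It assumes the resource.k8s.io/v1beta1 API, an installed DRA driver, and a DeviceClass named gpu.nvidia.com; exact API versions, field names, and class names vary across Kubernetes releases and drivers, so treat this as illustrative rather than the configuration used in the talk.

```python
# Hypothetical DRA sketch: a ResourceClaimTemplate plus a pod that consumes it.
# Assumes resource.k8s.io/v1beta1 and a DeviceClass called "gpu.nvidia.com";
# adjust both to match your cluster version and DRA driver.
import yaml

claim_template = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaimTemplate",
    "metadata": {"name": "single-gpu"},
    "spec": {
        "spec": {
            "devices": {
                "requests": [
                    # One device from the (assumed) GPU device class.
                    {"name": "gpu", "deviceClassName": "gpu.nvidia.com"}
                ]
            }
        }
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "dra-gpu-demo"},
    "spec": {
        # The pod-level claim references the template; the container opts in by name.
        "resourceClaims": [{"name": "gpu", "resourceClaimTemplateName": "single-gpu"}],
        "containers": [
            {
                "name": "cuda",
                "image": "nvidia/cuda:12.4.1-base-ubuntu22.04",  # illustrative image
                "command": ["nvidia-smi"],
                "resources": {"claims": [{"name": "gpu"}]},
            }
        ],
        "restartPolicy": "Never",
    },
}

print(yaml.safe_dump_all([claim_template, pod], sort_keys=False))
```

Applying the generated YAML to a cluster with a DRA driver lets the driver allocate a matching device and wire it into the container, rather than relying on a fixed extended-resource count.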
Navigating AI/ML Workloads in Large-Scale Kubernetes Clusters
Managing AI/ML workloads with GPUs on Kubernetes presents formidable challenges: jobs are complex to manage and schedule, and they demand substantial specialized computing resources, such as GPUs, that are often in short supply.
This talk introduces Knavigator, an open-source framework and toolkit designed to support developers of Kubernetes systems. Knavigator facilitates the development, testing, troubleshooting, benchmarking, chaos engineering, performance analysis, and optimization of AI/ML control planes with GPUs in Kubernetes.
Knavigator enables tests on Kubernetes clusters using both real and virtual GPU nodes, allowing for large-scale testing with limited resources, such as a laptop.
Through real examples and demos, this presentation will showcase Knavigator's capabilities in feature validation, performance and load testing, and reliability testing. It will also highlight how Knavigator enhances the fault tolerance of large model training jobs in Kubernetes.
Enhancing Reliability and Fault-Tolerance Testing in Kubernetes Using KWOK
Kubernetes has emerged as a popular platform for running AI workloads with GPUs. As a result, enhancing reliability has become increasingly important. This talk will demonstrate how the popular Kubernetes testing toolkit KWOK has been enhanced for reliability and fault-tolerance testing.
Shiming Zhang, the creator and maintainer of KWOK, and Yuan Chen from NVIDIA will outline KWOK's capabilities for simulating and managing a large number of virtual nodes and pods on a laptop, and discuss practical use cases at DaoCloud and NVIDIA.
Through examples and demos, the session offers a deep dive into KWOK's latest chaos engineering features, including its ability to simulate failures by injecting targeted faults into GPU nodes and pods. These capabilities support reliability testing and the evaluation of fault-tolerance mechanisms that improve the resilience of AI workloads in Kubernetes.
Attendees will gain practical experience and knowledge about KWOK and its advanced capabilities.
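As a rough illustration of the virtual-node approach, the sketch below builds a KWOK-style fake node manifest that advertises GPU capacity. It follows the fake-node convention from KWOK's documentation (the kwok.x-k8s.io/node: fake annotation and matching taint), but the node name, labels, and capacity figures are assumptions for illustration, not the exact setup used at DaoCloud or NVIDIA.

```python
# Hypothetical sketch of a KWOK-managed virtual GPU node. Apply the printed YAML
# with `kubectl apply -f -` against a cluster running the kwok controller
# (e.g. one created by kwokctl). Capacity numbers are illustrative assumptions.
import yaml

fake_gpu_node = {
    "apiVersion": "v1",
    "kind": "Node",
    "metadata": {
        "name": "fake-gpu-node-0",
        # KWOK maintains nodes carrying this annotation in place of a real kubelet.
        "annotations": {"kwok.x-k8s.io/node": "fake"},
        "labels": {"type": "kwok"},
    },
    "spec": {
        # Keep ordinary workloads off unless they explicitly tolerate the fake node.
        "taints": [
            {"key": "kwok.x-k8s.io/node", "value": "fake", "effect": "NoSchedule"}
        ],
    },
    "status": {
        # Advertise extended GPU resources so schedulers treat it as an 8-GPU node.
        "capacity": {"cpu": "64", "memory": "512Gi", "pods": "110", "nvidia.com/gpu": "8"},
        "allocatable": {"cpu": "64", "memory": "512Gi", "pods": "110", "nvidia.com/gpu": "8"},
    },
}

print(yaml.safe_dump(fake_gpu_node, sort_keys=False))
```

Scheduling test pods that request nvidia.com/gpu onto such nodes exercises scheduler and controller behavior at scale without consuming real GPUs.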
Decoding and Taming the Soaring Costs of Large Language Models
Running generative AI applications can be prohibitively expensive. This talk unravels the soaring costs of inference: providing real-time responses to user queries with large language models that have billions or even trillions of parameters.
We estimate the staggering cost of serving individual user queries against these massive models, and delve into the performance and cost challenges, from GPU hardware accelerators to the latency of running ChatGPT.
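As a simple illustration of how such estimates are built, the back-of-the-envelope calculation below derives a per-query cost from an assumed GPU-hour price, replica size, and throughput. Every number is a placeholder, not a figure from the talk.

```python
# Back-of-the-envelope estimate of LLM inference cost per query.
# All numbers are hypothetical placeholders, not measurements.
gpu_hourly_cost = 2.50     # USD per GPU-hour (assumed cloud price)
gpus_per_replica = 8       # GPUs needed to host one model replica (assumed)
queries_per_second = 4.0   # sustained throughput of one replica (assumed)

replica_cost_per_hour = gpu_hourly_cost * gpus_per_replica
queries_per_hour = queries_per_second * 3600
cost_per_query = replica_cost_per_hour / queries_per_hour

print(f"~${cost_per_query:.4f} per query "
      f"(${replica_cost_per_hour:.2f}/hour over {queries_per_hour:.0f} queries)")

# Utilization matters: at 25% average utilization the effective cost is ~4x higher.
print(f"~${cost_per_query / 0.25:.4f} per query at 25% utilization")
```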
We explore the potential for improving resource efficiency and performance of running large-scale AI applications on Kubernetes and in cloud-native environments through GPU resource sharing, advanced scheduling, and dynamic batching.
We hope this talk will spur further discussion and collaboration within the CNCF community around taming the costs of deploying and scaling generative AI using cloud native technologies and best practices.
Accelerating AI Workloads with GPUs in Kubernetes
As AI and machine learning become ubiquitous, GPU acceleration is essential for model training and inference at scale. However, effectively leveraging GPUs in Kubernetes brings challenges around efficiency, configuration, extensibility, and scalability.
This talk provides a comprehensive overview of capabilities in Kubernetes and GPUs to address these challenges, enabling seamless support for next-generation AI applications.
The session will cover:
- GPU resource sharing mechanisms such as MPS (Multi-Process Service), Time-Slicing, MIG (Multi-Instance GPU), and vGPU (GPU virtualization) on Kubernetes (example resource requests are sketched after this list).
- Flexible accelerator configuration via device plugins and Dynamic Resource Allocation (DRA) with ResourceClaims and ResourceClasses in Kubernetes.
- Advanced scheduling and resource management features including gang scheduling, topology-aware scheduling, quota management, and job queues.
- The open-source efforts in Volcano, YuniKorn, and Slurm for supporting GPU and AI workloads in Kubernetes.
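For a concrete picture of how the sharing mechanisms above surface to workloads, the sketch below generates pod manifests that request a whole (or oversubscribed) GPU and a MIG slice via extended resources. The resource names assume NVIDIA's device plugin with the MIG "mixed" strategy; whether nvidia.com/gpu maps to an exclusive device, a time-sliced share, or an MPS share is decided by the plugin's node-side configuration, not by the pod spec.

```python
# Sketch of device-plugin-style GPU requests: pods ask for extended resources,
# while the sharing strategy behind them (time-slicing, MPS, MIG) is configured
# on the node/device plugin side. Resource names assume NVIDIA's device plugin
# with the MIG "mixed" strategy; adjust to your cluster.
import yaml

def gpu_pod(name, resource_name, count=1):
    """Build a minimal pod manifest requesting an extended GPU resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "cuda",
                "image": "nvidia/cuda:12.4.1-base-ubuntu22.04",  # illustrative image
                "command": ["nvidia-smi", "-L"],
                "resources": {"limits": {resource_name: str(count)}},
            }],
        },
    }

manifests = [
    # Whole GPU, or a time-sliced/MPS share if the plugin is configured to oversubscribe.
    gpu_pod("full-or-shared-gpu", "nvidia.com/gpu"),
    # A 1g.5gb MIG slice exposed as its own resource under the "mixed" strategy.
    gpu_pod("mig-slice", "nvidia.com/mig-1g.5gb"),
]
print(yaml.safe_dump_all(manifests, sort_keys=False))
```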
Securing Kubernetes: Migrating from Long-Lived to Time-Bound Tokens Without Disrupting Existing Apps
In earlier versions of Kubernetes, Secrets containing long-lived tokens were automatically generated for service accounts, posing security risks because these tokens never expire and could be shared among pods and users. Recent releases introduced the TokenRequest API for obtaining time-bound tokens with bounded lifetimes, strengthening security practices and discouraging the use of long-lived tokens.
Yuan Chen and James Munnelly will delve into the details of these changes, shedding light on their impact and providing strategies for migrating existing long-lived tokens to time-bound tokens without disrupting current customer applications. Additionally, they will share best practices for tracking and monitoring the different types of tokens used within a Kubernetes cluster: legacy long-lived tokens, time-bound tokens created via the TokenRequest API, and manually managed long-lived tokens. They will also address effective management of time-bound token expiry in large-scale Kubernetes clusters.
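As an illustration of the time-bound alternative, the sketch below uses the Kubernetes Python client to request a short-lived, audience-bound token for a service account via the TokenRequest API. The service account name, namespace, audience, and lifetime are assumptions chosen for the example.

```python
# Sketch: mint a short-lived, audience-bound ServiceAccount token with the
# TokenRequest API instead of relying on a long-lived Secret-based token.
# The service account "my-app", namespace, audience, and lifetime are assumptions.
from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

token_request = client.AuthenticationV1TokenRequest(
    spec=client.V1TokenRequestSpec(
        audiences=["https://kubernetes.default.svc"],  # intended audience (assumed)
        expiration_seconds=3600,                       # 1 hour; the API enforces a 600s minimum
    )
)

resp = core_v1.create_namespaced_service_account_token(
    name="my-app", namespace="default", body=token_request
)
print("token expires at:", resp.status.expiration_timestamp)
# The token itself is in resp.status.token; hand it to the client instead of a
# long-lived Secret, and rotate by requesting a new one before expiry.
```

In-cluster workloads normally receive such tokens automatically through projected service account token volumes; explicit TokenRequest calls like this are mainly useful for external clients and for replacing manually distributed long-lived Secrets.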
From Novice to Contributor: Making Your Mark in Kubernetes and CNCF Open Source Projects
Yuan Chen, an active Kubernetes open source contributor from Apple, will guide open source novices on a journey to make their initial contributions in the world of Kubernetes and CNCF projects. Drawing from his personal experience, Yuan will provide a comprehensive roadmap, offering a step-by-step walkthrough on filing issues, submitting pull requests, engaging in fruitful discussions, and navigating the review process.
Yuan will address the potential challenges that may arise along the open source path and share effective strategies for conflict resolution. Additionally, he will provide invaluable insights into time management, empowering individuals to strike a harmonious balance between personal/work commitments and their open source endeavors.
This talk is designed to empower open source novices, equipping them with the knowledge and confidence to make initial contributions that truly count within the CNCF community.