Making Kubernetes GPU- and AI-Ready on Cloud: The Missing Runtime Pieces

Kubernetes is becoming the go-to platform for AI workloads, with GPU Operator serving as a key enabler by simplifying GPU management. However, large-scale AI demands more: managing diverse high-performance networking fabrics, tuning configurations across different cloud and on-prem environments, and optimizing container environments for AI/ML workloads.

To address this, we propose an accelerator-optimized runtime stack that manages the underlying operators and components, such as the GPU Operator, Network Operator, and DRA driver. It automates the deployment, configuration, and lifecycle management of these components, delivering a production-ready accelerated container environment that “just works” for AI/ML workloads on Kubernetes.

In this talk, we present the design and implementation of this runtime stack for NVIDIA DGX Cloud's Kubernetes AI platform, sharing real-world lessons and operational experience to help you efficiently run and scale AI workloads on Kubernetes.

Yuan Chen

Software Engineer at NVIDIA, working on Kubernetes, scheduling, GPU and AI/ML infrastructure, and resource management.

San Jose, California, United States
