
Accelerating AI Workloads with GPUs in Kubernetes

As AI and machine learning become ubiquitous, GPU acceleration is essential for model training and inference at scale. However, effectively leveraging GPUs in Kubernetes brings challenges around efficiency, configuration, extensibility, and scalability.

This talk provides a comprehensive overview of GPU-related capabilities in Kubernetes that address these challenges, enabling seamless support for next-generation AI applications.

The session will cover:

- GPU resource sharing mechanisms on Kubernetes, including MPS (Multi-Process Service), time-slicing, MIG (Multi-Instance GPU), and vGPU (GPU virtualization).

- Flexible accelerator configuration via Device Plugins and Dynamic Resource Allocation (DRA) with ResourceClaims and ResourceClasses in Kubernetes.

- Advanced scheduling and resource management features including gang scheduling, topology-aware scheduling, quota management, and job queues.

- Open-source efforts in Volcano, YuniKorn, and Slurm to support GPU and AI workloads on Kubernetes.
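To make the configuration contrast above concrete, here is a minimal sketch of the two ways a pod can obtain a GPU. The first pod requests a GPU as an extended resource exposed by the NVIDIA device plugin; the second uses a DRA ResourceClaim (the `resource.k8s.io/v1beta1` API as of Kubernetes v1.32; the object names `single-gpu` and the `gpu.nvidia.com` DeviceClass are illustrative and assume a DRA driver is installed):

```yaml
# Device-plugin style: the GPU is an extended resource on the node.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-pod
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # one whole GPU, allocated by the device plugin
---
# DRA style: describe the device needed in a ResourceClaim...
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com   # hypothetical DeviceClass from a DRA driver
---
# ...and reference the claim from the pod.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-pod-dra
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      claims:
      - name: gpu
```

Unlike the fixed integer count of the extended-resource model, the DRA claim can carry richer constraints (device class, selectors, sharing), which is what enables the flexible accelerator configuration the session describes.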

Yuan Chen

Software Engineer at NVIDIA: Kubernetes, Scheduling, GPU, AI/ML, Resource Management

San Jose, California, United States


