A Quick Guide to Setting Up DRA and Managing GPU Resources for AI/ML Workloads in Kubernetes
Dynamic Resource Allocation (DRA) introduces a new paradigm for requesting, configuring, and sharing GPU resources in Kubernetes. It enables fine-grained, flexible resource management for AI/ML workloads in a unified and customizable manner.
This session provides a concise guide and live demo on installing, configuring, and using DRA. Attendees will learn how to use DRA to manage GPU resources effectively in a kind cluster on a local Linux machine, covering the following use cases:
- A single pod/container using a dedicated GPU
- Multiple containers within a single pod sharing a dedicated GPU
- Multiple pods/containers sharing a dedicated GPU using different strategies: Time-Slicing and Multi-Process Service (MPS)
- Multiple pods/containers sharing multiple GPUs with Multi-Instance GPU (MIG)
- Layered GPU sharing across multiple pods/containers, such as Time-Slicing on MIG and MPS on MIG
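The first two use cases can be sketched with a pair of manifests: a ResourceClaimTemplate that requests a single GPU, and a pod whose containers reference the resulting claim. This is a minimal illustration, not material from the session itself; it assumes a cluster with DRA enabled (the `resource.k8s.io/v1beta1` API, Kubernetes v1.32) and the NVIDIA DRA driver installed, and the names `single-gpu`, `gpu-sharing-pod`, `worker-1`, and `worker-2` are hypothetical.

```yaml
# Sketch: request one NVIDIA GPU via a ResourceClaimTemplate, then share
# the allocated GPU between two containers in the same pod.
# Assumes Kubernetes v1.32 (resource.k8s.io/v1beta1) with DRA enabled
# and the NVIDIA DRA driver installed in the cluster.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com  # device class published by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sharing-pod
spec:
  restartPolicy: Never
  resourceClaims:
  - name: shared-gpu
    resourceClaimTemplateName: single-gpu
  containers:
  # Both containers reference the same claim, so they share the one
  # GPU allocated for this pod (use case 2). Dropping one container
  # gives the single-pod, dedicated-GPU case (use case 1).
  - name: worker-1
    image: ubuntu:22.04
    command: ["nvidia-smi", "-L"]
    resources:
      claims:
      - name: shared-gpu
  - name: worker-2
    image: ubuntu:22.04
    command: ["nvidia-smi", "-L"]
    resources:
      claims:
      - name: shared-gpu
```

Because the containers name the same claim rather than requesting separate `nvidia.com/gpu` extended resources, the scheduler allocates one device and exposes it to both, which is the sharing behavior the classic device plugin model cannot express directly.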

Yuan Chen
Software Engineer at NVIDIA, working on Kubernetes, scheduling, GPUs, AI/ML infrastructure, and resource management
San Jose, California, United States