Unlocking the Full Potential of GPUs for AI Workloads on Kubernetes

Dynamic Resource Allocation (DRA) is a new Kubernetes feature that puts resource scheduling in the hands of third-party developers. It moves away from the limited "countable" interface for requesting access to resources (e.g. "nvidia.com/gpu: 2") and provides an API more akin to that of persistent volumes.

In the context of GPUs, this unlocks a host of new features without the need for awkward solutions shoehorned on top of the existing device plugin API.

These features include:
* Controlled GPU Sharing (both within a pod and across pods)
* Multiple GPU models per node (e.g. T4 and A100)
* Specifying arbitrary constraints for a GPU (min/max memory, device model, etc.)
* Dynamic allocation of Multi-Instance GPUs (MIG)
* ... and the list goes on

In this talk, you will learn about the DRA resource driver we have built for GPUs. We walk through each of the features it provides, including its integration with the NVIDIA GPU Operator. We conclude with a demo of how you can get started today.

Kevin Klues

Distinguished Engineer at NVIDIA

Berlin, Germany

