Session
Which GPU sharing strategy is right for you? A Comprehensive Benchmark Study using DRA
Dynamic Resource Allocation (DRA) is one of the most anticipated features to ever make its way into Kubernetes. It promises to revolutionize the way hardware devices are consumed and shared between workloads. In particular, DRA unlocks the ability to manage heterogeneous GPUs in a unified and configurable manner without the need for awkward solutions shoehorned on top of the existing device plugin API.
In this talk, we use DRA to benchmark various GPU sharing strategies including Multi-Instance GPUs, Multi-Process Service (MPS), and CUDA Time-Slicing. As part of this, we provide guidance on the class of applications that can benefit from each strategy as well as how to combine different strategies in order to achieve optimal performance. The talk concludes with a discussion of potential challenges, future enhancements, and a live demo showcasing the use of each GPU sharing strategy with real-world applications.
Yuan Chen
Nvidia, Software Engineer, Kubernetes, Scheduling, GPU, AI/ML, Resource Management
San Jose, California, United States
Links
Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.
Jump to top