
A Deep Dive on How To Leverage the NVIDIA GB200 for Ultra-Fast Training and Inference on Kubernetes

Kubernetes traditionally has no mechanism for allocating non-node-local resources. The one exception is persistent volumes, which allow a user to attach the same volume to multiple pods running on different nodes. With the introduction of Dynamic Resource Allocation (DRA), we now have a way to allocate any type of resource with similar semantics.

In this talk, we discuss how DRA's ability to allocate non-node-local resources has unlocked the potential to read/write remote GPU memory over high-bandwidth, multi-node NVLink. We begin with an introduction to how DRA models non-node-local resources in general, followed by the specifics of how we have leveraged this capability to enable lightning-fast multi-node training and inference on the NVIDIA GB200 NVL72 supercomputer. As part of this, we discuss how this support has been pushed to all major cloud providers and integrated with their managed Kubernetes offerings. We conclude with a demo.
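To give a flavor of the DRA workflow the talk covers, here is a minimal sketch of a ResourceClaim and a pod that consumes it. The device class name, image, and device count are illustrative assumptions, not the exact names used by NVIDIA's DRA driver:

```yaml
# Illustrative sketch only (resource.k8s.io/v1beta1 API shape).
# A ResourceClaim asks for devices by class; the scheduler allocates
# matching devices, which need not be local to a single node.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: shared-gpus
spec:
  devices:
    requests:
    - name: gpus
      deviceClassName: gpu.example.com   # hypothetical device class
      count: 4
---
# The pod references the claim by name, and the container declares
# that it consumes the claimed devices.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpus
    resourceClaimName: shared-gpus
  containers:
  - name: trainer
    image: nvcr.io/example/trainer       # placeholder image
    resources:
      claims:
      - name: gpus
```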

Kevin Klues

Distinguished Engineer at NVIDIA

Berlin, Germany


