Unlocking the Future of GPU Scheduling in Kubernetes with Reinforcement Learning

Scaling multi-GPU setups on Kubernetes for large-scale ML projects has been a hot topic in both the AI and cloud communities. While Kubernetes can provide computing power by scheduling GPU nodes, issues such as resource fragmentation and low utilization hurt performance and drive up costs.
Why Reinforcement Learning (RL) in particular, one might ask? Unlike many other approaches, RL can continuously adapt to changing environments and efficiently handle complex, multi-dimensional objectives, which makes it particularly suitable for the dynamic and heterogeneous nature of Kubernetes clusters.
In this talk, we will explore the current landscape of GPU scheduling and some state-of-the-art RL algorithms proposed for scheduling. We will take a deep dive into their current impact on Kubernetes and the possible use of Reinforcement Learning from Human Feedback (RLHF). We hope the audience gains more insight into these new ways of scheduling GPUs on Kubernetes.

Nikunj Goyal

Member of Technical Staff

