
Building a Fine-Grained and Intelligent Resource Management System on Kubernetes

The resource management capabilities of vanilla Kubernetes are limited:
1. The static resource model leads to low resource utilization because of the tidal traffic patterns of online services.
2. Only whole-GPU requests are allowed, which wastes significant GPU capacity in AI inference scenarios.
3. The native micro-topology allocation strategy cannot meet the performance requirements of workloads such as search, recommendation, and AI training.
In this talk, Wei and He will introduce Katalyst, a resource management system, and its application at ByteDance:
1. Colocate online services and offline jobs to improve resource utilization while guaranteeing their SLOs.
2. Implement GPU-share scheduling, which allows requests at 1% granularity for computing power and 1 MiB granularity for GPU memory, to improve GPU utilization in AI inference scenarios.
3. Implement topology-aware scheduling and customize a strategy for GPU-RDMA affinity at the root-complex level, so GPUDirect RDMA can be used to accelerate AI training.
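To make the GPU-share idea concrete, the sketch below shows how fractional GPU requests are typically expressed in Kubernetes: as extended resources on a pod spec, allocated by a custom scheduler plugin and device plugin. The resource names here are hypothetical illustrations, not Katalyst's actual API.

```yaml
# Hypothetical pod spec illustrating fine-grained GPU-share requests.
# The resource names (gpu-core-percent, gpu-memory-mib) are assumed for
# illustration; Katalyst's real resource names may differ.
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  containers:
    - name: model
      image: example.com/inference:latest
      resources:
        requests:
          example.io/gpu-core-percent: "25"    # 25% of one GPU's compute
          example.io/gpu-memory-mib: "2048"    # 2048 MiB of GPU memory
        limits:
          example.io/gpu-core-percent: "25"
          example.io/gpu-memory-mib: "2048"
```

Because extended resources in Kubernetes must be whole integers, percent-of-core and MiB-of-memory units naturally support the 1% and 1 MiB granularity described above, letting several inference pods pack onto a single physical GPU.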

Wei Shao

Senior Software Engineer, ByteDance


