Session

Sailing Ray workloads with KubeRay and Kueue in Kubernetes

Compute demands for machine learning are growing rapidly nowadays. Ray, a unified computing framework, allows ML engineers to scale their workloads effortlessly without building complex computing infrastructures.

On the other hand, Kubernetes, a popular open-source container orchestration platform, can help to manage a wide range of workloads at ease with KubeRay, an operator for Ray workloads.

At ByteDance, thousands of jobs are submitted to the Ray cluster created by KubeRay daily. With the capability to debug programs on long-running clusters and launch regular jobs through Ray Job custom resources, users benefit from a streamlined workflow.

Meanwhile, efficiently managing concurrent Ray jobs poses challenges such as job starvation and resource allocation. Kueue, a Kubernetes native job queueing system offering capacities like resource management, multi-tenant support, and resource fair-sharing perfectly addresses the Ray job challenges in Kubernetes.

Kante Yin

HivergeAI, Founding Engineer

London, United Kingdom

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top