Not All Schedulers Are Created Equal: Why Your ML Jobs Deserve Better Than Default Kube-Scheduler

Kubernetes has become the de facto platform for deploying machine learning (ML) workloads, but its default scheduler was designed for long-running web services—not batch jobs, distributed training, or GPU-heavy tasks. As a result, teams running ML workloads often face poor cluster utilization, job starvation, and costly inefficiencies—all because the scheduler isn’t optimized for their needs.

In this talk, we’ll explore:
1. Why the default kube-scheduler falls short for ML workloads (e.g., no gang scheduling, poor spot instance handling).
2. How specialized schedulers (like Apache YuniKorn, kube-batch, or Volcano) solve these problems with features like topology-aware placement, fairness queues, and elastic scaling (a minimal sketch of a gang-scheduled job appears after this list).
3. Real-world case studies of teams that improved job completion times and reduced costs by switching schedulers.
4. A decision framework for choosing the right scheduler based on workload requirements.
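
To make the contrast with the default kube-scheduler concrete, here is a minimal sketch of the kind of manifest a specialized scheduler accepts, assuming Volcano is installed in the cluster and registered under the scheduler name `volcano`; the job name, queue name, and container image are hypothetical. The manifest requests gang scheduling: none of the workers start until all of them can be placed together.

```python
# Minimal sketch of a Volcano Job manifest built in Python.
# Assumptions: Volcano is installed and exposes a scheduler named "volcano",
# and a queue called "ml-training" exists; the image is a placeholder.
import yaml

volcano_job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "dist-train"},
    "spec": {
        "schedulerName": "volcano",   # route the job past the default kube-scheduler
        "minAvailable": 4,            # gang scheduling: schedule all 4 workers or none
        "queue": "ml-training",       # fairness queue this job is charged against
        "tasks": [
            {
                "replicas": 4,
                "name": "worker",
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "trainer",
                                "image": "example.com/trainer:latest",  # hypothetical image
                                "resources": {"limits": {"nvidia.com/gpu": 1}},
                            }
                        ],
                        "restartPolicy": "OnFailure",
                    }
                },
            }
        ],
    },
}

# Print the manifest as YAML, ready to apply with kubectl.
print(yaml.safe_dump(volcano_job, sort_keys=False))
```

The key lines are `schedulerName` and `minAvailable`: the default kube-scheduler places pods one at a time, so a distributed training job can deadlock with half its workers running and half pending, while a gang-aware scheduler admits the whole group atomically.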

Nurudeen Kamilu

Senior Systems Engineer @ Visa | Kubestronaut | Championing Reliable Container Infrastructure

Warsaw, Poland
