From Kueue to Volcano: A First-Time Journey for GPU-Based Workloads
A campfire tale for K8s platform engineers tackling GPU workloads.
🪵 Honeymoon – Kueue powers our clusters. Single-GPU camera jobs sail through; dashboards glow green.
🔥 Plot Twist – LLM training lands. Each job needs eight GPUs at once. Queues stall and the messages pour in: “Why is nothing finishing?”
🔎 Detective Work – With plain kubectl and the default dashboard, we find Kueue granting jobs only half the GPUs they need: pods wait forever while 40% of GPU cards sit idle.
🧬 Evolution, Not Rewrite – We drop in Volcano, turn on gang scheduling, keep the same Job YAML, and the backlog melts away.
🎒 Take-away Kit – A quick checklist for deciding when to stay on Kueue and when to switch, copy-paste PromQL queries for spotting GPU starvation, and a GitHub lab for repeating the migration on a weekend rig.
No vendor tools: just Kubernetes, YAML, and the journey from “it kind of works” to “it really works.”
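To make the stall from the Detective Work and Evolution beats concrete, here is a toy Python model. It assumes a hypothetical 16-GPU cluster and three eight-GPU training jobs, and it is not Kueue's or Volcano's actual scheduling logic. The hypothetical `per_pod_grants` hands out GPUs one at a time across pending jobs, while `gang_grants` admits a job only when all of its GPUs fit at once. That all-or-nothing rule is the core idea behind gang scheduling.

```python
"""Toy model (hypothetical, not Kueue's or Volcano's real algorithm): why
per-pod GPU grants can deadlock multi-GPU jobs while all-or-nothing
(gang) admission does not."""

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus_needed: int
    gpus_held: int = 0

    @property
    def can_run(self) -> bool:
        # A distributed training job only makes progress once every worker holds its GPU.
        return self.gpus_held == self.gpus_needed


def per_pod_grants(jobs, total_gpus):
    """Hand out one GPU at a time, round-robin across unfinished jobs; return free GPUs."""
    free = total_gpus
    while free > 0:
        progressed = False
        for job in jobs:
            if free > 0 and job.gpus_held < job.gpus_needed:
                job.gpus_held += 1
                free -= 1
                progressed = True
        if not progressed:  # every job already has all it asked for
            break
    return free


def gang_grants(jobs, total_gpus):
    """Admit a job only if all of its GPUs fit at once; return free GPUs."""
    free = total_gpus
    for job in jobs:
        if job.gpus_needed <= free:
            job.gpus_held = job.gpus_needed
            free -= job.gpus_needed
    return free


def summarize(jobs):
    """Return (names of runnable jobs, GPUs held by jobs that cannot run)."""
    running = [j.name for j in jobs if j.can_run]
    stranded = sum(j.gpus_held for j in jobs if not j.can_run)
    return running, stranded


if __name__ == "__main__":
    for policy in (per_pod_grants, gang_grants):
        jobs = [Job("llm-a", 8), Job("llm-b", 8), Job("llm-c", 8)]
        policy(jobs, total_gpus=16)
        running, stranded = summarize(jobs)
        print(f"{policy.__name__}: running={running}, GPUs held but idle={stranded}")
```

Run as a script, the per-pod policy allocates every GPU yet leaves no job runnable. The gang policy runs two jobs and queues the third until a full set of eight GPUs frees up.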

Michael Forrester
Preparing Tomorrow's Innovators, Elevating the Average
Atlanta, Georgia, United States