
From Chaos to Control: Herding GPU-Hungry Dragons with Kubernetes DRA

Balancing AI training, real-time inference, and bursty batch jobs on a shared accelerator fleet used to feel like herding caffeinated dragons. Kubernetes 1.33's Dynamic Resource Allocation (DRA) turns that chaos into choreography: Pods declare exactly what accelerator slice they need, and the scheduler guarantees where they will run before a container ever starts. With the new partitionable-device, prioritized-list, and device-taint feature gates, platform teams carve up GPUs, FPGAs, or SmartNICs on demand—no nvidia-smi incantations, no "GPU not found" crash loops. On GCP we cut idle GPU hours by 42%, shrinking datacenter spend while giving each tenant strong isolation via namespace-scoped claims. Dev namespaces grab bite-size slices for rapid prototyping; prod jobs scale to full-device allocations using the same YAML, with zero redeploys. Observability hooks keep SLO dashboards glowing. One control plane, two QoS tiers, dragons tamed.
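As a rough sketch of the namespace-scoped claims the abstract describes (the DeviceClass name, namespaces, and image are illustrative assumptions, not part of the talk; the DRA ResourceClaim API shown here is the beta `resource.k8s.io/v1beta1` group available in Kubernetes 1.33), a dev-tier claim and the Pod that consumes it might look like:

```yaml
# Hypothetical example: a namespace-scoped ResourceClaim for one GPU slice.
# "gpu.example.com" is an assumed DeviceClass installed by the accelerator driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: dev-gpu-slice
  namespace: dev
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: prototype
  namespace: dev
spec:
  resourceClaims:          # bind the claim to this Pod by name
  - name: gpu
    resourceClaimName: dev-gpu-slice
  containers:
  - name: train
    image: example.com/trainer:latest   # assumed workload image
    resources:
      claims:
      - name: gpu          # the container consumes the claim declared above
```

A prod namespace could reuse the same Pod spec and point `resourceClaimName` at a claim backed by a full-device class, which is how the same YAML scales from prototype slices to production allocations.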

Prashant Ramhit

Mirantis - Senior DevOps & QA

Dubai, United Arab Emirates


