Session

Unlocking Performance: Topology-Aware CPU Scheduling with a DRA Driver

You want optimal performance from your applications, especially for AI, inference, telco, and HPC workloads. That performance is not attainable without proper alignment of CPUs, GPUs, and NICs. Unaligned CPUs slow down data transfers, yet the kubelet has no native way to coordinate CPU placement with other Dynamic Resource Allocation (DRA) resources. This gap is a critical bottleneck for high-performance workloads.
This session explores a CPU management solution built on DRA. We'll show how a DRA driver can describe a node's CPU topology and attributes (e.g., core types), allowing workloads to request CPUs that are guaranteed to align with other allocated devices. We will cover the high-level design of CPU allocation by the DRA driver and actuation by the node-level plugin.
This approach offers powerful capabilities but also brings challenges, such as performance overhead on both the scheduler and the node. We aim to foster a discussion on these trade-offs and on the future evolution of CPU management in Kubernetes.

Marlow Warnicke (Weston)

Principal Cloud Engineer at SchedMD

Austin, Texas, United States
