Unlocking how to efficiently, flexibly, manage and schedule seven AI chips in Kubernetes

A growing number of AI accelerator manufacturers have emerged in recent years. Data centers often have to run AI accelerators from several vendors side by side, such as Nvidia, AMD, and Intel.
Managing these heterogeneous devices is therefore increasingly challenging. The CNCF sandbox project HAMi (Heterogeneous AI Computing Virtualization Middleware) was created for this purpose.
This session will focus on efficiently managing heterogeneous AI chips in Kubernetes clusters through HAMi, covering:
* A unified scheduler that is topology-aware and NUMA-aware, and supports binpack and spread scheduling policies across 7 AI accelerators
* Virtualization on 6 AI accelerators
* Task priority
* Memory oversubscription for GPU tasks on Kubernetes
* Observability in two dimensions: allocated resources and real usage
* HAMi+Volcano/Koordinator for collaborative orchestration and scheduling capabilities on AI batch tasks
* HAMi+Kueue for practice in training and inference scenarios
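
To make the binpack and spread policies mentioned above concrete, here is a minimal sketch of how the two differ when choosing a node. It is an illustration only and not HAMi's actual scoring code: the `Node` type, the `pick_node` function, the node names, and the use of device memory in MiB as the only resource are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of a node: one pooled device-memory figure in MiB.
    name: str
    total_mem: int
    used_mem: int


def pick_node(nodes, request, policy):
    """Return the name of the node chosen for `request` MiB, or None if nothing fits."""
    # Filter step: keep only nodes with enough free device memory.
    feasible = [n for n in nodes if n.total_mem - n.used_mem >= request]
    if not feasible:
        return None

    def utilization_after(node):
        # Fraction of device memory in use if the request were placed here.
        return (node.used_mem + request) / node.total_mem

    if policy == "binpack":
        # Binpack: prefer the fullest node, keeping other nodes free.
        chosen = max(feasible, key=utilization_after)
    elif policy == "spread":
        # Spread: prefer the emptiest node, balancing load across nodes.
        chosen = min(feasible, key=utilization_after)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return chosen.name


nodes = [
    Node("node-a", total_mem=32768, used_mem=24576),
    Node("node-b", total_mem=32768, used_mem=4096),
]
binpack_choice = pick_node(nodes, 4096, "binpack")
spread_choice = pick_node(nodes, 4096, "spread")
```

With these example nodes, a 4096 MiB request goes to the busier `node-a` under binpack and to the emptier `node-b` under spread. A real scheduler would also weigh device topology, NUMA placement, and per-device limits, none of which this sketch models.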

Mengxuan Li

4paradigm
