Session
krun - Multi-Node Launcher for Deep Learning Workloads in Kubernetes
krun (Kubernetes-run) is a multi-node application launcher for Kubernetes, designed to run large-scale deep learning jobs. It supports frameworks such as PyTorch and MPI and abstracts away the complexities of the underlying infrastructure, so users do not need to construct complex launch commands by hand.
This talk will show how users can pass training scripts alongside the launcher executable to set up distributed launch commands and the required environment. A live demo will highlight the launcher's ability to run large language models (LLMs). We will also demonstrate performance improvements from the integrated NUMA binding, which optimizes resource allocation for faster training.
krun offers two key advantages: it simplifies launching deep learning workloads, and it helps boost training performance. These benefits let researchers and data scientists focus on their core work, accelerating AI development in Kubernetes environments.
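To give a sense of the kind of launch command the abstract refers to, here is a minimal sketch of assembling a per-node `torchrun` invocation by hand, optionally wrapped in `numactl`. This is not krun's implementation or interface: the function `build_launch_command` and all of its parameters are hypothetical and exist only for illustration. The `torchrun` and `numactl` flags used are the standard ones from those tools.

```python
import shlex


def build_launch_command(script, node_rank, num_nodes, procs_per_node,
                         master_addr, master_port=29500, numa_node=None):
    """Assemble the argv for one node of a multi-node PyTorch job.

    Hypothetical helper for illustration only; it does not reflect krun's API.
    """
    cmd = [
        "torchrun",
        f"--nnodes={num_nodes}",               # total number of nodes in the job
        f"--node_rank={node_rank}",            # index of this node
        f"--nproc_per_node={procs_per_node}",  # worker processes on this node
        f"--master_addr={master_addr}",        # rendezvous host
        f"--master_port={master_port}",        # rendezvous port
        script,
    ]
    if numa_node is not None:
        # Simplification: pins the whole launcher (and its children) to a
        # single NUMA node. A real launcher would more likely bind each
        # local rank to the NUMA node closest to its GPU.
        cmd = ["numactl",
               f"--cpunodebind={numa_node}",
               f"--membind={numa_node}"] + cmd
    return cmd


if __name__ == "__main__":
    argv = build_launch_command("train.py", node_rank=0, num_nodes=2,
                                procs_per_node=8, master_addr="10.0.0.1",
                                numa_node=0)
    print(shlex.join(argv))
```

Every node needs its own variant of this command with the correct rank and rendezvous address, which is the repetitive setup a launcher can take over.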