krun - Multi-Node Launcher for Deep Learning Workloads
krun (kubernetes-run) is a multi-node application launcher utility for Kubernetes-based platforms that supports PyTorch jobs. It is designed to launch distributed DL workloads on a single node or across multiple nodes, and to ease the launch process for the user.
The primary benefits of krun over existing launchers (mpirun and srun) are the following:
- Removes the dependency on mpirun in the container image.
- Provides srun equivalence, so users can easily migrate jobs between Slurm-based and Kubernetes-based clusters.
- Integrates tightly with the PyTorch framework and can extend its capabilities for a given platform.
krun enables PyTorch-based deep learning workloads such as LLMs to achieve peak performance by performing NUMA binding on GPUs. The NUMA binding feature removes the need for the user to know the hardware topology or to tune the hardware by hand. It gives users a mechanism to bind their job's rank processes to CPU cores. For the BERT workload, NUMA binding provides a performance gain of about 0.6%, and for SSD the improvement is around 2.5%.
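To make the idea of binding rank processes to CPU cores concrete, here is a minimal Python sketch. It is not krun's implementation: the helper names (`parse_cpulist`, `cores_for_rank`, `bind_current_process`) and the example topology are hypothetical, and it assumes a Linux host where `os.sched_setaffinity` is available. It assumes each local rank drives one GPU, that the NUMA node of each GPU is already known, and that the cores of a node are split evenly among the ranks that share it.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a sorted list of core ids."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def cores_for_rank(local_rank, gpu_numa_node, node_cpulists):
    """Pick the CPU cores for one local rank.

    gpu_numa_node: list where entry i is the NUMA node of the GPU used by
                   local rank i (hypothetical input, e.g. [0, 0, 1, 1]).
    node_cpulists: dict mapping NUMA node id to a cpulist string.
    """
    node = gpu_numa_node[local_rank]
    cores = parse_cpulist(node_cpulists[node])
    # Ranks whose GPUs sit on the same NUMA node share that node's cores.
    peers = [r for r, n in enumerate(gpu_numa_node) if n == node]
    share = len(cores) // len(peers)
    start = peers.index(local_rank) * share
    return cores[start:start + share]


def bind_current_process(cores):
    """Pin the calling process to the given cores (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cores)  # pid 0 means the calling process


if __name__ == "__main__":
    # Assumed topology: 4 GPUs, 2 NUMA nodes with 8 cores each.
    gpu_numa_node = [0, 0, 1, 1]
    node_cpulists = {0: "0-7", 1: "8-15"}
    for rank in range(4):
        print(rank, cores_for_rank(rank, gpu_numa_node, node_cpulists))
```

In a real launcher, each rank process would call something like `bind_current_process` with its own core list before training starts, so that the user never has to work out the topology by hand.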