Session
Managing LLM Workloads on GPUs with Docker + WASM + GPU
In this talk, I will begin with a brief introduction to WasmEdge, a CNCF Sandbox project, highlighting its seamless integration with existing cloud-native infrastructure such as Kubernetes, Docker, and CRI-O. This integration allows for the deployment, management, and execution of lightweight WebAssembly applications within these environments. My focus will be on how Kubernetes ecosystem tools work with WasmEdge WebAssembly applications.
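To make the Kubernetes integration concrete, the sketch below shows one way a cluster might register WasmEdge as a runtime via a RuntimeClass. The class name, handler, annotation value, and image are illustrative assumptions; the exact handler depends on how crun/WasmEdge is set up on the nodes.

```yaml
# Hypothetical RuntimeClass mapping to a crun handler built with WasmEdge support.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmedge        # illustrative name
handler: crun           # assumes node-level crun is compiled with WasmEdge
---
# A Pod opting into that runtime; the annotation hints to crun that the
# image should be treated as a WebAssembly module.
apiVersion: v1
kind: Pod
metadata:
  name: wasm-demo
  annotations:
    module.wasm.image/variant: compat-smart
spec:
  runtimeClassName: wasmedge
  containers:
    - name: app
      image: example.registry/wasm-demo:latest   # placeholder image
```

With a mapping like this in place, standard Kubernetes tooling (kubectl, Deployments, Services) can schedule the Wasm workload like any other pod.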
Next, I will delve into managing LLM (Large Language Model) workloads on GPUs using advanced container tools. We will explore a novel approach that combines Docker, crun, WasmEdge, and CDI (the Container Device Interface) to give WebAssembly workloads effective access to host GPU devices.
To illustrate the practical application of this new approach, I will present a live demo of running the Llama model using our WASM application.
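Because the approach leans on CDI to expose GPUs to containers, it may help to see the general shape of a CDI device specification. The fragment below is a minimal sketch only: the vendor kind, device name, and device-node paths are assumptions and will differ with the actual driver setup.

```json
{
  "cdiVersion": "0.6.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "gpu0",
      "containerEdits": {
        "deviceNodes": [
          { "path": "/dev/nvidia0" },
          { "path": "/dev/nvidiactl" }
        ]
      }
    }
  ]
}
```

A CDI-aware runtime can then inject the device by its qualified name (for example, requesting `nvidia.com/gpu=gpu0`), so the WASM module running under crun/WasmEdge sees the host GPU device nodes.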
Yongkang He
Founder @KSUG.AI @KubeSmart.AI | Creator @awstronaut @kubestrong
Singapore