Integrating WasmEdge with Kubernetes for Managing LLM Workloads on GPUs

In this talk, I will introduce WasmEdge, a CNCF Sandbox project, and show how it integrates with existing cloud-native infrastructure such as Kubernetes, Podman, and CRI-O. We'll explore how these tools enable the deployment, management, and execution of lightweight WebAssembly applications.

Next, I'll turn to managing LLM (Large Language Model) workloads on GPUs with container tooling. Specifically, we'll discuss an approach that combines Podman, crun, WasmEdge, and CDI (Container Device Interface) to make use of host GPU devices.

To show this approach in practice, I'll give a live demo that deploys and runs the Llama model using our Wasm application.

Yongkang He

Founder @KSUG.AI @KubeSmart.AI | Creator @awstronaut @kubestrong

Singapore
