WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes

The emergence of Generative AI (GenAI) has introduced new challenges and demands in AI/ML inference, necessitating advanced solutions for efficient serving infrastructures. The recently created Kubernetes Working Group Serving (WG Serving) is dedicated to enhancing serving workload on K8s, especially for hardware-accelerated AI/ML inference. This group prioritizes compute-intensive inference scenarios using specialized accelerators, benefiting various serving workloads such as web services and stateful databases.

This session will dive into WG Serving's initiatives and workstreams. We will spotlight discussions and advancements in each workstream. We are also actively looking for feedback and partnership with model server authors and other practitioners who want to utilize powers of K8s for their serving workloads. Join us to gain insight into our work and learn how to contribute to advancing AI/ML inference on K8s.

Eduardo Arango Gutierrez

Senior Systems Software Engineer @NVIDIA

Landsberg am Lech, Germany

Actions

View Speaker Profile

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Session

WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes

Eduardo Arango Gutierrez

Links

Actions