Speaker

Kevin Klues

Distinguished Engineer at NVIDIA

Berlin, Germany

Kevin Klues is a distinguished engineer on the NVIDIA Cloud Native team. Kevin has been involved in the design and implementation of a number of Kubernetes technologies, including the Topology Manager, the Kubernetes stack for Multi-Instance GPUs, and Dynamic Resource Allocation (DRA). When not working, you can usually find Kevin on a snowboard or up in the mountains in one capacity or another.

Which GPU sharing strategy is right for you? A Comprehensive Benchmark Study using DRA

Dynamic Resource Allocation (DRA) is one of the most anticipated features to ever make its way into Kubernetes. It promises to revolutionize the way hardware devices are consumed and shared between workloads. In particular, DRA unlocks the ability to manage heterogeneous GPUs in a unified and configurable manner without the need for awkward solutions shoehorned on top of the existing device plugin API.
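
To make the claim-based model concrete, here is a minimal sketch of a DRA request for a single GPU. It assumes NVIDIA's DRA driver is installed (which registers the gpu.nvidia.com device class); the resource.k8s.io API group has moved through several alpha and beta versions, so adjust the apiVersion to whatever your cluster serves.

    # A ResourceClaimTemplate requesting one GPU via DRA. Pods reference this
    # template instead of a countable resource limit like nvidia.com/gpu.
    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaimTemplate
    metadata:
      name: single-gpu
    spec:
      spec:
        devices:
          requests:
          - name: gpu
            deviceClassName: gpu.nvidia.com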

In this talk, we use DRA to benchmark various GPU sharing strategies including Multi-Instance GPUs, Multi-Process Service (MPS), and CUDA Time-Slicing. As part of this, we provide guidance on the class of applications that can benefit from each strategy as well as how to combine different strategies in order to achieve optimal performance. The talk concludes with a discussion of potential challenges, future enhancements, and a live demo showcasing the use of each GPU sharing strategy with real-world applications.
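
As a flavor of how a sharing strategy can be attached to a claim with the NVIDIA DRA driver, here is a sketch that selects time-slicing. The opaque parameter schema (the GpuConfig kind and its apiVersion) has evolved across driver releases, so treat the exact shape as illustrative rather than definitive.

    # Illustrative: request one GPU and ask the driver to share it via time-slicing.
    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaimTemplate
    metadata:
      name: timesliced-gpu
    spec:
      spec:
        devices:
          requests:
          - name: gpu
            deviceClassName: gpu.nvidia.com
          config:
          - requests: ["gpu"]
            opaque:
              driver: gpu.nvidia.com
              parameters:
                apiVersion: gpu.nvidia.com/v1alpha1
                kind: GpuConfig
                sharing:
                  strategy: TimeSlicing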

Kubernetes WG Device Management - Advancing K8s Support for GPUs

The goal of the recently formed WG Device Management is to enable simple and efficient configuration, sharing, and allocation of accelerators (such as GPUs and TPUs) and other specialized devices. This working group focuses on the APIs, abstractions, and feature designs needed to configure, target, and share the necessary hardware for both batch and serving (inference) workloads.

The current focus of the working group is the Dynamic Resource Allocation (DRA) feature. Come to this talk to learn what we have delivered in Kubernetes 1.31, what is coming in 1.32 and beyond, and how you can influence the roadmap for Kubernetes support of accelerated workloads.

From Vectors to Pods: Integrating AI with Cloud Native

The rise of AI is challenging long-standing assumptions about running cloud native workloads. AI demands hardware accelerators, vast amounts of data, efficient scheduling, and exceptional scalability. Although Kubernetes remains the de facto choice, feedback from end users and collaboration with researchers and academia are essential to drive innovation, address gaps, and integrate AI in cloud native.

This panel features end users, AI infra researchers, and leads of the CNCF AI and Kubernetes device management working groups, focused on:

- Expanding beyond LLMs to explore AI for cloud native workload management, memory usage and debugging
- Challenges with scheduling and scaling of AI workloads from the end user perspective
- OSS Projects and innovation in AI and cloud native in the CNCF landscape
- Improving resource utilization and performance of AI workloads

The next decade of Kubernetes will be shaped by AI. We don’t yet know what this will look like; come join us to discover it together.

From foundation model to hosted AI solution in minutes

This session on AI-driven applications is co-hosted by IONOS and NVIDIA. Discover how IONOS leverages NVIDIA’s cutting-edge hardware to offer robust foundation models, propelling AI capabilities to new heights. Learn about IONOS's Kubernetes as a Service, designed to seamlessly integrate with powerful GPU infrastructure, ensuring optimal performance and scalability for your AI projects.

We will demonstrate the dynamic interaction between these solutions, showcasing real-world examples of how they work together to enhance AI-driven applications. This session will not only delve into current implementations but also explore future directions, providing insights into the potential advancements in AI applications facilitated by GPU integration within Kubernetes environments.

Unlocking the Full Potential of GPUs for AI Workloads on Kubernetes

Dynamic Resource Allocation (DRA) is a new Kubernetes feature that puts resource scheduling in the hands of third-party developers. It moves away from the limited "countable" interface for requesting access to resources (e.g. "nvidia.com/gpu: 2"), providing an API more akin to that of persistent volumes.
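
For contrast, the "countable" interface looks like this today; the pod can only say how many GPUs it wants, not which kind or how they may be shared (the image name is illustrative):

    # Traditional device plugin request: two opaque, indistinguishable GPUs.
    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda-pod
    spec:
      containers:
      - name: cuda
        image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
        resources:
          limits:
            nvidia.com/gpu: 2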

In the context of GPUs, this unlocks a host of new features without the need for awkward solutions shoehorned on top of the existing device plugin API.

These features include:
* Controlled GPU Sharing (both within a pod and across pods)
* Multiple GPU models per node (e.g. T4 and A100)
* Specifying arbitrary constraints for a GPU (min/max memory, device model, etc.; see the sketch after this list)
* Dynamic allocation of Multi-Instance GPUs (MIG)
* … the list goes on ...
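
The arbitrary-constraints item maps to CEL selectors on a request. A minimal sketch follows, assuming the driver publishes a productName attribute and a memory capacity under the gpu.nvidia.com prefix; the attribute names and quantity helpers shown are driver- and version-specific assumptions.

    # Illustrative: only match A100-class devices with at least 40Gi of memory.
    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaimTemplate
    metadata:
      name: a100-gpu
    spec:
      spec:
        devices:
          requests:
          - name: gpu
            deviceClassName: gpu.nvidia.com
            selectors:
            - cel:
                expression: |-
                  device.attributes["gpu.nvidia.com"].productName.matches("A100") &&
                  device.capacity["gpu.nvidia.com"].memory.compareTo(quantity("40Gi")) >= 0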

In this talk, you will learn about the DRA resource driver we have built for GPUs. We walk through each of the features it provides, including its integration with the NVIDIA GPU Operator. We conclude with a demo of how you can get started today.

DRAcon: demystifying Dynamic Resource Allocation - from myths to facts

At KubeCon NA 2023, dynamic resource allocation (DRA) made headlines because it was mentioned in the keynote. This generated so much buzz that Tim Hockin quipped on social media that it felt like he attended DRAcon instead of KubeCon. At KubeCon EU we’ll demystify this new technology!

DRA is a new approach for describing resource requirements in a Kubernetes cluster. It was first introduced in Kubernetes 1.26 and remains in alpha as of 1.29.

It offers several advantages compared to existing approaches:

- Support for custom hardware can be added by developing and deploying DRA drivers, without having to modify Kubernetes.
- Resource parameters are defined by vendors.
- Sharing of a resource instance between containers and pods.
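
The last item above looks roughly like this in practice: two containers in one pod reference the same claim and therefore see the same GPU. This is a sketch; the resourceClaims field names shifted while DRA was in alpha, and the single-gpu template name is hypothetical.

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-gpu-pod
    spec:
      resourceClaims:
      - name: shared-gpu
        resourceClaimTemplateName: single-gpu   # hypothetical template name
      containers:
      - name: ctr0
        image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
        resources:
          claims:
          - name: shared-gpu   # both containers share the same allocated GPU
      - name: ctr1
        image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
        resources:
          claims:
          - name: shared-gpu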

In order to move forward to beta and beyond, we need feedback from the community to understand whether it’s ready in its current form, who wants to use it for what, and how we can solve some of the open challenges, like cluster autoscaler support.

Mastering GPU Management in Kubernetes Using the Operator Pattern

Kubernetes is no longer just a tool for running workloads like web applications and microservices; it is the ideal platform for supporting the end-to-end lifecycle of large artificial intelligence (AI) and machine learning (ML) workloads, such as LLMs.

GPUs have become the foundation of this workload shift. However, managing GPUs in a Kubernetes cluster requires full-stack knowledge, from the installation of kernel drivers to the setup of container runtimes, device plugins, and a monitoring stack. These activities can be broken down into four phases:

1. Installation of the GPU software stack on a small cluster
2. Infrastructure build-out by adding more nodes
3. Lifecycle management and software updates
4. Monitoring and error recovery

In this talk, we discuss leveraging the operator pattern for the lifecycle management of GPU software in K8s. We demo the NVIDIA GPU Operator to show how the operator pattern can benefit K8s admins, from basic driver installation to managing advanced AI/ML use cases.
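
For a taste of what the operator pattern looks like to an admin, the GPU Operator is driven by a single ClusterPolicy custom resource that toggles each layer of the stack. The sketch below is trimmed to a few common fields; consult the operator's documentation for the full schema.

    # Illustrative ClusterPolicy: the operator reconciles each enabled component.
    apiVersion: nvidia.com/v1
    kind: ClusterPolicy
    metadata:
      name: cluster-policy
    spec:
      driver:
        enabled: true        # kernel driver installation
      toolkit:
        enabled: true        # NVIDIA Container Toolkit setup
      devicePlugin:
        enabled: true        # advertise GPUs to the kubelet
      dcgmExporter:
        enabled: true        # monitoring stack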

Running AI Workloads in Containers and Kubernetes

Containers are the best way to run machine learning and AI workloads in the cloud. However, running these workloads efficiently poses unique challenges, from resource management to performance optimization.

In this talk, we dive into the details of how GPUs are made available to such workloads, both in standalone containers and in Kubernetes. As part of this, we discuss various options for sharing GPUs between workloads. These techniques include simple time-slicing, MPS, and MIG.
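
As one concrete example, simple time-slicing with the NVIDIA device plugin is configured through a small config file, typically mounted from a ConfigMap. The ConfigMap name and key below are deployment choices, not fixed names.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nvidia-device-plugin-config
    data:
      config.yaml: |
        version: v1
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 4   # each physical GPU is advertised four times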

By the end of this session, attendees will have a comprehensive understanding of how GPU support in containers and Kubernetes works under the hood, as well as the knowledge required to make the most efficient use of GPUs in their own applications.

ContainerDays Conference 2024 Sessionize Event

September 2024 Hamburg, Germany

WeAreDevelopers World Congress 2024 Sessionize Event

July 2024 Berlin, Germany

Maintainer Track + ContribFest: KubeCon + CloudNativeCon Europe 2024 Sessionize Event

March 2024 Paris, France

KubeCon + CloudNativeCon Europe 2024 Sessionize Event

March 2024 Paris, France

KubeCon + CloudNativeCon North America 2023 Sessionize Event

November 2023 Chicago, Illinois, United States
