Speaker

Kay Yan

Maintainer of Kubespray, containerd/nerdctl, and LWS (LeaderWorkerSet); Software Engineer at DaoCloud

Shanghai, China

Principal Software Engineer at DaoCloud, driving cloud-native innovation with 15+ years of expertise in the Kubernetes/CNCF ecosystem and AI/ML workload orchestration. Incubated 4 CNCF Sandbox projects (Kubean, Spiderpool, HwameiStor, Merbridge) and maintains Kubernetes tools (Kubespray, containerd/nerdctl, and LWS). Holder of 23 Chinese patents and 9 U.S. patents. Former Senior Technologist at DELL/EMC, awarded the EMC Innovation CTO Award for pioneering PaaS architectures.

Area of Expertise

  • Information & Communications Technology

Topics

  • kubernetes

Building Custom GPU Clusters at Scale: Using Kubespray to Create High-Performance AI Infrastructure

Kubespray, recognized by Kubernetes' SIG Cluster Lifecycle, deploys production-ready Kubernetes clusters on bare metal, enhancing performance for AI applications with robust GPU support. This session covers Kubespray's fundamentals, key features, and updates.

As AI workloads like LLMs grow, scalable GPU clusters are essential. Engineers will share insights from deploying custom GPU clusters at scale with Kubespray, discussing challenges and best practices. Attendees will learn to integrate Kubernetes technologies like LWS, Kueue, Gateway API Inference Extension, DRA, and tensor parallelism to enhance AI workloads like RAG and LoRA, improving resource utilization and performance.

We'll share Kubespray's inventory source code to customize AI clusters and use Kubernetes operators to define infrastructure in private clouds, enabling efficient cluster scaling.

AI-Powered Kubernetes Diagnostics with K8sGPT

In this Lightning Talk, we’ll dive into K8sGPT, a CNCF sandbox project that uses AI to enhance Kubernetes management. K8sGPT leverages LLMs to diagnose cluster issues, offering root cause analysis and solutions in simple terms. It encodes SRE expertise into analyzers, extracting key insights and enriching them with AI-powered explanations.
Key highlights:
- Core Features: Learn to use the CLI and K8sGPT Operator for cluster error analysis and contextualized insights.
- AI Integration & Security: Explore integration with AI models like OpenAI, Azure, and Ollama, with data anonymization for security.
- Real-world Demos: See how K8sGPT simplifies Kubernetes troubleshooting.
- Enterprise Strategies: Discover techniques like LoRA and RAG to tailor K8sGPT for specific environments.
Whether you're new to Kubernetes or an expert, K8sGPT can streamline cluster management, reduce troubleshooting time, and boost efficiency.

Kubespray Unleashed: Navigating Bare Metal Services in Kubernetes for LLM and RAG

Kubespray, popular within Kubernetes' SIG Cluster Lifecycle, is known for deploying production-ready Kubernetes clusters, particularly on bare metal, which boosts performance for AI workloads such as LLMs and RAG. This session will explore using Kubespray in bare-metal settings, addressing challenges and sharing best practices.

The first part of the talk will show Kubespray's key features and provide practical tips. The latter half will focus on swiftly deploying AI using Retrieval-Augmented Generation (RAG), demonstrating how Kubespray facilitates setting up Kubernetes clusters on bare metal. This setup enhances AI applications by integrating continuous knowledge updates and domain-specific information via RAG, improving the accuracy and credibility of the AI systems.

The session will conclude with discussions on community engagement and future advancements, followed by a Q&A period to address participant queries.

How to deploy an AI-optimized k8s cluster with Kubespray

Kubespray is one of the most popular projects in the SIG-Cluster-Lifecycle community of Kubernetes, often used in a bare-metal environment. As AI workloads are rapidly increasing, bare metal can provide superior performance. Therefore, this session will share features and best practices of using Kubespray to build an AI-optimized cluster.
In the first half of the session, we will demo and discuss the main features of Kubespray and share useful tips and best practices.
In the second half of the session, we will highlight enhanced features and share best practices to support AI workloads. This will include insights on GPU support, scheduler enhancement, batch job queuing, RDMA network, DRA driver, GPU monitoring, and more.
Lastly, we aim to delve deeper into community engagement and open a discussion about progressing the project further. We will then allocate a substantial amount of time for questions.

nerdctl: Docker-compatible CLI for containerd

During this session, participants will learn how nerdctl’s compatibility compares with Docker and Podman, along with features that Docker has not yet implemented. These include:
* Lazy-pulling with Stargz/Nydus/OverlayBD
* Peer-to-peer image distribution with IPFS
* Image encryption with OCIcrypt
* Image signing with Cosign
* Slirp-less rootless containers with bypass4netns
* Interactive Dockerfile debugging with buildg

Furthermore, the session will cover nerdctl’s features, related projects (such as Lima, AWS Finch, Colima, Rancher Desktop, Kind ...), and the envisioned roadmap for its future development. Lastly, we will discuss community engagement and how to contribute to the project.

SIG Cluster Lifecycle: What's new in Kubespray

Kubespray is one of the most versatile Kubernetes cluster managers, and it benefits from an extremely active worldwide community, especially in Asia.

In the first half of the session, we will demo and discuss the most recent features, such as HA with kube-vip, a script to manage offline files for air-gapped environments, fast image mirroring, new OS support (Rocky, Kylin, OpenEuler Linux...), multi-arch clusters, support for Ansible collections, cluster hardening, and working with the operator and GitOps. We'll also share useful tips and best practices from Kubespray.

In the second half, we would like to take a deeper look at giving voice to the community and open a discussion about how to keep moving the project forward, leaving plenty of time for questions.

KubeCon + CloudNativeCon China 2025 Sessionize Event

June 2025 Hong Kong

KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 Sessionize Event

August 2024 Hong Kong

Maintainer Track + ContribFest: KubeCon + CloudNativeCon Europe 2024 Sessionize Event

March 2024 Paris, France

KubeCon + CloudNativeCon + Open Source Summit China 2023 Sessionize Event

September 2023 Shanghai, China
