Speaker

Wei Shao

Senior Software Engineer, ByteDance

Wei Shao is a tech lead on the Orchestration & Scheduling team at ByteDance and a maintainer of Katalyst. Wei has 5+ years of experience in the cloud-native field, focusing on resource management in K8s. Wei led the development of Katalyst and, as an end user, the large-scale rollout of colocation across ByteDance Group's internal businesses. Wei has given many talks at tech conferences such as QCon and has been recognized as a star lecturer.

Kelemetry: Global Control Plane Tracing for Kubernetes

Debugging Kubernetes system issues is complicated: different controllers manipulate objects independently, sometimes triggering changes in other controllers. Unlike in traditional RPC-based services, the relationships between components are not explicit; identifying which component causes an issue can be like finding a needle in a haystack. Components expose their own fragmented data, which is often limited to the lifecycle of a single request and fails to illustrate the bigger picture of asynchronous causal events.
This talk introduces Kelemetry, a global tracing system for the Kubernetes control plane that combines scattered data sources: audit logs, events, informers, and component traces. Through several demonstrations of troubleshooting online problems, we will see how Kelemetry reveals the state transitions of related objects over a long timespan and reconstructs the causal hierarchy of events to provide intuitive insight into the What, When and Why of everything going on in a Kubernetes system.

KubeGateway: A layer 7 Load Balancer Gateway for Kubernetes API Server

The API Server is the central component of the Kubernetes control plane, typically fronted by a Layer 4 load balancer (e.g., nginx) for high availability. However, because most clients connect to the API Server over HTTP/2, which multiplexes requests on a single long-lived connection, a Layer 4 load balancer routes all requests from a single client to the same API Server endpoint. This leads to unbalanced load distribution, an issue that becomes more pronounced in large-scale Kubernetes deployments.
KubeGateway is a Layer 7 API Gateway designed specifically for the Kubernetes API Server. It provides request-level load balancing, supports proxying to multiple Kubernetes clusters, enables flexible request routing strategies, and offers advanced traffic management capabilities, including global rate limiting, degradation, and circuit breaking. KubeGateway has been deployed in over 500 production Kubernetes clusters at ByteDance, handling more than 1 million QPS of proxied requests. The largest single Kubernetes cluster it supports has scaled up to 20,000 nodes.

Building a Fine-Grained and Intelligent Resource Management System on Kubernetes

The resource management capabilities of vanilla K8s are limited: 1. The static resource model leads to low resource utilization due to the tidal nature of online services. 2. GPUs can only be requested in whole units, which causes huge GPU waste in AI inference scenarios. 3. The native micro-topology allocation strategy cannot meet the performance requirements of workloads such as search, recommendation, and AI training.
In this talk, Wei and He will introduce a resource management system, Katalyst, and its application at ByteDance: 1. Colocate online services and offline jobs to improve resource utilization while ensuring their SLOs. 2. Implement GPU-share scheduling, which allows requests at a granularity of 1% of computing power and 1 MiB of GPU memory, to improve GPU utilization in AI inference scenarios. 3. Implement topology-aware scheduling and customize a strategy for GPU-RDMA affinity at the root complex level, so that GPUDirect RDMA can be used to boost training speed in AI training scenarios.
