Session
AI Profiling: A new online fine-grained observability solution for AI workloads on Kubernetes
LLM training and inference are expanding resource demand and footprint in Kubernetes AI clusters, while task failures and performance regressions surge. Beyond fleet-level monitoring, teams need fine-grained, workload-centric observability and operator-level tuning (e.g., of CUDA/Torch ops). A container-native AI profiling capability for Kubernetes, built on dynamic injection and eBPF, provides zero-instrumentation, zero-disruption, online, dynamically switchable profiling with ultra-low overhead, capturing end-to-end call chains and communication paths across the stack. Multi-dimensional telemetry (CPU, CUDA kernels, Torch Profiler, system calls, CPython, and RDMA networking) is correlated to surface bottlenecks and interference. The approach enables targeted diagnosis and remediation of production issues, illustrated with an LLM inference case study. Deployment variants support both runc and Kata Containers for multi-tenant, security-hardened, performance-critical clusters.
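The abstract's core idea of correlating multi-dimensional telemetry can be sketched as aligning event streams by timestamp. The following is a minimal illustrative sketch, not the talk's implementation: it pairs CUDA kernel events with CPU-side events whose time windows overlap, using a two-pointer sweep over streams sorted by start time. All event names, fields, and the `Event` type are hypothetical.

```python
# Hypothetical sketch: correlating two telemetry streams (CPU samples and
# CUDA kernel events) by timestamp overlap to flag likely interference.
# Event names, fields, and stream contents are illustrative only.

from dataclasses import dataclass

@dataclass
class Event:
    name: str
    start_us: int   # event start, microseconds
    end_us: int     # event end, microseconds

def overlapping(a: Event, b: Event) -> bool:
    """True if the two events' time windows intersect."""
    return a.start_us < b.end_us and b.start_us < a.end_us

def correlate(cpu_events, cuda_events):
    """Pair each CUDA kernel with CPU-side events active at the same time.

    Both lists are assumed sorted by start_us; advancing a single cursor
    past expired CPU events keeps the sweep near-linear.
    """
    pairs = []
    i = 0
    for kernel in cuda_events:
        # Drop CPU events that ended before this kernel started.
        while i < len(cpu_events) and cpu_events[i].end_us <= kernel.start_us:
            i += 1
        j = i
        while j < len(cpu_events) and cpu_events[j].start_us < kernel.end_us:
            if overlapping(cpu_events[j], kernel):
                pairs.append((cpu_events[j].name, kernel.name))
            j += 1
    return pairs

cpu = [Event("python:forward", 0, 500), Event("syscall:read", 600, 700)]
cuda = [Event("gemm_kernel", 100, 400), Event("allreduce", 650, 900)]
print(correlate(cpu, cuda))
# [('python:forward', 'gemm_kernel'), ('syscall:read', 'allreduce')]
```

In a real profiler the streams would come from eBPF probes and CUDA/Torch tracing, and the join would account for clock-domain alignment between host and device timestamps; this sketch only shows the correlation step itself.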
Zhixin Huo
Alibaba Cloud Intelligence, Senior Software Engineer
Beijing, China