Shiming Zhang
KWOK Co-Founder & Maintainer, Kubernetes SIG-Node Reviewer
Shanghai, China
Shiming Zhang is a Kubernetes contributor focused mainly on scalability, performance, reliability, and testing. He has gained experience with and contributed to many Kubernetes features and most of its components.
Deep Dive: KWOK
KWOK makes controller testing as easy and simple as you could hope for. Let's look back at some of the features and discuss what's next on the roadmap.
Enhancing Reliability and Fault-Tolerance Testing in Kubernetes using KWOK
Kubernetes has emerged as a popular platform for running AI workloads with GPUs. As a result, enhancing reliability has become increasingly important. This talk will demonstrate how the popular Kubernetes testing toolkit KWOK has been enhanced for reliability and fault-tolerance testing.
Shiming Zhang, the creator and maintainer of KWOK, and Yuan Chen from NVIDIA will outline KWOK's capabilities to simulate and manage a large number of virtual nodes and pods on a laptop, and discuss practical use cases at DaoCloud and NVIDIA.
The talk will provide examples and demos, offering a deep dive into KWOK’s latest chaos engineering features, including its ability to simulate failures by injecting targeted faults into GPU nodes and pods. These features facilitate reliability testing and the evaluation of fault-tolerance mechanisms, helping to improve the resilience of AI workloads in Kubernetes.
Attendees will gain practical experience and knowledge about KWOK and its advanced capabilities.