XingYan Jiang
DaoCloud, Software Engineer, Cloud Native Enthusiast
Chauncey Jiang is a software engineer at DaoCloud with a passion for cloud-native technologies and expertise in Kubernetes. He specializes in multi-cloud and multi-cluster environments and contributes actively to the open source community as a Karmada approver, Istio member, Cilium member, and OpenELB approver. Chauncey frequently attends meetups to exchange knowledge on cloud-native technologies with other enthusiasts.
A Deep Dive into Cilium Gateway API: The Future of Ingress Traffic Routing
In the cloud-native era, traffic routing and secure access for microservices architectures have moved beyond the traditional Kubernetes Ingress API. Cloud-native solutions now provide more flexible, scalable, and secure ways to manage traffic both inside and outside the cluster.
For example, service mesh technologies like Istio and Linkerd provide rich traffic management features, including dynamic routing, circuit breaking, retries, and timeouts. They also offer built-in service-to-service authentication and encrypted communication, significantly improving overall system security.
Additionally, modern Gateway API implementations like Cilium integrate seamlessly with Kubernetes, providing fine-grained routing rules, load balancing, monitoring, and other functionality. They can serve as the unified entry point for the cluster, simplifying the management of external access.
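To make this concrete, here is a minimal sketch of what Cilium-backed Gateway API routing can look like, assuming Cilium is installed with its Gateway API support enabled. The resource names and the backend Service are hypothetical; the GatewayClass name "cilium" follows Cilium's documented default, but verify it against your installation.

  # Minimal sketch: a Gateway handled by Cilium, plus an HTTPRoute that
  # sends /api traffic to a (hypothetical) backend Service.
  apiVersion: gateway.networking.k8s.io/v1
  kind: Gateway
  metadata:
    name: demo-gateway
  spec:
    gatewayClassName: cilium       # Cilium's default GatewayClass
    listeners:
    - name: http
      protocol: HTTP
      port: 80
  ---
  apiVersion: gateway.networking.k8s.io/v1
  kind: HTTPRoute
  metadata:
    name: demo-route
  spec:
    parentRefs:
    - name: demo-gateway           # attach this route to the Gateway above
    rules:
    - matches:
      - path:
          type: PathPrefix
          value: /api
      backendRefs:
      - name: demo-service         # hypothetical backend Service
        port: 8080

Unlike Ingress, the Gateway/HTTPRoute split separates infrastructure ownership from application-level routing, which is a large part of why the Gateway API is pitched as the future of ingress traffic routing.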
Unlocking Cloud-Agnostic Power with Karmada: Seamless Multi-Cluster Kubernetes Orchestration
In the rapidly evolving landscape of cloud-native technologies, achieving true portability across diverse cloud providers remains a critical challenge. Organizations seek solutions that let them seamlessly deploy and manage applications across different clouds without being tied to a specific vendor. Enter Karmada (Kubernetes Armada), a powerful tool designed to enable cloud-agnostic deployments for Kubernetes workloads. In this session, we will explore the principles, benefits, and practical implementation strategies behind Karmada's cloud-agnostic approach.
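As a hedged taste of that approach (a sketch, not the session's full method): Karmada lets you keep an unmodified Deployment and declare its placement separately in a PropagationPolicy, so the same manifest can be propagated to member clusters on different clouds. The Deployment and cluster names below are hypothetical.

  # Minimal sketch: propagate one Deployment to two member clusters that
  # may run on different cloud providers (names are hypothetical).
  apiVersion: policy.karmada.io/v1alpha1
  kind: PropagationPolicy
  metadata:
    name: nginx-propagation
  spec:
    resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx                 # plain Deployment in the Karmada control plane
    placement:
      clusterAffinity:
        clusterNames:
        - member-aws              # hypothetical cluster on one cloud
        - member-gcp              # hypothetical cluster on another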
Heterogeneous AI Computing Virtualization Middleware
Heterogeneous AI Computing Virtualization Middleware (HAMi), formerly known as k8s-vGPU-scheduler, is an "all-in-one" chart designed to manage Heterogeneous AI Computing Devices in a k8s cluster. It includes everything you would expect, such as:
Device sharing: Each task can allocate a portion of a device instead of the entire device, allowing a device to be shared among multiple tasks.
Device Memory Control: Devices can be allocated a specific device memory size (e.g., 3000M) or a percentage of the whole GPU's memory (e.g., 50%), ensuring the task does not exceed the specified boundaries.
Device Type Specification: You can specify the type of device to use or avoid for a particular task by setting annotations such as "nvidia.com/use-gputype" or "nvidia.com/nouse-gputype" (see the pod sketch after this list).
Easy to use: You don't need to modify your task YAML to use our scheduler. All your jobs will be automatically supported after installation. Additionally, you can specify a resource name other than "nvidia.com/gpu" if you prefer.
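As a hedged illustration of the points above, the pod below requests a slice of a shared GPU and pins the schedulable device types. The resource and annotation keys follow HAMi's documented defaults ("nvidia.com/gpu", "nvidia.com/gpumem", "nvidia.com/use-gputype"), but treat the exact names and values as assumptions to verify against your chart's configuration.

  # Sketch: a task that takes one shared vGPU slice, caps device memory
  # at 3000M, and restricts scheduling to certain GPU types.
  apiVersion: v1
  kind: Pod
  metadata:
    name: gpu-demo
    annotations:
      nvidia.com/use-gputype: "A100,V100"   # only schedule onto these GPU types
  spec:
    containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1        # one shared slice, not the whole device
          nvidia.com/gpumem: 3000  # cap device memory at 3000M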
Break through cluster boundaries to autoscale workloads across them on a large scale
Nowadays, multi-cluster workload deployment and management have become increasingly common. Users typically rely on HPA to scale workloads in response to changing demand. However, autoscaling today is limited to a single cluster, even in multi-cluster setups. Breaking through cluster boundaries unlocks compelling scenarios, such as scaling across clusters to extend resources elastically, or scaling up workloads in a local IDC before bursting to the public cloud to save costs.
To bring the benefits of autoscaling across clusters to users, we designed and implemented two types of multi-cluster HPA: centralized and distributed. Each works differently and suits different scenarios.
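For a concrete feel of the centralized flavor, here is a hedged sketch using Karmada's FederatedHPA API, which we assume here corresponds to the centralized type described above; the target workload and field values are illustrative.

  # Sketch: one HPA in the Karmada control plane scales a Deployment whose
  # replicas may be spread across member clusters (values illustrative).
  apiVersion: autoscaling.karmada.io/v1alpha1
  kind: FederatedHPA
  metadata:
    name: nginx-fhpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: nginx
    minReplicas: 1
    maxReplicas: 10                # total replicas across member clusters
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80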
In this session, Wei and XingYan will go over:
1. The challenges, benefits, and scenarios of autoscaling across clusters.
2. How we implement them in Karmada to solve the challenges.
3. How to select the appropriate type for different scenarios, with example demonstrations.