Speaker

Chen Wang

IBM, Senior Research Scientist

Chappaqua, New York, United States

Chen Wang is a Senior Research Scientist at the IBM T.J. Watson Research Center. Her interests include Kubernetes, container cloud resource management, cloud-native AI and LLM systems, and applying AI to cloud system management. She is an open-source advocate, a Kubernetes and CNCF contributor, and a KubeCon speaker. She holds an M.S. and a Ph.D. in Electrical & Computer Engineering from Carnegie Mellon University (CMU).

Area of Expertise

  • Information & Communications Technology

Topics

  • AI
  • LLMs
  • Kubernetes
  • Sustainability
  • Model Serving

Trimaran: Load-Aware Scheduling for Power Efficiency and Performance Stability

If some nodes in your cluster are stubbornly congested while others sit idle, if node utilization is spiky, or if some pods can burst freely while others cannot, you may need a Trimaran scheduler. In this talk, we give an overview of the Trimaran scheduler plugins and demonstrate their utility. Trimaran plugins are load-aware schedulers that place pods on nodes based on actual measured node resource utilization, while also considering the resource requests and limits specified for each pod. Using utilization as an objective helps (1) minimize power consumption by targeting an optimal range of utilization, (2) avoid congestion and interference among multiple containers running on the same node, and (3) lower the risk of over-commitment when containers burst their usage up to the specified limits.
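
To make the idea of load-aware placement concrete, here is a minimal sketch in Python. It is not taken from the Trimaran source: the `Node` class, the `score_node` and `pick_node` functions, and the 40% target are all hypothetical, chosen only to illustrate scoring nodes by measured utilization against a target range.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_utilization: float  # measured CPU utilization, in percent (0-100)


def score_node(node: Node, pod_cpu_percent: float, target: float = 40.0) -> float:
    """Hypothetical score: highest when the node lands exactly on the target.

    Assumption: pod_cpu_percent is the pod's expected CPU demand expressed
    as a percentage of the node's capacity.
    """
    predicted = node.cpu_utilization + pod_cpu_percent
    if predicted > 100.0:
        # The pod would push the node past full capacity.
        return 0.0
    if predicted <= target:
        # Below target: reward packing, rising linearly up to 100 at the target.
        return target + (100.0 - target) * predicted / target
    # Above target: penalize, falling linearly to 0 at full utilization.
    return target * (100.0 - predicted) / (100.0 - target)


def pick_node(nodes: list[Node], pod_cpu_percent: float, target: float = 40.0) -> Node:
    """Return the node with the highest score for this pod."""
    return max(nodes, key=lambda n: score_node(n, pod_cpu_percent, target))


nodes = [Node("a", 10.0), Node("b", 30.0), Node("c", 85.0)]
chosen = pick_node(nodes, pod_cpu_percent=10.0)
print(chosen.name)
```

In this sketch, a node that would end up at the target utilization is preferred over both a lightly loaded node and a nearly saturated one, which mirrors the abstract's goal of steering nodes toward an optimal utilization range rather than simply spreading or packing pods.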

Cloud Native Sustainable LLM Inference in Action

Join our tutorial on sustainable Large Language Model (LLM) inference using cloud-native technology. We'll cover LLMs, their energy use, and Kepler's role in monitoring power during LLM workloads. Learn how to balance environmental sustainability and performance by adjusting AI accelerator frequencies in a cloud-native environment, with the aim of making LLM inference more power-efficient and cost-effective.

Experience a live demo of vLLM, an advanced inference framework, in action. See how we tweak AI accelerator settings in a Kubernetes cluster for ideal power-computation balance.

This tutorial is a must-attend for professionals keen on integrating environmental sustainability with cloud-native technology solutions. Whether you're a developer, an IT specialist, or a sustainability advocate, you'll gain valuable insights into the future of eco-friendly cloud computing. Join us to be at the forefront of this significant technological evolution.

KubeCon + CloudNativeCon Europe 2024 Sessionize Event

March 2024 Paris, France

CNCF-hosted Co-located Events Europe 2024 Sessionize Event

March 2024 Paris, France
