Climatik: Cloud Native Sustainable LLM via Power Capping

As GenAI workloads grow, demand for advanced accelerators with ever-higher power consumption is surging: NVIDIA GPU peak power has risen from 300 W for the V100 to 1000 W for the B100. However, existing power delivery and cooling infrastructure was not designed for such rapid increases, leading to challenges like limited accelerator deployment in some regions and overheating risks that could cause fire hazards. We propose Climatik, a dynamic power capping system that enables data center and cluster administrators and developers to set power caps dynamically at the cluster, service, namespace, and rack levels. Climatik leverages Kepler for observability and offers APIs for integration with Kubernetes control knobs, including autoscalers, schedulers, and queuing systems, to ensure power caps are maintained across all levels. We will demonstrate how to use Climatik to configure power capping for a large language model (LLM) inference service on KServe and show how the power cap influences KEDA autoscaling.

Chen Wang

IBM, Senior Research Scientist

Chappaqua, New York, United States
