Session

Optimizing AI/ML Workloads on Kubernetes: Cutting Costs Without Compromising Scale

Unleash the power of distributed AI/ML training on Kubernetes without breaking the bank. Discover open-source tools such as the Kubernetes Cluster Autoscaler, Dask, and Volcano that enable intelligent scheduling, autoscaling, and resource optimization across your cluster. Explore real-world case studies on maximizing GPU utilization, right-sizing resources, and getting the most out of spot instances.

Cut your compute costs by up to 60% while maintaining peak performance. Learn how to evaluate and adopt cost-effective distributed training frameworks such as Horovod, TensorFlow's distributed training (tf.distribute), and PyTorch Lightning in Kubernetes environments. Leave with actionable strategies for making your AI/ML pipelines both scalable and cost-efficient on any cloud platform.

Sat Agrawal

Senior Principal Software Engineer @ Discover Financial Services

Jacksonville, Florida, United States
