Stop Allocating GPUs, Start Delivering Intelligence: An Enterprise Blueprint for AI ROI on Kubernetes

For any enterprise, the high cost and chronic underutilization of GPUs together pose the single greatest threat to AI ROI. With one-third of cloud GPUs operating at less than 15% capacity, the key is to stop managing hardware and start delivering value.

This session presents a blueprint for transforming siloed GPU infrastructure into a centralized, high-yield "AI platform." We'll show why Kubernetes is not just a technical orchestrator but the core economic engine that maximizes the return on your most expensive assets. Learn how to:

Create a single, fungible GPU fabric to be shared across teams, boosting utilization.
Use intelligent autoscaling to match infrastructure spend to real-time demand.
Leverage vLLM and llm-d, a new observability framework for distributed inference on Kubernetes, to manage performance SLOs (e.g., TTFT, TPS), giving you fine-grained control over token economics through tiered service offerings.

This blueprint is for platform engineers, MLOps leaders, and IT decision-makers aiming to justify AI spend and build an efficient foundation for innovation.

Vincent Caldeira

Leading Open Source Technology Innovation for a Sustainable Future

Singapore
