The Hidden Cost of AI on Kubernetes (And How to Fix It Before Finance Calls You)
Everyone is deploying AI.
Almost no one is measuring what it truly costs.
AI workloads behave fundamentally differently from traditional microservices — they are GPU-hungry, memory-intensive, bursty, and often poorly autoscaled. On Kubernetes, that translates into idle GPUs, oversized nodes, inefficient bin-packing, runaway inference scaling, and hidden infrastructure waste.
And the scary part?
Most teams don’t even see it.
In this deep-dive session, we’ll explore what actually happens when AI workloads hit production Kubernetes clusters — especially in cloud environments like Azure.
We’ll break down:
• Why traditional autoscaling strategies fail for inference workloads
• GPU scheduling, bin-packing, and resource fragmentation problems
• Cost traps in model serving architectures
• Observability patterns to detect waste before it becomes a budget crisis
• Designing cost-aware AI platforms on Kubernetes
• Practical architecture patterns that balance performance, scalability, and cost
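To make the cost-visibility point concrete, here is a minimal back-of-the-envelope sketch of what idle GPU capacity costs per month. All numbers, names, and rates are illustrative assumptions, not figures from the session.

```python
# Hypothetical illustration of the cost-visibility problem: estimate the
# monthly spend attributable to idle GPU time from average utilization.
# The rates and counts below are made-up examples, not real pricing.

HOURS_PER_MONTH = 730  # average hours in a month


def idle_gpu_cost(gpu_count: int, hourly_rate_usd: float, avg_utilization: float) -> float:
    """Monthly cost of the idle fraction of a GPU fleet.

    avg_utilization is a fraction in [0, 1], e.g. 0.35 for 35% busy.
    """
    idle_fraction = 1.0 - avg_utilization
    return gpu_count * hourly_rate_usd * HOURS_PER_MONTH * idle_fraction


# Example: 8 GPUs at a hypothetical $3/hour, busy only 35% of the time.
waste = idle_gpu_cost(gpu_count=8, hourly_rate_usd=3.0, avg_utilization=0.35)
print(f"${waste:,.0f}/month spent on idle GPUs")  # → $11,388/month
```

Even this crude arithmetic shows why low GPU utilization quietly dominates AI infrastructure bills long before anyone looks at an invoice.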
This session is not about hype. It’s about infrastructure reality.
If you are a cloud engineer, architect, or platform builder deploying AI systems — this talk will give you the mental models and architectural patterns to run AI workloads responsibly at scale.
Because scaling AI without cost visibility is not innovation.
It’s liability.
Snigdha Kashyap
Contributing to #Tech as SDE-2 @ExpediaGroup
Gurugram, India