Real-Time AI on Kubernetes: Streaming to Inference Architecture

This session explores architectural principles for building real-time AI systems on Kubernetes that process streaming data and deliver predictions at scale. We'll examine how to unify streaming infrastructure with ML pipelines while addressing the challenges of latency, reliability, and resource efficiency.
Key topics include:

Kubernetes CRDs for declarative streaming management
ML pipeline orchestration patterns with real-time data flows
Resilience strategies: circuit breakers, fallback models, graceful degradation
GPU scheduling and cost optimization models
Multi-tenancy and isolation designs

We'll analyze architectural trade-offs using distributed systems principles like backpressure and flow control. The session provides decision frameworks for technology selection and capacity planning, focusing on timeless patterns rather than specific tools.
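As a rough illustration of the resilience strategies listed above, a circuit breaker with a fallback model might look like the following minimal Python sketch. It is not taken from the session: the `CircuitBreaker` class, its `failure_threshold` and `reset_after_s` parameters, and the `primary` and `fallback` callables are hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Hypothetical sketch: route requests to a fallback model after
    repeated failures of the primary model, then retry after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after_s = reset_after_s          # cool-down before a trial call
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """True while the primary model should be skipped."""
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let one trial call through (half-open).
        return self.clock() - self.opened_at < self.reset_after_s

    def call(self, primary: Callable[[Any], Any],
             fallback: Callable[[Any], Any], features: Any) -> Any:
        if self.is_open():
            # Graceful degradation: answer from the cheaper fallback model.
            return fallback(features)
        try:
            result = primary(features)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback(features)
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In this sketch the caller always receives a prediction: while the circuit is open the primary model is not invoked at all, which keeps a failing inference service from adding latency to every request.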

Sonika Arora

Lead Member of Technical Staff @ Salesforce

San Francisco, California, United States
