Rudraksh Karpe
AI Engineer @ ZS
Bengaluru, India
Rudraksh is an AI Engineer at ZS Associates, where he builds enterprise-grade Generative AI solutions with a strong emphasis on privacy, security, and scalability. He is an active open-source contributor, having participated twice in Google Summer of Code with the openSUSE Project, focusing on AI/ML workloads, containerization, and edge–cloud orchestration.
He has presented at leading international conferences, including the openSUSE Conference 2025 in Nuremberg, the Early Adopter Tech Summit in Florida, PyCon US 2025, PyCon Japan 2025, and the openSUSE Asia Summit 2024 in Tokyo. His talks often explore the intersection of GenAI, open-source innovation, and cloud-native technologies, reflecting his commitment to advancing the global developer community.
Topics
Kubernetes as the Universal GPU Control Plane for AI Workloads
AI workloads are driving huge demand for GPUs and AI accelerators, yet the default Kubernetes model still leans on vendor-specific device plugins, which tie workloads to particular hardware and complicate portability across heterogeneous clusters. In this session, members from the Kubernetes and KAITO projects will present a more unified alternative: coupling HAMi’s device virtualization and unified scheduling abstraction with KAITO’s AI workload automation, transforming Kubernetes into a cross-vendor GPU control plane. Together, they enable cross-vendor accelerator management, reducing lock-in and improving workload portability.
We’ll walk through demos that show how HAMi abstracts device details (splitting, isolation, topology-aware scheduling), while KAITO automates workload lifecycles (model deployment, node provisioning, scaling). Attendees will leave with a practical blueprint for running AI workloads on heterogeneous infrastructure on Kubernetes.
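To make the abstraction concrete, the sketch below requests a virtualized GPU slice from Python using the official Kubernetes client. It is a minimal illustration, not session material: the extended resource names (nvidia.com/gpumem, nvidia.com/gpucores) follow HAMi's documented conventions but should be verified against your cluster, and the container image is a placeholder.

```python
# Minimal sketch: requesting a virtualized GPU slice on a HAMi-enabled
# cluster via the official Kubernetes Python client. Resource names and
# values are assumptions drawn from HAMi's docs; adjust for your cluster.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="sliced-gpu-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="my-inference-image:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={
                        "nvidia.com/gpu": "1",        # one virtual GPU
                        "nvidia.com/gpumem": "4096",  # 4 GiB of device memory
                        "nvidia.com/gpucores": "30",  # ~30% of compute cores
                    }
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```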
GPU-Agnostic AI Inference with Ray on Kubernetes
In production AI systems today, 60–70% of inference workloads are tightly coupled to a single GPU vendor or instance type, leading to 30–50% higher infrastructure costs, poor portability, and operational friction when scaling across cloud and on-prem environments. As demand grows, teams face a choice: lock in deeper or redesign for flexibility.
This session presents a GPU-agnostic inference architecture built with Ray on Kubernetes, designed to run reliably across heterogeneous accelerator clusters. By decoupling application logic from hardware assumptions and leveraging Ray’s distributed execution with Kubernetes-native scheduling, teams can scale inference without rewriting pipelines for each GPU type.
Using a production-grade reference architecture, we’ll show how inference traffic flows through Ray Serve, how workloads scale across mixed CPU/GPU nodes, and how concurrency, fault tolerance, and autoscaling are handled under real-world load. We’ll also demonstrate how KubeRay reduces operational overhead by managing Ray clusters through Kubernetes-native lifecycle controls.
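As a taste of the pattern, here is a minimal Ray Serve sketch: the deployment declares only abstract resource needs (a GPU fraction and autoscaling bounds) and leaves placement to Ray and Kubernetes. The model and the resource numbers are illustrative stand-ins.

```python
# Minimal Ray Serve sketch: the deployment requests a GPU fraction rather
# than a vendor-specific device, so replicas can land on any accelerator
# node that satisfies the request.
from ray import serve

@serve.deployment(
    ray_actor_options={"num_gpus": 0.5},  # fraction of any GPU
    autoscaling_config={"min_replicas": 1, "max_replicas": 8},
)
class InferenceModel:
    def __init__(self):
        # Load the model here; keep hardware probing out of app logic.
        self.model = lambda prompt: f"echo: {prompt}"  # stand-in model

    async def __call__(self, request):
        payload = await request.json()
        return {"output": self.model(payload["prompt"])}

app = InferenceModel.bind()
# serve.run(app)  # deploys onto the running Ray cluster (e.g. via KubeRay)
```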
Persistent AI Memory with OpenSearch: Building Context-Aware Agents that Learn Over Time
AI agents often lose context between interactions, limiting personalization, continuity, and long-term learning. This session introduces persistent agentic memory with OpenSearch, enabling context-aware agents that remember, learn, and improve over time.
We explain how OpenSearch agentic memory provides a durable, searchable memory layer where agents can store and retrieve session history, working context, long-term knowledge, and audit logs. You’ll learn how configurable memory processing extracts facts, preferences, and summaries from interactions, turning raw conversations into lasting intelligence. We also cover namespace design for isolating memory by user, agent, or session to support secure, scalable systems.
We’ll also demonstrate how to integrate agentic memory with existing AI frameworks using standard REST APIs, allowing seamless connection to LangChain, LangGraph, or custom agent pipelines without vendor lock-in. Finally, we’ll show how memory retrieval directly influences agent decisions and improves personalization over time.
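As an illustration of that REST-driven integration, the sketch below stores and retrieves conversation turns through OpenSearch ML Commons memory endpoints. The exact paths, request fields, and credentials here are assumptions that vary by OpenSearch version; verify them against your cluster's API reference.

```python
# Illustrative sketch of the REST integration pattern. Endpoints and fields
# are assumptions based on OpenSearch ML Commons conversational memory and
# may differ from the agentic memory API in your OpenSearch version.
import requests

BASE = "https://localhost:9200"  # assumed local OpenSearch endpoint
AUTH = ("admin", "admin")        # placeholder credentials

# Create a memory container scoped to one user/agent pair (namespace design).
resp = requests.post(
    f"{BASE}/_plugins/_ml/memory",
    json={"name": "user-42-support-agent"},
    auth=AUTH, verify=False,
)
memory_id = resp.json()["memory_id"]

# Persist one interaction turn so later sessions can retrieve it.
requests.post(
    f"{BASE}/_plugins/_ml/memory/{memory_id}/messages",
    json={
        "input": "My order hasn't arrived.",
        "response": "I've flagged your order for investigation.",
    },
    auth=AUTH, verify=False,
)

# Retrieve stored context before the agent's next decision.
history = requests.get(
    f"{BASE}/_plugins/_ml/memory/{memory_id}/messages",
    auth=AUTH, verify=False,
).json()
print(history)
```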
Lightning-Fast Knowledge Graphs in Python: Real-Time Multi-Hop Reasoning with NVIDIA cuGraph
Imagine querying complex knowledge graphs in real time—right from Python—with all the performance of a GPU supercomputer and none of the usual code headaches. This session reveals how NVIDIA cuGraph turbocharges single-hop, multi-hop, and traversal operations on giant knowledge graphs, cutting response times from seconds to milliseconds. We’ll break down how cuGraph’s GPU-accelerated algorithms work seamlessly with popular Python tools and how you can combine cuGraph with deep learning frameworks like PyTorch for ultra-scalable AI and retrieval-augmented generation (RAG) pipelines. Join us for practical demos, hands-on advice, and approachable insights—whether you’re building enterprise reasoning engines, interactive agents, or next-gen graph-powered search. Unlock the full speed of your data, from Python, with just a few lines of code!
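For a taste of the API, the sketch below builds a small directed graph from a cuDF edge list and runs a depth-limited BFS as a two-hop traversal. The edge data is a synthetic stand-in for a real knowledge graph loaded from storage.

```python
# Hedged sketch: GPU-accelerated multi-hop traversal with cuGraph. Build a
# graph from a cuDF edge list, then collect the 2-hop neighborhood of an
# entity with a depth-limited BFS computed on the GPU.
import cudf
import cugraph

edges = cudf.DataFrame({
    "src": [0, 0, 1, 2, 2, 3],
    "dst": [1, 2, 3, 3, 4, 5],
})

G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source="src", destination="dst")

# Multi-hop reasoning as a bounded traversal: every vertex reachable from
# entity 0 within two hops. Unreachable vertices carry a sentinel distance,
# so we filter to the hops we actually want.
frontier = cugraph.bfs(G, start=0, depth_limit=2)
two_hop = frontier[frontier["distance"] <= 2]
print(two_hop[["vertex", "distance", "predecessor"]])
```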