Anshul Jindal
Sr. Solution Architect
Anshul is a Sr. Solution Architect on NVIDIA's DGX Cloud team, where he specializes in helping customers deploy their workloads at scale. He has a strong background in SRE and extensive experience managing production-grade Kubernetes clusters across various Cloud Service Providers (CSPs). He received his Ph.D. in computer science from TU Munich, graduating summa cum laude.
Links
NVIDIA Expert Session: Your AI, Everywhere: Unifying Infrastructure for Unbounded Innovation
Visit us at booth A-16 on July 10-11 to get answers to your most pressing questions from NVIDIA technology experts and claim a free self-paced DLI course.
Your Next AI Needs 10,000 GPUs. Now What?
As Large Language Models become foundational to modern applications, the complexity of securing and managing the right GPU infrastructure is a critical bottleneck. This session demystifies GPU consumption at scale, from single-GPU setups to vast, multi-node clusters. We will then introduce NVIDIA DGX Cloud Lepton, a revolutionary AI platform and compute marketplace. Discover how Lepton connects developers to a global network of cloud partners, unlocking access to tens of thousands of GPUs to build the next generation of AI.
Key Takeaways:
- GPU and multi-node system architectures for AI.
- Navigating abstraction layers: Bare Metal, VMs, and Kubernetes (see the sketch after this list).
- Unifying access with the DGX Cloud Lepton compute marketplace.
- Achieving location and vendor-agnostic AI deployment.
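To make the Kubernetes abstraction layer concrete, here is a minimal, illustrative sketch (not session material) that requests a single NVIDIA GPU through the official `kubernetes` Python client. The image tag, namespace, and pod name are placeholder assumptions; `nvidia.com/gpu` is the extended resource advertised by the NVIDIA device plugin.

```python
# Minimal sketch: requesting an NVIDIA GPU at the Kubernetes abstraction
# layer, via the official `kubernetes` Python client. Image, namespace,
# and pod name below are illustrative placeholders.
from kubernetes import client, config

def make_gpu_pod(name: str, image: str, gpus: int) -> client.V1Pod:
    """Build a Pod spec that asks the scheduler for `gpus` NVIDIA GPUs."""
    container = client.V1Container(
        name=name,
        image=image,
        # The NVIDIA device plugin exposes GPUs as the extended resource
        # `nvidia.com/gpu`; the scheduler places the Pod only on nodes
        # with that many GPUs unallocated.
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": str(gpus)}
        ),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )

if __name__ == "__main__":
    config.load_kube_config()  # reads ~/.kube/config
    pod = make_gpu_pod("gpu-smoke-test",
                       "nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # placeholder image
                       gpus=1)
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The same request expressed at the VM layer would be an instance-type choice, and at bare metal a physical topology choice; that contrast is the point of the abstraction-layers takeaway above.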
Accelerating AI Inference at Scale: A Deep Dive Into NVIDIA Dynamo on Kubernetes
As foundation models move toward deeper test-time computation, inference becomes the dominant scaling constraint. Latency, throughput, and cost are governed by a small set of forces: autoregressive decoding, KV-cache growth, memory bandwidth, and scheduling under contention. This workshop frames large-scale inference through these emerging laws of inference, starting from first principles and building toward real systems. Participants deploy NVIDIA Dynamo on Kubernetes to operate aggregated and disaggregated inference architectures using built-in KV-aware routing and scheduling, then run both architectures on a 4xA100 node and compare their performance. The outcome is a principled understanding of where inference time and money go, and how architectural choices bend those curves in production.
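As a back-of-the-envelope illustration of the KV-cache force named above (not workshop code), the sketch below computes cache growth for an assumed 80-layer model with grouped-query attention; all dimensions are illustrative, not tied to any specific deployment.

```python
# Back-of-the-envelope sketch of KV-cache growth, one of the scaling
# constraints the workshop examines. Model dimensions are illustrative
# (roughly a 70B-class model with grouped-query attention).

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one sequence needs: K and V tensors (factor 2),
    per layer, per KV head, per head dimension, per token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

# Example: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes/element).
per_token = kv_cache_bytes(1, layers=80, kv_heads=8, head_dim=128)
print(f"per token:     {per_token / 1024:.0f} KiB")                       # ~320 KiB
print(f"8K context:    {kv_cache_bytes(8192, 80, 8, 128) / 2**30:.2f} GiB")  # ~2.5 GiB
print(f"64 concurrent: {64 * kv_cache_bytes(8192, 80, 8, 128) / 2**30:.0f} GiB")
```

At fp16, a single 8K-token sequence under these assumptions already holds about 2.5 GiB of cache, so a few dozen concurrent requests exhaust even an 80 GB GPU; this is the pressure that KV-aware routing and disaggregated serving are designed to relieve.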
LLMOps-driven fine-tuning, evaluation, and inference with NVIDIA NIM & NeMo Microservices
As the adoption of LLMs continues to grow, the complexity of fine-tuning and deploying these models has become a significant bottleneck. Manual processes and fragmented workflows can lead to errors, inconsistencies, and delays that hinder innovation and progress. This hands-on workshop introduces LLMOps, an approach to automating the entire LLM fine-tuning, evaluation, and inference lifecycle using a GitOps-based methodology.
Participants will learn how to build an end-to-end automated pipeline leveraging NVIDIA NIM and NeMo Microservices for fine-tuning, evaluation, and deployment of LLMs. Through practical demonstrations, we will explore how to ensure seamless integration, validation, and deployment of updates, leading to faster development cycles, improved accuracy, and increased reliability.
Key Topics:
- Kubernetes-based LLM Pipelines
- Argo CD for Continuous Delivery
- Argo Workflows for LLM Workflow Automation (see the sketch after this list)
- Cloud-Agnostic Deployment
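To ground the GitOps-based automation, the sketch below (illustrative, not workshop code) submits a three-stage fine-tune, evaluate, deploy Workflow to Argo through the Kubernetes `CustomObjectsApi`. The stage containers are placeholder `echo` commands standing in for the actual NVIDIA NIM and NeMo Microservices calls the workshop covers.

```python
# Minimal sketch of the automation layer: submitting an Argo Workflow
# that chains hypothetical fine-tune -> evaluate -> deploy stages.
# Container images and commands are placeholders, not the real
# NVIDIA NIM / NeMo Microservices interfaces.
from kubernetes import client, config

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "llmops-pipeline-"},
    "spec": {
        "entrypoint": "pipeline",
        "templates": [
            {
                "name": "pipeline",
                # Each inner list is one sequential stage of the pipeline.
                "steps": [
                    [{"name": "finetune", "template": "run-stage",
                      "arguments": {"parameters": [{"name": "stage", "value": "finetune"}]}}],
                    [{"name": "evaluate", "template": "run-stage",
                      "arguments": {"parameters": [{"name": "stage", "value": "evaluate"}]}}],
                    [{"name": "deploy", "template": "run-stage",
                      "arguments": {"parameters": [{"name": "stage", "value": "deploy"}]}}],
                ],
            },
            {
                "name": "run-stage",
                "inputs": {"parameters": [{"name": "stage"}]},
                "container": {
                    "image": "alpine:3.20",  # placeholder image
                    "command": ["echo"],
                    "args": ["running stage: {{inputs.parameters.stage}}"],
                },
            },
        ],
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1", namespace="argo",
    plural="workflows", body=workflow,
)
```

In the Argo CD flow listed above, a manifest like this would live in Git and be synced to the cluster automatically rather than submitted by hand.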