Rama Krishna Raju Samantapudi

Sr. Staff AI/ML Architect at ServiceNow

Austin, Texas, United States

Rama Samantapudi is a Sr. Staff AI/ML Architect at ServiceNow, specializing in Search, Ranking, Recommendations, Conversational AI, Generative AI, and Agentic AI. With over 13 years of experience across Walmart, Zillow, State Street, and FactSet, Rama has led large-scale AI initiatives that bridge applied research and production systems. His work focuses on building intelligent search, ranking, reasoning, and structured extraction models that enhance user experience, automation, and decision-making at scale.

Area of Expertise

Information & Communications Technology
Media & Information
Physical & Life Sciences
Region & Country

Topics

Machine Learning & AI
Natural Language Processing (NLP)
Agentic AI
Conversational AI
Generative AI
Document AI
ElasticSearch
Vector Databases & Semantic Search
Personalization & Recommendations
AI search
Agentic AI architecture
Agentic RAG
GraphRAG
Graph Data Science
knowledge graph
graph learning
Retrieval-Augmented Generation (RAG)
Large Language Models (LLMs)
AI Agents & Multi-Agent Systems
AI Agentic Workflows
AI & Agentic Systems
Generative & Agentic AI
Copilot Agents
Multi-Agent Systems
AI Agents
Agentic Systems
Agentic AI Orchestrator
Integrating LLMs into Developer Workflows: From Copilot to Agentic AI
Multi-Modal & Agentic AI
LLM observability
Local LLMs
LLM Inference at Scale
Agentic AI / Autonomous Agents
Vibe Coding vs. Engineering: A Spec-First Approach to Agentic Tooling
Agentic Frameworks
Designing Production-Ready Agentic AI Systems
Applied Machine Learning
AI & Machine Learning
Machine Learning and AI
Machine Learning Engineering
Machine Learning and Artificial Intelligence
Machine Learning/Artificial Intelligence
Machine Learning
Graph RAG
Graph Neural Networks

Knowledge Distillation: How LLMs train SLMs

Gemma was trained by Gemini. Llama 4's smaller models were trained by their two-trillion-parameter sibling. DeepSeek does something in the same family. The technique is called knowledge distillation.

This session traces distillation's near-20-year history — from compressing thousand-model ensembles onto PDAs in 2006, to Hinton's "dark knowledge" and the teacher–student framing, to the temperature knob you use every day. We'll then look at how Google, Meta, and DeepSeek distill their models today, and why the implementation details — proper distillation vs. behavioral cloning, who owns the teacher, sequential vs. co-training — quietly make or break a model.

Attendees will leave understanding what distillation actually transfers between models, why "soft labels" carry more than answers, and how to tell genuine distillation from mere imitation.
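
As a rough illustration of why soft labels carry more than answers, the sketch below computes a temperature-softened teacher distribution and a Hinton-style distillation loss (KL divergence between softened teacher and student outputs, scaled by the squared temperature). It is not taken from the session: the function names, the example logits, and the default temperature are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor follows the common convention of keeping gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical teacher logits for three classes.
teacher_logits = [5.0, 2.0, 0.5]
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)
```

At the higher temperature the runner-up classes receive noticeably more probability mass, which is the extra signal about class similarity that a student can learn from and that a single hard label discards.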

Agentic Governance: Securing Autonomous AI Systems at Enterprise Scale

Building an AI agent demo takes an afternoon. Deploying it safely to production takes months — and that gap is almost entirely a governance problem, not a model problem. This technical deep dive presents a four-pillar framework — Lifecycle Management, Risk Management, Security, and Observability — that gives engineering teams a concrete, implementable path from prototype to production-ready agent system.

The talk covers nine actionable patterns: versioning agents as deployable artifacts with CI/CD and eval gates; securing data access through curated views, column masking, and intentional APIs; assigning dedicated least-privilege service identities; and building governance-grade observability that can answer, for any request, which version ran, which tools were called, and whether policy was followed.

Attendees leave with a production readiness checklist, a prioritized four-step implementation roadmap, and a framework they can apply immediately to any agent they have in production or in development.
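
One way to picture the observability pattern described above is a per-request audit record that captures the agent version, the tools called, and a policy verdict. The sketch below is only an assumption about what such a record could look like; the class names, fields, and the allow-list policy check are hypothetical and not drawn from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    allowed: bool  # whether the tool was on the agent's allow-list


@dataclass
class AuditRecord:
    """Hypothetical per-request record for governance queries."""
    request_id: str
    agent_version: str
    tool_calls: list = field(default_factory=list)

    def record_tool_call(self, tool, allowed_tools):
        # Log every call, including disallowed ones, so violations are visible.
        self.tool_calls.append(ToolCall(tool, tool in allowed_tools))

    @property
    def policy_followed(self):
        return all(call.allowed for call in self.tool_calls)


# Illustrative usage with made-up identifiers.
allowed_tools = {"search_kb", "summarize"}
record = AuditRecord(request_id="req-001", agent_version="agent-1.4.2")
record.record_tool_call("search_kb", allowed_tools)
ok_before = record.policy_followed
record.record_tool_call("delete_user", allowed_tools)
ok_after = record.policy_followed
```

Here `ok_before` reflects a request that stayed within policy, and `ok_after` shows the same record flagging a call to a tool outside the allow-list.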

"Smarter, Cheaper AI Agents: Semantic Caching in Production"

AI agents are expensive to scale. A single agentic workflow can involve dozens of LLM calls, and popular reasoning models make every token costly. The classical solution, caching, breaks down for natural language: users rarely phrase the same question identically.

Semantic caching solves this by matching on meaning (embedded as vectors) instead of characters. But getting this right in production requires the right threshold, the right eviction strategy, the right accuracy techniques, and the right query routing.

This talk walks through the full engineering stack: how semantic caches work, how to measure them rigorously, four composable techniques to improve accuracy, how to embed caching inside agentic workflows at the sub-question level, and how Walmart's waLLMartCache achieved ~90% accuracy in production across a multi-tenant, globally scaled deployment.
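
The core lookup can be sketched in a few lines: embed the query, find the most similar cached query, and return its answer only if the similarity clears a threshold. The sketch below is a toy under stated assumptions, not the waLLMartCache design. `toy_embed` is a bag-of-words stand-in for a real embedding model (so it matches on shared words rather than true meaning), and the threshold of 0.8 and the LRU eviction policy are arbitrary illustrative choices.

```python
import math
import re
from collections import Counter, OrderedDict


def toy_embed(text):
    """Stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8, max_entries=1000):
        self.threshold = threshold
        self.max_entries = max_entries
        self._entries = OrderedDict()  # query -> (embedding, answer)

    def get(self, query):
        """Return the answer of the closest cached query, or None on a miss."""
        q = toy_embed(query)
        best_key, best_score = None, 0.0
        for key, (emb, _) in self._entries.items():
            score = cosine(q, emb)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= self.threshold:
            self._entries.move_to_end(best_key)  # mark as recently used
            return self._entries[best_key][1]
        return None

    def put(self, query, answer):
        self._entries[query] = (toy_embed(query), answer)
        self._entries.move_to_end(query)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used


cache = SemanticCache(threshold=0.8)
cache.put("How do I reset my password?", "Use the reset link on the login page.")
hit = cache.get("How can I reset my password?")
miss = cache.get("What is the refund policy?")
```

The rephrased password question lands above the threshold and is served from the cache, while the unrelated question falls through to the model. A production system would swap in a real embedding model and a vector index instead of the linear scan used here.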
