Speaker

Rama Krishna Raju Samantapudi

Sr. Staff AI/ML Architect at ServiceNow

Austin, Texas, United States

Rama Samantapudi is a Sr. Staff AI/ML Architect at ServiceNow, specializing in Search, Ranking, Recommendations, Conversational AI, Generative AI, and Agentic AI. With over 13 years of experience across Walmart, Zillow, State Street, and FactSet, Rama has led large-scale AI initiatives that bridge applied research and production systems. His work focuses on building intelligent search, ranking, reasoning and structured extraction models that enhance user experience, automation and decision-making at scale.

Area of Expertise

  • Information & Communications Technology
  • Media & Information
  • Physical & Life Sciences
  • Region & Country

Topics

  • Machine Learning & AI
  • Natural Language Processing (NLP)
  • Agentic AI
  • Conversational AI
  • Generative AI
  • Document AI
  • ElasticSearch
  • Vector Databases & Semantic Search
  • Personalization & Recommendations
  • AI search
  • Agentic AI architecture
  • Agentic RAG
  • GraphRAG
  • Graph Data Science
  • knowledge graph
  • graph learning
  • Retrieval-Augmented Generation (RAG)
  • Large Language Models (LLMs)
  • AI Agents & Multi-Agent Systems
  • AI Agentic Workflows
  • AI & Agentic Systems
  • Generative & Agentic AI
  • Copilot Agents
  • Multi-Agent Systems
  • AI Agents
  • Agentic Systems
  • Agentic AI Orchestrator
  • Integrating LLMs into Developer Workflows: From Copilot to Agentic AI
  • Multi-Modal & Agentic AI
  • LLM Observability
  • Local LLMs
  • LLM Inference at Scale
  • Agentic AI / Autonomous Agents
  • Vibe Coding vs. Engineering: A Spec-First Approach to Agentic Tooling
  • Agentic Frameworks
  • Designing Production-Ready Agentic AI Systems
  • Applied Machine Learning
  • AI & Machine Learning
  • Machine Learning and AI
  • Machine Learning Engineering
  • Machine Learning and Artificial Intelligence
  • Machine Learning/Artificial Intelligence
  • Machine Learning
  • Graph RAG
  • Graph Neural Networks

"Smarter, Cheaper AI Agents: Semantic Caching in Production"

AI agents are expensive to scale. A single agentic workflow can involve dozens of LLM calls, and popular reasoning models make every token costly. The classical solution, caching, breaks down for natural language: no two users phrase the same question identically, so exact-match keys almost never hit.

Semantic caching solves this by matching on meaning (queries embedded as vectors and compared by similarity) instead of characters. But getting this right in production requires the right similarity threshold, the right eviction strategy, the right accuracy techniques, and the right query routing.

This talk walks through the full engineering stack: how semantic caches work, how to measure them rigorously, four composable techniques to improve accuracy, how to embed caching inside agentic workflows at the sub-question level, and how Walmart's waLLMartCache achieved ~90% accuracy in production across a multi-tenant, globally scaled deployment.
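
As a rough illustration of the core idea in this abstract (matching on meaning with a threshold and an eviction strategy), here is a minimal Python sketch. It is not the talk's code or waLLMartCache's implementation. The names `toy_embed`, `cosine`, and `SemanticCache` are hypothetical, the bag-of-words "embedding" is a stand-in for a real sentence-embedding model, and the 0.8 threshold and LRU eviction are assumptions chosen for the example.

```python
import math
from collections import Counter, OrderedDict


def toy_embed(text):
    """Stand-in embedding (assumption): bag-of-words token counts.
    A production cache would call a sentence-embedding model instead."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache: returns a stored response when a new query is
    similar enough to a cached one; evicts the least recently used entry."""

    def __init__(self, threshold=0.8, max_entries=1000):
        self.threshold = threshold      # minimum similarity to count as a hit
        self.max_entries = max_entries  # capacity before LRU eviction
        self._entries = OrderedDict()   # query text -> (vector, response)

    def get(self, query):
        vec = toy_embed(query)
        best_key, best_score = None, 0.0
        # Linear scan for clarity; a real system would use a vector index.
        for key, (stored_vec, _) in self._entries.items():
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= self.threshold:
            self._entries.move_to_end(best_key)  # mark as recently used
            return self._entries[best_key][1]
        return None  # miss: caller would invoke the LLM, then call put()

    def put(self, query, response):
        self._entries[query] = (toy_embed(query), response)
        self._entries.move_to_end(query)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

In this sketch, a caller would try `cache.get(question)` first and fall back to the model only on `None`, storing the new answer with `cache.put(question, answer)`. Raising the threshold trades hit rate for fewer wrong answers served from cache, which is the tuning problem the abstract refers to.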
