Anwar Khan

Production AI Engineering — Agentic AI · MCP · Knowledge RAG · LLM Engineering | Speaker · Author · Mentor

Moline, Illinois, United States

Anwar Khan is a Senior Staff Software Engineer specializing in production AI systems, including agentic orchestration, retrieval-augmented generation (RAG), Knowledge RAG / GraphRAG, Model Context Protocol (MCP) server design, LLM evaluation harnesses, and the data-readiness and guardrails work required to make enterprise AI reliable in production. Over 16 years, he has led the productionization of agentic AI systems, secure tool-server architecture, and distributed data platforms operating at petabyte scale.

Today he works at a Fortune 100 industrial technology company, where he designs and operates AI and platform engineering patterns adopted across multiple engineering teams, including production MCP server infrastructure with multi-tenant authorization, LangGraph-based multi-agent orchestration, hybrid retrieval across vector stores and knowledge graphs, and enterprise LLM observability and evaluation frameworks. His earlier infrastructure work delivered a ~400× improvement in streaming throughput while reducing processing latency from 24 hours to under 3 minutes.

Anwar writes and speaks on enterprise AI engineering patterns including MCP, agentic orchestration, evaluation harnesses, LLM guardrails, and AI platform engineering. He is available for keynote, breakout, panel, podcast, technical advisory, judging, and peer-review engagements for AI/ML systems conferences and journals. He holds AWS Solutions Architect Professional and AWS DevOps Engineer Professional certifications, has authored 3 peer-reviewed publications, and received Distinguished Performance recognition in 2025. Stack: Python, TypeScript, Databricks, AWS, Kubernetes, Terraform, LangGraph, LangSmith, Datadog LLM Observability.

Area of Expertise

Information & Communications Technology
Manufacturing & Industrial Materials

Topics

DevOps
RAG
GraphRAG
AWS DevOps
Agentic
Agentic Workflow
Agentic RAG
Databricks
AWS
MLOps
Artificial Intelligence
Machine Learning
Enterprise AI
Cloud Architecture
Kubernetes
Microservices Architecture
Model Context Protocol (MCP)
Agentic AI
Agentic AI Orchestrator
LangGraph
LangChain
LangSmith
LLM Apps at Scale
LLM Observability
Vector Database
Knowledge Graph
Enterprise AI Architecture
Enterprise Agentic Framework
AI Platform
AI Governance
Distributed Systems

GraphRAG vs Vector RAG: When the Knowledge Graph Pays For Itself

Vector search gets you 70% of the way for many enterprise RAG use cases. The other 30% — multi-hop reasoning, entity disambiguation, semantic precision over compliance language — often needs a knowledge graph alongside the vector store.
This talk compares vector-only RAG, GraphRAG (Microsoft), and hybrid retrieval architectures with concrete evaluation results, query-routing patterns, and the cost model that determines when each approach is right.
Takeaways: A side-by-side comparison of vector-only, GraphRAG, and hybrid approaches. A cost-vs-quality model for picking between them. Reference architectures for each.

Preferred length: 30 min.
Audience: AI engineers, data engineers, knowledge engineers.
Level: Intermediate.
First public delivery: 2026.

From Proof of Concept to Production: Operationalizing Agentic AI

The hardest part of agentic AI is not the demo — it's everything between the demo and a system you can trust on Monday morning. This talk maps the engineering work that takes an agent from working in a notebook to surviving production: evaluation, observability, identity and authorization, retry semantics, and human-in-the-loop integration.
Expect concrete patterns, failure stories, and a production-readiness checklist.
Takeaways: A production-readiness checklist for agentic systems. Patterns for evaluation, retry, and human-in-the-loop. Common failure modes to watch for in week one.

Preferred length: 45 min (also 30 min).
Audience: AI engineers, engineering managers, platform architects.
Level: Intermediate to advanced.
First public delivery: 2026.

From Chatbot to Production Agent: MCP, Identity, Evaluation, and Observability

Many teams can build an impressive AI agent demo. The harder part is turning it into a production-ready agent that can safely use tools, reason over messy enterprise data, and earn trust from developers, users, and stakeholders.

This session walks through a practical architecture for moving from prototype to production: preparing AI-ready data before model invocation, exposing tools through MCP or API-based integration layers, securing tool access with identity and authorization, adding retrieval and ranking patterns, and instrumenting agent workflows with logging, traces, evaluation, and feedback loops.

Using public-safe enterprise examples from service, dealer, and quality workflows, we will break down the decisions that make agents useful beyond a chat window. Attendees will leave with a blueprint for building agents that are secure, observable, reusable, and ready for real product teams.

RAG Without Data Readiness Is Hallucination Tax: An AI-Readiness Methodology

Most production RAG failures are not retrieval failures — they are data-readiness failures. This talk presents a pre-LLM data-readiness methodology built around labeling, embedding curation, similarity ranking, and noise reduction, with measurable impact on downstream answer quality.
We walk through the methodology, the failure modes it prevents, and how it integrates with hybrid retrieval over vector stores and knowledge graphs.
Takeaways: A reference methodology for pre-LLM data curation. A failure-mode catalog for production RAG. Integration patterns with existing retrieval stacks.

Preferred length: 30 min.
Audience: AI engineers, data engineers, RAG practitioners.
Level: Intermediate.
First public delivery: 2026.
Format: Conference talk, workshop section, podcast.

LLM Observability for Production: An Implementation View

LLM observability is not the same as application observability. This talk shows how to instrument production RAG and agentic systems using Datadog LLM Observability, Galileo, and LangSmith — what each tool catches, what they miss, and how to build behavioral regression testing that surfaces drift before users do.
Takeaways: A comparison framework for choosing an LLM observability stack. Instrumentation patterns that work across RAG and agentic workloads. Drift-detection signals that matter in production.

Preferred length: 30 min.
Audience: AI engineers, SREs, platform engineers.
Level: Intermediate.
First public delivery: 2026.

Hybrid Retrieval: When Knowledge Graphs Earn Their Cost

Vector search alone gets you 70% of the way for many enterprise RAG use cases. The remaining 30% — multi-hop reasoning, entity disambiguation, semantic precision — often requires a knowledge graph alongside the vector store.
This talk shows when hybrid retrieval is worth the engineering and operational cost, with reference architectures, query-routing patterns, and evaluation criteria for picking between vector-only, graph-only, and hybrid approaches.
Takeaways: A decision framework for retrieval architecture. Query-routing patterns for hybrid systems. Evaluation criteria that surface when the graph pays off.

Preferred length: 30 min.
Audience: AI engineers, data architects, RAG practitioners.
Level: Intermediate.
First public delivery: 2026.

Agentic Orchestration Patterns That Don't Burn Tokens: Multi-Agent Design for Real Workloads

Multi-agent systems demo beautifully and bankrupt teams in production. This talk walks through orchestration patterns that survive enterprise cost and reliability constraints: supervisor-worker decomposition, when to use LangGraph vs. CrewAI vs. AutoGen, hierarchical vs. flat coordination, agent memory boundaries, and the eval discipline that prevents agents from looping.
Concrete trade-offs, failure stories, and a decision framework for picking the right orchestration shape for the problem.
Takeaways: A decision framework for orchestration topology. A cost-aware design checklist for multi-agent systems. Observability patterns that catch loops and drift early.

Preferred length: 45 min (also 30 min).
Audience: AI engineers, engineering managers.
Level: Intermediate to advanced.
First public delivery: 2026.

Harness Engineering for Production LLMs: Eval-as-Code From Day One

An evaluation harness is the difference between an LLM application you can change with confidence and one that becomes a write-only system. This talk shows how to build reusable evaluation harnesses with lm-evaluation-harness, RAGAS, DeepEval, and Promptfoo — eval-as-code in CI, behavioral regression suites, faithfulness scorecards, and the operational practices that keep eval coverage growing as the application grows.
Takeaways: A reusable harness pattern that works across RAG and agentic systems. CI integration patterns for LLM evaluation. Operational practices for keeping evals current as the system evolves.

Preferred length: 30 min.
Audience: AI engineers, QA engineers, ML practitioners.
Level: Intermediate.
First public delivery: 2026.

LLM Guardrails for Enterprise Audit: Patterns Beyond Toy Filters

Enterprise LLM applications need guardrails that survive compliance review. This talk covers production guardrail patterns using NeMo Guardrails, Guardrails AI, and Llama Guard; how to handle PII with Microsoft Presidio; prompt-injection defenses that actually work against indirect attacks; and the policy-as-code patterns that connect guardrails to enterprise governance frameworks (EU AI Act, NIST AI RMF).
Takeaways: A guardrail-stack reference. PII engineering patterns for LLM pipelines. Defenses against indirect prompt injection. A policy-as-code approach to AI governance.

Preferred length: 45 min (also 30 min).
Audience: AI engineers, security engineers, AI governance leads.
Level: Intermediate to advanced.
First public delivery: 2026.

AI FinOps: Making Production LLMs Affordable Without Cutting Quality

Production LLM costs spiral quickly. This talk shares the cost-control patterns that work at enterprise scale: model tiering and routing with LiteLLM/Portkey, semantic caching with GPTCache, prompt compression, batch inference where appropriate, eval-driven model downgrades, and the observability discipline (Datadog LLMObs, Langfuse) that surfaces cost regressions before the bill arrives.
Takeaways: A model-routing decision framework. Semantic-caching patterns that actually save money. A cost-observability checklist for production LLMs.

Preferred length: 30 min.
Audience: AI engineers, engineering managers, AI platform leads.
Level: Intermediate.
First public delivery: 2026.

Lessons From Shipping a Production MCP Server: Authorization, Observability, and the Boundaries

The Model Context Protocol promises a clean way to expose enterprise tools to AI agents. In practice, shipping MCP into production reveals the harder problems: multi-tenant authorization that survives audit, observability that explains tool-call failures, and platform-boundary design that keeps the surface area governable.

This talk walks through the patterns used to put an MCP server into production behind real OAuth, real Redis/DynamoDB-backed authorization, and real Datadog observability, along with the three biggest mistakes made along the way and how other enterprise teams can avoid them.
Attendees will leave with a concrete reference architecture and a checklist for evaluating MCP rollouts in their own organizations.
Takeaways:
(1) An MCP authorization model that fits enterprise audit.
(2) Observability patterns that catch tool-call regressions.
(3) Platform-boundary heuristics for what to expose vs. keep internal.

Preferred length: 45 min (also available as a 30 min breakout).
Audience: AI engineers, engineering managers, platform architects.
Level: Intermediate to advanced.
First public delivery: 2026.
Format: Conference talk, breakout, panel, or podcast.

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.
