Speaker

Elizabeth Fuentes Leone

Developer Advocate

San Francisco, California, United States

As a Developer Advocate, Elizabeth helps developers build production-ready AI applications. With a background spanning data analytics, machine learning, and developer education, she specializes in making complex AI concepts accessible through hands-on tutorials, open-source projects, and live demos.

She creates practical resources for RAG systems, agentic workflows, and multimodal applications—focusing on code that developers can deploy immediately. As a conference speaker and workshop instructor, she bridges the gap between cutting-edge AI research and real-world implementation.

In my role as a Data Analytics and Machine Learning/Artificial Intelligence (ML/AI) specialist, my mission is to simplify complex concepts, translating them into language accessible to everyone. I am dedicated to creating innovative solutions that effectively tackle real-world challenges. Through speaking at conferences and creating educational resources, I aim to share my knowledge and experience to empower developers, helping them expand their skills and achieve their professional goals.

Area of Expertise

  • Information & Communications Technology
  • Media & Information

Topics

  • Machine Learning and Artificial Intelligence
  • Machine Learning & AI
  • AWS
  • AWS Data
  • Data Science
  • Big Data
  • All things data
  • IoT
  • Generative AI
  • LLMs
  • LLM Apps
  • RAG

Why AI Agents Forget Everything

Your AI agent helps a user pick a premium option. In the next interaction, it suggests the cheapest alternative. It forgot everything: preferences, history, instructions. Every conversation starts from zero. Users repeat themselves, get frustrated, and abandon the product.

Three types of memory loss affect every AI agent:

1. Memory decay: the agent forgets preferences between turns and sessions. Tools return results but never store what they learn from user actions.
2. No structured profile: even with state, the agent has no mechanism to build, update, or query a structured user profile over time. It can't answer "what do I usually prefer?"
3. Memory overload: when memory grows to dozens of sections, dumping everything into context wastes tokens and degrades response quality, loading 8 memory sections when only 1 is relevant.

I will cover persistent state with agent.state and FileSessionManager so preferences survive across sessions, the Core Memory Pattern (MIRIX/MemGPT) that gives the agent tools to manage its own memory (read, write, update, list), semantic retrieval over memory that retrieves only relevant sections per query (60-98% fewer tokens), and how these three progressive patterns build on each other and apply to any agent domain.

You'll walk away with:
  • Persistent state with cross-session survival in any agent framework
  • Core Memory tools (read/write/update/list) so agents manage their own profiles
  • Semantic search over memory sections using embeddings
  • A decision framework for which memory pattern fits which use case
  • Open-source code adaptable to e-commerce, support, education, healthcare, or any domain

Most memory talks focus on RAG or vector databases for external knowledge. This talk focuses on agent self-memory: how agents remember users across conversations. The patterns are domain-agnostic and apply to any AI agent that needs personalization.

Outline: • Why Agents Forget • Fix 1: Persistent State • Fix 2: Core Memory Pattern • Fix 3: Semantic Memory Retrieval • Decision Framework + Resources

Your Agent Works on Localhost. Now Ship It

Your anti-hallucination demos are impressive. GraphRAG returns precise answers. Semantic tool selection cuts errors. Guardrails block invalid operations. Multi-agent validation catches fabricated data. But they all run on your laptop with hardcoded API keys and in-memory data. Production is a different world.

The gap is real. Production needs secure credential storage, scalable databases, semantic tool routing without custom FAISS indexes, business rules that change without redeployment, and observability to know when things go wrong. Most teams spend months bridging this gap. Some never do.

I will walk through how 5 anti-hallucination techniques translate from prototype to production, including semantic tool routing via MCP (replacing custom FAISS indexes), database-backed steering rules that let you change agent behavior without redeploying, STEER messages that let agents self-correct instead of hard-failing on rule violations, GraphRAG with a managed graph database using auto-built knowledge graphs from 300 documents, and a live demo covering 8 test scenarios including hallucination attempts, rule violations, and edge cases.

You'll walk away with:
  • Complete production architecture deployable as infrastructure-as-code
  • Database-backed steering rules pattern (change rules in seconds, no redeploy)
  • STEER message pattern for agent self-correction instead of hard failure
  • MCP semantic routing replacing custom vector indexes
  • Open-source code with serverless infrastructure, database tables, and graph database integration

Most production AI talks focus on infrastructure and scaling. This one specifically focuses on keeping anti-hallucination guarantees when you move from prototype to production. You will see real steering rules, real STEER messages, and real test scenarios proving the agent does not hallucinate in production either.
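
The STEER idea can be sketched as a pre-execution check that, on a rule violation, returns a corrective message for the agent to act on rather than a hard failure. This is a minimal illustration, not the talk's actual implementation: the names (check_rules, SteerResult) are hypothetical, and production rules would be loaded from a database table rather than hardcoded.

```python
# Illustrative sketch: instead of hard-failing on a rule violation, the
# check returns a structured STEER message the agent can self-correct from.
# All names here are hypothetical, not a real framework API.
from dataclasses import dataclass

@dataclass
class SteerResult:
    allowed: bool
    steer_message: str = ""  # guidance fed back to the LLM on violation

# In production these rules would come from a database; hardcoded for the sketch.
RULES = {"book_room": {"param": "guests", "max": 10}}

def check_rules(tool_name: str, params: dict) -> SteerResult:
    rule = RULES.get(tool_name)
    if rule and params.get(rule["param"], 0) > rule["max"]:
        return SteerResult(
            allowed=False,
            steer_message=(
                f"STEER: {tool_name} allows at most {rule['max']} "
                f"{rule['param']}. Ask the user to split the request."
            ),
        )
    return SteerResult(allowed=True)
```

Because the rules live in data rather than code, changing a threshold is a database write, not a redeploy.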

Outline: • The Prototype-to-Production Gap • Semantic Tool Routing via MCP Gateway • Steering Rules in DynamoDB • GraphRAG in Production • Full Production Test • Resources + Q&A

Building Perfect Memory for AI Agents

Your AI agent handles complex workflows with multiple tools. But ask it "What do I usually prefer?" and you get a blank stare. Tell it "Remember, I only want pet-friendly options" and it acknowledges, then forgets by the next session. It has no structured way to build a profile of who the user is.

The problem: even agents with persistent state store preferences implicitly, buried in conversation history or tool results. There's no mechanism for the agent to actively manage what it knows. It can't decide "this preference is important, I should save it" or "this user's diet changed, I should update the food section." Research shows this is solvable: MIRIX (2025) achieves a state-of-the-art 85.4% on the LOCOMO benchmark with structured memory types.

I will cover the Core Memory Pattern (4 tools the agent uses to manage its own memory: read, write, update, list), memory sections inspired by MIRIX's 6 types (persona, preferences, history, instructions), autonomous memory management where the agent decides WHEN to store and WHAT to remember, memory evolution as preferences change over time, and cross-session persistence where memory survives restarts with FileSessionManager.

You'll walk away with:
  • Core memory tool implementations (read, write, update, list) for any agent framework
  • Memory section design (persona, preferences, history, instructions) for your domain
  • Cross-session persistence so user profiles survive across sessions
  • MIRIX and MemGPT research concepts applied to practical, production code
  • Open-source code adaptable to any agent domain (support, e-commerce, education, healthcare)

Most agent talks give the LLM a system prompt with user context. Core Memory flips this: the AGENT decides what to remember, like a human managing a personal notebook. The pattern is domain-agnostic and works in any agent framework with tool support.
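
The four Core Memory tools can be sketched as methods over named sections backed by a JSON file. This is a hedged illustration of the pattern, assuming a hypothetical CoreMemory class and file path; it is not the MIRIX or MemGPT implementation, only the read/write/update/list shape the talk describes.

```python
# Minimal sketch of the Core Memory Pattern: four tools the agent can call
# to manage its own profile of the user. The class and file layout are
# illustrative, not a framework API.
import json
from pathlib import Path

class CoreMemory:
    def __init__(self, path: str = "core_memory.json"):
        self.path = Path(path)
        self.sections: dict = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def _persist(self) -> None:
        # Writing to disk is what lets memory survive restarts.
        self.path.write_text(json.dumps(self.sections))

    def memory_write(self, section: str, content: str) -> str:
        self.sections[section] = content
        self._persist()
        return f"stored '{section}'"

    def memory_read(self, section: str) -> str:
        return self.sections.get(section, f"no section named '{section}'")

    def memory_update(self, section: str, content: str) -> str:
        if section not in self.sections:
            return f"no section named '{section}'"
        self.sections[section] = content
        self._persist()
        return f"updated '{section}'"

    def memory_list(self) -> list:
        return sorted(self.sections)
```

Exposed as tools, these let the agent itself decide when a preference is worth saving or updating, rather than relying on the system prompt.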

Outline: • The Missing Memory • Core Memory Architecture • Scenario 1-2: Empty to Populated • Scenario 3-4: Evolution and Persistence • Production Patterns • Resources + Q&A

Persistent Memory for AI Agents: The Simplest Fix

AI agents treat every interaction as a new user. A user selects a premium option, and the very next turn the agent suggests the cheapest alternative. It forgot everything. This happens because agent tools return results but never store what they learn. Each tool call is isolated. There is no mechanism to capture preferences from user actions.

The root cause: without explicit state management, agents have no short-term or long-term memory. Research on cognitive memory in LLMs (Shan et al., 2025) identifies three memory layers (sensory, short-term, and long-term), but most agent implementations only have sensory memory: the current context window. The result is measurable: MemoryOS research shows a +49% F1 improvement when agents gain hierarchical memory.

I will cover why stateless tools cause preference amnesia (tools return results but never store insights), how agent.state creates within-session memory accessible to all tools, how FileSessionManager persists state across sessions so users return days later with preferences intact, the pattern for tools that learn from actions by capturing implicit preferences at execution time, and research validation from MemoryOS (+49% F1, +46% BLEU-1) and Cognitive Memory in LLMs.

You'll walk away with:
  • agent.state implementation for within-session memory in any agent framework
  • Cross-session persistence with FileSessionManager or equivalent
  • Tool design that captures implicit preferences from user actions
  • The three-layer memory model (sensory, short-term, long-term) applied to your agents
  • Open-source code you can adapt to any domain

Most memory talks focus on RAG or external knowledge bases. This one focuses on the simplest, most impactful memory fix: making agents remember what users DO, not what they SAY. The technique applies to any agent domain (e-commerce, support, education, healthcare), not the demo scenario alone.
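
The "tools that learn from actions" pattern can be sketched as a tool that, alongside returning its result, records the implicit preference in shared state. AgentState here is a plain dict standing in for a framework's agent.state object; the tool names are hypothetical.

```python
# Sketch of a tool that captures implicit preferences at execution time.
# AgentState is an illustrative stand-in for a framework's agent.state.
class AgentState(dict):
    """Within-session key-value store shared by all tools (illustrative)."""

def select_option(option: dict, state: AgentState) -> dict:
    # Return the result as usual...
    result = {"status": "selected", "option": option["name"]}
    # ...but also store what the action reveals about the user.
    state["preferred_tier"] = option["tier"]
    state.setdefault("selection_history", []).append(option["name"])
    return result

def recommend(state: AgentState) -> str:
    # Later tools read the captured preference instead of starting from zero.
    tier = state.get("preferred_tier", "any")
    return f"recommending {tier}-tier options first"
```

Persisting that same state to disk between sessions (the FileSessionManager role in the talk) is what turns within-session memory into long-term memory.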

Outline: • The Memory Decay Problem • Fix: agent.state for Within-Session Memory • Fix: FileSessionManager for Cross-Session • How Memory Works in AI Agents • Production Patterns + Resources

Efficient Agent Memory Retrieval with Semantic Search

Your AI agent has accumulated 8 memory sections about a user: persona, travel preferences, food allergies, work schedule, 6 past trips with ratings, loyalty programs, communication style, and emergency contacts. The user asks: "What food do I like and what should I avoid?" The naive approach dumps all 8 sections into context. The agent receives emergency contacts (irrelevant), work schedule (irrelevant), loyalty program miles (irrelevant), and somewhere in that noise, the food preferences it actually needs. Tokens wasted. Response quality degraded.

And the problem scales: a user with 8 sections today has 20 sections next month. Dumping everything becomes impossible as memory grows. Context windows fill up, older memories get pushed out, and the agent starts hallucinating from information overload.

I will cover why dump-all memory retrieval wastes 60-98% of tokens on irrelevant data, how keyword search improves on dump-all but misses synonyms and related concepts, how semantic search uses embedding similarity to find conceptually related memories (top-3 per query), multi-turn retrieval where different queries load different memory sections automatically, and research validation from Zep (94.8% DMR, 90% less latency), PersonaAgent (+56.1% F1), and HippoRAG 2.

You'll walk away with:
  • Working semantic search over core memory using SentenceTransformers
  • Comparison of dump-all vs keyword vs semantic retrieval with real token metrics
  • A multi-turn pattern where each query retrieves different memory sections
  • Understanding of when each retrieval strategy makes sense
  • Open-source code with an 8-section user profile and 4 retrieval scenarios

Most RAG talks focus on searching external documents. This one applies the same semantic search techniques to agent self-memory: searching what the agent knows about YOU. You will see real token counts, real precision differences, and the exact moment where semantic beats keyword.
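
The top-k retrieval idea can be sketched end to end. The talk uses SentenceTransformers embeddings; in this dependency-free illustration a bag-of-words vector and cosine similarity stand in for real embeddings, so the section names and texts are toy data, not the talk's dataset.

```python
# Toy sketch of top-k retrieval over memory sections. A real version would
# embed with SentenceTransformers; word-count vectors stand in here so the
# example runs without dependencies.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real sentence embedding.
    return Counter(text.lower().replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, sections: dict, k: int = 3) -> list:
    q = embed(query)
    ranked = sorted(sections, key=lambda s: cosine(q, embed(sections[s])),
                    reverse=True)
    return ranked[:k]  # load only these into context, not every section

SECTIONS = {
    "food": "likes italian food and avoids peanuts",
    "work": "works remotely with meetings on mondays",
    "trips": "visited lisbon and kyoto and rated both highly",
    "emergency": "emergency contact is her sister ana",
}
```

Each query loads only its top matches, which is where the 60-98% token savings claimed in the talk come from as the section count grows.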

Outline: • The Memory Overload Problem • Scenario 1: Dump All • Scenario 2: Keyword Search • Scenario 3: Semantic Search Top-3 • Scenario 4: Multi-Turn Retrieval • Decision Framework + Resources

The 424 Error Eating Your MCP Agents

Your AI agent just called an MCP tool that talks to an external API. Fifteen seconds pass. Thirty seconds. Then a cryptic 424 Failed Dependency error kills the entire workflow. Your user sees nothing useful. Your logs show nothing helpful. This is the MCP timeout problem, and it silently kills agent-to-tool integrations.

The problem runs deeper than you expect. MCP tools are black boxes to the calling agent, with no visibility into what is happening on the other side. An estimated 43% of MCP tool failures in production trace back to timeout issues. A single slow external API call cascades into full agent workflow failure. Default timeout configurations are rarely appropriate for real-world API latencies. And 300-second hangs consume resources while providing zero user feedback.

The async handleId pattern provides a clean solution. start_long_job initiates the operation and returns a job ID immediately (sub-second response). check_job_status allows the agent to poll for results at controlled intervals. The agent remains responsive and can perform other work while waiting. Failed or stalled jobs are detected and handled gracefully with clear error states. Four simulation scenarios cover the full spectrum: fast (2s), slow (15s), unresponsive (300s), and failing APIs.

You'll walk away with:
  • A working FastMCP server implementing the async handleId pattern with job tracking
  • MCP debugging techniques for identifying timeout root causes in production
  • Production-ready job tracking with status management and cleanup strategies
  • MCP client integration patterns for connecting agents to async MCP tools

This talk builds a complete MCP server live on stage, simulating four real-world API behavior scenarios. Instead of theoretical architecture diagrams, you will see actual 424 errors occur and then watch the async pattern eliminate them.
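
The two-function shape of the pattern can be sketched with plain threads. This is an illustration only: the function names follow the talk, but a FastMCP server would expose them as tools, and the sleep stands in for a slow external API.

```python
# Sketch of the async handleId pattern: start_long_job returns a job ID
# immediately; check_job_status polls until the background work completes.
import threading
import uuid
import time

JOBS: dict = {}

def start_long_job(seconds: float) -> str:
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running", "result": None}

    def work():
        time.sleep(seconds)  # stands in for a slow external API call
        JOBS[job_id] = {"status": "done", "result": f"finished in {seconds}s"}

    threading.Thread(target=work, daemon=True).start()
    return job_id  # sub-second response: no 424 while the API grinds on

def check_job_status(job_id: str) -> dict:
    return JOBS.get(job_id, {"status": "unknown", "result": None})
```

Because the tool call returns before the slow work starts, the MCP client's timeout never fires; the agent polls at its own pace and can do other work in between.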

Outline: • The 424 Problem • MCP Server Architecture • Async HandleId Pattern • Production Integration • Advanced Patterns and Wrap-Up

Why AI Agents Loop and How to Stop Them

Your AI travel booking agent just charged your customer's credit card fourteen times for the same flight. The agent called the booking tool, got back a vague response, was not sure if it worked, and tried again. And again. And again. Fourteen tool calls where two would have been enough. Twenty-one seconds of wasted time. Hundreds of thousands of burned tokens. This is the repeated tool call problem, and it is far more common than you think.

The root cause is surprisingly simple. Ambiguous tool feedback leaves agents uncertain whether an action succeeded or failed. Without clear completion signals, agents default to retrying, which is a rational response to uncertainty. Studies show an average of 3.2x overcalling when tools return unstructured or unclear responses. The problem compounds in multi-step workflows where each step's ambiguity cascades forward.

Three complementary solutions address the problem at different layers:

1. DebounceHook maintains a sliding window of recent tool calls, detecting and blocking duplicate invocations before they execute, with configurable window size and similarity thresholds.
2. Clear SUCCESS/FAILED states redesign tool responses with explicit status indicators that tell the agent unambiguously whether to proceed or retry, using structured response formats with action guidance.
3. LimitToolCounts provides hard ceiling enforcement via the HookProvider, capping the maximum number of calls to any specific tool and acting as a safety net when other approaches miss edge cases.
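
The first of those layers can be sketched as a class that remembers recent (tool, arguments) pairs and blocks repeats inside a time window. The class name follows the talk; the implementation details (monotonic timestamps, exact-argument matching instead of similarity thresholds) are illustrative simplifications.

```python
# Sketch of the DebounceHook idea: keep a sliding window of recent tool
# calls and block an invocation that duplicates one still in the window.
import time
from collections import deque

class DebounceHook:
    def __init__(self, window_seconds: float = 5.0):
        self.window = window_seconds
        self.recent: deque = deque()  # (timestamp, tool, args_key)

    def before_tool_call(self, tool: str, args: dict) -> bool:
        """Return True to allow the call, False to block a duplicate."""
        now = time.monotonic()
        key = repr(sorted(args.items()))
        # Drop entries that have aged out of the window.
        while self.recent and now - self.recent[0][0] > self.window:
            self.recent.popleft()
        if any(t == tool and k == key for _, t, k in self.recent):
            return False  # same tool, same args, within the window: block
        self.recent.append((now, tool, key))
        return True
```

In the fourteen-charges story above, the second identical book_flight call would be blocked here before it ever reached the payment API.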

Outline: • The Token Waste Problem • DebounceHook - Detect and Block Duplicates • Clear SUCCESS/FAILED States - Prevention by Design • LimitToolCounts - Hard Ceiling Enforcement • Production Patterns and Wrap-Up

Multimodal AI Agents with Long-Term Memory

Your chatbot forgets who you are after every conversation. Your video agent cannot remember what it analyzed yesterday. A user asks your multimodal agent to "compare this video with the one I shared last week," and the agent has no idea what video they mean. You have built something intelligent that has no memory, and without memory, intelligence is just computation.

The memory problem in multimodal agents is harder than it looks. Text-only agents can summarize conversations into compact strings. But multimodal agents process video frames, image features, audio patterns, and text simultaneously. What should be remembered? The raw video? A description of it? The embeddings? The user's reaction? And how do you retrieve the right memory when a new conversation references something from three sessions ago using natural language? Traditional session stores and database-backed chat histories were not designed for this.

In this talk, I will show you:
  • How to build multimodal agents using an open-source agent SDK for creating production-ready agent systems (similar patterns apply to LangGraph, AutoGen, or other frameworks)
  • How to create custom tools for video content analysis that extract structured information from video, images, and audio
  • How to convert agent tools into MCP servers so they can be shared across agents, teams, and projects without code duplication
  • How to implement scalable chat memory with a managed vector store that stores multimodal conversation context and retrieves relevant memories using semantic search
  • A live demo: building a complete multi-agent system where agents share tools via MCP, remember past conversations, and deliver personalized multimodal responses

You will walk away with:
  • A working multi-agent architecture using an open-source agent SDK with custom tools and MCP server integration
  • Patterns for creating reusable MCP servers from agent tools, so you build once and use across your entire agent fleet

Outline: • The Agent That Forgets • Building Multimodal Agents with Strands • Converting Tools to MCP Servers • Scalable Chat Memory with S3 Vectors • The Complete System and Resources

Research Agents That Don't Invent Sources

Your research agent works great in Jupyter. Then it leaks API keys in a stack trace, forgets what it researched two messages ago, and returns three citations, two of which link to pages that do not exist. Your manager asks why the "AI research assistant" is making up sources, and you realize the demo that impressed everyone last month has become a liability.

Research agents have a unique set of production challenges that generic agent deployment guides do not address. They interact with multiple external APIs (search engines, academic databases, news services), each requiring different authentication methods. They conduct iterative research where the answer to one query shapes the next, demanding conversation context that most agent frameworks discard between turns. And they must provide source attribution that is not just plausible but verifiable, because a hallucinated citation in a research report destroys trust permanently.

In this talk, I will show you:
  • How API credentials leak through error messages, logs, and agent responses, and how API gateways prevent this by isolating credentials from agent code entirely
  • How identity management services provide per-user, per-session credentials that expire automatically, eliminating the hardcoded API keys buried in environment variables
  • How to maintain conversation context across a multi-turn research session so the agent builds on its own findings rather than starting over each turn
  • How to implement source verification that checks every citation before including it, confirming the URL exists, the content matches the claim, and the source is accessible
  • A live demo: building a research agent that conducts iterative web research, maintains full conversation context, and returns structured responses where every source is verified

You will walk away with:
  • A security architecture for research agents that handles credentials through API gateways and identity management, not environment variables
  • Patterns for
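
The source-verification step can be sketched as a check applied to every citation before it enters the report. To keep the example offline, the fetcher is injected; in production it would be an HTTP GET. All names here are illustrative, not the talk's actual code.

```python
# Sketch of citation verification: a source is included only if its URL
# resolves and the page actually mentions the claim. The fetch function is
# injected so the example runs without network access.
from typing import Callable, Optional

def verify_citation(url: str, claim_keywords: list,
                    fetch: Callable[[str], Optional[str]]) -> bool:
    page = fetch(url)
    if page is None:
        return False  # URL does not exist or is unreachable
    text = page.lower()
    # Require the cited page to actually support the claim.
    return all(kw.lower() in text for kw in claim_keywords)

def filter_sources(sources: list, fetch) -> list:
    # Drop every citation that fails verification before the user sees it.
    return [s for s in sources
            if verify_citation(s["url"], s["keywords"], fetch)]
```

A hallucinated citation fails at the first check (the fetch returns nothing), and a real page cited for the wrong claim fails at the second.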

Outline: • The Research Agent That Became a Liability • Securing Credentials with API Gateways • Persistent Conversation Context • Source Verification That Actually Works • The Complete Research Agent and Resources

Ship It: From Agent Demo to Production in Minutes

Your agent demo wowed the team. The VP nodded approvingly. "Ship it," they said. Six months later, you are still rewriting it for production. The agent forgets users between sessions. You have no idea why it failed at 3 AM. It cannot handle more than 10 concurrent requests. And last month's bill was four times the estimate. You are not building features anymore; you are rebuilding infrastructure.

The prototype-to-production gap is the graveyard of AI projects. It is not that the agent does not work. It works beautifully in a notebook. The problem is everything around it: memory that persists across sessions so users feel recognized, monitoring that tells you what happened without instrumenting every function, infrastructure that scales to 1,000 users without manual intervention, and cost controls that prevent a single runaway conversation from blowing your budget. These are not hard problems individually, but together they take months when they should take minutes.

In this talk, I will show you:
  • How to add cross-session memory to any agent using a managed vector store, so your agent remembers users, their preferences, and past interactions without managing a database
  • How to implement zero-code monitoring that captures every agent decision, tool call, and token count without modifying your agent logic
  • How to deploy auto-scaling infrastructure that handles traffic spikes gracefully and scales to zero when idle
  • How to apply cost optimization patterns that set per-conversation budgets, cache repeated tool calls, and alert before bills surprise you
  • A live demo: taking a prototype agent from a notebook to a production endpoint with all four capabilities in under 15 minutes

You will walk away with:
  • A production deployment checklist covering memory, monitoring, scaling, and cost, applicable to any agent framework
  • Working infrastructure-as-code templates you can deploy to your own cloud account
  • A cost estimation model that predicts monthly spend based on
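
The per-conversation budget idea can be sketched as a small tracker consulted before each turn. The class name and the flat per-token rate are illustrative placeholders, not real model pricing or the talk's implementation.

```python
# Sketch of a per-conversation cost control: track token spend and refuse
# further turns once a budget is exhausted. Rate and names are placeholders.
class ConversationBudget:
    def __init__(self, max_usd: float, usd_per_1k_tokens: float = 0.01):
        self.max_usd = max_usd
        self.rate = usd_per_1k_tokens
        self.spent_usd = 0.0

    def record(self, tokens: int) -> None:
        # Called after each model response with the tokens it consumed.
        self.spent_usd += tokens / 1000 * self.rate

    def allow_turn(self) -> bool:
        # Checked before each new turn; a runaway conversation stops here.
        return self.spent_usd < self.max_usd
```

In practice the same check can also trigger an alert at, say, 80% of budget, so the surprise bill never happens.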

Outline: • The Six-Month Gap • Cross-Session Memory with S3 Vectors • Zero-Code Monitoring and Observability • Auto-Scaling and Cost Optimization • The Complete Picture and Resources

Reduce AI Agent Costs with Semantic Tool Selection

Your AI agent has 29 tools. On every single call, all 29 tool descriptions get serialized into the context window, whether the user asks about weather or hotel bookings. That is thousands of tokens wasted per query, and the LLM still picks the wrong tool 15% of the time.

The dual problem: as agents scale beyond 10-15 tools, two things break simultaneously. First, the LLM struggles to select the correct tool from a crowded context, leading to tool hallucination, where it calls tools that do not exist or picks the wrong one. Second, every tool description consumes tokens on every call, inflating costs linearly with tool count.

I will cover why tool descriptions are the hidden cost driver in agent architectures, how semantic tool selection uses FAISS + SentenceTransformers to filter tools, three implementation approaches (basic filtering, threshold-based, and hybrid), dynamic tool swapping while preserving conversation memory, and a live comparison of all-tools vs semantic selection on the same queries.

You'll walk away with:
  • Working semantic tool selection implementation with FAISS
  • Tool registry pattern with embeddings and metadata
  • Memory preservation across dynamic tool swaps
  • Token cost calculation and comparison methodology
  • Open-source code for a 29-tool travel agent system

Most agent optimization talks focus on prompt engineering or model selection. This one addresses the overlooked architectural problem of tool management at scale. You will see exact token counts, error rates, and cost comparisons: not theoretical improvements but measured results from a working system.
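
The filtering step can be sketched with a toy registry. The talk's version ranks with FAISS + SentenceTransformers embeddings; here word-overlap scoring stands in so the sketch runs without dependencies, and the tool names are hypothetical.

```python
# Toy sketch of semantic tool selection: only the top-k matching tool
# descriptions are sent to the model, instead of all of them on every call.
# Word overlap stands in for real embedding similarity (FAISS + encoders).
TOOLS = {
    "get_weather": "current weather forecast temperature for a city",
    "book_hotel": "book a hotel room reservation with dates and guests",
    "search_flights": "search flights between airports by date and price",
    "convert_currency": "convert an amount between currencies at live rates",
}

def score(query: str, description: str) -> int:
    # Crude relevance: shared words between query and tool description.
    return len(set(query.lower().split()) & set(description.split()))

def select_tools(query: str, k: int = 2) -> list:
    ranked = sorted(TOOLS, key=lambda t: score(query, TOOLS[t]), reverse=True)
    return ranked[:k]
```

With 29 tools and k=3, roughly 90% of the description tokens never enter the context, and the model chooses from a short, relevant list instead of a crowded one.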

Outline: • The Dual Problem • Solution Architecture • Live Implementation • Production Pattern • Advanced Patterns

When RAG Hallucinates Numbers: Graph-RAG for Precise Answers

Your RAG agent seems smart, until you ask it to count something. "How many items match criteria X?" Traditional RAG fabricates: "approximately 45-50." The real answer from your data? 133. Vector similarity can't count, aggregate, or reason across relationships.

The fundamental limitation: traditional RAG retrieves text chunks by similarity, then asks the LLM to synthesize answers. This works for simple lookups but fails systematically on four query types: counting ("how many?"), aggregation ("what's the average?"), multi-hop reasoning ("what's available at the highest-rated?"), and out-of-domain detection ("any results in Antarctica?", where RAG fabricates and Graph-RAG correctly says "none").

I will cover why traditional RAG hallucinates on structured queries (the architectural root cause), how Graph-RAG builds knowledge graphs automatically using neo4j-graphrag without manual schema design, the Text2Cypher pattern that converts natural language into precise database queries the LLM cannot fabricate, a side-by-side comparison on identical queries showing RAG fabrication vs Graph-RAG precision, and production implementation patterns with open-source tools.

You'll walk away with:
  • Graph-RAG implementation with Neo4j and auto entity extraction for any document set
  • Text2Cypher query generation to get precise answers from knowledge graphs
  • A concrete decision framework for when to use RAG vs Graph-RAG
  • Hybrid architecture patterns: Graph-RAG for structured queries, RAG for unstructured
  • Open-source code adaptable to any domain with structured data (product catalogs, FAQs, inventories)

Most RAG talks focus on embeddings and retrieval tuning. This one addresses RAG's fundamental limitation: statistical hallucinations on structured data. The solution (knowledge graphs + Cypher) is domain-agnostic and applies wherever your documents contain countable, aggregatable, or relationship-rich data.
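
The Text2Cypher flow can be sketched without a running database: an LLM translates the question into Cypher, and a guard verifies the query is read-only before it touches the graph. Everything here is an assumption for illustration: llm_to_cypher is a canned stand-in for a model call, the keyword check is a simplification, and neo4j-graphrag provides the production version of this flow.

```python
# Sketch of the Text2Cypher guardrail: the model writes the query, the
# database computes the answer, so the count cannot be fabricated.
FORBIDDEN = ("CREATE", "DELETE", "SET", "MERGE", "REMOVE", "DROP")

def llm_to_cypher(question: str) -> str:
    # Placeholder for an LLM call; returns a canned translation here.
    return "MATCH (h:Hotel {city: 'Lisbon'}) RETURN count(h) AS n"

def is_read_only(cypher: str) -> bool:
    # Simplified guard: reject queries containing write clauses.
    words = cypher.upper().split()
    return not any(w in words for w in FORBIDDEN)

def answer(question: str) -> str:
    cypher = llm_to_cypher(question)
    if not is_read_only(cypher):
        raise ValueError("refusing to run a write query")
    # In production: session.run(cypher) against Neo4j, and the number the
    # database returns goes into the answer verbatim.
    return cypher
```

The key property is that the LLM only produces the query, never the number; the exact count (133, not "approximately 45-50") comes from the graph engine.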

Outline: • The RAG Hallucination Problem • Graph-RAG Architecture • Live Implementation • Production Patterns • Decision Framework

When Prompts Fail: Enforcing Business Rules in AI Agents

You wrote a tool with a clear docstring: "Maximum 10 guests per booking." Your agent calls it with 15 guests and gets back "SUCCESS." The rule was ignored because prompts and docstrings are suggestions. The LLM processes them as context, not constraints. This is the same problem web developers solved decades ago: never trust user input, validate on the server. For AI agents, the equivalent is never trust the LLM's judgment on business rules, validate at the tool layer.

I'll build a guardrail system live using two components. First, rules defined as Python dataclasses: typed, testable, versionable. Each rule specifies which tool it applies to, what parameter to check, and what threshold to enforce. Second, a hook that intercepts every tool call before execution, checks it against the rules, and cancels violations with a clear message the LLM cannot argue with. The demo runs the same three invalid requests through two versions of the same agent. The prompt-only version allows all three violations. The hook-based version blocks all three and tells the LLM exactly why.

You'll walk away with:
  • A hook-based validation pattern that works with any agent framework (about 30 lines of Python)
  • Rules as dataclasses you can test, version, and deploy independently from the agent
  • A decision framework for when you need hooks vs when prompts are enough
  • Understanding of the specific bypass mechanisms LLMs use against prompt-based rules
  • Open-source code you can adapt to payment validation, compliance checks, rate limiting, or any domain

Most guardrail talks focus on content safety: toxicity, PII, prompt injection. This talk is about business logic: the rules your product must never break, the kind of violations that cost money, lose customers, and create legal liability.
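
The two components can be sketched in a few lines: a frozen dataclass per rule, and a pre-execution hook that checks every call against the rule set. The names (Rule, before_tool_call) are illustrative; the pattern itself is framework-agnostic, as the talk claims.

```python
# Sketch of the talk's two components: rules as typed dataclasses, and a
# hook that validates every tool call before execution.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    tool: str         # which tool the rule applies to
    param: str        # which parameter to check
    max_value: float  # threshold to enforce

# Rules live in data: testable, versionable, deployable apart from the agent.
RULES = [Rule(tool="book_room", param="guests", max_value=10)]

def before_tool_call(tool: str, params: dict):
    """Hook: returns (allowed, message); violations are cancelled with a
    clear reason the LLM cannot argue with."""
    for rule in RULES:
        if rule.tool == tool and params.get(rule.param, 0) > rule.max_value:
            return False, (f"BLOCKED: {rule.param}={params[rule.param]} "
                           f"exceeds maximum {rule.max_value}")
    return True, "ok"
```

Unlike a docstring, this check runs whether or not the model paid attention to it, which is the whole point of validating at the tool layer.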

Outline: • The Prompt Engineering Failure • Neurosymbolic Architecture • Live Implementation • Production Patterns • Advanced Applications
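The two components above can be sketched as follows; the rule fields, tool name, and hook signature are illustrative assumptions, since real agent frameworks each expose their own pre-tool-call hook:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A business rule: typed, testable, versionable."""
    tool: str       # tool the rule applies to
    param: str      # parameter to check
    max_value: int  # threshold to enforce

RULES = [Rule(tool="book_room", param="guests", max_value=10)]

def pre_tool_hook(tool_name, kwargs):
    """Intercept every tool call before execution; cancel violations
    with a message the LLM cannot argue with."""
    for rule in RULES:
        if rule.tool == tool_name and kwargs.get(rule.param, 0) > rule.max_value:
            return {"status": "BLOCKED",
                    "reason": f"{rule.param}={kwargs[rule.param]} exceeds maximum {rule.max_value}"}
    return None  # no violation: let the tool run

print(pre_tool_hook("book_room", {"guests": 15}))
# -> {'status': 'BLOCKED', 'reason': 'guests=15 exceeds maximum 10'}
```

Because the rules live in plain dataclasses rather than in the prompt, they can be unit-tested and deployed independently of the agent.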

Catching Hallucinations with Multi-Agent Validation

Your AI agent confirms an operation with full confidence, reference number, details, status. One problem: the data is fabricated. The agent hallucinated the entire result, and your user won't discover it until real-world consequences hit. The fundamental problem: Single AI agents have no mechanism to verify their own outputs. When an LLM generates a plausible-sounding response, there is no internal check distinguishing real data from fabricated data. The agent is equally confident whether the result is real or invented. Research on multi-agent debate (2025) shows this can be solved through cross-validation between specialized agents. I will cover why single agents cannot self-correct hallucinations and why "be accurate" prompts do not help, the Executor to Validator to Critic pattern with three specialized roles for cross-validation, Swarm orchestration where agents hand off autonomously with shared context, how the Validator independently verifies data existence before the Critic approves, and production patterns for integrating multi-agent validation into any agent workflow. You'll walk away with: • The Executor-Validator-Critic pattern implemented in your own agent systems • Swarm orchestration configured for autonomous agent handoffs • Cross-validation pipeline design that catches hallucinations before users see them • A framework for evaluating when multi-agent validation is worth the overhead • Open-source code adaptable to any domain (finance, healthcare, e-commerce, support) Most multi-agent talks focus on task decomposition, splitting work across agents for efficiency. This addresses a fundamentally different problem: using multiple agents for correctness. The Executor-Validator-Critic pattern is specifically designed to catch hallucinations, not distribute work.

Outline: • Single-Agent Hallucination • Multi-Agent Pattern • Live Implementation • Production Patterns • Advanced Applications
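The three-role pipeline above reduces to a small control flow. This sketch stubs the Executor with a plain function and uses an in-memory set as the ground-truth store; in the real system each role is a separate agent in a Swarm and the Validator queries an actual database:

```python
def executor(task):
    """Stand-in for the LLM agent producing a (possibly fabricated) result."""
    return {"reference": task.get("ref"), "status": "confirmed"}

def validator(result, database):
    """Independently verify the data exists before the Critic sees it."""
    return result["reference"] in database

def critic(result, is_valid):
    """Approve only results the Validator could ground in real data."""
    if not is_valid:
        return {"approved": False, "reason": "reference not found; likely hallucination"}
    return {"approved": True}

DATABASE = {"BK-1001", "BK-1002"}  # hypothetical ground-truth store

def pipeline(task):
    result = executor(task)
    return critic(result, validator(result, DATABASE))

print(pipeline({"ref": "BK-9999"}))  # fabricated reference gets caught before the user sees it
```

The point of the structure is that no single agent both produces and approves a result: the Executor's confidence never reaches the user without the Validator's independent check.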

Context Engineering: Stop Agents from Choking on Their Own Data

Your AI agent just ingested 214KB of server logs. It looks like it worked. No errors, no warnings. But the response is garbage. The context window silently overflowed, critical data got truncated, and your agent confidently hallucinated an answer based on incomplete information. Context overflow accounts for a significant portion of production agent failures, and it is almost always silent. The problem is bigger than you think. Tool outputs have no size limits by default, so a single API call can return megabytes of data. Context window overflow produces no errors and no warnings, only degraded output quality. An estimated 67% of production agent failures trace back to context management issues. Multi-agent systems multiply the problem as data passes between agents without size controls. The Memory Pointer Pattern changes everything. Store large tool outputs in agent.state via ToolContext instead of returning them directly. Return lightweight 52-byte pointers that reference the stored data. Use invocation_state for shared data access across agents in multi-agent Swarm systems. Implement SlidingWindowConversationManager for automatic conversation history management. Transform 214KB payloads into manageable references without losing any data. You'll walk away with: • A working Memory Pointer implementation using ToolContext and agent.state • Multi-agent state sharing patterns using invocation_state with multi-agent orchestration • Production debugging techniques for identifying silent context overflow • Open-source demo code that processes 145KB+ log files seamlessly across multiple agents This talk does not stop at describing the problem. It provides a complete, production-tested solution with working code. Every pattern shown runs live on stage with real data.

Outline: • The Silent Killer • Memory Pointer Pattern Deep Dive • Multi-Agent State Sharing • Production Patterns • Advanced Techniques and Wrap-Up
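The Memory Pointer Pattern above can be sketched with a minimal stand-in for `agent.state` / `ToolContext`; the class and tool names here are illustrative, not a specific framework API:

```python
import uuid

class AgentState:
    """Minimal stand-in for agent.state: keyed storage outside the context window."""
    def __init__(self):
        self._store = {}

    def put(self, data):
        key = f"ptr-{uuid.uuid4().hex[:8]}"
        self._store[key] = data
        return key

    def get(self, key):
        return self._store[key]

state = AgentState()

def fetch_logs_tool():
    raw = "x" * 214_000  # simulate a 214KB tool output
    pointer = state.put(raw)
    # Return a lightweight reference instead of flooding the context window;
    # downstream tools resolve the pointer via state.get() when they need the data.
    return {"pointer": pointer, "size_bytes": len(raw)}

result = fetch_logs_tool()
print(result)  # the LLM sees a few dozen bytes; the full payload stays in state
```

Nothing is lost: the 214KB payload is still fully available to any tool that dereferences the pointer, but the context window only ever carries the reference.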

When RAG Hallucinates Numbers: Graph-RAG for Precise Answers

Your RAG agent seems smart, until you ask it to count something. "How many items match criteria X?" Traditional RAG fabricates: "approximately 45-50." The real answer from your data? 133. Vector similarity can't count, aggregate, or reason across relationships. The fundamental limitation: Traditional RAG retrieves text chunks by similarity, then asks the LLM to synthesize answers. This works for simple lookups but fails systematically on four query types: counting ("how many?"), aggregation ("what's the average?"), multi-hop reasoning ("what's available at the highest-rated?"), and out-of-domain detection ("any results in Antarctica?", where RAG fabricates and Graph-RAG correctly says "none"). I will cover why traditional RAG hallucinates on structured queries (the architectural root cause), how Graph-RAG builds knowledge graphs automatically using neo4j-graphrag without manual schema design, the Text2Cypher pattern that converts natural language into precise database queries the LLM cannot fabricate, a side-by-side comparison on identical queries showing RAG fabrication vs Graph-RAG precision, and production implementation patterns with open-source tools. You'll walk away with: • Graph-RAG implementation with Neo4j and auto entity extraction for any document set • Text2Cypher query generation to get precise answers from knowledge graphs • A concrete decision framework for when to use RAG vs Graph-RAG • Hybrid architecture patterns: Graph-RAG for structured queries, RAG for unstructured • Open-source code adaptable to any domain with structured data (product catalogs, FAQs, inventories) Most RAG talks focus on embeddings and retrieval tuning. This addresses RAG's fundamental limitation: statistical hallucinations on structured data. The solution (knowledge graphs + Cypher) is domain-agnostic and applies wherever your documents contain countable, aggregatable, or relationship-rich data.

Outline: • The RAG Hallucination Problem • Graph-RAG Architecture • Live Implementation • Production Patterns • Decision Framework
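The core contrast above — computing an answer instead of synthesizing one — can be sketched without a database. This toy replaces Neo4j with an in-memory list and stands in for Text2Cypher with a hand-written mapping; in production the LLM emits real Cypher (e.g. `MATCH (i:Item {category:'outdoor'}) RETURN count(i)`) and the graph executes it:

```python
# Hypothetical structured data; in production these are Neo4j nodes
# extracted automatically from documents.
ITEMS = [{"name": f"item-{i}", "category": "outdoor"} for i in range(133)] \
      + [{"name": "lamp", "category": "indoor"}]

def text2query(question):
    """Stand-in for Text2Cypher: translate a counting question into an exact query."""
    if question.startswith("How many") and "outdoor" in question:
        # Cypher equivalent: MATCH (i:Item {category:'outdoor'}) RETURN count(i)
        return lambda data: sum(1 for x in data if x["category"] == "outdoor")
    raise ValueError("unsupported question in this sketch")

query = text2query("How many outdoor items are there?")
print(query(ITEMS))  # -> 133: computed from data, impossible to fabricate
```

The LLM's only job is translation; the number itself comes from the database engine, which is why the "approximately 45-50" failure mode disappears.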

Multimodal RAG: Video Search Without the Pipeline

You need to search 500 hours of video for a specific product demo where someone mentions a pricing change while showing a dashboard. With the traditional approach, you would extract frames at fixed intervals, run OCR on each frame, separate the audio track, transcribe it, generate text embeddings for transcripts, generate image embeddings for frames, store everything in separate vector indices, and then orchestrate a query across all six outputs hoping the timestamps align. That is not retrieval; that is suffering. The problem is architectural. Traditional video RAG treats video as a bundle of separate modalities that must be decomposed before they can be searched. Frame extraction loses temporal context. Audio separation loses visual grounding. Separate embedding spaces create alignment nightmares. The orchestration layer becomes the most complex part of your system, and it is also the most fragile. When it breaks (and it will break) you debug across six different tools trying to figure out where the pipeline lost the answer. In this talk, I will show you: • How traditional video RAG pipelines decompose video into frames, audio, and text, and why each decomposition step loses information • How multimodal models understand video natively, preserving temporal relationships between what is shown, said, and displayed on screen • How unified temporal embeddings eliminate the alignment problem that plagues multi-index approaches • How agent-based architectures turn a 200-line orchestration script into a single tool call • A live demo: building a complete video analysis agent that searches, summarizes, and answers questions about video content in minutes You will walk away with: • A working architecture for production video RAG using multimodal models and pgvector • A decision framework for when traditional decomposition still makes sense (hint: edge cases exist) • Performance benchmarks comparing six-tool pipelines versus single-agent approaches on retrieval accuracy

Outline: • The 500-Hour Problem • Why Decomposition Fails • The Multimodal Shift • Building the Video Agent • When to Use What and Resources
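The unified-index idea above can be sketched with toy vectors. Each temporal segment carries one embedding covering what was shown, said, and displayed in that span, so a single search replaces cross-index timestamp alignment; the three-dimensional vectors are stand-ins for real multimodal embeddings, which in production would live in pgvector:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# One unified index of temporal segments instead of six separate indices.
segments = [
    {"start": 0,  "end": 30,  "vec": [0.9, 0.1, 0.0]},  # intro
    {"start": 30, "end": 90,  "vec": [0.1, 0.9, 0.2]},  # pricing change + dashboard
    {"start": 90, "end": 150, "vec": [0.0, 0.2, 0.9]},  # Q&A
]

def search(query_vec, top_k=1):
    """Return the (start, end) spans most similar to the query embedding."""
    ranked = sorted(segments, key=lambda s: cosine(query_vec, s["vec"]), reverse=True)
    return [(s["start"], s["end"]) for s in ranked[:top_k]]

print(search([0.0, 1.0, 0.1]))  # -> [(30, 90)]: one lookup, no alignment step
```

Because the timestamp travels with the embedding in the same row, there is nothing to align after retrieval: the answer is a time span, not six partial results to reconcile.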

Your AI Agent Isn't Crashing. It's Bleeding Tokens

Your AI agent does not crash; it gets stuck. It silently produces wrong results when data overflows the context window. It waits forever when an MCP tool calls a slow API. It calls the same tool 14 times because the response said "more results may be available." None of these failures throw errors. They just waste tokens and time. Three silent failures that cost real money. Context overflow: a tool returns 214KB of logs, the context window fills up, and the agent produces incomplete results with no error. MCP tools hanging: an external API takes 15 seconds and the agent gets a cryptic 424 error. Reasoning loops: ambiguous tool feedback causes 14 retries, burning tokens with zero progress. I will cover the Memory Pointer Pattern (store large data outside context, return a pointer, based on IBM Research), async handleId for MCP (return job IDs immediately, poll for results, based on Octopus Research), and DebounceHook with clear SUCCESS states that block duplicate calls (from 14 calls down to 2). Each fix includes a live demo with before/after metrics. You'll walk away with: • Three production-ready patterns you can implement the same day • Working code with real metrics for each fix • Understanding of which failure mode is causing your agent's problems • Open-source repository with all demos Most agent talks focus on capabilities. This focuses on efficiency: what agents waste.

Outline: • Three Silent Failures • Fix 1: Memory Pointer Pattern • Fix 2: Async HandleId for MCP • Fix 3: DebounceHook + Clear States • Decision Matrix + Resources
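The DebounceHook fix above can be sketched as a hash-and-block gate on tool calls; the hook method name and terminal-state shape are illustrative, since each framework wires hooks differently:

```python
import hashlib
import json

class DebounceHook:
    """Block repeated identical tool calls within one agent turn."""
    def __init__(self):
        self.seen = set()

    def before_tool_call(self, tool_name, kwargs):
        # Hash the call signature so 'identical' is well-defined.
        key = hashlib.sha256(
            json.dumps([tool_name, kwargs], sort_keys=True).encode()
        ).hexdigest()
        if key in self.seen:
            # Unambiguous terminal state: nothing like "more results may be available".
            return {"status": "SUCCESS", "note": "already executed; result unchanged"}
        self.seen.add(key)
        return None  # first call: allow execution

hook = DebounceHook()
calls = [hook.before_tool_call("search", {"q": "flights"}) for _ in range(14)]
executed = sum(1 for c in calls if c is None)
print(executed)  # -> 1: the 13 duplicates are answered from the hook, burning no tokens
```

The second half of the fix is the clear SUCCESS state itself: ambiguous tool feedback is what invites the retry loop, so the blocked response must read as definitively finished.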

How to Stop AI Agent Hallucinations: 5 Targeted Fixes

You added input validation. Your agent fabricated a record that does not exist in any database. You added a guardrail hook. Your agent selected the wrong tool and returned made-up data. You added prompt instructions. Your agent bypassed a payment requirement because the LLM decided to "make an exception." Each fix solved one problem and left four others wide open. AI agents do not hallucinate in one way. They hallucinate in five: fabricating data when retrieval returns nothing, selecting wrong tools when descriptions overlap, ignoring business rules the LLM treats as suggestions, failing to adapt when soft constraints are violated, and bypassing financial and legal requirements that must never be overridden. A single guardrail cannot cover all five. Each failure mode requires a different defense. I will walk through five techniques that form a layered system: graph-based retrieval that computes answers from structured data instead of guessing (zero fabrication on knowledge queries), semantic tool routing via protocol-based discovery that replaces brittle keyword matching (correct tool selection without custom vector indexes), database-driven steering rules you update in seconds without redeploying the agent, STEER messages that guide agents to self-correct instead of hard-failing on soft constraint violations (15 guests requested, agent adjusts to 10 and informs the user), and framework-level hooks that block operations the LLM must never bypass regardless of how it reasons.

Outline: • Your AI Agent Hallucinates in 5 Different Ways • Grounded Retrieval with Graph Queries • Semantic Tool Routing • Steering Rules + STEER Messages • Hard Hooks That Cannot Be Bypassed • Full Layered Defense Test • Resources + Q&A
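The distinction between STEER messages and hard hooks described above can be sketched in one function; the constraint values and tool names are illustrative:

```python
MAX_GUESTS = 10          # soft constraint: the agent can adapt and inform the user
REQUIRES_PAYMENT = True  # hard constraint: never bypassed, no matter how the LLM reasons

def steer_hook(tool, kwargs):
    """Soft violations return a STEER message so the agent self-corrects;
    hard violations are blocked outright."""
    if tool == "book_room" and kwargs.get("guests", 0) > MAX_GUESTS:
        return {"action": "STEER",
                "message": f"Maximum {MAX_GUESTS} guests. Adjust the booking and inform the user."}
    if tool == "confirm_order" and REQUIRES_PAYMENT and not kwargs.get("payment_verified"):
        return {"action": "BLOCK", "message": "Payment verification is mandatory."}
    return {"action": "ALLOW"}

print(steer_hook("book_room", {"guests": 15})["action"])                   # -> STEER
print(steer_hook("confirm_order", {"payment_verified": False})["action"])  # -> BLOCK
```

The layering is the point: a STEER response keeps the conversation alive (15 guests becomes 10, with an explanation), while a BLOCK response is a wall the LLM cannot argue past.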

Orlando Code Camp 2026 Sessionize Event Upcoming

April 2026 Sanford, Florida, United States

DeveloperWeek 2026

Master Vibe Coding and Deploy AI Agents to Production

February 2026 San Jose, California, United States

PyLadies San Francisco @ LinkedIn

Have a Conversation with Your Videos: Video Analysis Agents in Python

January 2026 San Francisco, California, United States

Python Meetup - Extending AI Agents: Custom Tools and Model Context Protocol

Extending AI Agents: Custom Tools and Model Context Protocol

November 2025 San Francisco, California, United States

Tech Talk: Moving Agents to Production with Strands and AgentCore

Tech Talk: Moving Agents to Production with Strands and AgentCore

October 2025 San Francisco, California, United States

DevFest Fresno - Build with AI Sessionize Event

October 2025 Fresno, California, United States

DataWeek 2025 Sessionize Event

September 2025 Santa Clara, California, United States

MCP Dev Day 2025

Tech Talk: Extending AI Agents: Custom Tools and Model Context Protocol

August 2025 San Francisco, California, United States

AICamp Women in AI 2025

Agentic AI: Designing with Intelligence & Autonomy
Description: Building AI agents for early-career developers with Strands Agents.

August 2025 Palo Alto, California, United States

Meetup - AWS User Group Ajolotes Ciudad de Mexico

Multimodal Agents with Python: Processing Images, Videos, and Documents in a Few Lines of Code

August 2025 Mexico City, Mexico

Pycon US 2025

Building a Multimodal Search Engine: Combining Text and Images for Intelligent Search

In today's data-driven world, efficiently processing and analyzing large volumes of data is crucial for many applications. Together we will explore how to create and manage text and image embeddings for similarity search in a PostgreSQL database. We will walk through a practical example in Python demonstrating how to build search engines driven by natural language.

May 2025 Pittsburgh, Pennsylvania, United States

AWSome Women Summit Latam 2025 Sessionize Event

March 2025 Lima, Peru

AWS Community Day Chile 2024 Sessionize Event

November 2024 Santiago, Chile

AWS Community Day Argentina 2024 Sessionize Event

September 2024 Buenos Aires, Argentina

KCD Argentina 2024 Sessionize Event

May 2024 Buenos Aires, Argentina

AWS Community Day 2024 Sessionize Event

April 2024 Lima, Peru

Nerdearla Chile 2024 Sessionize Event

April 2024 Santiago, Chile

AWS Women Summit 2024 Argentina Sessionize Event

March 2024 Buenos Aires, Argentina

AWS Community Day Uruguay 2023 Sessionize Event

November 2023 Montevideo, Uruguay

CodeCampSDQ 2023 Sessionize Event

October 2023 Santo Domingo, Dominican Republic

CDK Day 2023 Sessionize Event

September 2023

AWS UG Perú Conf 2023 Sessionize Event

September 2023 Lima, Peru

PyDay Chile 2023 Sessionize Event

June 2023
