Efficient Agent Memory Retrieval with Semantic Search

Your AI agent has accumulated 8 memory sections about a user: persona, travel preferences, food allergies, work schedule, 6 past trips with ratings, loyalty programs, communication style, and emergency contacts. The user asks: "What food do I like and what should I avoid?"

The naive approach dumps all 8 sections into context. The agent receives emergency contacts (irrelevant), work schedule (irrelevant), loyalty program miles (irrelevant), and somewhere in that noise, the food preferences it actually needs. Tokens wasted. Response quality degraded.

The problem scales: a user with 8 sections today may have 20 sections next month. As memory grows, dumping everything stops being viable: context windows fill up, older memories get pushed out, and the agent starts hallucinating from information overload.

I will cover:
• Why dump-all memory retrieval wastes 60-98% of tokens on irrelevant data
• How keyword search improves on dump-all but misses synonyms and related concepts
• How semantic search uses embedding similarity to find conceptually related memories (top-3 per query)
• Multi-turn retrieval, where different queries load different memory sections automatically
• Research validation from Zep (94.8% DMR, 90% less latency), PersonaAgent (+56.1% F1), and HippoRAG 2

You'll walk away with:
• Working semantic search over core memory using SentenceTransformers
• A comparison of dump-all vs keyword vs semantic retrieval with real token metrics
• A multi-turn pattern where each query retrieves different memory sections
• An understanding of when each retrieval strategy makes sense
• Open-source code with an 8-section user profile and 4 retrieval scenarios

Most RAG talks focus on searching external documents. This session applies the same semantic search techniques to agent self-memory: searching what the agent knows about YOU. You will see real token counts, real precision differences, and the exact moment where semantic beats keyword.
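To make the three strategies concrete, here is a minimal sketch, not the session's actual code. The memory sections, the stopword list, the word-count token proxy, and every function name are assumptions made for illustration. The embedder is a toy bag-of-words stand-in so the example runs with the standard library only; an embedding model (for example the `encode` method of a SentenceTransformers model) could be passed in its place.

```python
import math
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical memory sections (illustrative, not the 8-section profile from the talk).
MEMORY: Dict[str, str] = {
    "food_allergies": "Allergic to peanuts and shellfish. Loves Italian food and vegetarian curry.",
    "work_schedule": "Works Monday to Friday, 9am to 5pm Pacific time.",
    "loyalty_programs": "Airline miles with two carriers and one hotel points program.",
    "emergency_contacts": "Sister is the primary emergency contact.",
}

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"what", "which", "do", "i", "and", "should", "have", "the", "to", "is", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def count_tokens(text: str) -> int:
    """Rough proxy: whitespace-separated words, not a real model tokenizer."""
    return len(text.split())


def keyword_search(memory: Dict[str, str], query: str) -> List[str]:
    """Return sections that share at least one non-stopword with the query."""
    words = set(tokenize(query)) - STOPWORDS
    return [name for name, text in memory.items() if words & set(tokenize(text))]


def make_toy_embedder(corpus: Sequence[str]) -> Callable[[str], List[float]]:
    """Build a bag-of-words embedder over the corpus vocabulary (stand-in only)."""
    vocab = sorted({w for doc in corpus for w in tokenize(doc)} - STOPWORDS)
    index = {w: i for i, w in enumerate(vocab)}

    def embed(text: str) -> List[float]:
        vec = [0.0] * len(vocab)
        for w in tokenize(text):
            if w in index:
                vec[index[w]] += 1.0
        return vec

    return embed


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_top_k(
    memory: Dict[str, str],
    query: str,
    embed: Callable[[str], Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank sections by cosine similarity to the query and keep the best k."""
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in memory.items()]
    # Highest score first; ties broken by section name for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


if __name__ == "__main__":
    embed = make_toy_embedder(list(MEMORY.values()))
    dump_all_tokens = sum(count_tokens(text) for text in MEMORY.values())
    # Multi-turn: each query pulls a different section into context.
    for query in ["What food do I like and what should I avoid?",
                  "Which loyalty miles do I have?"]:
        hits = semantic_top_k(MEMORY, query, embed, k=1)
        used = sum(count_tokens(MEMORY[name]) for name, _ in hits)
        print(f"{query!r}: {hits[0][0]} ({used} of {dump_all_tokens} tokens)")
```

Because the toy embedder only counts shared words, it behaves much like keyword matching here. The ranking logic is the part that carries over: with a sentence-embedding model in place of `make_toy_embedder`, the same `semantic_top_k` call is what would let a query about "eating" reach a section about "allergies".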

Outline:
• The Memory Overload Problem
• Scenario 1: Dump All
• Scenario 2: Keyword Search
• Scenario 3: Semantic Search Top-3
• Scenario 4: Multi-Turn Retrieval
• Decision Framework + Resources

Elizabeth Fuentes Leone

Developer Advocate

San Francisco, California, United States
