Efficient Agent Memory Retrieval with Semantic Search

Your AI agent has 8 memory sections about a user, including persona, travel and food preferences, work schedule, past trips, loyalty programs, and emergency contacts. The user asks "What food do I like and what should I avoid?" The naive approach dumps all 8 sections into context, so the food preferences end up buried under irrelevant emergency contacts, work schedules, and loyalty miles. Tokens are wasted, answer quality degrades, and the problem gets worse as memory grows to 20 sections.

This talk shows why dump-all wastes 60-98% of tokens, how keyword search improves on it but misses synonyms, and how semantic search uses embedding similarity to find conceptually related memories (top-3 per query). A multi-turn scenario loads different sections for each query. The approach is backed by results from Zep (94.8% DMR, 90% less latency), PersonaAgent (+56.1% F1), and HippoRAG 2.

You'll leave with working semantic search over core memory using SentenceTransformers, a comparison of dump-all, keyword, and semantic retrieval with real token metrics, and open-source code. Most RAG talks search external documents; this one applies the same techniques to what the agent knows about the user.
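As a rough illustration of the top-3 semantic retrieval described above, here is a minimal sketch rather than the talk's actual code. The function names (`cosine`, `top_k_sections`) and the pluggable `embed` argument are assumptions made for this example; any function that maps a list of strings to a list of vectors will do.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_sections(
    query: str,
    sections: Dict[str, str],
    embed: Callable[[List[str]], Sequence[Vector]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k memory sections most similar to the query.

    `sections` maps a section name to its text. `embed` is a hypothetical
    hook: it receives a list of strings and returns one vector per string.
    """
    names = list(sections)
    # Embed the query and all section texts in one call.
    vectors = embed([query] + [sections[name] for name in names])
    query_vec, section_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, section_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Possible wiring with SentenceTransformers (the model name is an assumption):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   top_k_sections("What food do I like?", memory, model.encode, k=3)
```

Only the sections returned by `top_k_sections` would then be placed in the agent's context, instead of all of them.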

Outline:
• The Memory Overload Problem
• Scenario 1: Dump All
• Scenario 2: Keyword Search
• Scenario 3: Semantic Search Top-3
• Scenario 4: Multi-Turn Retrieval
• Decision Framework + Resources
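To make the dump-all and keyword scenarios concrete, the sketch below compares how many tokens each strategy puts into context. Everything in it is an assumption for illustration: the `MEMORY` contents are invented, the helper names are hypothetical, and tokens are approximated by whitespace-separated words, which real tokenizers will not match exactly.

```python
from typing import Dict, List

# Invented sample memory, for illustration only.
MEMORY: Dict[str, str] = {
    "food_preferences": "Loves Thai food and sushi. Allergic to peanuts.",
    "work_schedule": "Meetings Monday to Thursday. Fridays are remote.",
    "emergency_contacts": "Call Maria first, then Dr. Lee.",
}

# Small, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"what", "do", "i", "and", "should", "to", "are", "the", "a", "is"}


def approx_tokens(text: str) -> int:
    """Crude token estimate: count whitespace-separated words."""
    return len(text.split())


def _words(text: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def dump_all(sections: Dict[str, str]) -> List[str]:
    """Scenario 1: select every section regardless of the query."""
    return list(sections)


def keyword_search(query: str, sections: Dict[str, str]) -> List[str]:
    """Scenario 2: select sections sharing at least one non-stopword with the query."""
    query_words = _words(query) - STOPWORDS
    return [name for name, text in sections.items() if query_words & _words(text)]


def context_tokens(selected: List[str], sections: Dict[str, str]) -> int:
    """Approximate tokens used by the selected sections."""
    return sum(approx_tokens(sections[name]) for name in selected)


def savings(selected: List[str], sections: Dict[str, str]) -> float:
    """Fraction of tokens saved relative to dumping every section."""
    total = context_tokens(dump_all(sections), sections)
    return 1 - context_tokens(selected, sections) / total if total else 0.0
```

With this sample data, a query containing the word "food" selects only the food section, while "What can I eat?" matches nothing because no section contains the literal word "eat". That is the synonym gap that embedding-based retrieval is meant to close.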

Elizabeth Fuentes Leone

Developer Advocate

San Francisco, California, United States
