Session

Multimodal AI Agents with Long-Term Memory

Your chatbot forgets who you are after every conversation. Your video agent cannot remember what it analyzed yesterday. A user asks your multimodal agent to "compare this video with the one I shared last week," and the agent has no idea what video they mean. You have built something intelligent that has no memory, and without memory, intelligence is just computation.

The memory problem in multimodal agents is harder than it looks. Text-only agents can summarize conversations into compact strings. But multimodal agents process video frames, image features, audio patterns, and text simultaneously. What should be remembered? The raw video? A description of it? The embeddings? The user's reaction? And how do you retrieve the right memory when a new conversation references something from three sessions ago using natural language? Traditional session stores and database-backed chat histories were not designed for this.

In this talk, I will show you:
• How to build multimodal agents using an open-source agent SDK, an open-source framework for creating production-ready agent systems (similar patterns apply to LangGraph, AutoGen, or other frameworks)
• How to create custom tools for video content analysis that extract structured information from video, images, and audio
• How to convert agent tools into MCP servers so they can be shared across agents, teams, and projects without code duplication
• How to implement scalable chat memory with a managed vector store that stores multimodal conversation context and retrieves relevant memories using semantic search
• A live demo: building a complete multi-agent system where agents share tools via MCP, remember past conversations, and deliver personalized multimodal responses

You will walk away with:
• A working multi-agent architecture using an open-source agent SDK with custom tools and MCP server integration
• Patterns for creating reusable MCP servers from agent tools, so you build once and use across your entire agent fleet

Outline:
• The Agent That Forgets
• Building Multimodal Agents with Strands
• Converting Tools to MCP Servers
• Scalable Chat Memory with S3 Vectors
• The Complete System and Resources
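
To make the memory-retrieval idea in the abstract concrete, here is a minimal sketch of semantic search over stored conversation memories. It is not the talk's implementation and uses neither the Strands nor the S3 Vectors API. `embed`, `Memory`, and `MemoryStore` are hypothetical names, the "embedding" is a toy bag-of-words vector standing in for a real embedding model, and the store is a plain in-memory list standing in for a managed vector store. It assumes each memory is kept as a short text summary of the media rather than the raw file.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Memory:
    session_id: str
    modality: str   # e.g. "video", "image", "text"
    summary: str    # text description stored instead of the raw media
    vector: Counter = field(init=False)

    def __post_init__(self) -> None:
        # Embed the summary once, when the memory is written.
        self.vector = embed(self.summary)


class MemoryStore:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self) -> None:
        self._items: list[Memory] = []

    def add(self, session_id: str, modality: str, summary: str) -> None:
        self._items.append(Memory(session_id, modality, summary))

    def search(self, query: str, top_k: int = 1) -> list[Memory]:
        # Rank every stored memory by similarity to the query embedding.
        q = embed(query)
        ranked = sorted(self._items, key=lambda m: cosine(q, m.vector), reverse=True)
        return ranked[:top_k]


store = MemoryStore()
store.add("session-1", "video", "user shared a video of a product demo for a robot vacuum")
store.add("session-2", "image", "user uploaded a photo of a whiteboard architecture diagram")
store.add("session-3", "text", "user asked about pricing tiers for the service")

# A later conversation refers to earlier media in natural language.
best = store.search("compare this with the video I shared last week")[0]
print(best.session_id, best.modality, best.summary)
```

The design choice illustrated here is storing a compact description plus its vector per memory, so retrieval works from a natural-language reference even when the original media came from a different session. A real embedding model would match on meaning rather than the shared words this toy version relies on.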

Elizabeth Fuentes Leone

Developer Advocate

San Francisco, California, United States
