Session
Making Agents Practical: Semantic Caching, Memory, and Workflow Acceleration
Agent systems often look impressive in prototypes but struggle in production due to latency, cost, repeated work, and lack of memory. This talk focuses on the infrastructure patterns needed to make agents practical in real workflows.
We’ll explore how semantic caching and agent-aware memory layers can dramatically improve performance, reduce redundant reasoning, and stabilize agent behavior over time. Using a semantic caching proxy as a case study, the session demonstrates how agent requests, decisions, and intermediate results can be reused safely across workflows.
Key topics include:
- Semantic caching strategies for LLM and agent workflows
- Agent memory models: short-term context vs long-term semantic recall
- Accelerating agent pipelines without sacrificing correctness
- Cost, latency, and reliability trade-offs in production agent systems
- Where caching fits into multi-agent and tool-calling architectures
This talk is ideal for developers, platform engineers, and architects who want to move agent systems from experiments to dependable, cost-effective production services.
Alexander Chernov
🤖 Link-Think-Act · Associate Principal Data Engineer @ AstraZeneca · M.Sc. Physics · M.Sc. Information and Communications Engineering
Toronto, Canada