Don't call the LLM for everything! Save money, time, and energy with Vector Similarity Search!
LLMs are powerful—but every call costs real money, adds latency, and uses up energy. Not every task needs a massive model. In this talk, we’ll show how to avoid unnecessary LLM calls and build systems that are faster, cheaper, and simpler to maintain.
We’ll focus on three common tasks: classification, routing, and caching. You’ll see how to handle them with semantic techniques such as vector search, embedding similarity, and lightweight rules. No LLM needed. We’ll show how to cache responses based on meaning, classify inputs without spending a single token, and route queries without the round-trip latency of a model call.
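As a rough illustration of the pattern (not code from the talk), the sketch below uses cosine similarity over sentence embeddings for both routing/classification and a semantic cache. The sentence-transformers model, the example routes, the 0.85 threshold, and the call_llm() placeholder are all assumptions made for the sake of the example.

```python
# A minimal sketch of vector-similarity routing and semantic caching.
# Everything here is illustrative: the embedding model, the route
# descriptions, the 0.85 threshold, and call_llm() are assumptions,
# not material from the talk.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(texts):
    # Normalized embeddings, so a dot product equals cosine similarity.
    return model.encode(texts, normalize_embeddings=True)

# --- Routing / classification: nearest route description, no LLM call ---
routes = {
    "billing": "Questions about invoices, payments, and refunds",
    "support": "Bug reports and technical troubleshooting",
}
route_names = list(routes)
route_vecs = embed(list(routes.values()))

def route(query: str) -> str:
    sims = route_vecs @ embed([query])[0]
    return route_names[int(np.argmax(sims))]

# --- Semantic cache: reuse answers for queries that mean the same thing ---
cache_vecs: list = []
cache_answers: list = []
THRESHOLD = 0.85  # illustrative; tune on your own traffic

def call_llm(query: str) -> str:
    # Hypothetical placeholder for a real (expensive) LLM call.
    return f"LLM answer to: {query}"

def answer(query: str) -> str:
    q = embed([query])[0]
    if cache_vecs:
        sims = np.stack(cache_vecs) @ q
        best = int(np.argmax(sims))
        if sims[best] >= THRESHOLD:
            return cache_answers[best]  # cache hit: the LLM is never called
    response = call_llm(query)          # cache miss: pay for one call
    cache_vecs.append(q)
    cache_answers.append(response)
    return response
```

With this in place, a query like “Why was I charged twice?” routes to billing without touching a model, and a repeated or paraphrased question is served from the cache instead of triggering a fresh LLM call.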
This isn’t about avoiding LLMs—it’s about using them when they matter most. You’ll walk away with patterns that reduce cloud costs, speed up response times, and save developer effort. If you’re building AI-powered systems, this talk will help you get the most out of every call... by making fewer of them.

Raphael De Lio
Developer Advocate @ Redis
Amsterdam, The Netherlands