Vector Compression & Memory/Storage Economics: FP16 Vectors, and Matryoshka Embeddings

The single biggest obstacle to adopting vector search, despite its many benefits, is the cost of running it at scale. As vector dimensions and dataset sizes grow, memory footprint, cache pressure, and storage bandwidth demands all increase. This talk examines vector search from a storage-first, systems engineering perspective, focusing on how vector representation directly impacts performance, cost, and retrieval quality.

We start with the basics of vector search, then show how high-dimensional float32 embeddings make storage and memory costs grow linearly with dimension count and row count, a cost that compounds across indexing, caching, and query execution. Through demos, we show that reducing vector size is an important design decision that can improve scalability and reduce costs. FP16 vectors contribute significantly by cutting storage in half relative to float32.

We then examine dimensionality-reduction strategies in depth. Naive truncation and post-hoc reduction are compared with more principled approaches that preserve retrieval quality. Reducing vector dimensionality directly removes information from the embedding space, often altering nearest-neighbor relationships in subtle but impactful ways.

The core of the session focuses on Matryoshka embeddings, a compression-based training paradigm that fundamentally changes how engineers think about vector size. We cover what these embeddings mean, how they are generated, and how useful they are compared to simply going from FP32 to FP16. We conclude by reframing vector compression as an architectural choice rather than a compression trick.

Audience takeaways include a practical, systems-level framework for choosing vector representations that balance memory, performance, and retrieval quality, without treating embeddings as immutable black boxes. The material will help anyone working with vector search queries, although the demos are specific to SQL Server 2025.
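To make the storage arithmetic in the abstract concrete, here is a minimal Python sketch using only the standard library. It is not taken from the session's demos, which target SQL Server 2025. The workload numbers (one million vectors, 1536 dimensions, a 256-dimension prefix) and all function names are hypothetical and chosen only for illustration. The truncation helper shows just the mechanical step of keeping a leading prefix and renormalizing; whether that prefix preserves retrieval quality depends on the model having been trained with a Matryoshka-style objective, which this sketch does not demonstrate.

```python
import math
import struct


def storage_bytes(num_vectors: int, dims: int, bytes_per_component: int) -> int:
    """Raw vector payload size, ignoring index and per-row overhead."""
    return num_vectors * dims * bytes_per_component


def to_fp16_roundtrip(vec):
    """Round each component to IEEE 754 half precision and convert back.

    The struct format code 'e' is a 2-byte half-precision float.
    """
    fmt = f"<{len(vec)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *vec)))


def truncate_and_renormalize(vec, keep_dims: int):
    """Keep the leading keep_dims components and rescale to unit length."""
    head = vec[:keep_dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Hypothetical workload, for illustration only.
NUM_VECTORS = 1_000_000
FULL_DIMS = 1536
PREFIX_DIMS = 256

fp32_bytes = storage_bytes(NUM_VECTORS, FULL_DIMS, 4)          # float32 baseline
fp16_bytes = storage_bytes(NUM_VECTORS, FULL_DIMS, 2)          # same dims, half precision
fp16_prefix_bytes = storage_bytes(NUM_VECTORS, PREFIX_DIMS, 2)  # shorter prefix, half precision

for label, size in [
    ("float32, full dims", fp32_bytes),
    ("float16, full dims", fp16_bytes),
    ("float16, prefix dims", fp16_prefix_bytes),
]:
    print(f"{label}: {size / 1e9:.3f} GB")
```

Under these assumptions, FP16 halves the payload while keeping every dimension, whereas the shorter prefix reduces it in proportion to the dimensions dropped, and the two reductions multiply when combined. Calling `to_fp16_roundtrip` on a sample vector shows the small rounding error that half precision introduces per component.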

Mala Mahadevan

SQL Server DBA/Database Engineer, ChannelAdvisor Corp. Passionate about community, co-lead #TriPASSUG

Raleigh, North Carolina, United States

