Session
ASAPQuery: A drop-in sketch-based accelerator for ClickHouse
ClickHouse excels at exact real-time analytics. However, many query workloads can tolerate some approximation, especially when it significantly reduces query cost and latency.
Our research at Carnegie Mellon University and the University of Maryland explores this from first principles: what if we build a query engine that treats approximation primitives as first-class citizens?
ASAPQuery is an open-source (https://github.com/ProjectASAP/ASAPQuery) drop-in accelerator that sits in front of ClickHouse. It requires no changes to your application or your queries: your existing SQL works as-is. ASAPQuery intercepts queries transparently, routes eligible ones to pre-computed sketch summaries, and falls back to ClickHouse for everything else. This gives ClickHouse users a new point on the performance-accuracy tradeoff with minimal effort. Need exact results? Use ClickHouse. Okay with approximation? Use ASAPQuery.
Early results show that ASAPQuery can reduce query latency by more than 2x compared to ClickHouse, while keeping query results 99% accurate.
ASAPQuery's key technique is the principled use of "sketches".
Sketches are approximate data summaries that can estimate aggregates over large data streams using very few resources. For instance, a 10KB sketch can estimate the percentiles of 100 million data points (~800MB) with > 99% accuracy, a memory reduction of more than four orders of magnitude!
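To make the idea concrete, here is a minimal Python sketch of one well-known quantile summary: a log-bucketed histogram in the style of DDSketch. This is an illustration only. The abstract does not say which sketches ASAPQuery uses, and the class name `LogBucketSketch` and the parameter `alpha` are assumptions introduced for this example. Each positive value is counted in a bucket whose width grows geometrically, so the summary size depends on the range of the data rather than on the number of points, and every quantile estimate stays within a relative error of `alpha` of the exact value at the same rank.

```python
import math
import random
from collections import Counter


class LogBucketSketch:
    """Toy quantile sketch with a relative-error bound (DDSketch-style).

    Hypothetical illustration, not ASAPQuery's implementation.
    Supports positive values only.
    """

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        # Bucket i covers the interval (gamma^(i-1), gamma^i].
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def add(self, x):
        if x <= 0:
            raise ValueError("this toy sketch only handles positive values")
        index = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = int(q * (self.count - 1))  # 0-based rank of the target value
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # This representative is within alpha (relative) of any
                # value that falls in bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)


# Usage: summarize skewed, latency-like data and compare with exact answers.
rng = random.Random(0)
data = [rng.lognormvariate(0, 1) for _ in range(100_000)]

sketch = LogBucketSketch(alpha=0.01)
for value in data:
    sketch.add(value)

ordered = sorted(data)
for q in (0.5, 0.9, 0.99):
    exact = ordered[int(q * (len(data) - 1))]
    print(f"q={q}: exact={exact:.4f} sketch={sketch.quantile(q):.4f}")
print(f"buckets kept: {len(sketch.buckets)} for {len(data)} points")
```

The sketch stores only bucket counters, so adding more points that fall in the same value range does not grow the summary.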
ASAPQuery is conceptually similar to ClickHouse’s incremental materialized views: sketches are precomputed continuously at ingest time, so queries hit sketches rather than raw data. The key difference is that ASAPQuery does this automatically for sketch-based approximations, without needing query rewrites or manual view maintenance. While sketches aren't a new concept, using them well requires expert knowledge. ASAPQuery brings the benefits of sketches to ClickHouse users by automatically translating SQL queries into sketch-based query plans and executing them.
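The ingest-time precomputation and fallback routing described above can be pictured with a toy model in Python. Everything here is hypothetical: the names `KMV`, `Accelerator`, `ingest`, and `query`, the per-minute windowing, and the `exact_backend` callable are assumptions made for illustration and are not ASAPQuery's API. The sketch used is a K-minimum-values (KMV) summary for approximate distinct counts, chosen because it is small and mergeable. A supported aggregate is answered by merging the per-window summaries, and anything else is handed to the exact backend.

```python
import hashlib
from dataclasses import dataclass, field

K = 1024  # number of smallest hash values each summary retains


def unit_hash(value: str) -> float:
    """Map a string to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.blake2b(value.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


@dataclass
class KMV:
    """K-minimum-values summary for approximate distinct counts."""

    mins: set = field(default_factory=set)

    def add(self, value: str) -> None:
        self.mins.add(unit_hash(value))
        if len(self.mins) > K:
            self.mins.remove(max(self.mins))  # O(K) eviction, fine for a toy

    def merge(self, other: "KMV") -> "KMV":
        # The K smallest hashes of the union summarize the combined data.
        return KMV(set(sorted(self.mins | other.mins)[:K]))

    def estimate(self) -> float:
        if len(self.mins) < K:
            return float(len(self.mins))  # fewer than K distinct: exact
        return (K - 1) / max(self.mins)


class Accelerator:
    """Hypothetical front end: sketches for 'uniq', exact backend otherwise."""

    def __init__(self, exact_backend):
        self.windows = {}  # minute -> KMV summary, updated at ingest time
        self.exact_backend = exact_backend

    def ingest(self, minute: int, user_id: str) -> None:
        self.windows.setdefault(minute, KMV()).add(user_id)

    def query(self, agg: str, start: int, end: int):
        if agg != "uniq":
            # Not covered by a sketch: fall back to the exact engine.
            return self.exact_backend(agg, start, end)
        merged = KMV()
        for minute in range(start, end):
            if minute in self.windows:
                merged = merged.merge(self.windows[minute])
        return merged.estimate()


# Usage: one hour of synthetic events, then an approximate distinct count.
acc = Accelerator(exact_backend=lambda agg, start, end: ("exact", agg))
for i in range(60_000):
    acc.ingest(i % 60, f"user-{(i * 7919) % 20_000}")
print("approx distinct users:", round(acc.query("uniq", 0, 60)))
print("unsupported aggregate:", acc.query("median", 0, 60))
```

The query never touches raw events: it reads at most one small summary per minute in the requested range, which is the property that makes precomputing at ingest time worthwhile.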
In this talk, we will build intuition for how sketches work and how ASAPQuery uses them in query execution. We'll show benchmark results across real query workloads, characterize which query classes benefit most, and demonstrate ASAPQuery's drop-in deployment against an existing ClickHouse stack.
Milind Srivastava
PhD student at CMU, working on making your analytics and observability 100x faster and cheaper