Session
Covering Indexes in the Data Lake with Hyperspace
At Adobe, we use the Iceberg table format in the Adobe Experience Platform Data Lake. Iceberg supports file skipping when the data is properly laid out and queried, but in many cases this is not enough, and queries over the data still take a long time to complete. In an RDBMS, high query latency can be alleviated with additional indexes at the cost of some extra storage, and the same pattern can be applied in data lakes. Hyperspace is an early-phase indexing subsystem for Apache Spark that lets users build indexes on their data. Together with Iceberg, it can bring major improvements in query response time, up to 25 times faster in some cases. Hyperspace accommodates our two major data flow use cases, stale datasets and fast-changing datasets, and ensures consistency when it is used.

Andrei Ionescu
Senior Software Engineer, Adobe
Bucharest, Romania