Speaker

Marc Laforet

Toronto, Canada

Area of Expertise

  • Information & Communications Technology

Topics

  • Data Warehousing
  • Big Data
  • Cloud Storage
  • Data Science & AI

Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino

Dataset interoperability between data platform components remains a difficult hurdle to overcome, and it often results in siloed data and frustrated users. Although open table formats like Apache Iceberg aim to break down these silos by providing a consistent and scalable table abstraction, migrating a pre-existing data archive to a new format can still be daunting. This talk will outline the challenges we faced when rewriting petabytes of Shopify’s data into the Iceberg table format using the Trino engine. Because this landscape is evolving rapidly, I will highlight recent contributions to Trino’s Iceberg integration that made our work possible, and illustrate how we designed our system to scale. Topics will include what to consider when designing your migration strategy, how we optimized Trino’s write performance, and how to recover from corrupt table states. Finally, I will compare the query performance of the old and migrated datasets, using Shopify’s datasets as benchmarks.

Trino Summit 2022 Sessionize Event

November 2022
