Session
How Content Analytics at Spotify leverages dbt to strategically ingest and export data in GCS
aka Avoiding Data Indigestion
Spotify has been a power GCS user since the early days, and as a result we built the majority of our data ecosystem on an internal data transformation tool that writes data with sharded partitions. When we adopted dbt as our team’s primary data transformation tool, we faced the challenge of strategically accessing data produced by other teams. To do this, we developed an internal package called Waluigi (the opposite of Luigi) that offers a variety of options for accessing a specific partition, the most recent partition, or a list or range of partitions. The tables we write out of dbt are all natively partitioned, so as more teams shift from our internal transformation tool to dbt, we have had to build similar access strategies for natively partitioned tables. This not only allows our team to access the data we need efficiently and safely, but also empowers other teams to adopt dbt and leverage the data we produce.
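To make the three access strategies concrete, here is a minimal Python sketch of how a specific shard, the most recent shard, or a range of shards might be selected. It is not Waluigi's API, which is internal and not described here. It assumes, purely for illustration, that each shard is a separate table named `<prefix>_YYYYMMDD`; the table names and every function name below are hypothetical.

```python
from datetime import date, datetime
from typing import Iterable, List, Optional

# Assumption: shards are named "<prefix>_YYYYMMDD" (hypothetical convention).
SHARD_DATE_FORMAT = "%Y%m%d"


def shard_date(table_name: str, prefix: str) -> Optional[date]:
    """Return the date encoded in a shard name, or None if it is not a shard."""
    head = prefix + "_"
    if not table_name.startswith(head):
        return None
    suffix = table_name[len(head):]
    if len(suffix) != 8 or not suffix.isdigit():
        return None
    try:
        return datetime.strptime(suffix, SHARD_DATE_FORMAT).date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def available_shards(table_names: Iterable[str], prefix: str) -> List[date]:
    """Collect the shard dates for one logical table, sorted oldest first."""
    dates = (shard_date(name, prefix) for name in table_names)
    return sorted(d for d in dates if d is not None)


def specific_shard(shards: List[date], wanted: date) -> date:
    """Strategy 1: one exact partition; fail loudly if it does not exist."""
    if wanted not in shards:
        raise LookupError(f"no shard for {wanted.isoformat()}")
    return wanted


def latest_shard(shards: List[date]) -> date:
    """Strategy 2: the most recent partition (shards must be sorted)."""
    if not shards:
        raise LookupError("no shards available")
    return shards[-1]


def shards_in_range(shards: List[date], start: date, end: date) -> List[date]:
    """Strategy 3: every existing partition in the inclusive range."""
    return [d for d in shards if start <= d <= end]


def shard_name(prefix: str, day: date) -> str:
    """Rebuild the table name for a chosen shard date."""
    return f"{prefix}_{day.strftime(SHARD_DATE_FORMAT)}"
```

Under the same assumptions, a natively partitioned table would not need the name parsing at all: the chosen dates would become a filter on the partition column instead of a choice of table names.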