Session
Columnar Data Storage: A Deep-Dive into Parquet, Delta Lake, Columnstore Indexes, and More!
Analytic data storage on the Microsoft data platform has evolved greatly over the years. From the early days of PowerPivot and SQL Server Analysis Services to the advent of columnstore indexes and the eventual adoption of the Parquet format as the de facto storage standard for analytic data in Azure, a lot has happened in the past fifteen years.
This session is a deep-dive into how columnstore technologies work, including:
• Overview and effectiveness of columnstore storage formats
• Encoding and compression algorithms
• Columnstore indexes in SQL Server
• Parquet file format
• Delta Parquet file format
• VertiPaq (row order) optimization
Understanding how analytic data is stored makes it possible to optimize queries and to make better decisions when architecting data structures. These improvements can decrease data size, speed up analytics performance, and reduce computational overhead, thereby reducing Azure hosting costs.
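As a rough illustration of why row order can affect data size, the sketch below applies run-length encoding, one of the encodings commonly used by columnar formats, to a single column before and after sorting. It is a toy example under stated assumptions, not how any of the engines above implement compression: the column values and the run_length_encode helper are made up for illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical low-cardinality column, in the order the rows happened to load.
unsorted_column = ["NY", "CA", "NY", "TX", "CA", "NY", "TX", "CA", "NY", "TX"]

# The same values with the rows reordered so equal values sit together.
sorted_column = sorted(unsorted_column)

unsorted_runs = run_length_encode(unsorted_column)
sorted_runs = run_length_encode(sorted_column)

# Fewer runs means fewer (value, count) pairs to store for the same data.
print(len(unsorted_runs), "runs before sorting")
print(len(sorted_runs), "runs after sorting")
```

In this sketch the unsorted column has no repeated neighbors, so every value becomes its own run, while the sorted column collapses to one run per distinct value. Real engines choose row order across many columns at once, which is a much harder problem than sorting one list.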
These technologies will continue to evolve as data grows larger and organizational needs become more complex. Working effectively with these data storage formats will allow for fast querying of large amounts of data, both now and in the future.
Edward Pollack
Data Architect | Microsoft Data Platform MVP
Albany, New York, United States