Analytics for not-so-big data with DuckDB
In the past decade the industry has seen hundreds of new databases. Most of these newcomers are operational databases, meant for online workloads and for serving as an application's primary datastore. A handful are analytical databases, aimed mainly at large-scale big data workloads. That makes DuckDB an interesting exception: it's built for workloads that are too big for traditional databases, but not so big that they justify complicated big data tools. It's a lightweight, open-source analytical database for people with gigabytes or a few terabytes of data, not companies with hundreds of terabytes and teams of data engineers.
In this session we'll take DuckDB out for a test drive, with live demos and a discussion of interesting use cases. We'll see how to use it to quickly run analytical queries against multiple data sources. We'll look at how to use DuckDB to transform and manipulate diverse datasets, such as turning a pile of raw CSV files in S3 into a set of tables in MySQL with a single command. We'll check out its embedded capabilities by running the database directly inside a Python application. And finally, we'll build a quick-and-dirty data lake with DuckDB, without any complicated big data tools.
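To give a flavour of what those demos look like, here is a minimal sketch (not the exact session code) of the S3-to-MySQL step and of DuckDB running embedded in Python. The bucket, credentials, connection string, and table names are all placeholders of my own, and the sketch assumes DuckDB's httpfs and mysql extensions can be installed:

```python
# A minimal sketch: read raw CSVs from S3 and materialize them as MySQL
# tables, all from DuckDB embedded in a Python process.
# Bucket, keys, host, and table names are hypothetical placeholders.
import duckdb

con = duckdb.connect()  # in-memory, embedded database; no server involved

# httpfs lets DuckDB read s3:// URLs; mysql lets it attach a MySQL server.
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")
con.sql("INSTALL mysql")
con.sql("LOAD mysql")

# Register S3 credentials (placeholder values).
con.sql("""
    CREATE SECRET (
        TYPE S3,
        KEY_ID 'my-key-id',
        SECRET 'my-secret-key',
        REGION 'us-east-1'
    )
""")

# Attach a MySQL database so it behaves like a local schema.
con.sql("ATTACH 'host=localhost user=app database=warehouse' "
        "AS mysql_db (TYPE MYSQL)")

# The "single command": glob every CSV under a prefix, infer the schema,
# and write the result straight into a MySQL table.
con.sql("""
    CREATE TABLE mysql_db.events AS
    SELECT * FROM read_csv_auto('s3://my-bucket/raw/events/*.csv')
""")

# Because DuckDB is embedded, it can also query in-process data directly,
# e.g. a pandas DataFrame referenced by its variable name:
import pandas as pd
df = pd.DataFrame({"amount": [10, 20, 12]})
print(con.sql("SELECT sum(amount) FROM df").fetchall())
```

Note that nothing here needs a server: the whole database engine runs inside the Python process, which is exactly the embedded use case the session explores.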
I'm not affiliated with DuckDB in any way; I just think it's a cool technology that fills an interesting niche in the data ecosystem, and more people should be aware of its potential.