Matt Topol

Author of "In-Memory Analytics with Apache Arrow" | Apache Arrow PMC | ASF Member | Iceberg Committer

Norwalk, Connecticut, United States


Hailing from the faraway land of Brentwood, NY and currently residing in the rolling hills of Connecticut, Matt Topol has always been passionate about software. Matt has worked in infrastructure and application development, has led development teams, and has architected large-scale distributed systems for processing analytics on financial data. Matt is a PMC member for the Apache Arrow project, frequently contributing to the Golang library, among other enhancements, and helping to grow the Arrow community. He wrote the first book on Apache Arrow, "In-Memory Analytics with Apache Arrow", and spent the last couple of years working on the Apache Arrow libraries full time and growing the Arrow Golang community. Matt is now a member of the ASF and also a committer on Apache Iceberg. Most recently, Matt and two colleagues started the company Columnar, focusing on data connectivity using Arrow Database Connectivity (ADBC).

In his spare time, Matt likes to bash his head against a keyboard, develop/run delightfully demented games of fantasy for his victims--er--friends, and share his knowledge with anyone interested who'll listen to his rants.


Area of Expertise

Information & Communications Technology

Topics

Open Source Software
Enterprise Software
golang
C++
Apache Arrow and Arrow Flight
Data Science
Analytics and Big Data
Analytics
Apache Arrow
Data Analytics
Databases
Data Platform
All things data
Data Engineering
open source
Apache Iceberg

Gophers Continuing Up the Iceberg

Looking for a non-JVM solution to interact with Iceberg? Avoid spinning up a Spark cluster and use Iceberg with Go! The Golang implementation of Iceberg is continuing along with plenty of progress over the last year.

This talk will introduce classic use cases showing the beauty of utilizing Iceberg with Go, along with the features that have been implemented since last year. We'll also cover the project's roadmap for upcoming feature support.

Where are the Dataframes for Go?

One of the “must have” features lately for data scientists to adopt a particular language or toolset is an intuitive and performant data frame library. Existing data frame libraries for Go seem to be inactive and/or unmaintained. Taking inspiration from the Polars library, I’ve put together a data frame library that is also built upon the Apache Arrow in-memory columnar format for performance and interoperability. Come along for a deep dive into efficiently working with tables of data in Go. With a bit of luck - and open source contributions - we can lay a solid data frame structure on which to build models!

ODBC takes an Arrow to the knee: ADBC

For decades, ODBC/JDBC have been the standard for row-oriented database access. However, modern OLAP systems tend instead to be column-oriented for performance - leading to significant conversion costs when requesting data from database systems. This is where Arrow Database Connectivity comes in!

ADBC is similar to ODBC/JDBC in that it defines a single API which is implemented by drivers to provide access to different databases. The difference is that ADBC's API is defined in terms of the Apache Arrow in-memory columnar format. Applications can code to this standard API much like they would for ODBC or JDBC, but fetch result sets in the Arrow format, avoiding transposition and conversion costs where possible.
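To make the "transposition and conversion costs" concrete, here is a minimal sketch in Go using only the standard library. It does not use the ADBC or Arrow APIs: `Row`, `Columns`, `Transpose`, and `SumPrice` are hypothetical names invented for illustration. The sketch shows the per-value copy a client has to perform when a row-oriented API hands back records but the analysis wants contiguous columns. A columnar-native API can skip this step.

```go
package main

import "fmt"

// Row is one record as a row-oriented API (in the style of ODBC/JDBC)
// would hand it back. Hypothetical type, for illustration only.
type Row struct {
	ID    int64
	Price float64
}

// Columns is a hypothetical columnar result set: one contiguous slice per
// field. It only illustrates the layout; it is not an Arrow or ADBC type.
type Columns struct {
	ID    []int64
	Price []float64
}

// Transpose converts row-oriented results into columns. This per-value copy
// is the conversion cost that a columnar-native API lets a client avoid.
func Transpose(rows []Row) Columns {
	cols := Columns{
		ID:    make([]int64, 0, len(rows)),
		Price: make([]float64, 0, len(rows)),
	}
	for _, r := range rows {
		cols.ID = append(cols.ID, r.ID)
		cols.Price = append(cols.Price, r.Price)
	}
	return cols
}

// SumPrice scans a single column without touching the other fields,
// which is the access pattern column-oriented analytics favors.
func SumPrice(c Columns) float64 {
	var total float64
	for _, p := range c.Price {
		total += p
	}
	return total
}

func main() {
	rows := []Row{{1, 10.5}, {2, 20.0}, {3, 4.5}}
	cols := Transpose(rows)
	fmt.Println(len(cols.ID), SumPrice(cols))
}
```

If the database already produces columnar data and the client consumes columnar data, the `Transpose` step (and its inverse on the server side) is pure overhead, which is the motivation the abstract describes.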

This talk will cover goals, use-cases, and examples of using ADBC to communicate with different Data APIs (such as Snowflake, Flight SQL or postgres) with Arrow Native in-memory data.

Gophers are Climbing the Iceberg!

Apache Iceberg has become a popular topic in data management lately. It’s a high-performance format for huge analytic tables and is used by various big data compute engines, including Spark, Trino, Presto, Snowflake, and Dremio. As a hugely important table format for big data, ETL tools need to be able to read from and write to the Iceberg format. Many ETL systems utilize Go, but the only public APIs for Iceberg have been written in Python or Java.

This session introduces the official Iceberg Go library, a pure Go implementation of the Apache Iceberg table format. I cover Iceberg’s current capabilities, roadmap, and goals. Learn how you can easily integrate Iceberg into your current Go workflows.

From Arrow-Native to Accelerator-Native

You may have heard of Apache Arrow, a standardized in-memory format for tabular data, and how it can improve the performance of processing data. Less well-known is that Arrow can also help system developers make the jump from CPU-based systems to leveraging accelerated hardware such as GPUs for analytics. Why would you want to do this? Because of something we have termed "The Wall".

Follow along with the description of a holistic "machine" designed to support system-level acceleration from compute to memory, networking, and storage. This talk will connect the dots between adopting Arrow standards and leaping over "The Wall": the difference between how fast machine learning systems can compute and how quickly data processing systems can deliver to them.

Embrace the chaos: Composable data systems with fewer asterisks

Have you ever been asked "how do I get this data I need right now?" and the answer was "it's complicated and it depends"? Welcome to The Bad Data Place! Organizations don't store data in just one place anymore, almost always spreading it across many different locations: on-premises databases, multiple clouds, and so many other systems. Dealing with this data sprawl necessitates solutions that can reach across storage layers and databases. Frequently this means dealing with costly migrations, lengthy rewrites, and mazes of glue code. How can we reduce or avoid this? Composability and standards.

This talk will cover a few essential connectivity standards for designing modular and composable data systems that will make the data sprawl feel smaller to your end users. We'll also cover how you can leverage the ecosystem around Apache Arrow to reduce the number of asterisks you'll have to include when explaining how users can retrieve the data they need.

Apache Arrow and Go: A match made in Data

With Apache Arrow fast becoming a standard for working with data, most people are primarily familiar with the Python, C++ and Java libraries. This talk instead focuses on the Golang implementations of Apache Arrow and Parquet. The concurrency primitives in Go make it ideal for constructing efficient pipelines for parallel processing of large amounts of data.

This talk will cover getting started using the Go Arrow and Parquet libraries and building a simple data pipeline. It will touch on reading/writing CSV and Parquet data using the Go Arrow modules along with why you'd want to use Go in the first place as opposed to other languages/implementations.
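As a rough illustration of the kind of pipeline described above, the following sketch uses only the Go standard library (`encoding/csv`, goroutines, channels, and `sync.WaitGroup`) rather than the Go Arrow and Parquet modules. The function names `ReadColumn` and `ParallelSum` and the sample data are assumptions made up for this example. It reads one CSV column into a slice, splits it into batches, and sums the batches concurrently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// ReadColumn parses CSV text with a header row and returns the named
// column as float64 values. Hypothetical helper, standard library only.
func ReadColumn(data, name string) ([]float64, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty input")
	}
	idx := -1
	for i, h := range records[0] {
		if h == name {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("column %q not found", name)
	}
	col := make([]float64, 0, len(records)-1)
	for _, rec := range records[1:] {
		v, err := strconv.ParseFloat(rec[idx], 64)
		if err != nil {
			return nil, err
		}
		col = append(col, v)
	}
	return col, nil
}

// ParallelSum splits col into batches, sums each batch in its own
// goroutine, and combines the partial sums received over a channel.
func ParallelSum(col []float64, batchSize int) float64 {
	if batchSize <= 0 {
		batchSize = 1
	}
	partials := make(chan float64)
	var wg sync.WaitGroup
	for start := 0; start < len(col); start += batchSize {
		end := start + batchSize
		if end > len(col) {
			end = len(col)
		}
		wg.Add(1)
		go func(batch []float64) {
			defer wg.Done()
			var s float64
			for _, v := range batch {
				s += v
			}
			partials <- s
		}(col[start:end])
	}
	// Close the channel once every worker has sent its partial sum.
	go func() {
		wg.Wait()
		close(partials)
	}()
	var total float64
	for s := range partials {
		total += s
	}
	return total
}

func main() {
	data := "id,amount\n1,2.5\n2,4\n3,1.5\n"
	col, err := ReadColumn(data, "amount")
	if err != nil {
		panic(err)
	}
	fmt.Println(ParallelSum(col, 2))
}
```

Each batch is a sub-slice of the same backing array, so handing batches to workers involves no copying. A columnar layout makes the same kind of batch-wise parallelism straightforward.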

ADBC: Arrow Database Connectivity

The Apache Arrow ecosystem lacked standard database interfaces built around using Arrow data, particularly for efficient fetching of large data (i.e., with minimal or no serialization and copying). Without a common API, the end result is a mix of custom protocols (e.g. BigQuery, Snowflake) and adapters (e.g. turbodbc) scattered across languages. ADBC aims to provide a minimal database client API standard, based on Arrow, for C, Go, and Java (with bindings for other languages). Applications can code to this API standard much like they would for JDBC or ODBC, but fetch result sets in the Apache Arrow format.

This talk will cover goals, use-cases, and examples of using ADBC to communicate with different Data APIs (such as Flight SQL or postgres) with Arrow Native in-memory data.

GOing Native with Arrow Flight, Apache Drill and Dremio

While some Cloud Data Lake technologies have great integrations in multiple languages, others are actually a bit difficult to use with Golang as your primary language. This is how and why we pursued Native Golang connectors to Apache Drill and Dremio, as well as a native Golang Apache Arrow Flight server and client implementation. All packages are open source and free for anyone to use!

Iceberg Summit 2025 Sessionize Event

April 2025 San Francisco, California, United States

Øredev 2024 Sessionize Event

November 2024 Malmö, Sweden

Community Over Code NA 2024 Sessionize Event

October 2024 Denver, Colorado, United States

Atlanta Cloud Conference 2024 Sessionize Event

March 2024 Marietta, Georgia, United States

