
Matt Topol


Author of "In-Memory Analytics with Apache Arrow" | Staff Software Engineer at Voltron Data

Norwalk, Connecticut, United States


Hailing from the faraway land of Brentwood, NY and currently residing in the rolling hills of Connecticut, Matt Topol has always been passionate about software. After graduating from Brooklyn Polytechnic (now NYU-Poly), he joined FactSet Research Systems, Inc. in 2009, developing financial software. In the time since, Matt has worked in infrastructure and application development, has led development teams, and has architected large-scale distributed systems for processing analytics on financial data. Matt is a PMC member for the Apache Arrow project, where he frequently enhances the Golang library, among other contributions, and helps grow the Arrow community. Recently, Matt wrote the first and only book on Apache Arrow, "In-Memory Analytics with Apache Arrow", and joined Voltron Data to work on the Apache Arrow libraries full time and grow the Arrow Golang community.

In his spare time, Matt likes to bash his head against a keyboard, develop/run delightfully demented games of fantasy for his victims--er--friends, and share his knowledge with anyone who'll listen to his rants.

Area of Expertise

  • Information & Communications Technology

Topics

  • Open Source Software
  • Enterprise Software
  • golang
  • C++
  • Apache Arrow and Arrow Flight
  • Data Science
  • Analytics and Big Data
  • Analytics
  • Apache Arrow
  • Data Analytics
  • Databases
  • Data Platform
  • All things data
  • Data Engineering
  • open source
  • Apache Iceberg

Where are the Dataframes for Go?

Lately, one of the “must-have” features that leads data scientists to adopt a particular language or toolset is an intuitive and performant data frame library. Existing data frame libraries for Go seem to be inactive and/or unmaintained. Taking inspiration from the Polars library, I’ve put together a data frame library that is also built on the Apache Arrow in-memory columnar format for performance and interoperability. Come along for a deep dive into efficiently working with tables of data in Go. With a bit of luck, and some open source contributions, we can lay a solid data frame foundation on which to build models!

ODBC takes an Arrow to the knee: ADBC

For decades, ODBC/JDBC have been the standard for row-oriented database access. However, modern OLAP systems tend instead to be column-oriented for performance, so requesting data from them through a row-oriented API incurs significant conversion costs. This is where Arrow Database Connectivity comes in!

ADBC is similar to ODBC/JDBC in that it defines a single API which is implemented by drivers to provide access to different databases. The difference is that ADBC's API is defined in terms of the Apache Arrow in-memory columnar format. Applications can code to this standard API much like they would for ODBC or JDBC, but fetch result sets in the Arrow format, avoiding transposition and conversion costs where possible.

This talk will cover goals, use cases, and examples of using ADBC to communicate with different data APIs (such as Snowflake, Flight SQL, or PostgreSQL) with Arrow-native in-memory data.
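To make the driver model concrete, here is a minimal sketch in plain Go. It is not the real ADBC API: `Driver`, `Batch`, `Column`, `rowDriver`, and `printBatch` are hypothetical names, and the "columns" are simple slices rather than Arrow arrays. It shows two ideas from the abstract under those assumptions: application code written once against a single interface, and the transposition work a driver must do when its backend stores rows instead of columns.

```go
package main

import (
	"errors"
	"fmt"
)

// Column is a hypothetical stand-in for an Arrow array: all values of one
// field stored together. Real ADBC drivers return Arrow record batches.
type Column struct {
	Name   string
	Values []any
}

// Batch is a hypothetical columnar result set: a list of equal-length columns.
type Batch struct {
	Columns []Column
}

// Driver is a hypothetical, simplified version of the idea behind ADBC:
// one API, implemented once per database, whose results are columnar.
type Driver interface {
	Query(sql string) (Batch, error)
}

// rowDriver wraps a hypothetical row-oriented backend. Because its native
// format is rows, it must transpose every result into columns, which is the
// conversion cost a column-oriented backend could skip.
type rowDriver struct {
	fields []string
	rows   [][]any
}

// Query ignores the SQL text (this toy backend has no parser) and returns
// every stored row, transposed into columns.
func (d *rowDriver) Query(sql string) (Batch, error) {
	if sql == "" {
		return Batch{}, errors.New("empty query")
	}
	cols := make([]Column, len(d.fields))
	for i, name := range d.fields {
		cols[i] = Column{Name: name, Values: make([]any, 0, len(d.rows))}
	}
	// Transpose: every cell of every row is visited and copied once.
	for _, row := range d.rows {
		for i, v := range row {
			cols[i].Values = append(cols[i].Values, v)
		}
	}
	return Batch{Columns: cols}, nil
}

// printBatch is application code written only against the Driver interface,
// so it works unchanged with any driver that implements it.
func printBatch(d Driver, sql string) error {
	b, err := d.Query(sql)
	if err != nil {
		return err
	}
	for _, c := range b.Columns {
		fmt.Println(c.Name, c.Values)
	}
	return nil
}

func main() {
	d := &rowDriver{
		fields: []string{"id", "city"},
		rows:   [][]any{{1, "Norwalk"}, {2, "Marietta"}},
	}
	if err := printBatch(d, "SELECT id, city FROM places"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Running it prints `id [1 2]` followed by `city [Norwalk Marietta]`. A driver for a columnar backend could hand its data straight through instead of running the transposition loop, which is the saving the abstract describes.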

Gophers are Climbing the Iceberg!

Apache Iceberg has become a popular topic in data management lately. It’s a high-performance format for huge analytic tables and is used by various big data compute engines, including Spark, Trino, Presto, Snowflake, and Dremio. Because Iceberg is such an important table format for big data, ETL tools need to be able to read from and write to it. Many ETL systems are written in Go, but the only public APIs for Iceberg have been written in Python or Java.

This session introduces the official Iceberg Go library, a pure Go implementation of the Apache Iceberg table format. I'll cover the library’s current capabilities, roadmap, and goals. Learn how you can easily integrate Iceberg into your current Go workflows.

From Arrow-Native to Accelerator-Native

You may have heard of Apache Arrow, a standardized in-memory format for tabular data, and how it can improve the performance of processing data. Less well-known is that Arrow can also help system developers make the jump from CPU-based systems to systems that leverage hardware accelerators, such as GPUs, for analytics. Why would you want to do this? Because of something we have termed "The Wall".

Follow along as we describe a holistic "machine" designed to support system-level acceleration, from compute to memory, networking, and storage. This talk will connect the dots between adopting Arrow standards and leaping over "The Wall": the difference between how fast machine learning systems can compute and how quickly data processing systems can deliver data to them.

Embrace the chaos: Composable data systems with fewer asterisks

Have you ever been asked "how do I get this data I need right now?" and the answer was "it's complicated and it depends"? Welcome to The Bad Data Place! Organizations don't store data in just one place anymore; they almost always spread it across many different locations: on-premises databases, multiple clouds, and many other systems. Dealing with this data sprawl requires solutions that can reach across storage layers and databases. Frequently this means dealing with costly migrations, lengthy rewrites, and mazes of glue code. How can we reduce or avoid this? Composability and standards.

This talk will cover a few essential connectivity standards for designing modular and composable data systems that will make the data sprawl feel smaller to your end users. We'll also cover how you can leverage the ecosystem around Apache Arrow to reduce the number of asterisks you'll have to include when explaining how users can retrieve the data they need.

Apache Arrow and Go: A match made in Data

With Apache Arrow fast becoming a standard for working with data, most people are primarily familiar with the Python, C++ and Java libraries. This talk instead focuses on the Golang implementations of Apache Arrow and Parquet. The concurrency primitives in Go make it ideal for constructing efficient pipelines for parallel processing of large amounts of data.

This talk will cover getting started using the Go Arrow and Parquet libraries and building a simple data pipeline. It will touch on reading/writing CSV and Parquet data using the Go Arrow modules along with why you'd want to use Go in the first place as opposed to other languages/implementations.

ADBC: Arrow Database Connectivity

The Apache Arrow ecosystem lacked standard database interfaces built around using Arrow data, particularly for efficient fetching of large data (i.e., with minimal or no serialization and copying). Without a common API, the end result is a mix of custom protocols (e.g., BigQuery, Snowflake) and adapters (e.g., turbodbc) scattered across languages. ADBC aims to provide a minimal database client API standard, based on Arrow, for C, Go, and Java (with bindings for other languages). Applications can code to this API standard much like they would for JDBC or ODBC, but fetch result sets in the Apache Arrow format.

This talk will cover goals, use cases, and examples of using ADBC to communicate with different data APIs (such as Flight SQL or PostgreSQL) with Arrow-native in-memory data.

GOing Native with Arrow Flight, Apache Drill and Dremio

While some cloud data lake technologies have great integrations in multiple languages, others are actually a bit difficult to use with Golang as your primary language. This talk explains how and why we pursued native Golang connectors to Apache Drill and Dremio, as well as a native Golang Apache Arrow Flight server and client implementation. All packages are open source and free for anyone to use!

Atlanta Cloud Conference 2024 Sessionize Event

March 2024, Marietta, Georgia, United States
