Alex Merced

Co-Author of O'Reilly's "Apache Iceberg: The Definitive Guide"

New York City, New York, United States

Alex Merced is Head of DevRel at Dremio and co-author of O'Reilly's "Apache Iceberg: The Definitive Guide." He has worked as a developer and instructor for companies such as GenEd Systems, Crossfield Digital, CampusGuard, and General Assembly.
Alex is passionate about technology and shares technical content through blogs, videos, and his podcasts Datanation and Web Dev 101. He has contributed a variety of libraries in the JavaScript and Python ecosystems, including SencilloDB, CoquitoJS, dremio-simple-query, and more.

Badges

  • Most Active Speaker 2024

Area of Expertise

  • Business & Management
  • Information & Communications Technology

Topics

  • Data Engineering
  • Data Lake
  • Data Lakehouse
  • Apache Iceberg
  • Project Nessie
  • Apache Arrow
  • Dremio
  • Data Analytics
  • Data Science
  • Database
  • Data Warehouse
  • Web Development
  • JavaScript
  • Rust
  • Go
  • DevOps
  • Kubernetes
  • Cloud & DevOps
  • Python

IceFrame: A Python Toolkit for Building and Operating Iceberg Data Pipelines

IceFrame is a Python library that wraps Apache Iceberg with a clean, high-level API. It removes boilerplate from common tasks such as creating tables, reading and writing data, evolving schemas, managing partitions, branching, and rolling back snapshots. It also adds features the core Iceberg libraries do not provide, including compaction tools, partition evolution helpers, async operations, and a natural-language agent that can inspect schemas and generate Python code.
This talk shows how IceFrame streamlines daily work for engineers who build or maintain Iceberg pipelines and gives direct examples taken from real scripts and workflows.
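
Since IceFrame's own API is not documented here, the sketch below only illustrates the kind of high-level wrapper the talk describes, layered over pyiceberg; the class name, method names, and catalog settings are assumptions, not the library's actual interface.

    from pyiceberg.catalog import load_catalog

    class IceFrameSketch:
        """Hypothetical high-level wrapper over pyiceberg (illustrative only)."""

        def __init__(self, catalog_name="default", **catalog_props):
            # e.g. type="rest", uri="http://localhost:8181" (placeholder endpoint)
            self.catalog = load_catalog(catalog_name, **catalog_props)

        def read(self, identifier):
            # Load an Iceberg table and materialize it as a pandas DataFrame.
            return self.catalog.load_table(identifier).scan().to_pandas()

        def append(self, identifier, arrow_table):
            # Append a pyarrow Table to an existing Iceberg table in one call.
            self.catalog.load_table(identifier).append(arrow_table)

Hiding catalog loading, scan planning, and commit handling behind a couple of methods is the kind of boilerplate removal the abstract refers to.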

Why Open, Community-Driven Projects Become the Standards

This talk explains why technical standards almost always come from open, community-run projects. Companies may release early tools that gain attention, but broad adoption depends on trust, shared ownership, and stable governance. When two open-source projects compete in the same space, the one with a transparent process and a diverse contributor base tends to win. It moves faster, survives leadership changes, and reflects real user needs rather than a single vendor’s roadmap.

We look at patterns from past ecosystems. Projects that rely on one sponsor often stall when priorities shift. Projects that invite many contributors build deeper support. Users adopt them because they want control and long-term safety. Vendors adopt them because they reduce risk and integrate cleanly with other tools. Over time, this aligns everyone around the same project. That project becomes the default, not because it had the best start, but because it created the broadest seat at the table.

How Apache Iceberg is used for Query Acceleration through Dremio’s Reflections

Join us for an insightful exploration into the powerful synergy between Dremio's Reflections and Apache Iceberg, revolutionizing query acceleration in modern data analytics. Reflections, a cutting-edge feature of Dremio, eliminates the complexities of traditional materialized views, BI extracts, and cubes, while seamlessly integrating with Iceberg's robust table format. This talk delves into how Reflections dynamically create materialized datasets from diverse data lake sources, leveraging Iceberg's features like partition transforms for optimal acceleration. We'll discuss the customization options available, showcasing real-world examples of how this integration significantly enhances query performance and simplifies data processing workflows. Don't miss this opportunity to discover how Reflections and Iceberg collaborate to reshape the landscape of data query acceleration.

Exploring the Apache Iceberg Ecosystem

This talk provides a concise overview of the Apache Iceberg ecosystem, a pivotal component in the evolution of open lakehouse architectures. We'll delve into key aspects:

Querying Tools: Learn about efficient tools for querying Apache Iceberg tables, enhancing data querying within the open lakehouse framework.

Cataloging Vendors: Discover vendors offering solutions for cataloging and managing Apache Iceberg tables, crucial for maintaining metadata and data organization.

Unique Use Cases: Explore innovative uses of Apache Iceberg tables in various products and technologies, showcasing their adaptability.

Open-Source Projects: Uncover valuable open-source projects that complement Apache Iceberg, expanding its functionality and adoption.

Join us to gain insights into this essential ecosystem, whether you're a data engineer, analyst, or architect, and harness Apache Iceberg's potential in your lakehouse journey.

The Anatomy of a Data Lakehouse

Alex Merced discusses the value proposition of a data lakehouse and the components of building a successful one. We'll cover the role each component plays in delivering cost savings through increased performance and reduced storage costs, and the options within each category:
- file format
- table format
- query engines
- semantic layer

Apache Iceberg: An Architectural Look Under the Covers

Data lakes have been built with a desire to democratize data - to allow more and more people, tools, and applications to make use of it. A key capability needed to achieve this is hiding the complexity of underlying data structures and physical data storage from users. The de facto standard has been the Hive table format, released by Facebook in 2009, which addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.

Apache Iceberg table format is now in use and contributed to by many leading tech companies like Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.

Join Alex Merced, Developer Advocate at Dremio, in this webinar to learn the architectural details of why the Hive table format falls short, how the Iceberg table format resolves these issues, and the benefits that stem from Iceberg's approach.

You will learn:

- The issues that arise when using the Hive table format at scale, and why we need a new table format
- How a straightforward, elegant change in table format structure has enormous positive effects
- The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table's underlying structure changes as CRUD operations are done on it (sketched briefly below)
- The resulting benefits of this architectural design
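
As a rough preview of that point about table structure, the snippet below uses pyiceberg to walk a table's snapshot history and current metadata; the catalog URI and table name are placeholders.

    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("default", type="rest", uri="http://localhost:8181")  # placeholder endpoint
    table = catalog.load_table("analytics.events")  # placeholder table

    # Every commit (insert, update, delete, compaction) adds a snapshot that points
    # at a manifest list, which in turn points at manifest files and data files.
    for snap in table.snapshots():
        print(snap.snapshot_id, snap.timestamp_ms, snap.manifest_list)

    # The current metadata also carries the schema, partition spec, and active snapshot.
    print(table.schema())
    print(table.spec())
    print(table.current_snapshot())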

Open Source and the Data Lakehouse

The open data lakehouse gives those frustrated with the costs and complex pipelines of traditional warehouses an alternative that combines performance with affordability and simpler pipelines. In this talk, we'll cover the technologies that are making the open data lakehouse possible.

In this talk we will learn:

- What is a data lakehouse
- What are the components of a data lakehouse
- What is Apache Arrow
- What is Apache Iceberg
- What is Project Nessie

Apache Arrow Flight SQL: a universal standard for high-performance data transfers from databases

This talk covers why ODBC & JDBC don't cut it in today's data world and the problems solved by Arrow, Arrow Flight, and Arrow Flight SQL. We'll go through how each of these building blocks works, along with an overview of universal ODBC & JDBC drivers built on Arrow Flight SQL that let clients take advantage of this increased performance with zero application changes.
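
As a minimal sketch of the client side, the snippet below queries a Flight SQL endpoint through the ADBC Flight SQL driver for Python; the URI and query are placeholders, and TLS and authentication options are omitted.

    import adbc_driver_flightsql.dbapi as flight_sql

    # Connect to any engine that speaks Flight SQL (placeholder endpoint).
    with flight_sql.connect("grpc://localhost:32010") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 AS x")      # any SQL the server supports
            print(cur.fetch_arrow_table())    # results arrive as Arrow data, not row-by-row marshalling

Because the client keeps a standard DB-API surface while the transport moves to Arrow, applications get the performance gain without code changes, which is the same idea behind the universal ODBC & JDBC drivers mentioned above.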

Lakehouse Catalogs 101 - Governing and Transporting your Iceberg, Delta and Hudi tables

Join Alex Merced, Senior Technical Evangelist at Dremio, as he explores one of the most critical frontiers in the lakehouse ecosystem: catalogs. As the industry embraces the lakehouse paradigm and a variety of table formats like Iceberg, Hudi, and Delta, the next key challenge is understanding the role of lakehouse catalogs. These catalogs govern and track your lakehouse assets, providing essential metadata and ensuring smooth management across different compute engines. In this talk, Alex will demystify leading catalog solutions such as Apache Polaris (incubating), Nessie, Unity Catalog, Gravitino, Dremio Catalog, and AWS Glue, and guide you through navigating this evolving landscape to effectively manage your lakehouse.
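
To make the catalog's role concrete, here is a minimal sketch of connecting to an Iceberg REST-compatible catalog (the pattern several of the catalogs above expose) with pyiceberg; the URI, warehouse, namespace, and table names are placeholders, and credential settings are omitted.

    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "lakehouse",
        type="rest",
        uri="https://catalog.example.com/api/catalog",  # placeholder endpoint
        warehouse="my_warehouse",                        # placeholder warehouse
    )

    print(catalog.list_namespaces())           # what the catalog governs
    print(catalog.list_tables("analytics"))    # placeholder namespace
    table = catalog.load_table("analytics.events")
    print(table.current_snapshot())            # metadata the catalog tracks per table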

Apache Iceberg, Agentic AI and Data Integration

The ability to integrate and unify diverse data sources is more critical than ever in the world of Agentic AI. Apache Iceberg provides a powerful foundation for building a central source of truth, ensuring reliable, scalable, and efficient data management. When combined with Dremio’s ability to integrate Iceberg with a long tail of disparate data sources, organizations can seamlessly deliver high-quality, well-prepared data to Agentic AI tools.

We’ll explore how Apache Iceberg enables consistent and performant data access, how Dremio unifies Iceberg with other data in Data Lakes/Databases/Data Warehouses, and how these capabilities empower AI frameworks like LangChain to drive intelligent automation. Through a hands-on example, we’ll demonstrate how to query and transform data stored in Iceberg, making it AI-ready and unlocking the full potential of Agentic AI applications. If you're looking to build a modern data architecture that fuels AI with governed, high-quality data, this talk is for you.
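
As a small illustration of the "AI-ready" step (the agent wiring itself is out of scope here), the sketch below filters and projects an Iceberg table with pyiceberg and materializes it as a DataFrame that a framework like LangChain could expose to an agent as context or a tool result; the catalog settings, table, and column names are assumptions.

    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("default", type="rest", uri="http://localhost:8181")  # placeholder
    orders = catalog.load_table("sales.orders")  # placeholder table

    # Push the filter and projection down to the table format, then materialize
    # a governed, well-shaped frame for the agent to reason over.
    df = orders.scan(
        row_filter="order_date >= '2024-01-01'",
        selected_fields=("order_id", "customer_id", "amount"),
    ).to_pandas()

    print(df.describe())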

The Who, What, and Why of Data Lake Table Formats

Data lake table formats are a critical component of modern data analytics. They provide a way to organize and manage data in a data lake, and they offer several benefits for business analytics, including:

- Scalability: Data lake table formats can scale to handle large amounts of data.

- Performance: Data lake table formats can improve the performance of queries on large datasets.

- Durability: Data lake table formats can ensure that data is durable and recoverable.

- Auditability: Data lake table formats can help to ensure that data is auditable and compliant.

This presentation will explore the who, what, and why of data lake table formats. We will discuss the different data lake table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake. We will also discuss the benefits of using data lake table formats for business analytics.

By the end of this presentation, you will better understand data lake table formats and how they can be used to improve business analytics.

Key takeaways:
- Data lake table formats are a critical component of modern data analytics.

- They offer a number of benefits for business analytics, including scalability, performance, durability, and auditability.

- There are a variety of data lake table formats available, including Apache Iceberg, Apache Hudi, and Delta Lake.

A comprehensive exploration of the intricacies of Data Lake Table Formats and their impact on business analytics.

Optimizing Data: Partitioning, Sorting, Compaction, Row Group Sizing, and more

Data optimization is a critical process for improving the performance and efficiency of data-driven applications. Several techniques can be used to optimize data, including partitioning, sorting, compaction, and row group sizing.

In this presentation, we will explore the myriad ways of optimizing data. We will discuss the different techniques available and the benefits and drawbacks of each technique. We will also provide practical advice on choosing the right optimization techniques for your needs.

By the end of this presentation, you will better understand data optimization and how it can be used to improve the performance and efficiency of your data-driven applications.

Key takeaways:

- Data optimization is a critical process for improving the performance and efficiency of data-driven applications.

- Several techniques can be used to optimize data, including partitioning, sorting, compaction, and row group sizing.

- The best optimization techniques for a particular dataset will depend on the application's specific requirements.

Data optimization can be a complex process, but it can be well worth the effort in improving performance and efficiency.
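
As one concrete example of the file-level knobs involved, the sketch below sorts a dataset and controls Parquet row-group size with pyarrow; the column names, file path, and sizes are illustrative.

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({
        "event_time": [3, 1, 2],
        "user_id": ["c", "a", "b"],
    })

    # Sorting on a commonly filtered column tightens per-row-group min/max statistics,
    # so engines can skip more data at read time.
    table = table.sort_by([("event_time", "ascending")])

    # row_group_size is measured in rows: larger groups mean fewer, bigger I/O units,
    # while smaller groups allow finer-grained skipping.
    pq.write_table(table, "events.parquet", row_group_size=100_000, compression="zstd")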

Materialized Views vs Dremio Data Reflections

A comparative study between Materialized Views and Dremio Data Reflections, highlighting their applications and benefits.

Materialized views and Dremio data reflections are both techniques for pre-computing data queries to improve the performance of subsequent queries. However, there are some critical differences between the two techniques.
Materialized views are a traditional database concept, and they are typically implemented as tables populated with the results of pre-computed queries.

Dremio data reflections, on the other hand, are Apache Iceberg representations of a view of the raw data or of aggregations, which can apply custom sorting, partitioning, and other optimizations.

We will discuss the challenges of Materialized views and how Data Reflections meet these challenges with a more flexible and robust approach unique to the Dremio Data Lakehouse platform.

This presentation will also explore the different types of Reflections:

- Raw reflections: These reflections consist of all of the rows and one or more fields of the underlying table or view that they are created from. They can be customized by vertically partitioning data (choosing a subset of fields), horizontally partitioning the data (by defining one or more columns to be partition keys), and sorting the data on one or more fields.

- Aggregation reflections: These reflections accelerate BI-style queries that involve aggregations (GROUP BY queries). They can also be configured to work on a subset of the fields of a data source.

Benefits of using Dremio data reflections:

- Improved performance: Dremio data reflections can significantly improve the performance of queries by pre-computing the results of those queries.

- Flexibility: Dremio data reflections can be dynamically generated based on the results of user queries, making them more flexible than traditional materialized views.

- Scalability: Dremio data reflections can be scaled to handle large datasets.

DataTune 2024 Sessionize Event

March 2024 Nashville, Tennessee, United States

Orlando Code Camp 2024 Sessionize Event

February 2024 Sanford, Florida, United States

Open Source Analytics Conference 2023 Sessionize Event

December 2023
