Alex Merced
Senior Tech Evangelist for Dremio
Atlanta, Georgia, United States
Alex Merced is a Senior Tech Evangelist for Dremio and has worked as a developer and instructor for companies such as GenEd Systems, Crossfield Digital, CampusGuard, and General Assembly.
Alex is passionate about technology and has published tech content across blogs, videos, and his podcasts Datanation and Web Dev 101. He has contributed a variety of libraries to the JavaScript and Python ecosystems, including SencilloDB, CoquitoJS, dremio-simple-query, and more.
The Power of dbt and Dremio in Implementing a Data Lakehouse
"The Power of dbt and Dremio in Implementing a Data Lakehouse" is a forward-looking talk designed to unpack the data lakehouse concept, elucidate Dremio's integral role in this architecture, and demonstrate how dbt (data build tool) can streamline the creation and management of your lakehouse's semantic layer.
We begin with an introduction to the data lakehouse model, highlighting its significance in fostering data democratization, enhancing analytics capabilities, and providing a unified platform for machine learning and BI applications. This section sets the stage for a deeper dive into how Dremio facilitates the transition to a data lakehouse by offering a direct query layer that bridges the gap between raw data in data lakes and refined insights.
The focus then shifts to the transformative impact of dbt in this ecosystem. dbt empowers data teams to define, deploy, and manage complex data transformation workflows directly within the data lakehouse, promoting a code-first approach to building a semantic layer. Through practical examples, attendees will learn how dbt's modular, version-controlled, and collaborative development environment complements Dremio's capabilities, making the process of curating and managing the semantic layer more efficient and scalable.
By attending this talk, participants will gain a comprehensive understanding of the data lakehouse architecture, appreciate Dremio's pivotal role in enabling such an ecosystem, and discover how dbt can simplify the curation of the semantic layer, leading to a more agile, manageable, and powerful data analytics platform. Whether you're a data engineer, data analyst, or data architect, this session will equip you with the knowledge and strategies to leverage dbt and Dremio in realizing the full potential of your data lakehouse.
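The abstract above stays at the concept level, so here is a minimal, hypothetical sketch of what a semantic-layer model can look like with the dbt-dremio adapter: a dbt model is just a SQL file, and dbt-core 1.5+ exposes a programmatic runner from Python. It assumes an existing dbt project with a Dremio profile configured and a `bronze.orders` source declared in a schema.yml; every name and path below is invented.

```python
from pathlib import Path

from dbt.cli.main import dbtRunner  # programmatic entry point, dbt-core >= 1.5

# Hypothetical model: a semantic-layer view over a raw Iceberg table exposed
# through Dremio. The config() block asks the adapter to materialize it as a
# Dremio view (virtual dataset) rather than a physical table.
model_sql = """
{{ config(materialized='view') }}

select
    customer_id,
    date_trunc('month', order_ts) as order_month,
    sum(order_total)              as monthly_revenue
from {{ source('bronze', 'orders') }}   -- source assumed declared in schema.yml
group by 1, 2
"""

Path("models/marts").mkdir(parents=True, exist_ok=True)
Path("models/marts/monthly_revenue.sql").write_text(model_sql)

# Equivalent to running `dbt run --select monthly_revenue` inside the project.
result = dbtRunner().invoke(["run", "--select", "monthly_revenue"])
print("success:", result.success)
```

Because the model is version-controlled SQL, the semantic layer itself becomes reviewable, testable code rather than a pile of hand-built views.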
How Apache Iceberg is used for Query Acceleration through Dremio’s Reflections
Join us for an insightful exploration into the powerful synergy between Dremio's Reflections and Apache Iceberg, revolutionizing query acceleration in modern data analytics. Reflections, a cutting-edge feature of Dremio, eliminates the complexities of traditional materialized views, BI extracts, and cubes, while seamlessly integrating with Iceberg's robust table format. This talk delves into how Reflections dynamically create materialized datasets from diverse data lake sources, leveraging Iceberg's features like partition transforms for optimal acceleration. We'll discuss the customization options available, showcasing real-world examples of how this integration significantly enhances query performance and simplifies data processing workflows. Don't miss this opportunity to discover how Reflections and Iceberg collaborate to reshape the landscape of data query acceleration.
Building a Data Lakehouse with Dremio, DBT, and Apache Iceberg
Organizations are increasingly moving towards data lakehouse architectures to combine the flexibility and scalability of data lakes with the management features of traditional data warehouses. This talk introduces a cutting-edge approach to building a data lakehouse utilizing Dremio, DBT (Data Build Tool), and Apache Iceberg, offering attendees a comprehensive blueprint for implementing a scalable, efficient, and cost-effective data platform.
We'll start by exploring the fundamentals of the data lakehouse architecture and the unique benefits it provides over conventional data storage solutions. The focus will then shift to how Dremio acts as a core engine, enabling lightning-fast query performance directly on data lake storage, thus eliminating the need for costly and complex data movement and duplication.
The integration of DBT with Dremio will be a major highlight, detailing the "why" and "how" behind leveraging DBT for data transformation within the Dremio environment. We'll discuss how this combination facilitates a more agile and collaborative workflow among data teams, streamlines the development process, and ensures data quality and reliability across the enterprise.
Apache Iceberg's role in this architecture will also be dissected, illustrating how its open table format plays a pivotal role in managing large-scale analytic datasets with high concurrency and providing schema evolution without performance penalties.
Attendees will leave with actionable insights on:
- Designing a scalable data lakehouse architecture that aligns with business goals.
- Leveraging Dremio for optimized query performance and cost savings.
- Implementing DBT in a Dremio environment to enhance data transformation processes.
- Utilizing Apache Iceberg to manage data at scale and ensure consistency and reliability.
Exploring the Apache Iceberg Ecosystem
This talk provides a concise overview of the Apache Iceberg ecosystem, a pivotal component in the evolution of open lakehouse architectures. We'll delve into key aspects:
- Querying Tools: Learn about efficient tools for querying Apache Iceberg tables, enhancing data querying within the open lakehouse framework.
- Cataloging Vendors: Discover vendors offering solutions for cataloging and managing Apache Iceberg tables, crucial for maintaining metadata and data organization.
- Unique Use Cases: Explore innovative uses of Apache Iceberg tables in various products and technologies, showcasing their adaptability.
- Open-Source Projects: Uncover valuable open-source projects that complement Apache Iceberg, expanding its functionality and adoption.
Join us to gain insights into this essential ecosystem, whether you're a data engineer, analyst, or architect, and harness Apache Iceberg's potential in your lakehouse journey.
Best Practices for Building an Iceberg Data Lakehouse with Dremio
This presentation will cover best practices for building an Iceberg lakehouse, including ingesting data into Iceberg, automating table optimization, creating virtual datasets, and using Git for Data for catalog versioning.
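The abstract above is brief, so here is a rough, hypothetical sketch of the automated table optimization it refers to, issued from Python through any DB-API cursor on a Dremio connection (for example, the Arrow Flight SQL connection shown later in this profile). The table name and retention cutoff are invented, and the exact OPTIMIZE/VACUUM syntax and options should be confirmed against the Dremio documentation for your version.

```python
def maintain_orders_table(cur) -> None:
    """Routine Iceberg maintenance on a Dremio table (illustrative only).

    `cur` is any DB-API cursor on a Dremio connection; the table name and
    the snapshot-retention cutoff below are hypothetical.
    """
    # Compact many small data files into fewer, larger ones.
    cur.execute("OPTIMIZE TABLE lakehouse.sales.orders")

    # Expire old snapshots so unreferenced data and metadata files can be
    # cleaned up and table metadata stays small.
    cur.execute(
        "VACUUM TABLE lakehouse.sales.orders "
        "EXPIRE SNAPSHOTS older_than '2024-01-01 00:00:00.000'"
    )
```

In practice this kind of maintenance is scheduled rather than run by hand.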
Sub-second Power BI Dashboards Directly on the Data in Your ADLS Storage Using Dremio
Unleash the full potential of your data with our talk, "Sub-second Power BI Dashboards Directly on the Data in Your ADLS Storage Using Dremio." In this engaging discussion, we'll dive into the challenges that arise when using extracts and cubes to accelerate Power BI dashboards, and how Dremio, with its data reflections, simplifies the process while preserving consistency.
Traditional methods of accelerating Power BI dashboards often involve the creation of extracts and cubes. While these techniques can enhance query performance, they introduce complexities in data preparation, maintenance, and synchronization. The quest for faster dashboards often results in a trade-off between speed and consistency.
Enter Dremio, a game-changer in the world of data acceleration. In this talk, we'll explore how Dremio's approach to data reflections revolutionizes the way you create dashboards directly on your data stored in Azure Data Lake Storage (ADLS). Key highlights include:
- A comprehensive examination of the complexities and consistency issues associated with extract and cube-based approaches for accelerating Power BI dashboards.
- How Dremio's data reflections eliminate the need for extracts and cubes by providing sub-second query performance directly on your data in ADLS.
- Real-world examples showcasing how Dremio empowers data professionals to create high-performance dashboards while maintaining data consistency.
- The benefits of a simplified and more agile approach to data acceleration, resulting in faster insights and reduced overhead.
Join us to discover how Dremio's data reflections can redefine your Power BI dashboard acceleration strategy. Say goodbye to the complexities of extracts and cubes, and embrace a more direct, efficient, and consistent way to access and visualize your data. Don't miss this opportunity to learn how Dremio can supercharge your analytics workflow.
Connecting Your On-Premise Data to the World of Data Sharing
Whether you have an on-prem Hadoop cluster or on-prem SQL Server estates, there is a world of third-party data marketplaces in the cloud with data you may want to integrate to enrich your on-prem data.
In this session you'll learn:
- What is Dremio
- How Dremio can extend your on-prem data to more easily take advantage of cloud data marketplaces
The Ins & Outs of Data Lakehouse Versioning at the File, Table, and Catalog Level
Data lakehouse versioning is a critical technique for ensuring the accuracy and reliability of data in a data lakehouse. It allows you to track changes to data over time, which can be helpful for troubleshooting problems, auditing data, and reproducing experiments.
This presentation will explore the ins and outs of data lakehouse versioning. We will discuss the different levels of versioning, including catalog, file, and table-level versioning. We will also discuss the benefits of data lakehouse versioning and the pros and cons of each type of versioning.
By the end of this presentation, you will have a better understanding of data lakehouse versioning and how it can be used to improve the accuracy and reliability of your data.
Key takeaways:
- Data lakehouse versioning is a critical technique for ensuring the accuracy and reliability of data in a data lakehouse.
- There are three levels of data lakehouse versioning: catalog-, file-, and table-level versioning.
- Each type of versioning has its own benefits and drawbacks.
- Data lakehouse versioning can be used to troubleshoot problems, audit data, and reproduce experiments.
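To make the three levels concrete, here is a small, hypothetical sketch (not part of the talk) of reading a table at a specific Iceberg snapshot (table level) and on a Nessie branch (catalog level) through Dremio SQL; file-level versioning is simply the immutability of the underlying data files, so there is nothing to query for it directly. The snapshot ID, branch, and table names are invented, and the AT syntax may differ by Dremio version.

```python
def show_versioning_levels(cur) -> None:
    """Illustrative only. `cur` is a DB-API cursor on a Dremio connection.

    File level:    data files (e.g., Parquet) are immutable; a new version
                   means new files, so versioning happens above this layer.
    Table level:   Iceberg snapshots enable time travel on a single table.
    Catalog level: a Nessie-backed catalog versions all tables together via
                   commits, branches, and tags.
    """
    # Table level: read the table as of a specific Iceberg snapshot.
    cur.execute(
        "SELECT COUNT(*) FROM lakehouse.sales.orders "
        "AT SNAPSHOT '8318152991234567890'"
    )
    print("rows at snapshot:", cur.fetchone()[0])

    # Catalog level: read the same table as it exists on a Nessie branch.
    cur.execute("SELECT COUNT(*) FROM lakehouse.sales.orders AT BRANCH etl_test")
    print("rows on branch:", cur.fetchone()[0])
```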
The Anatomy of a Data Lakehouse
Alex Merced discusses the value proposition of a data lakehouse and the components of building a successful one. We'll discuss the role each component plays in delivering cost savings through increased performance and reduced storage, as well as the options within each category.
- File format
- Table format
- Query engines
- Semantic layer
Cloud Data Lakehouse with Apache Iceberg and Project Nessie
Alex Merced will discuss how open-source technologies like Apache Iceberg and Project Nessie are expanding what is possible with a data lakehouse and how platforms like Dremio make it easy to take advantage of these innovations.
Data as Code: Project Nessie brings a Git-like experience for Apache Iceberg Tables
Multi-table transactions have existed in data warehouses for some time, but with the open source Project Nessie, multi-table transactions and an innovative git-like experience become available to data lakehouses. In this session, learn how Project Nessie enables the new “Data as Code” paradigm allowing for workload isolation, multi-table transactions and experimentation when working with Apache Iceberg tables.
In this session you'll learn about the Data as Code paradigm, what the open source Project Nessie is, and the new patterns in data engineering it enables.
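As a rough sketch of that workflow (isolate writes on a branch, then publish them as a single atomic, multi-table commit), the hypothetical snippet below issues Nessie-aware Dremio SQL from Python; the catalog source, branch, and table names are invented, and the exact DML-on-a-branch syntax should be verified against your Dremio and Nessie versions.

```python
def load_on_branch_then_merge(cur) -> None:
    """Data-as-code sketch: write in isolation, merge atomically.

    `cur` is a DB-API cursor on a Dremio connection whose `nessie` source is
    a Nessie-backed (versioned) catalog. All names are hypothetical.
    """
    # 1. Create an isolated branch off main for this load.
    cur.execute("CREATE BRANCH etl_2024_06_01 IN nessie")

    # 2. Write to one or more tables on that branch; readers of main see
    #    nothing yet. (USE BRANCH ... IN nessie is a session-level alternative.)
    cur.execute(
        "INSERT INTO nessie.sales.orders AT BRANCH etl_2024_06_01 "
        "SELECT * FROM staging.new_orders"
    )
    cur.execute(
        "INSERT INTO nessie.sales.order_items AT BRANCH etl_2024_06_01 "
        "SELECT * FROM staging.new_order_items"
    )

    # 3. After validation, publish both tables in one atomic multi-table commit.
    cur.execute("MERGE BRANCH etl_2024_06_01 INTO main IN nessie")
```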
Apache Iceberg: An Architectural Look Under the Covers
Data lakes have been built with a desire to democratize data: to allow more and more people, tools, and applications to make use of it. A key capability needed to achieve this is hiding the complexity of underlying data structures and physical data storage from users. The de facto standard has been the Hive table format, released by Facebook in 2009, which addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.
The Apache Iceberg table format is now used and contributed to by many leading tech companies, including Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.
Watch Alex Merced, Developer Advocate at Dremio, in this webinar to learn the architectural details of why the Hive table format falls short, how the Iceberg table format resolves those issues, and the benefits that stem from Iceberg's approach.
You will learn:
- The issues that arise when using the Hive table format at scale, and why we need a new table format
- How a straightforward, elegant change in table format structure has enormous positive effects
- The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table's underlying structure changes as CRUD operations are done on it
- The resulting benefits of this architectural design
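For a hands-on preview of that structure, the hedged sketch below walks an Iceberg table's metadata tree with PyIceberg (illustrative only, not part of the webinar); it assumes a catalog named `lake` is configured in `~/.pyiceberg.yaml`, and the table name is invented.

```python
from pyiceberg.catalog import load_catalog

# Assumes a catalog named "lake" is configured via ~/.pyiceberg.yaml or
# PYICEBERG_* environment variables; "sales.orders" is a made-up table.
catalog = load_catalog("lake")
table = catalog.load_table("sales.orders")

# Top of the tree: current table metadata, schema, and partition spec
# (hidden partitioning lives here, not in directory names).
print(table.schema())
print(table.spec())

# Each commit produces a snapshot pointing at a manifest list, which in turn
# points at manifests that track the data files and their column statistics.
for snap in table.snapshots():
    print(snap.snapshot_id, snap.timestamp_ms, snap.manifest_list)

# Recent PyIceberg versions can also expose this as Arrow tables, e.g.:
# print(table.inspect.snapshots())
```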
Open Source and the Data Lakehouse
The open data lakehouse offers those frustrated with the costs and complex pipelines of using traditional warehouses an alternative that offers performance with affordability and simpler pipelines. In this talk, we'll be talking about technologies that are making the open data lakehouse possible.
In this talk we will learn:
- What is a data lakehouse
- What are the components of a data lakehouse
- What is Apache Arrow
- What is Apache Iceberg
- What is Project Nessie
Apache Arrow Flight SQL: a universal standard for high-performance data transfers from databases
This talk covers why ODBC & JDBC don't cut it in today's data world and the problems solved by Arrow, Arrow Flight, and Arrow Flight SQL. We'll go through how each of these building blocks works, as well as an overview of universal ODBC & JDBC drivers built on Arrow Flight SQL, enabling clients to take advantage of this increased performance with zero application changes.
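As a hedged illustration (not from the talk itself), the sketch below queries a Flight SQL endpoint from Python with the ADBC Flight SQL driver and receives results as Arrow end to end, which is where the gain over row-at-a-time ODBC/JDBC comes from. It assumes a server that speaks Flight SQL (Dremio exposes one, typically on port 32010) plus the `adbc-driver-flightsql` and `pyarrow` packages; the URI, token, and auth option names are placeholders that can differ by server and driver version.

```python
import adbc_driver_flightsql.dbapi as flight_sql

# Placeholder endpoint and credentials.
conn = flight_sql.connect(
    "grpc+tls://data.example.com:32010",
    db_kwargs={
        # Auth options vary by server/driver version; a bearer-token
        # authorization header is one common approach.
        "adbc.flight.sql.authorization_header": "Bearer <personal-access-token>",
    },
)

cur = conn.cursor()
cur.execute("SELECT 1 AS ok")

# Results arrive as Arrow record batches, so there is no row-by-row
# marshalling between the wire format and the client representation.
print(cur.fetch_arrow_table())

cur.close()
conn.close()
```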
The Who, What, and Why of Data Lake Table Formats
Data lake table formats are a critical component of modern data analytics. They provide a way to organize and manage data in a data lake, and they offer several benefits for business analytics, including:
- Scalability: Data lake table formats can scale to handle large amounts of data.
- Performance: Data lake table formats can improve the performance of queries on large datasets.
- Durability: Data lake table formats can ensure that data is durable and recoverable.
- Auditability: Data lake table formats can help to ensure that data is auditable and compliant.
This presentation will explore the who, what, and why of data lake table formats. We will discuss the different data lake table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake. We will also discuss the benefits of using data lake table formats for business analytics.
By the end of this presentation, you will better understand data lake table formats and how they can be used to improve business analytics.
Key takeaways:
- Data lake table formats are a critical component of modern data analytics.
- They offer a number of benefits for business analytics, including scalability, performance, durability, and auditability.
- There are a variety of data lake table formats available, including Apache Iceberg, Apache Hudi, and Delta Lake.
A comprehensive exploration of the intricacies of Data Lake Table Formats and their impact on business analytics.
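To ground the "what," here is a minimal, hypothetical PySpark sketch of creating and writing to a partitioned Apache Iceberg table; it assumes a Spark session launched with the Iceberg runtime and a catalog named `lake` already configured, and every table and column name is invented.

```python
from pyspark.sql import SparkSession

# Assumes Spark was started with the Iceberg runtime jar and a catalog named
# "lake" configured (spark.sql.catalog.lake = org.apache.iceberg.spark.SparkCatalog, ...).
spark = SparkSession.builder.appName("table-format-demo").getOrCreate()

# A table format owns the mapping from a logical table to physical files:
# schema, partitioning, and per-file statistics all live in table metadata.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.web.events (
        event_id   BIGINT,
        user_id    BIGINT,
        event_ts   TIMESTAMP,
        event_type STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))  -- hidden partitioning: no extra date column
""")

# Writers simply commit rows; readers filtering on event_ts get partition and
# file pruning automatically from the table metadata.
spark.sql("""
    INSERT INTO lake.web.events
    VALUES (1, 42, TIMESTAMP '2024-06-01 10:00:00', 'click')
""")
```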
Optimizing Data: Partitioning, Sorting, Compaction, Row Group Sizing, and more
Data optimization is a critical process for improving the performance and efficiency of data-driven applications. Several techniques can be used to optimize data, including partitioning, sorting, compaction, and row group sizing.
In this presentation, we will explore the myriad ways of optimizing data. We will discuss the different techniques available and the benefits and drawbacks of each technique. We will also provide practical advice on choosing the right optimization techniques for your needs.
By the end of this presentation, you will better understand data optimization and how it can be used to improve the performance and efficiency of your data-driven applications.
Key takeaways:
- Data optimization is a critical process for improving the performance and efficiency of data-driven applications.
- Several techniques can be used to optimize data, including partitioning, sorting, compaction, and row group sizing.
- The best optimization techniques for a particular dataset will depend on the application's specific requirements.
Data optimization can be a complex process, but it is well worth the effort for the performance and efficiency gains it delivers.
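As a hypothetical sketch of the techniques named in the title, here is what partitioning, sorting, compaction, and row group sizing can look like on an Apache Iceberg table from PySpark; it assumes Spark is launched with the Iceberg runtime and SQL extensions and a catalog named `lake`, and all table names and byte sizes are illustrative.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg runtime jar, a configured catalog named "lake", and
# spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.
spark = SparkSession.builder.appName("optimize-demo").getOrCreate()

# Partitioning: add a partition field without rewriting existing data.
spark.sql("ALTER TABLE lake.web.events ADD PARTITION FIELD bucket(16, user_id)")

# Sorting: declare a write order so newly written files are clustered for pruning.
spark.sql("ALTER TABLE lake.web.events WRITE ORDERED BY event_ts")

# Row group sizing: tune Parquet row groups to match typical scan patterns.
spark.sql(
    "ALTER TABLE lake.web.events SET TBLPROPERTIES "
    "('write.parquet.row-group-size-bytes' = '134217728')"
)

# Compaction: bin-pack many small files into fewer, larger ones.
spark.sql(
    "CALL lake.system.rewrite_data_files("
    "  table => 'web.events',"
    "  options => map('target-file-size-bytes', '536870912'))"
)
```

Which of these helps most depends on your query patterns, which is exactly the trade-off discussion the talk walks through.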
Materialized Views vs Dremio Data Reflections
A comparative study between Materialized Views and Dremio Data Reflections, highlighting their applications and benefits.
Materialized views and Dremio data reflections are both techniques for pre-computing data queries to improve the performance of subsequent queries. However, there are some critical differences between the two techniques.
Materialized views are a traditional database concept, and they are typically implemented as tables populated with the results of pre-computed queries.
Dremio data reflections, on the other hand, are Apache Iceberg representations of a view of the raw data or aggregations, and they can apply custom sorting, partitioning, and other optimizations.
We will discuss the challenges of materialized views and how Data Reflections meet these challenges with a more flexible and robust approach unique to the Dremio Data Lakehouse platform.
This presentation will also explore the different types of Reflections:
- Raw reflections: These reflections consist of all of the rows and one or more fields of the underlying table or view that they are created from. They can be customized by vertically partitioning data (choosing a subset of fields), horizontally partitioning the data (by defining one or more columns to be partition keys), and sorting the data on one or more fields.
- Aggregation reflections: These reflections accelerate BI-style queries that involve aggregations (GROUP BY queries). They can also be configured to work on a subset of the fields of a data source.
Benefits of using Dremio data reflections:
- Improved performance: Dremio data reflections can significantly improve the performance of queries by pre-computing the results of those queries.
- Flexibility: Dremio data reflections can be dynamically generated based on the results of user queries, making them more flexible than traditional materialized views.
- Scalability: Dremio data reflections can be scaled to handle large datasets.
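For a concrete feel (illustrative only), here is a hypothetical sketch of defining both reflection types through Dremio SQL from Python; reflections are more commonly managed in the Dremio UI, the dataset and field names below are invented, and the exact ALTER DATASET syntax and options should be checked against your Dremio version.

```python
def define_reflections(cur) -> None:
    """Create a raw and an aggregation reflection on a Dremio dataset.

    `cur` is a DB-API cursor on a Dremio connection; dataset, field, and
    reflection names are hypothetical.
    """
    # Raw reflection: a subset of fields, horizontally partitioned and sorted.
    cur.execute(
        "ALTER DATASET lakehouse.sales.orders "
        "CREATE RAW REFLECTION orders_raw "
        "USING DISPLAY (order_id, customer_id, order_region, order_ts, order_total) "
        "PARTITION BY (order_region) "
        "LOCALSORT BY (order_ts)"
    )

    # Aggregation reflection: pre-aggregated dimensions and measures for
    # BI-style GROUP BY queries.
    cur.execute(
        "ALTER DATASET lakehouse.sales.orders "
        "CREATE AGGREGATE REFLECTION orders_agg "
        "USING DIMENSIONS (order_region, customer_id) "
        "MEASURES (order_total (SUM, COUNT))"
    )
```

Queries never reference a reflection by name; Dremio's optimizer substitutes it automatically when it can satisfy the query, which is what removes the extract and cube management burden.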
CI/CD on the Data Lakehouse with Project Nessie
Continuous Integration and Continuous Delivery (CI/CD) is a software development practice that aims to improve the quality and speed of software delivery. In a data lakehouse environment, CI/CD can be used to automate ingesting, transforming, and loading data.
Project Nessie is an open-source project that provides a Git-like approach to version control for data lakehouse tables. Project Nessie can be used to implement CI/CD for data lakehouse environments by providing a way to track changes to data over time and to automate the process of deploying changes to production.
In this presentation, we will discuss the benefits of implementing CI/CD in a data lakehouse environment and how Project Nessie can achieve this. We will also discuss some of the challenges of implementing CI/CD in a data lakehouse environment and how to overcome them.
Key takeaways:
- CI/CD can be used to improve the quality and speed of software delivery in a data lakehouse environment.
- Project Nessie is an open-source project that can be used to implement CI/CD for data lakehouse tables.
- There are a number of challenges to implementing CI/CD in a data lakehouse environment, but these challenges can be overcome.
Project Nessie and Lakehouse Catalog Versioning
Project Nessie is an open-source project that provides a Git-like approach to version control for data lakehouse tables. This makes it possible to track data changes over time and revert to previous versions if necessary.
In a lakehouse environment, catalog versioning is essential for ensuring the accuracy and reliability of data. By tracking changes to the catalog, you can ensure that everyone is working with the same data version. This can help to prevent errors and inconsistencies.
Project Nessie can be used to implement catalog versioning in a lakehouse environment by backing the catalog with a Nessie repository and tracking changes to it through Nessie's Git-like commits, branches, and tags.
This presentation will discuss the benefits of using Project Nessie for catalog versioning in a lakehouse environment. We will also discuss how to implement catalog versioning using Project Nessie.
Key takeaways:
- Project Nessie can be used to track changes to data over time in a lakehouse environment.
- Catalog versioning is essential for ensuring the accuracy and reliability of data in a lakehouse environment.
- Project Nessie can be used to implement catalog versioning in a lakehouse environment.
Zero-ETL and Virtual Data Marts: Made Possible by Dremio and Data Reflections
Embark on a transformative journey into data management with our talk titled "Zero-ETL and Virtual Data Marts: Made Possible by Dremio and Data Reflections." In this enlightening discussion, we'll unveil the challenges of data movement, the shortcomings of legacy virtualization platforms, and how Dremio, with the power of Data Reflections, is redefining data lakehouse architecture.
The era of data lakehouses promised to solve critical issues that had plagued traditional data warehouses. However, one formidable challenge remained—data movement. The need to shuttle data from source to lakehouse hindered agility, increased complexity, and incurred unnecessary costs.
Legacy virtualization platforms attempted to address these challenges but often fell short. They struggled to deliver the scalable performance and flexibility required to create true Zero-ETL environments. Data engineers and analysts continued to grapple with fragmented data landscapes and limited query performance.
Enter Dremio, armed with Data Reflections—a groundbreaking technology that changes the game. In this talk, we'll explore how Dremio's scalable performance and Data Reflections make the dream of Zero-ETL data lakehouses built on Virtual Data Marts a reality.
Key highlights include:
- An in-depth look at the intricacies of data movement and its impact on data lakehouse architectures.
- A critical examination of the limitations of legacy virtualization platforms and their inability to deliver the desired scalability and performance.
- How Dremio, through Data Reflections, offers the scalability, speed, and agility needed to create Zero-ETL environments.
- Real-world examples showcasing the transformative power of Dremio's platform and its role in building Virtual Data Marts.
- The future of data architecture, where Dremio and Data Reflections enable organizations to break free from data movement constraints and embrace true data unification.
Join us as we unveil the groundbreaking potential of Zero-ETL and Virtual Data Marts, made possible by Dremio and Data Reflections. Say goodbye to data movement woes, and hello to a new era of data-driven insights and agility. Take advantage of this opportunity to be part of the data revolution.
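As a hypothetical sketch of what a virtual data mart looks like in practice, the snippet below creates a Dremio view that joins a relational source directly to Iceberg data in the lake with no data copied; the source, space, and column names are all invented.

```python
def build_virtual_mart(cur) -> None:
    """Define a virtual data mart as a Dremio view: no ETL, no copies.

    `cur` is a DB-API cursor on a Dremio connection. "postgres_crm" is a
    hypothetical relational source, "lakehouse" a hypothetical Iceberg
    catalog, and "marts" a Dremio space holding the semantic layer.
    """
    cur.execute(
        "CREATE OR REPLACE VIEW marts.sales.customer_revenue AS "
        "SELECT c.customer_id, c.segment, "
        "       SUM(o.order_total) AS lifetime_revenue "
        "FROM postgres_crm.public.customers AS c "
        "JOIN lakehouse.sales.orders AS o "
        "  ON o.customer_id = c.customer_id "
        "GROUP BY c.customer_id, c.segment"
    )
```

A Data Reflection on a view like this (as described elsewhere in this profile) is what keeps the virtual mart fast without hand-maintained extracts.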
The Zero Movement Lakehouse: Reaching the Pinnacle of Data Lakehouse Architecture
Join us for an enlightening talk titled "The Zero Movement Lakehouse: Reaching the Pinnacle of Data Lakehouse Architecture," where we embark on a journey through the evolution of data management, from data warehouses to data lakehouses, and finally, to a future where data no longer needs to move.
Data lakehouses emerged as the answer to the limitations of traditional data warehousing, promising scalability, flexibility, and cost-efficiency. However, data engineers soon discovered that data movement remained an unavoidable challenge even in lakehouses. Until now.
In this talk, we explore how Dremio spearheads the Zero Movement Lakehouse revolution with its advanced lakehouse, virtualization, and modeling capabilities. We'll dive into the unique advantages of Dremio's platform, which empowers organizations to unify their data stored in data lakehouses, warehouses, and databases within a centralized access layer—without any data movement. This groundbreaking approach reduces complexity and introduces the era of zero-ETL.
Key highlights include:
- A retrospective on how data lakehouses addressed problems that plagued data warehouses.
- The enduring challenge of data movement within the lakehouse and its implications.
- How Dremio's game-changing solution eliminates data movement while providing a unified, central access layer.
- The role of Dremio's semantic layer in creating Virtual Data Marts, enhancing data modeling, governance, and documentation.
Join us as we explore the future of data architecture, where Dremio's innovative platform unlocks the potential of true data unification, eliminates the need for data movement, and ushers in the era of the Zero Movement Lakehouse—a monumental leap in data management and analytics. Don't miss this opportunity to be at the forefront of this exciting data revolution.
ZeroETL dreams and Virtual Datamart Wishes: How Dremio makes Data Engineering Less Painful
In the ever-evolving landscape of data engineering, the complexities of managing data pipelines from lakes to warehouses and marts have long plagued organizations. As data volumes grow and the need for quick and agile access to insights intensifies, the dream of a zero ETL (Extract, Transform, Load) world becomes more tantalizing. Enter Dremio, the game-changer in the world of data engineering.
Join us for a captivating talk, "ZeroETL Dreams and Virtual Datamart Wishes: How Dremio Makes Data Engineering Less Painful," where we explore how Dremio revolutionizes data engineering by simplifying the journey from raw data sources to insightful data marts. We'll delve into the challenges of traditional data pipelines and the intricacies of transforming and managing data at scale.
Discover how Dremio's innovative approach breaks down the barriers by connecting directly to diverse data sources and enabling the virtual modeling of data marts. Say goodbye to the traditional ETL grind and embrace the power of data virtualization, allowing you to transform your data landscape without moving or duplicating data.
In this talk, we'll cover:
- The pains and complexities of traditional data engineering.
- The allure of a zero ETL world and why it's essential for modern organizations.
- How Dremio empowers data engineers to connect directly to sources and build virtual data marts on top of disparate data sources.
- Best practices for harnessing the full potential of Dremio in your data-driven journey.
Join us on this exciting expedition into the future of data engineering, where Dremio's transformative capabilities will help you realize your "ZeroETL Dreams and Virtual Datamart Wishes." Say goodbye to data engineering pain and hello to data-driven agility.
Lakehouse Solutions to Data Problems: Use Cases for Dremio, Iceberg, Nessie, and Lakehouse Architecture
In today's data-driven world, organizations grapple with vast volumes of data coming from diverse sources. Managing, processing, and extracting value from this data efficiently is a critical challenge. This talk explores how Lakehouse solutions, powered by cutting-edge technologies such as Dremio, Iceberg, Nessie, and Lakehouse architecture, address these data problems and revolutionize data management practices.
We will delve into real-world use cases and scenarios where Lakehouse solutions shine. Discover how Dremio empowers data consumers with self-service access to data, bridging the gap between data engineering and analytics. Explore the benefits of Iceberg, a table format designed for large-scale data lakes, providing schema evolution and transactional capabilities. Learn how Nessie brings version control to data lakes, ensuring data integrity and collaboration. Dive deep into the Lakehouse architecture, which combines the best of data lakes and data warehouses for unified, scalable, and performant analytics.
Whether you're a data engineer, data scientist, or data architect, this talk will provide valuable insights into leveraging Lakehouse solutions to overcome data challenges, optimize data workflows, and unlock the full potential of your data lake. Join us to explore the future of data management and analytics.
Lakehouse Catalogs 101 - Governing and Transporting your Iceberg, Delta and Hudi tables
Join Senior Technical Evangelist for Dremio, Alex Merced, as he explores one of the most critical frontiers in the lakehouse ecosystem: catalogs. As the industry embraces the lakehouse paradigm and the variety of table formats like Iceberg, Hudi, and Delta, the next key challenge is understanding the role of lakehouse catalogs. These catalogs govern and track your lakehouse assets, providing essential metadata and ensuring smooth management across different computing engines. In this talk, Alex will demystify leading catalog solutions such as Apache Polaris (incubating), Nessie, Unity Catalog, Gravitino, Dremio Catalog, and AWS Glue, and guide you through navigating this evolving landscape to effectively manage your lakehouse.
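For readers who want a hands-on starting point, the hedged sketch below connects to a lakehouse catalog from Python with PyIceberg over the Iceberg REST catalog API, which several of the catalogs above expose or are adopting; the URI, token, warehouse, and namespace are placeholders, and property names can differ slightly by catalog and PyIceberg version.

```python
from pyiceberg.catalog import load_catalog

# Placeholder endpoint and credentials for an Iceberg REST-style catalog.
catalog = load_catalog(
    "lake",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com/api/catalog",
        "token": "<access-token>",    # auth mechanism varies by catalog
        "warehouse": "analytics",     # some catalogs require a warehouse name
    },
)

# Governance and tracking in action: see what the catalog knows about.
print(catalog.list_namespaces())
print(catalog.list_tables("sales"))

# Load a table through the catalog and inspect its current schema.
table = catalog.load_table("sales.orders")
print(table.schema())
```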
DataTune 2024
Orlando Code Camp 2024
Open Source Analytics Conference 2023