Andrew Madson
Head of Developer Relations at Fivetran | Author of "Apache Polaris - The Definitive Guide". Authoring "AI-Ready Data" for Wiley and "Data Transformation" for O'Reilly
Paris, France
Andrew Madson leads Developer Relations at Fivetran, where he builds global programs that help developers and data teams adopt modern data and AI tooling. He has built DevRel, education, and evangelism functions at Fivetran, Tobiko Data, and Dremio, delivering technical content, large-scale community programs, and keynotes across major industry conferences.
Andrew is the author of the O'Reilly book "Apache Polaris: The Definitive Guide" and a graduate professor of data science and engineering. He previously led AI and data teams at Arizona State University, JPMorgan Chase, and MassMutual, among others.
Iceberg for Agents - Elevate Data to Context for Modern AI Systems
AI agents fail in production not because models are weak, but because the data stack is. Fragmented silos, inconsistent definitions, and logic buried in tribal knowledge leave agents overwhelmed with data yet starved for context.
Apache Iceberg fixes the foundation. ACID transactions, time travel, and schema evolution turn storage into a live, versioned context layer that agents can reason over reliably. dbt fixes the meaning. Semantic models and the dbt MCP server give agents a governed interface to your data — translating natural language into structured queries grounded in business logic.
Together, they power Structured RAG: retrieval that understands schema, respects governance, and returns interpretable results.
This session includes a live demo of a fully open-source Structured RAG stack built on Iceberg and dbt, featuring semantic query translation, hybrid retrieval, and governed agent reasoning via the dbt MCP. Expect architecture diagrams, real code, and practical guidance.
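As a taste of what "versioned context" means in practice, here is a minimal sketch (assuming a local REST catalog and a placeholder demo.orders table) of pinning an agent's reads to a specific Iceberg snapshot with PyIceberg:

```python
# Minimal sketch of the "versioned context layer" idea using PyIceberg.
# The catalog URI and the demo.orders table are placeholder assumptions.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("rest", **{"type": "rest", "uri": "http://localhost:8181"})
table = catalog.load_table("demo.orders")

# Read the current state of the table.
current = table.scan(limit=10).to_arrow()

# Time travel: re-read the table as of its first recorded snapshot, so an
# agent's answer can be pinned to the exact data version it reasoned over.
first_snapshot = table.history()[0].snapshot_id
pinned = table.scan(snapshot_id=first_snapshot, limit=10).to_arrow()
print(current.num_rows, pinned.num_rows)
```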
To Infinity & Beyond! - Future-proofing pipelines with open standards
Your data belongs to you. That sounds obvious, but for decades it hasn't been true in practical terms.
Databases coupled everything together, making analytics painful and scaling nearly impossible. Warehouses separated storage from compute but locked data in proprietary formats you couldn't read without the vendor's engine. Lakes gave you open formats and engine choice but crumbled without ACID guarantees.
Each evolution solved one problem and revealed the next. The lakehouse is the culmination of decades of innovation. Open table formats deliver ACID transactions, schema evolution, and time travel on open storage. Open metadata ensures any compatible engine can reach your data. The result: flexibility, interoperability, and robust systems, exactly what AI demands.
This talk traces that evolution and shows why the lakehouse isn't just another architecture. You'll learn how open standards on both sides of the pipeline (ingest and consumption) keep your infrastructure vendor-neutral, AI-ready, and built to evolve as fast as the landscape around it.
You Deserve a REST - How the Iceberg REST Spec Unlocks the Lakehouse
Before the Iceberg REST Catalog spec, metadata access was tightly coupled to specific engines, JVMs, or deployment patterns.
The Iceberg REST Catalog specification changes that. It decouples table management from engine-specific implementations, opening the door to broader interoperability, lighter-weight clients, and a more modular, composable ecosystem.
1. What Catalogs Were Like Before REST – Why the Hive Metastore was brittle, how Java-based catalogs limited flexibility, and the cross-engine compatibility challenges that resulted
2. What the REST Spec Enables – A truly open interface for listing tables, fetching schemas, committing snapshots, and creating metadata transactions
3. Where It’s Going – How REST unlocks remote catalog access for non-Java tools, empowers projects like Polaris and Lakekeeper, and simplifies deploying Iceberg in multi-tenant, cloud-native architectures.
We’ll walk through examples of how query engines like Trino and DuckDB interact with REST catalogs today.
This session will show you why the Iceberg REST spec is one of the most powerful pieces of the modern lakehouse.
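As a flavor of that interface, here is a hedged sketch of a lightweight, JVM-free client talking to a REST catalog via PyIceberg; the endpoint and identifiers are placeholders, not a specific deployment:

```python
# Sketch: talking to an Iceberg REST catalog from a lightweight Python client.
# The URI, namespace, and table names are placeholders for your own catalog.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("rest", **{"type": "rest", "uri": "https://catalog.example.com"})

print(catalog.list_namespaces())         # discover namespaces over plain HTTP
print(catalog.list_tables("analytics"))  # list tables in a namespace

table = catalog.load_table("analytics.events")
print(table.schema())                    # fetch the current schema
print(table.current_snapshot())          # inspect the latest committed snapshot
```

Every one of these calls is a plain HTTP request under the hood, which is exactly why non-JVM engines and small clients can now participate.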
What Got Us Here Won't Get us There - Open Data Infrastructure and the AI of Tomorrow
Every generation of data infrastructure was built for the needs of its time — and broke when GenAI changed the world. Warehouses offered structure but enforced proprietary formats and engines. Hadoop promised openness but delivered complexity instead. Cloud platforms offered scalability, then quietly locked organizations in at the storage layer.
Today's AI workloads — training runs, feature stores, RAG, agents — demand capabilities no prior generation anticipated. The pace of innovation requires infrastructure flexibility.
AI needs open standards. Apache Iceberg decouples storage, data, metadata, and compute behind an open spec. The data lakehouse serves analytics, ML, and real-time AI from one governed dataset. dbt makes the transformation layer portable, encoding business logic in version-controlled models instead of a vendor's metadata store.
Together, they create infrastructure that adapts as fast as AI does — without rebuilding the foundation every time the landscape shifts.
Iceberg for Agents - Turning Lakehouse Data Into AI-Ready Context
AI agents fail in production because they're overwhelmed with data but starved for context. The models aren’t the problem. The bottleneck is the data stack: fragmented silos, inconsistent definitions, and logic hidden in tribal knowledge. Agents need structured, reliable, and interpretable context—not just data access.
In this session, we'll show how Apache Iceberg becomes the backbone of AI-ready pipelines. You’ll learn how to elevate your Iceberg implementation from a storage format to a live context layer that powers structured retrieval-augmented generation (RAG), schema-aware agents, and autonomous reasoning grounded in truth.
What we’ll cover:
1. Iceberg Foundations for AI - from ACID to Time Travel
2. From Rows to Relationships - The role of the semantic layer
3. Structured RAG in Practice - Fully open source
The session includes a live demo of a fully open-source Structured RAG stack built on Apache Iceberg, featuring semantic query translation, hybrid retrieval, and governed agent reasoning. Expect architecture diagrams, real code, and practical guidance.
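As one concrete example of the foundations in point 1, here is a minimal sketch of in-place schema evolution with PyIceberg (the catalog, table, and column names are illustrative assumptions):

```python
# Sketch: schema evolution on a live Iceberg table (names are illustrative).
from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

catalog = load_catalog("rest", **{"type": "rest", "uri": "http://localhost:8181"})
table = catalog.load_table("demo.customers")

# Add a column without rewriting data files; readers pick up the new
# schema from metadata, and older snapshots remain queryable as-is.
with table.update_schema() as update:
    update.add_column("segment", StringType(), doc="marketing segment")
```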
The Who, What, and Why of Data Lake Table Formats
A comprehensive exploration of the intricacies of Data Lake Table Formats and their impact on business analytics.
Data lake table formats are a critical component of modern data analytics. They provide a way to organize and manage data in a data lake, and they offer several benefits for business analytics, including:
Scalability: they scale to handle very large datasets without degrading metadata operations.
Performance: partitioning and metadata-based file pruning speed up queries on large datasets.
Durability: ACID transactions keep data consistent and recoverable.
Auditability: snapshot history makes changes traceable for audit and compliance.
This presentation will explore the who, what, and why of data lake table formats. We will discuss the different data lake table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake. We will also discuss the benefits of using data lake table formats for business analytics.
By the end of this presentation, you will better understand data lake table formats and how they can be used to improve business analytics.
Key takeaways:
Data lake table formats are a critical component of modern data analytics.
They offer a number of benefits for business analytics, including scalability, performance, durability, and auditability.
There are a variety of data lake table formats available, including Apache Iceberg, Apache Hudi, and Delta Lake.
AI Considerations for Education
Artificial Intelligence (AI) is revolutionizing the educational landscape, offering unprecedented opportunities to enhance learning experiences, personalize education, and streamline administrative processes. This session, "AI Considerations for Education," explores the critical factors educators, administrators, and policymakers need to consider when integrating AI into educational settings.
Participants will delve into the ethical, technical, and practical aspects of AI implementation in education. Through an examination of cutting-edge AI technologies and their applications, attendees will gain insights into how AI can support diverse learning needs, improve student outcomes, and drive institutional efficiency. The session will also address the challenges and risks associated with AI in education, providing a balanced perspective on this transformative technology.
Key Takeaways:
1. AI in the Classroom: Understand how AI can be used to create adaptive learning environments, offer personalized instruction, and support teachers in delivering more effective education.
2. Ethical and Privacy Considerations: Explore the ethical implications of AI in education, including data privacy, bias mitigation, and the importance of maintaining human oversight in AI-driven decisions.
3. Technical Integration: Learn about the technical requirements and infrastructure needed to successfully implement AI solutions in educational institutions.
4. Improving Student Outcomes: Discover how AI can help identify at-risk students, tailor interventions, and support diverse learning needs to enhance overall student performance.
5. Administrative Efficiency: Examine how AI can optimize administrative tasks, from enrollment and scheduling to resource management, freeing up educators to focus more on teaching and student engagement.
6. Challenges and Solutions: Gain insights into the common challenges faced during AI implementation and explore practical solutions to overcome these barriers.
Join us for this forward-thinking session to explore the multifaceted considerations of AI in education and learn how to harness its potential responsibly and effectively.
AI Considerations For Enterprise Change Management
As enterprises navigate digital transformation, integrating Artificial Intelligence (AI) into change management processes is crucial for success. This session, "AI Considerations for Enterprise Change Management," explores the strategic role of AI in facilitating and accelerating organizational change, ensuring seamless transitions and enhanced business outcomes.
Key Takeaways:
- AI in Change Management: Understand the foundational concepts of integrating AI into change management, including its impact on planning, execution, and monitoring of change initiatives.
- Process Automation: Explore the role of AI in automating routine tasks and processes, reducing manual effort, and increasing efficiency during organizational transitions.
- Employee Engagement and Training: Discover how AI can enhance employee engagement, provide personalized training, and support smoother adaptation to change.
- Challenges and Solutions: Address the common challenges of implementing AI in change management, including data privacy concerns, resistance to change, and ethical considerations.
AI Ready Data with Apache Iceberg: Unifying, Controlling, and Optimizing Your Data for Effective AI
Target Audience:
Data engineers
Data scientists
Data architects
Technical leaders (CTOs, CIOs)
Anyone interested in improving data quality for AI/ML initiatives
Abstract
In today's data-driven world, the effectiveness of Artificial Intelligence (AI) and Machine Learning (ML) models depends heavily on the quality and organization of your underlying data. "AI Ready Data with Apache Iceberg" addresses this challenge and describes how Apache Iceberg can facilitate unifying, governing, and optimizing your data, making it truly AI-ready.
Key Takeaways:
The Data Lakehouse Advantage:
How Apache Iceberg, combined with the lakehouse architecture, provides a unified platform for all types of data, breaking down silos and simplifying data management.
Git-Like Data Governance with Nessie:
An introduction to Nessie and how its Git-like functionality brings version control, branching, and collaboration to your data, enabling efficient experimentation and ensuring data reproducibility.
Data Contracts for Quality Assurance:
How data contracts define and enforce quality standards, ensuring that data meets the necessary criteria for AI/ML workloads.
Iceberg's Optimized Data Structures:
How Iceberg's optimized data layouts (e.g., columnar formats, partitioning, hidden partitioning) improve query performance and resource utilization, leading to faster AI/ML model training and inference. A minimal sketch of hidden partitioning follows this list.
Real-World Use Cases:
Examples of how organizations are using Iceberg, Nessie, and data contracts to build robust data pipelines, enhance data quality, and achieve tangible results with their AI initiatives.
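As referenced in the takeaway on optimized data structures, here is a minimal, hedged sketch of creating a table with hidden partitioning in PyIceberg; the catalog settings and all names are placeholders:

```python
# Sketch: creating an Iceberg table with hidden partitioning via PyIceberg.
# Catalog settings, namespace, and table/column names are placeholders.
from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.schema import Schema
from pyiceberg.transforms import DayTransform
from pyiceberg.types import NestedField, StringType, TimestampType

schema = Schema(
    NestedField(field_id=1, name="event_ts", field_type=TimestampType(), required=True),
    NestedField(field_id=2, name="payload", field_type=StringType(), required=False),
)

# Partition by day(event_ts): queries simply filter on event_ts and Iceberg
# prunes files automatically; no synthetic partition column leaks into the schema.
spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="event_day")
)

catalog = load_catalog("rest", **{"type": "rest", "uri": "http://localhost:8181"})
catalog.create_table("demo.events", schema=schema, partition_spec=spec)
```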
Sub-second Power BI Dashboards Directly on ADLS Storage
Unleash the full potential of your data with our talk, "Sub-second Power BI Dashboards Directly on the Data in Your ADLS Storage Using Dremio." In this engaging discussion, we'll dive into the challenges that arise when using extracts and cubes to accelerate Power BI dashboards and how Dremio, with its data reflections, simplifies the process while preserving consistency.
Traditional methods of accelerating Power BI dashboards often involve the creation of extracts and cubes. While these techniques can enhance query performance, they introduce data preparation, maintenance, and synchronization complexities. The quest for faster dashboards often results in a trade-off between speed and consistency.
Enter Dremio, a game-changer in the world of data acceleration. In this talk, we'll explore how Dremio's approach to data reflections revolutionizes how you create dashboards directly on your data stored in Azure Data Lake Storage (ADLS). Key highlights include:
A comprehensive examination of the complexities and consistency issues associated with extract and cube-based approaches for accelerating Power BI dashboards.
How Dremio's data reflections eliminate the need for extracts and cubes by providing sub-second query performance directly on your data in ADLS.
Real-world examples showcasing how Dremio empowers data professionals to create high-performance dashboards while maintaining data consistency.
The benefits of a simplified and more agile approach to data acceleration resulting in faster insights and reduced overhead.
Join us to discover how Dremio's data reflections can redefine your Power BI dashboard acceleration strategy. Say goodbye to the complexities of extracts and cubes, and embrace a more direct, efficient, and consistent way to access and visualize your data. Take advantage of this opportunity to learn how Dremio can supercharge your analytics workflow.
Building and Scaling Analytics Products
Organizations are increasingly recognizing the value of data-driven decision-making. Simply collecting and storing data isn't enough. To truly harness its potential, businesses need to transform their data into actionable insights and deliver those insights to their end-users through well-designed analytics products.
This presentation will explore the concept of "analytics as a product" and provide a practical framework for building, launching, and scaling analytics products that drive business value.
Key topics will include:
The Product Mindset: Understanding the principles of product management and how they apply to analytics.
Defining Your Target Audience: Identifying the needs and pain points of your end-users and designing solutions that address them.
Building Your Analytics Product: From MVP to full-fledged product, understanding the development lifecycle and key considerations for success.
Data Storytelling: Effectively communicating insights and recommendations to stakeholders through compelling narratives and visualizations.
Scaling Your Product: Strategies for expanding your user base, monetizing your product, and continuously improving its value proposition.
Whether you're a data analyst, product manager, or business leader, this presentation will equip you with the knowledge and tools to successfully build and scale analytics products that deliver real impact for your organization.
Key Takeaways:
Attendees will gain a deeper understanding of the "analytics as a product" mindset.
Learn a practical framework for building and scaling analytics products.
Discover strategies for effective data storytelling and communication.
Gain insights into the latest trends and best practices in analytics product management.
Accelerate Your Analytics with SQL Server + Data Lakehouse
Organizations are collecting enormous amounts of structured and unstructured data. This data holds valuable insights, but traditional data warehouses can struggle to handle the volume, variety, and velocity of modern data sets. This presentation will explore how combining SQL Server with a data lakehouse architecture can revolutionize your analytics capabilities.
We'll dive into the concept of a data lakehouse, which brings the best of data lakes and data warehouses together. You'll learn how to leverage SQL Server's robust relational capabilities alongside the scalability and flexibility of a data lake, all while maintaining data quality and governance. We'll discuss how to:
Ingest and store: Efficiently capture and store data from diverse sources, including databases, IoT devices, and cloud applications, in your data lakehouse.
Transform and model: Prepare your data for analysis using familiar SQL Server tools and techniques, ensuring data quality and consistency.
Query and analyze: Leverage the power of SQL Server's query engine, augmented by data lakehouse capabilities, for high-performance analytics and reporting.
Govern and secure: Implement comprehensive data governance and security measures to protect sensitive data and ensure compliance with regulatory requirements.
Whether you're a data engineer, analyst, or business leader, this session will provide actionable insights and strategies to accelerate your analytics initiatives, reduce costs, and unlock the full potential of your data assets. Discover how SQL Server and data lakehouse can empower your organization to make faster, more informed decisions and drive business growth.
Data-Centric AI: Accelerate Success with Apache Iceberg Data Products
Target Audience
- Data Leaders & Architects
- Data Engineers & Platform Engineers
- ML/AI Engineers & Data Scientists
- Analytics Engineers & Practitioners
Abstract
While organizations rush to advance their AI models, research shows that "reducing the technological gap alone is not enough to ensure success in AI projects" [The Data Death Cycle, 2024]. The key to accelerating AI success lies not in model optimization, but in data excellence. This session reveals how a data-centric approach, powered by Apache Iceberg and modern data architectures, dramatically improves AI systems by ensuring complete, consistent, and curated datasets from the start.
Overview
AI and analytics initiatives demand high-quality data, yet traditional model-centric approaches often overlook this fundamental requirement. We'll explore how data products, implemented through a combination of data mesh and data fabric patterns, provide the systematic data excellence that AI requires. Learn how Apache Iceberg's lakehouse architecture eliminates costly ETL while enabling "git-like" version control through metadata catalogs like Polaris and Nessie, providing comprehensive write-audit-publish capabilities for data changes.
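To make the write-audit-publish pattern concrete, here is a deliberately toy sketch of the flow; the dict-based "catalog" is a stand-in for real branch operations in Nessie or a branch-aware Iceberg engine, not any actual API:

```python
# Toy sketch of a write-audit-publish (WAP) flow on a branching table format.
# The dict "catalog" is a HYPOTHETICAL stand-in for catalog/engine branch
# operations; the three-step flow is the point, not the storage calls.
from dataclasses import dataclass

catalog = {"main": []}  # branch name -> committed rows (toy stand-in)

@dataclass
class AuditReport:
    passed: bool
    reason: str = ""

def run_quality_checks(rows) -> AuditReport:
    # Toy check: reject the batch if any row is missing an id.
    bad = [r for r in rows if r.get("id") is None]
    return AuditReport(passed=not bad, reason=f"{len(bad)} rows missing id")

def write_audit_publish(new_batch) -> bool:
    # 1. WRITE to an isolated staging branch, invisible to consumers.
    catalog["audit"] = catalog["main"] + list(new_batch)
    # 2. AUDIT the staged state before anyone can read it.
    report = run_quality_checks(catalog["audit"])
    if not report.passed:
        del catalog["audit"]  # main is untouched; consumers saw nothing
        return False
    # 3. PUBLISH by fast-forwarding main to the audited branch.
    catalog["main"] = catalog.pop("audit")
    return True

print(write_audit_publish([{"id": 1}, {"id": 2}]))  # True  -> published
print(write_audit_publish([{"id": None}]))          # False -> rejected
```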
Through real-world examples and architectural patterns, we'll demonstrate how organizations can:
- Accelerate AI success through systematic data excellence rather than just model optimization
- Create trusted data products that ensure quality, completeness, and consistency
- Implement efficient data integration without expensive ETL processes
- Enable version control and auditability for data changes
- Balance centralized governance with domain agility
Key Takeaways
1. Data-Centric Advantage
- Why focusing on data quality accelerates AI success more effectively than model optimization
- How systematic data excellence reduces the 80% of time data scientists spend on data preparation
- The critical role of complete, consistent, and curated datasets in AI/ML success
2. Modern Data Products
- How data products enable systematic management of data quality
- Why combining data mesh and data fabric creates the ideal architecture for AI-ready data
- Patterns for implementing data products that ensure quality, governance, and reliability
3. Technical Foundation
- How Apache Iceberg enables efficient data integration without costly ETL
- The role of metadata catalogs in providing git-like version control for data
- Practical patterns for implementing write-audit-publish workflows for data changes
4. Implementation Path
- Steps for transitioning to a data-centric approach
- How to begin implementing data products in your organization
- Methods for measuring and demonstrating success
Whether you're struggling with AI initiatives or looking to accelerate existing programs, this session provides practical insights into building the data foundation that modern AI demands. You'll learn why data-centric approaches succeed where model-centric efforts fail, and how to implement the technical architecture that makes it possible.
Join us to discover how combining data-centric thinking with modern technologies like Apache Iceberg can transform your organization's ability to deliver successful AI initiatives. Leave with concrete steps for implementing data products that provide the complete, consistent, and curated data that AI systems require.
This is an updated version of my most popular conference talk, which was requested at 20+ conferences in 2024.
What Got Us Here Won't Get Us There: The Future of Data Infrastructure and AI Agents
AI doesn't have a model problem. It has a data problem. Models are rapidly commoditizing. The real differentiator is the quality and quantity of data flowing into them, yet most enterprise data estates remain fragmented, inflexible, and riddled with conflicting systems. This isn't an accident. Decades of storage evolution, from mainframes to data warehouses to cloud platforms, have left organizations with layers of incompatible infrastructure never designed to work together, let alone serve the demands of modern AI.
This talk traces how we got here and makes the case for where we need to go: Open Data Infrastructure. More than a technology stack, Open Data Infrastructure is a design principle that prioritizes flexibility, interoperability, and data ownership while building explicitly for tomorrow's AI workloads. By consolidating fragmented systems around open standards (Apache Iceberg for lakehouse storage, interoperable catalogs and query engines, dbt for transformation, APIs for ingestion, and Apache Arrow via ADBC for high-performance data access) organizations can replace brittle, vendor-locked architectures with composable, future-proof foundations.
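As a small illustration of the Arrow/ADBC piece, the sketch below uses the SQLite ADBC driver purely because it runs anywhere; the point is that the same DB-API surface, returning Arrow data, applies whichever engine's driver you plug in:

```python
# Sketch: Arrow-native data access through ADBC's standard DB-API surface.
# The SQLite driver is used only because it needs no setup; swap in the
# driver for your actual engine and the calling code stays the same.
import adbc_driver_sqlite.dbapi

conn = adbc_driver_sqlite.dbapi.connect()  # in-memory SQLite, demo only
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
cur.execute("INSERT INTO events VALUES (1, 'click'), (2, 'view')")
cur.execute("SELECT kind, COUNT(*) AS n FROM events GROUP BY kind")
# Results arrive as a columnar Arrow table: no row-by-row conversion
# before handing data to pandas, Polars, DuckDB, or a feature pipeline.
print(cur.fetch_arrow_table())
cur.close()
conn.close()
```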
The stakes are rising. AI agents need to interact with trustworthy, structured data at a pace and scale that legacy infrastructure cannot support. Open Data Infrastructure provides the substrate that grounds agents in reality, enabling organizations to confidently delegate business outcomes to autonomous systems. If you're a data leader, data engineer, or AI engineer planning your next infrastructure investment, this session will reframe how you think about the relationship between your data platform and your AI ambitions.
Key Takeaways
1. Models are commodities. Data infrastructure is the differentiator. As foundation models converge in capability and cost, competitive advantage shifts to the organizations that can deliver high-quality data at scale. Investing in better models without fixing the underlying data platform yields diminishing returns.
2. Enterprise data fragmentation is a historical inevitability, not a failure of execution. Each era of data storage solved the problems of its time while creating the silos, incompatibilities, and rigidity that plague organizations today. Understanding this history is essential to breaking the cycle.
3. Open Data Infrastructure is a design principle, not a product. Flexibility, interoperability, and ownership are the pillars. Open standards like Apache Iceberg, dbt, Apache Arrow (ADBC), and API-driven ingestion allow organizations to build composable architectures that avoid vendor lock-in and adapt as requirements evolve.
4. AI agents raise the bar for data infrastructure by orders of magnitude. Agents operate autonomously, at machine speed, across large volumes of data. They require structured, trustworthy, and accessible data to deliver reliable business outcomes. Infrastructure that barely supports human-driven analytics will not survive this shift.
5. The organizations that invest in open, composable data foundations now will be the ones that successfully deploy agentic AI at scale. Open Data Infrastructure is not a future aspiration. It is the prerequisite for trusting autonomous systems to act on your behalf.
Iceberg Summit 2026 Sessionize Event Upcoming
Azure AI Connect Sessionize Event
Devnexus 2025 Sessionize Event
Global Power Platform Bootcamp Houston 2025 Sessionize Event
DeveloperWeek 2025 Sessionize Event
CANCELLED - Cleveland Data Rocks '25 Sessionize Event
SQL Saturday Houston Sessionize Event
The Commit Your Code Conference Sessionize Event
DevFest 2024 by GDG Burnaby Sessionize Event
Open Source Analytics Conference 2024 Sessionize Event
AI Summit Vancouver Sessionize Event
DataPopkorn - a bite-sized knowledge! Sessionize Event
DATA BASH '24 Sessionize Event
AI Community Conference - Vancouver 2024 Sessionize Event
2024 All Day DevOps Sessionize Event
AI Community Conference - Boston 2024 Sessionize Event
2024 Data.SQL.Saturday.SD (SQLSatSD) Sessionize Event
BSides Colorado Springs Sessionize Event
Community Days Los Angeles 2024 Sessionize Event
Data Saturday Dallas 2024 Sessionize Event
SQLSaturday Denver 2024 Sessionize Event