Data-Centric AI: Accelerate Success with Apache Iceberg Data Products

Data-Centric AI: Accelerating Success Through Modern Data Products

Target Audience
- Data Leaders & Architects
- Data Engineers & Platform Engineers
- ML/AI Engineers & Data Scientists
- Analytics Engineers & Practitioners

Abstract
While organizations rush to advance their AI models, research shows that "reducing the technological gap alone is not enough to ensure success in AI projects" [The Data Death Cycle, 2024]. The key to accelerating AI success lies not in model optimization, but in data excellence. This session reveals how a data-centric approach, powered by Apache Iceberg and modern data architectures, dramatically improves AI systems by ensuring complete, consistent, and curated datasets from the start.

Overview
AI and analytics initiatives demand high-quality data, yet traditional model-centric approaches often overlook this fundamental requirement. We'll explore how data products, implemented through a combination of data mesh and data fabric patterns, provide the systematic data excellence that AI requires. Learn how an Apache Iceberg lakehouse reduces the need for costly ETL, and how metadata catalogs such as Polaris and Nessie add "git-like" version control that supports write-audit-publish workflows for data changes.

Through real-world examples and architectural patterns, we'll demonstrate how organizations can:
- Accelerate AI success through systematic data excellence rather than just model optimization
- Create trusted data products that ensure quality, completeness, and consistency
- Implement efficient data integration without expensive ETL processes
- Enable version control and auditability for data changes
- Balance centralized governance with domain agility

Key Takeaways

1. Data-Centric Advantage
- Why focusing on data quality accelerates AI success more effectively than model optimization
- How systematic data excellence cuts into the often-cited 80% of time data scientists spend on data preparation
- The critical role of complete, consistent, and curated datasets in AI/ML success

2. Modern Data Products
- How data products enable systematic management of data quality
- Why combining data mesh and data fabric creates the ideal architecture for AI-ready data
- Patterns for implementing data products that ensure quality, governance, and reliability

3. Technical Foundation
- How Apache Iceberg enables efficient data integration without costly ETL
- The role of metadata catalogs in providing git-like version control for data
- Practical patterns for implementing write-audit-publish workflows for data changes
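
To make the write-audit-publish idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Iceberg, Polaris, or Nessie API: `ToyTable`, `audit`, and `write_audit_publish` are hypothetical names invented for illustration, and the audit rules (non-null, unique `id`; non-null `amount`) are assumed examples of completeness and consistency checks. The point it shows is the shape of the workflow: new data is written to a staging branch, audited there, and only published by moving the `main` reference once the checks pass.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a versioned table (hypothetical, not a real catalog API)."""
    # Each snapshot is an immutable tuple of rows; refs map a branch name to a snapshot index.
    snapshots: list = field(default_factory=lambda: [()])
    refs: dict = field(default_factory=lambda: {"main": 0})

    def create_branch(self, name, from_ref="main"):
        # A branch is just a named pointer to an existing snapshot.
        self.refs[name] = self.refs[from_ref]

    def append(self, branch, rows):
        # Writing creates a new snapshot and advances only the given branch.
        current = self.snapshots[self.refs[branch]]
        self.snapshots.append(current + tuple(rows))
        self.refs[branch] = len(self.snapshots) - 1

    def read(self, ref="main"):
        return list(self.snapshots[self.refs[ref]])

    def fast_forward(self, target, source):
        # Publishing moves the target pointer; no data is copied.
        self.refs[target] = self.refs[source]


def audit(rows):
    """Return a list of problems; an empty list means the staged data passes."""
    problems = []
    ids = [row.get("id") for row in rows]
    if any(i is None for i in ids):
        problems.append("missing id")
    if len(ids) != len(set(ids)):
        problems.append("duplicate id")
    if any(row.get("amount") is None for row in rows):
        problems.append("missing amount")
    return problems


def write_audit_publish(table, rows, branch="audit"):
    table.create_branch(branch)            # stage off the current main
    table.append(branch, rows)             # write: readers of main see nothing yet
    problems = audit(table.read(branch))   # audit the staged state
    if not problems:
        table.fast_forward("main", branch)  # publish by moving the main ref
    return problems


table = ToyTable()
good_batch = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
bad_batch = [{"id": 2, "amount": None}]

print(write_audit_publish(table, good_batch))  # []
print(write_audit_publish(table, bad_batch))   # ['duplicate id', 'missing amount']
print(len(table.read("main")))                 # 2
```

The rejected batch never becomes visible on `main`, which is the property the pattern is after. A real deployment would replace the toy pointer moves with the branching and publishing operations of the chosen table format or catalog.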

4. Implementation Path
- Steps for transitioning to a data-centric approach
- How to begin implementing data products in your organization
- Methods for measuring and demonstrating success

Whether you're struggling with AI initiatives or looking to accelerate existing programs, this session provides practical insights into building the data foundation that modern AI demands. You'll learn why data-centric approaches succeed where model-centric efforts fail, and how to implement the technical architecture that makes it possible.

Join us to discover how combining data-centric thinking with modern technologies like Apache Iceberg can transform your organization's ability to deliver successful AI initiatives. Leave with concrete steps for implementing data products that provide the complete, consistent, and curated data that AI systems require.

This is an updated version of my most popular conference talk, which was requested at 20+ conferences in 2024.

Andrew Madson

Dremio | Data Science, AI, and Analytics Evangelist

New City, New York, United States
