From Data Chaos to Trusted Insights: Open Source Entity Resolution in the Microsoft Fabric Lakehouse

Most analytics failures are not model failures or visualization failures. They are data identity failures — the quiet accumulation of duplicate records, fragmented customer profiles, and mismatched entities that corrupt every metric downstream before any analyst ever touches the data.

The same customer exists under three names. Two suppliers are the same company. A patient has records in four systems with no shared key. By the time these land in your Fabric Lakehouse and flow into Power BI, your segment counts are inflated, your revenue numbers are wrong, and your AI models are training on fiction.

This problem has a name, entity resolution, and it is solvable. But the traditional toolkit (exact ID matching, hand-coded fuzzy rules, manual review queues) breaks down at the scale and variety that a modern data pipeline is designed to handle. And the proprietary MDM platforms that promise to solve it are expensive, rigid, and notoriously slow to implement.
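
To make the limits of the traditional toolkit concrete, here is a minimal Python sketch using only the standard library. The records, the `normalize` helper, and the 0.8 threshold are illustrative assumptions; none of this is Zingg or Fabric code.

```python
from difflib import SequenceMatcher

# Hypothetical records: three spellings of one customer, with no shared key.
records = [
    {"id": "crm-17", "name": "Acme Corporation"},
    {"id": "erp-204", "name": "ACME Corp."},
    {"id": "web-9", "name": "Acme Corp"},
]

def normalize(name: str) -> str:
    """Lowercase and drop punctuation, a typical hand-coded cleanup rule."""
    kept = (ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()

def exact_match(a: dict, b: dict) -> bool:
    """Exact matching on the raw name field."""
    return a["name"] == b["name"]

def fuzzy_match(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Hand-coded fuzzy rule: string similarity above a fixed threshold."""
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold

# Compare every pair of records once.
pairs = [(a, b) for i, a in enumerate(records) for b in records[i + 1:]]

exact_hits = [(a["id"], b["id"]) for a, b in pairs if exact_match(a, b)]
fuzzy_hits = [(a["id"], b["id"]) for a, b in pairs if fuzzy_match(a, b)]

print("exact:", exact_hits)
print("fuzzy:", fuzzy_hits)
```

With these sample records, exact matching finds no pairs at all, and the fixed-threshold rule links only the two "Corp" spellings while missing "Acme Corporation". Lowering the threshold would catch it, but would also start linking unrelated names, which is the tuning burden that hand-coded rules carry.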

There is a better path: open source.

In this session, Sonal Goyal — Founder of Zingg.ai and creator of Zingg, one of the most widely adopted open source frameworks for entity resolution — will demonstrate how open source ML-based entity resolution runs natively on Fabric's Spark engine, reads and writes directly to OneLake, and requires no labeled training data to get started. Zingg is free, transparent, extensible, and already trusted by Fortune 500 companies and U.S. federal agencies in production.

Entity resolution sits at the heart of your data trust layer — the logic that decides which records belong together. That logic should be auditable, version-controlled, and owned by your team.

You will see a live end-to-end walkthrough: raw messy data in, clean unified entities out, directly consumable by a Power BI semantic model.
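
Conceptually, "unified entities out" means turning pairwise match decisions into groups of records. The sketch below shows one common way to do that grouping, a union-find over matched pairs. It assumes the pairs were already produced by some matcher; the record IDs and the `cluster` function are hypothetical, and this is not a description of how Zingg works internally.

```python
def cluster(record_ids, matched_pairs):
    """Group records into entities given pairwise match decisions."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every matched pair.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect records under their root to form entities.
    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())

# Hypothetical input: four records, two match decisions.
ids = ["crm-17", "erp-204", "web-9", "crm-52"]
matches = [("crm-17", "erp-204"), ("erp-204", "web-9")]
entities = cluster(ids, matches)
print(entities)
```

Note that "crm-17" and "web-9" end up in the same entity even though they were never matched directly; the link through "erp-204" is enough.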

Key takeaways:
- Why entity resolution is the missing step between data ingestion and trustworthy analytics
- Why open source is the right foundation for data quality logic you need to trust and control
- How ML-based deduplication works and why it outperforms hand-coded rules at scale
- A live demo of the full open source pipeline running inside Fabric Notebooks with Python
- Architecture patterns for embedding entity resolution into your Fabric data pipelines
- Lessons from production deployments across Fortune 500 companies and federal agencies

If you build data pipelines, design semantic models, or simply care about whether the numbers your organization makes decisions on are actually correct — this session is for you.

Sonal Goyal

Building Open Source Entity Resolution Zingg AI

Noida, India
