Speaker

Sonal Goyal

Building Open Source Entity Resolution Zingg AI

Noida, India

I am the Founder of Zingg.ai and the creator of Zingg — the leading open source framework for entity resolution and record linkage, native on the modern data stack across Databricks, Snowflake, Fabric, GCP, AWS and more.

Zingg ranks in the top 12% of all PyPI packages and is one of the top open source repositories in its space on GitHub. What started as an open source project is now trusted by Fortune 500 companies and U.S. federal agencies to solve some of the hardest data unification problems at scale.

Zingg's ML-based approach removes the need for hand-coded rules and labeled training data, making production-grade deduplication accessible to any organization.

I speak and write on entity resolution, data quality, and the real-world messiness of making data trustworthy at scale.

Area of Expertise

  • Information & Communications Technology

Topics

  • Master Data Management
  • Entity Resolution
  • Identity Resolution
  • Customer 360 Analytics
  • MarTech
  • AML / KYC
  • Customer Data Platforms (CDP)

From Data Chaos to Trusted Insights: Open Source Entity Resolution in the Microsoft Fabric Lakehouse

Most analytics failures are not model failures or visualization failures. They are data identity failures — the quiet accumulation of duplicate records, fragmented customer profiles, and mismatched entities that corrupt every metric downstream before any analyst ever touches the data.

The same customer exists under three names. Two suppliers are the same company. A patient has records in four systems with no shared key. By the time these land in your Fabric Lakehouse and flow into Power BI, your segment counts are inflated, your revenue numbers are wrong, and your AI models are training on fiction.

This problem has a name, entity resolution, and it is solvable. But the traditional toolkit (exact ID matching, hand-coded fuzzy rules, manual review queues) breaks down at the scale and variety that a modern data pipeline is designed to handle. And proprietary MDM platforms that promise to solve it are expensive, rigid, and notoriously slow to implement.

There is a better path: open source.

In this session, Sonal Goyal — Founder of Zingg.ai and creator of Zingg, one of the most widely adopted open source frameworks for entity resolution — will demonstrate how open source ML-based entity resolution runs natively on Fabric's Spark engine, reads and writes directly to OneLake, and requires no labeled training data to get started. Zingg is free, transparent, extensible, and already trusted by Fortune 500 companies and U.S. federal agencies in production.

Entity resolution sits at the heart of your data trust layer — the logic that decides which records belong together. That logic should be auditable, version-controlled, and owned by your team.

You will see a live end-to-end walkthrough: raw messy data in, clean unified entities out, directly consumable by a Power BI semantic model.

Key takeaways:
- Why entity resolution is the missing step between data ingestion and trustworthy analytics
- Why open source is the right foundation for data quality logic you need to trust and control
- How ML-based deduplication works and why it outperforms hand-coded rules at scale
- A live demo of the full open source pipeline running inside Fabric Notebooks with Python
- Architecture patterns for embedding entity resolution into your Fabric data pipelines
- Lessons from production deployments across Fortune 500 companies and federal agencies

If you build data pipelines, design semantic models, or simply care about whether the numbers your organization makes decisions on are actually correct — this session is for you.
