What's What and Who's Who? Cross System Record Linkage with Fabric

Finding records representing the same real-world entity either in multiple sources or duplicated within a single data set is a common pain point when seeking to build a single definitive view of the truth in a data warehouse.
Variances in format, changes over time, and data entry errors mean connecting our Tom, Dick, and Harries to our Thom, Dirk, and Harriets can be a big challenge.

This session introduces one approach to solving this problem using Fabric notebooks to run Splink, an open-source Python library developed by the Ministry of Justice in the UK to implement probabilistic record linkage.
You will be introduced to the basics of deterministic, fuzzy, and probabilistic methods of matching records, and then we'll dive in with an end-to-end example using notebooks to create, apply, and analyse the results of a matching model.
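
To give a flavour of how the three families of matching differ, here is a minimal sketch using only the Python standard library. It is not Splink's API and not the session's actual notebook code: the function names are hypothetical, and the m and u probabilities in the usage lines are made-up illustrative values rather than estimates from real data. The probabilistic part follows the general Fellegi-Sunter idea of adding up log-scale match weights per compared field.

```python
import math
from difflib import SequenceMatcher


def deterministic_match(a: str, b: str) -> bool:
    """Deterministic: values match only if equal after basic normalisation."""
    return a.strip().lower() == b.strip().lower()


def fuzzy_similarity(a: str, b: str) -> float:
    """Fuzzy: a similarity score between 0 and 1 that tolerates small edits."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_weight(agrees: bool, m: float, u: float) -> float:
    """Probabilistic: log2 Bayes factor for one field comparison.

    m = P(field agrees | records are a true match)
    u = P(field agrees | records are not a match)
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_probability(prior: float, weights: list[float]) -> float:
    """Combine a prior match probability with the summed field weights."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** sum(weights)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical records and illustrative (assumed) m/u values.
left = {"first_name": "Tom", "city": "Chesterfield"}
right = {"first_name": "Thom", "city": "chesterfield"}

name_agrees = fuzzy_similarity(left["first_name"], right["first_name"]) >= 0.8
city_agrees = deterministic_match(left["city"], right["city"])

weights = [
    match_weight(name_agrees, m=0.9, u=0.05),
    match_weight(city_agrees, m=0.95, u=0.2),
]
probability = match_probability(prior=0.01, weights=weights)
print(f"Estimated match probability: {probability:.3f}")
```

The point of the sketch is the shape of the calculation: exact and fuzzy comparisons each give a yes/no or a score for one field, while the probabilistic step weighs every field by how much evidence agreement or disagreement really provides. A library such as Splink estimates the m and u values from the data rather than having them typed in by hand.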

By the end of the session you'll be ready to take what we've learned back home with you and start applying it to your own data.

Barney Lawrence

Consultant focusing on data analytics and engineering on the Microsoft platform

Chesterfield, United Kingdom
