Session

What's What and Who's Who? Cross System Record Linkage with Fabric and Splink.

Finding records that represent the same real-world entity, whether spread across multiple sources or duplicated within a single system, is a common need when building a single definitive view of the truth from your data.
Variations in format, changes over time, and data entry errors mean that connecting our Tom, Dick, and Harrys in one system to our Thom, Dirk, and Harriets in another can be challenging.
This session introduces one approach: using Fabric notebooks to run Splink, an open-source Python library for probabilistic record linkage developed by the UK Ministry of Justice.
We will cover the basics of deterministic, fuzzy, and probabilistic matching, then dive into creating a model, applying it to example data, and analysing the results.
By the time we're done you'll be ready to take what we've learned back home with you and apply it to your own data.
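
As a rough illustration of the three matching styles named above, here is a minimal sketch using only the Python standard library. It is not Splink's API and not the session's own material: the function names, the field names, the 0.8 similarity threshold, and the m and u probabilities are all hypothetical values chosen for the example. The probabilistic part follows the general Fellegi-Sunter idea of summing a log-likelihood weight per field; in practice a library such as Splink estimates those probabilities from the data rather than having them typed in by hand.

```python
import math
from difflib import SequenceMatcher


def deterministic_match(a, b):
    """Deterministic: equal after trimming whitespace and ignoring case."""
    return a.strip().lower() == b.strip().lower()


def fuzzy_similarity(a, b):
    """Fuzzy: similarity score between 0.0 and 1.0 (difflib ratio)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Hypothetical parameters, invented for this sketch:
#   m = P(field agrees | records are the same entity)
#   u = P(field agrees | records are different entities)
M_U = {
    "first_name": (0.9, 0.01),
    "surname": (0.9, 0.005),
    "city": (0.8, 0.1),
}


def match_weight(left, right, fuzzy_threshold=0.8):
    """Probabilistic: sum a log2 weight per field (Fellegi-Sunter style).

    Agreement adds log2(m / u); disagreement adds log2((1 - m) / (1 - u)).
    A higher total means the pair is more likely to be the same entity.
    """
    weight = 0.0
    for field, (m, u) in M_U.items():
        agrees = fuzzy_similarity(left[field], right[field]) >= fuzzy_threshold
        if agrees:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


tom = {"first_name": "Tom", "surname": "Smith", "city": "Chesterfield"}
thom = {"first_name": "Thom", "surname": "Smith", "city": "Chesterfield"}
harriet = {"first_name": "Harriet", "surname": "Jones", "city": "London"}

print(deterministic_match("Tom", "Thom"))  # exact comparison fails on the spelling
print(fuzzy_similarity("Tom", "Thom"))     # fuzzy comparison scores them as close
print(match_weight(tom, thom))             # positive: evidence for a match
print(match_weight(tom, harriet))          # negative: evidence against a match
```

The split into per-field weights is what lets a strong agreement on a rare value (a surname) count for more than agreement on a common one (a city), which neither the exact nor the purely fuzzy comparison can express on its own.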

Barney Lawrence

Consultant focusing on data analytics and engineering on the Microsoft platform

Chesterfield, United Kingdom
