Session
What's What and Who's Who? Cross System Record Linkage with Fabric
Finding records that represent the same real-world entity, whether spread across multiple sources or duplicated within a single system, is a common pain point when seeking to build a single definitive view of the truth across your data. Variations in format, changes over time, and data entry errors mean that connecting our Tom, Dick, and Harries in one system to our Thom, Dirk, and Harriets in another can be a big challenge.
This session introduces one approach to solving this problem using Fabric notebooks to run Splink, an open-source Python library developed by the Ministry of Justice in the UK to implement probabilistic record linkage.
This session will introduce you to the basics of deterministic, fuzzy, and probabilistic matching of records, then dive into an example notebook to build a matching model, apply it to an example data set, and analyse the results.
By the time we're done you'll be ready to take what we've learned back home with you and start applying it to your own data.
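As a rough illustration of the three matching styles mentioned above, here is a minimal sketch using only the Python standard library. It is not Splink's API and not the session's notebook. The function names, the example fields, and the m and u probabilities are all hypothetical values chosen for illustration. The probabilistic part follows the general Fellegi-Sunter idea of summing log-likelihood ratios per field, which is the family of model Splink is built around.

```python
import math
from difflib import SequenceMatcher


def deterministic_match(a: str, b: str) -> bool:
    """Exact comparison after trimming whitespace and lowercasing."""
    return a.strip().lower() == b.strip().lower()


def fuzzy_score(a: str, b: str) -> float:
    """Similarity between 0 and 1 based on shared character runs."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_weight(agreements: dict, m: dict, u: dict) -> float:
    """Sum of per-field log2 likelihood ratios (Fellegi-Sunter style).

    m[field]: assumed probability the field agrees when records truly match.
    u[field]: assumed probability the field agrees when they do not match.
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


# Hypothetical m and u values, for illustration only.
M_PROBS = {"first_name": 0.90, "date_of_birth": 0.95}
U_PROBS = {"first_name": 0.01, "date_of_birth": 0.001}

if __name__ == "__main__":
    print(deterministic_match("Tom", "Thom"))  # exact rule misses the variant
    print(round(fuzzy_score("Tom", "Thom"), 3))  # fuzzy score stays high
    # Treat the name as agreeing when the fuzzy score clears a threshold.
    agreements = {
        "first_name": fuzzy_score("Tom", "Thom") >= 0.8,
        "date_of_birth": True,
    }
    print(round(match_weight(agreements, M_PROBS, U_PROBS), 2))
```

A positive total weight counts as evidence for a match and a negative one as evidence against. In practice the m and u values are estimated from the data rather than typed in by hand.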
Barney Lawrence
Consultant focusing on data analytics and engineering on the Microsoft platform
Chesterfield, United Kingdom