Session
What's What and Who's Who? Cross System Record Linkage with Fabric
Finding records representing the same real-world entity either in multiple sources or duplicated within a single data set is a common pain point when seeking to build a single definitive view of the truth in a data warehouse.
Variances in format, changes over time, and data entry errors mean connecting our Tom, Dick, and Harrys to our Thom, Dirk, and Harriets can be a big challenge.
This session introduces one approach to solving this problem using Fabric notebooks to run Splink, an open-source Python library developed by the Ministry of Justice in the UK to implement probabilistic record linkage.
You will be introduced to the basics of deterministic, fuzzy, and probabilistic methods of matching records, and then we'll dive into an end-to-end example using notebooks to create, apply, and analyse the results of a matching model.
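As a rough illustration of the three approaches, here is a minimal Python sketch that uses only the standard library. It does not use Splink's API. The records, field names, m/u probabilities and the 0.7 similarity threshold are all hypothetical values chosen for illustration, not estimates from real data. The probabilistic part follows the Fellegi-Sunter idea that Splink's models build on, simplified here so that each field either agrees or disagrees. Splink itself supports richer comparison levels and estimates the parameters from data.

```python
from difflib import SequenceMatcher
from math import log2

# Hypothetical records: a and b are the same person entered differently.
record_a = {"first_name": "Thom", "surname": "Smith", "dob": "1980-03-14"}
record_b = {"first_name": "Tom", "surname": "Smith", "dob": "1980-03-14"}
record_c = {"first_name": "Harriet", "surname": "Jones", "dob": "1975-01-01"}

def deterministic_match(a, b, fields):
    """Deterministic rule: match only if every listed field is identical."""
    return all(a[f] == b[f] for f in fields)

def fuzzy_similarity(x, y):
    """Fuzzy comparison: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

# Assumed, illustrative parameters per field:
# m = P(field agrees | records match), u = P(field agrees | records don't match).
M_U = {
    "first_name": (0.90, 0.01),
    "surname": (0.95, 0.005),
    "dob": (0.98, 0.001),
}

def match_weight(a, b, threshold=0.7):
    """Probabilistic score: sum of Fellegi-Sunter log2 match weights.

    A field "agrees" if its fuzzy similarity reaches the (hypothetical)
    threshold. Positive totals favour a match, negative totals a non-match.
    """
    total = 0.0
    for field, (m, u) in M_U.items():
        if fuzzy_similarity(a[field], b[field]) >= threshold:
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total

print(deterministic_match(record_a, record_b, ["first_name", "surname", "dob"]))
print(round(fuzzy_similarity("Thom", "Tom"), 3))
print(round(match_weight(record_a, record_b), 2))
print(round(match_weight(record_a, record_c), 2))
```

The exact-match rule rejects Thom/Tom. The fuzzy score shows the two names are close, and the summed weights give a large positive score for the likely pair and a negative one for the unrelated record.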
By the end of the session you'll be ready to take what we've learned back home with you and start applying it to your own data.
Barney Lawrence
Consultant focusing on data analytics and engineering on the Microsoft platform
Chesterfield, United Kingdom