Session
Applying GDPR/CCPA to CDC events using a pseudonymic Kafka Connect architecture
Do you struggle to apply a regulator's compliance rules to your data sets? At Essent (a Dutch energy company), we faced this issue the moment we started ingesting CDC (change data capture) events from SAP into our data lake. Initially, we used the medallion architecture (bronze, silver, gold): we processed the raw data into the bronze layer, gave it structure in the silver layer, and then anonymised it and placed it in the gold layer, ready for business purposes. Although this complied with the GDPR rules, it was cumbersome and hard to maintain. What if we could do this in a generic and standardised manner?
After looking for solutions, we chose to apply the highly recommended pseudonymization process as early in the pipeline as possible. Our implementation is based on the Kafka Connect architecture, whose data structure gives us a generic format and lets us handle our data in a generic manner.
This talk will provide insight into how we implemented our solution:
We use a Sink as the base, with the standard Kafka Connect structure transformation.
Using a specific configuration object, we dynamically build a “lookup record” and the pseudonymized version of the original.
Once done, both records are placed on specific target topics. This leaves us with a topic containing pseudonymized records for anyone to use.
The second topic, containing the lookup data, is shielded from unauthorized users.
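The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Kafka Connect implementation presented in the talk: the `PseudonymConfig` class, the field names, the topic names, and the use of HMAC-SHA256 to derive pseudonyms are all assumptions made for the example.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudonymConfig:
    """Hypothetical configuration object: which fields to pseudonymize and where to route."""
    fields: tuple
    secret_key: bytes
    pseudonymized_topic: str = "customers.pseudonymized"  # assumed name, open to all consumers
    lookup_topic: str = "customers.lookup"                # assumed name, restricted access


def pseudonym_for(value: str, key: bytes) -> str:
    """Derive a deterministic pseudonym with a keyed hash (an assumed technique)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def pseudonymize(record: dict, config: PseudonymConfig):
    """Return two (topic, payload) pairs: the pseudonymized record and its lookup record."""
    safe = dict(record)   # copy, so the original event is left untouched
    lookup = {}           # pseudonym -> original value
    for field in config.fields:
        if record.get(field) is not None:
            token = pseudonym_for(str(record[field]), config.secret_key)
            safe[field] = token
            lookup[token] = record[field]
    return (config.pseudonymized_topic, safe), (config.lookup_topic, lookup)


# Example CDC-style event with made-up fields.
config = PseudonymConfig(fields=("name", "email"), secret_key=b"demo-key-only")
event = {"customer_id": 42, "name": "Jane Doe", "email": "jane@example.com", "tariff": "green"}
(safe_topic, safe), (lookup_topic, lookup) = pseudonymize(event, config)
```

In this sketch the pseudonymized payload keeps non-sensitive fields such as `tariff` as they are, while the lookup payload is the only place where a pseudonym can be mapped back to the original value, which is why it goes to a separate, access-restricted topic.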
                                
Pieter van der Meer
Engineer @ Dataworkz
Amsterdam, The Netherlands