
Data Minimisation strategy for Data Governance in the post-GDPR era

Background
Most of us are familiar with the adage “There is no such thing as a free lunch”, which suggests that it is impossible to get something for nothing. However, judging by the business model that many companies are increasingly adopting, in which consumer data is monetised to fund free-to-use products, it would certainly seem that these lunches are indeed free.

As expected, that is not the case: behind the scenes, consumer data is collected for a myriad of reasons, digital advertising being one of the major ones.
MiQ, one of the leading programmatic advertising companies, processes terabytes of this data every day to generate insights that help companies get a better ROI on their ad spend. Being at the forefront of digital advertising, we have a real responsibility to make user privacy our top priority, and that is what we have done through our Data Minimisation initiative (more on that shortly).
What is GDPR, and how does it affect MiQ?
The GDPR (the EU General Data Protection Regulation) governs how the personal data of people in the EU is collected, processed and stored. It has applied since May 2018 and is designed to:
Harmonise data privacy laws across Europe
Protect and empower all EU citizens’ data privacy
Reshape the way organisations across the region approach data privacy
Under it, the EU regulates essential aspects such as data subject rights, conditions for consent, the right of access, data portability, privacy by design, and more. It also sets fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher, for organisations that do not comply.
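For the higher tier of GDPR fines (Article 83(5)), the cap is €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. The small function below simply makes the "whichever is higher" arithmetic concrete; the function name is illustrative and it ignores the lower €10 million / 2% tier.

```python
def max_gdpr_fine_eur(annual_worldwide_turnover_eur: float) -> float:
    """Upper bound for the higher tier of GDPR fines:
    EUR 20 million or 4% of worldwide annual turnover, whichever is higher."""
    return max(20_000_000.0, 0.04 * annual_worldwide_turnover_eur)


# A company with EUR 100M turnover is capped at the EUR 20M floor,
# while one with EUR 1bn turnover is capped at 4%, i.e. EUR 40M.
print(max_gdpr_fine_eur(100_000_000))
print(max_gdpr_fine_eur(1_000_000_000))
```

The fixed €20 million floor matters for smaller organisations; the percentage term only dominates once turnover exceeds €500 million.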
The Data Minimisation strategy
We at MiQ appreciate the gravity of these privacy laws and their intent of protecting users’ privacy, and so, as part of our Data Governance initiatives, we have introduced the Data Minimisation strategy. Data Governance also covers other areas, such as Data Security and Data Cataloging, some of which MiQ’s data ingestion team has already handled; in this document, however, we focus only on the strategy for data privacy.

As part of the Data Minimisation strategy, we actively ensure that data containing PII (personally identifiable information) is scrubbed or masked before it is used to generate insights. This involves hashing (i.e. non-reversible masking) cookie IDs, IP addresses and device IDs so that they can still be joined across datasets to generate insights but cannot be traced back to the original user. Additionally, PII that is not needed to generate insights, such as user email addresses embedded in referrer URLs, is masked.
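A minimal sketch of both steps, assuming identifiers arrive as strings. It uses a keyed hash (HMAC-SHA256) rather than a plain hash: a plain SHA-256 of an IPv4 address can be reversed by hashing all ~4.3 billion possible addresses, whereas a keyed hash keeps the output deterministic (so joins still work) but cannot be recomputed without the key. This is one common technique, not necessarily the exact algorithm MiQ uses; the key, the `[email]` mask token and the email regex are illustrative assumptions.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; in practice it would come from a secrets manager
# and never be stored alongside the data it protects.
PSEUDONYMISATION_KEY = b"replace-with-a-secret-key"

# Rough email pattern that also catches the URL-encoded "@" (%40).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymise(value: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Map an identifier (cookie ID, IP address, device ID) to a stable token.

    The same input and key always give the same 64-character hex token, so two
    datasets hashed with the same key can still be joined on it.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_emails(url: str, mask: str = "[email]") -> str:
    """Replace any email address found in a URL, e.g. a referrer URL."""
    return EMAIL_RE.sub(mask, url)


print(pseudonymise("203.0.113.7") == pseudonymise("203.0.113.7"))  # stable token
print(mask_emails("https://site.com/?ref=john.doe@example.com&x=1"))
```

Rotating the key breaks joinability with data hashed under the old key, so key management becomes part of the design.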

This ties back to a simple overarching theme: avoid storing data you don’t need, so that there is less to lose to any malicious attempt to obtain it. That said, we do have use cases, such as ad targeting and triggering custom strategies, that need the unmasked raw data. In these cases, we have placed access restrictions on the raw data stores and made sure that teams across MiQ can read this data only for activation use cases, and only through our internal-facing products, thereby limiting the risk of unwanted use of PII-containing data.
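The access restriction described above can be sketched as a deny-by-default check: a read of a raw store succeeds only if the caller is an approved internal product and the stated purpose is an activation use case. The store, client and purpose names below are hypothetical; in a real AWS setup this would more likely be enforced through IAM or Lake Formation policies than in application code.

```python
from dataclasses import dataclass

# Hypothetical policy: which internal products may read which raw stores,
# and for which activation purposes.
RAW_STORE_POLICY = {
    "raw_events": {
        "allowed_clients": {"activation-service"},
        "allowed_purposes": {"ad_targeting", "custom_strategy_trigger"},
    },
}


@dataclass(frozen=True)
class ReadRequest:
    store: str
    client: str   # the internal product making the call, not an individual user
    purpose: str


def can_read_raw(req: ReadRequest) -> bool:
    """Deny by default; allow only approved products reading for activation."""
    policy = RAW_STORE_POLICY.get(req.store)
    if policy is None:
        return False  # unknown stores are never readable
    return (req.client in policy["allowed_clients"]
            and req.purpose in policy["allowed_purposes"])


print(can_read_raw(ReadRequest("raw_events", "activation-service", "ad_targeting")))
print(can_read_raw(ReadRequest("raw_events", "analyst-notebook", "insights")))
```

Checking both the client and the purpose means an approved product still cannot pull raw PII for an analytics use case.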

Here is what the process looks like:

Business Impact
These procedures, powered by a complex system built on AWS, ensure that the hundreds of data feeds onboarded into the MiQ system go through an automated “privacy scrutiny” before they reach our Data Analysts and Data Scientists for generating insights. Going a step further, we have now built these steps into our internal, user-facing data ingestion product, so that users ingest data in line with these rules by default, leaving nothing to chance.
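"Privacy by default" at onboarding can be sketched as resolving a per-column plan for each new feed: known PII columns get a hashing or masking rule automatically, users may override a rule (for example, to drop the column entirely), but they can never onboard a PII column untouched. The column names, rule names and the hard-coded rule table are hypothetical; a real product would more plausibly drive them from a data catalogue.

```python
# Hypothetical default rules for known PII columns.
DEFAULT_RULES = {
    "cookie_id": "hash",
    "ip_address": "hash",
    "device_id": "hash",
    "referrer_url": "mask_emails",
}
PII_COLUMNS = set(DEFAULT_RULES)


def resolve_rules(columns, overrides=None):
    """Build the per-column treatment plan for a new feed.

    Known PII columns get their default rule unless overridden; an override
    may change the treatment (e.g. "drop") but may never be "keep".
    Non-PII columns default to "keep".
    """
    overrides = overrides or {}
    plan = {}
    for col in columns:
        rule = overrides.get(col, DEFAULT_RULES.get(col, "keep"))
        if col in PII_COLUMNS and rule == "keep":
            raise ValueError(f"PII column {col!r} cannot be onboarded unmasked")
        plan[col] = rule
    return plan


print(resolve_rules(["cookie_id", "campaign_id", "referrer_url"]))
```

Failing loudly on a "keep" override, rather than silently re-applying the default, makes a misconfigured feed visible during onboarding.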

From early conversations about data minimisation in 2018 to releasing this capability as part of our product offering, integrating the process with our data ingestion systems has certainly taken time. Thanks to MiQ’s proactive response to the GDPR, however, we are now a step closer to our North Star: doing our part in keeping the identities of internet users safe. Along the way, we have adopted a platform mindset towards Data Governance that we hope will keep improving our product strategy.
Technological Impact
With privacy regulations growing more stringent every day, the technical challenges and the accountability that come with them have grown proportionally. Aiming to build a platform that can absorb incoming regulatory changes, we have chosen tools that keep us agnostic to any such change. Applying Data Governance at MiQ through a platform involves an entire ecosystem that ingests data via an event-driven microservice architecture and applies Data Minimisation using complex Spark-based algorithms. With close to 10 TB of compressed data ingested every day, we have set out to make this a future-proof system that will continue to help us comply with privacy requirements. The entire ecosystem has been a stepping stone in evangelising the Data Lake Platform for the post-GDPR era.
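In an event-driven setup, a common pattern is for an S3 "object created" notification to trigger the minimisation step for the feed that just landed. The sketch below only covers the routing: it reads the standard S3 notification structure (`Records[].s3.bucket.name` and `Records[].s3.object.key`, where the key arrives URL-encoded) and turns each record into a job spec. The prefix-to-feed mapping and the job-spec shape are hypothetical; the hashing and masking itself would then run as a Spark job.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from landing-zone key prefix to feed name.
FEED_PREFIXES = {
    "incoming/partner_a/": "partner_a_impressions",
    "incoming/partner_b/": "partner_b_clicks",
}


def jobs_from_s3_event(event: dict) -> list:
    """Turn an S3 object-created notification into minimisation job specs."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded ("+" for spaces).
        key = unquote_plus(record["s3"]["object"]["key"])
        for prefix, feed in FEED_PREFIXES.items():
            if key.startswith(prefix):
                jobs.append({"feed": feed, "input": f"s3://{bucket}/{key}"})
                break  # each object belongs to at most one feed
    return jobs


event = {"Records": [{"s3": {"bucket": {"name": "landing"},
                             "object": {"key": "incoming/partner_a/2024-01-01/part+0.csv"}}}]}
print(jobs_from_s3_event(event))
```

Objects under unknown prefixes produce no job, so nothing reaches the analytics zone without a registered feed and its privacy rules.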
Technological advancements
The platform follows best practices such as keeping its microservices, which run on AWS EKS, decoupled. This lets us scale on demand and integrate with tools such as Apache NiFi, StreamSets Data Collector and AWS Kinesis, as well as big data platforms such as AWS EMR, Qubole and Databricks. At MiQ, we have streamlined batch and real-time ingestion so that Data Governance is applied to data in transit, before it is used for analytics. For instance, we provide observability metrics on our pipelines and infrastructure so that we can tell whether the cost of onboarding a dataset can be reduced and whether datasets are delivered on time into our S3 data lake and Redshift. This capability allows us to ingest data from many different kinds of sources and deliver an uninterrupted flow of data to our Analytics team. The platform now powers every solution that MiQ provides and helps us comply with privacy policies for all the datasets we use.
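The two questions the observability metrics answer (was a dataset delivered on time, and what did onboarding it cost) can be sketched as a per-run calculation. The `FeedRun` fields, including the SLA deadline and the attributed cost, are hypothetical; terabytes here are decimal (10^12 bytes).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FeedRun:
    feed: str
    expected_by: datetime   # hypothetical SLA deadline for landing in S3/Redshift
    delivered_at: datetime
    bytes_ingested: int
    cost_usd: float         # hypothetical attributed compute + storage cost


def run_metrics(run: FeedRun) -> dict:
    """Compute simple delivery and cost metrics for one ingestion run."""
    lateness_s = (run.delivered_at - run.expected_by).total_seconds()
    tb = run.bytes_ingested / 1e12  # decimal terabytes
    return {
        "feed": run.feed,
        "on_time": lateness_s <= 0,
        "lateness_minutes": max(0.0, lateness_s / 60),
        "cost_per_tb_usd": run.cost_usd / tb if tb else None,
    }


run = FeedRun("partner_a_impressions", datetime(2024, 1, 1, 6, 0),
              datetime(2024, 1, 1, 6, 30), 2 * 10**12, 50.0)
print(run_metrics(run))
```

Tracking cost per TB alongside lateness makes it possible to spot feeds that are both expensive and late, which are the obvious first candidates for optimisation.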

Rohit Srivastava

Engineering Lead with experience in evolving People, Product & Technology to deliver scalable and innovative solutions.

Bengaluru, India
