
Rohit Srivastava
Engineering Lead with experience in evolving People, Product & Technology to deliver scalable and innovative solutions.
Bengaluru, India
Engineering Lead with experience in evolving People, Product & Technology to deliver scalable and innovative solutions. Recognised with the Great Manager Award 2022 (Top 75 PAN India), conducted by People Consulting & the Economic Times group.
Company - MiQ Digital India Pvt Ltd
Position - Director, Platform
Topics
Tech powered solutions across fragmented ecosystems in privacy first AdTech world
Document Link - https://docs.google.com/document/d/12NLc2WgiEe6OBkchPxsyssXd6_CvV398/edit
Background
With the shift to a privacy-first approach, the morsels of data used by ad tech companies to track users around the internet for targeting and measuring ad success will be blocked over the next couple of years, giving rise to a fragmented ecosystem. In this new ecosystem, advertisers are looking for more transparency and flexibility in understanding how their marketing performs across media channels such as search, display, video and audio.
We need an environment where we are able to derive campaign measurement, audience refinement, supply optimization, and more, enabling advertisers to make more informed decisions about their cross-channel marketing investments.
Fig 1. Depiction of the fragmented ecosystem across the open and closed web
At MiQ Digital, we work with multiple clean room technologies for the closed web ecosystem. Clean room technologies such as Google ADH, AMC and Snowflake provide a secure, privacy-safe, dedicated cloud-based environment in which advertisers can perform analytics across multiple pseudonymized data sets to generate aggregated reports. For the open web ecosystem, we use LiveRamp, AppNexus and similar platforms, which enable advertisers to onboard offline datasets and target those audiences online.
Technology Empowerment
Our data pipelines run at scale to onboard roughly 300 GB of data from different data sources and push it into AWS S3 for processing. These pipelines process gigabytes of data on platforms such as Databricks with Apache Spark, running on high-compute AWS Graviton EC2 instances, which lets our data scientists apply complex machine learning algorithms to derive feature-score outputs that are consumed to generate custom bidding scripts.
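As a rough illustration of this stage, here is a minimal PySpark sketch, assuming a hypothetical S3 layout and column names (the paths, `segment_id`, `viewability` and `converted` columns are illustrative, not MiQ's actual schema):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-scoring").getOrCreate()

# Hypothetical input location; the real pipelines onboard ~300 GB/day from multiple sources.
impressions = spark.read.parquet("s3://example-bucket/onboarded/impressions/")

# Simple illustrative feature aggregation per audience segment.
features = (
    impressions
    .groupBy("segment_id")
    .agg(
        F.count("*").alias("impression_count"),
        F.avg("viewability").alias("avg_viewability"),
        F.sum(F.when(F.col("converted") == 1, 1).otherwise(0)).alias("conversions"),
    )
    .withColumn(
        "feature_score",
        F.col("conversions") / F.col("impression_count") * F.col("avg_viewability"),
    )
)

# Scores are written back to S3, where a downstream job turns them into a custom bidding script.
features.write.mode("overwrite").parquet("s3://example-bucket/feature-scores/")
```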
Business Impact
Average percentage increase in conversions
Out of all advertisers who had set up custom bidding (CB) in their line items (LIs), we were able to create and deliver a structured performance evaluation for around 50% of advertisers.
37% (3/8) of these advertisers saw a KPI performance improvement of 60% on average, and 76% for the selected confirmation activity IDs.
Reaching Incremental Audiences using Reach Frequency Analysis - Closed Web Solution
The aim of this analysis is to calculate the optimal exposure frequency by funnel/tactic. The output data involves the following (a minimal sketch of the computation follows this list):
Impression Frequency (taking into account 30-day attribution window)
Conversion Frequency (taking into account 30-day attribution window)
Total Impression wastage outside the optimal frequency range identified
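A minimal PySpark sketch of this kind of frequency analysis, assuming hypothetical impression and conversion exports with `user_id` and `converted` columns over a 30-day window (paths and schema are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("reach-frequency").getOrCreate()

# Hypothetical pseudonymized exports covering the 30-day attribution window.
impressions = spark.read.parquet("s3://example-bucket/cleanroom/impressions_30d/")
conversions = spark.read.parquet("s3://example-bucket/cleanroom/conversions_30d/")

# Impression frequency per user over the 30-day attribution window.
freq = impressions.groupBy("user_id").agg(F.count("*").alias("impression_frequency"))

# Conversion rate at each frequency level, to locate the optimal frequency range.
per_freq = (
    freq.join(conversions.select("user_id", "converted"), "user_id", "left")
        .fillna({"converted": 0})
        .groupBy("impression_frequency")
        .agg(F.count("*").alias("users"), F.sum("converted").alias("converters"))
        .withColumn("conv_rate", F.col("converters") / F.col("users"))
        .orderBy("impression_frequency")
)

# Impressions served beyond the chosen optimal range are counted as wastage.
OPTIMAL_MIN, OPTIMAL_MAX = 6, 10  # e.g. the 6-10 range identified for the prospecting campaign
wastage = (
    freq.withColumn(
            "wasted",
            F.when(F.col("impression_frequency") > OPTIMAL_MAX,
                   F.col("impression_frequency") - OPTIMAL_MAX).otherwise(0))
        .agg(F.sum("wasted").alias("wasted_impressions"))
)
```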
Technology Empowerment
These pipelines make extensive use of clean room query engines, which let us export gigabytes of aggregated data into cloud storage such as AWS S3 and GCS. The exports are then loaded into Lakehouse solutions built on Delta and SQL Analytics, where the downstream analytics run.
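As a rough sketch of the load step, assuming hypothetical bucket paths and export format (the actual tables and locations differ):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cleanroom-to-delta").getOrCreate()

# Aggregated, privacy-safe export produced by the clean room query engine.
export = (spark.read
               .option("header", "true")
               .csv("s3://example-bucket/cleanroom-exports/reach_frequency/"))

# Land it in the Lakehouse as a Delta table so SQL Analytics can query it directly.
(export.write
       .format("delta")
       .mode("append")
       .save("s3://example-bucket/lakehouse/reach_frequency_agg/"))
```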
Business Impact
The following campaign saw a major optimization in terms of reach:
Prospecting Campaign:
Optimal Frequency Range - 6 - 10 Impressions
Impression Saving (Wastage Control) - 21%
Potential Incremental Reach - 13%
Potential Incremental Converters - 16%
Data Minimisation strategy for Data Governance in the post-GDPR era
Background
Most of us are familiar with the adage "There is no such thing as a free lunch", which suggests that it is impossible to get something for nothing in return. However, looking at the business model that many companies are increasingly adopting, wherein the consumer's data is monetised in exchange for free-to-use products, it would certainly seem that these lunches are indeed free.
However, as expected, that is not the case: behind the scenes, consumer data is collected for a myriad of reasons, digital advertising being one of the major ones.
MiQ, one of the leading programmatic advertising companies, uses terabytes of such data daily to generate insights, thereby helping companies drive a better ROI on their ad spend. Being at the forefront of digital advertising, it is our responsibility to make User Privacy our utmost priority, and that is what we have done as part of our Data Minimisation initiative - more on that in a bit.
What is GDPR, and how does it affect MiQ?
The GDPR, or EU General Data Protection Regulation, as its name suggests, regulates data protection and privacy in the European Union. It came into force in 2018 and is designed to:
Harmonise data privacy laws across Europe
Protect and empower all EU citizens' data privacy
Reshape the way organisations across the region approach data privacy
Through it, the EU can manage and regulate essential aspects such as data subject rights, conditions of consent, the right of access, data portability, privacy by design, and others. Non-compliance can attract fines and penalties of up to €20 million.
The Data Minimisation strategy
We at MiQ appreciate the gravity of these privacy laws and the intent of protecting the user’s privacy, and hence as part of our Data Governance initiatives, we have introduced the Data Minimisation strategy. Data Governance includes other aspects such as Data Security, Data Cataloging, etc - a couple of which MiQ’s data ingestion team has already handled - however, for the purposes of this document, we will just focus on the strategy for data privacy.
As part of the Data Minimisation strategy, we actively make sure that data containing PII (personally identifiable information) is scrubbed or masked before it is used for generating insights. This involves hashing (i.e., non-reversible masking) cookie IDs, IP addresses and device IDs so that they can still be joined across datasets to generate insights, but cannot be traced back to the original user. Additionally, PII that is not needed for insights, such as user email IDs embedded in referrer URLs, is masked.
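A minimal PySpark sketch of this kind of scrubbing, assuming hypothetical column names (`cookie_id`, `ip_address`, `device_id`, `referrer_url`) and SHA-256 as the one-way hash; the real pipeline and any salting are more involved:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pii-minimisation").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw-feeds/impressions/")

scrubbed = (
    raw
    # One-way hash identifiers: still joinable across datasets, not traceable to the user.
    .withColumn("cookie_id", F.sha2(F.col("cookie_id"), 256))
    .withColumn("ip_address", F.sha2(F.col("ip_address"), 256))
    .withColumn("device_id", F.sha2(F.col("device_id"), 256))
    # Mask email addresses that may appear inside referrer URLs.
    .withColumn(
        "referrer_url",
        F.regexp_replace(
            "referrer_url",
            r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
            "<masked_email>",
        ),
    )
)

scrubbed.write.mode("overwrite").parquet("s3://example-bucket/minimised/impressions/")
```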
This ties into a simple overarching theme: avoid storing data you don't need, lest there be malicious attempts to procure it. On the other hand, we do have use cases where unmasked / raw data is needed for ad targeting and triggering custom strategies. In these cases, we have placed access restrictions on the raw data stores and have made sure that teams across MiQ can only read the data for activation use cases through our internal-facing products, thereby limiting the risk of unwanted usage of PII-containing data.
Here is what the process looks like:
Business Impact
These procedures, powered by a complex technological system built on an AWS foundation, ensure that the hundreds of data feeds onboarded into the MiQ system go through automated "privacy scrutiny" before landing in the hands of our Data Analysts and Scientists for generating insights. Going a step further, we have now introduced these steps as part of our internal user-facing data ingestion product, allowing users to ingest data while adhering to these rules by default, leaving nothing to chance.
From early conversations around data minimisation in 2018 to releasing this capability as part of our product offering, this process has certainly taken time to integrate with our data ingestion systems. But thanks to MiQ's proactive response to the GDPR, we are now a bit closer to our North Star of doing our part to keep the identities of internet users safe. Along the way, we have taken a platform mindset to Data Governance that we hope will refine our product strategy for the better going forward.
Technological Impact
With privacy regulations getting more stringent every day, the technical challenges and accountability have surged proportionally. With a vision of establishing a platform that can absorb any incoming change, we have chosen tools that remain agnostic to such changes. Our journey of applying Data Governance at MiQ via a platform involves an entire ecosystem that ingests data through an event-driven microservice architecture and applies Data Minimisation using complex Spark-based algorithms. With close to 10 TB of compressed data ingested every day, we have built a future-proof system that helps us abide by privacy requirements. The entire ecosystem has been a stepping stone in evangelising the Data Lake Platform for the post-GDPR era.
Technological advancements
The platform follows best practices such as keeping the microservices decoupled and running them on AWS EKS. This gives us the ability to scale on demand and integrate with tools like Apache NiFi, StreamSets Data Collector and AWS Kinesis, and with big data platforms like AWS EMR, Qubole and Databricks. At MiQ, we have streamlined batch and real-time ingestion so that Data Governance is applied while data is in transit for analytics. For instance, we provide observability metrics on our pipelines and infrastructure so we know whether we can reduce the cost of onboarding and whether datasets land on time in our S3 data lake and Redshift. This capability allows us to ingest data from many different kinds of sources and deliver an uninterrupted flow of data to our Analytics team. The platform now empowers every solution that MiQ provides and helps us abide by privacy policies for all the datasets that we use.
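As a rough illustration of the event-driven entry point, here is a minimal sketch of a Kinesis consumer in Python using boto3. The stream name (`ingestion-events`), region and payload shape are hypothetical; the actual microservices run on EKS and hand events to the Spark-based minimisation jobs:

```python
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Hypothetical stream carrying "new feed landed" events that trigger downstream jobs.
STREAM = "ingestion-events"

shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        event = json.loads(record["Data"])
        # Each event would be handed to the Spark-based minimisation pipeline (not shown here).
        print("new dataset landed:", event.get("s3_path"))
    iterator = resp["NextShardIterator"]
    time.sleep(1)
```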
Elevating Standards: A Journey of ~35% CMMI and CIS score improvement in 1 year
How MiQ improved its security standards in one year, resulting in a ~35% improvement in CIS & CMMI scores. The focus is on the technological and process changes that got us to this feat.
Cost Reporting Service for Multi Cloud & Multi Systems
MiQ has developed a cost reporting service aggregator for cloud & systems that consolidates & analyses cost-related data from various cloud services and systems to provide a unified view of expenses.
Cost Optimisation journey to reduce organisation Cost/GP by 20%
This session talks about how MiQ reduced organisation Cost/GP by 20%. It also covers how we measure and track Tech & Infra cost and tie it back to the company's revenue.
CLEANATHON: Hacking the Cloud, Saving the Spend
CLEANATHON is a hack event focused on optimizing cloud resources & reducing waste. Developers collaborate to identify inefficiencies, automate cleanup & implement saving strategies to drive smarter cloud usage while maximizing performance & savings.
Beyond the Console: Elevating Engineering Excellence with our AI powered IDP, Saving ~15hrs/Dev/Month
Our AI-powered Internal Developer Portal streamlines workflows, automates processes & provides intelligent insights, saving ~15 hrs/dev/month. By reducing friction & enhancing collaboration, it empowers teams to focus on innovation, not overhead.
Achieving a strong & sustainable financial governance across AWS cloud ecosystem
Document Link - https://docs.google.com/document/d/1zYILy77Ck3PXuEcfeKTUTocNqQ8G96mT/edit#
Background
Being a data-driven business, MiQ maintains an extensive data infrastructure.
At MiQ, we onboard 15+ TB of compressed AdTech data daily into our infrastructure on average, with real-time and batch workloads comprising approximately 300 pipelines. We run AWS EMR, Apache Spark, Presto, Apache Hive, AWS Redshift, AWS Kinesis & Firehose, AWS Athena, Apache Airflow, StreamSets and Apache NiFi. MiQ being a medium-sized company, it is imperative to achieve sustainable growth at scale, so managing cost and using resources efficiently is of great importance to us.
OKR for Financial Governance - “0% deviation from P&T budget for Tech infrastructure“
Annual Cost Forecasts
At the beginning of every year, our product and tech budgets are estimated; a precise understanding of cost is key to ensuring the estimations are correct and to forecasting infrastructure cost growth against topline and bottomline numbers. We have cost-related OKRs to make sure everyone in the org is aligned and to build a culture where teams are aware of the costs associated with infra. The AWS Pricing Calculator helps produce precise estimates for new services / products being built on AWS, and our historical data is equally helpful in forecasting.
Cost Tracking Tools & Techniques
Once budgets are approved, we track cost trends on a weekly and monthly basis. For this purpose we have built an in-house Cost Reporting Service which integrates with AWS Cost Explorer, Qubole cost reports, Databricks log-level cost data and many of the third-party providers we work with. It sends us weekly and monthly automated reports and deviation alerts, if any. Based on the alerts, anyone who wants to debug in detail can leverage a combination of AWS CloudTrail Insights, CloudTrail, S3 and Athena. We also use AWS Budgets & CUR reports extensively for alerting and reporting.
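A minimal sketch of how such a weekly report can pull data from AWS Cost Explorer with boto3, assuming a hypothetical "team" cost-allocation tag (the in-house Cost Reporting Service aggregates more sources than this):

```python
from datetime import date, timedelta

import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

end = date.today()
start = end - timedelta(days=7)

# Weekly unblended cost grouped by a hypothetical 'team' cost-allocation tag.
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "team"}],
)

for day in resp["ResultsByTime"]:
    for group in day["Groups"]:
        team = group["Keys"][0]
        cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(day["TimePeriod"]["Start"], team, round(cost, 2))
```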
As a result of the tools and techniques discussed above, teams have visibility into and awareness of the cost implications of developing and productionizing any new service / product / pipeline. The leadership team also has access to high-level dashboards and reports at whatever level of granularity they want to view.
Fig. Cost dashboards (1, 2, 3) & weekly team-wise cost report email (4)
Turning to some of the cost optimization practices we have followed over the last few years: we generate periodic reports on the average utilization of all the different machines, Kubernetes pods, databases, caches, etc., and periodically calibrate by right-sizing resources and using the right instance types. We have utilized AWS Savings Plans and Reserved Instances for more stability and efficiency in terms of cost as well as performance. As indicated earlier, MiQ being a data-driven company, we make heavy use of AWS Spot instance offerings and AWS Glacier because of the reporting nature of our workloads.
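A minimal sketch of how a utilization report for right-sizing could be pulled from CloudWatch with boto3, assuming a hypothetical list of EC2 instance IDs (the actual reports also cover pods, databases and caches):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical instances to evaluate for right-sizing.
INSTANCE_IDS = ["i-0123456789abcdef0"]

end = datetime.utcnow()
start = end - timedelta(days=14)

for instance_id in INSTANCE_IDS:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,          # hourly datapoints
        Statistics=["Average"],
    )
    points = [p["Average"] for p in stats["Datapoints"]]
    avg_cpu = sum(points) / len(points) if points else 0.0
    # Consistently low average CPU flags a candidate for a smaller instance type.
    print(instance_id, f"avg CPU over 14 days: {avg_cpu:.1f}%")
```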
Impact
Following the above practices and discipline over the last 2-3 years, we have been able to grow revenue as a business while keeping deviation from the Product and Tech budgets approved at the start of the year minimal to negative. A few key results we have achieved:
Zero or negative deviation from our estimated costs
>25% reduction in AWS cost in the first year of tracking & right sizing implementation
100% awareness / visibility among different teams regarding cost applicability of their products and service offerings.
Many of the above-mentioned techniques can be adopted by any company with a small amount of effort, and the cost savings can be big and might surprise you. "Because at the end of the day, dollars saved are dollars earned."
Scaling Data Cleanroom solutions for privacy first data analytics & modelling platform
With the shift to a privacy-first approach, the morsels of data used by ad tech companies to track users around the internet for targeting and measuring ad success will be blocked over the next couple of years, giving rise to a fragmented ecosystem. In this new ecosystem, advertisers are looking for more transparency and flexibility in understanding how their marketing performs across media channels such as search, display, video and audio.
We need an environment where we are able to derive campaign measurement, audience refinement, supply optimization, and more, enabling advertisers to make more informed decisions about their cross-channel marketing investments.
At MiQ Digital India Pvt. Ltd, we work with multiple clean room technologies for the closed web ecosystem. Clean room technologies such as Google ADH and AMC provide a secure, privacy-safe, dedicated cloud-based environment in which advertisers can perform analytics across multiple pseudonymized data sets to generate aggregated reports. For the open web ecosystem, we use LiveRamp, AppNexus and similar platforms, which enable advertisers to onboard offline datasets and target those audiences online.
Business Impact
Average percentage increase in conversions
Out of all advertisers who had set up custom bidding (CB) in their line items (LIs), we were able to create and deliver a structured performance evaluation for around 50% of advertisers.
37% (3/8) of these advertisers saw a KPI performance improvement of 60% on average, and 76% for the selected confirmation activity IDs.
Technology Empowerment
These pipelines make extensive use of clean room query engines, which let us export gigabytes of aggregated data into cloud storage such as AWS S3 and GCS. The exports are then loaded into cloud data warehouses like Redshift and BigQuery, built on highly scalable data warehouse clusters, to run data analysis at scale.
Predicting tune-in for a TV content via programmatic ACR data using PySpark, MLlib & Delta Lakehouse
TV advertising has always been the preferred medium for marketers to reach a mass audience. Networks/channels make money by selling ad slots for their content, so it is very important for them to predict tune-in for their content to get the most out of those slots. Traditionally, networks have had to rely on TV audience measurement, but it often does not help them estimate tune-in because of the variety of content, different airing times, the changing behaviour of TV viewers, etc. Since the advent of smart/connected TVs, we now have access, with consent, to second-by-second household viewing data about TV watching behaviour.
At MiQ Digital India Pvt. Ltd., we collect and process this high-volume data and apply machine learning models to predict tune-in across new-viewer and repeat-viewer categories, helping TV networks get the maximum out of their ad slot selling.
We use Apache Spark MLlib for modelling and PySpark for data wrangling and feature engineering, with a Kafka-based event-driven microservices architecture. It sits on a well-defined data engineering ecosystem with a Lakehouse architecture built on top of the Delta engine.
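A minimal sketch of such a tune-in model using PySpark and the Spark ML API, assuming a hypothetical ACR feature table (the path and the `household_id`, `past_views`, `genre_affinity`, `dayparts_watched`, `tuned_in` columns are illustrative, not the actual schema):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml import Pipeline

spark = SparkSession.builder.appName("tunein-prediction").getOrCreate()

# Hypothetical household-level features derived from ACR viewing data (Delta table).
features = spark.read.format("delta").load("s3://example-bucket/lakehouse/acr_household_features/")

assembler = VectorAssembler(
    inputCols=["past_views", "genre_affinity", "dayparts_watched"],
    outputCol="features",
)
model = GBTClassifier(labelCol="tuned_in", featuresCol="features")

pipeline = Pipeline(stages=[assembler, model])
train, test = features.randomSplit([0.8, 0.2], seed=42)

fitted = pipeline.fit(train)
predictions = fitted.transform(test)

# Predicted tune-in probability per household, for both new and repeat viewers.
predictions.select("household_id", "prediction", "probability").show(5)
```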
This talk covers scaling MiQ's TV product to market across >50 advertisers, ultimately generating media delivery of ~40 million dollars. Pipeline optimisation for data at TB scale, along with cost optimisations for model generation and prediction, are the key aspects highlighted in the talk.
Building Identity Graph at Scale for Programmatic Media Buying Using Apache Spark and Delta Lake
The proliferation of digital channels has made it mandatory for marketers to understand an individual across multiple touchpoints. To improve marketing effectiveness, marketers need a good sense of a consumer's identity so they can reach that consumer on a mobile device, desktop or the big TV screen in the living room. Examples of such identity tokens include cookies, app IDs, etc. A consumer can use multiple devices at the same time, and the same consumer should not be treated as different people in the advertising space. Identity resolution exists with this mission: to build an omnichannel view of the consumer.
Identity Spine is MiQ's proprietary identity graph, using identity signals across our ecosystem to create a unified source of reference consumed by product, business analysis and solutions teams for insights and activation. We have built a strong data pipeline using Spark and Delta Lake, thereby strengthening our connected media product offerings for cross-channel insights and activation.
This talk mostly highlights (a rough sketch of the graph-building step follows this list):
* The journey of building a scalable data pipeline that handles 10 TB+ of data daily
* How we were able to reduce our processing cost by 50%
* Optimization strategies implemented to onboard new datasets to enrich the graph
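As a rough sketch of how an identity graph can be stitched with Spark, here is a connected-components example using the GraphFrames package and Delta Lake. The vertex/edge schema (`id`, `src`, `dst`) and the S3 paths are hypothetical, and the actual Identity Spine pipeline is considerably more involved:

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame  # requires the graphframes Spark package on the cluster

spark = SparkSession.builder.appName("identity-spine-sketch").getOrCreate()
spark.sparkContext.setCheckpointDir("s3://example-bucket/checkpoints/")  # needed by connectedComponents

# Hypothetical identity tokens (cookies, device IDs, app IDs) as vertices,
# and observed linkages (same login, same household IP, etc.) as edges.
vertices = spark.read.format("delta").load("s3://example-bucket/identity/tokens/")    # column: id
edges = spark.read.format("delta").load("s3://example-bucket/identity/linkages/")     # columns: src, dst

graph = GraphFrame(vertices, edges)

# Each connected component becomes one resolved "person" across devices and channels.
components = graph.connectedComponents()

(components.select("id", "component")
           .write.format("delta").mode("overwrite")
           .save("s3://example-bucket/identity/spine/"))
```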
Scaling Proximity Targeting via Delta Lakehouse based Data Platforms Ecosystem
Proximity targeting is a marketing technique that uses mobile location services to reach consumers in real time when they are near a store location or point of interest. This is done by defining a radius around a specific location: if a consumer has opted into location services on their mobile phone and enters that radius, proximity targeting triggers an advertisement or message intended to influence their behaviour. Combined with the ability to purchase impressions through programmatic ad platforms powered by real-time bidding, this helps businesses formulate the right strategy for influencing users in a particular geographical area. They can build user groups based on certain characteristics (such as neighbourhoods, demographics, interests, and other data) and subsequently launch another campaign that targets anyone with those characteristics.
The growth of mobile devices has led to enormous data generation, which offers tremendous potential when used effectively for business. We therefore need a platform that can process such huge volumes of data efficiently, with minimum latency and cost. This talk describes MiQ's journey of building a fast, scalable & cost-effective processing platform using Spark, MLlib, Kafka-based event-driven microservices and a Delta Lakehouse architecture, delivering faster, actionable insights for proximity targeting, which has empowered the creation of a product generating ~30 million dollars of revenue year over year.
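A minimal PySpark sketch of the core geofencing step, assuming hypothetical location-event columns (`device_id`, `lat`, `lon`) and a single point of interest; the production platform layers real-time bidding and event-driven triggers on top of this:

```python
import math

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("proximity-targeting").getOrCreate()

# Hypothetical opted-in mobile location events.
events = spark.read.format("delta").load("s3://example-bucket/lakehouse/location_events/")

# Point of interest (e.g. a store) and targeting radius in metres (illustrative values).
POI_LAT, POI_LON, RADIUS_M = 12.9716, 77.5946, 500.0

@F.udf("double")
def haversine_m(lat, lon):
    """Great-circle distance in metres from the point of interest."""
    r = 6_371_000.0
    p1, p2 = math.radians(POI_LAT), math.radians(lat)
    dphi = math.radians(lat - POI_LAT)
    dlmb = math.radians(lon - POI_LON)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Devices seen inside the geofence become the audience for the proximity campaign.
in_radius = (events
             .withColumn("distance_m", haversine_m("lat", "lon"))
             .filter(F.col("distance_m") <= RADIUS_M)
             .select("device_id").distinct())
```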