Session
Building Data Lake Platform fully serverless
Data platforms are notorious for needing infrastructure that runs 24x7x365. We took on the challenge of implementing a self-service data lake platform using only serverless cloud services.
This session tells the story of that endeavour, from business requirements through design to implementation, in an enterprise setup.
I will share my experience of building this data lake platform at Polestar cars.
The talk has four major parts:
1. The business context - I will begin with how we uncovered the business use case and created the first blueprints of the solution.
2. Why we chose a data lake over other data paradigms - I will explain why we chose a data lake to implement this platform. I will break down the data lake concept into its "logical" components and highlight the AWS services that we considered.
3. Architectural walkthrough of our solution - A technical walkthrough of the solution. I will explain why we chose to make it a self-service platform, describe the operational model, and highlight how we made the entire solution self-service oriented.
4. Tradeoffs and lessons learnt - I will talk about the friction we faced in adoption, the challenges around data security and governance, and training management to change their way of thinking in order to adopt the platform.
Key questions the session answers, as takeaways for participants:
1. When should you platform data lakes, i.e. centralise vs decentralise?
2. How do you ensure the data lake remains usable?
3. How do you enhance developer experience via self-service mechanisms?
4. When should you use a data lake over other data products such as a data warehouse?
Anurag Kale
AWS Data Hero, Cloud and Data Architect at Aurobay Sweden
Göteborg, Sweden