Customizing Flink for Pinterest use cases at scale
At Pinterest, the streaming data processing team built Flink as a service to onboard and operate mission-critical use cases that directly affect Pinterest revenue and user engagement. In this talk, we will share the customizations we made on top of open source Flink to improve reliability, development speed, and cost-effectiveness. Our discussion will feature:
Common Libraries: We will talk about our unified source for unbounded logs, essential operators such as deduplication, and how we scale our service connectors.
Flink Internals: We will cover serialization enhancements, scheduler improvements, and mini-batch optimizations to demonstrate how we have boosted processing efficiency.
Ecosystem: Delve into our monitoring systems, job debugging, job tuning, and the streamlined procedures we use to move jobs from development to production seamlessly.
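As a taste of the essential operators mentioned above, here is a minimal, hypothetical sketch of keyed event deduplication in plain Python; in a real Flink job this logic would live in a KeyedProcessFunction backed by per-key ValueState with a TTL, but the names and cache bound here are illustrative only.

```python
from collections import OrderedDict

class Deduplicator:
    """Drops events whose ID has already been seen.

    A bounded LRU cache stands in for Flink keyed state with a TTL,
    so memory stays capped even on an unbounded stream.
    """

    def __init__(self, max_keys=10_000):
        self.seen = OrderedDict()  # event_id -> True, ordered by recency
        self.max_keys = max_keys

    def process(self, event_id, event):
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # refresh recency
            return None                      # duplicate: drop
        self.seen[event_id] = True
        if len(self.seen) > self.max_keys:
            self.seen.popitem(last=False)    # evict least-recently-seen key
        return event                         # first occurrence: emit

dedup = Deduplicator()
out = [dedup.process(e["id"], e) for e in [
    {"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 3},
]]
emitted = [e for e in out if e is not None]
# emitted keeps only the first event per ID: ids "a" then "b"
```

The LRU eviction is a deliberate trade-off: exact deduplication over an unbounded stream would require unbounded state, so a recency bound (or a state TTL in Flink) approximates it at fixed cost.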
Join us to uncover the innovative techniques we use to advance the capabilities of stream processing at Pinterest.
Growing a managed Flink streaming platform from scratch
Facing rapid adoption of Flink, onboarding and proactively managing many jobs has become one of the biggest challenges for the Flink platform team at Pinterest. In the past several months, we have gone from having no users to more than 25 projects building Flink jobs on our platform, and more than 27 of those jobs were productionized by the end of H1 2020. Needless to say, we are working hard to keep up with the demand for better job tracking, validation automation, and resource management.
In this talk, we will discuss some of the processes and tools we have been building to ensure the scalability and efficiency of our Flink platform. First, we will talk about JobService, our Flink job lifecycle manager. Then we will discuss our configuration-driven unification of FileSource(Sink) and KafkaSource(Sink) along with our CI/CD pipeline, and how we use this unified source (sink) to backfill jobs, validate that a job can run at higher QPS with the expected output, and ensure it does not waste resources.
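To illustrate the configuration-driven unification described above, here is a hypothetical sketch of a source factory in plain Python: the same job code asks for a source, and a single config key decides whether it reads live traffic from Kafka or files for backfill and validation. The class and field names are assumptions, not the actual platform API.

```python
from dataclasses import dataclass

# Illustrative stand-ins for Flink's KafkaSource and FileSource connectors.
@dataclass
class KafkaSource:
    topic: str

@dataclass
class FileSource:
    path: str

def build_source(config: dict):
    """Builds the right source from config, so job code never hard-codes
    whether it reads a live stream or archived files."""
    kind = config["type"]
    if kind == "kafka":
        return KafkaSource(topic=config["topic"])
    if kind == "file":
        return FileSource(path=config["path"])
    raise ValueError(f"unknown source type: {kind}")

# Flipping one config key switches the same job between live and backfill mode.
live = build_source({"type": "kafka", "topic": "events"})
backfill = build_source({"type": "file", "path": "/archive/events/2020-06-01"})
```

Keeping the switch in configuration rather than code is what makes backfill-based validation cheap: the job under test is byte-for-byte the production job, only its input binding changes.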