Log Ingestion and Data Replication at Twitter

Data analytics at Twitter relies on petabytes of data stored across data lakes and analytics databases. This data comes either from log events generated by Twitter microservices in response to user actions (on the order of trillions of events per day) or from processing jobs that consume those log events. The Data Lifecycle Team at Twitter manages large-scale data ingestion and the replication of data across Twitter data centers and the public cloud. Delivering this data, in either streaming or batch fashion, to data lakes (HDFS, GCS) and the data warehouse (Google BigQuery) reliably, scalably, and at the lowest possible latency is a complex problem. In this talk, we will explain our log ingestion architecture and our data replication architecture across storage systems, and describe how we use Beam-based ingestion and replication pipelines for both batch and streaming use cases to achieve this goal.

Praveen Killamsetti

Staff Engineer at Twitter
