Speaker

Praveen Killamsetti

Staff Engineer at Twitter

Praveen Killamsetti is a Staff Engineer at Twitter, where he leads the team that manages data replication and retention across the data lakes and data warehouse systems in Twitter's data centers and Google's public cloud. He has 15+ years of experience in distributed storage systems, replication, and ingestion technologies, and holds 15+ patents. He has a master's degree in computer science from IIT Madras. Before joining Twitter, Praveen built distributed storage systems at Nimble Storage and NetApp, working on products including synchronous replication across multiple data centers with automatic failover, write-optimized KV stores, a dedupe and compression stack, efficient cloning features, and efficient archiving of storage snapshots to S3.

Log Ingestion and Data Replication at Twitter

Data analytics at Twitter relies on petabytes of data across data lakes and analytics databases. The data comes either from log events generated by Twitter microservices in response to user actions (on the order of trillions of events per day) or from processing jobs that consume those log events. The Data Lifecycle Team at Twitter manages large-scale ingestion and replication of data across Twitter data centers and the public cloud. Delivering the data, in either streaming or batch fashion, to data lakes (HDFS, GCS) and the data warehouse (Google BigQuery) reliably, scalably, and at the lowest possible latency is a complex problem. In this talk, we will explain our log ingestion architecture and our data replication architecture across storage systems, and show how we use Beam-based ingestion and replication pipelines for both batch and streaming use cases to achieve this goal.
