Speaker

Wenlong Xiong

Software Engineer - Service Discovery & Traffic Platform @ Robinhood

Wenlong Xiong is a software engineer on the Service Discovery & Traffic team at Robinhood. He helps develop and operate the company's Envoy load balancing infrastructure, as well as its Golang and Python gRPC clients. Previously, he was on Robinhood's Streaming Platform and Batch Compute teams, working on scaling Kafka and Spark infrastructure.

Removing Kafka related SPoFs in Robinhood’s order path

Robinhood uses Kafka in every line of its business, from stock and crypto trading to its self-clearing system and online data analytics. Most critical is the order path: all the microservices and operations required to place and fill customer orders in equities, options, and crypto. The order path relies heavily on Kafka for interservice communication and queueing, and it uses the outbox pattern with Postgres and Kafka to provide at-least-once guarantees for order execution. Previously, we used a single production Kafka cluster for all of these purposes, which made that cluster a single point of failure (SPoF) for the order path.

This talk discusses how we removed this SPoF through investments in Kafka infrastructure and our client libraries, which let us horizontally shard our order path Kafka clusters. We will also discuss the following:
* Intelligent and reactive client libraries we built to make this sharding easier
* Producer proxy sidecars that help our Python processes work with the sharded architecture
* How we do safe rollouts of sharded Kafka clusters
* Sharding vs Redundancy for different use cases
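
As a rough illustration of the sharding idea, the sketch below shows one way a client could map a message key to one of several Kafka clusters and route around a cluster that has been marked unhealthy. It is not Robinhood's library: `ShardedRouter`, its methods, and the cluster names are hypothetical, and it uses no Kafka client at all.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ShardedRouter:
    """Hypothetical router: picks one Kafka cluster (shard) per message key."""

    shards: List[str]
    unhealthy: Set[str] = field(default_factory=set)

    def mark_unhealthy(self, shard: str) -> None:
        self.unhealthy.add(shard)

    def mark_healthy(self, shard: str) -> None:
        self.unhealthy.discard(shard)

    def pick(self, key: str) -> str:
        """Return the shard for `key`, considering only healthy shards."""
        healthy = [s for s in self.shards if s not in self.unhealthy]
        if not healthy:
            raise RuntimeError("no healthy shard available")
        # crc32 is stable across processes, unlike the built-in hash().
        index = zlib.crc32(key.encode("utf-8")) % len(healthy)
        return healthy[index]
```

With simple modulo hashing like this, marking a shard unhealthy can remap keys that were on other shards too. A real client would have to decide whether that is acceptable for its ordering requirements.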

Dead letter queues for Kafka consumers in Robinhood

Robinhood uses Kafka in every line of its business, from stock and crypto trading to its self-clearing system and online data analytics. Robinhood’s fleet of microservices uses Apache Kafka to build an event-driven architecture in which services communicate with each other asynchronously. The producers and consumers of a Kafka topic almost always belong to different teams, so the schema of events in Kafka is the only API that downstream services can rely on. Over time, we have seen that an event can fail to be processed by a downstream Kafka consumer in many ways, ranging from deserialization failures to upstream code changes that produce bad data.

This talk discusses how we built libraries, templated microservices, and tooling that leverage Postgres and Kafka to deal with dead letters safely: inspecting and querying them, and republishing them to retry Kafka topics for safe reprocessing at a later time. We also dive deeper into how this improved operability and on-call health for all of our Kafka application developers.
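
The dead letter flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tooling from the talk: `DeadLetterStore` is an in-memory stand-in for a Postgres table, `consume` and the `.retry` topic suffix are hypothetical, and the `publish` callback stands in for a real Kafka producer.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class DeadLetter:
    topic: str
    payload: bytes
    error: str


class DeadLetterStore:
    """In-memory stand-in for a Postgres dead letter table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: List[DeadLetter] = []

    def add(self, letter: DeadLetter) -> None:
        self._rows.append(letter)

    def query(self, topic: str) -> List[DeadLetter]:
        """Return stored dead letters for one topic, for inspection."""
        return [row for row in self._rows if row.topic == topic]

    def republish(self, topic: str, publish: Callable[[str, bytes], None]) -> int:
        """Send stored letters for `topic` to its retry topic, then drop them."""
        letters = self.query(topic)
        for letter in letters:
            # The ".retry" suffix is an assumed naming convention.
            publish(f"{topic}.retry", letter.payload)
        self._rows = [row for row in self._rows if row.topic != topic]
        return len(letters)


def consume(
    topic: str,
    payload: bytes,
    handler: Callable[[Any], None],
    store: DeadLetterStore,
) -> bool:
    """Process one event; on any failure, record it as a dead letter."""
    try:
        handler(json.loads(payload))
        return True
    except Exception as exc:  # deserialization or handler failure
        store.add(DeadLetter(topic, payload, repr(exc)))
        return False
```

Storing the raw payload together with the error lets an operator inspect the failure before deciding whether to republish.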

Current 2022: The Next Generation of Kafka Summit Sessionize Event

October 2022 Austin, Texas, United States
