Break the building but keep the tenants happy

We have thousands of frontend servers in seven data centers serving more than 500k HTTP requests per second. Nearly 300k of those requests relate to notifications about the user's actions, clicks, visibility, etc.

In the past, a single type of service handled these requests along with other types of events. This design put us at risk of losing events (and therefore money) whenever any logic running on these services had an issue.

In this session, I will describe our complicated and lengthy process for separating the event handling flow into a dedicated pipeline. The result: a better SLA, the ability to recover lost data when production issues arise, a fully isolated event handling pipeline, and a better developer experience.

Most importantly, we managed to do all this while handling events at all times, without any breaking changes for our developers, and without any downtime.

Gal Shelach

Team leader of a production team in the Infrastructure group - Taboola

Tel Aviv, Israel
