Deduplicating and analysing time-series data with Apache Beam and QuestDB

Time-series data pipelines tend to prioritise speed and freshness over completeness and integrity. As a result, they often ingest duplicate data, which may be fine for many analytical use cases but is a real problem for others.

Many open-source databases are built specifically for the speed and query semantics of time series, but most of them lack automatic deduplication of events in near real time. One such database is QuestDB, which requires a manual batch process to deduplicate ingested data.

In this talk, we will see how to use Apache Beam to deduplicate streaming time-series data, which can then be analysed in a time-series database.

Javier Ramirez

Developer —and Agent— Advocate at QuestDB. Fan of open source, developer communities, and data/ML. All around happy person. He/him

Madrid, Spain
