Deduplicating and analysing time-series data with Apache Beam and QuestDB

Jun-14, 2023, 12:00-12:25 (America/New_York) in Palisades, session BS25

Time series data pipelines tend to prioritise speed and freshness over completeness and integrity. In such pipelines it is very common to ingest duplicate data, which may be fine for many analytical use cases but is very inconvenient for others.

There are many open-source databases built specifically for the speed and query semantics of time series, and most of them lack automatic deduplication of events in near real time. One such database is QuestDB, which requires a manual batch process to deduplicate ingested data.

In this talk, we will see how we can successfully use Apache Beam to deduplicate streaming time series, which can then be analysed by a time series database.
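To make the idea concrete, here is a minimal plain-Python sketch of keyed deduplication over a stream, with a bounded memory horizon. It is not the code from the talk and does not use the Apache Beam API; in a Beam pipeline the same logic would typically run as a per-key stateful step ahead of the database write. The `Reading` type, the `deduplicate` function, the choice of `(sensor_id, timestamp_ms)` as the identity of an event, and the `horizon_ms` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical time-series event (not taken from the talk)."""
    sensor_id: str
    timestamp_ms: int  # event time, epoch milliseconds
    value: float


def deduplicate(events: Iterable[Reading], horizon_ms: int) -> Iterator[Reading]:
    """Yield the first event seen for each (sensor_id, timestamp_ms) key.

    Assumption: two events with the same sensor and event time are duplicates.
    Keys whose event time is more than `horizon_ms` behind the newest event
    time seen so far are forgotten, so memory stays bounded. A duplicate that
    arrives later than that horizon is therefore emitted again.
    """
    seen: Dict[Tuple[str, int], int] = {}
    newest_ms = 0
    for event in events:
        key = (event.sensor_id, event.timestamp_ms)
        newest_ms = max(newest_ms, event.timestamp_ms)
        if key not in seen:
            seen[key] = event.timestamp_ms
            yield event
        # Evict keys that have fallen behind the horizon.
        cutoff = newest_ms - horizon_ms
        for old_key in [k for k, ts in seen.items() if ts < cutoff]:
            del seen[old_key]
```

The horizon is the main design choice: a longer horizon catches duplicates that arrive later but holds more keys in memory, which mirrors the speed-versus-completeness trade-off described above.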

Speaker(s): Javier Ramirez