Speaker(s):

How to handle duplicate data in streaming pipelines using Dataflow and Pub/Sub

By Zeeshan
Aug 5, 2021, 17:00-17:50 (America/Los_Angeles), session AS24

This session gives a detailed overview of where duplicates originate in streaming data pipelines built with Pub/Sub and Dataflow. We'll then go over the techniques the Apache Beam SDK provides for handling duplicate data, along with the technical trade-offs of each option. There will also be a Q&A and a discussion of common mistakes developers make.
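As a rough illustration of the kind of trade-off involved, here is a minimal sketch of ID-based deduplication with a bounded memory window, in plain Python. It is not the Apache Beam API and not necessarily a technique covered in the session. The names `SeenIdFilter`, `deduplicate`, and `ttl_seconds`, and the `(message_id, timestamp, payload)` tuple shape, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical message shape for this sketch: (message_id, timestamp_seconds, payload).
Message = Tuple[str, float, str]


@dataclass
class SeenIdFilter:
    """Remembers message IDs for ttl_seconds and flags repeats inside that window."""

    ttl_seconds: float
    _first_seen: Dict[str, float] = field(default_factory=dict)

    def is_new(self, message_id: str, now: float) -> bool:
        # Forget IDs older than the window so state does not grow without bound.
        expired = [m for m, t in self._first_seen.items() if now - t > self.ttl_seconds]
        for m in expired:
            del self._first_seen[m]
        if message_id in self._first_seen:
            return False  # duplicate within the window
        self._first_seen[message_id] = now
        return True


def deduplicate(messages: Iterable[Message], ttl_seconds: float) -> Iterator[Tuple[str, str]]:
    """Yields (message_id, payload) for the first sighting of each ID within the window."""
    seen = SeenIdFilter(ttl_seconds)
    for message_id, timestamp, payload in messages:
        if seen.is_new(message_id, timestamp):
            yield message_id, payload
```

The window is the trade-off: a larger `ttl_seconds` catches duplicates that arrive later but keeps more state, while a smaller one lets late duplicates through.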

