Workshop: Testing Apache Beam Pipelines

Jun-15 10:45-12:15 UTC
Room: Upper Bay

Everyone understands the importance of testing. The correctness of data pipelines is critical for downstream applications (e.g., AI/ML workloads, reporting, and demand forecasting). In this workshop, through a series of examples and use cases, we describe different approaches to testing (unit testing and integration testing). We also explore testing from the perspective of the code plane and the data plane. Testing in the data plane consists mostly of data quality and data validation checks. We go through some data pipeline patterns used in the real world to handle data plane issues using a dead letter queue or dead letter table. Testing in the code plane validates the correctness of the code. For the code plane, we explore the software quality metrics that are applicable. We provide some examples where an integration test can be replicated across multiple data pipelines. We encourage a balance between unit tests and integration tests and provide recommendations based on our experience with multiple customers.
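
As a rough illustration of the dead letter pattern mentioned above, the following is a minimal sketch in plain Python rather than Beam code. The record schema (a JSON object with a numeric "amount" field) and the names parse_record and route_records are assumptions made up for this example; they are not taken from the workshop material.

```python
import json
from typing import Iterable, List, Tuple


def parse_record(raw: str) -> dict:
    """Parse and validate one raw record (hypothetical schema: numeric 'amount')."""
    record = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(record, dict):
        raise ValueError("record is not a JSON object")
    amount = record.get("amount")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("missing or non-numeric 'amount'")
    return record


def route_records(raw_records: Iterable[str]) -> Tuple[List[dict], List[dict]]:
    """Split input into valid records and dead letters instead of failing the run."""
    valid: List[dict] = []
    dead_letters: List[dict] = []
    for raw in raw_records:
        try:
            valid.append(parse_record(raw))
        except ValueError as err:
            # Keep the original payload and the reason so it can be inspected later.
            dead_letters.append({"raw": raw, "error": str(err)})
    return valid, dead_letters


# Unit-test style check: one good record, one malformed, one failing validation.
good, bad = route_records(['{"amount": 10}', "not json", '{"amount": "x"}'])
assert good == [{"amount": 10}]
assert [d["raw"] for d in bad] == ["not json", '{"amount": "x"}']
```

In a Beam pipeline the same split is commonly expressed with tagged outputs, with the dead letter output written to a separate queue or table; the sketch only shows the routing logic that a unit test would exercise.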