Data Lineage in Beam

Sep 4, 2024, 11:00-11:25 (America/Los_Angeles) in Hamina (MP4)

This presentation covers data lineage in Apache Beam: why it matters and how to implement it in practice. We begin with the motivation for data lineage and its role in data governance, debugging, and impact analysis. Next, we introduce Google Cloud Dataplex, a unified data management platform, and its integration with Beam’s lineage capabilities.

We then show how lineage support is built into Apache Beam’s core. Following this, we walk through how a lineage graph is constructed for an Apache Beam job and reported to Dataplex for visualization.

The presentation also explains how to integrate lineage tracking into your own I/O operations, giving greater transparency and control over your data pipelines. Finally, a live demonstration shows data lineage in action for an Apache Beam job running on Dataflow and explores its lineage in Dataplex.

By the end of this talk, attendees will know how to use Apache Beam’s lineage support to improve transparency and trust in their data pipelines.
