Reuniting the Two Distant Cousins: Calling a Beam Pipeline from an Airflow Job

Sep-5 11:30-11:55 in Walker Canyon

Apache Beam and Apache Airflow are powerful tools in the data engineering ecosystem, often used separately but rarely in tandem. This talk explores the synergy between these “distant cousins” by demonstrating how to seamlessly integrate Beam pipelines within Airflow workflows.

We’ll dive into the challenges of orchestrating complex data processing tasks and show how combining Airflow’s scheduling capabilities with Beam’s robust data processing framework can create a more efficient and manageable data pipeline architecture.

Attendees will learn how to leverage Airflow’s DAGs (Directed Acyclic Graphs) to trigger Beam jobs, enabling them to orchestrate sophisticated, distributed data processing tasks on runners such as Google Cloud Dataflow. By the end of this session, participants will gain practical insights into integrating these technologies, enhancing their ability to build and maintain resilient, efficient data pipelines that meet the demands of modern data-driven applications.
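As a flavor of the pattern the talk covers, the sketch below shows how an Airflow task might launch a Beam Python pipeline. In practice the apache-airflow-providers-apache-beam package supplies operators for this; the standalone helper here only illustrates the kind of command such a task assembles. All file names, project IDs, and option values are hypothetical.

```python
# Minimal sketch (not the speaker's implementation): assembling the CLI
# invocation an Airflow task could use to launch a Beam Python pipeline.
# File names and option values below are illustrative placeholders.

def build_beam_command(py_file, runner="DirectRunner", pipeline_options=None):
    """Assemble the command-line invocation for a Beam Python pipeline."""
    args = ["python", py_file, f"--runner={runner}"]
    for key, value in (pipeline_options or {}).items():
        args.append(f"--{key}={value}")
    return args

# Inside an Airflow DAG, a command like this could back a BashOperator,
# or you could use the provider's BeamRunPythonPipelineOperator directly:
#
#   run_beam = BashOperator(
#       task_id="run_beam_pipeline",
#       bash_command=" ".join(build_beam_command(
#           "wordcount.py",
#           runner="DataflowRunner",
#           pipeline_options={"project": "my-project", "region": "us-central1"},
#       )),
#   )

print(build_beam_command("wordcount.py", runner="DataflowRunner",
                         pipeline_options={"project": "my-project"}))
```

Airflow then handles scheduling and retries of the task, while Beam (via the chosen runner, e.g. Dataflow) handles the distributed data processing itself.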
