Apache Beam is a powerful tool for building data processing pipelines, with a unified programming model for batch and streaming.
Beam Summit is coming back in 2023 as a free, in-person event followed by an online-viewing experience.
We’ll host sessions to share new use cases from companies using Apache Beam, as well as community-driven talks, technical deep dives and in-depth workshops.
Unified Streaming and Batch Pipelines at LinkedIn using Beam
Many use cases at LinkedIn require both real-time processing and periodic backfilling of data, and running a single codebase for both needs is an emerging requirement. In this talk, we will share how we leverage Apache Beam to unify Samza stream processing and Spark batch processing. We will present the first unified production use case: Standardization. By running its backfilling on Beam with Spark, we reduced backfilling time by 93% while using only 50% of the resources.