Speaker(s):

Integration of Batch and Streaming data processing with Apache Beam

Jul-8 11:15-11:40 in Palisades
Add to Calendar 07/08/2025 11:15 AM 07/08/2025 11:40 AM BS25: Integration of Batch and Streaming data processing with Apache Beam

Mercari utilizes Apache Beam for batch and streaming processing for various purposes, such as transferring data to CRM and providing incentives to users. To avoid having to develop similar data pipelines in different departments within the company, Mercari Pipeline has been developed and released as OSS as a tool that allows users to build pipelines by simply defining the processing via JSON or YAML. (https://github.com/mercari/pipeline)

In this session, we will introduce an example of utilizing the features of Apache Beam, which allows Batch and Streaming processes to share the same code, to divert time-series aggregate values generated and verified by Batch to Streaming processes.

  • Integration of multiple data sources
  • Aggregate computation with window function using State API
  • Aggregate computation with window function utilizing external data store (Bigtable) and CDC
  • Inference with ONNX model
  • Configuration management by Batch and Streaming difference
Palisades

Mercari utilizes Apache Beam for batch and streaming processing for various purposes, such as transferring data to CRM and providing incentives to users. To avoid having to develop similar data pipelines in different departments within the company, Mercari Pipeline has been developed and released as OSS as a tool that allows users to build pipelines by simply defining the processing via JSON or YAML. (https://github.com/mercari/pipeline)

In this session, we will introduce an example of utilizing the features of Apache Beam, which allows Batch and Streaming processes to share the same code, to divert time-series aggregate values generated and verified by Batch to Streaming processes.

  • Integration of multiple data sources
  • Aggregate computation with window function using State API
  • Aggregate computation with window function utilizing external data store (Bigtable) and CDC
  • Inference with ONNX model
  • Configuration management by Batch and Streaming difference