All times are in Coordinated Universal Time (UTC).
Mercari utilizes Apache Beam for batch and streaming processing for various purposes, such as transferring data to CRM systems and providing incentives to users. To avoid developing similar data pipelines separately in different departments within the company, Mercari built and released Mercari Pipeline as OSS: a tool that lets users build pipelines simply by defining the processing in JSON or YAML. (https://github.com/mercari/pipeline)
In this session, we will present an example that takes advantage of Apache Beam's ability to run the same code in both batch and streaming mode, reusing time-series aggregate values that were generated and verified in batch pipelines directly in streaming pipelines.
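To illustrate the Beam feature this session builds on, here is a minimal, hypothetical Java sketch (not Mercari Pipeline itself): one aggregation transform is written once, validated on a bounded file source in batch, and then applied unchanged to an unbounded Pub/Sub source in streaming. The bucket path and topic name are placeholders.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class SharedAggregation {

  /**
   * The aggregation under test: per-element counts in fixed one-hour windows.
   * Written once, validated against historical data in batch, then reused
   * unchanged in streaming. (In practice you would also assign event
   * timestamps, e.g. with WithTimestamps, so windows reflect event time.)
   */
  static class HourlyCounts
      extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
    @Override
    public PCollection<KV<String, Long>> expand(PCollection<String> events) {
      return events
          .apply(Window.<String>into(FixedWindows.of(Duration.standardHours(1))))
          .apply(Count.perElement());
    }
  }

  /** Batch: bounded file source (hypothetical path). */
  static Pipeline batchPipeline() {
    Pipeline p = Pipeline.create();
    p.apply(TextIO.read().from("gs://example-bucket/events/*.json"))
        .apply(new HourlyCounts());
    return p;
  }

  /** Streaming: unbounded Pub/Sub source (hypothetical topic). The
   *  aggregation transform is exactly the same class as in batch. */
  static Pipeline streamingPipeline() {
    Pipeline p = Pipeline.create();
    p.apply(PubsubIO.readStrings().fromTopic("projects/example/topics/events"))
        .apply(new HourlyCounts());
    return p;
  }

  public static void main(String[] args) {
    batchPipeline().run().waitUntilFinish();
  }
}
```

Because `HourlyCounts` is an ordinary `PTransform`, the logic verified in the batch backfill is the same object code that runs in the streaming pipeline, which is the property the session demonstrates.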
Large language models (LLMs) have transformed how we process and generate text. In this session, I'll talk about Langchain-Beam, an open-source library that integrates LLMs and embedding models into Apache Beam pipelines as transforms, using LangChain.
We will explore how the Langchain-Beam transform performs remote LLM inference with OpenAI and Anthropic models: you provide the data processing logic as a prompt, and the models transform the data based on that prompt. We will also use embedding models to generate vector embeddings for text in a pipeline, and look at real-world use cases.
Repository: https://github.com/Ganeshsivakumar/langchain-beam
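As a rough illustration of the idea (prompt-driven transformation of elements inside a Beam pipeline), here is a minimal Java sketch using a plain `DoFn`. It is not the Langchain-Beam API itself; the `LlmClient` interface and `ApplyPromptFn` class are hypothetical stand-ins, and the library's actual transform classes are documented in the repository above.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class PromptTransformSketch {

  /**
   * Applies a fixed natural-language instruction to each element by calling
   * a remote LLM. LlmClient is a hypothetical stand-in for whatever client
   * you use (e.g. an OpenAI or Anthropic SDK).
   */
  static class ApplyPromptFn extends DoFn<String, String> {
    private final String prompt;
    private transient LlmClient client; // hypothetical, created per worker

    ApplyPromptFn(String prompt) { this.prompt = prompt; }

    @Setup
    public void setup() {
      client = LlmClient.create(); // e.g. reads the API key from the env
    }

    @ProcessElement
    public void processElement(@Element String input, OutputReceiver<String> out) {
      // One remote inference call per element; a production pipeline would
      // batch requests and handle rate limits and retries.
      out.output(client.complete(prompt + "\n\nInput:\n" + input));
    }
  }

  /** Hypothetical minimal LLM client interface. */
  interface LlmClient {
    static LlmClient create() {
      throw new UnsupportedOperationException("wire up a real SDK here");
    }
    String complete(String fullPrompt);
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    p.apply(Create.of("Great product, fast shipping", "Arrived broken"))
        .apply(ParDo.of(new ApplyPromptFn(
            "Classify the customer review as POSITIVE, NEGATIVE, or NEUTRAL.")));
    p.run().waitUntilFinish();
  }
}
```

The point of the pattern is that the "processing logic" lives in the prompt string rather than in Java code, so the same transform can classify, extract, or rewrite text depending on the prompt you pass in.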
Beam has become a core part of the data processing ecosystem through a combination of innovation and hard work from the Beam community. As the data landscape continues to evolve, however, so too must Beam. During this talk, Kenn (Beam PMC chair) and Danny (Beam PMC) will explore some of the opportunities and challenges in front of Beam, culminating in a vision for the future of Beam. Attendees will gain a clear idea of where Beam is headed, how they can leverage Beam even more effectively moving forward, and how they can contribute to helping Beam become the best that it can be.
Modern data architectures are no longer built around a single tool — they thrive on interoperability and community-driven integration. This session explores how Apache Beam serves as the flexible processing engine that connects streaming platforms like Kafka with modern, ACID-compliant data lakehouse solutions like Apache Iceberg.
Through real-world architecture patterns and practical examples, we’ll dive into how organizations are using Beam to unify disparate data sources, enable real-time and batch analytics, and future-proof their data platforms. You’ll also gain insights into how the open-source community continues to drive innovation across this ecosystem — from new connectors to performance optimizations and beyond.
Whether you’re designing pipelines, modernizing ETL, or exploring community-powered tooling, this session gives you the blueprint to build scalable, production-ready data ecosystems with confidence.
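As a concrete sketch of the Kafka-to-Iceberg pattern discussed here, the following hedged Java example reads from Kafka with Beam's KafkaIO and writes Beam `Row`s to an Iceberg table via Beam's Managed Iceberg sink. The broker address, topic, table name, and catalog settings are placeholders, and the exact Managed configuration keys may vary by Beam version.

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToIceberg {
  // Schema of the rows written to the Iceberg table (placeholder fields).
  static final Schema EVENT_SCHEMA =
      Schema.builder().addStringField("user_id").addStringField("payload").build();

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply("ReadKafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")   // placeholder
                .withTopic("events")                   // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())
        .apply("ToRows",
            MapElements.into(TypeDescriptor.of(Row.class))
                .via((KV<String, String> kv) ->
                    Row.withSchema(EVENT_SCHEMA)
                        .withFieldValue("user_id", kv.getKey())
                        .withFieldValue("payload", kv.getValue())
                        .build()))
        .setRowSchema(EVENT_SCHEMA)
        .apply("WriteIceberg",
            Managed.write(Managed.ICEBERG)
                .withConfig(Map.<String, Object>of(
                    "table", "db.events",              // placeholder
                    "catalog_name", "demo",
                    "catalog_properties", Map.<String, Object>of(
                        "type", "hadoop",
                        "warehouse", "gs://example-bucket/warehouse"))));

    p.run();
  }
}
```

The same shape works for batch backfills by swapping the unbounded Kafka source for a bounded one, which is what makes Beam a natural bridge between the streaming platform and the lakehouse table format.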