BeamStack: An open source Framework for running Machine Learning Pipelines with Apache Beam

We introduce BeamStack, an open-source framework currently under development that simplifies deploying Machine Learning and GenAI workflow pipelines with Apache Beam on Kubernetes, whether on-premises or in the cloud. It provides abstraction layers that streamline the deployment of machine learning pipeline components, data processing workflows, and the underlying infrastructure.

At the core of BeamStack are Kubernetes Custom Resource Definitions (CRDs). CRDs extend the Kubernetes API, which allows ML-centric resources to be defined and managed like any other Kubernetes resource. With this approach, BeamStack lets teams across an organization use the full capabilities of Kubernetes together with Apache Beam for Machine Learning development.
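To make the CRD mechanism concrete, here is a minimal sketch of what registering an ML-centric resource type with the Kubernetes API can look like. It only builds a standard `apiextensions.k8s.io/v1` manifest as a Python dictionary. The group `example.org`, the kind `BeamPipeline`, and the fields `runner`, `image`, and `parallelism` are hypothetical names chosen for illustration; they are not taken from BeamStack's actual CRDs.

```python
import json


def build_crd(group, kind, plural, version, spec_properties):
    """Return a CustomResourceDefinition manifest as a plain dict."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # Kubernetes requires the CRD name to be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "singular": kind.lower(), "plural": plural},
            "versions": [
                {
                    "name": version,
                    "served": True,
                    "storage": True,
                    # Structural schema that validates each custom resource.
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    "properties": spec_properties,
                                }
                            },
                        }
                    },
                }
            ],
        },
    }


# Hypothetical resource: group, kind, and fields are illustrative only.
crd = build_crd(
    group="example.org",
    kind="BeamPipeline",
    plural="beampipelines",
    version="v1alpha1",
    spec_properties={
        "runner": {"type": "string"},
        "image": {"type": "string"},
        "parallelism": {"type": "integer", "minimum": 1},
    },
)

print(json.dumps(crd, indent=2))
```

Because Kubernetes accepts JSON manifests, the printed output could be saved to a file and registered with `kubectl apply -f`. After that, the cluster would accept `BeamPipeline` objects through the same API used for built-in resources, which is the kind of integration the paragraph above describes.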

In this session, we will discuss BeamStack's use cases and product roadmap, covering features that are already implemented, those in progress, and those planned for the future. We will also address our current challenges and the areas where we need support from contributors. Join us in shaping the future of ML development tooling around Apache Beam by becoming part of the BeamStack community.


Olufunbi Babalola
