Building Banking Synthetic Data for a Lakehouse with Gemma

Jul-9, 2025, 10:30-10:55 in Horizon Hall

Building a Beam pipeline to preprocess, generate, and validate synthetic data for a data warehouse migration to GCP.

Why It’s New: Synthetic data generation is a hot topic, and using GenAI with Beam is a novel approach. We were inspired by https://developers.googleblog.com/en/gemma-for-streaming-ml-with-dataflow/ and decided to use Beam for preprocessing, generation, and validation, scaling synthetic data generation from 1 table up to 3,000 tables and leveraging the model to keep primary-key referential integrity among them.

Tech Stack: Apache Beam, TensorFlow, Google Cloud.
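
To make the preprocess, generate, and validate flow concrete, here is a minimal sketch in plain Python. It is not the speakers' pipeline: the schema, the function names, and the stub generator are all hypothetical, and the stub stands in for the Gemma call. In a Beam pipeline, each stage would presumably map to its own transform.

```python
import random

# Hypothetical two-table schema (not from the talk): primary key plus
# foreign keys expressed as {column: parent_table}.
SCHEMA = {
    "customers": {"pk": "customer_id", "fks": {}},
    "accounts": {"pk": "account_id", "fks": {"customer_id": "customers"}},
}


def preprocess(schema):
    """Order tables so that every parent comes before its children."""
    ordered, seen = [], set()

    def visit(table):
        if table in seen:
            return
        seen.add(table)
        for parent in schema[table]["fks"].values():
            visit(parent)
        ordered.append(table)

    for table in schema:
        visit(table)
    return ordered


def stub_generate(table, n, schema, parent_keys, rng):
    """Stand-in for the model call: picks foreign keys from generated parents."""
    spec = schema[table]
    rows = []
    for i in range(1, n + 1):
        row = {spec["pk"]: f"{table}-{i}"}
        for column, parent in spec["fks"].items():
            row[column] = rng.choice(parent_keys[parent])
        rows.append(row)
    return rows


def validate(tables, schema):
    """Return violations: duplicate primary keys or orphaned foreign keys."""
    errors = []
    keys = {t: [r[schema[t]["pk"]] for r in rows] for t, rows in tables.items()}
    for table, rows in tables.items():
        if len(set(keys[table])) != len(keys[table]):
            errors.append(f"{table}: duplicate primary key")
        for column, parent in schema[table]["fks"].items():
            parent_set = set(keys[parent])
            for row in rows:
                if row[column] not in parent_set:
                    errors.append(f"{table}.{column}: orphan value {row[column]!r}")
    return errors


def run(schema, rows_per_table=5, seed=0):
    """Generate every table in dependency order, then validate the result."""
    rng = random.Random(seed)
    tables, parent_keys = {}, {}
    for table in preprocess(schema):
        tables[table] = stub_generate(table, rows_per_table, schema, parent_keys, rng)
        parent_keys[table] = [r[schema[table]["pk"]] for r in tables[table]]
    return tables, validate(tables, schema)
```

Calling `run(SCHEMA)` returns the generated tables together with a list of violations. Keeping validation as a separate pass means output from a real model, which may invent keys, can be checked the same way as the stub's output.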
