Zero-loss Spanner to BigQuery Redeployment

This session introduces a YAML-driven Dataflow Flex Template design that lets product teams self-serve Spanner to BigQuery replication. It supports both current-state and append-only modes, serving analytics, downstream applications, and audit trails. The focus is on streaming use cases.

In a federated deployment model where each product team owns and operates its own pipeline, redeployments emerged as a recurring risk event. None of Dataflow’s existing restart mechanisms fully fits our use cases, so every redeployment risks data loss or duplicate data.

A proper fix belongs in Dataflow and SpannerIO. In the meantime, an interim solution has kept redeployments routine in production, with no data loss or duplication.

This session covers the problem, the approach, and the tradeoffs that remain.

Jiufeng Liu
