Parallelizing Skewed HBase Regions Using Splittable DoFn

Jun-14 14:00-14:25 UTC
Room: Palisades

During HBase to Cloud Bigtable migrations, HBase snapshots are imported into Cloud Bigtable. Each snapshot contains several HBase regions, and some of those regions can be very large because of skewed data.

In this presentation, with code snippets and benchmark results, we show how to parallelize skewed HBase regions using Splittable DoFn and so reduce pipeline runtime.
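
As a rough illustration of the idea (not the code shown in the talk), the core of parallelizing one oversized region is to cut its row-key range into smaller contiguous sub-ranges that can be read independently. The sketch below is plain Python with no Beam dependency. The function names, the `min_width` padding parameter, and the even byte-wise interpolation are all assumptions made for this example; a real Splittable DoFn would perform this kind of split on its restriction, and an even split over the key space does not guarantee an even split over the data.

```python
from typing import List, Tuple


def _key_to_int(key: bytes, width: int) -> int:
    # Right-pad with zero bytes so keys of different lengths map onto one scale.
    return int.from_bytes(key.ljust(width, b"\x00"), "big")


def split_key_range(
    start: bytes, end: bytes, parts: int, min_width: int = 8
) -> List[Tuple[bytes, bytes]]:
    """Split the half-open row-key range [start, end) into contiguous sub-ranges.

    Hypothetical helper: assumes a non-empty end key and start < end.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    width = max(len(start), len(end), min_width)
    lo, hi = _key_to_int(start, width), _key_to_int(end, width)
    if hi <= lo:
        raise ValueError("end key must sort after start key")

    # Never ask for more pieces than there are distinct keys in the range.
    parts = min(parts, hi - lo)

    # Interpolate evenly spaced interior boundaries between start and end.
    interior = [
        (lo + (hi - lo) * i // parts).to_bytes(width, "big")
        for i in range(1, parts)
    ]
    bounds = [start] + interior + [end]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Each printed pair is one unit of work that could be scanned in parallel.
    for sub_start, sub_end in split_key_range(b"a", b"e", 4):
        print(sub_start, sub_end)
```

Each returned pair covers a disjoint slice of the original range, so the slices can be handed to separate workers instead of one worker scanning the whole skewed region.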