Speaker(s):

Count-distinct using the HLL++ algorithm

This talk walks through writing a Beam pipeline that efficiently estimates the number of distinct elements in a massive data set using the HyperLogLog++ algorithm, which trades a small, bounded error for a fixed, small memory footprint.
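
To make the idea concrete, here is a minimal sketch of the underlying technique in plain Python. It implements a simplified classic HyperLogLog, not the full HLL++ variant (there is no sparse representation and no empirical bias correction), and it does not use the Beam API. The class name `HyperLogLog`, the precision parameter `p`, and the choice of SHA-256 as the hash function are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch (illustrative, not HLL++)."""

    def __init__(self, p=12):
        # p is the precision: the sketch keeps m = 2**p small registers.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value (SHA-256 is an arbitrary choice).
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The top p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining bits give the rank: position of the leftmost 1 bit.
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Merging takes the register-wise maximum, so partial sketches
        # built on separate workers can be combined in any order.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def estimate(self):
        # Standard HyperLogLog estimator (alpha formula valid for m >= 128).
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


def count_distinct(values, p=12):
    """Estimate the number of distinct items in an iterable."""
    sketch = HyperLogLog(p)
    for v in values:
        sketch.add(v)
    return sketch.estimate()
```

The `merge` method is the property that makes this family of sketches a natural fit for a distributed pipeline: each worker can build a sketch over its share of the data, and the partial sketches combine associatively and commutatively into the same result as a single pass. With `p=12` the sketch holds 4096 registers, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%.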