Speaker(s):

Data Quality in ML Pipelines

Demonstrate two approaches for integrating data quality into ML pipelines: a schema-based approach and a UDF-based approach, in both of which Apache Beam performs the data-quality filtering. If time permits, demonstrate how to add data-quality features to the dataset using a PreTransform component that takes a UDF.
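The three ideas in the abstract can be illustrated with a minimal, framework-free sketch. It assumes records are plain Python dicts and uses a hypothetical dict-based `SCHEMA` in place of whatever schema format the talk actually uses. `conforms_to_schema` stands in for the schema-based check, `plausible_age` for an arbitrary user-defined rule (the UDF-based check), and `add_quality_features` for a UDF that annotates records instead of dropping them. All names, fields and thresholds here are illustrative assumptions, not the speakers' implementation or the PreTransform component's API.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical schema (assumption): field name -> (expected type, nullable).
SCHEMA = {
    "user_id": (str, False),
    "age": (int, False),
    "income": (float, True),
}


def conforms_to_schema(record: Record, schema=SCHEMA) -> bool:
    """Schema-based check: each declared field has the expected type,
    and non-nullable fields are present."""
    for field, (expected_type, nullable) in schema.items():
        value = record.get(field)
        if value is None:
            if not nullable:
                return False
        elif not isinstance(value, expected_type):
            return False
    return True


def plausible_age(record: Record) -> bool:
    """UDF-based check: an arbitrary user-defined rule (hypothetical range)."""
    age = record.get("age")
    return isinstance(age, int) and 0 <= age <= 120


def add_quality_features(record: Record) -> Record:
    """Hypothetical PreTransform-style UDF: adds a data-quality feature
    to a copy of the record rather than filtering it out."""
    out = dict(record)
    out["income_missing"] = record.get("income") is None
    return out


def filter_records(records: Iterable[Record],
                   predicates: List[Callable[[Record], bool]]) -> List[Record]:
    """Keep only records that pass every predicate."""
    return [r for r in records if all(p(r) for p in predicates)]


# Illustrative records (made up for this sketch).
records = [
    {"user_id": "a", "age": 34, "income": 52000.0},
    {"user_id": "b", "age": "34", "income": 1.0},  # wrong type: fails schema
    {"user_id": "c", "age": 250, "income": None},  # passes schema, fails UDF
    {"user_id": "d", "age": 41, "income": None},   # passes both
]

if __name__ == "__main__":
    clean = filter_records(records, [conforms_to_schema, plausible_age])
    for r in map(add_quality_features, clean):
        print(r)
```

In an Apache Beam pipeline, the two predicates could be passed to `beam.Filter` and the feature function to `beam.Map`; how the PreTransform component mentioned above receives its UDF is not specified in this abstract.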