Speaker(s):

Data Quality in ML Pipelines

Jul-9, 2025, 11:30-11:55 AM in Palisades (BS25)

This session demonstrates two approaches for integrating data quality into ML pipelines, a schema-based approach and a UDF-based approach, in which Apache Beam performs the data-quality filtering. If time permits, it will also show how to add data-quality features to the dataset using a PreTransform component that takes in a UDF.
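The abstract does not show code, so the following is only a minimal sketch of how the two filtering styles might differ. It assumes records arrive as plain Python dicts. The field names, the `FieldRule` schema, the income rule, and the `add_quality_features` helper are all hypothetical, and the sketch does not model the PreTransform component itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

Record = Dict[str, Any]


# Hypothetical schema entry: expected type plus a per-field validity rule.
@dataclass(frozen=True)
class FieldRule:
    dtype: type
    check: Callable[[Any], bool]


# Hypothetical schema for illustration only.
SCHEMA: Mapping[str, FieldRule] = {
    "user_id": FieldRule(str, lambda v: len(v) > 0),
    "age": FieldRule(int, lambda v: 0 <= v <= 120),
    "income": FieldRule(float, lambda v: v >= 0.0),
}


def passes_schema(record: Record, schema: Mapping[str, FieldRule] = SCHEMA) -> bool:
    """Schema-based check: each declared field is present, has the declared type, and passes its rule."""
    for name, rule in schema.items():
        value = record.get(name)
        # Note: isinstance accepts bool for int, which a stricter schema might reject.
        if value is None or not isinstance(value, rule.dtype):
            return False
        if not rule.check(value):
            return False
    return True


def no_outlier_income(record: Record) -> bool:
    """UDF-based check: arbitrary user logic, here a hypothetical cross-field rule."""
    age, income = record.get("age"), record.get("income")
    if age is None or income is None:
        return False
    # Hypothetical rule: minors should not report income above 10,000.
    return not (age < 18 and income > 10_000.0)


def add_quality_features(record: Record) -> Record:
    """Hypothetical feature UDF: attach data-quality signals instead of dropping the row."""
    missing = sum(1 for name in SCHEMA if record.get(name) is None)
    return {**record, "dq_missing_fields": missing, "dq_passes_schema": passes_schema(record)}


if __name__ == "__main__":
    rows = [
        {"user_id": "a1", "age": 34, "income": 52000.0},
        {"user_id": "", "age": 29, "income": 40000.0},    # fails schema: empty id
        {"user_id": "c3", "age": 15, "income": 25000.0},  # fails the UDF rule
        {"user_id": "d4", "age": None, "income": 1000.0}, # fails schema: missing age
    ]
    kept = [r for r in rows if passes_schema(r) and no_outlier_income(r)]
    print([r["user_id"] for r in kept])  # prints ['a1']
```

In a Beam pipeline, either predicate could be passed to `beam.Filter` (for example `records | beam.Filter(passes_schema)`), and the feature UDF to `beam.Map(add_quality_features)`. The schema style keeps rules declarative and reusable, while the UDF style allows arbitrary cross-field logic at the cost of being harder to inspect.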