Streaming NLP infrastructure on Dataflow

July 19, 2022, 15:00-15:50 (America/Los_Angeles), Room 204
Session AS24

Trustpilot is an e-commerce reviews platform delivering millions of new reviews to businesses each week. We use Apache Beam on GCP Dataflow to run real-time streaming inference with the latest NLP transformer models.

Our talk will touch on:

  • Infrastructure setup to enable Python Beam to interface with Kafka for streaming data
  • Taking advantage of Beam’s unified programming model to enable batch jobs for backfilling via BigQuery
  • Working with GPUs on Dataflow to speed up local model inference
  • MLOps: Using Dataflow as part of a continuous evaluation model monitoring setup