Speaker(s):

Mapping Data to FHIR with Apache Beam

Jul-18 15:30-15:55 in A

A common use case across various teams at League is mapping data into the FHIR format (Fast Healthcare Interoperability Resources). This is not straightforward: it requires an understanding of the FHIR format, the original data source, and the mapping pipeline in Dataflow itself, which is owned by the data platform team. As a result, a single team becomes a bottleneck, leading to inefficient and repeated work.

While the data comes from many different sources, mapping a given data format into FHIR is a repeated pattern, regardless of whether it is needed in real time or in batch, or where the data originates. The solution:

We developed a reusable, self-serve system based on Apache Beam that makes it easy for other teams to develop and deploy their own mappers using Python UDFs. Now, with minimal guidance from the data platform team, any team can write its own mapper and deploy a batch or real-time Dataflow job without needing Dataflow or infrastructure knowledge. The job template handles everything from reading the data from the source (Pub/Sub, BigQuery, GCS) to writing to the destination (Cloud Healthcare API), including error handling and alerting.
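
To make the pattern concrete, here is a minimal sketch of the idea, not League's actual template: a generic Beam pipeline in which the only per-team piece is the mapping UDF. The bucket paths, field names, and the map_to_fhir_patient function are hypothetical; in the real system the sink is the Cloud Healthcare API FHIR store rather than text files, and the source can be swapped for Pub/Sub or BigQuery.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def map_to_fhir_patient(record):
        # Hypothetical team-owned UDF: turn one source record into a
        # FHIR Patient resource. Each team supplies only this function.
        return {
            "resourceType": "Patient",
            "identifier": [{"value": record["member_id"]}],
            "name": [{"family": record["last_name"],
                      "given": [record["first_name"]]}],
        }

    def run():
        with beam.Pipeline(options=PipelineOptions()) as p:
            (
                p
                # Batch source shown here; the template would swap in
                # ReadFromPubSub (streaming) or ReadFromBigQuery as needed.
                | "Read" >> beam.io.ReadFromText("gs://my-bucket/raw/members.ndjson")
                | "Parse" >> beam.Map(json.loads)
                | "MapToFHIR" >> beam.Map(map_to_fhir_patient)  # the per-team UDF
                | "Serialize" >> beam.Map(json.dumps)
                # For illustration this writes NDJSON back to GCS; the real
                # template writes to a Cloud Healthcare API FHIR store and
                # wraps each step with error handling and alerting.
                | "Write" >> beam.io.WriteToText("gs://my-bucket/fhir/patients")
            )

    if __name__ == "__main__":
        run()

Because the source and sink live in the shared template, switching a mapper from batch to streaming is a configuration change rather than a rewrite of the team's UDF.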
