FHIR is not a database and FHIR queries are not SQL. What’s the solution if you have large analytical requirements for your FHIR data?
The big cloud providers have all arrived at the same conclusion.
You copy your FHIR data into a data warehouse and run your heavy-duty queries there. It may not be ideal, but it’s the direction everyone is moving in.
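To make the "copy" step a little more concrete, here is a minimal sketch of one common staging pattern: splitting a FHIR Bundle into newline-delimited JSON (NDJSON), one output per resource type, ready for a bulk load into a warehouse. The function name `bundle_to_ndjson` and the sample Bundle are made up for illustration; a real pipeline would page through your server's search results or use its export facility rather than a hard-coded Bundle.

```python
import json
from collections import defaultdict


def bundle_to_ndjson(bundle):
    """Group the resources in a FHIR Bundle by type and serialise each group as NDJSON.

    Hypothetical helper for illustration; returns {resourceType: ndjson_text}.
    """
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        if resource is None:
            continue  # entries without a resource (e.g. delete responses) are skipped
        grouped[resource["resourceType"]].append(json.dumps(resource, sort_keys=True))
    # One JSON document per line, with a trailing newline, as NDJSON loaders expect
    return {rtype: "\n".join(lines) + "\n" for rtype, lines in grouped.items()}


# Made-up sample data standing in for a search result from a FHIR server
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1", "gender": "female"}},
        {"resource": {"resourceType": "Patient", "id": "p2", "gender": "male"}},
        {"resource": {"resourceType": "Observation", "id": "o1", "status": "final"}},
    ],
}

ndjson_by_type = bundle_to_ndjson(sample_bundle)
for rtype, text in ndjson_by_type.items():
    print(rtype, "->", len(text.splitlines()), "line(s)")
```

Each value in `ndjson_by_type` could then be written to a file or object store and loaded into a warehouse table for that resource type.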
Azure Data Lake
Microsoft are true to form in providing a fully fledged open-source project that you can tailor to your needs.
Their “FHIR to Synapse Sync Agent” lets you populate Azure Data Lake from your FHIR server and run queries against it using Synapse Analytics.
The open-source project: https://github.com/microsoft/FHIR-Analytics-Pipelines
GCP BigQuery
Google recommends exporting your FHIR data to their BigQuery data warehouse and performing your analytics there. Their documentation covers three steps:
– Perform a single export: https://cloud.google.com/healthcare-api/docs/how-tos/fhir-export-bigquery
– Stream FHIR resource changes: https://cloud.google.com/healthcare-api/docs/how-tos/fhir-bigquery-streaming
– Analyze your data: https://cloud.google.com/architecture/analyzing-fhir-data-in-bigquery
AWS Redshift
Amazon recommend using their Redshift data warehouse by way of S3 buckets. They provide a detailed blog post documenting each step of the process.
Not light reading: https://aws.amazon.com/blogs/big-data/analyzing-healthcare-fhir-data-with-amazon-redshift-partiql/
But what if you’re not in the cloud, or your server provider doesn’t come with a data warehouse to migrate to?
Take a look at SQL on FHIR.
Spearheaded by Health Samurai, and with participants from Microsoft, Google and others, it takes a stab at flattening out FHIR resource data and making it accessible via SQL.
Early days but worth keeping an eye on: https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/
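To give a feel for the idea, here is a rough sketch of "flatten, then query with SQL" using nothing but Python's standard library. It is not the SQL on FHIR specification's own view syntax; the `PATIENT_COLUMNS` mapping, the `dig` and `flatten` helpers and the sample patients are all hypothetical stand-ins, and an in-memory SQLite database plays the part of the SQL engine.

```python
import sqlite3

# Hypothetical column mapping: column name -> path of keys/indexes into the resource.
# A simplified stand-in for illustration, not the SQL on FHIR view format.
PATIENT_COLUMNS = {
    "id": ("id",),
    "gender": ("gender",),
    "birth_date": ("birthDate",),
    "family_name": ("name", 0, "family"),
}


def dig(resource, path):
    """Follow a path of dict keys and list indexes; return None if any step is missing."""
    value = resource
    for step in path:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError):
            return None
    return value


def flatten(resource, columns):
    """Turn one nested resource into a flat tuple, one value per mapped column."""
    return tuple(dig(resource, path) for path in columns.values())


# Made-up sample resources; note that optional elements are missing on some of them
patients = [
    {"resourceType": "Patient", "id": "p1", "gender": "female",
     "birthDate": "1980-04-02", "name": [{"family": "Okafor"}]},
    {"resourceType": "Patient", "id": "p2", "gender": "male",
     "birthDate": "1975-11-30"},
    {"resourceType": "Patient", "id": "p3", "gender": "female",
     "name": [{"family": "Lindqvist"}]},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (id TEXT, gender TEXT, birth_date TEXT, family_name TEXT)"
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?)",
    [flatten(p, PATIENT_COLUMNS) for p in patients],
)

# Once the data is flat, ordinary SQL works
gender_counts = conn.execute(
    "SELECT gender, COUNT(*) FROM patient GROUP BY gender ORDER BY gender"
).fetchall()
print(gender_counts)
```

Missing elements simply become NULLs in the flat table, which is roughly the trade-off any flattening approach has to make with FHIR's optional and repeating fields.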
The lesson here, if you’re trying and failing to run big queries via your FHIR API, is to stop and rethink.