Exploring Microsoft’s FHIR Anonymization tool

Anonymizing healthcare data while maintaining its value is a problem many of us have faced.

But how do you go about it when that data is stored in a FHIR server?

Microsoft’s FHIR Anonymization tool is a good starting point. It’s open source and comes as a complete C# project with an MIT license — free to use, even commercially. It’s about 5 years old, so well and truly battle tested.

I spent some time test driving it recently. This is how it went.

1. I downloaded the repository, opened the FHIR project and built the command line tool.

2. I dropped a FHIR bundle into an “input” folder I created — a Synthea bundle with a few hundred resources.

3. I ran the command line tool with the input and output folders as parameters.

4. I compared the “before and after” bundles to see how the data was anonymized.
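Steps 2 and 3 can be scripted. The sketch below only builds the command; it does not come from the project itself. The executable name and the -i, -o and -c switches are assumptions based on the R4 command line build, so check them against the tool's own help output before relying on them.

```python
from typing import List, Optional

# Assumed name of the R4 command line executable; adjust to match your build output.
TOOL = "Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool.exe"


def build_command(input_dir: str, output_dir: str,
                  config_file: Optional[str] = None,
                  tool: str = TOOL) -> List[str]:
    """Build the argument list for one anonymization run.

    Assumes -i is the input folder, -o the output folder and
    -c an optional configuration file.
    """
    cmd = [tool, "-i", input_dir, "-o", output_dir]
    if config_file is not None:
        cmd += ["-c", config_file]
    return cmd


# Show the command that would be run for the folders used in the steps above.
print(" ".join(build_command("input", "output")))
```

To actually run it, pass the list to subprocess.run(cmd, check=True) from the folder that contains the executable.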

That was the starting point, and it took about five minutes. If you get this far you’ll find that all identifiers, names, codes, detailed addresses and extensions have been removed.
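Eyeballing two large bundles gets tedious, so the before and after comparison can also be done in code. This is a minimal sketch, not part of the tool: it flattens each JSON document into leaf paths and reports which were removed, changed or kept. The two Patient fragments are invented for illustration and are not real tool output. One limitation: when an array element is removed, the indexes of later elements shift, so treat the result as a rough guide.

```python
from typing import Any, Dict, List


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Map every leaf value in a JSON structure to a dotted path."""
    leaves: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else key
            leaves.update(flatten(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            leaves.update(flatten(value, f"{prefix}[{index}]"))
    else:
        leaves[prefix] = node
    return leaves


def compare(before: Any, after: Any) -> Dict[str, List[str]]:
    """Classify each leaf path in 'before' as removed, changed or kept."""
    b, a = flatten(before), flatten(after)
    return {
        "removed": sorted(p for p in b if p not in a),
        "changed": sorted(p for p in b if p in a and b[p] != a[p]),
        "kept": sorted(p for p in b if p in a and b[p] == a[p]),
    }


# Invented sample data, for illustration only.
before = {"resourceType": "Patient", "id": "example",
          "name": [{"family": "Smith"}], "gender": "female"}
after = {"resourceType": "Patient", "id": "a1b2", "gender": "female"}

for status, paths in compare(before, after).items():
    print(status, paths)
```

For real files, load each bundle with json.load and pass the two dictionaries to compare.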

The next step was to dig into the configuration file. This is where you get to customize the behaviour of the tool.

The default config file is called configuration-sample.json and you’ll find it in the same folder as the EXE file. Open it up in Notepad++ and take a look.

There are some global parameters, but the meat of the file is a list of fhirPathRules. This is where you define precise anonymization rules for any resource element you choose — even extensions.

I started by making some changes to expose the Code, Display and Text values of all CodeableConcepts, and followed up by exploring how to allow specific extension values through.

This turned out to be easier than expected, as the rules are based on FHIRPath expressions, which I was already familiar with.
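As a rough illustration of that kind of change, the sketch below adds "keep" rules for coded values to a configuration dictionary. Several things here are assumptions rather than facts from the project: that each rule is an object with "path" and "method" keys, that nodesByType paths like these select the intended elements, and that when several rules match a node the earlier one wins (which is why the new rules go first). The sample dictionary is a stand-in, not a copy of configuration-sample.json, so confirm each point against the documentation.

```python
import copy
import json
from typing import Any, Dict, List

# Hypothetical paths for the coded values to let through; untested against the tool.
KEEP_PATHS = [
    "nodesByType('Coding').code",
    "nodesByType('Coding').display",
    "nodesByType('CodeableConcept').text",
]


def prepend_keep_rules(config: Dict[str, Any], paths: List[str]) -> Dict[str, Any]:
    """Return a copy of the config with 'keep' rules placed before existing rules."""
    updated = copy.deepcopy(config)
    new_rules = [{"path": path, "method": "keep"} for path in paths]
    updated["fhirPathRules"] = new_rules + updated.get("fhirPathRules", [])
    return updated


# Stand-in configuration, for illustration only.
sample = {
    "fhirPathRules": [{"path": "nodesByType('Extension')", "method": "redact"}],
    "parameters": {},
}

print(json.dumps(prepend_keep_rules(sample, KEEP_PATHS), indent=2))
```

Writing the result to a new file and passing that file to the tool keeps the original sample configuration intact for comparison.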

There are several ways to run the project:

– Run the command line tool
– Build the code libraries into your application and call them directly
– Run it as part of a pipeline
– Use a parametrized $export operation on Azure’s FHIR servers

The true power of this project lies in the level of customization that is possible in the configuration file. You can tailor the anonymization engine to your specific requirements — business or regulatory.

This is not a one-day project. But if you’re looking to anonymize your FHIR data, it is a fantastic starting point.

It’s C#, but the implementation, approach and concepts can easily be transferred to projects in other languages and applied to FHIR servers running in different environments.

The repo.
The documentation.


Discover more from Darren Devitt