FHIR data mapping with Claude.ai

How would a mature AI like Claude perform at a real-world task on a real-world FHIR project?

I’m talking about the kind of task you might give to your FHIR SME or hand off to your data team for a few days’ investigation: a task that needs strong FHIR knowledge to complete effectively, and one that is littered with sub-optimal answers and dead ends.

I came up with a test case from my own experience a few months ago, when I was advising a team working for a start-up in the US. I wanted to see if Claude would come up with the same solution or if it would settle for a “low-hanging fruit” answer.

There are links at the bottom of the page to the Claude project as well as a link to the final artifact produced (PDF).

Here’s the first prompt.

Prompt 1

A project is using FHIR R4. It receives many Observation resources from different data providers. Some of these providers are doctors entering data into an app, others are automated events driven by button clicks or device actions. A third category are AI generated Observations.
Suggest different ways of capturing the fact that an Observation is generated by an AI and by one specific AI over another. From your list of suggestions, highlight the preferred approach.
Be careful to only use FHIR resource types and elements from FHIR version R4.

It’s a detailed prompt with enough information for Claude to get to work. The reminder to focus on FHIR version R4 was required, otherwise we’d likely see a mix of R4 and R5 elements.

Use of the word “fact” was inadvertent and possibly a mistake, as “fact” has its own meaning in the context of Observations and AI, but it does not appear to have influenced the outcome.

It came up with 5 options, almost all of which were bad.

1. Observation.device
Populated with a reference link to a Device resource that documents the AI. This had merit and possibility but was not ideal, for reasons that will become clear later.

2. Observation.performer
Populated with a reference link to an Organization resource that signified the AI assistant. Not a good idea.

3. Observation.method
An interesting suggestion and not one that I had considered myself. The FHIR documentation says of method: “Indicates the mechanism used to perform the observation.” I could see how that might fit, at a stretch.

4. Observation.extension
A reference link to the Device resource outlined in option 1 above, stored in an extension. It got points for the generic nature of the extension which had the benefit of not limiting the solution specifically to an Observation resource, but overall not a good enough solution.

5. Observation.category
Added a second coding to the category element that marked the resource as AI generated. This is a terrible solution and suggested that Claude did not understand the “concept” part of CodeableConcept.

Overall, not a good start.

Claude had focused heavily on the Observation resource itself, looking for a “resource specific” solution to what it should have seen as a more general problem, one whose possible solutions reach beyond the Observation resource.

My initial prompt asked for two things: a way to identify the Observation as AI generated AND a way to identify the AI source itself. Before I go any further let me outline the solution I was hoping for.

An “official” AIAST code already exists with recommendations on how to use it in a security label to identify AI generated data. The Provenance resource type is the established and recognized way to capture the source of a FHIR resource. Combined, this is roughly the solution we came up with at the start-up I mentioned earlier.
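
To make the target solution concrete, here is a minimal sketch of the two pieces as plain Python dictionaries. It is not the start-up's actual implementation. The AIAST system URI and display text, and the participant-type code system, are quoted from memory and should be checked against the HL7 terminology before use. The Device id and timestamp are hypothetical.

```python
import copy

# AIAST ("Artificial Intelligence asserted") security label coding.
# Assumption: system URI and display text should be verified against HL7 terminology.
AIAST = {
    "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
    "code": "AIAST",
    "display": "Artificial Intelligence asserted",
}

# Assumption: the R4 code system used for Provenance.agent.type.
PARTICIPANT_TYPE = "http://terminology.hl7.org/CodeSystem/provenance-participant-type"


def label_as_ai_generated(resource):
    """Return a copy of the resource carrying the AIAST label in meta.security."""
    labelled = copy.deepcopy(resource)
    security = labelled.setdefault("meta", {}).setdefault("security", [])
    already_present = any(
        c.get("system") == AIAST["system"] and c.get("code") == AIAST["code"]
        for c in security
    )
    if not already_present:
        security.append(dict(AIAST))
    return labelled


def build_ai_provenance(target_ref, ai_device_ref, recorded):
    """Build a minimal R4 Provenance naming the AI (modelled as a Device) as author."""
    return {
        "resourceType": "Provenance",
        "target": [{"reference": target_ref}],
        "recorded": recorded,
        "agent": [
            {
                "type": {"coding": [{"system": PARTICIPANT_TYPE, "code": "author"}]},
                "who": {"reference": ai_device_ref},
            }
        ],
    }


observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
}

labelled = label_as_ai_generated(observation)
provenance = build_ai_provenance(
    "Observation/obs-1",
    "Device/hypothetical-ai-model",  # hypothetical Device id
    "2025-01-01T12:00:00Z",          # hypothetical timestamp
)
```

The label answers "was this AI generated?" on the resource itself, while the Provenance answers "which AI?" without adding anything Observation-specific.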

It took Claude 3 more prompts to get there and a further prompt to pull everything together.

Prompt 2

You did not make any recommendations for using the Provenance resource type. Why is this and is it an effective method of capturing AI source information?

Once I pushed Claude in a clear direction many of the poor options above disappeared, leaving only two recommendations.

1. Observation.device was maintained “for simple identification and efficient querying”. This was Claude’s answer to the first part of my request, but it was a weak one: identifying an Observation as AI generated would require access to the referenced Device resource.

2. A Provenance resource was recommended and valid reasons were given for using it:

Regulated environments requiring full provenance
Multi-step AI workflows
Cases where human oversight is involved
Complex AI pipelines with multiple models

This was where Claude came into its own and provided real value, expanding on the prompts it was given and searching out and documenting related points and arguments.

It still hadn’t learned about the security label. Time to push it a little closer.

Prompt 3

You also made no reference to using security labels or tags in the Observation resource. This may or may not be an appropriate method. Expand on this.

Claude went off and weighed up the pros and cons of security labels and tags. It liked neither, listing several significant limitations and possibilities for confusion for each. It was right about tags but wrong about security labels.

Despite being wrong, it was bold in its recommendation: “Do NOT use security labels for AI attribution – this is a misuse of their intended purpose.”

My next prompt was equally bold, pointing it unambiguously in the direction I wanted it to go.

Prompt 4

Look at this article: https://healthcaresecprivacy.blogspot.com/2024/09/healthcare-ai-provenance-of-ai-outputs.html
It talks about using the AIAST security label as well as the Provenance resource.

Claude went to the referenced web page, consumed the content and on that basis changed its analysis significantly. The fact that the AIAST code was “officially sanctioned” by HL7 seems to have weighed heavily in the decision to change its recommendation.

The security label now became its primary method of identifying the Observation as AI generated, with use of the Observation.device element and the Provenance resource being maintained for “detailed attribution”.
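
The practical difference between the two identification mechanisms can be sketched as follows. This is an illustration under stated assumptions, not code from the Claude conversation: the Device type coding is entirely hypothetical (there is no claim that such a code exists), the AIAST system URI should be verified, and `fetch` stands in for a read against a FHIR server.

```python
# Assumption: AIAST system URI, to be verified against HL7 terminology.
AIAST_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"

# Hypothetical coding marking a Device as an AI model; not a real code system.
AI_DEVICE_TYPE = {"system": "http://example.org/device-type", "code": "ai-model"}


def is_ai_by_label(resource):
    """Answer from the resource alone by inspecting meta.security."""
    return any(
        c.get("system") == AIAST_SYSTEM and c.get("code") == "AIAST"
        for c in resource.get("meta", {}).get("security", [])
    )


def is_ai_by_device(observation, fetch):
    """Answer only after resolving Observation.device to a Device resource."""
    reference = observation.get("device", {}).get("reference")
    if reference is None:
        return False
    device = fetch(reference)  # an extra read for every Observation checked
    return any(
        c.get("system") == AI_DEVICE_TYPE["system"]
        and c.get("code") == AI_DEVICE_TYPE["code"]
        for c in device.get("type", {}).get("coding", [])
    )


# In-memory stand-in for a FHIR server holding one hypothetical Device.
store = {
    "Device/hypothetical-ai-model": {
        "resourceType": "Device",
        "id": "hypothetical-ai-model",
        "type": {"coding": [AI_DEVICE_TYPE]},
    }
}
lookups = []


def fetch(reference):
    """Record each lookup so the cost of the device approach is visible."""
    lookups.append(reference)
    return store[reference]


observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Heart rate"},
    "meta": {"security": [{"system": AIAST_SYSTEM, "code": "AIAST"}]},
    "device": {"reference": "Device/hypothetical-ai-model"},
}

by_label = is_ai_by_label(observation)
lookups_after_label = len(lookups)
by_device = is_ai_by_device(observation, fetch)
```

Both checks return the same answer here, but only the device check has to fetch a second resource to reach it.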

There was no reason to hold onto the Observation.device element at this point as the security label more than adequately filled this role, so I added a little more clarification to see if that changed the outcome.

Prompt 5

We need to capture two different pieces of information. The first is the fact that an AI was used to generate the Observation. The second is details of the precise AI used. Make a final recommendation for both.

Claude provided a comprehensive explanation as to why each of the three recommendations was suitable, even providing JSON resource samples. But it held on to the Observation.device reference even though the information was duplicated in the Provenance resource.

I had to push it a little further to make it question this.

Prompt 6

Use of Observation.device makes the proposed solution specific to the Observation resource type. If there were a future requirement to extend AI generated resource types beyond Observation (Appointment, Encounter, etc.), then the modelling would need to be changed. With this in mind, is your solution still the best method?

Finally, it dropped the obsession with Observation.device, leaving me with the solution I was hoping for from that very first prompt. It highlighted the scalability of the new recommendation, providing language that would suit any business document or official-sounding email:

“This approach provides maximum scalability and consistency across the entire FHIR ecosystem while maintaining semantic accuracy and standards compliance.”
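
The scalability argument is easy to see in code, because nothing in the label-plus-Provenance approach depends on the resource type. The sketch below is an assumption-laden illustration rather than the artifact's own code: it wraps any resource and its Provenance in one transaction Bundle, the AIAST system URI should be verified, the Device id is hypothetical, and the Appointment and Encounter are abbreviated stubs rather than complete valid resources.

```python
import copy
import uuid

# Assumption: AIAST coding, to be verified against HL7 terminology.
AIAST = {
    "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
    "code": "AIAST",
}


def attribute_to_ai(resource, ai_device_ref, recorded):
    """Wrap any resource, labelled as AI generated, with its Provenance in a transaction Bundle."""
    labelled = copy.deepcopy(resource)
    labelled.setdefault("meta", {}).setdefault("security", []).append(dict(AIAST))

    # Temporary identifier so the Provenance can point at a not-yet-created resource.
    full_url = f"urn:uuid:{uuid.uuid4()}"
    provenance = {
        "resourceType": "Provenance",
        "target": [{"reference": full_url}],
        "recorded": recorded,
        "agent": [{"who": {"reference": ai_device_ref}}],
    }
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "fullUrl": full_url,
                "resource": labelled,
                "request": {"method": "POST", "url": labelled["resourceType"]},
            },
            {
                "fullUrl": f"urn:uuid:{uuid.uuid4()}",
                "resource": provenance,
                "request": {"method": "POST", "url": "Provenance"},
            },
        ],
    }


# Abbreviated stubs: required elements such as participant or class are omitted.
stubs = [
    {"resourceType": "Observation", "status": "final"},
    {"resourceType": "Appointment", "status": "booked"},
    {"resourceType": "Encounter", "status": "finished"},
]

bundles = [
    attribute_to_ai(stub, "Device/hypothetical-ai-model", "2025-01-01T12:00:00Z")
    for stub in stubs
]
```

The same function handles all three resource types unchanged, which would not be possible for a model built on Observation.device.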

My final prompt was to ask for a polished artifact.

Prompt 7

Produce an artifact from your final answer.

If you’re unfamiliar with Claude, an artifact is a polished document in which it takes its earlier recommendations, expands on them and wraps them into a properly structured and readable PDF.

The artifact produced contained lots of JSON samples, tables outlining possible AI use cases across different resource types as well as reasons why all of its suggestions were good ideas. It got bonus points at this stage for adding a “verifier” agent to the sample Provenance resource, signifying that a human practitioner had approved the AI generated resource.
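
A Provenance with both an AI author and a human verifier might look roughly like the sketch below. This is a reconstruction of the idea, not the JSON from the artifact. The `author` and `verifier` codes are assumed to come from the R4 provenance-participant-type code system, and the Device and Practitioner ids are hypothetical.

```python
import copy

# Assumption: the R4 code system used for Provenance.agent.type.
PARTICIPANT_TYPE = "http://terminology.hl7.org/CodeSystem/provenance-participant-type"


def make_agent(role_code, who_ref):
    """Build one Provenance.agent entry with a role and a reference to who acted."""
    return {
        "type": {"coding": [{"system": PARTICIPANT_TYPE, "code": role_code}]},
        "who": {"reference": who_ref},
    }


def add_verifier(provenance, practitioner_ref):
    """Return a copy of the Provenance with a human verifier agent appended."""
    updated = copy.deepcopy(provenance)
    updated["agent"].append(make_agent("verifier", practitioner_ref))
    return updated


def roles(provenance):
    """Map each agent's role code to the reference of the participant."""
    return {
        agent["type"]["coding"][0]["code"]: agent["who"]["reference"]
        for agent in provenance["agent"]
    }


ai_only = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/obs-1"}],
    "recorded": "2025-01-01T12:00:00Z",
    "agent": [make_agent("author", "Device/hypothetical-ai-model")],
}

verified = add_verifier(ai_only, "Practitioner/hypothetical-reviewer")
```

Keeping the AI as author and adding the practitioner as a second agent records the human sign-off without losing the original attribution.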

Claude artifacts look impressive but still require careful reading and editing to weed out mistakes and expand on explanations. Towards the bottom of the document it fell over one last time by producing an SQL query in place of a FHIR query.
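
For comparison, a FHIR search for AI generated resources could be built along these lines. This is a hedged sketch, not the query the artifact should have contained: the base URL is hypothetical, the AIAST system URI should be verified, and a given server may not support every parameter shown (`_security`, `_revinclude` and the Provenance `target` parameter).

```python
from urllib.parse import urlencode

BASE = "https://fhir.example.org/r4"  # hypothetical server base URL

# Assumption: AIAST system URI, to be verified against HL7 terminology.
AIAST_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"


def ai_generated_search(resource_type, with_provenance=False):
    """Build a search URL for resources of one type that carry the AIAST security label."""
    params = [("_security", f"{AIAST_SYSTEM}|AIAST")]
    if with_provenance:
        # Ask the server to also return Provenance resources targeting the matches.
        params.append(("_revinclude", "Provenance:target"))
    return f"{BASE}/{resource_type}?{urlencode(params)}"


def provenance_search(target_ref):
    """Build a search URL for the Provenance resources that target one resource."""
    return f"{BASE}/Provenance?{urlencode({'target': target_ref})}"


observations_url = ai_generated_search("Observation")
encounters_url = ai_generated_search("Encounter", with_provenance=True)
provenance_url = provenance_search("Observation/obs-1")
```

The same `_security` filter works for any resource type, so the query side generalizes in the same way as the data model.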

Here’s the final artifact (PDF).

And here’s the complete set of prompts and answers so that you can see how Claude handled each prompt one after the other.

Conclusion

Claude got to where I wanted it to go, but I had to treat it like a first-year undergraduate student and prompt it repeatedly, pushing it closer and closer to my ideal solution each time by feeding it hints and suggestions.

Like many students, it can read the material but it isn’t always capable of thinking further than the first “it works” solution. Very much a C student who shows up for each lecture, writes with far too much confidence but is not capable of producing anything genuinely enlightening.

But once it came to the “correct” conclusion it was exceptionally good at coming up with strong and convincing reasons for that conclusion. There’s value in that. An excellent assistant if not a great researcher.
