There is a widespread assumption that FHIR and AI are a natural fit: that once your data is in a FHIR server, it is structured, standardized and ready to be consumed by AI.
This is not always true.
FHIR enforces a consistent data structure. It uses standardized terminology and defines clear relationships between resources. On paper, it looks like exactly what AI needs.
The problem is that the data inside most FHIR servers does not look like this.
The same data quality problems that exist in legacy systems can exist in FHIR, and in some cases the FHIR layer introduces new ones. An AI system working with a poorly implemented FHIR dataset has no way to know that the data it’s reading might be incomplete, stale, or just plain wrong. It will work with what it has, making assumptions and drawing conclusions from whatever is there.
Here is where FHIR data goes wrong.
Identity and duplication
Without a Master Patient Index and enforced unique identifiers, the same patient can exist multiple times in a FHIR server. Poorly implemented merge operations can create orphaned resources pointing to inactive patients, while data flowing in from multiple sources can produce duplicate patients with no common identifiers.
An AI system building a clinical picture from fragmented or duplicated records will produce a picture that is incomplete or dangerously wrong.
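As a rough illustration, a pre-flight check can look for Patient resources that share demographics but have no identifier in common. This is a minimal sketch that assumes resources are plain Python dicts parsed from FHIR JSON; the matching key (family name plus birth date) and the function names are illustrative assumptions, not a substitute for a real Master Patient Index.

```python
from collections import defaultdict

def identifier_keys(patient):
    """Return the set of (system, value) pairs on a Patient resource."""
    return {
        (i.get("system"), i.get("value"))
        for i in patient.get("identifier", [])
        if i.get("value")
    }

def demographic_key(patient):
    """Crude matching key (an assumption): lower-cased family name plus birth date."""
    names = patient.get("name") or [{}]
    family = (names[0].get("family") or "").strip().lower()
    return (family, patient.get("birthDate"))

def possible_duplicates(patients):
    """Return pairs of Patient ids that share demographics but no identifier."""
    groups = defaultdict(list)
    for p in patients:
        key = demographic_key(p)
        if all(key):  # skip records missing a family name or birth date
            groups[key].append(p)
    suspects = []
    for members in groups.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if not identifier_keys(a) & identifier_keys(b):
                    suspects.append((a["id"], b["id"]))
    return suspects
```

Pairs returned here are candidates for human review, not confirmed duplicates; two different people can share a family name and a birth date.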
Terminology
Incorrect SNOMED and LOINC codes caused by poor data mapping can look valid and even find their way into profiles and ValueSets. Poorly constructed CodeableConcepts are widespread. They show up as conflicting codes, free text in place of standard codes, and display text that contradicts the code value. And then there are custom code systems that may begin life filling gaps in SNOMED but end up as the default code system.
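Some of these CodeableConcept problems can be caught with simple structural checks. The sketch below assumes plain-dict resources; `known_displays` is a hypothetical lookup standing in for a terminology service, and more than one code from the same system is treated as something to review rather than as a definite error.

```python
def codeable_concept_issues(concept, known_displays=None):
    """Return a list of issue strings for one CodeableConcept dict.

    known_displays is an assumed {(system, code): display} lookup,
    a stand-in for a real terminology service.
    """
    known_displays = known_displays or {}
    issues = []
    codings = concept.get("coding", [])
    if not codings:
        # Free text in place of a standard code, or nothing at all
        issues.append("text-only" if concept.get("text") else "empty")
    codes_by_system = {}
    for c in codings:
        system, code = c.get("system"), c.get("code")
        if not system or not code:
            issues.append("coding missing system or code")
            continue
        codes_by_system.setdefault(system, set()).add(code)
        expected = known_displays.get((system, code))
        display = c.get("display")
        if expected and display and display.strip().lower() != expected.strip().lower():
            issues.append(f"display mismatch for {system}|{code}")
    for system, codes in codes_by_system.items():
        if len(codes) > 1:
            issues.append(f"multiple codes from {system}")
    return issues
```

A check like this cannot tell whether a well-formed code is the clinically correct one; that still needs a review of the mapping itself.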
Data completeness
A resource can pass profile validation and still be analytically useless. Null or default values can be inserted by the mapping layer to pass validation. This data can look real but is essentially made up. On top of this, excessive use of the dataAbsentReason extension can leave AI systems facing inconsistent data across source systems.
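One way to get a feel for how much of a dataset is filler is to walk each resource and flag suspicious values. This sketch assumes plain-dict resources; the `PLACEHOLDERS` set is an invented example and would need to be replaced with whatever defaults your own mapping layer is known to emit.

```python
DATA_ABSENT_URL = "http://hl7.org/fhir/StructureDefinition/data-absent-reason"
# Assumed examples of mapping-layer defaults; adjust to your own pipeline
PLACEHOLDERS = {"unknown", "n/a", "none", "1900-01-01", "0001-01-01"}

def walk(node, path=""):
    """Yield (path, value) for every leaf value in a nested resource."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)
    else:
        yield path, node

def completeness_flags(resource):
    """Return (path, reason) pairs for absent-reason markers and placeholders."""
    flags = []
    for path, value in walk(resource):
        parts = path.split(".")
        if parts[-2:] == ["extension", "url"] and value == DATA_ABSENT_URL:
            flags.append((path, "data-absent-reason"))
        elif (
            "extension" not in parts  # do not double-count values inside extensions
            and isinstance(value, str)
            and value.strip().lower() in PLACEHOLDERS
        ):
            flags.append((path, "placeholder"))
    return flags
```

Aggregating these flags per source system gives a quick view of which feeds rely most heavily on defaults.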
Data freshness
In hybrid systems, sync processes are rarely real-time. An AI system may be reading data that is hours or days old without knowing it. When a data sync goes wrong it can produce duplicate resources with no way to tell the copy from the original. Manual corrections applied directly to the FHIR server cause a different problem: the server and the source system drift further and further apart.
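A basic staleness check can be built on `meta.lastUpdated`. The sketch below assumes plain-dict resources with timezone-aware timestamps, and the 24-hour threshold is an arbitrary example. Note that `meta.lastUpdated` records when the server last changed the resource, not when the source system did, so it gives only an upper bound on freshness.

```python
from datetime import datetime, timedelta, timezone

def last_updated(resource):
    """Parse meta.lastUpdated into an aware datetime, or None if absent."""
    raw = resource.get("meta", {}).get("lastUpdated")
    if raw is None:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))

def stale_ids(resources, now, max_age=timedelta(hours=24)):
    """Return ids of resources older than max_age or with no timestamp."""
    stale = []
    for r in resources:
        ts = last_updated(r)
        if ts is None or now - ts > max_age:
            stale.append(r["id"])
    return stale

# Example call: stale_ids(resources, now=datetime.now(timezone.utc))
```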
Profiles and Implementation Guides
Profiles that are not enforced by the FHIR server can lead to resources that do not conform to any agreed structure. Implementation Guide versions (especially ValueSets) can be inconsistent across environments and tenants. Historical data loaded before a profile change may never have been validated against the current profile, so the structure an AI system reads can vary with when each resource was written.
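A simple census of declared profiles can show how mixed a dataset is. This is a sketch assuming plain-dict resources; a `meta.profile` entry is only a claim of conformance, so this shows what resources say about themselves, not whether they would pass validation.

```python
from collections import Counter

def profile_census(resources):
    """Count resources by (resourceType, declared profile canonical)."""
    counts = Counter()
    for r in resources:
        profiles = r.get("meta", {}).get("profile") or ["<none declared>"]
        for profile in profiles:
            counts[(r.get("resourceType"), profile)] += 1
    return counts
```

Several versions of the same canonical URL, or a large "<none declared>" bucket, suggest that data was loaded under different rules at different times.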
Data access
Bulk export is often not tested at production-level data volumes. The first time an AI pipeline tries to pull everything, it may time out or return incomplete data. Consent restrictions and access control rules can silently limit what the AI system can see, especially if there’s a proxy layer intercepting and altering data as it flows out of the server.
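One hedged way to detect a short export is to compare per-type counts in the exported NDJSON with totals obtained separately, for example from a `_summary=count` search per resource type. The sketch assumes the NDJSON lines are already in hand and that `expected_counts` was gathered by some other trusted route; if both go through the same access rules, they can be filtered identically and still agree.

```python
import json
from collections import Counter

def count_exported(ndjson_lines):
    """Count resources per resourceType in an iterable of NDJSON lines."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if line:  # tolerate blank lines
            counts[json.loads(line)["resourceType"]] += 1
    return counts

def export_gaps(exported_counts, expected_counts):
    """Return {resourceType: (exported, expected)} wherever the two differ."""
    return {
        rtype: (exported_counts.get(rtype, 0), expected)
        for rtype, expected in expected_counts.items()
        if exported_counts.get(rtype, 0) != expected
    }
```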
Resource relationships
FHIR resources reference each other. An Observation, for example, can reference a Patient, an Encounter and a Practitioner. In poorly maintained FHIR servers, broken and missing references are common, especially when manual corrections are made to the data. An AI system following those references hits dead ends.
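Dangling references within a dataset can be found by indexing every resource and checking each literal reference against that index. This sketch assumes plain-dict resources that all live on the same server, and it only checks relative references of the form `Type/id`; contained, absolute and `urn:` references are skipped rather than resolved.

```python
def find_references(node):
    """Yield every string found under a "reference" key, at any depth."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)

def dangling_references(resources):
    """Return (source, reference) pairs whose target is not in the dataset."""
    known = {f'{r["resourceType"]}/{r["id"]}' for r in resources}
    dangling = []
    for r in resources:
        source = f'{r["resourceType"]}/{r["id"]}'
        for ref in find_references(r):
            # Skip contained (#...), absolute URL and urn: references
            if ref.startswith("#") or ":" in ref:
                continue
            target = ref.split("/_history/")[0]  # ignore version suffix
            if target not in known:
                dangling.append((source, ref))
    return dangling
```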
Provenance
AI systems benefit from knowing where data came from: the source system, the doctor or hospital that signed off, and whether an element was updated or manually corrected after the fact. Missing Provenance resources, or inconsistencies between the provenance and the actors and practitioners in individual resources, can make this unreliable.
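Provenance coverage is straightforward to measure: collect every `Provenance.target` and list the resources that none of them point to. The sketch assumes plain-dict resources and relative `Type/id` references; it says nothing about whether the Provenance that does exist is accurate.

```python
def without_provenance(resources):
    """Return "Type/id" for each non-Provenance resource no Provenance targets."""
    covered = set()
    for r in resources:
        if r.get("resourceType") == "Provenance":
            for target in r.get("target", []):
                ref = target.get("reference", "")
                covered.add(ref.split("/_history/")[0])  # ignore version suffix
    return [
        f'{r["resourceType"]}/{r["id"]}'
        for r in resources
        if r.get("resourceType") != "Provenance"
        and f'{r["resourceType"]}/{r["id"]}' not in covered
    ]
```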
Conclusion
FHIR is not a silver bullet.
Pushing data from legacy systems to a FHIR server does not make that data AI-ready. The real “AI-ready” project is everything you need to build into your data pipelines to ensure your FHIR resources are clean, consistent and fully accessible.
That means resolved patient identity, accurate terminology, reliable references, consistent structure, current data, and provenance that can be trusted.
---