athenean owl

Explaining Medallion Data Architectures in Healthcare

Faster Insight, Better Reuse, and Scalable Data Foundations

Healthcare organisations face growing demand for better use of data: improving operational performance, supporting population health management, enabling AI, and accelerating research. Yet many still rely on fragmented pipelines, duplicated transformations, and slow bespoke data requests.

At the same time, the economics of technology have changed. Modern cloud platforms now provide highly durable, scalable storage at costs low enough to make retaining large volumes of raw data practical. This has enabled a shift in data architecture design: rather than transforming data before storage, organisations can preserve source data first and refine it iteratively over time.

The Medallion Architecture, popularised by Databricks, provides a practical model for this approach. Data progresses through three logical layers:

  • Bronze – raw source data retained with provenance
  • Silver – cleansed, linked, standardised, reusable data assets
  • Gold – trusted datasets optimised for operational, analytical, or research use

For healthcare, this model offers substantial advantages. Once common cleansing, linkage, terminology mapping, and quality controls are established in Silver, Gold datasets for specific use cases can be produced rapidly and repeatedly. This shortens time to insight, reduces duplicated engineering effort, and strengthens governance consistency.


Why Data Architecture Must Change

Traditional healthcare data estates often evolved around individual reporting needs, local applications, or one-off integrations. Common consequences include:

  • Repeated extraction of the same source data
  • Multiple inconsistent versions of metrics
  • Long lead times for new data requests
  • Limited ability to reuse previous work
  • Poor lineage and provenance
  • High cost of maintaining bespoke pipelines

Historically, these designs reflected technical constraints. Storage was expensive, compute was fixed, and systems favoured structured schemas.

Modern cloud platforms changed those assumptions:

  • Cheap, scalable storage
  • Elastic compute on demand
  • Native support for structured and semi-structured data
  • Streaming ingestion
  • Separation of storage and compute resources

This creates the opportunity for a more durable and reusable model.


From ETL to ELT

Traditional warehouses commonly used ETL (Extract, Transform, Load):

  1. Extract from source systems
  2. Transform in a separate processing layer before loading
  3. Load the final, shaped data into the warehouse

Modern platforms increasingly favour ELT (Extract, Load, Transform):

  1. Extract source data
  2. Load raw data quickly into platform storage
  3. Transform using in-platform compute

ELT is better suited to healthcare because it supports:

  • Large data volumes
  • Frequent source system changes
  • Streaming feeds
  • Reprocessing when business rules change
  • Multiple downstream outputs from one source ingestion
  • Preservation of source fidelity for audit and replay

Instead of deciding once what data should become, organisations can decide many times.
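To make the difference concrete, here is a minimal ELT sketch in Python. It uses the standard-library sqlite3 module as a stand-in for cloud storage and compute. The table names, columns and the flagging threshold are illustrative assumptions, not clinical rules. Raw rows are loaded untouched, the transform runs inside the database, and it can be re-run whenever a rule changes.

```python
import sqlite3

# Minimal ELT sketch: SQLite stands in for the platform's storage and
# compute. Table names, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")

# Extract + Load: land source rows exactly as received, messy values included.
conn.execute(
    "CREATE TABLE bronze_lab_results (patient_id TEXT, test TEXT, value TEXT, unit TEXT)"
)
conn.executemany(
    "INSERT INTO bronze_lab_results VALUES (?, ?, ?, ?)",
    [
        ("P001", "HbA1c", "48", "mmol/mol"),
        ("P002", "hba1c", " 53 ", "mmol/mol"),
        ("P003", "HbA1c", "not recorded", "mmol/mol"),
    ],
)


def transform(conn, threshold):
    """Rebuild the refined table from Bronze, flagging results >= threshold."""
    conn.execute("DROP TABLE IF EXISTS silver_lab_results")
    conn.execute(
        "CREATE TABLE silver_lab_results "
        "(patient_id TEXT, test TEXT, value REAL, unit TEXT, flagged INTEGER)"
    )
    # Transform in-platform: trim, normalise case, drop non-numeric values.
    conn.execute(
        """
        INSERT INTO silver_lab_results
        SELECT patient_id, UPPER(TRIM(test)), CAST(TRIM(value) AS REAL), unit,
               CAST(TRIM(value) AS REAL) >= ?
        FROM bronze_lab_results
        WHERE TRIM(value) GLOB '[0-9]*'
        """,
        (threshold,),
    )
    return conn.execute(
        "SELECT patient_id, flagged FROM silver_lab_results ORDER BY patient_id"
    ).fetchall()


# The Bronze table is never modified, so a changed rule is just a re-run.
print(transform(conn, threshold=50))  # [('P001', 0), ('P002', 1)]
print(transform(conn, threshold=48))  # [('P001', 1), ('P002', 1)]
```

Because the raw table is never altered, changing a business rule costs one re-run rather than a new extract from the source system.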


What Is a Medallion Architecture?

The Medallion model organises data into progressive layers of trust and usability.

Bronze Layer – Raw Data

The Bronze layer stores source data substantially as received.

Examples:

  • EPR extracts
  • Laboratory feeds
  • Claims data
  • HL7 / FHIR messages
  • Device telemetry
  • Scheduling events
  • Legacy flat files

Key principles:

  • Preserve original records
  • Maintain timestamps and provenance
  • Support replay and reprocessing
  • Avoid premature data loss
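A minimal sketch of these principles follows, assuming an in-memory list in place of real object storage. Each record keeps its original payload unchanged and carries the source system, file name, ingestion timestamp and a SHA-256 fingerprint alongside it. The class, field and file names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a stored record cannot be edited in place
class BronzeRecord:
    """One source record stored as received, plus provenance metadata."""
    source_system: str   # e.g. a lab feed or EPR extract (hypothetical names)
    source_file: str     # where the record arrived from
    ingested_at: str     # UTC timestamp of landing
    payload: str         # original content, unmodified
    payload_sha256: str  # fingerprint for integrity checks on replay


class BronzeStore:
    """Append-only store: records are added, never updated or deleted."""

    def __init__(self):
        self._records = []

    def append(self, source_system, source_file, payload):
        record = BronzeRecord(
            source_system=source_system,
            source_file=source_file,
            ingested_at=datetime.now(timezone.utc).isoformat(),
            payload=payload,
            payload_sha256=hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        )
        self._records.append(record)
        return record

    def replay(self, source_system=None):
        """Yield stored records in arrival order, optionally for one source."""
        for record in self._records:
            if source_system is None or record.source_system == source_system:
                yield record


store = BronzeStore()
store.append("lab_feed", "lab_batch_001.json",
             '{"patient": "P001", "test": "HbA1c", "value": "48"}')
store.append("epr_extract", "adt_extract.csv", "P001,ADMIT,2024-01-05")

# Replay one source, e.g. to reprocess it after a Silver rule changes.
for record in store.replay("lab_feed"):
    print(record.source_file, record.payload_sha256[:12])
```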

Silver Layer – Cleaned and Reusable Data

Silver is where data becomes enterprise-grade.

Typical processing:

  • Validation and schema checks
  • Deduplication
  • Standardisation
  • Identity linkage / pseudonymisation
  • Reference data enrichment
  • Mapping codes to standard terminologies such as SNOMED CT, ICD-10 and OPCS-4
  • Conformance to common data models such as OMOP

This layer creates reusable assets rather than one-off outputs.
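The sketch below illustrates typical Silver processing in plain Python. It assumes hypothetical field names and a toy lookup table standing in for real SNOMED CT or ICD-10 mapping, which would normally come from a maintained terminology service. It uses keyed hashing (HMAC-SHA256) as one common way to pseudonymise identifiers while keeping records linkable. Rows that fail checks are set aside with a reason rather than silently dropped.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical secret held by the pseudonymisation service, not by analysts.
PSEUDONYMISATION_KEY = b"replace-with-a-managed-secret"

# Hypothetical local-code lookup standing in for real terminology mapping.
LOCAL_TO_STANDARD = {"DM2": "Type 2 diabetes mellitus", "HTN": "Hypertension"}

REQUIRED = ("nhs_number", "dob", "diagnosis_code")


def pseudonymise(identifier):
    """Keyed hash: stable for linkage, not reversible without the key."""
    return hmac.new(PSEUDONYMISATION_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def to_silver(bronze_rows):
    """Validate, standardise, map codes, pseudonymise and deduplicate."""
    silver, rejected, seen = [], [], set()
    for row in bronze_rows:
        # Validation: required fields present and non-empty.
        if any(not str(row.get(field, "")).strip() for field in REQUIRED):
            rejected.append((row, "missing required field"))
            continue
        # Standardisation: strip spaces from the identifier, parse the date.
        nhs_number = row["nhs_number"].replace(" ", "")
        try:
            dob = datetime.strptime(row["dob"].strip(), "%d/%m/%Y").date()
        except ValueError:
            rejected.append((row, "unparseable date of birth"))
            continue
        # Terminology mapping: unknown codes go for review, never guessed.
        code = row["diagnosis_code"].strip().upper()
        if code not in LOCAL_TO_STANDARD:
            rejected.append((row, "unmapped code"))
            continue
        record = (pseudonymise(nhs_number), dob, LOCAL_TO_STANDARD[code])
        # Deduplication on the standardised record, not on raw text.
        if record in seen:
            continue
        seen.add(record)
        silver.append({"patient_key": record[0], "dob": record[1],
                       "condition": record[2]})
    return silver, rejected


# Made-up rows: the first two are the same record once standardised.
rows = [
    {"nhs_number": "123 456 7890", "dob": "01/02/1960", "diagnosis_code": "dm2"},
    {"nhs_number": "1234567890", "dob": "01/02/1960", "diagnosis_code": "DM2"},
    {"nhs_number": "", "dob": "01/02/1960", "diagnosis_code": "HTN"},
    {"nhs_number": "2222222222", "dob": "1960-13-01", "diagnosis_code": "HTN"},
]
silver, rejected = to_silver(rows)
print(len(silver), "Silver record(s);", len(rejected), "rejected")
```

Keeping the rejected rows with a reason gives the data quality reporting that Silver controls depend on.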

Gold Layer – Consumption-Ready Data

Gold contains products shaped for specific users.

Examples:

  • Board performance dashboards
  • Waiting list metrics
  • Population health segmentation
  • Clinical pathway analytics
  • Service planning marts
  • Research cohorts
  • AI feature datasets
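As an illustration of how thin a Gold product can be once Silver exists, this sketch turns cleansed referral pathways into per-specialty waiting list counts. The field names are assumptions. The 18-week threshold is only an example parameter, echoing the NHS referral-to-treatment standard.

```python
from collections import defaultdict
from datetime import date


def waiting_list_metrics(silver_pathways, as_of, breach_weeks=18):
    """Aggregate open Silver pathways into per-specialty Gold metrics."""
    totals = defaultdict(lambda: {"waiting": 0, "over_threshold": 0})
    for pathway in silver_pathways:
        if pathway["clock_stop"] is not None:
            continue  # treated or removed, so no longer waiting
        bucket = totals[pathway["specialty"]]
        bucket["waiting"] += 1
        if (as_of - pathway["referral_date"]).days > breach_weeks * 7:
            bucket["over_threshold"] += 1
    return dict(totals)


# Hypothetical Silver rows.
pathways = [
    {"specialty": "Cardiology", "referral_date": date(2024, 1, 2), "clock_stop": None},
    {"specialty": "Cardiology", "referral_date": date(2024, 5, 20), "clock_stop": None},
    {"specialty": "Dermatology", "referral_date": date(2024, 2, 1),
     "clock_stop": date(2024, 3, 1)},
]
print(waiting_list_metrics(pathways, as_of=date(2024, 6, 3)))
# {'Cardiology': {'waiting': 2, 'over_threshold': 1}}
```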

Why This Is Powerful in Healthcare

Healthcare repeatedly asks new questions of old data.

Examples:

  • Which cohorts are at highest risk this winter?
  • What predicts delayed discharge?
  • How equitable is service access?
  • Which interventions improved outcomes?
  • Can a research cohort be assembled safely for oncology?

Without a layered architecture, each request may restart engineering effort.

With Medallion:

  • Bronze already holds historical source data
  • Silver already contains reusable linked and standardised assets
  • Gold can be generated quickly for the new question

A new question then needs only Gold-layer work rather than fresh engineering from source, which shortens delivery dramatically.


Population Health and Research Use Cases

Population Health

Gold datasets can support wide-scale analysis across:

  • Long-term conditions
  • Prevention opportunities
  • Health inequalities
  • Demand forecasting
  • Primary / secondary care utilisation
  • Place-based planning

Research and Innovation

Gold datasets can support approved linkable cohorts for:

  • Oncology outcomes
  • Cardiovascular studies
  • Rare disease analysis
  • Genomics-enabled studies
  • Medicines safety
  • Pathway redesign evaluation

Because much of the hard work is already completed in Silver, research mobilisation becomes faster and more repeatable.
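For example, if Silver already holds pseudonymised, terminology-mapped records, selecting a candidate cohort can be a short query. The condition labels, age bounds and field names below are hypothetical. A real cohort would still go through the approval workflow before release.

```python
from datetime import date


def age_on(dob, on):
    """Whole years of age on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def build_cohort(silver_records, conditions, min_age, max_age, index_date):
    """Return sorted, de-duplicated pseudonymised keys meeting the criteria."""
    return sorted({
        record["patient_key"]
        for record in silver_records
        if record["condition"] in conditions
        and min_age <= age_on(record["dob"], index_date) <= max_age
    })


# Hypothetical Silver records keyed by pseudonym, not by identity.
records = [
    {"patient_key": "a1", "dob": date(1960, 2, 1), "condition": "Breast cancer"},
    {"patient_key": "b2", "dob": date(2010, 5, 5), "condition": "Breast cancer"},
    {"patient_key": "c3", "dob": date(1955, 7, 9), "condition": "Hypertension"},
    {"patient_key": "a1", "dob": date(1960, 2, 1), "condition": "Breast cancer"},
]
print(build_cohort(records, {"Breast cancer"}, 18, 90, date(2024, 1, 1)))
# ['a1']
```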


Governance by Design

In healthcare, speed without trust fails.

A Medallion Architecture should embed:

  • Data minimisation
  • Role-based access control
  • Pseudonymisation
  • Full lineage
  • Audit trails
  • Approval workflows
  • Safe output controls
  • Policy-as-code enforcement where possible

This allows Gold datasets to be generated quickly without weakening governance.
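Some of these controls can be expressed as policy-as-code. The sketch below is a simplified illustration that assumes a hypothetical role-to-dataset policy and a suppression threshold of 5 chosen only for the example. It checks role-based access, writes an audit entry for every request (allowed or not), and applies small-number suppression as a safe output control.

```python
from datetime import datetime, timezone

# Hypothetical policy: which roles may read which Gold datasets.
ACCESS_POLICY = {
    "waiting_list_metrics": {"performance_analyst", "board_reporting"},
    "oncology_cohort": {"approved_researcher"},
}
SMALL_NUMBER_THRESHOLD = 5  # hypothetical value for this example

AUDIT_LOG = []  # in practice, an append-only, tamper-evident store


def release(dataset, user, role, counts):
    """Check access, record an audit entry, then suppress small counts."""
    allowed = role in ACCESS_POLICY.get(dataset, set())
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user, "role": role, "dataset": dataset, "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not access {dataset}")
    # Safe output control: counts below the threshold are withheld as None.
    return {key: (n if n >= SMALL_NUMBER_THRESHOLD else None)
            for key, n in counts.items()}


print(release("waiting_list_metrics", "analyst01", "performance_analyst",
              {"Cardiology": 120, "Dermatology": 3}))
# {'Cardiology': 120, 'Dermatology': None}
```

Logging before the access decision is enforced means refused requests are audited too, not just successful ones.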


Strategic Benefits

For executives, the model delivers:

Faster Time to Insight

Requests that once took weeks can be turned around in days, and some in hours.

Lower Total Cost

Reuse common transformation logic rather than rebuilding pipelines. Pseudonymise once, on ingestion, rather than in every downstream pipeline.

Better Quality

Shared standards reduce conflicting numbers.

Stronger Governance

Every Gold output can be traced back to its Bronze source, together with every transform applied along the way.
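Lineage of this kind is essentially a graph of datasets and the transforms between them. Here is a minimal sketch with hypothetical dataset and transform names. It walks from a Gold output back to its Bronze sources and lists each transform applied along the way.

```python
# Hypothetical lineage registry: dataset -> (transform applied, input datasets).
# Bronze datasets have no transform and no inputs: they are landed sources.
LINEAGE = {
    "gold.waiting_list_metrics": ("aggregate_open_pathways", ["silver.pathways"]),
    "silver.pathways": ("clean_and_link",
                        ["bronze.pas_extract", "bronze.referrals_feed"]),
    "bronze.pas_extract": (None, []),
    "bronze.referrals_feed": (None, []),
}


def trace(dataset, lineage=LINEAGE):
    """Walk upstream; return (transforms applied, Bronze sources reached)."""
    transforms, sources = [], set()
    stack, visited = [dataset], set()
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # shared inputs are only visited once
        visited.add(current)
        transform, inputs = lineage[current]
        if transform is None:
            sources.add(current)
        else:
            transforms.append(transform)
            stack.extend(inputs)
    return transforms, sorted(sources)


print(trace("gold.waiting_list_metrics"))
# (['aggregate_open_pathways', 'clean_and_link'], ['bronze.pas_extract', 'bronze.referrals_feed'])
```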

AI Readiness

Curated Silver and Gold layers support analytics and machine learning. Once pseudonymised, Bronze data can also be used by privately hosted LLMs.

Future Flexibility

New use cases can be served from existing foundations.


Common Mistakes to Avoid

  • Treating Bronze as a dumping ground with no metadata, ownership, or history
  • Creating too many bespoke, overlapping Gold datasets
  • Skipping data quality controls in Silver
  • Ignoring governance until the end
  • Designing around tools rather than the operating model
  • Failing to assign product owners to Gold assets

Architecture alone does not solve accountability.


Recommended Healthcare Implementation Approach

Phase 1 – Foundation

Prioritise ingestion, metadata, lineage, and identity controls.

Phase 2 – Silver Core Assets

Create reusable patient, encounter, provider, activity, coding, and geography models.

Phase 3 – Gold Priority Products

Deliver the highest-value dashboards and research datasets first.

Phase 4 – Scale and Automate

Add self-service, policy automation, data product ownership, and advanced analytics.


Conclusion

The Medallion Architecture reflects a modern reality: storage is affordable, compute is elastic, and healthcare needs faster answers from increasingly complex data.

By retaining raw data, building trusted reusable Silver assets, and rapidly generating Gold products, healthcare organisations can move from fragmented reporting toward an industrialised data capability.

For systems seeking better operational performance, stronger population health insight, and faster research enablement, Medallion Architecture is not just a technical option—it is a strategic operating model for data.
