Understanding AI Classifiers, Terminologies, and Terminology Engines

These three concepts get conflated constantly in healthcare informatics conversations. All three deal with clinical codes, yet they do fundamentally different jobs. Understanding what each does is critical if you’re building anything that touches coded clinical data.

| Category | What it does | Major examples | Healthcare context |
| --- | --- | --- | --- |
| Terminology (clinical code systems) | Defines the codes, concepts, descriptions and relationships. Passive reference data: it doesn’t run, query or reason | SNOMED CT, ICD-10, ICD-11, LOINC, dm+d, OPCS-4, RxNorm, CPT | SNOMED CT has 350k+ active concepts with hierarchical relationships. ICD-10 is used for mortality/morbidity classification and billing. LOINC covers lab observations. dm+d is the NHS drug dictionary |
| Engine (terminology servers) | Indexes and serves terminologies via API. Handles lookup, validation, subsumption, value set expansion. Makes the dictionary queryable at runtime | Ontoserver (CSIRO), Snowstorm, NHS England Terminology Server, Apelon DTS, Open Concept Lab, HAPI FHIR | Ontoserver powers the NHS Digital terminology server and the Australian NCTS. Snowstorm is SNOMED International’s open-source server on Elasticsearch. These answer “is code X a subtype of Y?”, not “what will happen next?” |
| Classifier (ML / gradient boosting) | Learns statistical patterns from historical coded data to predict outcomes. Treats codes as categorical features, not semantic concepts | CatBoost, XGBoost, LightGBM, scikit-learn Random Forest, scikit-learn GBM | CatBoost handles SNOMED/ICD codes natively as categories via ordered target statistics. XGBoost and LightGBM need the codes converted to integer or category-typed columns first. Used for demand forecasting, risk prediction, readmission scoring |

A terminology is a dictionary

SNOMED CT is a terminology. So are ICD-10, OPCS-4, and dm+d. The terminology defines meaning. It doesn’t serve it, search it, or reason over it.

A terminology gives you a controlled set of concepts with unique identifiers, human-readable descriptions, and relationships between them. SNOMED CT concept 73211009 (diabetes mellitus) is a type of disorder of the endocrine system. ICD-10 E11.9 is “Type 2 diabetes mellitus without complications.”

The terminology is a structured reference set, often distributed as a large set of text files that define what codes exist and how they relate to each other. SNOMED’s international release is distributed as RF2 files: tab-separated text that describes concepts, descriptions, and relationships. You could open them in Excel if you wanted to, though you’d regret it given there are over 350,000 active concepts.
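To make “tab-separated text” a little more concrete, here is a minimal sketch that reads a few rows laid out like an RF2 relationship file and collects the active “is a” links. The column names and the “is a” type identifier 116680003 follow the RF2 format; the rows themselves are simplified and made up for illustration (they are not copied from a real release), and the function names are hypothetical.

```python
import csv
import io

IS_A = "116680003"  # SNOMED CT identifier for the "is a" relationship type

RF2_COLUMNS = [
    "id", "effectiveTime", "active", "moduleId", "sourceId",
    "destinationId", "relationshipGroup", "typeId",
    "characteristicTypeId", "modifierId",
]


def make_sample_file():
    """Build a tiny in-memory stand-in for an RF2 relationship file."""
    # (active, sourceId, destinationId, typeId): simplified, illustrative rows.
    rows = [
        ("1", "44054006", "73211009", IS_A),   # type 2 diabetes -> diabetes mellitus
        ("1", "46635009", "73211009", IS_A),   # type 1 diabetes -> diabetes mellitus
        ("1", "73211009", "362969004", IS_A),  # diabetes mellitus -> endocrine disorder
        ("0", "44054006", "362969004", IS_A),  # inactive row, must be ignored
        ("1", "44054006", "113331007", "363698007"),  # not an "is a" row
    ]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(RF2_COLUMNS)
    for row_id, (active, source, destination, type_id) in enumerate(rows, start=1):
        # Columns this sketch does not use are filled with placeholder values.
        writer.writerow([row_id, "20240101", active, "0", source,
                         destination, "0", type_id, "0", "0"])
    buffer.seek(0)
    return buffer


def load_is_a_parents(handle):
    """Map each concept id to the set of its direct, active "is a" parents."""
    parents = {}
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["active"] == "1" and row["typeId"] == IS_A:
            parents.setdefault(row["sourceId"], set()).add(row["destinationId"])
    return parents


def ancestors(concept_id, parents):
    """Collect every concept reachable by following "is a" links upwards."""
    found, stack = set(), [concept_id]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


parents = load_is_a_parents(make_sample_file())
print(sorted(ancestors("44054006", parents)))
```

Nothing here reasons or serves anything: the files are just rows, and any meaning you get out of them depends on code like this being written around them.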

A terminology engine makes the terminology usable

The terminology engine doesn’t interpret clinical data. It doesn’t look at a patient record and decide anything. It answers questions about codes and their relationships. It provides the reference infrastructure.

Ontoserver is a terminology engine: specifically, a FHIR-native terminology server. It takes raw terminology content (those SNOMED RF2 files, ICD-10 tabular lists, dm+d releases) and makes it queryable through a standardised API. When a clinical system needs to look up a SNOMED code, validate that a code exists, expand a value set, or find all descendants of a concept, it calls the terminology server rather than shipping the entire terminology locally and parsing it itself.

The key operations a terminology engine provides are concept lookup (give me the display term for this code), validation (is this code active and does it belong to this value set), subsumption testing (is concept A a subtype of concept B — which is how you answer “is this patient’s diagnosis a type of diabetes”), and value set expansion (give me all the codes that match this definition, such as “all SNOMED codes that are subtypes of diabetes mellitus”).
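On a FHIR terminology server these four operations correspond to `$lookup`, `$validate-code`, `$subsumes`, and `$expand`. The sketch below only builds the request URLs and makes no network calls. The base URL is a hypothetical placeholder and the helper function names are made up for illustration; the operation names, parameter names, and the implicit SNOMED CT value set URL pattern come from the FHIR specification.

```python
from urllib.parse import urlencode

SNOMED = "http://snomed.info/sct"
BASE_URL = "https://terminology.example.org/fhir"  # hypothetical server address


def lookup_url(code, system=SNOMED):
    """Concept lookup: ask for the display term and properties of a code."""
    query = urlencode({"system": system, "code": code})
    return f"{BASE_URL}/CodeSystem/$lookup?{query}"


def validate_code_url(code, value_set_url, system=SNOMED):
    """Validation: ask whether a code is a member of a value set."""
    query = urlencode({"url": value_set_url, "system": system, "code": code})
    return f"{BASE_URL}/ValueSet/$validate-code?{query}"


def subsumes_url(code_a, code_b, system=SNOMED):
    """Subsumption: ask how code A relates to code B in the hierarchy."""
    query = urlencode({"system": system, "codeA": code_a, "codeB": code_b})
    return f"{BASE_URL}/CodeSystem/$subsumes?{query}"


def descendants_value_set(concept_id):
    """Implicit SNOMED CT value set: the concept plus all of its subtypes."""
    return f"{SNOMED}?fhir_vs=isa/{concept_id}"


def expand_url(value_set_url):
    """Value set expansion: ask for every code the definition matches."""
    query = urlencode({"url": value_set_url})
    return f"{BASE_URL}/ValueSet/$expand?{query}"


diabetes = "73211009"
type_2_diabetes = "44054006"

print(lookup_url(type_2_diabetes))
print(validate_code_url(type_2_diabetes, descendants_value_set(diabetes)))
print(subsumes_url(diabetes, type_2_diabetes))
print(expand_url(descendants_value_set(diabetes)))
```

A `$subsumes` response reports an outcome of `equivalent`, `subsumes`, `subsumed-by`, or `not-subsumed`, describing how code A relates to code B, so “is this diagnosis a type of diabetes?” becomes a single request rather than a hierarchy walk in application code.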

A classifier learns patterns from data

CatBoost is a gradient boosted decision tree library. It sits in an entirely different category. Where the terminology defines what codes mean and the engine serves those definitions, CatBoost takes historical data (rows of patient encounters with their coded features and outcomes) and learns statistical patterns that let it predict something about future encounters.

In the A&E demand forecasting example, CatBoost takes features like day of week, weather category, trust type, flu rate, and historical attendance counts, and learns that certain combinations predict higher or lower demand. The SNOMED and ICD codes in the training data are input features, categorical variables that the model uses to discover patterns, not things the model understands semantically.

This is the critical distinction. CatBoost doesn’t know that E11.9 is a type of diabetes. It doesn’t know that 73211009 sits underneath “disorder of the endocrine system” in a hierarchy. It treats those codes as opaque category labels and learns from the data that certain codes co-occur with certain outcomes at certain frequencies. The ordered target statistics that CatBoost uses to encode categorical features are purely statistical: they measure the average target value associated with each code, not the clinical meaning of the code.
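A simplified sketch of the idea behind ordered target statistics may help here. This is not CatBoost’s implementation: it uses a single fixed row order (CatBoost works over random permutations), and the function name, the prior of 0.5, and the prior weight of 1 are assumptions chosen for illustration. Each row’s code is replaced by a smoothed average of the targets seen on earlier rows with the same code.

```python
def ordered_target_statistics(codes, targets, prior, prior_weight=1.0):
    """Encode each code using only the targets of rows that came before it."""
    sums, counts, encoded = {}, {}, []
    for code, target in zip(codes, targets):
        seen_sum = sums.get(code, 0.0)
        seen_count = counts.get(code, 0)
        # Smoothed mean of earlier targets; the current row's own target
        # is excluded so the encoding cannot leak the answer.
        encoded.append((seen_sum + prior_weight * prior) / (seen_count + prior_weight))
        sums[code] = seen_sum + target
        counts[code] = seen_count + 1
    return encoded


# Illustrative data: diagnosis code per encounter, and whether it was readmitted.
codes = ["E11.9", "I10", "E11.9", "E11.9", "I10"]
readmitted = [1, 0, 0, 1, 1]

print(ordered_target_statistics(codes, readmitted, prior=0.5))
# -> [0.5, 0.5, 0.75, 0.5, 0.25]
```

The strings “E11.9” and “I10” could be swapped for any other labels and the numbers would be identical, which is the sense in which the encoding is statistical rather than semantic.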

This is why the terminology and the classifier are complementary. The terminology tells you what a code means. The classifier tells you what patterns in the data involving that code can predict. One is knowledge-driven, the other is data-driven.

Where they overlap in practice

In a real NHS analytics pipeline, all three work together. The clinical system records a patient encounter using SNOMED-coded diagnoses, validated at the point of entry by a terminology server that checks the codes are active and belong to the correct value set. That coded data flows into the Trust’s data warehouse. An analytics team then pulls the historical coded data and feeds it into CatBoost as categorical features for a demand forecasting or risk prediction model.

The terminology server might also play a role in feature engineering. If you want to group thousands of specific SNOMED diagnosis codes into broader categories for the model (collapsing all the subtypes of diabetes into a single “diabetes” feature, for instance), you’d use the terminology server’s subsumption API to identify which codes are descendants of the diabetes concept. That grouped feature then gets passed to CatBoost, which treats it as a categorical variable and learns its predictive value from the data.
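A minimal sketch of that grouping step, assuming the descendants have already been fetched from the terminology server as a FHIR ValueSet expansion. The response below is a trimmed, hand-written stand-in rather than real server output, and the function names are hypothetical. A real expansion can also be paged or nested, which this sketch ignores.

```python
# Trimmed, hand-written stand-in for a ValueSet/$expand response.
SAMPLE_EXPANSION = {
    "resourceType": "ValueSet",
    "expansion": {
        "contains": [
            {"system": "http://snomed.info/sct", "code": "73211009",
             "display": "Diabetes mellitus"},
            {"system": "http://snomed.info/sct", "code": "44054006",
             "display": "Type 2 diabetes mellitus"},
            {"system": "http://snomed.info/sct", "code": "46635009",
             "display": "Type 1 diabetes mellitus"},
        ]
    },
}


def codes_from_expansion(value_set):
    """Collect the codes listed under expansion.contains."""
    contains = value_set.get("expansion", {}).get("contains", [])
    return {entry["code"] for entry in contains}


def group_diagnosis(code, diabetes_codes):
    """Collapse any diabetes subtype into one label; leave other codes as-is."""
    return "diabetes" if code in diabetes_codes else code


diabetes_codes = codes_from_expansion(SAMPLE_EXPANSION)

# Illustrative diagnosis column from the warehouse extract.
diagnoses = ["44054006", "38341003", "46635009", "195967001"]
grouped = [group_diagnosis(code, diabetes_codes) for code in diagnoses]
print(grouped)
# -> ['diabetes', '38341003', 'diabetes', '195967001']
```

The clinical knowledge enters once, through the set of descendant codes. After that the classifier sees “diabetes” as just another category label.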

The terminology defines the clinical meaning, the engine makes that meaning queryable and usable in data pipelines, and the classifier discovers statistical patterns across the coded data. They operate at different layers of the stack, and confusing them leads to architectural mistakes, such as using a terminology hierarchy as a predictive model, or expecting a classifier to understand that two codes are clinically related when they’ve never co-occurred in the training data.
