Inheritance in Graph Databases: How to Model a Type and Its Subtypes

Relational databases handle type hierarchies badly. A common workaround is a “type” column in a table, which flattens all subtypes into a single wide row and forces application code to decide which columns apply. Object-oriented code has classes and inheritance, but languages such as Java do not allow a class to inherit from multiple parents. Graph databases offer something more flexible. The relationship itself carries the semantic weight, and the type hierarchy can be queried at runtime just like any other part of the data.

This matters in any domain where types evolve. Telecoms network inventory is a clean example. A fibreoptic cable is a fibreoptic cable. But BT being BT, Blown Fibre, BT Install Fibre, and BT Dark Fibre are all subtypes with their own specific properties, their own provisioning rules, and their own rate of change. The challenge is to model them in a way that lets generic queries find all fibre cables, while letting specialist queries distinguish between subtypes. Finally, the model should not require a schema change every time BT’s product catalogue introduces a new variant, such as the next iteration of hyperoptic fibre.

There are three patterns for handling this in a property graph. Each solves a different version of the problem.


Pattern 1: Multiple Labels

Neo4j allows a single node to carry multiple labels. A BT Blown Fibre cable node can simultaneously be :Cable, :Fibreoptic, and :BTBlownFibre. All three labels are independently indexed.

CREATE (c:Cable:Fibreoptic:BTBlownFibre {
  id: 'CBL-0044',
  coreCount: 96,
  wavelength: '1550nm',
  tubeId: 'TUBE-A3',
  blowingPressure: '3bar',
  ductSection: 'D-441'
})

A capacity planning query that needs all fibre uses MATCH (c:Fibreoptic) — it finds every fibreoptic node regardless of subtype. A provisioning query that specifically handles blown-fibre uses MATCH (c:BTBlownFibre). Both traverse a label index, so both are fast.

One physical node carries all three labels. The label set is the type hierarchy, flattened onto the node. Generic queries match on :Fibreoptic; subtype queries match on :BTBlownFibre.
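
As a rough sketch of the two access paths, the queries below run against the node created above. The aggregation and the returned columns are illustrative assumptions, not taken from any particular BT system.

```cypher
// Capacity planning: every fibre cable, whatever its subtype
MATCH (c:Fibreoptic)
RETURN count(c) AS cables, sum(c.coreCount) AS totalCores;

// Provisioning: blown-fibre cables only
MATCH (c:BTBlownFibre)
RETURN c.id, c.tubeId, c.ductSection;
```

Neither query mentions the other label, which is the point: each consumer matches at the level of the hierarchy it cares about.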

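When to use it: the subtype set is finite and stable. If BT’s cable variants are unlikely to multiply, multiple labels are the simplest and fastest option. Finding the nodes that carry a label does not require scanning the whole graph: the cost grows with the number of nodes that have the label, not with the total number of nodes.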

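The problem: every new subtype is a new label. Neo4j does not require labels to be declared in advance, but in a large deployment with a running application a new label behaves like a schema change rather than a data change, because the queries, indexes, and constraints that name labels have to be updated to know about it. Multiple labels also give you no queryable structure over the type hierarchy itself: nothing in the graph records that :BTBlownFibre is a subtype of :Fibreoptic, so you cannot ask “what are all the subtypes of Fibreoptic?” without inspecting node labels and interpreting them in application code.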
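
To make that limitation concrete, here is a minimal sketch of the closest a label-only model gets to listing subtypes. It collects the labels that co-occur with :Fibreoptic, and the exclusion list is an assumption the caller has to maintain by hand.

```cypher
// List labels that appear alongside :Fibreoptic, with a count of cables for each
MATCH (c:Fibreoptic)
UNWIND labels(c) AS label
WITH label, count(*) AS cables
// The graph cannot tell us which labels are supertypes, so they are hard-coded here
WHERE NOT label IN ['Cable', 'Fibreoptic']
RETURN label, cables
ORDER BY cables DESC;
```

The result is a list of co-occurring labels, not a hierarchy. Whether a given label is a subtype, a supertype, or an unrelated tag is knowledge that lives outside the graph.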


Pattern 2: IS_A Type Hierarchy

This is the pattern used by SNOMED CT for clinical terminology and applies directly to network inventory. Instead of encoding types as labels, you build a separate subgraph of type nodes connected by IS_A edges. Instance nodes point to their most specific type via an INSTANCE_OF relationship.

// Build the type graph (run as one statement so the variables stay bound;
// repeating a literal node pattern in CREATE would create duplicate type nodes)
CREATE (cable:CableType {name: 'Cable'})
CREATE (fibre:CableType {name: 'Fibreoptic'})-[:IS_A]->(cable)
CREATE (blown:CableType {name: 'BTBlownFibre'})-[:IS_A]->(fibre)
CREATE (:CableType {name: 'BTInstallFibre'})-[:IS_A]->(fibre)
// Create an instance pointing to its most specific type
CREATE (c:Cable {id: 'CBL-0044', coreCount: 96, tubeId: 'TUBE-A3'})
  -[:INSTANCE_OF]->(blown)

Finding all fibreoptic cables, regardless of how deeply nested the subtype, is a variable-length traversal up the IS_A hierarchy. The same pattern can be expressed in Gremlin on TinkerPop-compatible databases, although Gremlin uses repeat() steps rather than Cypher’s variable-length syntax.

MATCH (root:CableType {name: 'Fibreoptic'})
MATCH (c:Cable)-[:INSTANCE_OF]->(t:CableType)
WHERE (t)-[:IS_A*0..]->(root)
RETURN c

The type graph is a first-class subgraph. New cable types are added as data — a new CableType node and an IS_A edge. No schema change required. Instances always point to their most specific type; the traversal finds them at any depth.

When to use it: the type hierarchy needs to grow without schema changes, or you need to query the type structure itself: “what subtypes does Fibreoptic have?”, “which types share a common ancestor?”. In a telco context where the product catalogue evolves continuously, this is usually the right choice. The cost is a variable-length IS_A traversal per query, which is trivial for shallow hierarchies but worth measuring for deep ones. An index on the CableType name keeps the lookup of the starting type cheap.
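
A question about the type structure itself can be sketched as follows, assuming the CableType nodes and IS_A edges described above. The index name cable_type_name is hypothetical, and the CREATE INDEX form shown assumes Neo4j 4.x or later.

```cypher
// Optional: index the property used to find the starting type node
CREATE INDEX cable_type_name IF NOT EXISTS
FOR (t:CableType) ON (t.name);

// "What subtypes does Fibreoptic have?", at any depth
MATCH path = (sub:CableType)-[:IS_A*1..]->(:CableType {name: 'Fibreoptic'})
RETURN sub.name AS subtype, length(path) AS depth
ORDER BY depth, subtype;
```

The schema command and the read query are separate statements. The lower bound of 1 on the traversal excludes Fibreoptic itself from its own list of subtypes.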

The operational benefit is significant: when BT introduces a new fibre variant, a network engineer adds a CableType node and an IS_A edge. They do not touch the application, the schema, or any existing instance data.
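
A minimal sketch of that data-only change, assuming a Fibreoptic type node already exists and using BTDarkFibre as the new variant:

```cypher
// Add a new variant under Fibreoptic without touching existing data
MATCH (fibre:CableType {name: 'Fibreoptic'})
MERGE (dark:CableType {name: 'BTDarkFibre'})
MERGE (dark)-[:IS_A]->(fibre);
```

MERGE is used rather than CREATE so that running the statement twice does not produce a duplicate type node or a duplicate edge.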


Pattern 3: Specialisation Node

The third pattern separates the base type from its subtype-specific properties entirely, using a linked specification node:

// Base node: lean, queryable by all consumers
CREATE (c:Cable:Fibreoptic {
  id: 'CBL-0044',
  coreCount: 96,
  wavelength: '1550nm'
})
// Spec node: carries blown-fibre specifics
CREATE (spec:BTBlownFibreSpec {
  tubeId: 'TUBE-A3',
  blowingPressure: '3bar',
  ductSection: 'D-441',
  installDate: '2022-03-14',
  contractor: 'BT Openreach'
})
// Link them
CREATE (c)-[:HAS_SPEC]->(spec)

A capacity planning system queries (c:Fibreoptic) and gets every fibre cable, working entirely against the lean base node. A provisioning system that needs blown-fibre engineering details traverses (c)-[:HAS_SPEC]->(spec) to reach the full property set.

The base node is clean and fast for generic queries. Subtype-specific properties live in a separate node, only accessed by consumers that need them. Different subtypes have different spec nodes with entirely different property sets.

When to use it: when subtype property sets are large, volatile, or owned by different teams. In the BT context, the core network inventory team owns the base :Fibreoptic node. The field engineering system owns the BTBlownFibreSpec details. Keeping them physically separate in the graph reflects that ownership boundary — changes to blown-fibre spec properties (perhaps adding a new regulatory field) do not touch the base inventory node at all.

The trade-off is an additional hop in queries that need the spec. This is usually acceptable — it is one relationship traversal — but it should be anticipated in query design and indexed appropriately at the entry point.
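
As a sketch of what that looks like in practice, the statements below index the cable id as the entry point and then take the extra hop. The index name cable_id is hypothetical, and the CREATE INDEX form assumes Neo4j 4.x or later.

```cypher
// Entry point: find the base node by id
CREATE INDEX cable_id IF NOT EXISTS
FOR (c:Cable) ON (c.id);

// Provisioning lookup: one hop from the base node to its spec
MATCH (c:Cable {id: 'CBL-0044'})-[:HAS_SPEC]->(spec:BTBlownFibreSpec)
RETURN c.id, c.coreCount, spec.tubeId, spec.blowingPressure, spec.ductSection;
```

Consumers that only need the base properties stop at the first node and never pay for the hop.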


Combining the Patterns

In practice the three patterns are not mutually exclusive. The BT SRIMS network inventory used a combination that reflects the stability of each level:

The stable semantic types — :Cable, :Fibreoptic, :Copper — are labels. Every cable node carries the appropriate base labels. These are fixed categories that change rarely and drive the bulk of operational queries. Label indexes make them fast.

The product catalogue variants — BTBlownFibre, BTInstallFibre, BTDarkFibre — live in an IS_A type hierarchy. New variants are data changes, not schema changes. The provisioning system traverses the type graph to determine what rules apply to a given cable; the network inventory system does not need to know.

Engineering-specific or regulatory property sets — blowing pressures, duct schedules, contractor records — live in specialisation nodes. The data is accessible when needed but does not pollute the core graph that handles the majority of queries.

The decision logic is straightforward: if the type is stable and drives high-volume queries, use a label. If the type evolves independently of the application, use an IS_A type hierarchy. If the subtype property set is large, volatile, or consumer-specific, use a specialisation node.
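
Put together, a single cable might be written and read as in the sketch below. This is an illustration under stated assumptions, not the SRIMS model itself: the id CBL-0045 and the property values are made up, and the BTBlownFibre type node is assumed to exist already.

```cypher
// Write: stable labels on the node, variant via the type graph, specifics in a spec node
MATCH (variant:CableType {name: 'BTBlownFibre'})
CREATE (c:Cable:Fibreoptic {id: 'CBL-0045', coreCount: 48})
CREATE (c)-[:INSTANCE_OF]->(variant)
CREATE (c)-[:HAS_SPEC]->(:BTBlownFibreSpec {tubeId: 'TUBE-B1', blowingPressure: '3bar'});

// Read: each level is optional for consumers that do not need it
MATCH (c:Cable:Fibreoptic {id: 'CBL-0045'})
OPTIONAL MATCH (c)-[:INSTANCE_OF]->(t:CableType)
OPTIONAL MATCH (c)-[:HAS_SPEC]->(spec)
RETURN c.id AS cable, t.name AS variant, properties(spec) AS specProperties;
```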


The Deeper Principle

What makes inheritance tractable in a graph database is that the type system is part of the data, not external to it. You can query the type hierarchy with the same language you use to query the instance data. You can traverse from a type node to all its instances, or from an instance up to its root type, or across the hierarchy to find sibling types — all in a single Cypher or Gremlin expression.
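
Two of those traversals, sketched against the type graph from Pattern 2 (the node names and the cable id are the ones used earlier in this article):

```cypher
// From an instance up to every ancestor type, including its own
MATCH (c:Cable {id: 'CBL-0044'})-[:INSTANCE_OF]->(:CableType)-[:IS_A*0..]->(ancestor:CableType)
RETURN ancestor.name AS type;

// Across the hierarchy: sibling types that share a direct parent
MATCH (t:CableType {name: 'BTBlownFibre'})-[:IS_A]->(parent:CableType)<-[:IS_A]-(sibling:CableType)
RETURN parent.name AS parent, sibling.name AS sibling;
```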

In a relational system, the type hierarchy lives in application code or in a separate reference table that requires explicit joins. When it changes, both the schema and the code change. In a graph, adding a new cable type to BT’s product catalogue is a graph write, not a deployment.

That is the operational advantage: the model evolves at the speed of the business, not the speed of the release cycle.
