Ontologies and schemas
Raw data collections (“tier 0”) in the production system do not adhere to a fixed schema or ontology. Instead, their schema stays very close to the raw data. Modifications to field names tend to be quite basic, such as lowercasing and replacing whitespace with a single underscore.
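As a rough illustration of the kind of basic renaming described above, the sketch below lowercases a raw field name and collapses each run of whitespace into a single underscore. The function name `normalise_field_name` is hypothetical and not part of the production system, and the actual tier 0 rules may differ between data sources.

```python
import re


def normalise_field_name(raw_name: str) -> str:
    """Lowercase a raw field name and replace each whitespace run with one underscore."""
    # Assumption: leading/trailing whitespace is dropped rather than converted.
    return re.sub(r"\s+", "_", raw_name.strip().lower())


print(normalise_field_name("Project  Start Date"))  # project_start_date
```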
Processed data (“tier 1”) is intended for public consumption, using a common ontology. The convention we use is as follows:
- Field names are composed of up to three terms: a firstName, a middleName and a lastName.
- Each term (e.g. firstName) is written in lowerCamelCase.
- firstName terms correspond to a restricted set of basic quantities.
- middleName terms correspond to a restricted set of modifiers (e.g. adjectives) which add nuance to the firstName term. Note that the special middleName term “of” is reserved as the default value in case no other modifier applies.
- lastName terms correspond to a restricted set of entity types.
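The convention above could be checked programmatically along the following lines. This is only a sketch under several assumptions: the three vocabularies are invented for illustration (the real restricted sets are not listed here), the underscore separator is assumed rather than stated in this section, and only the full three-term case is handled. The name `compose_field_name` is hypothetical.

```python
import re

# Hypothetical vocabularies, for illustration only.
FIRST_NAMES = {"name", "date", "count"}
MIDDLE_NAMES = {"of", "start", "end"}
LAST_NAMES = {"project", "organisation", "fundingBody"}

# A lowercase first word followed by zero or more capitalised words.
LOWER_CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def compose_field_name(first_name, last_name, middle_name="of", separator="_"):
    """Validate each term, then join them into a single field name.

    `middle_name` defaults to the reserved term "of"; the separator is an
    assumption of this sketch.
    """
    checks = (
        (first_name, FIRST_NAMES),
        (middle_name, MIDDLE_NAMES),
        (last_name, LAST_NAMES),
    )
    for term, vocabulary in checks:
        if not LOWER_CAMEL_CASE.match(term):
            raise ValueError(f"'{term}' is not lowerCamelCase")
        if term not in vocabulary:
            raise ValueError(f"'{term}' is not in the restricted vocabulary")
    return separator.join((first_name, middle_name, last_name))


print(compose_field_name("name", "organisation"))
print(compose_field_name("date", "project", middle_name="start"))
```

Keeping the vocabularies as explicit sets means an unrecognised term fails loudly instead of silently producing a new field name.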
Valid examples are
Tier 0 fields are implicitly excluded from tier 1 if they are missing from the schema_transformation file. Tier 1 schema field names are applied via nesta.packages.decorator.schema_transform.
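The implicit exclusion can be pictured with a minimal sketch. It does not reproduce nesta.packages.decorator.schema_transform or the real schema_transformation file format, neither of which is shown in this section. The mapping, the field names and the helper `apply_schema` are all hypothetical.

```python
# Hypothetical tier 0 -> tier 1 mapping, standing in for the contents of a
# schema_transformation file.
SCHEMA_TRANSFORMATION = {
    "project_title": "name_of_project",
    "start_date": "date_start_project",
}


def apply_schema(row: dict, mapping: dict) -> dict:
    """Rename tier 0 fields to tier 1 names, dropping any field not in the mapping."""
    return {mapping[field]: value for field, value in row.items() if field in mapping}


tier_0_row = {
    "project_title": "Mapping innovation",
    "start_date": "2018-01-01",
    "internal_ref": 42,  # not in the mapping, so it is excluded from tier 1
}
tier_1_row = apply_schema(tier_0_row, SCHEMA_TRANSFORMATION)
print(tier_1_row)
```

Because exclusion is implicit, a tier 0 field only reaches tier 1 once someone adds it to the mapping.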
Although not yet implemented, the tier 2 schema is reserved for future graph ontologies. Don’t expect any changes any time soon!