NOTE this is largely replaced by the The OAK Guide
Ontology Concepts
Here we describe some of the over-arching concepts in this library. Note that distinct Datamodels may impose their own specific views of the world, but the concepts here are intended as a kind of lingua-franca.
Ontology
This library has a very broad concept of what an Ontology is, in keeping with the broad range of use cases addressed.
For many users the concept of an ontology is quite straightforward - it’s something like what is described in the GO paper from 2000, a collection of thousands of inter-related terms. However, the concept of “ontology” turns out to be very flexible and malleable, and might include things like:
things that are more “schema-like”, such as schema.org or PROV
formal logical artifacts like BFO
an “instance graph” for example of countries and their connections
a knowledge base encoded in RDF
the entirety of wikidata
in OWL, an ontology is just a collection of axioms
We try to be as pluralistic as possible and provide a way to access all of the above using the appropriate abstractions. However, the main community served is “classic” ontologies such as those found in the OBO library or those encoded in OWL.
Ontology Element
Ontologies can be conceived of as collection of different kinds of elements, which can loosely be thought of as something with a persistent identifier, and optionally having various kinds of metadata associated with it.
The different kinds of elements are:
Classes or Concepts – the predominant kind, for most ontologies
Properties or Relationship Types
Individuals or Instances
Subsets
Ontologies (an ontology is itself an ontology element)
Various other elements used for particular purposes
These are not necessarily disjoint categories!
Note
We sometimes use the term “term” but this can be ambiguous. Sometimes it is equated with the OWL concept of a Class, sometimes it it used more broadly to encompass other elements that may have names/labels (for example, Scotland, which is an instance, not a class). And sometimes it is used to refer to the string by which the element is denoted!
When working with a specific datamodel these may be partitioned more strictly. For example, in OWL, there are three disjoint kinds of properties:
ObjectProperties
AnnotationProperties
DatatypeProperties
The BasicOntologyInterface does not discriminate between different kinds of elements. This can be confusing, if you ask for all elements thinking you might get back only the “terms” but you would also get elements for relationship types, subsets, etc.
Imports and Ontologies within Ontologies
Many users are accustomed to ontologies being simple stand-alone monolithic entities, and a lot of tooling makes that assumption.
In fact, many ontologies are organized as modular components that are imported by other ontologies, much the same way that software has evolved from monolithic programs to modular systems. Sometimes for ontologies releases, the imports are merged such that what appears to be one ontology has pieces of other ontologies incorporated in.
This library is designed to handle all of these scenarios. In the BasicOntologyInterface, you don’t have to worry about imports, you just get a view where everything appears as if it were in a single ontologies (even this ontologies actually a combination of ontologies). Other interfaces let you explore the compositional structure in more detail.
URIs and CURIEs and identifiers
Some communities prefer to use prefixed identifiers like GO:0008152, others prefer to use URIs as identifiers. This is driven largely by the tools and infrastructure used, with “semantic web” stacks using URIs and data science/bioinformatics tools using identifiers.
We bridge these worlds by using CURIEs, essentially prefixed identifiers where there is a well-defined prefix expansion.
Most methods in the interfaces in this library accept CURIEs, but these can always be expanded and contracted.
Prefix Maps
A prefix map maps between prefixes and their URI base expansions.
Relationships / Edges
Note
It may seem surprising but the OWL standard has no construct that directly corresponds to what we call a relationship here.
Mappings
Ontology Format
Statements and Axioms
Note
You only need to understand this if you are working with the OwlInterface or the RdfInterface.
Subsets
Labels/Names
It is common for biological ontologies to use an opaque identifier for each element, and include exactly one “name” or “label” which serves as a unique string for humans to identify the element. In OWL representations, the name is typically represented using rdfs:label.
This is by no means universal:
some ontologies use non-opaque identifiers, and omit a separate label field
some ontologies may use a different property, such as skos:prefLabel
some ontologies may have some elements that are “dangling” and do not have label populated
sometimes the same label may be shared by different identifiers, even within an ontology
some ontologies may have multiple labels for an element
this may be intentional, as in the case of different languages (wikidata)
or it may be unintentional, for example, resulting from ontology merges of different versions of the same ontology
The OWL datamodel allows for complete flexibility here, giving ontology providers the freedom to model things however they like. The OBO Format datamodel (and the corresponding obojson) is a little more restrictive here, for example, disallowing multiple labels.
The OBO community have defined a suite of QC checks implemented in the OBO dashboard to try and get ontologies align to a datamodel where elements have exactly one label.
This library aims to be pluralistic and allow for all scenarios. However, it makes the common case the most “convenient”. And it may also be the case that some interfaces impose a certain restriction - for example, the obograph interface uses the obograph datamodel which has a maximum cardinality of 1 for labels.
Some implementations may also impose their own restrictions - e.g. pronto, OLS, and bioportal all roughly adhere to the OBO model of making label be single-valued.