Associations and Curated Annotations
Background
The main purpose of OAK is to provide uniform access to ontologies. Ontologies are frequently used in combination with some form of tagging of data entities. In bio-ontologies and other domains, this kind of tagging is usually called Annotation (but this term can be ambiguous).
There are many different formats and data models for associations. The Gene Ontology uses the GAF format, which associates genes or gene products with terms in the ontology, alongside additional contextual information and provenance. There is a similar format for Human Phenotype Ontology associations, which associate disease identifiers with phenotypic feature terms, alongside information about severity and age of onset, as well as provenance. Outside the bio-ontology world, the Open Annotation standard provides a way of associating a wide range of entities of different types.
The differences in use cases make supporting a single data model challenging. However, there are a number of core elements that are typically shared.
The association:
is typically about something, i.e. the Subject of the association
relates the subject to another thing (the Object), typically a class from an ontology
may have an (explicit or implicit) Predicate indicating the nature of the relationship between subject and object
should have provenance, typically indicated via CURIEs to publications like DOIs or PMIDs
may have some kind of semantic modifier, including a negation flag
may have any number of pieces of additional evidence, provenance, or administrative metadata
may include additional denormalized fields for convenience.
The first three of these constitute the OAK Edge data model. You may well ask, why treat associations differently from other kinds of edges in the ontology?
There are a variety of answers to this question. Some are pragmatically oriented:
associations have historically been separated from ontology relationships in many domains
the operations we may want to do on one may differ from those on the other
associations typically emphasize the importance of provenance and additional metadata whereas ontology relationships are taken “as given”
associations are typically curated by different groups than those that curate ontologies
Other answers are more formally oriented:
ontology relationships have strict OWL logical semantics (usually some combination of SubClassOf and SomeValuesFrom), whereas associations don’t have defined semantics (or are weak Some-Some axioms)
ontology relationships represent term invariant relationships, whereas associations are contingent
For a more detailed treatment of these formal aspects, see On beyond Gruber: “Ontologies” in today’s biomedical information systems and the limits of OWL.
Association support in OAK
Warning
The current way associations are loaded and modeled in OAK is subject to change
Data Model
See the Association data model for details of the data model.
The data model is intentionally minimalist, and aims to capture the core features of multiple association data models. A generic PropertyValue object captures domain-specific extensions.
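To make the shape of such a record concrete, here is a minimal sketch in Python. It is not OAK's actual Association class: the class and field names, the `as_edge` helper, and every `EX:` identifier are assumptions made up for illustration. It only mirrors the core elements listed in the Background section, with a generic property/value list for anything domain-specific.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PropertyValue:
    """Generic key/value pair for domain-specific extensions (hypothetical)."""
    predicate: str
    value: str


@dataclass
class Association:
    """Hypothetical minimal association record; not OAK's real class."""
    subject: str                      # the thing the association is about
    object: str                       # typically an ontology class
    predicate: Optional[str] = None   # may be implicit, hence optional
    negated: bool = False             # simple semantic modifier
    publications: List[str] = field(default_factory=list)        # provenance CURIEs
    property_values: List[PropertyValue] = field(default_factory=list)

    def as_edge(self) -> Tuple[str, Optional[str], str]:
        """Return the core (subject, predicate, object) triple."""
        return (self.subject, self.predicate, self.object)


# All identifiers below are invented placeholders.
assoc = Association(
    subject="EX:disease1",
    predicate="EX:has_phenotype",
    object="EX:phenotype1",
    publications=["EX:publication1"],
    property_values=[PropertyValue("EX:onset", "EX:childhood")],
)
print(assoc.as_edge())
```

Keeping extensions in a list of property/value pairs means format-specific fields (for example onset or frequency in a phenotype annotation file) do not need their own slots in the core record.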
Selecting association sources
There are a number of ways to select an association source.
On the command line you can supplement the main ontology input (passed with --input or -i) with an --associations option (shorthand -g). You will also need to specify the association format (--associations-type or -G).
The following will query HPO associations for any diseases associated with “Abnormal lacrimal gland morphology” (HP:0011482) or any of its is-a descendants:
wget http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa
runoak -i sqlite:obo:hp -G hpoa -g phenotype.hpoa associations -p i HP:0011482
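Conceptually, a query like this expands the term to itself plus its is-a descendants and then keeps the associations whose object falls in that set. The following Python sketch illustrates the idea on a toy hierarchy. It is not how OAK implements the command, and all identifiers, data, and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Toy is-a hierarchy (child -> parents); every identifier is made up.
IS_A: Dict[str, List[str]] = {
    "EX:B": ["EX:A"],
    "EX:C": ["EX:B"],
    "EX:D": ["EX:A"],
    "EX:E": ["EX:Z"],
}

# Toy associations as (subject, object) pairs.
ASSOCIATIONS: List[Tuple[str, str]] = [
    ("EX:disease1", "EX:A"),
    ("EX:disease2", "EX:C"),
    ("EX:disease3", "EX:E"),
]


def descendants_or_self(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Return the term plus everything reachable by following is-a downwards."""
    # Invert child -> parents into parent -> children.
    children: Dict[str, Set[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def associations_for(
    term: str,
    is_a: Dict[str, List[str]],
    associations: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep associations whose object is the term or one of its is-a descendants."""
    closure = descendants_or_self(term, is_a)
    return [(s, o) for s, o in associations if o in closure]


matches = associations_for("EX:A", IS_A, ASSOCIATIONS)
print(matches)
```

Here the association on EX:C is returned for a query on EX:A because EX:C is an indirect is-a descendant, while the one on EX:E is excluded because EX:E sits under an unrelated parent.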