Identifying entities: CURIEs and URIs
Prefix maps
Every Entity in OAK has a unique Identifier. OAK is consistent with semantic web formalisms where everything is identified by an IRI, but in OAK these are typically compressed into a CURIE, using a Prefix Map.
For example, to reference the concept of “heart” in the Uberon ontology, we use the CURIE UBERON:0000948. This is a compressed form of the URI http://purl.obolibrary.org/obo/UBERON_0000948, using the OBO Foundry prefix map.
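The relationship between a CURIE, a prefix map, and a URI can be illustrated with a few lines of plain Python. This is a minimal sketch, not OAK's implementation: the one-entry `PREFIX_MAP` and the helper names `expand` and `contract` are hypothetical and exist only for illustration.

```python
# Illustrative prefix map with a single OBO entry (not OAK's full default map).
PREFIX_MAP = {"UBERON": "http://purl.obolibrary.org/obo/UBERON_"}


def expand(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE such as 'UBERON:0000948' into a full URI."""
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


def contract(uri: str, prefix_map: dict[str, str]) -> str:
    """Contract a URI into a CURIE, preferring the longest matching namespace."""
    matches = [(ns, p) for p, ns in prefix_map.items() if uri.startswith(ns)]
    if not matches:
        raise ValueError(f"No prefix found for {uri}")
    namespace, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(namespace):]}"


uri = expand("UBERON:0000948", PREFIX_MAP)
print(uri)  # http://purl.obolibrary.org/obo/UBERON_0000948
print(contract(uri, PREFIX_MAP))  # UBERON:0000948
```

Expansion is a lookup plus string concatenation, while contraction has to search for a namespace that the URI starts with; the sketch picks the longest match so that nested namespaces do not collide.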
On the command line, most OAK commands take CURIEs or lists of CURIEs as inputs (although primary labels and queries can also be supplied). For example:
$ runoak -i ubergraph info UBERON:0000948
UBERON:0000948 ! heart
Under the hood, the backend (in this case, the Ubergraph adapter) stores the concept as a full URI.
Similarly, in Python:
>>> from oaklib import get_adapter
>>> ont = get_adapter('ubergraph:')
>>> id = "UBERON:0000948"
>>> print(ont.label(id))
heart
>>> print(ont.curie_to_uri(id))
http://purl.obolibrary.org/obo/UBERON_0000948
OAK uses the prefixmaps package to manage CURIEs and URIs. By default it uses a set of standard prefix maps: the OBO one, plus a linked data prefix map that provides standard prefixes for non-OBO resources such as schema.org.
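One way to picture how several standard prefix maps combine into a single lookup table is sketched below. The helper `merge_prefix_maps` is hypothetical, and the "first map wins on conflict" rule is an assumption made for illustration; it is not a statement about how the prefixmaps package resolves clashes. The two small maps contain only a handful of entries, not the full defaults.

```python
# Tiny stand-ins for the OBO and linked data prefix maps (illustrative subsets).
OBO = {
    "PO": "http://purl.obolibrary.org/obo/PO_",
    "PORO": "http://purl.obolibrary.org/obo/PORO_",
}
LINKED_DATA = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def merge_prefix_maps(*prefix_maps: dict[str, str]) -> dict[str, str]:
    """Combine prefix maps; earlier maps take priority (an assumed policy)."""
    merged: dict[str, str] = {}
    for prefix_map in prefix_maps:
        for prefix, namespace in prefix_map.items():
            # setdefault keeps the first namespace seen for each prefix.
            merged.setdefault(prefix, namespace)
    return merged


default_map = merge_prefix_maps(OBO, LINKED_DATA)
for prefix, namespace in sorted(default_map.items()):
    print(prefix, namespace)
```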
Querying prefixmaps
You can get a list of all prefixes known to OAK using the prefixes command:
$ runoak prefixes
...
PO http://purl.obolibrary.org/obo/PO_
PORO http://purl.obolibrary.org/obo/PORO_
...
owl http://www.w3.org/2002/07/owl#
skos http://www.w3.org/2004/02/skos/core#
...
You can also query the prefixmap for a particular prefix or set of prefixes:
$ runoak prefixes UBERON CL oio skos schema
This will return a table:
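prefix | uri |
---|---|
UBERON | http://purl.obolibrary.org/obo/UBERON_ |
CL | http://purl.obolibrary.org/obo/CL_ |
oio | http://www.geneontology.org/formats/oboInOwl# |
skos | http://www.w3.org/2004/02/skos/core# |
schema | |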
See the prefixes command for more details.
Non-default prefixmaps
For most purposes, the default prefixmap should suffice.
You can also choose to override the default prefixmap with your own, using the --prefix or --named-prefix-map options.
In Python, this can be done by accessing the prefixmap directly:
>>> from oaklib import get_adapter
>>> soil_oi = get_adapter("tests/input/soil-profile.skos.nt")
>>> soil_oi.prefix_map()["soilprofile"] = "http://anzsoil.org/def/au/asls/soil-profile/"
>>>
>>> # trivial example: show all CURIEs and labels
>>> for entity, label in soil_oi.labels(soil_oi.entities()):
...     print(f"{entity} ! {label}")
Structure of identifiers
OAK doesn’t impose any expectations on the structure of identifiers.
For OBO ontologies, all identifiers should conform to the OBO identifier pattern: a prefix (typically all uppercase) followed by an all-numeric local identifier (typically zero-padded to 7 digits). However, OAK does not require this.
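The OBO identifier pattern described above can be approximated with a regular expression. This is a sketch under stated assumptions: the exact character set allowed in a prefix is a guess made for illustration, and `OBO_CURIE` and `is_obo_style` are hypothetical names, not part of OAK.

```python
import re

# Assumed pattern: a prefix, a colon, then an all-numeric local identifier.
# The prefix character set here is an illustrative guess, not an official rule.
OBO_CURIE = re.compile(r"(?P<prefix>[A-Za-z][A-Za-z0-9_]*):(?P<local_id>\d+)")


def is_obo_style(curie: str) -> bool:
    """Return True if the CURIE has a purely numeric local identifier."""
    return OBO_CURIE.fullmatch(curie) is not None


for curie in ["UBERON:0000948", "schema:Person"]:
    print(curie, is_obo_style(curie))
```

A check like this is useful only as a lint for OBO-style content; since OAK itself places no constraint on identifier structure, a CURIE such as schema:Person is still a valid identifier.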
Many semantic web ontologies such as schema.org use “semantic” URIs that a human can understand. These can be used in the same way:
$ wget https://schema.org/version/latest/schemaorg-all-http.ttl -O tests/output/schema.rdf
$ runoak --prefix schema=http://schema.org/ -i tests/output/schema.rdf relationships schema:Person
subject subject_label predicate predicate_label object object_label
schema:Person Person rdfs:subClassOf None schema:Thing Thing