OAK mappings command
This notebook is intended as a supplement to the main OAK CLI docs.
This notebook provides examples for the mappings command, which can be used to look up mappings that are bundled with ontologies.
Overall background on the concepts here can be found in the OAK Guide to Mappings.
Help Option
You can get help on any OAK command using --help:
[1]:
!runoak mappings --help
Usage: runoak mappings [OPTIONS] [TERMS]...
List all mappings encoded in the ontology
Example:
runoak -i sqlite:obo:envo mappings
The default output is SSSOM YAML. To use the (canonical) csv format:
runoak -i sqlite:obo:envo mappings -O sssom
By default, labels are not included. Use --autolabel to include labels (but
note that if the label is not in the source ontology, then no label will be
retrieved)
runoak -i sqlite:obo:envo mappings -O sssom
To constrain the mapped object source:
runoak -i sqlite:obo:foodon mappings -O sssom --maps-to-source
SUBSET_SIREN
Python API:
https://incatools.github.io/ontology-access-kit/interfaces/mapping-
provider
Data model:
https://w3id.org/oak/mapping-provider
Options:
-o, --output FILENAME Output file, e.g. obo file
-O, --output-type TEXT Desired output type
--autolabel / --no-autolabel If set, results will automatically have labels
assigned [default: autolabel]
-M, --maps-to-source TEXT Return only mappings with subject or object
source equal to this
--mapper TEXT A selector for an adapter that is to be used
for the main lookup operation
--help Show this message and exit.
Set up an alias
For convenience we will set up an alias for use in this notebook. This will allow us to use uberon ... rather than runoak -i sqlite:obo:uberon ... for the rest of the notebook.
We use Uberon as an example, as Uberon bundles a lot of diverse mappings. See Uberon docs.
[2]:
alias uberon runoak -i sqlite:obo:uberon
Direct mappings for a subject term
First we will look up the mappings for the Uberon term for the CA4 region of the hippocampus:
[4]:
uberon mappings UBERON:0003884
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: DHBA:10300
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: DHBA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EFO:0002457
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EFO
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EMAPA:32771
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EMAPA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: FMA:75741
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: FMA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: HBA:12895
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: HBA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: MA:0000953
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: MA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: NCIT:C32249
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: NCIT
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: PBA:10074
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: PBA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: UMLS:C2328406
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: UMLS
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: Wikipedia:Region_IV_of_hippocampus_proper
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: Wikipedia
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: neuronames:181
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: neuronames
The above YAML follows the SSSOM data model (https://w3id.org/sssom).
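Each record in the output above is a flat set of key: value fields, with records separated by --- lines. As a rough illustration of that structure, here is a minimal sketch that turns such a stream into Python dicts using only the standard library. The parse_records helper is hypothetical and only handles one-line scalar fields; for real SSSOM YAML a proper YAML library is the better choice.

```python
# Two records copied from the output above.
SAMPLE = """\
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: DHBA:10300
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: DHBA
---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EFO:0002457
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EFO
"""


def parse_records(text):
    """Split a stream of flat 'key: value' documents into a list of dicts.

    Hypothetical helper: it assumes every field fits on one line.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # A '---' line closes the current record.
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split on the first ': ' only, because CURIEs contain colons.
            key, _, value = line.partition(": ")
            current[key] = value
    if current:
        records.append(current)
    return records


mappings = parse_records(SAMPLE)
for m in mappings:
    print(m["subject_id"], m["predicate_id"], m["object_id"])
```

Splitting on the first ": " keeps values such as UBERON:0003884 intact.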
We can also write the results to a file in SSSOM TSV format (this time querying for “brain”) and then view the file via pandas:
[11]:
uberon mappings UBERON:0000955 -o output/brain-mappings.tsv -O sssom
/Users/cjm/Library/Caches/pypoetry/virtualenvs/oaklib-OeQZizwE-py3.9/lib/python3.9/site-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
df.replace("", np.nan, inplace=True)
[13]:
import pandas as pd
df = pd.read_csv("output/brain-mappings.tsv", sep="\t", comment="#")
df
[13]:
| | subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | subject_source | object_source |
---|---|---|---|---|---|---|---|---|
0 | UBERON:0000955 | brain | oio:hasDbXref | AAO:0010478 | NaN | semapv:UnspecifiedMatching | UBERON | AAO |
1 | UBERON:0000955 | brain | oio:hasDbXref | ABA:Brain | NaN | semapv:UnspecifiedMatching | UBERON | ABA |
2 | UBERON:0000955 | brain | oio:hasDbXref | BAMS:Br | NaN | semapv:UnspecifiedMatching | UBERON | BAMS |
3 | UBERON:0000955 | brain | oio:hasDbXref | BAMS:Brain | NaN | semapv:UnspecifiedMatching | UBERON | BAMS |
4 | UBERON:0000955 | brain | oio:hasDbXref | BILA:0000135 | NaN | semapv:UnspecifiedMatching | UBERON | BILA |
5 | UBERON:0000955 | brain | oio:hasDbXref | BIRNLEX:796 | NaN | semapv:UnspecifiedMatching | UBERON | BIRNLEX |
6 | UBERON:0000955 | brain | oio:hasDbXref | BTO:0000142 | NaN | semapv:UnspecifiedMatching | UBERON | BTO |
7 | UBERON:0000955 | brain | oio:hasDbXref | CALOHA:TS-0095 | NaN | semapv:UnspecifiedMatching | UBERON | CALOHA |
8 | UBERON:0000955 | brain | oio:hasDbXref | DHBA:10155 | NaN | semapv:UnspecifiedMatching | UBERON | DHBA |
9 | UBERON:0000955 | brain | oio:hasDbXref | EFO:0000302 | NaN | semapv:UnspecifiedMatching | UBERON | EFO |
10 | UBERON:0000955 | brain | oio:hasDbXref | EHDAA2:0000183 | NaN | semapv:UnspecifiedMatching | UBERON | EHDAA2 |
11 | UBERON:0000955 | brain | oio:hasDbXref | EHDAA:2641 | NaN | semapv:UnspecifiedMatching | UBERON | EHDAA |
12 | UBERON:0000955 | brain | oio:hasDbXref | EHDAA:6485 | NaN | semapv:UnspecifiedMatching | UBERON | EHDAA |
13 | UBERON:0000955 | brain | oio:hasDbXref | EMAPA:16894 | NaN | semapv:UnspecifiedMatching | UBERON | EMAPA |
14 | UBERON:0000955 | brain | oio:hasDbXref | EV:0100164 | NaN | semapv:UnspecifiedMatching | UBERON | EV |
15 | UBERON:0000955 | brain | oio:hasDbXref | FBbt:00005095 | NaN | semapv:UnspecifiedMatching | UBERON | FBbt |
16 | UBERON:0000955 | brain | oio:hasDbXref | FMA:50801 | NaN | semapv:UnspecifiedMatching | UBERON | FMA |
17 | UBERON:0000955 | brain | oio:hasDbXref | GAID:571 | NaN | semapv:UnspecifiedMatching | UBERON | GAID |
18 | UBERON:0000955 | brain | oio:hasDbXref | HBA:4005 | NaN | semapv:UnspecifiedMatching | UBERON | HBA |
19 | UBERON:0000955 | brain | oio:hasDbXref | MA:0000168 | NaN | semapv:UnspecifiedMatching | UBERON | MA |
20 | UBERON:0000955 | brain | oio:hasDbXref | MAT:0000098 | NaN | semapv:UnspecifiedMatching | UBERON | MAT |
21 | UBERON:0000955 | brain | oio:hasDbXref | MBA:8 | NaN | semapv:UnspecifiedMatching | UBERON | MBA |
22 | UBERON:0000955 | brain | oio:hasDbXref | MBA:997 | NaN | semapv:UnspecifiedMatching | UBERON | MBA |
23 | UBERON:0000955 | brain | oio:hasDbXref | MESH:D001921 | NaN | semapv:UnspecifiedMatching | UBERON | MESH |
24 | UBERON:0000955 | brain | oio:hasDbXref | MIAA:0000098 | NaN | semapv:UnspecifiedMatching | UBERON | MIAA |
25 | UBERON:0000955 | brain | oio:hasDbXref | NCIT:C12439 | NaN | semapv:UnspecifiedMatching | UBERON | NCIT |
26 | UBERON:0000955 | brain | oio:hasDbXref | PBA:3999 | NaN | semapv:UnspecifiedMatching | UBERON | PBA |
27 | UBERON:0000955 | brain | oio:hasDbXref | SCTID:258335003 | NaN | semapv:UnspecifiedMatching | UBERON | SCTID |
28 | UBERON:0000955 | brain | oio:hasDbXref | TAO:0000008 | NaN | semapv:UnspecifiedMatching | UBERON | TAO |
29 | UBERON:0000955 | brain | oio:hasDbXref | UMLS:C0006104 | NaN | semapv:UnspecifiedMatching | UBERON | UMLS |
30 | UBERON:0000955 | brain | oio:hasDbXref | UMLS:C1269537 | NaN | semapv:UnspecifiedMatching | UBERON | UMLS |
31 | UBERON:0000955 | brain | oio:hasDbXref | VHOG:0000157 | NaN | semapv:UnspecifiedMatching | UBERON | VHOG |
32 | UBERON:0000955 | brain | oio:hasDbXref | Wikipedia:Brain | NaN | semapv:UnspecifiedMatching | UBERON | Wikipedia |
33 | UBERON:0000955 | brain | oio:hasDbXref | XAO:0000010 | NaN | semapv:UnspecifiedMatching | UBERON | XAO |
34 | UBERON:0000955 | brain | oio:hasDbXref | ZFA:0000008 | NaN | semapv:UnspecifiedMatching | UBERON | ZFA |
35 | UBERON:0000955 | brain | oio:hasDbXref | galen:Brain | NaN | semapv:UnspecifiedMatching | UBERON | galen |
36 | UBERON:0000955 | brain | oio:hasDbXref | neuronames:21 | NaN | semapv:UnspecifiedMatching | UBERON | neuronames |
37 | _:riog00027434 | NaN | oio:hasDbXref | UBERON:0000955 | brain | semapv:UnspecifiedMatching | _ | UBERON |
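The last row of the table has a subject that starts with _: (a blank node) rather than a CURIE. If pandas is not available, or if we want to separate such rows, the same file layout can be read with the standard library. The sketch below is an assumption-laden illustration: the sample text is a shortened excerpt of the rows above, the # lines are placeholders for the metadata block, and read_sssom_rows is a hypothetical helper that skips only lines that start with #.

```python
import csv
import io
from collections import Counter

# Shortened excerpt of the table above. The '#' lines are placeholders for
# the metadata block at the top of the file, not real SSSOM metadata.
SAMPLE_TSV = (
    "# placeholder metadata line\n"
    "# another placeholder metadata line\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_source\n"
    "UBERON:0000955\tbrain\toio:hasDbXref\tBAMS:Br\tBAMS\n"
    "UBERON:0000955\tbrain\toio:hasDbXref\tBAMS:Brain\tBAMS\n"
    "UBERON:0000955\tbrain\toio:hasDbXref\tDHBA:10155\tDHBA\n"
    "_:riog00027434\t\toio:hasDbXref\tUBERON:0000955\tUBERON\n"
)


def read_sssom_rows(handle):
    """Read tab-separated rows, skipping lines that start with '#'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


rows = read_sssom_rows(io.StringIO(SAMPLE_TSV))

# Separate rows whose subject is a blank node from rows with a CURIE subject.
blank_node_rows = [r for r in rows if r["subject_id"].startswith("_:")]
named_rows = [r for r in rows if not r["subject_id"].startswith("_:")]

# Count mappings per target source among the remaining rows.
per_source = Counter(r["object_source"] for r in named_rows)
print(per_source)
```

To run this against the file written earlier, replace io.StringIO(SAMPLE_TSV) with open("output/brain-mappings.tsv", newline="").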
If we are only interested in a particular source we can use --maps-to-source (-M).
E.g., to filter to the Allen Institute Developmental Human Brain Atlas (DHBA):
[15]:
uberon mappings UBERON:0000955 -M DHBA
subject_id: UBERON:0000955
predicate_id: oio:hasDbXref
object_id: DHBA:10155
mapping_justification: semapv:UnspecifiedMatching
subject_label: brain
subject_source: UBERON
object_source: DHBA
In theory all mappings should be to CURIEs registered in bioregistry.io, but in practice different ontologies may have a number of ad-hoc unmapped targets.
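According to the help text, -M keeps only mappings whose subject or object source equals the given value. The following is a conceptual sketch of that filter over plain dicts, not OAK's actual implementation; filter_by_source and curie_prefix are hypothetical helpers. Extracting the prefix is also the first step if we want to check targets against a registry ourselves.

```python
# Records shaped like the YAML output above (values copied from this notebook).
MAPPINGS = [
    {"subject_id": "UBERON:0000955", "object_id": "DHBA:10155",
     "subject_source": "UBERON", "object_source": "DHBA"},
    {"subject_id": "UBERON:0000955", "object_id": "ZFA:0000008",
     "subject_source": "UBERON", "object_source": "ZFA"},
    {"subject_id": "UBERON:0000955", "object_id": "galen:Brain",
     "subject_source": "UBERON", "object_source": "galen"},
]


def curie_prefix(curie):
    """Return the part before the first ':' or None if there is no colon."""
    prefix, sep, _ = curie.partition(":")
    return prefix if sep else None


def filter_by_source(mappings, source):
    """Keep mappings whose subject or object source equals `source`."""
    return [
        m for m in mappings
        if source in (m.get("subject_source"), m.get("object_source"))
    ]


dhba_only = filter_by_source(MAPPINGS, "DHBA")
prefixes = sorted({curie_prefix(m["object_id"]) for m in MAPPINGS})
print(dhba_only)
print(prefixes)
```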
Mapping via reciprocal term
We can also query Uberon for mappings to an external term:
[17]:
uberon mappings DHBA:10155
subject_id: UBERON:0000955
predicate_id: oio:hasDbXref
object_id: DHBA:10155
mapping_justification: semapv:UnspecifiedMatching
subject_label: brain
subject_source: UBERON
object_source: DHBA
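Note that the record returned is the same one we saw when querying by the Uberon ID: the query term can match either side of the mapping. A minimal sketch of that behaviour over plain dicts (find_mappings is a hypothetical helper, and this is not a description of how OAK implements the lookup):

```python
# Records shaped like the YAML output above (values copied from this notebook).
MAPPINGS = [
    {"subject_id": "UBERON:0000955", "predicate_id": "oio:hasDbXref",
     "object_id": "DHBA:10155"},
    {"subject_id": "UBERON:0003884", "predicate_id": "oio:hasDbXref",
     "object_id": "DHBA:10300"},
]


def find_mappings(mappings, term):
    """Return mappings where `term` is either the subject or the object."""
    return [m for m in mappings if term in (m["subject_id"], m["object_id"])]


by_subject = find_mappings(MAPPINGS, "UBERON:0000955")
by_object = find_mappings(MAPPINGS, "DHBA:10155")
print(by_subject == by_object)
```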
Complex queries
Like most OAK commands, the mappings command can take lists of labels, lists of IDs, or even complex query terms (which might themselves involve graphs).
For example, we can look up mappings to ZFA for all brain regions, i.e. the is-a and part-of descendants of brain:
[18]:
uberon mappings .desc//p=i,p brain -M ZFA -O sssom -o output/all-brain-zfa-mappings.tsv
/Users/cjm/Library/Caches/pypoetry/virtualenvs/oaklib-OeQZizwE-py3.9/lib/python3.9/site-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
df.replace("", np.nan, inplace=True)
[19]:
df = pd.read_csv("output/all-brain-zfa-mappings.tsv", sep="\t", comment="#")
df
[19]:
| | subject_id | subject_label | predicate_id | object_id | mapping_justification | subject_source | object_source |
---|---|---|---|---|---|---|---|
0 | UBERON:0000007 | pituitary gland | oio:hasDbXref | ZFA:0000118 | semapv:UnspecifiedMatching | UBERON | ZFA |
1 | UBERON:0000203 | pallium | oio:hasDbXref | ZFA:0000505 | semapv:UnspecifiedMatching | UBERON | ZFA |
2 | UBERON:0000204 | ventral part of telencephalon | oio:hasDbXref | ZFA:0000304 | semapv:UnspecifiedMatching | UBERON | ZFA |
3 | UBERON:0000430 | ventral intermediate nucleus of thalamus | oio:hasDbXref | ZFA:0000370 | semapv:UnspecifiedMatching | UBERON | ZFA |
4 | UBERON:0000935 | anterior commissure | oio:hasDbXref | ZFA:0001108 | semapv:UnspecifiedMatching | UBERON | ZFA |
... | ... | ... | ... | ... | ... | ... | ... |
280 | UBERON:2005340 | nucleus of the posterior recess | oio:hasDbXref | ZFA:0005340 | semapv:UnspecifiedMatching | UBERON | ZFA |
281 | UBERON:2007001 | dorso-rostral cluster | oio:hasDbXref | ZFA:0007001 | semapv:UnspecifiedMatching | UBERON | ZFA |
282 | UBERON:2007002 | ventro-rostral cluster | oio:hasDbXref | ZFA:0007002 | semapv:UnspecifiedMatching | UBERON | ZFA |
283 | UBERON:2007003 | ventro-caudal cluster | oio:hasDbXref | ZFA:0007003 | semapv:UnspecifiedMatching | UBERON | ZFA |
284 | UBERON:2007004 | epiphysial cluster | oio:hasDbXref | ZFA:0007004 | semapv:UnspecifiedMatching | UBERON | ZFA |
285 rows × 7 columns
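The query term .desc//p=i,p brain selects descendants of brain over a restricted set of predicates, which we take here to be is-a (i) and part-of (p). The sketch below shows that kind of traversal on a toy graph. The edges and the descendants helper are made up for illustration and are not taken from Uberon or from OAK's code; whether the start term itself is included is a design choice, and this version leaves it out.

```python
from collections import deque

# Toy edges as (child, predicate, parent). Illustrative only, not Uberon axioms.
EDGES = [
    ("forebrain", "part_of", "brain"),
    ("telencephalon", "part_of", "forebrain"),
    ("pallium", "part_of", "telencephalon"),
    ("hindbrain", "part_of", "brain"),
    ("cerebellum", "part_of", "hindbrain"),
    ("insect brain", "is_a", "brain"),
    ("skull", "surrounds", "brain"),  # not followed: predicate is not selected
]


def descendants(start, edges, predicates):
    """Breadth-first search for terms that reach `start` via `predicates`."""
    children = {}
    for child, predicate, parent in edges:
        if predicate in predicates:
            children.setdefault(parent, []).append(child)

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen  # excludes `start` itself


brain_parts = descendants("brain", EDGES, {"is_a", "part_of"})
print(sorted(brain_parts))
```

Each term in the resulting set would then be looked up for mappings, which is why the table above has many different subject IDs.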
Predicates
At the time of writing, most ontologies bundle their mappings as oio:hasDbXref in the ontology. Some ontologies are starting to release richer SSSOM files. Other ontologies include both xref mappings and mappings with richer SKOS predicates as part of the ontology release (this keeps backwards compatibility with tools that expect xrefs, while letting more modern tools use the richer mappings).
We will use Mondo as an example here:
[20]:
alias mondo runoak -i sqlite:obo:mondo
[22]:
mondo mappings MONDO:0000179 -M NCIT
ERROR:root:Skipping statements(subject=MONDO:0000179,predicate=skos:exactMatch,object=<https://omim.org/phenotypicSeries/PS256520>,value=None,datatype=None,language=None,); ValueError: <https://omim.org/phenotypicSeries/PS256520> is not a valid URI or CURIE
subject_id: MONDO:0000179
predicate_id: oio:hasDbXref
object_id: NCIT:C14089
mapping_justification: semapv:UnspecifiedMatching
subject_label: Neu-Laxova syndrome
subject_source: MONDO
object_source: NCIT
---
subject_id: MONDO:0000179
predicate_id: skos:exactMatch
object_id: NCIT:C14089
mapping_justification: semapv:UnspecifiedMatching
subject_label: Neu-Laxova syndrome
subject_source: MONDO
object_source: NCIT
Here we can see what appears to be a duplicate mapping, but this is on purpose: Mondo includes the xref for backwards compatibility, and the skos:exactMatch for more modern tools.
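If a downstream analysis wants a single row per subject/object pair, one option is to collapse such pairs and keep the more specific predicate. The sketch below is one possible approach under stated assumptions: the PREFERENCE ranking and the collapse helper are hypothetical, and ranking unknown predicates last is an arbitrary choice.

```python
# The two records from the output above, reduced to the relevant fields.
MAPPINGS = [
    {"subject_id": "MONDO:0000179", "predicate_id": "oio:hasDbXref",
     "object_id": "NCIT:C14089"},
    {"subject_id": "MONDO:0000179", "predicate_id": "skos:exactMatch",
     "object_id": "NCIT:C14089"},
]

# Lower rank wins. This ordering is an assumption made for illustration.
PREFERENCE = {"skos:exactMatch": 0, "oio:hasDbXref": 1}


def collapse(mappings):
    """Keep one mapping per (subject, object) pair, preferring lower rank."""
    best = {}
    for m in mappings:
        key = (m["subject_id"], m["object_id"])
        rank = PREFERENCE.get(m["predicate_id"], len(PREFERENCE))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, m)
    return [m for _, m in best.values()]


collapsed = collapse(MAPPINGS)
print(collapsed)
```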
Generating Mappings
The lexmatch command can be used to generate mappings between ontologies. This is a complex topic and is covered in the OAK Guide to Mappings.
See also the OBO Academy section on lexmatch.
Validating Mappings
See the ValidateMappings notebook for details on how to validate mappings using rule-based and LLM methods.