OAK mappings command

This notebook is intended as a supplement to the main OAK CLI docs.

This notebook provides examples for the mappings command, which can be used to look up mappings that are bundled with ontologies.

Overall background on the concepts here can be found in the OAK Guide to Mappings.

Help Option

You can get help on any OAK command using --help:

[1]:
!runoak mappings --help
Usage: runoak mappings [OPTIONS] [TERMS]...

  List all mappings encoded in the ontology

  Example:

      runoak -i sqlite:obo:envo mappings

  The default output is SSSOM YAML. To use the (canonical) csv format:

      runoak -i sqlite:obo:envo mappings -O sssom

  By default, labels are not included. Use --autolabel to include labels (but
  note that if the label is not in the source ontology, then no label will be
  retrieved)

      runoak -i sqlite:obo:envo mappings -O sssom

  To constrain the mapped object source:

      runoak -i sqlite:obo:foodon mappings -O sssom --maps-to-source
      SUBSET_SIREN

  Python API:

     https://incatools.github.io/ontology-access-kit/interfaces/mapping-provider

  Data model:

     https://w3id.org/oak/mapping-provider

Options:
  -o, --output FILENAME         Output file, e.g. obo file
  -O, --output-type TEXT        Desired output type
  --autolabel / --no-autolabel  If set, results will automatically have labels
                                assigned  [default: autolabel]
  -M, --maps-to-source TEXT     Return only mappings with subject or object
                                source equal to this
  --mapper TEXT                 A selector for an adapter that is to be used
                                for the main lookup operation
  --help                        Show this message and exit.

Set up an alias

For convenience we will set up an alias for use in this notebook. This will allow us to use uberon ... rather than runoak -i sqlite:obo:uberon ... for the rest of the notebook.

We use Uberon as an example because it bundles a large and diverse set of mappings (see the Uberon docs).

[2]:
alias uberon runoak -i sqlite:obo:uberon
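The alias is notebook shorthand. When scripting outside the notebook, a small helper can build the same command line for `subprocess`. This is a minimal sketch: `oak_command` is a hypothetical name, not part of OAK, and the `sqlite:obo:` selector is simply the one used in this notebook.

```python
import shlex
from typing import List


def oak_command(ontology: str, *args: str) -> List[str]:
    """Build a runoak argv list, mirroring `alias uberon runoak -i sqlite:obo:uberon`.

    Hypothetical helper for illustration; it only builds the list, it does not run it.
    """
    return ["runoak", "-i", f"sqlite:obo:{ontology}", *args]


argv = oak_command("uberon", "mappings", "UBERON:0003884")
print(shlex.join(argv))  # shell-safe rendering of the command line

# To actually run it (requires runoak on PATH):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting problems with CURIEs or labels containing spaces.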

Direct mappings for a subject term

First we will look up the mappings for the Uberon term for the CA4 region of the hippocampus (UBERON:0003884):

[4]:
uberon mappings UBERON:0003884
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: DHBA:10300
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: DHBA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EFO:0002457
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EFO

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EMAPA:32771
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EMAPA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: FMA:75741
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: FMA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: HBA:12895
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: HBA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: MA:0000953
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: MA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: NCIT:C32249
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: NCIT

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: PBA:10074
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: PBA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: UMLS:C2328406
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: UMLS

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: Wikipedia:Region_IV_of_hippocampus_proper
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: Wikipedia

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: neuronames:181
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: neuronames

The YAML above follows the SSSOM data model (https://w3id.org/sssom).
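The YAML output is a stream of `---`-separated documents, each a flat set of SSSOM slots. As a rough illustration, the sketch below parses that flat shape into typed records using only the standard library. The `Mapping` class and `parse_flat_yaml_stream` are hypothetical names, and the parser assumes no nesting, quoting, or multi-line values; if PyYAML is available, `yaml.safe_load_all(text)` is the more robust choice.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Mapping:
    """One SSSOM mapping record, limited to the slots that appear above."""
    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str
    subject_label: str = ""
    subject_source: str = ""
    object_source: str = ""


_KNOWN = {f.name for f in fields(Mapping)}


def parse_flat_yaml_stream(text: str) -> List[Mapping]:
    """Parse '---'-separated documents made of flat 'key: value' lines.

    Deliberately minimal; keys not declared on Mapping are ignored.
    """
    records: List[Mapping] = []
    current = {}
    for line in text.splitlines() + ["---"]:  # sentinel flushes the last document
        if line.strip() == "---":
            if current:
                kept = {k: v for k, v in current.items() if k in _KNOWN}
                records.append(Mapping(**kept))
            current = {}
            continue
        # Split at the first ": " only, so CURIE values keep their own colon.
        key, sep, value = line.partition(": ")
        if sep:
            current[key.strip()] = value.strip()
    return records


SAMPLE = """\
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: DHBA:10300
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: DHBA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EFO:0002457
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EFO
"""

for m in parse_flat_yaml_stream(SAMPLE):
    print(m.object_source, m.object_id)
```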

We can also get the results in SSSOM TSV format (this time querying for “brain”, UBERON:0000955) and view them with pandas:

[11]:
uberon mappings UBERON:0000955 -o output/brain-mappings.tsv -O sssom
[13]:
import pandas as pd
df = pd.read_csv("output/brain-mappings.tsv", sep="\t", comment="#")
df
[13]:
   | subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | subject_source | object_source
0 | UBERON:0000955 | brain | oio:hasDbXref | AAO:0010478 | NaN | semapv:UnspecifiedMatching | UBERON | AAO
1 | UBERON:0000955 | brain | oio:hasDbXref | ABA:Brain | NaN | semapv:UnspecifiedMatching | UBERON | ABA
2 | UBERON:0000955 | brain | oio:hasDbXref | BAMS:Br | NaN | semapv:UnspecifiedMatching | UBERON | BAMS
3 | UBERON:0000955 | brain | oio:hasDbXref | BAMS:Brain | NaN | semapv:UnspecifiedMatching | UBERON | BAMS
4 | UBERON:0000955 | brain | oio:hasDbXref | BILA:0000135 | NaN | semapv:UnspecifiedMatching | UBERON | BILA
5 | UBERON:0000955 | brain | oio:hasDbXref | BIRNLEX:796 | NaN | semapv:UnspecifiedMatching | UBERON | BIRNLEX
6 | UBERON:0000955 | brain | oio:hasDbXref | BTO:0000142 | NaN | semapv:UnspecifiedMatching | UBERON | BTO
7 | UBERON:0000955 | brain | oio:hasDbXref | CALOHA:TS-0095 | NaN | semapv:UnspecifiedMatching | UBERON | CALOHA
8 | UBERON:0000955 | brain | oio:hasDbXref | DHBA:10155 | NaN | semapv:UnspecifiedMatching | UBERON | DHBA
9 | UBERON:0000955 | brain | oio:hasDbXref | EFO:0000302 | NaN | semapv:UnspecifiedMatching | UBERON | EFO
10 | UBERON:0000955 | brain | oio:hasDbXref | EHDAA2:0000183 | NaN | semapv:UnspecifiedMatching | UBERON | EHDAA2
11 | UBERON:0000955 | brain | oio:hasDbXref | EHDAA:2641 | NaN | semapv:UnspecifiedMatching | UBERON | EHDAA
12 | UBERON:0000955 | brain | oio:hasDbXref | EHDAA:6485 | NaN | semapv:UnspecifiedMatching | UBERON | EHDAA
13 | UBERON:0000955 | brain | oio:hasDbXref | EMAPA:16894 | NaN | semapv:UnspecifiedMatching | UBERON | EMAPA
14 | UBERON:0000955 | brain | oio:hasDbXref | EV:0100164 | NaN | semapv:UnspecifiedMatching | UBERON | EV
15 | UBERON:0000955 | brain | oio:hasDbXref | FBbt:00005095 | NaN | semapv:UnspecifiedMatching | UBERON | FBbt
16 | UBERON:0000955 | brain | oio:hasDbXref | FMA:50801 | NaN | semapv:UnspecifiedMatching | UBERON | FMA
17 | UBERON:0000955 | brain | oio:hasDbXref | GAID:571 | NaN | semapv:UnspecifiedMatching | UBERON | GAID
18 | UBERON:0000955 | brain | oio:hasDbXref | HBA:4005 | NaN | semapv:UnspecifiedMatching | UBERON | HBA
19 | UBERON:0000955 | brain | oio:hasDbXref | MA:0000168 | NaN | semapv:UnspecifiedMatching | UBERON | MA
20 | UBERON:0000955 | brain | oio:hasDbXref | MAT:0000098 | NaN | semapv:UnspecifiedMatching | UBERON | MAT
21 | UBERON:0000955 | brain | oio:hasDbXref | MBA:8 | NaN | semapv:UnspecifiedMatching | UBERON | MBA
22 | UBERON:0000955 | brain | oio:hasDbXref | MBA:997 | NaN | semapv:UnspecifiedMatching | UBERON | MBA
23 | UBERON:0000955 | brain | oio:hasDbXref | MESH:D001921 | NaN | semapv:UnspecifiedMatching | UBERON | MESH
24 | UBERON:0000955 | brain | oio:hasDbXref | MIAA:0000098 | NaN | semapv:UnspecifiedMatching | UBERON | MIAA
25 | UBERON:0000955 | brain | oio:hasDbXref | NCIT:C12439 | NaN | semapv:UnspecifiedMatching | UBERON | NCIT
26 | UBERON:0000955 | brain | oio:hasDbXref | PBA:3999 | NaN | semapv:UnspecifiedMatching | UBERON | PBA
27 | UBERON:0000955 | brain | oio:hasDbXref | SCTID:258335003 | NaN | semapv:UnspecifiedMatching | UBERON | SCTID
28 | UBERON:0000955 | brain | oio:hasDbXref | TAO:0000008 | NaN | semapv:UnspecifiedMatching | UBERON | TAO
29 | UBERON:0000955 | brain | oio:hasDbXref | UMLS:C0006104 | NaN | semapv:UnspecifiedMatching | UBERON | UMLS
30 | UBERON:0000955 | brain | oio:hasDbXref | UMLS:C1269537 | NaN | semapv:UnspecifiedMatching | UBERON | UMLS
31 | UBERON:0000955 | brain | oio:hasDbXref | VHOG:0000157 | NaN | semapv:UnspecifiedMatching | UBERON | VHOG
32 | UBERON:0000955 | brain | oio:hasDbXref | Wikipedia:Brain | NaN | semapv:UnspecifiedMatching | UBERON | Wikipedia
33 | UBERON:0000955 | brain | oio:hasDbXref | XAO:0000010 | NaN | semapv:UnspecifiedMatching | UBERON | XAO
34 | UBERON:0000955 | brain | oio:hasDbXref | ZFA:0000008 | NaN | semapv:UnspecifiedMatching | UBERON | ZFA
35 | UBERON:0000955 | brain | oio:hasDbXref | galen:Brain | NaN | semapv:UnspecifiedMatching | UBERON | galen
36 | UBERON:0000955 | brain | oio:hasDbXref | neuronames:21 | NaN | semapv:UnspecifiedMatching | UBERON | neuronames
37 | _:riog00027434 | NaN | oio:hasDbXref | UBERON:0000955 | brain | semapv:UnspecifiedMatching | _ | UBERON
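The `comment="#"` argument in the pandas cell matters because an SSSOM TSV file carries its metadata block as `#`-prefixed lines above the header row. If pandas is not available, the standard library can do the same job. The sketch below uses a hypothetical, abridged excerpt (three rows copied from the table above plus an illustrative `curie_map` header); `read_sssom_tsv` is an illustrative name, not an SSSOM or OAK function.

```python
import csv
import io
from collections import Counter

# Hypothetical, abridged SSSOM TSV excerpt: '#' metadata lines, then a
# tab-separated header row, then one row per mapping.
SAMPLE_TSV = (
    "# curie_map:\n"
    "#   UBERON: http://purl.obolibrary.org/obo/UBERON_\n"
    "subject_id\tpredicate_id\tobject_id\tobject_source\n"
    "UBERON:0000955\toio:hasDbXref\tBAMS:Br\tBAMS\n"
    "UBERON:0000955\toio:hasDbXref\tBAMS:Brain\tBAMS\n"
    "UBERON:0000955\toio:hasDbXref\tFMA:50801\tFMA\n"
)


def read_sssom_tsv(handle):
    """Return mapping rows as dicts, skipping lines that start with '#'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


rows = read_sssom_tsv(io.StringIO(SAMPLE_TSV))
per_source = Counter(row["object_source"] for row in rows)
print(per_source.most_common())

# With the real file:
#   with open("output/brain-mappings.tsv", newline="") as f:
#       rows = read_sssom_tsv(f)
```

One design difference worth knowing: pandas' `comment` option also ignores the remainder of any line from a `#` onward, whereas this sketch only skips lines that begin with `#`, so values that happen to contain `#` are left intact.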

If we are only interested in a particular source, we can use --maps-to-source (-M).

For example, to restrict results to the Allen Institute Developing Human Brain Atlas (DHBA):

[15]:
uberon mappings UBERON:0000955 -M DHBA
subject_id: UBERON:0000955
predicate_id: oio:hasDbXref
object_id: DHBA:10155
mapping_justification: semapv:UnspecifiedMatching
subject_label: brain
subject_source: UBERON
object_source: DHBA
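The help text describes -M as returning only mappings whose subject or object source equals the given value. The sketch below restates that rule over plain dicts; it is an illustrative re-implementation under that reading of the help text, not OAK's code, and it uses exact, case-sensitive string comparison.

```python
from typing import Dict, Iterable, List

MappingDict = Dict[str, str]


def filter_by_source(mappings: Iterable[MappingDict], source: str) -> List[MappingDict]:
    """Keep mappings whose subject_source or object_source equals `source`."""
    return [
        m for m in mappings
        if source in (m.get("subject_source"), m.get("object_source"))
    ]


brain_mappings = [
    {"subject_id": "UBERON:0000955", "object_id": "DHBA:10155",
     "subject_source": "UBERON", "object_source": "DHBA"},
    {"subject_id": "UBERON:0000955", "object_id": "FMA:50801",
     "subject_source": "UBERON", "object_source": "FMA"},
]
print(filter_by_source(brain_mappings, "DHBA"))
```

Because the rule checks both ends, filtering on the queried ontology's own source (here UBERON) keeps every mapping, which is why -M is normally given the external source.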

In theory, all mapping targets should be CURIEs whose prefixes are registered in bioregistry.io, but in practice different ontologies may use a number of ad hoc prefixes that are not registered there.
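A quick way to spot such targets is to split each CURIE at its first colon and compare the prefix against a list of known prefixes. In this sketch, `KNOWN_PREFIXES` is a hard-coded toy stand-in for a registry lookup (in practice you would load prefixes from bioregistry.io), so anything flagged below is flagged only because it is missing from the toy set, not because of its actual registry status. The comparison is case-sensitive, which is a simplification.

```python
from typing import Iterable, List, Optional, Set, Tuple


def split_curie(curie: str) -> Optional[Tuple[str, str]]:
    """Split 'PREFIX:local_id' at the first colon; return None if it is not CURIE-shaped."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix:
        return None
    return prefix, local_id


# Toy stand-in for a registry lookup (assumption, not real bioregistry data).
KNOWN_PREFIXES: Set[str] = {"UBERON", "FMA", "NCIT", "DHBA", "MESH", "UMLS"}


def unregistered(curies: Iterable[str], known: Set[str] = KNOWN_PREFIXES) -> List[str]:
    """Return the identifiers whose prefix is not in `known` (or that are not CURIEs)."""
    flagged = []
    for c in curies:
        parts = split_curie(c)
        if parts is None or parts[0] not in known:
            flagged.append(c)
    return flagged


targets = ["FMA:50801", "NCIT:C12439", "galen:Brain", "_:riog00027434", "not-a-curie"]
print(unregistered(targets))
```

Note that `_:` is blank-node syntax rather than a registered prefix, which is also why row 37 of the brain table shows `_` as its subject_source.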

Mapping via reciprocal term

We can also query Uberon for mappings to an external term:

[17]:
uberon mappings DHBA:10155
subject_id: UBERON:0000955
predicate_id: oio:hasDbXref
object_id: DHBA:10155
mapping_justification: semapv:UnspecifiedMatching
subject_label: brain
subject_source: UBERON
object_source: DHBA
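The reverse lookup returns the same record as the forward lookup, still oriented with UBERON:0000955 as the subject. One way to support lookups from either end, sketched here under the assumption that mappings are plain dicts, is to index each record under both its subject and object IDs. `index_both_ways` is a hypothetical helper; this is not how OAK stores mappings internally.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

MappingDict = Dict[str, str]


def index_both_ways(mappings: List[MappingDict]) -> DefaultDict[str, List[MappingDict]]:
    """Index each mapping under both its subject_id and its object_id."""
    index: DefaultDict[str, List[MappingDict]] = defaultdict(list)
    for m in mappings:
        index[m["subject_id"]].append(m)
        if m["object_id"] != m["subject_id"]:  # avoid double-listing self-mappings
            index[m["object_id"]].append(m)
    return index


mappings = [
    {"subject_id": "UBERON:0000955", "predicate_id": "oio:hasDbXref", "object_id": "DHBA:10155"},
    {"subject_id": "UBERON:0000955", "predicate_id": "oio:hasDbXref", "object_id": "FMA:50801"},
]
index = index_both_ways(mappings)
# Looking up the external ID returns the original record, still UBERON -> DHBA.
print(index["DHBA:10155"])
```

Keeping the record's original orientation, rather than swapping subject and object, preserves the meaning of directional predicates.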

Complex queries

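Like most OAK commands, the mappings command can take lists of labels, lists of IDs, or even complex query terms (which may themselves involve graph traversal).

For example, we can look up the ZFA (zebrafish anatomy) mappings for all brain regions, i.e. all descendants of brain over is_a (i) and part_of (p) relationships: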

[18]:
uberon mappings .desc//p=i,p brain -M ZFA -O sssom -o output/all-brain-zfa-mappings.tsv
[19]:
df = pd.read_csv("output/all-brain-zfa-mappings.tsv", sep="\t", comment="#")
df
[19]:
   | subject_id | subject_label | predicate_id | object_id | mapping_justification | subject_source | object_source
0 | UBERON:0000007 | pituitary gland | oio:hasDbXref | ZFA:0000118 | semapv:UnspecifiedMatching | UBERON | ZFA
1 | UBERON:0000203 | pallium | oio:hasDbXref | ZFA:0000505 | semapv:UnspecifiedMatching | UBERON | ZFA
2 | UBERON:0000204 | ventral part of telencephalon | oio:hasDbXref | ZFA:0000304 | semapv:UnspecifiedMatching | UBERON | ZFA
3 | UBERON:0000430 | ventral intermediate nucleus of thalamus | oio:hasDbXref | ZFA:0000370 | semapv:UnspecifiedMatching | UBERON | ZFA
4 | UBERON:0000935 | anterior commissure | oio:hasDbXref | ZFA:0001108 | semapv:UnspecifiedMatching | UBERON | ZFA
... | ... | ... | ... | ... | ... | ... | ...
280 | UBERON:2005340 | nucleus of the posterior recess | oio:hasDbXref | ZFA:0005340 | semapv:UnspecifiedMatching | UBERON | ZFA
281 | UBERON:2007001 | dorso-rostral cluster | oio:hasDbXref | ZFA:0007001 | semapv:UnspecifiedMatching | UBERON | ZFA
282 | UBERON:2007002 | ventro-rostral cluster | oio:hasDbXref | ZFA:0007002 | semapv:UnspecifiedMatching | UBERON | ZFA
283 | UBERON:2007003 | ventro-caudal cluster | oio:hasDbXref | ZFA:0007003 | semapv:UnspecifiedMatching | UBERON | ZFA
284 | UBERON:2007004 | epiphysial cluster | oio:hasDbXref | ZFA:0007004 | semapv:UnspecifiedMatching | UBERON | ZFA

285 rows × 7 columns
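The `.desc//p=i,p brain` query term selects the descendants of brain, following only is_a (`i`) and part_of (`p`) edges. The sketch below shows that traversal as a breadth-first walk from parent to child over a toy edge list. The edges are hypothetical illustration data, not Uberon's graph; `descendants` is an illustrative name; and in this sketch the starting term itself is not included in the result.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Toy (child, predicate, parent) edges; "i" = is_a, "p" = part_of.
EDGES: List[Tuple[str, str, str]] = [
    ("forebrain", "p", "brain"),
    ("hindbrain", "p", "brain"),
    ("telencephalon", "p", "forebrain"),
    ("pallium", "p", "telencephalon"),
    ("brain nucleus", "p", "brain"),
    ("telencephalic nucleus", "i", "brain nucleus"),
    ("neural plate", "x", "brain"),  # some other relation; not followed below
]


def descendants(root: str, edges: Iterable[Tuple[str, str, str]], predicates: Set[str]) -> Set[str]:
    """Return every node reachable child-ward from `root` over the given predicates."""
    children: Dict[str, List[str]] = {}
    for child, pred, parent in edges:
        if pred in predicates:
            children.setdefault(parent, []).append(child)
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guards against revisiting in a DAG
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(descendants("brain", EDGES, {"i", "p"})))
```

Each term in the resulting set is then looked up for mappings, and -M ZFA narrows those to the ZFA targets shown in the table.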

Predicates

At the time of writing, most ontologies bundle their mappings in the ontology as oio:hasDbXref (xref) annotations. Some ontologies are starting to release separate SSSOM files with richer predicates. Others include both xref mappings and mappings with more precise SKOS predicates (such as skos:exactMatch) in the ontology release itself; this keeps the release backwards compatible with tools that expect xrefs, while letting more modern tools use the richer mappings.

We will use Mondo as an example here.

[20]:
alias mondo runoak -i sqlite:obo:mondo
[22]:
mondo mappings MONDO:0000179 -M NCIT
ERROR:root:Skipping statements(subject=MONDO:0000179,predicate=skos:exactMatch,object=<https://omim.org/phenotypicSeries/PS256520>,value=None,datatype=None,language=None,); ValueError: <https://omim.org/phenotypicSeries/PS256520> is not a valid URI or CURIE
subject_id: MONDO:0000179
predicate_id: oio:hasDbXref
object_id: NCIT:C14089
mapping_justification: semapv:UnspecifiedMatching
subject_label: Neu-Laxova syndrome
subject_source: MONDO
object_source: NCIT

---
subject_id: MONDO:0000179
predicate_id: skos:exactMatch
object_id: NCIT:C14089
mapping_justification: semapv:UnspecifiedMatching
subject_label: Neu-Laxova syndrome
subject_source: MONDO
object_source: NCIT

Here we can see what appears to be a duplicate mapping, but this is deliberate: Mondo includes the xref for backwards compatibility and the skos:exactMatch for more modern tools.
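If a downstream tool should see only one mapping per subject/object pair, a collapse step can keep the most specific predicate. This is a hypothetical post-processing sketch, not an OAK feature. The `PREDICATE_RANK` ordering (skos:exactMatch preferred over oio:hasDbXref) is an assumption of the sketch, and unlisted predicates rank last.

```python
from typing import Dict, List, Tuple

MappingDict = Dict[str, str]

# Lower rank = preferred. This ordering is an assumption for the sketch.
PREDICATE_RANK = {"skos:exactMatch": 0, "oio:hasDbXref": 1}
_WORST = len(PREDICATE_RANK)  # rank given to any predicate not listed above


def _rank(m: MappingDict) -> int:
    return PREDICATE_RANK.get(m["predicate_id"], _WORST)


def collapse_duplicates(mappings: List[MappingDict]) -> List[MappingDict]:
    """Keep one mapping per (subject_id, object_id), preferring the best-ranked predicate."""
    best: Dict[Tuple[str, str], MappingDict] = {}
    for m in mappings:
        key = (m["subject_id"], m["object_id"])
        if key not in best or _rank(m) < _rank(best[key]):
            best[key] = m
    return list(best.values())


mondo_ncit = [
    {"subject_id": "MONDO:0000179", "predicate_id": "oio:hasDbXref", "object_id": "NCIT:C14089"},
    {"subject_id": "MONDO:0000179", "predicate_id": "skos:exactMatch", "object_id": "NCIT:C14089"},
]
print(collapse_duplicates(mondo_ncit))
```

With this ranking, a pair asserted as both an xref and, say, skos:broadMatch would collapse to the xref, since skos:broadMatch is unlisted; extend `PREDICATE_RANK` if that matters for your use case.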

Generating Mappings

The lexmatch command can be used to generate mappings between ontologies. This is a complex topic and is covered in the OAK Guide to Mappings.
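To give a feel for the core idea, the sketch below pairs terms from two ontologies whose normalized labels are identical. It is a deliberately simplified illustration, not how lexmatch is implemented. The IDs (`A:1`, `B:10`, and so on) and labels are hypothetical toy data, and `normalize` and `exact_label_matches` are illustrative names.

```python
import re
import unicodedata
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def normalize(label: str) -> str:
    """Lower-case, strip accents, and collapse non-alphanumeric runs to single spaces."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^0-9a-z]+", " ", text.lower())
    return text.strip()


def exact_label_matches(source: Dict[str, str], target: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (source_id, target_id, normalized_label) for identical normalized labels."""
    by_label: DefaultDict[str, List[str]] = defaultdict(list)
    for tid, label in target.items():
        by_label[normalize(label)].append(tid)
    matches = []
    for sid, label in source.items():
        key = normalize(label)
        for tid in by_label.get(key, []):
            matches.append((sid, tid, key))
    return matches


source = {"A:1": "Brain", "A:2": "Pituitary gland"}
target = {"B:10": "brain", "B:20": "pituitary  gland", "B:30": "spinal cord"}
print(exact_label_matches(source, target))
```

Real matchers also need to consider synonyms and decide which predicate each match justifies, which is where most of the complexity covered in the guide comes from.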

See also the OBO Academy section on lexmatch.

Validating Mappings

See the ValidateMappings notebook for details on how to validate mappings using rule-based and LLM methods.
