OAK validate-mappings command

This notebook is intended as a supplement to the main OAK CLI docs.

It provides examples for the validate-mappings command, which is part of a suite of validate commands.

Help Option

You can get help on any OAK command using --help:

[2]:
!runoak validate-mappings --help
Usage: runoak validate-mappings [OPTIONS] [TERMS]...

  Validates mappings in ontology using additional ontologies.

  To run:

      runoak validate-mappings -i db/uberon.db

  For sssom:

      runoak validate-mappings -i db/uberon.db -o bad-mappings.sssom.tsv

  By default this will attempt to download and connect to sqlite versions of
  different ontologies.

  You can customize this:

      runoak validate-mappings -i db/uberon.db --adapter-mapping
      uberon=db/uberon.db             --adapter-mapping zfa=db/zfa.db

  You can use "*" as a wildcard, in the case where you have an application
  ontology with many mapped entities merged in:

      runoak validate-mappings -i db/uberon.db --adapter-mapping
      "*"=db/merged.db"

Options:
  --autolabel / --no-autolabel  If set, results will automatically have labels
                                assigned  [default: autolabel]
  -O, --output-type TEXT        Desired output type
  --adapter-mapping TEXT        Multiple prefix=selector pairs, e.g.
                                --adapter-mapping uberon=db/uberon.db
  -o, --output FILENAME         Output file, e.g. obo file
  --help                        Show this message and exit.
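When the command is run from a script rather than typed by hand, the options above can be assembled programmatically. The following is a minimal sketch and not part of OAK: build_validate_command is a hypothetical helper that only composes flags shown in the help text (-i, --adapter-mapping, -O, -o), and the file paths are placeholders taken from the help examples. It places -i before the subcommand, as the executed cells in this notebook do.

```python
import shlex


def build_validate_command(input_selector, adapter_mappings=None,
                           output_type=None, output=None):
    """Hypothetical helper: compose a `runoak validate-mappings` argument list."""
    cmd = ["runoak", "-i", input_selector, "validate-mappings"]
    # One --adapter-mapping flag per prefix=selector pair.
    for prefix, selector in (adapter_mappings or {}).items():
        cmd += ["--adapter-mapping", f"{prefix}={selector}"]
    if output_type:
        cmd += ["-O", output_type]
    if output:
        cmd += ["-o", output]
    return cmd


cmd = build_validate_command(
    "db/uberon.db",
    adapter_mappings={"uberon": "db/uberon.db", "zfa": "db/zfa.db"},
    output_type="sssom",
    output="bad-mappings.sssom.tsv",
)
# shlex.join quotes any argument that needs it, e.g. a "*" wildcard prefix.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) if runoak is installed; building a list rather than a single string avoids shell-quoting problems with the "*" wildcard.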

Example: Validate mappings in XAO

XAO is an anatomical ontology for Xenopus. It has mappings to ontologies such as UBERON, GO, and CL.

[3]:
!runoak -i sqlite:obo:xao validate-mappings -O sssom -o output/xao-invalid.sssom.tsv
[5]:
import pandas as pd
df = pd.read_csv("output/xao-invalid.sssom.tsv", sep="\t", comment="#")
df
[5]:
subject_id subject_label predicate_id object_id object_label mapping_justification subject_source object_source mapping_cardinality comment
0 XAO:0000054 trunk region oio:hasDbXref UBERON:0002100 trunk semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
1 XAO:0000100 cardiovascular system oio:hasDbXref UBERON:0004535 cardiovascular system semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
2 XAO:0000204 peripheral nerve oio:hasDbXref UBERON:0002003 obsolete peripheral nerve semapv:UnspecifiedMatching XAO UBERON 1:1 object is obsolete
3 XAO:0000227 eye primordium oio:hasDbXref UBERON:0003071 eye primordium semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
4 XAO:0000282 visceral pouch oio:hasDbXref UBERON:0004117 pharyngeal pouch semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
5 XAO:0000376 omphalomesenteric vein oio:hasDbXref UBERON:0005487 vitelline vein semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
6 XAO:0000427 gasserian ganglion oio:hasDbXref UBERON:0001675 trigeminal ganglion semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
7 XAO:0000428 trigeminal ganglion oio:hasDbXref UBERON:0001675 trigeminal ganglion semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
8 XAO:0001010 circulatory system oio:hasDbXref UBERON:0004535 cardiovascular system semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
9 XAO:0003001 anatomical group oio:hasDbXref CARO:0000054 anatomical group semapv:UnspecifiedMatching XAO CARO 1:1 object is obsolete
10 XAO:0003012 cell oio:hasDbXref GO:0005623 obsolete cell semapv:UnspecifiedMatching XAO GO 1:1 object is obsolete
11 XAO:0003025 trunk oio:hasDbXref UBERON:0002100 trunk semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
12 XAO:0003160 anatomical cluster oio:hasDbXref CARO:0000041 anatomical cluster semapv:UnspecifiedMatching XAO CARO 1:1 object is obsolete
13 XAO:0003163 basal lamina oio:hasDbXref CARO:0000065 basal lamina semapv:UnspecifiedMatching XAO CARO 1:1 object is obsolete
14 XAO:0003257 myelin accumulating cell oio:hasDbXref CL:0000328 obsolete myelin accumulating cell semapv:UnspecifiedMatching XAO CL 1:1 object is obsolete
15 XAO:0004090 optic field oio:hasDbXref UBERON:0003071 eye primordium semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
16 XAO:0004147 vitelline vein oio:hasDbXref UBERON:0005487 vitelline vein semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
17 XAO:0004165 intersomitic artery oio:hasDbXref UBERON:0006001 NaN semapv:UnspecifiedMatching XAO UBERON 1:1 object is obsolete
18 XAO:0004260 pharyngeal pouch oio:hasDbXref UBERON:0004117 pharyngeal pouch semapv:UnspecifiedMatching XAO UBERON 1:n cardinality is 1:n
19 XAO:0004290 cell part oio:hasDbXref GO:0044464 obsolete cell part semapv:UnspecifiedMatching XAO GO 1:1 object is obsolete
20 XAO:0004615 basal body oio:hasDbXref GO:0005932 NaN semapv:UnspecifiedMatching XAO GO 1:1 object is obsolete
21 XAO:0004621 epidermal cell oio:hasDbXref CL:1000396 NaN semapv:UnspecifiedMatching XAO CL 1:1 object is obsolete
22 XAO:0005007 Muller cell oio:hasDbXref CL:0011107 obsolete Muller cell semapv:UnspecifiedMatching XAO CL 1:1 object is obsolete
23 XAO:1000007 tailbud stage oio:hasDbXref UBERON:0009741 obsolete tailbud stage semapv:UnspecifiedMatching XAO UBERON 1:1 object is obsolete

Here we can see a mixture of cardinality issues and obsoletion issues.

Note that behind the scenes this command connected to external ontologies such as GO, CL, and UBERON in order to check obsoletion status etc.

[6]:
df["comment"].unique()
[6]:
array(['cardinality is 1:n', 'object is obsolete'], dtype=object)
[11]:
df.groupby("comment").size().reset_index(name='counts')
[11]:
              comment  counts
0  cardinality is 1:n      12
1  object is obsolete      12
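In the XAO table, every row flagged with a cardinality comment shares its object_id with another row; for example, both "trunk region" and "trunk" point to UBERON:0002100. A quick way to make those pairs visible is to group the flagged rows by object_id. The sketch below uses a handful of rows copied from the table so that it runs without the output file; reading the flag as "several subjects share one object" is an inference from this table, not a statement of how OAK computes it.

```python
import pandas as pd

# A few rows copied from the XAO results above (columns trimmed for brevity).
rows = [
    ("XAO:0000054", "trunk region", "UBERON:0002100", "cardinality is 1:n"),
    ("XAO:0003025", "trunk", "UBERON:0002100", "cardinality is 1:n"),
    ("XAO:0000427", "gasserian ganglion", "UBERON:0001675", "cardinality is 1:n"),
    ("XAO:0000428", "trigeminal ganglion", "UBERON:0001675", "cardinality is 1:n"),
    ("XAO:0003012", "cell", "GO:0005623", "object is obsolete"),
]
sample = pd.DataFrame(
    rows, columns=["subject_id", "subject_label", "object_id", "comment"]
)

# Keep only the cardinality rows, then list the subjects that share each object.
flagged = sample[sample["comment"].str.contains("cardinality")]
shared = (
    flagged.groupby("object_id")["subject_id"]
    .agg(list)
    .reset_index(name="subjects")
)
print(shared)
```

The same grouping can be applied to the full df loaded from output/xao-invalid.sssom.tsv.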

Example: CL

CL has a broader range of mappings, which are stored in the ontology as xrefs.

[14]:
!runoak --quiet -i sqlite:obo:cl validate-mappings -O sssom -o output/cl-invalid.sssom.tsv >& output/LOG
[15]:
df = pd.read_csv("output/cl-invalid.sssom.tsv", sep="\t", comment="#")
df
[15]:
subject_id subject_label predicate_id object_id object_label mapping_justification subject_source object_source mapping_cardinality comment
0 CARO:0000013 cell oio:hasDbXref GO:0005623 obsolete cell semapv:UnspecifiedMatching CARO GO 1:1 object is obsolete
1 CL:0000000 cell oio:hasDbXref GO:0005623 obsolete cell semapv:UnspecifiedMatching CL GO 1:1 object is obsolete
2 CL:0000019 sperm oio:hasDbXref BTO:0001277 spermatozoon semapv:UnspecifiedMatching CL BTO n:n cardinality is n:n
3 CL:0000019 sperm oio:hasDbXref BTO:0002046 spermatozoid semapv:UnspecifiedMatching CL BTO n:1 cardinality is n:1
4 CL:0000019 sperm oio:hasDbXref CALOHA:TS-0949 NaN semapv:UnspecifiedMatching CL CALOHA 1:n cardinality is 1:n
... ... ... ... ... ... ... ... ... ... ...
3824 UBERON:2001316 anterior lateral line placode oio:hasDbXref EFO:0003461 obsolete_anterior lateral line placode semapv:UnspecifiedMatching UBERON EFO 1:1 object is obsolete
3825 UBERON:2001389 regeneration epithelium of fin/limb oio:hasDbXref EFO:0003682 obsolete_regeneration epithelium semapv:UnspecifiedMatching UBERON EFO 1:1 object is obsolete
3826 UBERON:2001391 anterior lateral line ganglion oio:hasDbXref EFO:0003683 obsolete_anterior lateral line ganglion semapv:UnspecifiedMatching UBERON EFO 1:1 object is obsolete
3827 UBERON:2001468 anterior lateral line system oio:hasDbXref EFO:0003691 obsolete_anterior lateral line system semapv:UnspecifiedMatching UBERON EFO 1:1 object is obsolete
3828 UBERON:6000004 panarthropod head oio:hasDbXref FBbt:00000004 head semapv:UnspecifiedMatching UBERON FBbt 1:n cardinality is 1:n

3829 rows × 10 columns

[16]:
df.groupby("comment").size().reset_index(name='counts')
[16]:
                                    comment  counts
0                        cardinality is 1:n     608
1                        cardinality is n:1    2588
2                        cardinality is n:n     155
3                        object is obsolete     438
4   object is obsolete | cardinality is n:1       9
5                       subject is obsolete      13
6  subject is obsolete | cardinality is 1:n      11
7  subject is obsolete | object is obsolete       2
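Some comments combine two problems, separated by " | ". To count how often each individual problem occurs, the comment can be split on that separator and exploded into one row per problem. The sketch below works from the summary counts shown above so that it is self-contained; the names summary, parts, and per_issue are illustrative only.

```python
import pandas as pd

# Summary counts copied from the grouped CL output above.
summary = pd.DataFrame(
    {
        "comment": [
            "cardinality is 1:n",
            "cardinality is n:1",
            "cardinality is n:n",
            "object is obsolete",
            "object is obsolete | cardinality is n:1",
            "subject is obsolete",
            "subject is obsolete | cardinality is 1:n",
            "subject is obsolete | object is obsolete",
        ],
        "counts": [608, 2588, 155, 438, 9, 13, 11, 2],
    }
)

# Split with plain str.split: "|" is a regex metacharacter, so a literal
# split avoids pandas interpreting the separator as a pattern.
parts = summary.assign(
    issue=summary["comment"].map(lambda c: c.split(" | "))
).explode("issue")

# A row with two problems now contributes its count to both of them.
per_issue = parts.groupby("issue")["counts"].sum().reset_index()
print(per_issue)
```

On the full results, the same split and explode can be applied to df["comment"] directly, followed by value_counts().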

We can summarize these in groups:

[18]:
df.groupby(["comment", "subject_source", "object_source"]).size().reset_index(name='counts')
[18]:
                                      comment  subject_source  object_source  counts
0                          cardinality is 1:n              CL            BTO      35
1                          cardinality is 1:n              CL         CALOHA      25
2                          cardinality is 1:n              CL            FAO       3
3                          cardinality is 1:n              CL            FMA      20
4                          cardinality is 1:n              CL            GOC       4
...                                       ...             ...            ...     ...
121  subject is obsolete | cardinality is 1:n              CL            FAO       1
122  subject is obsolete | cardinality is 1:n              CL            FMA       3
123  subject is obsolete | cardinality is 1:n              CL            ILX       6
124  subject is obsolete | object is obsolete              CL           FBbt       1
125  subject is obsolete | object is obsolete              RO             RO       1

126 rows × 4 columns
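With 126 groups, the long format is hard to scan. One option is to pivot it so that each object source becomes a row and each comment a column. The sketch below uses only the CL rows visible in the truncated output above, so the numbers are a partial illustration rather than the full result; grouped and wide are illustrative names.

```python
import pandas as pd

# Visible CL rows copied from the truncated grouped output above.
grouped = pd.DataFrame(
    [
        ("cardinality is 1:n", "CL", "BTO", 35),
        ("cardinality is 1:n", "CL", "CALOHA", 25),
        ("cardinality is 1:n", "CL", "FAO", 3),
        ("cardinality is 1:n", "CL", "FMA", 20),
        ("cardinality is 1:n", "CL", "GOC", 4),
        ("subject is obsolete | cardinality is 1:n", "CL", "FAO", 1),
        ("subject is obsolete | cardinality is 1:n", "CL", "FMA", 3),
        ("subject is obsolete | cardinality is 1:n", "CL", "ILX", 6),
    ],
    columns=["comment", "subject_source", "object_source", "counts"],
)

# One row per object source, one column per comment; absent pairs become 0.
wide = grouped.pivot_table(
    index="object_source",
    columns="comment",
    values="counts",
    aggfunc="sum",
    fill_value=0,
)
print(wide)
```

Applied to the full grouped frame, this gives a compact matrix showing which target ontologies account for which kinds of problem.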
