OAK validate-mappings command
This notebook supplements the main OAK CLI docs.
It provides examples for the validate-mappings command,
which forms part of a suite of validate commands.
Help Option
You can get help on any OAK command using --help.
[2]:
!runoak validate-mappings --help
Usage: runoak validate-mappings [OPTIONS] [TERMS]...
Validates mappings in ontology using additional ontologies.
To run:
runoak validate-mappings -i db/uberon.db
For sssom:
runoak validate-mappings -i db/uberon.db -o bad-mappings.sssom.tsv
By default this will attempt to download and connect to sqlite versions of
different ontologies.
You can customize this:
runoak validate-mappings -i db/uberon.db --adapter-mapping
uberon=db/uberon.db --adapter-mapping zfa=db/zfa.db
You can use "*" as a wildcard, in the case where you have an application
ontology with many mapped entities merged in:
runoak validate-mappings -i db/uberon.db --adapter-mapping
"*"=db/merged.db"
Options:
--autolabel / --no-autolabel If set, results will automatically have labels
assigned [default: autolabel]
-O, --output-type TEXT Desired output type
--adapter-mapping TEXT Multiple prefix=selector pairs, e.g.
--adapter-mapping uberon=db/uberon.db
-o, --output FILENAME Output file, e.g. obo file
--help Show this message and exit.
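When several prefixes need their own local databases, the repeated --adapter-mapping options can be generated rather than typed by hand. The following is a minimal sketch, not part of OAK: `build_validate_mappings_argv` is a hypothetical helper, the file paths are placeholders, and it only assembles options that appear in the help text above (plus `-O sssom`, as used in the examples later in this notebook).

```python
import shlex


def build_validate_mappings_argv(input_selector, adapter_mappings=None, output=None):
    """Assemble a `runoak ... validate-mappings` command line as an argv list.

    Hypothetical helper (not part of OAK). It only uses options shown in
    this notebook: -i, --adapter-mapping, -O and -o.
    """
    argv = ["runoak", "-i", input_selector, "validate-mappings"]
    # One --adapter-mapping option per prefix=selector pair.
    for prefix, selector in (adapter_mappings or {}).items():
        argv += ["--adapter-mapping", f"{prefix}={selector}"]
    # Write SSSOM output to a file only when a path is given.
    if output is not None:
        argv += ["-O", "sssom", "-o", output]
    return argv


argv = build_validate_mappings_argv(
    "db/uberon.db",
    adapter_mappings={"uberon": "db/uberon.db", "zfa": "db/zfa.db"},
    output="bad-mappings.sssom.tsv",
)
# shlex.join quotes arguments such as the "*" wildcard for safe pasting into a shell.
print(shlex.join(argv))
```

The argv list can be passed directly to `subprocess.run`, which avoids shell quoting problems with the "*" wildcard.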
Example: Validate mappings in XAO
XAO is an anatomical ontology for Xenopus. It has mappings to ontologies such as UBERON, GO, and CL.
[3]:
!runoak -i sqlite:obo:xao validate-mappings -O sssom -o output/xao-invalid.sssom.tsv
[5]:
import pandas as pd
df = pd.read_csv("output/xao-invalid.sssom.tsv", sep="\t", comment="#")
df
[5]:
 | subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | subject_source | object_source | mapping_cardinality | comment |
---|---|---|---|---|---|---|---|---|---|---|
0 | XAO:0000054 | trunk region | oio:hasDbXref | UBERON:0002100 | trunk | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
1 | XAO:0000100 | cardiovascular system | oio:hasDbXref | UBERON:0004535 | cardiovascular system | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
2 | XAO:0000204 | peripheral nerve | oio:hasDbXref | UBERON:0002003 | obsolete peripheral nerve | semapv:UnspecifiedMatching | XAO | UBERON | 1:1 | object is obsolete |
3 | XAO:0000227 | eye primordium | oio:hasDbXref | UBERON:0003071 | eye primordium | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
4 | XAO:0000282 | visceral pouch | oio:hasDbXref | UBERON:0004117 | pharyngeal pouch | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
5 | XAO:0000376 | omphalomesenteric vein | oio:hasDbXref | UBERON:0005487 | vitelline vein | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
6 | XAO:0000427 | gasserian ganglion | oio:hasDbXref | UBERON:0001675 | trigeminal ganglion | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
7 | XAO:0000428 | trigeminal ganglion | oio:hasDbXref | UBERON:0001675 | trigeminal ganglion | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
8 | XAO:0001010 | circulatory system | oio:hasDbXref | UBERON:0004535 | cardiovascular system | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
9 | XAO:0003001 | anatomical group | oio:hasDbXref | CARO:0000054 | anatomical group | semapv:UnspecifiedMatching | XAO | CARO | 1:1 | object is obsolete |
10 | XAO:0003012 | cell | oio:hasDbXref | GO:0005623 | obsolete cell | semapv:UnspecifiedMatching | XAO | GO | 1:1 | object is obsolete |
11 | XAO:0003025 | trunk | oio:hasDbXref | UBERON:0002100 | trunk | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
12 | XAO:0003160 | anatomical cluster | oio:hasDbXref | CARO:0000041 | anatomical cluster | semapv:UnspecifiedMatching | XAO | CARO | 1:1 | object is obsolete |
13 | XAO:0003163 | basal lamina | oio:hasDbXref | CARO:0000065 | basal lamina | semapv:UnspecifiedMatching | XAO | CARO | 1:1 | object is obsolete |
14 | XAO:0003257 | myelin accumulating cell | oio:hasDbXref | CL:0000328 | obsolete myelin accumulating cell | semapv:UnspecifiedMatching | XAO | CL | 1:1 | object is obsolete |
15 | XAO:0004090 | optic field | oio:hasDbXref | UBERON:0003071 | eye primordium | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
16 | XAO:0004147 | vitelline vein | oio:hasDbXref | UBERON:0005487 | vitelline vein | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
17 | XAO:0004165 | intersomitic artery | oio:hasDbXref | UBERON:0006001 | NaN | semapv:UnspecifiedMatching | XAO | UBERON | 1:1 | object is obsolete |
18 | XAO:0004260 | pharyngeal pouch | oio:hasDbXref | UBERON:0004117 | pharyngeal pouch | semapv:UnspecifiedMatching | XAO | UBERON | 1:n | cardinality is 1:n |
19 | XAO:0004290 | cell part | oio:hasDbXref | GO:0044464 | obsolete cell part | semapv:UnspecifiedMatching | XAO | GO | 1:1 | object is obsolete |
20 | XAO:0004615 | basal body | oio:hasDbXref | GO:0005932 | NaN | semapv:UnspecifiedMatching | XAO | GO | 1:1 | object is obsolete |
21 | XAO:0004621 | epidermal cell | oio:hasDbXref | CL:1000396 | NaN | semapv:UnspecifiedMatching | XAO | CL | 1:1 | object is obsolete |
22 | XAO:0005007 | Muller cell | oio:hasDbXref | CL:0011107 | obsolete Muller cell | semapv:UnspecifiedMatching | XAO | CL | 1:1 | object is obsolete |
23 | XAO:1000007 | tailbud stage | oio:hasDbXref | UBERON:0009741 | obsolete tailbud stage | semapv:UnspecifiedMatching | XAO | UBERON | 1:1 | object is obsolete |
Here we can see a mixture of cardinality issues and obsoletion issues.
Note that behind the scenes this command connected to external ontologies such as GO, CL, and UBERON in order to check details such as obsoletion status.
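To make the cardinality comments concrete, here is a minimal sketch of how such labels could be derived from a list of subject/object pairs. It is not OAK's implementation, and `mapping_cardinalities` is a hypothetical name. The convention for the right-hand side (it becomes n when several subjects map to the same object) is inferred from the rows above; the left-hand side (n when one subject maps to several objects in the same target ontology) is assumed by symmetry.

```python
from collections import defaultdict


def prefix(curie):
    """Return the prefix of a CURIE, e.g. 'UBERON' for 'UBERON:0002100'."""
    return curie.split(":", 1)[0]


def mapping_cardinalities(pairs):
    """Label each (subject_id, object_id) pair as 1:1, 1:n, n:1 or n:n.

    `pairs` must be a list (it is traversed twice). The labelling
    convention is an assumption, see the text above.
    """
    objects_of = defaultdict(set)   # (subject, object prefix) -> mapped objects
    subjects_of = defaultdict(set)  # (object, subject prefix) -> mapped subjects
    for subject, obj in pairs:
        objects_of[(subject, prefix(obj))].add(obj)
        subjects_of[(obj, prefix(subject))].add(subject)

    labels = {}
    for subject, obj in pairs:
        left = "n" if len(objects_of[(subject, prefix(obj))]) > 1 else "1"
        right = "n" if len(subjects_of[(obj, prefix(subject))]) > 1 else "1"
        labels[(subject, obj)] = f"{left}:{right}"
    return labels


# Three pairs taken from the table above.
pairs = [
    ("XAO:0000054", "UBERON:0002100"),  # trunk region -> trunk
    ("XAO:0003025", "UBERON:0002100"),  # trunk -> trunk
    ("XAO:0000204", "UBERON:0002003"),  # peripheral nerve -> obsolete peripheral nerve
]
labels = mapping_cardinalities(pairs)
for pair, label in labels.items():
    print(pair, label)
```

Under this convention, the two XAO terms that share UBERON:0002100 are both flagged, because a one-to-one mapping is expected between each XAO term and its UBERON counterpart.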
[6]:
df["comment"].unique()
[6]:
array(['cardinality is 1:n', 'object is obsolete'], dtype=object)
[11]:
df.groupby("comment").size().reset_index(name='counts')
[11]:
 | comment | counts |
---|---|---|
0 | cardinality is 1:n | 12 |
1 | object is obsolete | 12 |
Example: CL
CL has a broader range of mappings, which are stored in the ontology as xrefs.
[14]:
!runoak --quiet -i sqlite:obo:cl validate-mappings -O sssom -o output/cl-invalid.sssom.tsv >& output/LOG
[15]:
df = pd.read_csv("output/cl-invalid.sssom.tsv", sep="\t", comment="#")
df
[15]:
 | subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | subject_source | object_source | mapping_cardinality | comment |
---|---|---|---|---|---|---|---|---|---|---|
0 | CARO:0000013 | cell | oio:hasDbXref | GO:0005623 | obsolete cell | semapv:UnspecifiedMatching | CARO | GO | 1:1 | object is obsolete |
1 | CL:0000000 | cell | oio:hasDbXref | GO:0005623 | obsolete cell | semapv:UnspecifiedMatching | CL | GO | 1:1 | object is obsolete |
2 | CL:0000019 | sperm | oio:hasDbXref | BTO:0001277 | spermatozoon | semapv:UnspecifiedMatching | CL | BTO | n:n | cardinality is n:n |
3 | CL:0000019 | sperm | oio:hasDbXref | BTO:0002046 | spermatozoid | semapv:UnspecifiedMatching | CL | BTO | n:1 | cardinality is n:1 |
4 | CL:0000019 | sperm | oio:hasDbXref | CALOHA:TS-0949 | NaN | semapv:UnspecifiedMatching | CL | CALOHA | 1:n | cardinality is 1:n |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3824 | UBERON:2001316 | anterior lateral line placode | oio:hasDbXref | EFO:0003461 | obsolete_anterior lateral line placode | semapv:UnspecifiedMatching | UBERON | EFO | 1:1 | object is obsolete |
3825 | UBERON:2001389 | regeneration epithelium of fin/limb | oio:hasDbXref | EFO:0003682 | obsolete_regeneration epithelium | semapv:UnspecifiedMatching | UBERON | EFO | 1:1 | object is obsolete |
3826 | UBERON:2001391 | anterior lateral line ganglion | oio:hasDbXref | EFO:0003683 | obsolete_anterior lateral line ganglion | semapv:UnspecifiedMatching | UBERON | EFO | 1:1 | object is obsolete |
3827 | UBERON:2001468 | anterior lateral line system | oio:hasDbXref | EFO:0003691 | obsolete_anterior lateral line system | semapv:UnspecifiedMatching | UBERON | EFO | 1:1 | object is obsolete |
3828 | UBERON:6000004 | panarthropod head | oio:hasDbXref | FBbt:00000004 | head | semapv:UnspecifiedMatching | UBERON | FBbt | 1:n | cardinality is 1:n |
3829 rows × 10 columns
[16]:
df.groupby("comment").size().reset_index(name='counts')
[16]:
 | comment | counts |
---|---|---|
0 | cardinality is 1:n | 608 |
1 | cardinality is n:1 | 2588 |
2 | cardinality is n:n | 155 |
3 | object is obsolete | 438 |
4 | object is obsolete \| cardinality is n:1 | 9 |
5 | subject is obsolete | 13 |
6 | subject is obsolete \| cardinality is 1:n | 11 |
7 | subject is obsolete \| object is obsolete | 2 |
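Some rows above carry more than one issue in a single comment, joined by a pipe character, so grouping on the raw comment undercounts each individual issue. The following is a minimal sketch of tallying issues separately; `count_issues` is a hypothetical helper, and the assumption that "|" is only ever used as a separator between issues is inferred from the output above.

```python
from collections import Counter


def count_issues(comments):
    """Count individual issues in comments that may combine several with '|'."""
    counts = Counter()
    for comment in comments:
        # Split combined comments and drop surrounding whitespace.
        for issue in str(comment).split("|"):
            issue = issue.strip()
            if issue:
                counts[issue] += 1
    return counts


# Comment values of the kind shown in the table above.
comments = [
    "cardinality is 1:n",
    "object is obsolete | cardinality is n:1",
    "subject is obsolete | object is obsolete",
]
issue_counts = count_issues(comments)
print(issue_counts.most_common())
```

The function accepts any iterable of strings, so with the DataFrame loaded earlier it could be called as `count_issues(df["comment"])`.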
We can summarize these in groups:
[18]:
df.groupby(["comment", "subject_source", "object_source"]).size().reset_index(name='counts')
[18]:
 | comment | subject_source | object_source | counts |
---|---|---|---|---|
0 | cardinality is 1:n | CL | BTO | 35 |
1 | cardinality is 1:n | CL | CALOHA | 25 |
2 | cardinality is 1:n | CL | FAO | 3 |
3 | cardinality is 1:n | CL | FMA | 20 |
4 | cardinality is 1:n | CL | GOC | 4 |
... | ... | ... | ... | ... |
121 | subject is obsolete \| cardinality is 1:n | CL | FAO | 1 |
122 | subject is obsolete \| cardinality is 1:n | CL | FMA | 3 |
123 | subject is obsolete \| cardinality is 1:n | CL | ILX | 6 |
124 | subject is obsolete \| object is obsolete | CL | FBbt | 1 |
125 | subject is obsolete \| object is obsolete | RO | RO | 1 |
126 rows × 4 columns
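A grouped summary shows where the problems are, but fixing them usually starts from a list of the affected identifiers. As a rough sketch, the rows flagged with an obsolete object can be collected per target ontology; `obsolete_objects_by_source` is a hypothetical helper that assumes each row is a dict with the `object_id`, `object_source`, and `comment` columns shown above.

```python
from collections import defaultdict


def obsolete_objects_by_source(rows):
    """Group the IDs of obsolete mapping targets by their object_source."""
    grouped = defaultdict(set)
    for row in rows:
        # Substring match so that combined comments are also picked up.
        if "object is obsolete" in str(row["comment"]):
            grouped[row["object_source"]].add(row["object_id"])
    # Sort sources and IDs for stable, readable output.
    return {source: sorted(ids) for source, ids in sorted(grouped.items())}


# A few rows taken from the XAO results earlier in this notebook.
rows = [
    {"object_id": "GO:0005623", "object_source": "GO", "comment": "object is obsolete"},
    {"object_id": "GO:0044464", "object_source": "GO", "comment": "object is obsolete"},
    {"object_id": "CL:0000328", "object_source": "CL", "comment": "object is obsolete"},
    {"object_id": "UBERON:0002100", "object_source": "UBERON", "comment": "cardinality is 1:n"},
]
obsolete = obsolete_objects_by_source(rows)
print(obsolete)
```

With a DataFrame, the rows can be produced with `df.to_dict("records")`.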