{
"cells": [
{
"cell_type": "markdown",
"id": "0a28b88d-4deb-4d0a-a110-f27adf077e23",
"metadata": {},
"source": [
"# OAK validate-mappings command\n",
"\n",
"This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n",
"\n",
"This notebook provides examples for the `validate-mappings` command,\n",
"which is part of a suite of *validate* commands.\n",
"\n",
"## Help Option\n",
"\n",
"You can get help on any OAK command using `--help`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c223f678-f82f-4b06-8e19-1a5b7323e571",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Usage: runoak validate-mappings [OPTIONS] [TERMS]...\n",
"\n",
" Validates mappings in ontology using additional ontologies.\n",
"\n",
" To run:\n",
"\n",
" runoak validate-mappings -i db/uberon.db\n",
"\n",
" For sssom:\n",
"\n",
" runoak validate-mappings -i db/uberon.db -o bad-mappings.sssom.tsv\n",
"\n",
" By default this will attempt to download and connect to sqlite versions of\n",
" different ontologies.\n",
"\n",
" You can customize this:\n",
"\n",
" runoak validate-mappings -i db/uberon.db --adapter-mapping\n",
" uberon=db/uberon.db --adapter-mapping zfa=db/zfa.db\n",
"\n",
" You can use \"*\" as a wildcard, in the case where you have an application\n",
" ontology with many mapped entities merged in:\n",
"\n",
" runoak validate-mappings -i db/uberon.db --adapter-mapping\n",
" \"*\"=db/merged.db\"\n",
"\n",
"Options:\n",
" --autolabel / --no-autolabel If set, results will automatically have labels\n",
" assigned [default: autolabel]\n",
" -O, --output-type TEXT Desired output type\n",
" --adapter-mapping TEXT Multiple prefix=selector pairs, e.g.\n",
" --adapter-mapping uberon=db/uberon.db\n",
" -o, --output FILENAME Output file, e.g. obo file\n",
" --help Show this message and exit.\n"
]
}
],
"source": [
"!runoak validate-mappings --help"
]
},
{
"cell_type": "markdown",
"id": "01f38163-db22-4c51-ae46-10e8b8e6d53c",
"metadata": {},
"source": [
"## Example: Validate mappings in XAO\n",
"\n",
"XAO is an anatomical ontology for *Xenopus*. It has mappings to ontologies such as UBERON, GO, and CL."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c9b86e52-87a7-449c-baac-81981e7ce632",
"metadata": {},
"outputs": [],
"source": [
"!runoak -i sqlite:obo:xao validate-mappings -O sssom -o output/xao-invalid.sssom.tsv"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5fc9b15d-cc81-400a-8660-f92491baa120",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" subject_id subject_label predicate_id object_id \\\n",
"0 XAO:0000054 trunk region oio:hasDbXref UBERON:0002100 \n",
"1 XAO:0000100 cardiovascular system oio:hasDbXref UBERON:0004535 \n",
"2 XAO:0000204 peripheral nerve oio:hasDbXref UBERON:0002003 \n",
"3 XAO:0000227 eye primordium oio:hasDbXref UBERON:0003071 \n",
"4 XAO:0000282 visceral pouch oio:hasDbXref UBERON:0004117 \n",
"5 XAO:0000376 omphalomesenteric vein oio:hasDbXref UBERON:0005487 \n",
"6 XAO:0000427 gasserian ganglion oio:hasDbXref UBERON:0001675 \n",
"7 XAO:0000428 trigeminal ganglion oio:hasDbXref UBERON:0001675 \n",
"8 XAO:0001010 circulatory system oio:hasDbXref UBERON:0004535 \n",
"9 XAO:0003001 anatomical group oio:hasDbXref CARO:0000054 \n",
"10 XAO:0003012 cell oio:hasDbXref GO:0005623 \n",
"11 XAO:0003025 trunk oio:hasDbXref UBERON:0002100 \n",
"12 XAO:0003160 anatomical cluster oio:hasDbXref CARO:0000041 \n",
"13 XAO:0003163 basal lamina oio:hasDbXref CARO:0000065 \n",
"14 XAO:0003257 myelin accumulating cell oio:hasDbXref CL:0000328 \n",
"15 XAO:0004090 optic field oio:hasDbXref UBERON:0003071 \n",
"16 XAO:0004147 vitelline vein oio:hasDbXref UBERON:0005487 \n",
"17 XAO:0004165 intersomitic artery oio:hasDbXref UBERON:0006001 \n",
"18 XAO:0004260 pharyngeal pouch oio:hasDbXref UBERON:0004117 \n",
"19 XAO:0004290 cell part oio:hasDbXref GO:0044464 \n",
"20 XAO:0004615 basal body oio:hasDbXref GO:0005932 \n",
"21 XAO:0004621 epidermal cell oio:hasDbXref CL:1000396 \n",
"22 XAO:0005007 Muller cell oio:hasDbXref CL:0011107 \n",
"23 XAO:1000007 tailbud stage oio:hasDbXref UBERON:0009741 \n",
"\n",
" object_label mapping_justification \\\n",
"0 trunk semapv:UnspecifiedMatching \n",
"1 cardiovascular system semapv:UnspecifiedMatching \n",
"2 obsolete peripheral nerve semapv:UnspecifiedMatching \n",
"3 eye primordium semapv:UnspecifiedMatching \n",
"4 pharyngeal pouch semapv:UnspecifiedMatching \n",
"5 vitelline vein semapv:UnspecifiedMatching \n",
"6 trigeminal ganglion semapv:UnspecifiedMatching \n",
"7 trigeminal ganglion semapv:UnspecifiedMatching \n",
"8 cardiovascular system semapv:UnspecifiedMatching \n",
"9 anatomical group semapv:UnspecifiedMatching \n",
"10 obsolete cell semapv:UnspecifiedMatching \n",
"11 trunk semapv:UnspecifiedMatching \n",
"12 anatomical cluster semapv:UnspecifiedMatching \n",
"13 basal lamina semapv:UnspecifiedMatching \n",
"14 obsolete myelin accumulating cell semapv:UnspecifiedMatching \n",
"15 eye primordium semapv:UnspecifiedMatching \n",
"16 vitelline vein semapv:UnspecifiedMatching \n",
"17 NaN semapv:UnspecifiedMatching \n",
"18 pharyngeal pouch semapv:UnspecifiedMatching \n",
"19 obsolete cell part semapv:UnspecifiedMatching \n",
"20 NaN semapv:UnspecifiedMatching \n",
"21 NaN semapv:UnspecifiedMatching \n",
"22 obsolete Muller cell semapv:UnspecifiedMatching \n",
"23 obsolete tailbud stage semapv:UnspecifiedMatching \n",
"\n",
" subject_source object_source mapping_cardinality comment \n",
"0 XAO UBERON 1:n cardinality is 1:n \n",
"1 XAO UBERON 1:n cardinality is 1:n \n",
"2 XAO UBERON 1:1 object is obsolete \n",
"3 XAO UBERON 1:n cardinality is 1:n \n",
"4 XAO UBERON 1:n cardinality is 1:n \n",
"5 XAO UBERON 1:n cardinality is 1:n \n",
"6 XAO UBERON 1:n cardinality is 1:n \n",
"7 XAO UBERON 1:n cardinality is 1:n \n",
"8 XAO UBERON 1:n cardinality is 1:n \n",
"9 XAO CARO 1:1 object is obsolete \n",
"10 XAO GO 1:1 object is obsolete \n",
"11 XAO UBERON 1:n cardinality is 1:n \n",
"12 XAO CARO 1:1 object is obsolete \n",
"13 XAO CARO 1:1 object is obsolete \n",
"14 XAO CL 1:1 object is obsolete \n",
"15 XAO UBERON 1:n cardinality is 1:n \n",
"16 XAO UBERON 1:n cardinality is 1:n \n",
"17 XAO UBERON 1:1 object is obsolete \n",
"18 XAO UBERON 1:n cardinality is 1:n \n",
"19 XAO GO 1:1 object is obsolete \n",
"20 XAO GO 1:1 object is obsolete \n",
"21 XAO CL 1:1 object is obsolete \n",
"22 XAO CL 1:1 object is obsolete \n",
"23 XAO UBERON 1:1 object is obsolete "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"output/xao-invalid.sssom.tsv\", sep=\"\\t\", comment=\"#\")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "f4209133-fd5c-4ecd-a0c4-a5dc4cb8a57a",
"metadata": {},
"source": [
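"Here we can see a mixture of cardinality issues (for example, both `XAO:0000054` and `XAO:0003025` map to `UBERON:0002100`)\n",
"and obsoletion issues (for example, the mapping to `GO:0005623`, *obsolete cell*).\n",
"\n",
"Note that behind the scenes this command connected to external ontologies such as GO, CL, and UBERON\n",
"in order to check, among other things, obsoletion status."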
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "421c556c-df3e-4281-914b-613e3d467036",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['cardinality is 1:n', 'object is obsolete'], dtype=object)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"comment\"].unique()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "aea2cfe0-70bf-4b76-89e2-2bfdbdd3a084",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" comment counts\n",
"0 cardinality is 1:n 12\n",
"1 object is obsolete 12"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(\"comment\").size().reset_index(name='counts')"
]
},
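{
"cell_type": "markdown",
"id": "3b9f6c1e-7a42-4d58-9e0b-5c2d8f14a6b7",
"metadata": {},
"source": [
"One way to triage the report is to look at one class of problem at a time.\n",
"The cell below is a minimal sketch, not part of OAK itself: it assumes that obsoletion problems\n",
"always carry the word `obsolete` in the `comment` column, as they do in the output above,\n",
"and selects only those rows. The variable name `obsolete` is chosen here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4d07a95-2c61-4f3b-8a7e-91b3c6d5f208",
"metadata": {},
"outputs": [],
"source": [
"# Keep only rows whose comment mentions an obsolete subject or object\n",
"# (assumption: the word \"obsolete\" appears in every such comment)\n",
"obsolete = df[df[\"comment\"].str.contains(\"obsolete\", na=False)]\n",
"obsolete[[\"subject_id\", \"object_id\", \"object_label\", \"object_source\"]]"
]
},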
{
"cell_type": "markdown",
"id": "47d16aab-beae-4797-b2e4-e567db7dd06f",
"metadata": {},
"source": [
"## Example: Validate mappings in CL\n",
"\n",
"CL has a broader range of mappings, which are recorded in the ontology as xrefs."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "59d83f58-058d-43d0-a5f9-17dde3b2af1a",
"metadata": {},
"outputs": [],
"source": [
"!runoak --quiet -i sqlite:obo:cl validate-mappings -O sssom -o output/cl-invalid.sssom.tsv >& output/LOG"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2515b428-1d54-4756-a429-9ca21002e0d4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" subject_id subject_label predicate_id \\\n",
"0 CARO:0000013 cell oio:hasDbXref \n",
"1 CL:0000000 cell oio:hasDbXref \n",
"2 CL:0000019 sperm oio:hasDbXref \n",
"3 CL:0000019 sperm oio:hasDbXref \n",
"4 CL:0000019 sperm oio:hasDbXref \n",
"... ... ... ... \n",
"3824 UBERON:2001316 anterior lateral line placode oio:hasDbXref \n",
"3825 UBERON:2001389 regeneration epithelium of fin/limb oio:hasDbXref \n",
"3826 UBERON:2001391 anterior lateral line ganglion oio:hasDbXref \n",
"3827 UBERON:2001468 anterior lateral line system oio:hasDbXref \n",
"3828 UBERON:6000004 panarthropod head oio:hasDbXref \n",
"\n",
" object_id object_label \\\n",
"0 GO:0005623 obsolete cell \n",
"1 GO:0005623 obsolete cell \n",
"2 BTO:0001277 spermatozoon \n",
"3 BTO:0002046 spermatozoid \n",
"4 CALOHA:TS-0949 NaN \n",
"... ... ... \n",
"3824 EFO:0003461 obsolete_anterior lateral line placode \n",
"3825 EFO:0003682 obsolete_regeneration epithelium \n",
"3826 EFO:0003683 obsolete_anterior lateral line ganglion \n",
"3827 EFO:0003691 obsolete_anterior lateral line system \n",
"3828 FBbt:00000004 head \n",
"\n",
" mapping_justification subject_source object_source \\\n",
"0 semapv:UnspecifiedMatching CARO GO \n",
"1 semapv:UnspecifiedMatching CL GO \n",
"2 semapv:UnspecifiedMatching CL BTO \n",
"3 semapv:UnspecifiedMatching CL BTO \n",
"4 semapv:UnspecifiedMatching CL CALOHA \n",
"... ... ... ... \n",
"3824 semapv:UnspecifiedMatching UBERON EFO \n",
"3825 semapv:UnspecifiedMatching UBERON EFO \n",
"3826 semapv:UnspecifiedMatching UBERON EFO \n",
"3827 semapv:UnspecifiedMatching UBERON EFO \n",
"3828 semapv:UnspecifiedMatching UBERON FBbt \n",
"\n",
" mapping_cardinality comment \n",
"0 1:1 object is obsolete \n",
"1 1:1 object is obsolete \n",
"2 n:n cardinality is n:n \n",
"3 n:1 cardinality is n:1 \n",
"4 1:n cardinality is 1:n \n",
"... ... ... \n",
"3824 1:1 object is obsolete \n",
"3825 1:1 object is obsolete \n",
"3826 1:1 object is obsolete \n",
"3827 1:1 object is obsolete \n",
"3828 1:n cardinality is 1:n \n",
"\n",
"[3829 rows x 10 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\"output/cl-invalid.sssom.tsv\", sep=\"\\t\", comment=\"#\")\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0723f096-67d4-4727-bf77-13badec03878",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" comment counts\n",
"0 cardinality is 1:n 608\n",
"1 cardinality is n:1 2588\n",
"2 cardinality is n:n 155\n",
"3 object is obsolete 438\n",
"4 object is obsolete | cardinality is n:1 9\n",
"5 subject is obsolete 13\n",
"6 subject is obsolete | cardinality is 1:n 11\n",
"7 subject is obsolete | object is obsolete 2"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(\"comment\").size().reset_index(name='counts')"
]
},
{
"cell_type": "markdown",
"id": "69020950-4636-4ec3-a03b-a552a837f3f9",
"metadata": {},
"source": [
"We can break these counts down further by also grouping on the subject and object sources:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "8af2fb74-91ec-4c61-90a9-68c860a6e16a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" comment subject_source object_source \\\n",
"0 cardinality is 1:n CL BTO \n",
"1 cardinality is 1:n CL CALOHA \n",
"2 cardinality is 1:n CL FAO \n",
"3 cardinality is 1:n CL FMA \n",
"4 cardinality is 1:n CL GOC \n",
".. ... ... ... \n",
"121 subject is obsolete | cardinality is 1:n CL FAO \n",
"122 subject is obsolete | cardinality is 1:n CL FMA \n",
"123 subject is obsolete | cardinality is 1:n CL ILX \n",
"124 subject is obsolete | object is obsolete CL FBbt \n",
"125 subject is obsolete | object is obsolete RO RO \n",
"\n",
" counts \n",
"0 35 \n",
"1 25 \n",
"2 3 \n",
"3 20 \n",
"4 4 \n",
".. ... \n",
"121 1 \n",
"122 3 \n",
"123 6 \n",
"124 1 \n",
"125 1 \n",
"\n",
"[126 rows x 4 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby([\"comment\", \"subject_source\", \"object_source\"]).size().reset_index(name='counts')"
]
},
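{
"cell_type": "markdown",
"id": "6f2a8d13-b5c7-49e0-a1d4-7e83c09b2f5a",
"metadata": {},
"source": [
"Some rows carry more than one problem, with the individual comments joined by ` | `\n",
"(for example `subject is obsolete | cardinality is 1:n`).\n",
"Assuming that this separator is used consistently, which the output above suggests but does not guarantee,\n",
"a sketch like the following counts each kind of issue on its own. The variable name `issues` is illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c81e5b47-0d39-4a6f-b2c8-3f7a96d1e054",
"metadata": {},
"outputs": [],
"source": [
"# Split compound comments on the '|' separator (assumed delimiter),\n",
"# giving one row per individual issue, then count each issue type\n",
"issues = df[\"comment\"].str.split(\"|\").explode().str.strip()\n",
"issues.value_counts()"
]
},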
{
"cell_type": "code",
"execution_count": null,
"id": "d699f471-d70a-4f3e-aac4-3cdd55e66380",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}