{ "cells": [ { "cell_type": "markdown", "id": "0a28b88d-4deb-4d0a-a110-f27adf077e23", "metadata": {}, "source": [ "# OAK validate-mappings command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `validate-mappings` command.\n", "This forms part of a suite of *validate* commands.\n", " \n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`" ] }, { "cell_type": "code", "execution_count": 2, "id": "c223f678-f82f-4b06-8e19-1a5b7323e571", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak validate-mappings [OPTIONS] [TERMS]...\n", "\n", " Validates mappings in ontology using additional ontologies.\n", "\n", " To run:\n", "\n", " runoak validate-mappings -i db/uberon.db\n", "\n", " For sssom:\n", "\n", " runoak validate-mappings -i db/uberon.db -o bad-mappings.sssom.tsv\n", "\n", " By default this will attempt to download and connect to sqlite versions of\n", " different ontologies.\n", "\n", " You can customize this:\n", "\n", " runoak validate-mappings -i db/uberon.db --adapter-mapping\n", " uberon=db/uberon.db --adapter-mapping zfa=db/zfa.db\n", "\n", " You can use \"*\" as a wildcard, in the case where you have an application\n", " ontology with many mapped entities merged in:\n", "\n", " runoak validate-mappings -i db/uberon.db --adapter-mapping\n", " \"*\"=db/merged.db\"\n", "\n", "Options:\n", " --autolabel / --no-autolabel If set, results will automatically have labels\n", " assigned [default: autolabel]\n", " -O, --output-type TEXT Desired output type\n", " --adapter-mapping TEXT Multiple prefix=selector pairs, e.g.\n", " --adapter-mapping uberon=db/uberon.db\n", " -o, --output FILENAME Output file, e.g. 
obo file\n", " --help Show this message and exit.\n" ] } ], "source": [ "!runoak validate-mappings --help" ] }, { "cell_type": "markdown", "id": "01f38163-db22-4c51-ae46-10e8b8e6d53c", "metadata": {}, "source": [ "## Example: Validate mappings in XAO\n", "\n", "XAO is an anatomical ontology for *Xenopus*. It has mappings to ontologies such as UBERON, GO, and CL." ] }, { "cell_type": "code", "execution_count": 3, "id": "c9b86e52-87a7-449c-baac-81981e7ce632", "metadata": {}, "outputs": [], "source": [ "!runoak -i sqlite:obo:xao validate-mappings -O sssom -o output/xao-invalid.sssom.tsv" ] }, { "cell_type": "code", "execution_count": 5, "id": "5fc9b15d-cc81-400a-8660-f92491baa120", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " subject_id subject_label predicate_id object_id \\\n", "0 XAO:0000054 trunk region oio:hasDbXref UBERON:0002100 \n", "1 XAO:0000100 cardiovascular system oio:hasDbXref UBERON:0004535 \n", "2 XAO:0000204 peripheral nerve oio:hasDbXref UBERON:0002003 \n", "3 XAO:0000227 eye primordium oio:hasDbXref UBERON:0003071 \n", "4 XAO:0000282 visceral pouch oio:hasDbXref UBERON:0004117 \n", "5 XAO:0000376 omphalomesenteric vein oio:hasDbXref UBERON:0005487 \n", "6 XAO:0000427 gasserian ganglion oio:hasDbXref UBERON:0001675 \n", "7 XAO:0000428 trigeminal ganglion oio:hasDbXref UBERON:0001675 \n", "8 XAO:0001010 circulatory system oio:hasDbXref UBERON:0004535 \n", "9 XAO:0003001 anatomical group oio:hasDbXref CARO:0000054 \n", "10 XAO:0003012 cell oio:hasDbXref GO:0005623 \n", "11 XAO:0003025 trunk oio:hasDbXref UBERON:0002100 \n", "12 XAO:0003160 anatomical cluster oio:hasDbXref CARO:0000041 \n", "13 XAO:0003163 basal lamina oio:hasDbXref CARO:0000065 \n", "14 XAO:0003257 myelin accumulating cell oio:hasDbXref CL:0000328 \n", "15 XAO:0004090 optic field oio:hasDbXref UBERON:0003071 \n", "16 XAO:0004147 vitelline vein oio:hasDbXref UBERON:0005487 \n", "17 XAO:0004165 intersomitic artery oio:hasDbXref UBERON:0006001 \n", "18 XAO:0004260 pharyngeal pouch oio:hasDbXref UBERON:0004117 \n", "19 XAO:0004290 cell part oio:hasDbXref GO:0044464 \n", "20 XAO:0004615 basal body oio:hasDbXref GO:0005932 \n", "21 XAO:0004621 epidermal cell oio:hasDbXref CL:1000396 \n", "22 XAO:0005007 Muller cell oio:hasDbXref CL:0011107 \n", "23 XAO:1000007 tailbud stage oio:hasDbXref UBERON:0009741 \n", "\n", " object_label mapping_justification \\\n", "0 trunk semapv:UnspecifiedMatching \n", "1 cardiovascular system semapv:UnspecifiedMatching \n", "2 obsolete peripheral nerve semapv:UnspecifiedMatching \n", "3 eye primordium semapv:UnspecifiedMatching \n", "4 pharyngeal pouch semapv:UnspecifiedMatching \n", "5 vitelline vein semapv:UnspecifiedMatching \n", "6 trigeminal ganglion 
semapv:UnspecifiedMatching \n", "7 trigeminal ganglion semapv:UnspecifiedMatching \n", "8 cardiovascular system semapv:UnspecifiedMatching \n", "9 anatomical group semapv:UnspecifiedMatching \n", "10 obsolete cell semapv:UnspecifiedMatching \n", "11 trunk semapv:UnspecifiedMatching \n", "12 anatomical cluster semapv:UnspecifiedMatching \n", "13 basal lamina semapv:UnspecifiedMatching \n", "14 obsolete myelin accumulating cell semapv:UnspecifiedMatching \n", "15 eye primordium semapv:UnspecifiedMatching \n", "16 vitelline vein semapv:UnspecifiedMatching \n", "17 NaN semapv:UnspecifiedMatching \n", "18 pharyngeal pouch semapv:UnspecifiedMatching \n", "19 obsolete cell part semapv:UnspecifiedMatching \n", "20 NaN semapv:UnspecifiedMatching \n", "21 NaN semapv:UnspecifiedMatching \n", "22 obsolete Muller cell semapv:UnspecifiedMatching \n", "23 obsolete tailbud stage semapv:UnspecifiedMatching \n", "\n", " subject_source object_source mapping_cardinality comment \n", "0 XAO UBERON 1:n cardinality is 1:n \n", "1 XAO UBERON 1:n cardinality is 1:n \n", "2 XAO UBERON 1:1 object is obsolete \n", "3 XAO UBERON 1:n cardinality is 1:n \n", "4 XAO UBERON 1:n cardinality is 1:n \n", "5 XAO UBERON 1:n cardinality is 1:n \n", "6 XAO UBERON 1:n cardinality is 1:n \n", "7 XAO UBERON 1:n cardinality is 1:n \n", "8 XAO UBERON 1:n cardinality is 1:n \n", "9 XAO CARO 1:1 object is obsolete \n", "10 XAO GO 1:1 object is obsolete \n", "11 XAO UBERON 1:n cardinality is 1:n \n", "12 XAO CARO 1:1 object is obsolete \n", "13 XAO CARO 1:1 object is obsolete \n", "14 XAO CL 1:1 object is obsolete \n", "15 XAO UBERON 1:n cardinality is 1:n \n", "16 XAO UBERON 1:n cardinality is 1:n \n", "17 XAO UBERON 1:1 object is obsolete \n", "18 XAO UBERON 1:n cardinality is 1:n \n", "19 XAO GO 1:1 object is obsolete \n", "20 XAO GO 1:1 object is obsolete \n", "21 XAO CL 1:1 object is obsolete \n", "22 XAO CL 1:1 object is obsolete \n", "23 XAO UBERON 1:1 object is obsolete " ] }, "execution_count": 5, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"output/xao-invalid.sssom.tsv\", sep=\"\\t\", comment=\"#\")\n", "df" ] }, { "cell_type": "markdown", "id": "f4209133-fd5c-4ecd-a0c4-a5dc4cb8a57a", "metadata": {}, "source": [ "Here we can see a mixture of cardinality issues and obsoletion issues.\n", "\n", "Note that, behind the scenes, this command connected to external ontologies such as GO, CL, and UBERON\n", "in order to check properties such as obsoletion status." ] }, { "cell_type": "code", "execution_count": 6, "id": "421c556c-df3e-4281-914b-613e3d467036", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['cardinality is 1:n', 'object is obsolete'], dtype=object)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"comment\"].unique()" ] }, { "cell_type": "code", "execution_count": 11, "id": "aea2cfe0-70bf-4b76-89e2-2bfdbdd3a084", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " comment counts\n", "0 cardinality is 1:n 12\n", "1 object is obsolete 12" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"comment\").size().reset_index(name='counts')" ] }, { "cell_type": "markdown", "id": "47d16aab-beae-4797-b2e4-e567db7dd06f", "metadata": {}, "source": [ "## Example: CL\n", "\n", "CL has a broader range of mappings, which are stored in the ontology as xrefs." ] }, { "cell_type": "code", "execution_count": 14, "id": "59d83f58-058d-43d0-a5f9-17dde3b2af1a", "metadata": {}, "outputs": [], "source": [ "!runoak --quiet -i sqlite:obo:cl validate-mappings -O sssom -o output/cl-invalid.sssom.tsv >& output/LOG" ] }, { "cell_type": "code", "execution_count": 15, "id": "2515b428-1d54-4756-a429-9ca21002e0d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " subject_id subject_label predicate_id \\\n", "0 CARO:0000013 cell oio:hasDbXref \n", "1 CL:0000000 cell oio:hasDbXref \n", "2 CL:0000019 sperm oio:hasDbXref \n", "3 CL:0000019 sperm oio:hasDbXref \n", "4 CL:0000019 sperm oio:hasDbXref \n", "... ... ... ... \n", "3824 UBERON:2001316 anterior lateral line placode oio:hasDbXref \n", "3825 UBERON:2001389 regeneration epithelium of fin/limb oio:hasDbXref \n", "3826 UBERON:2001391 anterior lateral line ganglion oio:hasDbXref \n", "3827 UBERON:2001468 anterior lateral line system oio:hasDbXref \n", "3828 UBERON:6000004 panarthropod head oio:hasDbXref \n", "\n", " object_id object_label \\\n", "0 GO:0005623 obsolete cell \n", "1 GO:0005623 obsolete cell \n", "2 BTO:0001277 spermatozoon \n", "3 BTO:0002046 spermatozoid \n", "4 CALOHA:TS-0949 NaN \n", "... ... ... \n", "3824 EFO:0003461 obsolete_anterior lateral line placode \n", "3825 EFO:0003682 obsolete_regeneration epithelium \n", "3826 EFO:0003683 obsolete_anterior lateral line ganglion \n", "3827 EFO:0003691 obsolete_anterior lateral line system \n", "3828 FBbt:00000004 head \n", "\n", " mapping_justification subject_source object_source \\\n", "0 semapv:UnspecifiedMatching CARO GO \n", "1 semapv:UnspecifiedMatching CL GO \n", "2 semapv:UnspecifiedMatching CL BTO \n", "3 semapv:UnspecifiedMatching CL BTO \n", "4 semapv:UnspecifiedMatching CL CALOHA \n", "... ... ... ... \n", "3824 semapv:UnspecifiedMatching UBERON EFO \n", "3825 semapv:UnspecifiedMatching UBERON EFO \n", "3826 semapv:UnspecifiedMatching UBERON EFO \n", "3827 semapv:UnspecifiedMatching UBERON EFO \n", "3828 semapv:UnspecifiedMatching UBERON FBbt \n", "\n", " mapping_cardinality comment \n", "0 1:1 object is obsolete \n", "1 1:1 object is obsolete \n", "2 n:n cardinality is n:n \n", "3 n:1 cardinality is n:1 \n", "4 1:n cardinality is 1:n \n", "... ... ... 
\n", "3824 1:1 object is obsolete \n", "3825 1:1 object is obsolete \n", "3826 1:1 object is obsolete \n", "3827 1:1 object is obsolete \n", "3828 1:n cardinality is 1:n \n", "\n", "[3829 rows x 10 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"output/cl-invalid.sssom.tsv\", sep=\"\\t\", comment=\"#\")\n", "df" ] }, { "cell_type": "code", "execution_count": 16, "id": "0723f096-67d4-4727-bf77-13badec03878", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " comment counts\n", "0 cardinality is 1:n 608\n", "1 cardinality is n:1 2588\n", "2 cardinality is n:n 155\n", "3 object is obsolete 438\n", "4 object is obsolete | cardinality is n:1 9\n", "5 subject is obsolete 13\n", "6 subject is obsolete | cardinality is 1:n 11\n", "7 subject is obsolete | object is obsolete 2" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"comment\").size().reset_index(name='counts')" ] }, { "cell_type": "markdown", "id": "69020950-4636-4ec3-a03b-a552a837f3f9", "metadata": {}, "source": [ "We can summarize these in groups:" ] }, { "cell_type": "code", "execution_count": 18, "id": "8af2fb74-91ec-4c61-90a9-68c860a6e16a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " comment subject_source object_source \\\n", "0 cardinality is 1:n CL BTO \n", "1 cardinality is 1:n CL CALOHA \n", "2 cardinality is 1:n CL FAO \n", "3 cardinality is 1:n CL FMA \n", "4 cardinality is 1:n CL GOC \n", ".. ... ... ... \n", "121 subject is obsolete | cardinality is 1:n CL FAO \n", "122 subject is obsolete | cardinality is 1:n CL FMA \n", "123 subject is obsolete | cardinality is 1:n CL ILX \n", "124 subject is obsolete | object is obsolete CL FBbt \n", "125 subject is obsolete | object is obsolete RO RO \n", "\n", " counts \n", "0 35 \n", "1 25 \n", "2 3 \n", "3 20 \n", "4 4 \n", ".. ... \n", "121 1 \n", "122 3 \n", "123 6 \n", "124 1 \n", "125 1 \n", "\n", "[126 rows x 4 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby([\"comment\", \"subject_source\", \"object_source\"]).size().reset_index(name='counts')" ] }, { "cell_type": "code", "execution_count": null, "id": "d699f471-d70a-4f3e-aac4-3cdd55e66380", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }