{ "cells": [ { "cell_type": "markdown", "id": "1cd4a3da-5c5c-46ce-9423-3b7a48b7f6ca", "metadata": {}, "source": [ "# OAK statistics command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `statistics` command, which can be used to calculate basic descriptive statistics\n", "for an ontology\n", "\n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`" ] }, { "cell_type": "code", "execution_count": 1, "id": "8940e44b-f1fc-4440-88ba-8064c33a48e6", "metadata": { "ExecuteTime": { "end_time": "2024-03-26T23:26:04.707363Z", "start_time": "2024-03-26T23:26:01.937733Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak statistics [OPTIONS] [BRANCHES]...\r\n", "\r\n", " Shows all descriptive/summary statistics\r\n", "\r\n", " Example: ------- runoak -i sqlite:obo:pr statistics\r\n", "\r\n", " By default, this will show combined summary statistics for all terms\r\n", "\r\n", " You can also break down the statistics in two ways:\r\n", "\r\n", " - by a collection of branch roots\r\n", "\r\n", " - by a metadata property (e.g. oio:hasOBONamespace, rdfs:isDefinedBy)\r\n", "\r\n", " - by prefix (e.g. GO, PR, CL, OBI)\r\n", "\r\n", " Example: ------- runoak -i sqlite:obo:pr statistics -p\r\n", " oio:hasOBONamespace\r\n", "\r\n", " Note: the oio:hasOBONamespace is *not* the same as the ID prefix, it is a\r\n", " field that is used by a subset of ontologies to partition classes into broad\r\n", " groupings, similar to subsets. Its use is non-standard, yet a lot of\r\n", " ontologies use this as the main partitioning mechanism.\r\n", "\r\n", " A note on bundled ontologies:\r\n", "\r\n", " The standard release many OBO ontologies \"bundles\" parts of other ontologies\r\n", " (formally, the release product includes a merged imports closure of import\r\n", " modules). 
This can complicate generation of statistics. A naive count of all\r\n", " classes in the main OBI release will include not only \"native\" OBI classes,\r\n", " but also classes from other ontologies that are bundled in the release.\r\n", "\r\n", " For bundled ontologies, we recommend some kind of partitioning, such as via\r\n", " defined roots, or via the CURIE prefix, using the ``--group-by-prefix``\r\n", " option.\r\n", "\r\n", " Output formats:\r\n", "\r\n", " The recommended output types for this command are yaml, json, or csv. The\r\n", " default output type is yaml, following the SummaryStatistics data model.\r\n", " This is naturally nested, as the statistics includes faceted groupings (e.g.\r\n", " edge counts are broken down by predicate). When specifying a flat format\r\n", " like csv, this is flattened into a single table, with dynamic column names.\r\n", "\r\n", " Change statistics:\r\n", "\r\n", " You can optionally combine the ontology statistics with a change summary\r\n", " relative to another ontology, using the ``--compare-with`` option.\r\n", "\r\n", " Example: ------- runoak -i v2.obo statistics --group-by-obo-namespace\r\n", " --compare-with v1.obo\r\n", "\r\n", " This will also include change stats broken down by KGCL change types. 
If a\r\n", " group-by option is specified, these will be grouped accordingly.\r\n", "\r\n", " Python API:\r\n", "\r\n", " https://incatools.github.io/ontology-access-kit/interfaces/summary-\r\n", " statistics\r\n", "\r\n", " Data model:\r\n", "\r\n", " https://w3id.org/oak/summary-statistics\r\n", "\r\n", "Options:\r\n", " -O, --output-type [obo|obojson|ofn|rdf|json|yaml|fhirjson|csv|tsv|nl]\r\n", " Desired output type\r\n", " --group-by-property TEXT group summaries by a metadata property, e.g.\r\n", " rdfs:isDefinedBy\r\n", " --group-by-obo-namespace / --no-group-by-obo-namespace\r\n", " shortcut for --group-by-property\r\n", " oio:hasOBONamespace (note this is distinct\r\n", " from the ID namespace) [default: no-group-\r\n", " by-obo-namespace]\r\n", " --group-by-prefix / --no-group-by-prefix\r\n", " shortcut for --group-by-property sh:prefix.\r\n", " Groups by the prefix of the CURIE [default:\r\n", " no-group-by-prefix]\r\n", " --group-by-defined-by / --no-group-by-defined-by\r\n", " shortcut for --group-by-property\r\n", " rdfs:isDefinedBy. This may be inferred from\r\n", " prefix if not set explicitly [default: no-\r\n", " group-by-defined-by]\r\n", " --include-residuals / --no-include-residuals\r\n", " If true include an OTHER category for terms\r\n", " that do not have the property\r\n", " -X, --compare-with TEXT Compare with another ontology\r\n", " -P, --has-prefix TEXT filter based on a prefix, e.g. OBI\r\n", " -o, --output FILENAME Output file, e.g. 
obo file\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!runoak statistics --help" ] }, { "cell_type": "markdown", "id": "ed1c706e-82e0-4168-bdaf-8bd96a3cd72a", "metadata": {}, "source": [ "## Set up an alias\n", "\n", "For convenience we will set up an alias for use in this notebook:" ] }, { "cell_type": "code", "execution_count": 18, "id": "c7350ec6-6070-45f5-a058-0da9e29ae086", "metadata": { "ExecuteTime": { "end_time": "2024-03-27T00:29:08.956988Z", "start_time": "2024-03-27T00:29:08.951340Z" } }, "outputs": [], "source": [ "alias chebi runoak -i sqlite:obo:chebi" ] }, { "cell_type": "markdown", "source": [ "## Calculating summary statistics (default YAML output)\n", "\n", "We can calculate the summary stats using the `statistics` command. The output is quite lengthy,\n", "so we will use `--output` (`-o`) to direct it to a YAML file:" ], "metadata": { "collapsed": false }, "id": "4020400699697b8e" }, { "cell_type": "code", "execution_count": 19, "id": "81046b24-3811-4f7c-9eb1-e4e36bff370d", "metadata": { "ExecuteTime": { "end_time": "2024-03-27T00:30:20.090591Z", "start_time": "2024-03-27T00:29:10.137939Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:root:bad mapping: KEGG_COMPOUND \r\n", "WARNING:root:bad mapping: IUPAC\r\n", "WARNING:root:bad mapping: ChemIDplus\r\n", "WARNING:root:bad mapping: UniProt\r\n", "WARNING:root:bad mapping: DrugCentral\r\n", "WARNING:root:bad mapping: LINCS\r\n", "WARNING:root:bad mapping: KEGG_DRUG\r\n", "WARNING:root:bad mapping: ChEBI\r\n", "WARNING:root:bad mapping: ChEMBL\r\n", "WARNING:root:bad mapping: DrugBank\r\n", "WARNING:root:bad mapping: WHO_MedNet\r\n", "WARNING:root:bad mapping: PDBeChem\r\n", "WARNING:root:bad mapping: NIST_Chemistry_WebBook\r\n", "WARNING:root:bad mapping: PPDB\r\n", "WARNING:root:bad mapping: LIPID_MAPS\r\n", "WARNING:root:bad mapping: IUPHAR\r\n", "WARNING:root:bad mapping: HMDB\r\n", "WARNING:root:bad mapping: SUBMITTER\r\n", 
"WARNING:root:bad mapping: MetaCyc\r\n", "WARNING:root:bad mapping: JCBN\r\n", "WARNING:root:bad mapping: GlyTouCan\r\n", "WARNING:root:bad mapping: KNApSAcK\r\n", "WARNING:root:bad mapping: IUBMB\r\n", "WARNING:root:bad mapping: CBN\r\n", "WARNING:root:bad mapping: Alan_Wood's_Pesticides\r\n", "WARNING:root:bad mapping: GlyGen\r\n", "WARNING:root:bad mapping: KEGG_GLYCAN\r\n", "WARNING:root:bad mapping: RESID\r\n", "WARNING:root:bad mapping: PubChem\r\n", "WARNING:root:bad mapping: FooDB\r\n", "WARNING:root:bad mapping: VSDB\r\n", "WARNING:root:bad mapping: UM-BBD\r\n", "WARNING:root:bad mapping: MolBase\r\n", "WARNING:root:bad mapping: COMe\r\n", "WARNING:root:bad mapping: Beilstein\r\n", "WARNING:root:bad mapping: Patent\r\n", "WARNING:root:bad mapping: PDB\r\n", "WARNING:root:bad mapping: SMID\r\n" ] } ], "source": [ "chebi statistics -o output/chebi.stats.yaml" ] }, { "cell_type": "markdown", "source": [ "__Note__ CHEBI has a lot of bad xrefs, hence the output" ], "metadata": { "collapsed": false }, "id": "c23707ed5c1a794f" }, { "cell_type": "markdown", "source": [ "## Exploring the output\n", "\n", "Let's look at the top of the YAML file:" ], "metadata": { "collapsed": false }, "id": "a42ba18a15b14b48" }, { "cell_type": "code", "execution_count": 20, "id": "065686c3-24b6-4752-b290-2eb110f8913b", "metadata": { "ExecuteTime": { "end_time": "2024-03-27T00:30:20.236768Z", "start_time": "2024-03-27T00:30:20.092802Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "id: AllOntologies\r\n", "ontologies:\r\n", "- id: obo:chebi.owl\r\n", " version: obo:chebi/231/chebi.owl\r\n", "was_generated_by:\r\n", " started_at_time: '2024-03-26T17:29:56.627143'\r\n", " was_associated_with: OAK\r\n", " acted_on_behalf_of: cjm\r\n", "class_count: 217549\r\n", "deprecated_class_count: 18650\r\n", "non_deprecated_class_count: 198899\r\n", "class_count_with_text_definitions: 53575\r\n", "class_count_without_text_definitions: 163974\r\n", "object_property_count: 
10\r\n", "annotation_property_count: 37\r\n", "named_individual_count: 0\r\n", "subset_count: 3\r\n", "rdf_triple_count: 6860047\r\n", "subclass_of_axiom_count: 368285\r\n", "equivalent_classes_axiom_count: 0\r\n", "edge_count_by_predicate:\r\n", " BFO:0000051:\r\n", " facet: BFO:0000051\r\n", " filtered_count: 4029\r\n", " RO:0000087:\r\n", " facet: RO:0000087\r\n", " filtered_count: 43636\r\n", " obo:chebi#has_functional_parent:\r\n", " facet: obo:chebi#has_functional_parent\r\n", " filtered_count: 19632\r\n", " obo:chebi#has_parent_hydride:\r\n", " facet: obo:chebi#has_parent_hydride\r\n", " filtered_count: 1799\r\n", " obo:chebi#is_conjugate_acid_of:\r\n", " facet: obo:chebi#is_conjugate_acid_of\r\n", " filtered_count: 8484\r\n", " obo:chebi#is_conjugate_base_of:\r\n", " facet: obo:chebi#is_conjugate_base_of\r\n", " filtered_count: 8484\r\n", " obo:chebi#is_enantiomer_of:\r\n", " facet: obo:chebi#is_enantiomer_of\r\n", " filtered_count: 2754\r\n", " obo:chebi#is_substituent_group_from:\r\n", " facet: obo:chebi#is_substituent_group_from\r\n", " filtered_count: 1287\r\n", " obo:chebi#is_tautomer_of:\r\n", " facet: obo:chebi#is_tautomer_of\r\n", " filtered_count: 1886\r\n", " rdfs:subClassOf:\r\n", " facet: rdfs:subClassOf\r\n" ] } ], "source": [ "!head -50 output/chebi.stats.yaml" ] }, { "cell_type": "markdown", "source": [ "Like all objects produced by OAK, there is a data dictionary / data model. 
The one for ontology statistics is [https://w3id.org/oak/summary-statistics](https://w3id.org/oak/summary-statistics);\n", "you can use this link to browse the documentation.\n", "\n", "**A well-defined data dictionary is necessary for communicating aggregate statistics accurately**.\n", "Often when ontologies are reported informally, it's ambiguous whether *number of terms* means:\n", "\n", "- number of *classes*, *classes plus relationship types*, or *classes plus some other elements*\n", "- including or excluding deprecated (obsolete) entities\n", "\n", "The OAK summary statistics data dictionary aims to provide a **standard for ontology reporting**.\n", "\n", "YAML allows for nesting, which is a natural way to group things; for example:\n", "\n", "```yaml\n", "edge_count_by_predicate:\n", " BFO:0000051:\n", " facet: BFO:0000051\n", " filtered_count: 4003\n", " RO:0000087:\n", " facet: RO:0000087\n", " filtered_count: 43082\n", "```\n", "\n", "This says that there are 4003 has-part (BFO:0000051) and 43082 has-role (RO:0000087) [relationships](https://incatools.github.io/ontology-access-kit/glossary.html#term-Relationship).\n", "\n", "See the [OAK guide to relationships](https://incatools.github.io/ontology-access-kit/guide/relationships-and-graphs.html)\n", "to understand more.\n", "\n", "## Mapping Stats\n", "\n", "Further on in the YAML we can see mapping stats. See [https://w3id.org/sssom](https://w3id.org/sssom) to\n", "understand the OAK mapping data model.\n", "\n", "These are broken down:\n", "\n", "- by mapping predicate (for many ontologies this is only `oio:hasDbXref`)\n", "- by mapping object source (i.e. 
the database or ontology that is mapped to)" ], "metadata": { "collapsed": false }, "id": "66b8945aa523b8d5" }, { "cell_type": "code", "execution_count": 21, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mapping_statement_count_by_predicate:\r\n", " oio:hasDbXref:\r\n", " facet: oio:hasDbXref\r\n", " filtered_count: 345271\r\n", "mapping_statement_count_by_object_source:\r\n", " BFO:\r\n", " facet: BFO\r\n", " filtered_count: 1\r\n", " RO:\r\n", " facet: RO\r\n", " filtered_count: 1\r\n", " KNApSAcK:\r\n", " facet: KNApSAcK\r\n", " filtered_count: 5185\r\n", " KEGG:\r\n", " facet: KEGG\r\n", " filtered_count: 22228\r\n", " CAS:\r\n", " facet: CAS\r\n", " filtered_count: 28938\r\n", " KEGG_COMPOUND:\r\n", " facet: KEGG_COMPOUND\r\n", " filtered_count: 19870\r\n", " Beilstein:\r\n", " facet: Beilstein\r\n", " filtered_count: 9187\r\n", " IUPAC:\r\n", " facet: IUPAC\r\n", " filtered_count: 61013\r\n", " ChemIDplus:\r\n", " facet: ChemIDplus\r\n", " filtered_count: 33383\r\n", " UniProt:\r\n", " facet: UniProt\r\n", " filtered_count: 16047\r\n", " LINCS:\r\n", " facet: LINCS\r\n", " filtered_count: 41392\r\n", " Drug_Central:\r\n", " facet: Drug_Central\r\n", " filtered_count: 3784\r\n", " DrugCentral:\r\n", " facet: DrugCentral\r\n", " filtered_count: 6202\r\n", " Wikipedia:\r\n", "--\r\n", "mapping_statement_count_subject_by_object_source:\r\n", " BFO:\r\n", " facet: BFO\r\n", " filtered_count: 1\r\n", " RO:\r\n", " facet: RO\r\n", " filtered_count: 1\r\n", " KNApSAcK:\r\n", " facet: KNApSAcK\r\n", " filtered_count: 5091\r\n", " KEGG:\r\n", " facet: KEGG\r\n", " filtered_count: 20233\r\n", " CAS:\r\n", " facet: CAS\r\n", " filtered_count: 28615\r\n", " KEGG_COMPOUND:\r\n", " facet: KEGG_COMPOUND\r\n", " filtered_count: 19870\r\n", " Beilstein:\r\n", " facet: Beilstein\r\n", " filtered_count: 8704\r\n", " IUPAC:\r\n", " facet: IUPAC\r\n", " filtered_count: 61013\r\n", " ChemIDplus:\r\n", " facet: ChemIDplus\r\n", " filtered_count: 33383\r\n", " 
UniProt:\r\n", " facet: UniProt\r\n", " filtered_count: 16047\r\n", " LINCS:\r\n", " facet: LINCS\r\n", " filtered_count: 41389\r\n", " Drug_Central:\r\n", " facet: Drug_Central\r\n", " filtered_count: 3783\r\n", " DrugCentral:\r\n", " facet: DrugCentral\r\n", " filtered_count: 6202\r\n", " Wikipedia:\r\n" ] } ], "source": [ "!grep -A40 ^mapping_statement_count output/chebi.stats.yaml" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-03-27T00:30:20.378726Z", "start_time": "2024-03-27T00:30:20.237175Z" } }, "id": "688b55507ca72f41" }, { "cell_type": "markdown", "source": [ "As expected, CHEBI does not make use of SKOS mapping predicates, and mappings\n", "are dominated by databases like KEGG and CAS.\n" ], "metadata": { "collapsed": false }, "id": "65c80c02acc5a77d" }, { "cell_type": "markdown", "source": [ "## TSV Output\n", "\n", "YAML is not a very natural format for doing further data science or statistical processing.\n", "\n", "For that we can use the `csv` output type (which actually defaults to TSV):" ], "metadata": { "collapsed": false }, "id": "61767f28d19545d" }, { "cell_type": "code", "execution_count": 9, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:root:bad mapping: KEGG_COMPOUND\r\n", "WARNING:root:bad mapping: IUPAC\r\n", "WARNING:root:bad mapping: ChemIDplus\r\n", "WARNING:root:bad mapping: UniProt\r\n", "WARNING:root:bad mapping: DrugCentral\r\n", "WARNING:root:bad mapping: LINCS\r\n", "WARNING:root:bad mapping: KEGG_DRUG\r\n", "WARNING:root:bad mapping: ChEBI\r\n", "WARNING:root:bad mapping: ChEMBL\r\n", "WARNING:root:bad mapping: DrugBank\r\n", "WARNING:root:bad mapping: WHO_MedNet\r\n", "WARNING:root:bad mapping: PDBeChem\r\n", "WARNING:root:bad mapping: NIST_Chemistry_WebBook\r\n", "WARNING:root:bad mapping: LIPID_MAPS\r\n", "WARNING:root:bad mapping: IUPHAR\r\n", "WARNING:root:bad mapping: HMDB\r\n", "WARNING:root:bad mapping: SUBMITTER\r\n", "WARNING:root:bad mapping: MetaCyc\r\n", 
"WARNING:root:bad mapping: JCBN\r\n", "WARNING:root:bad mapping: GlyTouCan\r\n", "WARNING:root:bad mapping: KNApSAcK\r\n", "WARNING:root:bad mapping: IUBMB\r\n", "WARNING:root:bad mapping: EMBL\r\n", "WARNING:root:bad mapping: CBN\r\n", "WARNING:root:bad mapping: Alan_Wood's_Pesticides\r\n", "WARNING:root:bad mapping: GlyGen\r\n", "WARNING:root:bad mapping: PPDB\r\n", "WARNING:root:bad mapping: KEGG_GLYCAN\r\n", "WARNING:root:bad mapping: RESID\r\n", "WARNING:root:bad mapping: PubChem\r\n", "WARNING:root:bad mapping: FooDB\r\n", "WARNING:root:bad mapping: VSDB\r\n", "WARNING:root:bad mapping: UM-BBD\r\n", "WARNING:root:bad mapping: MolBase\r\n", "WARNING:root:bad mapping: COMe\r\n", "WARNING:root:bad mapping: EBI_Industry_Programme\r\n", "WARNING:root:bad mapping: EuroFIR\r\n", "WARNING:root:bad mapping: Beilstein\r\n", "WARNING:root:bad mapping: Patent\r\n", "WARNING:root:bad mapping: PDB\r\n", "WARNING:root:bad mapping: SMID\r\n" ] } ], "source": [ "chebi statistics -o output/chebi.stats.tsv -O csv" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-03-27T00:07:55.650586Z", "start_time": "2024-03-27T00:07:30.421752Z" } }, "id": "d35e47fb825f3f00" }, { "cell_type": "markdown", "source": [ "To illustrate this we will use pandas:" ], "metadata": { "collapsed": false }, "id": "88340e60db1a177b" }, { "cell_type": "code", "execution_count": 11, "outputs": [ { "data": { "text/plain": " id compared_with agents class_count deprecated_class_count \\\n0 AllOntologies NaN NaN 185295 18628 \n\n non_deprecated_class_count class_count_with_text_definitions \\\n0 166667 53049 \n\n class_count_without_text_definitions object_property_count \\\n0 132246 10 \n\n annotation_property_count ... \\\n0 37 ... 
\n\n mapping_statement_count_subject_by_object_source_CTX \\\n0 3 \n\n mapping_statement_count_subject_by_object_source_SMID \\\n0 307 \n\n class_count_by_subset_1_STAR class_count_by_subset_2_STAR \\\n0 2945 102919 \n\n class_count_by_subset_3_STAR was_generated_by_started_at_time \\\n0 60803 2024-03-26T17:07:33.778117 \n\n was_generated_by_was_associated_with was_generated_by_acted_on_behalf_of \\\n0 OAK cjm \n\n ontologies_id ontologies_version \n0 obo:chebi.owl obo:chebi/226/chebi.owl \n\n[1 rows x 177 columns]", "text/html": "
\n | id | \ncompared_with | \nagents | \nclass_count | \ndeprecated_class_count | \nnon_deprecated_class_count | \nclass_count_with_text_definitions | \nclass_count_without_text_definitions | \nobject_property_count | \nannotation_property_count | \n... | \nmapping_statement_count_subject_by_object_source_CTX | \nmapping_statement_count_subject_by_object_source_SMID | \nclass_count_by_subset_1_STAR | \nclass_count_by_subset_2_STAR | \nclass_count_by_subset_3_STAR | \nwas_generated_by_started_at_time | \nwas_generated_by_was_associated_with | \nwas_generated_by_acted_on_behalf_of | \nontologies_id | \nontologies_version | \n
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \nAllOntologies | \nNaN | \nNaN | \n185295 | \n18628 | \n166667 | \n53049 | \n132246 | \n10 | \n37 | \n... | \n3 | \n307 | \n2945 | \n102919 | \n60803 | \n2024-03-26T17:07:33.778117 | \nOAK | \ncjm | \nobo:chebi.owl | \nobo:chebi/226/chebi.owl | \n
1 rows × 177 columns
\n\n | Property | \nValue | \n
---|---|---|
0 | \nid | \nAllOntologies | \n
1 | \ncompared_with | \nNaN | \n
2 | \nagents | \nNaN | \n
3 | \nclass_count | \n185295 | \n
4 | \ndeprecated_class_count | \n18628 | \n
5 | \nnon_deprecated_class_count | \n166667 | \n
6 | \nclass_count_with_text_definitions | \n53049 | \n
7 | \nclass_count_without_text_definitions | \n132246 | \n
8 | \nobject_property_count | \n10 | \n
9 | \nannotation_property_count | \n37 | \n
10 | \nnamed_individual_count | \n0 | \n
11 | \nsubset_count | \n3 | \n
12 | \nrdf_triple_count | \n6158555 | \n
13 | \nsubclass_of_axiom_count | \n330989 | \n
14 | \nequivalent_classes_axiom_count | \n0 | \n
15 | \nentailed_edge_count_by_predicate | \n{} | \n
16 | \ndistinct_synonym_count | \n332744 | \n
17 | \nsynonym_statement_count | \n346486 | \n
18 | \nclass_count_by_category | \n{} | \n
19 | \ncontributor_summary | \n{} | \n
20 | \nchange_summary | \n{} | \n
21 | \nmerged_class_query | \n18559 | \n
22 | \ndeprecated_property_count | \n0 | \n
23 | \nedge_count_by_predicate_BFO:0000051 | \n4003 | \n
24 | \nedge_count_by_predicate_RO:0000087 | \n43082 | \n
25 | \nedge_count_by_predicate_has_functional_parent | \n18664 | \n
26 | \nedge_count_by_predicate_has_parent_hydride | \n1764 | \n
27 | \nedge_count_by_predicate_is_conjugate_acid_of | \n8434 | \n
28 | \nedge_count_by_predicate_is_conjugate_base_of | \n8434 | \n
29 | \nedge_count_by_predicate_is_enantiomer_of | \n2728 | \n
30 | \nedge_count_by_predicate_is_substituent_group_from | \n1284 | \n
31 | \nedge_count_by_predicate_is_tautomer_of | \n1884 | \n
32 | \nedge_count_by_predicate_rdfs:subClassOf | \n240712 | \n
33 | \nedge_count_by_predicate_rdfs:subPropertyOf | \n6 | \n
34 | \nsynonym_statement_count_by_predicate_hasExactS... | \n100585 | \n
35 | \nsynonym_statement_count_by_predicate_hasRelate... | \n234002 | \n
36 | \nmapping_statement_count_by_predicate_hasDbXref | \n317151 | \n
37 | \nmapping_statement_count_by_object_source_BFO | \n1 | \n
38 | \nmapping_statement_count_by_object_source_RO | \n1 | \n
39 | \nmapping_statement_count_by_object_source_KNApSAcK | \n5152 | \n
\n | id | \ncompared_with | \nagents | \nclass_count | \ndeprecated_class_count | \nnon_deprecated_class_count | \nclass_count_with_text_definitions | \nclass_count_without_text_definitions | \nobject_property_count | \nannotation_property_count | \n... | \nclass_count_by_subset_non_informative | \nclass_count_by_subset_organ_slim | \nclass_count_by_subset_pheno_slim | \nclass_count_by_subset_phenotype_rcn | \nclass_count_by_subset_uberon_slim | \nclass_count_by_subset_unverified_taxonomic_grouping | \nclass_count_by_subset_upper_level | \nclass_count_by_subset_vertebrate_core | \nmapping_statement_count_by_object_source_GOREL | \nmapping_statement_count_subject_by_object_source_GOREL | \n
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n<http | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n1 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
1 | \n<https | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n1 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
2 | \nBFO | \nNaN | \nNaN | \n15 | \n0 | \n15 | \n9 | \n6 | \n6 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
3 | \nBSPO | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n24 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
4 | \nCARO | \nNaN | \nNaN | \n20 | \n0 | \n20 | \n20 | \n0 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
5 | \nCHEBI | \nNaN | \nNaN | \n123 | \n0 | \n123 | \n18 | \n105 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
6 | \nCL | \nNaN | \nNaN | \n2969 | \n249 | \n2720 | \n2555 | \n414 | \n3 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
7 | \nGO | \nNaN | \nNaN | \n7265 | \n2 | \n7263 | \n7264 | \n1 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
8 | \nIAO | \nNaN | \nNaN | \n6 | \n0 | \n6 | \n4 | \n2 | \n0 | \n23 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
9 | \nNCBITaxon | \nNaN | \nNaN | \n138 | \n0 | \n138 | \n0 | \n138 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
10 | \nOMO | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n2 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
11 | \nPATO | \nNaN | \nNaN | \n185 | \n0 | \n185 | \n184 | \n1 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
12 | \nPR | \nNaN | \nNaN | \n748 | \n0 | \n748 | \n747 | \n1 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
13 | \nRO | \nNaN | \nNaN | \n1 | \n0 | \n1 | \n1 | \n0 | \n240 | \n16 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
14 | \nUBERON | \nNaN | \nNaN | \n4670 | \n0 | \n4670 | \n4308 | \n362 | \n0 | \n0 | \n... | \n47.0 | \n136.0 | \n1373.0 | \n3.0 | \n809.0 | \n1.0 | \n49.0 | \n448.0 | \nNaN | \nNaN | \n
15 | \ncito | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n1 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
16 | \ndce | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n6 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
17 | \ndcterms | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n7 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
18 | \nfoaf | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n2 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
19 | \nobo | \nNaN | \nNaN | \n10 | \n10 | \n0 | \n0 | \n10 | \n24 | \n116 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n2.0 | \n2.0 | \n
20 | \noio | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n61 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
21 | \nowl | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n1 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
22 | \nrdfs | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n3 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
23 | \nskos | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n1 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
24 | \nxsd | \nNaN | \nNaN | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n0 | \n... | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \nNaN | \n
25 rows × 512 columns
\n" }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"output/cl.stats.grouped.tsv\", sep=\"\\t\")\n", "df" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-03-27T00:36:40.903385Z", "start_time": "2024-03-27T00:36:40.861941Z" } }, "id": "206115e9a39e8ddf" }, { "cell_type": "markdown", "source": [ "Here we can see the numbers broken down by ontology. The number of classes in the CL row is now accurate.\n", "Note, of course, that the numbers in the other rows don't reflect totals for each external ontology as a whole;\n", "they only count what has been merged into CL.\n" ], "metadata": { "collapsed": false }, "id": "12c9fcd0a9363258" }, { "cell_type": "markdown", "source": [ "## Diff stats\n", "\n", "You can also use `--compare-with` to compare stats with a different release of an ontology. Note this\n", "is effectively the same as running `diff` with `--statistics`. See the diff docs for details." ], "metadata": { "collapsed": false }, "id": "bc153c4a21629345" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false }, "id": "76d6c523e691af29" } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }