{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0a28b88d-4deb-4d0a-a110-f27adf077e23",
   "metadata": {},
   "source": [
    "# OAK validate-definitions command\n",
    "\n",
    "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n",
    "\n",
    "This notebook provides examples for the `validate-definitions` command.\n",
    "The command is part of a suite of *validate* commands.\n",
    "\n",
    "## Help Option\n",
    "\n",
    "You can get help on any OAK command using `--help`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "c223f678-f82f-4b06-8e19-1a5b7323e571",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-15T00:50:27.966036Z",
     "start_time": "2024-04-15T00:50:25.530846Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Usage: runoak validate-definitions [OPTIONS] [TERMS]...\n",
      "\n",
      "  Checks presence and structure of text definitions.\n",
      "\n",
      "  To run:\n",
      "\n",
      "      runoak validate-definitions -i db/uberon.db -o results.tsv\n",
      "\n",
      "  By default this will apply basic text mining of text definitions to check\n",
      "  against machine actionable OBO text definition guideline rules. This can\n",
      "  result in an initial lag - to skip this, and ONLY perform checks for\n",
      "  *presence* of definitions, use --skip-text-annotation:\n",
      "\n",
      "  Example: -------\n",
      "\n",
      "      runoak validate-definitions -i db/uberon.db --skip-text-annotation\n",
      "\n",
      "  Like most OAK commands, this accepts lists of terms or term queries as\n",
      "  arguments. You can pass in a CURIE list to selectively validate individual\n",
      "  classes\n",
      "\n",
      "  Example: -------\n",
      "\n",
      "       runoak validate-definitions -i db/cl.db CL:0002053\n",
      "\n",
      "  Only on CL identifiers:\n",
      "\n",
      "      runoak validate-definitions -i db/cl.db i^CL:\n",
      "\n",
      "  Only on neuron hierarchy:\n",
      "\n",
      "      runoak validate-definitions -i db/cl.db .desc//p=i neuron\n",
      "\n",
      "  Output format:\n",
      "\n",
      "  This command emits objects conforming to the OAK validation datamodel. See\n",
      "  https://incatools.github.io/ontology-access-kit/datamodels for more on OAK\n",
      "  datamodels.\n",
      "\n",
      "  The default serialization of the datamodel is CSV.\n",
      "\n",
      "  Notes: -----\n",
      "\n",
      "  This command is largely redundant with the validate command, but is useful\n",
      "  for targeted validation focused solely on definitions\n",
      "\n",
      "Options:\n",
      "  --skip-text-annotation / --no-skip-text-annotation\n",
      "                                  If true, do not parse text annotations\n",
      "                                  [default: no-skip-text-annotation]\n",
      "  -C, --configuration-file TEXT   Path to a configuration file. This is\n",
      "                                  typically a YAML file, but may be a JSON\n",
      "                                  file\n",
      "  --adapter-mapping TEXT          Multiple prefix=selector pairs, e.g.\n",
      "                                  --adapter-mapping uberon=db/uberon.db\n",
      "  -O, --output-type TEXT          Desired output type\n",
      "  -o, --output FILENAME           Output file, e.g. obo file\n",
      "  --help                          Show this message and exit.\n"
     ]
    }
   ],
   "source": [
    "!runoak validate-definitions --help"
   ]
  },
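  {
   "cell_type": "markdown",
   "id": "validate-defs-argv-sketch-md",
   "metadata": {},
   "source": [
    "The options listed above can also be assembled from Python, for example when scripting several validation runs. The cell below is a minimal sketch and is not part of OAK. `validate_definitions_argv` is a hypothetical helper that uses only the flags shown in the help text. It places `-i` before the subcommand, as in the `runoak` call later in this notebook, and it prints the command instead of running it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "validate-defs-argv-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import shlex\n",
    "\n",
    "\n",
    "def validate_definitions_argv(input_selector, terms=(), config_file=None,\n",
    "                              output=None, skip_text_annotation=False):\n",
    "    \"\"\"Build an argument list for `runoak -i ... validate-definitions ...`.\n",
    "\n",
    "    Hypothetical helper (not part of OAK) that only uses options shown in\n",
    "    the --help output: -C, -o and --skip-text-annotation.\n",
    "    \"\"\"\n",
    "    # -i is a top-level runoak option, so it precedes the subcommand\n",
    "    argv = [\"runoak\", \"-i\", input_selector, \"validate-definitions\"]\n",
    "    if skip_text_annotation:\n",
    "        # only check that definitions are present; skip text mining\n",
    "        argv.append(\"--skip-text-annotation\")\n",
    "    if config_file:\n",
    "        argv += [\"-C\", config_file]\n",
    "    if output:\n",
    "        argv += [\"-o\", output]\n",
    "    # terms or term queries go last, e.g. [\".desc//p=i\", \"neuron\"]\n",
    "    argv += list(terms)\n",
    "    return argv\n",
    "\n",
    "\n",
    "# Combines the neuron-hierarchy query and --skip-text-annotation from the help text\n",
    "argv = validate_definitions_argv(\n",
    "    \"db/cl.db\", terms=[\".desc//p=i\", \"neuron\"], skip_text_annotation=True\n",
    ")\n",
    "# Print the equivalent shell command; subprocess.run(argv, check=True) would run it\n",
    "print(shlex.join(argv))"
   ]
  },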
  {
   "cell_type": "markdown",
   "id": "01f38163-db22-4c51-ae46-10e8b8e6d53c",
   "metadata": {},
   "source": [
    "## Example: Validation over Test Ontology\n",
    "\n",
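    "To illustrate this command, we will use a deliberately altered version of a subset of GO.\n",
    "\n",
    "We will validate `cellular_component` (GO:0005575) and its `is_a` descendants (`p=i`), using the query `.desc//p=i \"cellular_component\"`. The `-C` option passes a YAML configuration file, and `-o` writes the results to a TSV file."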
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c9b86e52-87a7-449c-baac-81981e7ce632",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-15T00:50:30.655424Z",
     "start_time": "2024-04-15T00:50:27.968820Z"
    }
   },
   "outputs": [],
   "source": [
    "!runoak -i simpleobo:input/validate-defs-test.obo validate-definitions -C input/validate-definition-conf.yaml .desc//p=i \"cellular_component\" -o output/validate-definitions.output.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27c1668fc8d1a8de",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "The output is a TSV file with one row per validation result. This includes `INFO` rows for definitions with no problems.\n",
    "\n",
    "We can load this into a pandas DataFrame for further analysis. This also has the advantage of\n",
    "displaying tables nicely in Jupyter notebooks such as this one.\n",
    "\n",
    "If you are running this from the command line, you may prefer to use your own TSV processing tools,\n",
    "or simply to load the file into Google Sheets."
   ]
  },
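  {
   "cell_type": "markdown",
   "id": "validate-defs-tsv-summary-md",
   "metadata": {},
   "source": [
    "If you would rather not use pandas, you can summarize the TSV with the Python standard library. The cell below is a minimal sketch. `SAMPLE_TSV`, `count_by_type` and `subjects_with_severity` are hypothetical names. The sample rows are copied from the output table below, using only some of its columns. The sketch assumes that the file written by `-o` uses the same column names as that table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "validate-defs-tsv-summary",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "from collections import Counter\n",
    "\n",
    "# Rows copied from the output below; empty fields are what pandas shows as NaN.\n",
    "# Assumption: the real TSV uses the same column names.\n",
    "SAMPLE_TSV = (\n",
    "    \"type\\tsubject\\tsubject_label\\tseverity\\tinfo\\n\"\n",
    "    \"oaklib.om:DCC#S3\\tGO:0043231\\tintracellular membrane-bounded organelle\\tWARNING\\tCannot parse genus and differentia\\n\"\n",
    "    \"oaklib.om:DCC#S0\\tGO:0012505\\tendomembrane system\\tERROR\\tMissing text definition\\n\"\n",
    "    \"oaklib.om:DCC#S1\\tGO:0005773\\tvacuole\\t\\tDefiniendum should not appear at the start\\n\"\n",
    "    \"oaklib.om:DCC#Any\\tGO:0005634\\tnucleus\\tINFO\\tNo problems with definition\\n\"\n",
    ")\n",
    "\n",
    "\n",
    "def count_by_type(lines):\n",
    "    \"\"\"Count validation results per rule (the `type` column).\"\"\"\n",
    "    return Counter(row[\"type\"] for row in csv.DictReader(lines, delimiter=\"\\t\"))\n",
    "\n",
    "\n",
    "def subjects_with_severity(lines, severity):\n",
    "    \"\"\"Return the subject CURIEs of rows whose `severity` column equals `severity`.\"\"\"\n",
    "    return [\n",
    "        row[\"subject\"]\n",
    "        for row in csv.DictReader(lines, delimiter=\"\\t\")\n",
    "        if row[\"severity\"] == severity\n",
    "    ]\n",
    "\n",
    "\n",
    "# For the real file, pass open(\"output/validate-definitions.output.tsv\", newline=\"\") instead\n",
    "counts = count_by_type(io.StringIO(SAMPLE_TSV))\n",
    "errors = subjects_with_severity(io.StringIO(SAMPLE_TSV), \"ERROR\")\n",
    "print(counts.most_common())\n",
    "print(errors)"
   ]
  },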
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "5fc9b15d-cc81-400a-8660-f92491baa120",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-15T00:50:30.953116Z",
     "start_time": "2024-04-15T00:50:30.658190Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>severity</th>\n",
       "      <th>instantiates</th>\n",
       "      <th>predicate</th>\n",
       "      <th>object</th>\n",
       "      <th>object_str</th>\n",
       "      <th>source</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0099568</td>\n",
       "      <td>cytoplasmic region</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Any (proper) part of the cytoplasm of a single...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0099738</td>\n",
       "      <td>cell cortex region</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>complete extent of cell cortex</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Did not match whole text: cell cortex &lt; comple...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0099738</td>\n",
       "      <td>cell cortex region</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>underlies some some region of the plasma membrane</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Wrong position, 'cell cortex' not in 'underlie...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0071944</td>\n",
       "      <td>cell periphery</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>The part of a cell encompassing the cell corte...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0031090</td>\n",
       "      <td>organelle membrane</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>is one of the two lipid bilayers of an organel...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0031967</td>\n",
       "      <td>organelle envelope</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A double membrane structure enclosing an organ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0031975</td>\n",
       "      <td>envelope</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A multilayered structure surrounding all or pa...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0098590</td>\n",
       "      <td>plasma membrane region</td>\n",
       "      <td>INFO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A membrane that is a (regional) part of the pl...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>oaklib.om:DCC#S0</td>\n",
       "      <td>GO:0012505</td>\n",
       "      <td>endomembrane system</td>\n",
       "      <td>ERROR</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Missing text definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0005622</td>\n",
       "      <td>intracellular anatomical structure</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A component of a cell contained within (but no...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:9999998</td>\n",
       "      <td>fake term for testing pmid type</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>fake definition to test retracted typo in refe...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043227</td>\n",
       "      <td>membrane-bounded organelle</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043227</td>\n",
       "      <td>membrane-bounded organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0005938</td>\n",
       "      <td>cell cortex</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>region of a cell</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0005938</td>\n",
       "      <td>cell cortex</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>lies just beneath the plasma membrane and ofte...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>oaklib.om:DCC#S7</td>\n",
       "      <td>GO:0009579</td>\n",
       "      <td>thylakoid</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>The structure in a plant cell that is known as...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Circular, thylakoid (GO:0009579 in definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:9999999</td>\n",
       "      <td>fake term for testing retraction</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>fake definition to test retracted reference</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0005575</td>\n",
       "      <td>cellular_component</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A location, relative to cellular compartments ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0005634</td>\n",
       "      <td>nucleus</td>\n",
       "      <td>INFO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A membrane-bounded organelle of eukaryotic cel...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0016020</td>\n",
       "      <td>membrane</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A lipid bilayer along with all the proteins an...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0110165</td>\n",
       "      <td>cellular anatomical entity</td>\n",
       "      <td>INFO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A part of a cellular organism that is either a...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0005635</td>\n",
       "      <td>nuclear envelope</td>\n",
       "      <td>INFO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A double lipid bilayer that is part of the nuc...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0005886</td>\n",
       "      <td>plasma membrane</td>\n",
       "      <td>INFO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>The membrane surrounding a cell that separates...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>GO:0005773</td>\n",
       "      <td>vacuole</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Definiendum should not appear at the start</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0031965</td>\n",
       "      <td>nuclear membrane</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>envelope</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>GO:0005737</td>\n",
       "      <td>cytoplasm</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Definiendum should not appear at the start</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0034357</td>\n",
       "      <td>photosynthetic membrane</td>\n",
       "      <td>INFO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A membrane enriched in complexes formed of rea...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043226</td>\n",
       "      <td>organelle</td>\n",
       "      <td>WARNING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>oaklib.om:DCC#S20.1</td>\n",
       "      <td>GO:9999998</td>\n",
       "      <td>fake term for testing pmid type</td>\n",
       "      <td>ERROR</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>PMID:9999999999999</td>\n",
       "      <td>fake definition to test retracted typo in refe...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>publication not found: PMID:9999999999999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>oaklib.om:DCC#S20.2</td>\n",
       "      <td>GO:9999999</td>\n",
       "      <td>fake term for testing retraction</td>\n",
       "      <td>ERROR</td>\n",
       "      <td>NaN</td>\n",
       "      <td>IAO:0000115</td>\n",
       "      <td>PMID:19717156</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>publication is retracted: A role for plasma tr...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
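       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>counts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>oaklib.om:DCC#S0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>oaklib.om:DCC#S20.1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>oaklib.om:DCC#S20.2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>oaklib.om:DCC#S7</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",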
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0099568</td>\n",
       "      <td>cytoplasmic region</td>\n",
       "      <td>Any (proper) part of the cytoplasm of a single...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0099738</td>\n",
       "      <td>cell cortex region</td>\n",
       "      <td>complete extent of cell cortex</td>\n",
       "      <td>Did not match whole text: cell cortex &lt; comple...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0099738</td>\n",
       "      <td>cell cortex region</td>\n",
       "      <td>underlies some some region of the plasma membrane</td>\n",
       "      <td>Wrong position, 'cell cortex' not in 'underlie...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0071944</td>\n",
       "      <td>cell periphery</td>\n",
       "      <td>The part of a cell encompassing the cell corte...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0031090</td>\n",
       "      <td>organelle membrane</td>\n",
       "      <td>is one of the two lipid bilayers of an organel...</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0031967</td>\n",
       "      <td>organelle envelope</td>\n",
       "      <td>A double membrane structure enclosing an organ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0031975</td>\n",
       "      <td>envelope</td>\n",
       "      <td>A multilayered structure surrounding all or pa...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0098590</td>\n",
       "      <td>plasma membrane region</td>\n",
       "      <td>A membrane that is a (regional) part of the pl...</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>oaklib.om:DCC#S0</td>\n",
       "      <td>GO:0012505</td>\n",
       "      <td>endomembrane system</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Missing text definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0005622</td>\n",
       "      <td>intracellular anatomical structure</td>\n",
       "      <td>A component of a cell contained within (but no...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:9999998</td>\n",
       "      <td>fake term for testing pmid type</td>\n",
       "      <td>fake definition to test retracted typo in refe...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043227</td>\n",
       "      <td>membrane-bounded organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0043227</td>\n",
       "      <td>membrane-bounded organelle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0005938</td>\n",
       "      <td>cell cortex</td>\n",
       "      <td>region of a cell</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0005938</td>\n",
       "      <td>cell cortex</td>\n",
       "      <td>lies just beneath the plasma membrane and ofte...</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>oaklib.om:DCC#S7</td>\n",
       "      <td>GO:0009579</td>\n",
       "      <td>thylakoid</td>\n",
       "      <td>The structure in a plant cell that is known as...</td>\n",
       "      <td>Circular, thylakoid (GO:0009579 in definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:9999999</td>\n",
       "      <td>fake term for testing retraction</td>\n",
       "      <td>fake definition to test retracted reference</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0005575</td>\n",
       "      <td>cellular_component</td>\n",
       "      <td>A location, relative to cellular compartments ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0005634</td>\n",
       "      <td>nucleus</td>\n",
       "      <td>A membrane-bounded organelle of eukaryotic cel...</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0016020</td>\n",
       "      <td>membrane</td>\n",
       "      <td>A lipid bilayer along with all the proteins an...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0110165</td>\n",
       "      <td>cellular anatomical entity</td>\n",
       "      <td>A part of a cellular organism that is either a...</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0005635</td>\n",
       "      <td>nuclear envelope</td>\n",
       "      <td>A double lipid bilayer that is part of the nuc...</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0005886</td>\n",
       "      <td>plasma membrane</td>\n",
       "      <td>The membrane surrounding a cell that separates...</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>GO:0005773</td>\n",
       "      <td>vacuole</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Definiendum should not appear at the start</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>oaklib.om:DCC#S11</td>\n",
       "      <td>GO:0031965</td>\n",
       "      <td>nuclear membrane</td>\n",
       "      <td>envelope</td>\n",
       "      <td>Logical definition element not found in text: ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>GO:0005737</td>\n",
       "      <td>cytoplasm</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Definiendum should not appear at the start</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>oaklib.om:DCC#Any</td>\n",
       "      <td>GO:0034357</td>\n",
       "      <td>photosynthetic membrane</td>\n",
       "      <td>A membrane enriched in complexes formed of rea...</td>\n",
       "      <td>No problems with definition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043226</td>\n",
       "      <td>organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>oaklib.om:DCC#S20.1</td>\n",
       "      <td>GO:9999998</td>\n",
       "      <td>fake term for testing pmid type</td>\n",
       "      <td>fake definition to test retracted typo in refe...</td>\n",
       "      <td>publication not found: PMID:9999999999999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>oaklib.om:DCC#S20.2</td>\n",
       "      <td>GO:9999999</td>\n",
       "      <td>fake term for testing retraction</td>\n",
       "      <td>NaN</td>\n",
       "      <td>publication is retracted: A role for plasma tr...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
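       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>oaklib.om:DCC#S0</td>\n",
       "      <td>GO:0012505</td>\n",
       "      <td>endomembrane system</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Missing text definition</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",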
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043231</td>\n",
       "      <td>intracellular membrane-bounded organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0099568</td>\n",
       "      <td>cytoplasmic region</td>\n",
       "      <td>Any (proper) part of the cytoplasm of a single...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0099738</td>\n",
       "      <td>cell cortex region</td>\n",
       "      <td>complete extent of cell cortex</td>\n",
       "      <td>Did not match whole text: cell cortex &lt; comple...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0071944</td>\n",
       "      <td>cell periphery</td>\n",
       "      <td>The part of a cell encompassing the cell corte...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043229</td>\n",
       "      <td>intracellular organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0031967</td>\n",
       "      <td>organelle envelope</td>\n",
       "      <td>A double membrane structure enclosing an organ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0031975</td>\n",
       "      <td>envelope</td>\n",
       "      <td>A multilayered structure surrounding all or pa...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0005622</td>\n",
       "      <td>intracellular anatomical structure</td>\n",
       "      <td>A component of a cell contained within (but no...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:9999998</td>\n",
       "      <td>fake term for testing pmid type</td>\n",
       "      <td>fake definition to test retracted typo in refe...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043227</td>\n",
       "      <td>membrane-bounded organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:9999999</td>\n",
       "      <td>fake term for testing retraction</td>\n",
       "      <td>fake definition to test retracted reference</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0005575</td>\n",
       "      <td>cellular_component</td>\n",
       "      <td>A location, relative to cellular compartments ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0016020</td>\n",
       "      <td>membrane</td>\n",
       "      <td>A lipid bilayer along with all the proteins an...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>oaklib.om:DCC#S3</td>\n",
       "      <td>GO:0043226</td>\n",
       "      <td>organelle</td>\n",
       "      <td>Organized structure of distinctive morphology ...</td>\n",
       "      <td>Cannot parse genus and differentia</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "\n",
       "\n",
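       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>oaklib.om:DCC#S7</td>\n",
       "      <td>GO:0009579</td>\n",
       "      <td>thylakoid</td>\n",
       "      <td>The structure in a plant cell that is known as...</td>\n",
       "      <td>Circular, thylakoid (GO:0009579 in definition</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",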
       "\n",
       "\n",
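       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>GO:0005773</td>\n",
       "      <td>vacuole</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Definiendum should not appear at the start</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>oaklib.om:DCC#S1</td>\n",
       "      <td>GO:0005737</td>\n",
       "      <td>cytoplasm</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Definiendum should not appear at the start</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",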
       "\n",
       "\n",
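       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>oaklib.om:DCC#S20.1</td>\n",
       "      <td>GO:9999998</td>\n",
       "      <td>fake term for testing pmid type</td>\n",
       "      <td>fake definition to test retracted typo in refe...</td>\n",
       "      <td>publication not found: PMID:9999999999999</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",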
       "\n",
       "\n",
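       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_label</th>\n",
       "      <th>object_str</th>\n",
       "      <th>info</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>oaklib.om:DCC#S20.2</td>\n",
       "      <td>GO:9999999</td>\n",
       "      <td>fake term for testing retraction</td>\n",
       "      <td>NaN</td>\n",
       "      <td>publication is retracted: A role for plasma tr...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",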
       "