{ "cells": [ { "cell_type": "markdown", "id": "beb3c9fb", "metadata": {}, "source": [ "# OAK Developers Tutorial\n", "\n", "This tutorial is primarily for **Python Developers** who wish to use OAK in their applications. These include\n", "applications such as:\n", "\n", "- building ontology-driven data portals\n", "- creating curation tools\n", "- data science and machine learning applications\n", "- web services\n", "\n", "Some basic knowledge of the overall architecture and capabilities of OAK is assumed.\n", "\n", "You may want to start with the slides on the command line here: https://doi.org/10.5281/zenodo.7708963\n", "\n", "Or part 1 of the tutorial here: https://incatools.github.io/ontology-access-kit/intro/tutorial01.html\n", "\n", "There is a video of the walkthrough of this tutorial: https://www.youtube.com/watch?v=nVTWazO_Gu0\n", "\n", "\n", "## How to follow this tutorial\n", "\n", "The easiest way to run this tutorial is to clone the repo and run locally:\n", "\n", "1. clone the repo here https://github.com/INCATools/ontology-access-kit/\n", "2. cd ontology-access-kit\n", "3. poetry install\n", "4. poetry run jupyter notebook\n", "\n", "Alternatively, everything here should work on a fresh install of oak from pypi. You will need to make sure the test files from [tests/input](https://github.com/INCATools/ontology-access-kit/tree/main/tests/input) are accessible.\n", "\n", "Some of the examples work with these test files - others will work with versions of ontologies on the web." 
] }, { "cell_type": "markdown", "id": "4c66dd5c", "metadata": {}, "source": [ "### Change directory so that test files are directly accessible\n", "\n", "Note: this is necessary if you are running from a checkout of the OAK repo, since this notebook is in a subfolder" ] }, { "cell_type": "code", "execution_count": 1, "id": "45adb248", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/cjm/repos/ontology-access-kit\n" ] } ], "source": [ "%cd .." ] }, { "cell_type": "markdown", "id": "aad26aed", "metadata": {}, "source": [ "The OAK documentation makes heavy use of some of the unit test files in the [tests/input](https://github.com/INCATools/ontology-access-kit/tree/main/tests/input) folder.\n", "\n", "This includes a small test subset of GO, available in different formats for testing different adapters:" ] }, { "cell_type": "code", "execution_count": 2, "id": "e2438fb1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tests/input/go-nucleus.cx tests/input/go-nucleus.ofn\r\n", "tests/input/go-nucleus.db tests/input/go-nucleus.owl\r\n", "tests/input/go-nucleus.json tests/input/go-nucleus.owl.ttl\r\n", "tests/input/go-nucleus.obo\r\n" ] } ], "source": [ "!ls tests/input/go-nucleus.*" ] }, { "cell_type": "markdown", "id": "8c0c62a0", "metadata": {}, "source": [ "If you are an OAK core developer it helps to be aware of these files, as you will likely be writing new unit tests.\n", "\n", "If, on the other hand, you just want to use OAK in your own code, you don't need to know anything about these files except that they are handy for quick testing." 
] }, { "cell_type": "markdown", "id": "dae5de64", "metadata": {}, "source": [ "## Running Examples from the OAK sphinx docs\n", "\n", "The Sphinx docs include code examples, which are marked by `>>>` prompts.\n", "\n", "For example, in:\n", "\n", "https://incatools.github.io/ontology-access-kit/packages/interfaces/basic\n", "\n", "You can see sections like this:\n", "\n", "![images/oak-docs-code.png](images/oak-docs-code.png)\n", "\n", "If you click the \"copy\" button it will copy only the code (no `>>>`s and no output), so that you\n", "can paste it directly into a Python REPL or a notebook (provided the paths are preserved).\n", "\n", "E.g. try copying this section from the docs:\n", "\n", "```python\n", ">>> from oaklib import get_adapter\n", ">>> adapter = get_adapter('tests/input/go-nucleus.db')\n", ">>> print(adapter.label(\"GO:0005634\"))\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "id": "2ec55470", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nucleus\n" ] } ], "source": [ "from oaklib import get_adapter\n", "adapter = get_adapter('tests/input/go-nucleus.db')\n", "print(adapter.label(\"GO:0005634\"))" ] }, { "cell_type": "markdown", "id": "1fa26abb", "metadata": {}, "source": [ "Hurray! 
All examples throughout the OAK docs should work" ] }, { "cell_type": "markdown", "id": "1733bae5", "metadata": {}, "source": [ "Note that you can play with using different input formats - this should give the same results:" ] }, { "cell_type": "code", "execution_count": 6, "id": "34e9a5a1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nucleus\n" ] } ], "source": [ "from oaklib import get_adapter\n", "adapter = get_adapter('tests/input/go-nucleus.obo')\n", "print(adapter.label(\"GO:0005634\"))" ] }, { "cell_type": "markdown", "id": "b85e1771", "metadata": {}, "source": [ "## Working with whole ontologies\n", "\n", "Almost all the examples in this tutorial make use of pre-made sqlite versions of ontologies.\n", "\n", "These are specified using [selector](https://incatools.github.io/ontology-access-kit/packages/selectors.html) syntax:\n", "\n", "```\n", "sqlite:obo:ONTID\n", "```\n", "\n", "E.g.\n", "\n", "```\n", "sqlite:obo:cl\n", "```\n", "\n", "When you use this for the first time it will download and cache the file (using pystow), so there may be an initial lag" ] }, { "cell_type": "code", "execution_count": 8, "id": "1f324799", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "neuron\n" ] } ], "source": [ "from oaklib import get_adapter\n", "adapter = get_adapter(\"sqlite:obo:cl\")\n", "print(adapter.label(\"CL:0000540\"))" ] }, { "cell_type": "markdown", "id": "a53f8987", "metadata": {}, "source": [ "Hurray! 
We successfully fetched the label (name) for a class ID in the cell ontology!\n", "\n", "Note that the `label` method is part of the BasicOntologyInterface" ] }, { "cell_type": "markdown", "id": "bd705733", "metadata": {}, "source": [ "### BasicOntologyInterface\n", "\n", "The [BasicOntologyInterface](https://incatools.github.io/ontology-access-kit/packages/interfaces/basic) provides basic methods that encompass the majority of what most people need to do when\n", "working with ontologies - lookups of various kinds as well as simple graph operations.\n", "\n", "OAK has the architectural concept of separating interfaces from implementations. It helps to read about this concept, but for now you don't need to worry about it. We are using the sql adapter which fully *implements* almost all the existing OAK interfaces" ] }, { "cell_type": "markdown", "id": "e84e5425", "metadata": {}, "source": [ "### Fetching ancestors\n", "\n", "Next we are going to fetch ancestors. \n", "\n", "Note: it would help to review [tutorial part 1](https://incatools.github.io/ontology-access-kit/intro/tutorial01.html) to understand basic concepts of edges, ancestors, and predicates." 
] }, { "cell_type": "code", "execution_count": 9, "id": "67273969", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:0000540 neuron\n", "BFO:0000002 continuant\n", "BFO:0000004 independent continuant\n", "BFO:0000040 material entity\n", "CL:0000000 cell\n", "CL:0000003 native cell\n", "CL:0000211 electrically active cell\n", "CL:0000255 eukaryotic cell\n", "CL:0000393 electrically responsive cell\n", "CL:0000404 electrically signaling cell\n", "CL:0000540 neuron\n", "CL:0000548 animal cell\n", "CL:0002319 neural cell\n", "CL:0002371 somatic cell\n" ] } ], "source": [ "from oaklib.datamodels.vocabulary import IS_A\n", "for anc in adapter.ancestors(\"CL:0000540\", predicates=[IS_A]):\n", " print(anc, adapter.label(anc))" ] }, { "cell_type": "markdown", "id": "91474cd7", "metadata": {}, "source": [ "### Fetching descendants\n", "\n", "Let's try working with descendants.\n", "\n", "This time we are going to demonstrate how OAK deals with collections.\n", "\n", "__note__ _we expect the following code **not** to work_" ] }, { "cell_type": "code", "execution_count": 10, "id": "954b22b5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROBLEM: object of type 'generator' has no len()\n" ] } ], "source": [ "neurons = adapter.descendants(\"CL:0000540\", predicates=[IS_A])\n", "try:\n", " print(len(neurons))\n", "except(Exception) as e:\n", " print(f\"PROBLEM: {e}\")" ] }, { "cell_type": "markdown", "id": "265d0cd9", "metadata": {}, "source": [ "Why didn't this work? What does `object of type 'generator' has no len()` mean?\n", "\n", "To understand why we will mention a key concept in OAK, that of the iterator" ] }, { "cell_type": "markdown", "id": "ae574c65", "metadata": {}, "source": [ "### Iterators\n", "\n", "OAK methods rarely return lists - instead they return iterators. 
This makes code more adaptable to use cases\n", "where you want to work with potentially very large lists or you want to *stream* results.\n", "\n", "See [best practice](https://incatools.github.io/ontology-access-kit/packages/best-practice.html#iterators)\n", "\n", "However, if you don't care about this you can simply use `list(...)` to get a list:" ] }, { "cell_type": "code", "execution_count": 11, "id": "4239552a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "454\n" ] } ], "source": [ "neurons = list(adapter.descendants(\"CL:0000540\", predicates=[IS_A]))\n", "print(len(neurons))" ] }, { "cell_type": "markdown", "id": "818c83f9", "metadata": {}, "source": [ "You can also wrap the result in `set(...)` to use set operations like intersections.\n", "\n", "For example, let's say we want to compose our neuron query above with a query to fetch all things in the forebrain,\n", "to get all neurons in the forebrain:" ] }, { "cell_type": "code", "execution_count": 12, "id": "695eef7c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "219\n" ] } ], "source": [ "from oaklib.datamodels.vocabulary import IS_A, PART_OF\n", "parts_of_forebrain = set(adapter.descendants(\"UBERON:0001890\", predicates=[IS_A, PART_OF]))\n", "print(len(parts_of_forebrain))" ] }, { "cell_type": "markdown", "id": "942af1e8", "metadata": {}, "source": [ "You may be wondering what Uberon terms are doing here, given that we requested the Cell Ontology in `get_adapter`.\n", "\n", "One under-appreciated fact of OBO is that many ontologies are in fact mini \"knowledge graphs\", linking out to nodes in other ontologies. 
See:\n", "\n", " - [extracting using robot](https://oboacademy.github.io/obook/tutorial/robot-tutorial-1/) (OBO Academy)\n", " - [owl format variants](https://oboacademy.github.io/obook/explanation/owl-format-variants/) (OBO Academy)\n", " - [OAK basics](https://incatools.github.io/ontology-access-kit/guide/basics.html) (OAK Guide)\n", " \n", " \n", "OK, next let's compute the **intersection** of the two collections:" ] }, { "cell_type": "code", "execution_count": 13, "id": "96a1a814", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:1001435 periglomerular cell\n", "CL:4023040 L2/3-6 intratelencephalic projecting glutamatergic cortical neuron\n", "CL:1001571 hippocampal pyramidal neuron\n", "CL:1001502 mitral cell\n", "CL:1001434 olfactory bulb interneuron\n", "CL:4023048 L4/5 intratelencephalic projecting glutamatergic neuron of the primary motor cortex\n", "CL:1001505 parvocellular neurosecretory cell\n", "CL:4023008 intratelencephalic-projecting glutamatergic cortical neuron\n", "CL:4023049 L5 intratelencephalic projecting glutamatergic neuron of the primary motor cortex\n", "CL:4023047 L2/3 intratelencephalic projecting glutamatergic neuron of the primary motor cortex\n", "CL:4023081 inverted L6 intratelencephalic projecting glutamatergic neuron of the primary motor cortex (Mmus)\n", "CL:4023050 L6 intratelencephalic projecting glutamatergic neuron of the primary motor cortex\n", "CL:1001503 olfactory bulb tufted cell\n", "CL:4023080 stellate L6 intratelencephalic projecting glutamatergic neuron of the primary motor cortex (Mmus)\n" ] } ], "source": [ "for cell in parts_of_forebrain.intersection(neurons):\n", " print(cell, adapter.label(cell))" ] }, { "cell_type": "markdown", "id": "562170d3", "metadata": {}, "source": [ "This is **all the neurons that are part of the forebrain**.\n", "\n", "Readers familiar with OWL and Protege may like to think of this as similar to a DL query for the expression `neuron and part-of some 
forebrain` -- there are some theoretical differences we won't get into here but for practical purposes the results should be the same." ] }, { "cell_type": "markdown", "id": "615f4deb", "metadata": {}, "source": [ "### Relationships\n", "\n", "The above example uses the ancestors and descendants query in [BasicOntologyInterface](https://incatools.github.io/ontology-access-kit/packages/interfaces/basic).\n", "\n", "We can also get the relationships using the `relationships` method:" ] }, { "cell_type": "code", "execution_count": 15, "id": "b61c3832", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('CL:0000540', 'RO:0002215', 'GO:0019226')\n", "('CL:0000540', 'rdfs:subClassOf', 'BFO:0000040')\n", "('CL:0000540', 'rdfs:subClassOf', 'CL:0000393')\n", "('CL:0000540', 'rdfs:subClassOf', 'CL:0000404')\n", "('CL:0000540', 'rdfs:subClassOf', 'CL:0002319')\n" ] } ], "source": [ "for rel in adapter.relationships([\"CL:0000540\"]):\n", " print(rel)" ] }, { "cell_type": "code", "execution_count": 17, "id": "eb1e98a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " RO:0002215 capable of GO:0019226 transmission of nerve impulse\n", " rdfs:subClassOf None BFO:0000040 material entity\n", " rdfs:subClassOf None CL:0000393 electrically responsive cell\n", " rdfs:subClassOf None CL:0000404 electrically signaling cell\n", " rdfs:subClassOf None CL:0002319 neural cell\n" ] } ], "source": [ "for _s, p, o in adapter.relationships([\"CL:0000540\"]):\n", " print(f\" {p} {adapter.label(p)} {o} {adapter.label(o)}\")" ] }, { "cell_type": "code", "execution_count": 18, "id": "e13fd38c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " rdfs:subClassOf FROM: CL:0000028 CNS neuron (sensu Nematoda and Protostomia)\n", " rdfs:subClassOf FROM: CL:0000029 neural crest derived neuron\n", " rdfs:subClassOf FROM: CL:0000099 interneuron\n", " rdfs:subClassOf FROM: CL:0000102 polymodal 
neuron\n", " rdfs:subClassOf FROM: CL:0000104 multipolar neuron\n", " rdfs:subClassOf FROM: CL:0000105 pseudounipolar neuron\n", " rdfs:subClassOf FROM: CL:0000106 unipolar neuron\n", " rdfs:subClassOf FROM: CL:0000108 cholinergic neuron\n", " rdfs:subClassOf FROM: CL:0000109 adrenergic neuron\n", " rdfs:subClassOf FROM: CL:0000110 peptidergic neuron\n" ] } ], "source": [ "for s, p, _o in list(adapter.relationships(objects=[\"CL:0000540\"]))[0:10]:\n", " print(f\" {p} FROM: {s} {adapter.label(s)}\")" ] }, { "cell_type": "code", "execution_count": 14, "id": "049cfd10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BFO:0000002 continuant\n", "BFO:0000004 independent continuant\n", "BFO:0000040 material entity\n", "CARO:0000000 anatomical entity\n", "CARO:0000006 material anatomical entity\n", "CARO:0030000 biological entity\n", "UBERON:0000061 anatomical structure\n", "UBERON:0000465 material anatomical entity\n", "UBERON:0000467 anatomical system\n", "UBERON:0000468 multicellular organism\n", "UBERON:0001016 nervous system\n", "UBERON:0001062 anatomical entity\n", "UBERON:0010000 multicellular anatomical structure\n" ] } ], "source": [ "for _s, _p, o in adapter.relationships([\"CL:0000540\"], predicates=[PART_OF], include_entailed=True):\n", " print(o, adapter.label(o))" ] }, { "cell_type": "markdown", "id": "d478e8fb", "metadata": {}, "source": [ "This includes all the entailed part-of relationships from neuron, including trivial ones (\"every neuron is part of a material entity\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "b52e0519", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:0000705 R6 photoreceptor cell\n", "CL:4023108 oxytocin-secreting magnocellular cell\n", "CL:0004240 WF1 amacrine cell\n", "CL:0004242 WF3-1 amacrine cell\n", "CL:1000380 type 1 vestibular sensory cell of epithelium of macula of saccule of membranous labyrinth\n", "CL:1001582 lateral ventricle 
neuron\n", "CL:4023128 rostral periventricular region of the third ventricle KDNy neuron\n", "CL:0003020 retinal ganglion cell C outer\n", "CL:4023094 tufted pyramidal neuron\n", "CL:4023057 cerebellar inhibitory GABAergic interneuron\n" ] } ], "source": [ "for s, _p, o in list(adapter.relationships(objects=[\"CL:0000540\"], predicates=[IS_A], include_entailed=True))[0:10]:\n", " print(s, adapter.label(s))" ] }, { "cell_type": "markdown", "id": "d07c30cc", "metadata": {}, "source": [ "### Creating a Data Frame for Relationships\n", "\n", "Next we will see how to create a small data frame for relationships for forebrain neurons:" ] }, { "cell_type": "code", "execution_count": 20, "id": "f3afbee7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>s</th><th>s_label</th><th>p</th><th>p_label</th><th>o</th><th>o_label</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>CL:1001434</td><td>olfactory bulb interneuron</td><td>BFO:0000050</td><td>part of</td><td>UBERON:0002264</td><td>olfactory bulb</td></tr>\n",
"<tr><th>1</th><td>CL:1001434</td><td>olfactory bulb interneuron</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0002264</td><td>olfactory bulb</td></tr>\n",
"<tr><th>2</th><td>CL:1001434</td><td>olfactory bulb interneuron</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0000101</td><td>sensory neuron</td></tr>\n",
"<tr><th>3</th><td>CL:1001434</td><td>olfactory bulb interneuron</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0000402</td><td>CNS interneuron</td></tr>\n",
"<tr><th>4</th><td>CL:1001434</td><td>olfactory bulb interneuron</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0012001</td><td>neuron of the forebrain</td></tr>\n",
"<tr><th>5</th><td>CL:1001435</td><td>periglomerular cell</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0005377</td><td>olfactory bulb glomerular layer</td></tr>\n",
"<tr><th>6</th><td>CL:1001435</td><td>periglomerular cell</td><td>rdfs:subClassOf</td><td>None</td><td>CL:1001434</td><td>olfactory bulb interneuron</td></tr>\n",
"<tr><th>7</th><td>CL:1001502</td><td>mitral cell</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0004186</td><td>olfactory bulb mitral cell layer</td></tr>\n",
"<tr><th>8</th><td>CL:1001502</td><td>mitral cell</td><td>rdfs:subClassOf</td><td>None</td><td>CL:1001434</td><td>olfactory bulb interneuron</td></tr>\n",
"<tr><th>9</th><td>CL:1001503</td><td>olfactory bulb tufted cell</td><td>BFO:0000050</td><td>part of</td><td>UBERON:0005376</td><td>olfactory bulb external plexiform layer</td></tr>\n",
"<tr><th>10</th><td>CL:1001503</td><td>olfactory bulb tufted cell</td><td>rdfs:subClassOf</td><td>None</td><td>CARO:0000000</td><td>anatomical entity</td></tr>\n",
"<tr><th>11</th><td>CL:1001503</td><td>olfactory bulb tufted cell</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0000540</td><td>neuron</td></tr>\n",
"<tr><th>12</th><td>CL:1001505</td><td>parvocellular neurosecretory cell</td><td>BFO:0000050</td><td>part of</td><td>UBERON:0001930</td><td>paraventricular nucleus of hypothalamus</td></tr>\n",
"<tr><th>13</th><td>CL:1001505</td><td>parvocellular neurosecretory cell</td><td>RO:0002215</td><td>capable of</td><td>GO:0030103</td><td>vasopressin secretion</td></tr>\n",
"<tr><th>14</th><td>CL:1001505</td><td>parvocellular neurosecretory cell</td><td>rdfs:subClassOf</td><td>None</td><td>CARO:0000000</td><td>anatomical entity</td></tr>\n",
"<tr><th>15</th><td>CL:1001505</td><td>parvocellular neurosecretory cell</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0000167</td><td>peptide hormone secreting cell</td></tr>\n",
"<tr><th>16</th><td>CL:1001505</td><td>parvocellular neurosecretory cell</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0000381</td><td>neurosecretory neuron</td></tr>\n",
"<tr><th>17</th><td>CL:1001505</td><td>parvocellular neurosecretory cell</td><td>rdfs:subClassOf</td><td>None</td><td>CL:2000030</td><td>hypothalamus cell</td></tr>\n",
"<tr><th>18</th><td>CL:1001571</td><td>hippocampal pyramidal neuron</td><td>BFO:0000050</td><td>part of</td><td>UBERON:0002313</td><td>hippocampus pyramidal layer</td></tr>\n",
"<tr><th>19</th><td>CL:1001571</td><td>hippocampal pyramidal neuron</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0002608</td><td>hippocampal neuron</td></tr>\n",
"<tr><th>20</th><td>CL:1001571</td><td>hippocampal pyramidal neuron</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023111</td><td>cerebral cortex pyramidal neuron</td></tr>\n",
"<tr><th>21</th><td>CL:4023008</td><td>intratelencephalic-projecting glutamatergic co...</td><td>RO:0000053</td><td>bearer of</td><td>PATO:0070034</td><td>intratelencephalic projecting</td></tr>\n",
"<tr><th>22</th><td>CL:4023008</td><td>intratelencephalic-projecting glutamatergic co...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0000679</td><td>glutamatergic neuron</td></tr>\n",
"<tr><th>23</th><td>CL:4023008</td><td>intratelencephalic-projecting glutamatergic co...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:0010012</td><td>cerebral cortex neuron</td></tr>\n",
"<tr><th>24</th><td>CL:4023040</td><td>L2/3-6 intratelencephalic projecting glutamate...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023008</td><td>intratelencephalic-projecting glutamatergic co...</td></tr>\n",
"<tr><th>25</th><td>CL:4023047</td><td>L2/3 intratelencephalic projecting glutamaterg...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0001384</td><td>primary motor cortex</td></tr>\n",
"<tr><th>26</th><td>CL:4023047</td><td>L2/3 intratelencephalic projecting glutamaterg...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:8440000</td><td>cortical layer II/III</td></tr>\n",
"<tr><th>27</th><td>CL:4023047</td><td>L2/3 intratelencephalic projecting glutamaterg...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023040</td><td>L2/3-6 intratelencephalic projecting glutamate...</td></tr>\n",
"<tr><th>28</th><td>CL:4023048</td><td>L4/5 intratelencephalic projecting glutamaterg...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0001384</td><td>primary motor cortex</td></tr>\n",
"<tr><th>29</th><td>CL:4023048</td><td>L4/5 intratelencephalic projecting glutamaterg...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:8440001</td><td>cortical layer IV/V</td></tr>\n",
"<tr><th>30</th><td>CL:4023048</td><td>L4/5 intratelencephalic projecting glutamaterg...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023040</td><td>L2/3-6 intratelencephalic projecting glutamate...</td></tr>\n",
"<tr><th>31</th><td>CL:4023049</td><td>L5 intratelencephalic projecting glutamatergic...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0001384</td><td>primary motor cortex</td></tr>\n",
"<tr><th>32</th><td>CL:4023049</td><td>L5 intratelencephalic projecting glutamatergic...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0005394</td><td>cortical layer V</td></tr>\n",
"<tr><th>33</th><td>CL:4023049</td><td>L5 intratelencephalic projecting glutamatergic...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023040</td><td>L2/3-6 intratelencephalic projecting glutamate...</td></tr>\n",
"<tr><th>34</th><td>CL:4023050</td><td>L6 intratelencephalic projecting glutamatergic...</td><td>RO:0000053</td><td>bearer of</td><td>PATO:0070019</td><td>untufted pyramidal morphology</td></tr>\n",
"<tr><th>35</th><td>CL:4023050</td><td>L6 intratelencephalic projecting glutamatergic...</td><td>RO:0002100</td><td>has soma location</td><td>UBERON:0005395</td><td>cortical layer VI</td></tr>\n",
"<tr><th>36</th><td>CL:4023050</td><td>L6 intratelencephalic projecting glutamatergic...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:2000049</td><td>primary motor cortex pyramidal cell</td></tr>\n",
"<tr><th>37</th><td>CL:4023050</td><td>L6 intratelencephalic projecting glutamatergic...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023040</td><td>L2/3-6 intratelencephalic projecting glutamate...</td></tr>\n",
"<tr><th>38</th><td>CL:4023080</td><td>stellate L6 intratelencephalic projecting glut...</td><td>RO:0000053</td><td>bearer of</td><td>PATO:0070020</td><td>stellate pyramidal morphology</td></tr>\n",
"<tr><th>39</th><td>CL:4023080</td><td>stellate L6 intratelencephalic projecting glut...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023050</td><td>L6 intratelencephalic projecting glutamatergic...</td></tr>\n",
"<tr><th>40</th><td>CL:4023081</td><td>inverted L6 intratelencephalic projecting glut...</td><td>RO:0000053</td><td>bearer of</td><td>PATO:0070021</td><td>inverted pyramidal morphology</td></tr>\n",
"<tr><th>41</th><td>CL:4023081</td><td>inverted L6 intratelencephalic projecting glut...</td><td>rdfs:subClassOf</td><td>None</td><td>CL:4023050</td><td>L6 intratelencephalic projecting glutamatergic...</td></tr>\n",
"</tbody>\n",
"</table>\n", "</div>
" ], "text/plain": [ " s s_label \\\n", "0 CL:1001434 olfactory bulb interneuron \n", "1 CL:1001434 olfactory bulb interneuron \n", "2 CL:1001434 olfactory bulb interneuron \n", "3 CL:1001434 olfactory bulb interneuron \n", "4 CL:1001434 olfactory bulb interneuron \n", "5 CL:1001435 periglomerular cell \n", "6 CL:1001435 periglomerular cell \n", "7 CL:1001502 mitral cell \n", "8 CL:1001502 mitral cell \n", "9 CL:1001503 olfactory bulb tufted cell \n", "10 CL:1001503 olfactory bulb tufted cell \n", "11 CL:1001503 olfactory bulb tufted cell \n", "12 CL:1001505 parvocellular neurosecretory cell \n", "13 CL:1001505 parvocellular neurosecretory cell \n", "14 CL:1001505 parvocellular neurosecretory cell \n", "15 CL:1001505 parvocellular neurosecretory cell \n", "16 CL:1001505 parvocellular neurosecretory cell \n", "17 CL:1001505 parvocellular neurosecretory cell \n", "18 CL:1001571 hippocampal pyramidal neuron \n", "19 CL:1001571 hippocampal pyramidal neuron \n", "20 CL:1001571 hippocampal pyramidal neuron \n", "21 CL:4023008 intratelencephalic-projecting glutamatergic co... \n", "22 CL:4023008 intratelencephalic-projecting glutamatergic co... \n", "23 CL:4023008 intratelencephalic-projecting glutamatergic co... \n", "24 CL:4023040 L2/3-6 intratelencephalic projecting glutamate... \n", "25 CL:4023047 L2/3 intratelencephalic projecting glutamaterg... \n", "26 CL:4023047 L2/3 intratelencephalic projecting glutamaterg... \n", "27 CL:4023047 L2/3 intratelencephalic projecting glutamaterg... \n", "28 CL:4023048 L4/5 intratelencephalic projecting glutamaterg... \n", "29 CL:4023048 L4/5 intratelencephalic projecting glutamaterg... \n", "30 CL:4023048 L4/5 intratelencephalic projecting glutamaterg... \n", "31 CL:4023049 L5 intratelencephalic projecting glutamatergic... \n", "32 CL:4023049 L5 intratelencephalic projecting glutamatergic... \n", "33 CL:4023049 L5 intratelencephalic projecting glutamatergic... \n", "34 CL:4023050 L6 intratelencephalic projecting glutamatergic... 
\n", "35 CL:4023050 L6 intratelencephalic projecting glutamatergic... \n", "36 CL:4023050 L6 intratelencephalic projecting glutamatergic... \n", "37 CL:4023050 L6 intratelencephalic projecting glutamatergic... \n", "38 CL:4023080 stellate L6 intratelencephalic projecting glut... \n", "39 CL:4023080 stellate L6 intratelencephalic projecting glut... \n", "40 CL:4023081 inverted L6 intratelencephalic projecting glut... \n", "41 CL:4023081 inverted L6 intratelencephalic projecting glut... \n", "\n", " p p_label o \\\n", "0 BFO:0000050 part of UBERON:0002264 \n", "1 RO:0002100 has soma location UBERON:0002264 \n", "2 rdfs:subClassOf None CL:0000101 \n", "3 rdfs:subClassOf None CL:0000402 \n", "4 rdfs:subClassOf None CL:0012001 \n", "5 RO:0002100 has soma location UBERON:0005377 \n", "6 rdfs:subClassOf None CL:1001434 \n", "7 RO:0002100 has soma location UBERON:0004186 \n", "8 rdfs:subClassOf None CL:1001434 \n", "9 BFO:0000050 part of UBERON:0005376 \n", "10 rdfs:subClassOf None CARO:0000000 \n", "11 rdfs:subClassOf None CL:0000540 \n", "12 BFO:0000050 part of UBERON:0001930 \n", "13 RO:0002215 capable of GO:0030103 \n", "14 rdfs:subClassOf None CARO:0000000 \n", "15 rdfs:subClassOf None CL:0000167 \n", "16 rdfs:subClassOf None CL:0000381 \n", "17 rdfs:subClassOf None CL:2000030 \n", "18 BFO:0000050 part of UBERON:0002313 \n", "19 rdfs:subClassOf None CL:0002608 \n", "20 rdfs:subClassOf None CL:4023111 \n", "21 RO:0000053 bearer of PATO:0070034 \n", "22 rdfs:subClassOf None CL:0000679 \n", "23 rdfs:subClassOf None CL:0010012 \n", "24 rdfs:subClassOf None CL:4023008 \n", "25 RO:0002100 has soma location UBERON:0001384 \n", "26 RO:0002100 has soma location UBERON:8440000 \n", "27 rdfs:subClassOf None CL:4023040 \n", "28 RO:0002100 has soma location UBERON:0001384 \n", "29 RO:0002100 has soma location UBERON:8440001 \n", "30 rdfs:subClassOf None CL:4023040 \n", "31 RO:0002100 has soma location UBERON:0001384 \n", "32 RO:0002100 has soma location UBERON:0005394 \n", "33 
rdfs:subClassOf None CL:4023040 \n", "34 RO:0000053 bearer of PATO:0070019 \n", "35 RO:0002100 has soma location UBERON:0005395 \n", "36 rdfs:subClassOf None CL:2000049 \n", "37 rdfs:subClassOf None CL:4023040 \n", "38 RO:0000053 bearer of PATO:0070020 \n", "39 rdfs:subClassOf None CL:4023050 \n", "40 RO:0000053 bearer of PATO:0070021 \n", "41 rdfs:subClassOf None CL:4023050 \n", "\n", " o_label \n", "0 olfactory bulb \n", "1 olfactory bulb \n", "2 sensory neuron \n", "3 CNS interneuron \n", "4 neuron of the forebrain \n", "5 olfactory bulb glomerular layer \n", "6 olfactory bulb interneuron \n", "7 olfactory bulb mitral cell layer \n", "8 olfactory bulb interneuron \n", "9 olfactory bulb external plexiform layer \n", "10 anatomical entity \n", "11 neuron \n", "12 paraventricular nucleus of hypothalamus \n", "13 vasopressin secretion \n", "14 anatomical entity \n", "15 peptide hormone secreting cell \n", "16 neurosecretory neuron \n", "17 hypothalamus cell \n", "18 hippocampus pyramidal layer \n", "19 hippocampal neuron \n", "20 cerebral cortex pyramidal neuron \n", "21 intratelencephalic projecting \n", "22 glutamatergic neuron \n", "23 cerebral cortex neuron \n", "24 intratelencephalic-projecting glutamatergic co... \n", "25 primary motor cortex \n", "26 cortical layer II/III \n", "27 L2/3-6 intratelencephalic projecting glutamate... \n", "28 primary motor cortex \n", "29 cortical layer IV/V \n", "30 L2/3-6 intratelencephalic projecting glutamate... \n", "31 primary motor cortex \n", "32 cortical layer V \n", "33 L2/3-6 intratelencephalic projecting glutamate... \n", "34 untufted pyramidal morphology \n", "35 cortical layer VI \n", "36 primary motor cortex pyramidal cell \n", "37 L2/3-6 intratelencephalic projecting glutamate... \n", "38 stellate pyramidal morphology \n", "39 L6 intratelencephalic projecting glutamatergic... \n", "40 inverted pyramidal morphology \n", "41 L6 intratelencephalic projecting glutamatergic... 
" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "forebrain_neurons = parts_of_forebrain.intersection(neurons)\n", "\n", "objs = []\n", "for s, p, o in adapter.relationships(forebrain_neurons):\n", " objs.append({\"s\": s, \"s_label\": adapter.label(s), \n", " \"p\": p, \"p_label\": adapter.label(p),\n", " \"o\": o, \"o_label\": adapter.label(o)})\n", "\n", "df = pd.DataFrame(objs)\n", "df" ] }, { "cell_type": "markdown", "id": "7802cfcc", "metadata": {}, "source": [ "### Aliases\n", "\n", "The BasicOntologyInterface has a deliberately simple datamodel for aliases that can be expressed by returning\n", "simple strings and tuples. Later on we can see how to leverage the more advanced OBO Graphs data model" ] }, { "cell_type": "code", "execution_count": 21, "id": "567b4685", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['nerve cell', 'neuron']" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adapter.entity_aliases(\"CL:0000540\")" ] }, { "cell_type": "code", "execution_count": 22, "id": "75ad933b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rdfs:label neuron\n", "oio:hasExactSynonym nerve cell\n" ] } ], "source": [ "for pred, alias in adapter.alias_relationships(\"CL:0000540\"):\n", " print(pred, alias)" ] }, { "cell_type": "markdown", "id": "cce14612", "metadata": {}, "source": [ "### Mappings\n", "\n", "Similar to aliases, the BasicOntologyInterface has a very simple model of mappings. Later on we will see how we\n", "can use the MappingProviderInterface to get more granular information." 
] }, { "cell_type": "code", "execution_count": 25, "id": "a3af89a5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "oio:hasDbXref FMA:62364\n" ] } ], "source": [ "for pred, xref in adapter.simple_mappings_by_curie(\"CL:0000202\"):\n", "    print(pred, xref)" ] }, { "cell_type": "markdown", "id": "236fd724", "metadata": {}, "source": [ "## Subsets\n", "\n", "See [Subsets](https://incatools.github.io/ontology-access-kit/glossary.html#term-Subset) in the OAK Glossary.\n", "\n", "Subsets allow terms to be placed into groups outside the hierarchy for different purposes.\n", "\n", "To illustrate, we will switch our example to GO, which has a rich variety of subsets." ] }, { "cell_type": "code", "execution_count": 26, "id": "2d23990d", "metadata": {}, "outputs": [], "source": [ "go_adapter = get_adapter(\"sqlite:obo:go\")" ] }, { "cell_type": "code", "execution_count": 27, "id": "b32ece7e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chebi_ph7_3\n", "3_STAR\n", "1_STAR\n", "goslim_plant\n", "goslim_pir\n", "goslim_flybase_ribbon\n", "goslim_chembl\n", "goslim_agr\n", "goslim_metagenomics\n", "goslim_yeast\n", "goslim_pombe\n", "gocheck_do_not_annotate\n", "goslim_generic\n", "goslim_drosophila\n", "goslim_candida\n", "prokaryote_subset\n", "gocheck_do_not_manually_annotate\n", "goslim_synapse\n", "goslim_mouse\n", "SOFA\n", "Alliance_of_Genome_Resources\n", "biosapiens\n" ] } ], "source": [ "for subset in go_adapter.subsets():\n", "    print(subset)" ] }, { "cell_type": "markdown", "id": "4a0941b7", "metadata": {}, "source": [ "Note that this includes subsets from ontologies that have been merged in." ] }, { "cell_type": "code", "execution_count": 28, "id": "f3c814cc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GO:0000228 nuclear chromosome\n", "GO:0000278 mitotic cell cycle\n", "GO:0000910 cytokinesis\n", "GO:0001618 virus receptor activity\n", "GO:0002181 
cytoplasmic translation\n", "GO:0002376 immune system process\n", "GO:0003012 muscle system process\n", "GO:0003013 circulatory system process\n", "GO:0003014 renal system process\n", "GO:0003016 respiratory system process\n" ] } ], "source": [ "for e in list(go_adapter.subset_members(\"goslim_generic\"))[0:10]:\n", "    print(e, go_adapter.label(e))" ] }, { "cell_type": "markdown", "id": "930cd6df", "metadata": {}, "source": [ "### Plotting how GO subsets inter-relate\n", "\n", "Now we are ready for a simple mini application: showing commonalities between GO subsets." ] }, { "cell_type": "code", "execution_count": 29, "id": "072dce81", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "sets = []\n", "\n", "# Keep only the GO terms in each subset; skip subsets with no GO members\n", "for subset in go_adapter.subsets():\n", "    members = set([x for x in go_adapter.subset_members(subset) if x.startswith(\"GO:\")])\n", "    if members:\n", "        sets.append((subset, members))\n", "\n", "\n", "# Number of sets\n", "N = len(sets)\n", "\n", "# Initialize an empty matrix to store the number of members in common\n", "intersection_matrix = np.zeros((N, N))\n", "\n", "# Calculate the intersections between each pair of sets\n", "for i in range(N):\n", "    for j in range(N):\n", "        intersection_matrix[i, j] = len(sets[i][1].intersection(sets[j][1]))\n", "\n", "# Get the set names\n", "set_names = [s[0] for s in sets]\n", "\n", "# Create a pandas DataFrame with the intersection matrix and set names as index and columns\n", "intersection_df = pd.DataFrame(intersection_matrix, index=set_names, columns=set_names)\n", "\n", "# Plot the clustermap with dendrograms\n", "sns.clustermap(intersection_df, annot=True, cmap='viridis', fmt='g', figsize=(8, 6))\n", "plt.title('Clustermap of Common Members Between Sets with Dendrograms', y=1.03)\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "beb3469c", "metadata": {}, "source": [ "## Search Interface\n", "\n", "So far we have been doing basic lookups, assuming we know the ID in advance.\n", "\n", "What if we don't know the ID but just have a label, or if we don't even have a particular concept in mind, and just want to search?\n", "\n", "If so, the [SearchInterface](https://incatools.github.io/ontology-access-kit/packages/interfaces/search.html) is your friend!" 
] }, { "cell_type": "markdown", "id": "f9349be3", "metadata": {}, "source": [ "### Lookup by label" ] }, { "cell_type": "code", "execution_count": 30, "id": "58e1d5a5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:0000540\n" ] } ], "source": [ "for result in adapter.basic_search(\"neuron\"):\n", "    print(result)" ] }, { "cell_type": "markdown", "id": "3f3b5459", "metadata": {}, "source": [ "Now let's try searching for the capitalized form:" ] }, { "cell_type": "code", "execution_count": 31, "id": "708d0258", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(list(adapter.basic_search(\"Neuron\")))" ] }, { "cell_type": "markdown", "id": "1e63227b", "metadata": {}, "source": [ "Uh oh!\n", "\n", "By design, the default search is case sensitive. But we can pass a SearchConfiguration to customize the search.\n", "\n", "You can read more about the SearchConfiguration data model here:\n", "\n", "- https://w3id.org/oak/search\n", "\n", "__A note on data models:__ the BasicOntologyInterface is designed to work without any particular data model, returning only simple lists and tuples. Other interfaces typically need to work with more sophisticated structures, so we use data models here." 
] }, { "cell_type": "code", "execution_count": 32, "id": "795b40af", "metadata": {}, "outputs": [], "source": [ "from oaklib.datamodels.search import SearchConfiguration, SearchTermSyntax, SearchProperty" ] }, { "cell_type": "code", "execution_count": 33, "id": "c2ee9f4d", "metadata": {}, "outputs": [], "source": [ "config = SearchConfiguration(force_case_insensitive=True)" ] }, { "cell_type": "code", "execution_count": 34, "id": "492d2ab6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(list(adapter.basic_search(\"Neuron\", config)))" ] }, { "cell_type": "code", "execution_count": 35, "id": "cbcb7dd3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(list(adapter.basic_search(\"NeUrOn\", config)))" ] }, { "cell_type": "markdown", "id": "784a1c78", "metadata": {}, "source": [ "We can also use regexes, starts-with matching, ends-with matching, and so on (but see the caveat on regexes below)." ] }, { "cell_type": "code", "execution_count": 37, "id": "3104e5cc", "metadata": {}, "outputs": [], "source": [ "config = SearchConfiguration(syntax=SearchTermSyntax.STARTS_WITH)" ] }, { "cell_type": "code", "execution_count": 38, "id": "4bb14054", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CARO:0001001 neuron projection bundle\n", "CL:0000006 neuronal receptor cell\n", "CL:0000095 neuron associated cell\n", "CL:0000123 neuron associated cell (sensu Vertebrata)\n", "CL:0000130 neuron associated cell (sensu Nematoda and Protostomia)\n", "CL:0000540 neuron\n", "CL:0000555 neuronal brush cell\n", "CL:0002611 neuron of the dorsal spinal cord\n", "CL:0002612 neuron of the ventral spinal cord\n", "CL:0002614 neuron of the substantia nigra\n", "CL:0012001 neuron of the forebrain\n", "GO:0001764 neuron migration\n", "GO:0019228 neuronal action potential\n", 
"GO:0030182 neuron differentiation\n", "GO:0031175 neuron projection development\n", "GO:0032589 neuron projection membrane\n", "GO:0042551 neuron maturation\n", "GO:0043005 neuron projection\n", "GO:0043025 neuronal cell body\n", "GO:0044306 neuron projection terminus\n", "GO:0048666 neuron development\n", "GO:0048812 neuron projection morphogenesis\n", "GO:0051402 neuron apoptotic process\n", "GO:0060705 neuron differentiation involved in salivary gland development\n", "GO:0070050 neuron cellular homeostasis\n", "GO:0070997 neuron death\n", "GO:0106027 neuron projection organization\n", "GO:0120111 neuron projection cytoplasm\n", "GO:0150099 neuron-glial cell signaling\n", "PATO:0070033 neuron projection quality\n", "PR:000005460 neuronal acetylcholine receptor subunit alpha-7\n", "PR:000044062 neuronal acetylcholine receptor subunit alpha-7, signal peptide removed form\n", "PR:000044063 neuronal acetylcholine receptor subunit alpha-7, signal peptide removed form (human)\n", "PR:P36544 neuronal acetylcholine receptor subunit alpha-7 (human)\n", "PR:P49582 neuronal acetylcholine receptor subunit alpha-7 (mouse)\n", "UBERON:0000122 neuron projection bundle\n", "UBERON:0004904 neuron projection bundle connecting eye with brain\n" ] } ], "source": [ "for result in adapter.basic_search(\"neuron\", config):\n", " print(result, adapter.label(result))" ] }, { "cell_type": "markdown", "id": "1fc847e6", "metadata": {}, "source": [ "now we can try a regex:" ] }, { "cell_type": "code", "execution_count": 39, "id": "8f36650c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CARO:0001001 neuron projection bundle\n", "CL:0000006 neuronal receptor cell\n", "CL:0000095 neuron associated cell\n", "CL:0000123 neuron associated cell (sensu Vertebrata)\n", "CL:0000130 neuron associated cell (sensu Nematoda and Protostomia)\n", "CL:0000540 neuron\n", "CL:0000555 neuronal brush cell\n", "CL:0002611 neuron of the dorsal spinal cord\n", "CL:0002612 
neuron of the ventral spinal cord\n", "CL:0002614 neuron of the substantia nigra\n", "CL:0012001 neuron of the forebrain\n", "GO:0001764 neuron migration\n", "GO:0019228 neuronal action potential\n", "GO:0030182 neuron differentiation\n", "GO:0031175 neuron projection development\n", "GO:0032589 neuron projection membrane\n", "GO:0042551 neuron maturation\n", "GO:0043005 neuron projection\n", "GO:0043025 neuronal cell body\n", "GO:0044306 neuron projection terminus\n", "GO:0048666 neuron development\n", "GO:0048812 neuron projection morphogenesis\n", "GO:0051402 neuron apoptotic process\n", "GO:0060705 neuron differentiation involved in salivary gland development\n", "GO:0070050 neuron cellular homeostasis\n", "GO:0070997 neuron death\n", "GO:0106027 neuron projection organization\n", "GO:0120111 neuron projection cytoplasm\n", "GO:0150099 neuron-glial cell signaling\n", "PATO:0070033 neuron projection quality\n", "PR:000005460 neuronal acetylcholine receptor subunit alpha-7\n", "PR:000044062 neuronal acetylcholine receptor subunit alpha-7, signal peptide removed form\n", "PR:000044063 neuronal acetylcholine receptor subunit alpha-7, signal peptide removed form (human)\n", "PR:P36544 neuronal acetylcholine receptor subunit alpha-7 (human)\n", "PR:P49582 neuronal acetylcholine receptor subunit alpha-7 (mouse)\n", "UBERON:0000122 neuron projection bundle\n", "UBERON:0004904 neuron projection bundle connecting eye with brain\n" ] } ], "source": [ "config = SearchConfiguration(syntax=SearchTermSyntax.REGULAR_EXPRESSION)\n", "for result in adapter.basic_search(\"^neuron\", config):\n", " print(result, adapter.label(result))" ] }, { "cell_type": "markdown", "id": "f668d66e", "metadata": {}, "source": [ "### Caveat on regexes\n", "\n", "If your adapter is talking to sqlite, then the regex must be of a form that can be translated to a LIKE query\n", "\n", "(OAK takes care of this translation - as a developer you should only care about the interface, not implementation)\n", 
"\n", "In future we may have strategies to allow more powerful lexical search with sqlite..." ] }, { "cell_type": "markdown", "id": "0b4bbb16", "metadata": {}, "source": [ "### Searching on mapped identifiers\n", "\n", "You can search on arbitrary properties, such as synonyms or even mapped identifiers (`object_id` in SSSOM lingo)" ] }, { "cell_type": "code", "execution_count": 40, "id": "221c3e6a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:0000202 auditory hair cell\n", "CL:4023120 cochlea auditory hair cell\n" ] } ], "source": [ "config = SearchConfiguration(properties=[SearchProperty.MAPPED_IDENTIFIER])\n", "for result in adapter.basic_search(\"FMA:62364\", config):\n", " print(result, adapter.label(result))" ] }, { "cell_type": "markdown", "id": "a47980d5", "metadata": {}, "source": [ "### SSSOM Mappings\n", "\n", "Up above we saw that the default datamodel for mappings in OAK is simple. For more advanced operations, you can use:\n", "\n", "[MappingProviderInterface](https://incatools.github.io/ontology-access-kit/packages/interfaces/mapping-provider.html)\n", "\n", "This makes use of the https://w3id.org/sssom data model" ] }, { "cell_type": "code", "execution_count": 41, "id": "26eccc9f", "metadata": {}, "outputs": [], "source": [ "neurons = list(adapter.descendants(\"CL:0000540\", predicates=[IS_A]))\n", "mappings = list(adapter.sssom_mappings(neurons))" ] }, { "cell_type": "code", "execution_count": 42, "id": "eb41b4f8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "186" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(mappings)" ] }, { "cell_type": "code", "execution_count": 43, "id": "5da17c85", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mapping(subject_id='CL:0000099', predicate_id='oio:hasDbXref', object_id='BTO:0003811', mapping_justification='semapv:UnspecifiedMatching', subject_label=None, 
subject_category=None, predicate_label=None, predicate_modifier=None, object_label=None, object_category=None, author_id=[], author_label=[], reviewer_id=[], reviewer_label=[], creator_id=[], creator_label=[], license=None, subject_type=None, subject_source='CL', subject_source_version=None, object_type=None, object_source='BTO', object_source_version=None, mapping_provider=None, mapping_source=None, mapping_cardinality=None, mapping_tool=None, mapping_tool_version=None, mapping_date=None, confidence=None, curation_rule=[], curation_rule_text=[], subject_match_field=[], object_match_field=[], match_string=[], subject_preprocessing=[], object_preprocessing=[], semantic_similarity_score=None, semantic_similarity_measure=None, see_also=[], other=None, comment=None)\n" ] } ], "source": [ "print(mappings[0])" ] }, { "cell_type": "code", "execution_count": 44, "id": "d0196ae7", "metadata": {}, "outputs": [], "source": [ "from linkml_runtime.dumpers import yaml_dumper" ] }, { "cell_type": "code", "execution_count": 45, "id": "d916d6d9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "- subject_id: CL:0000099\n", " predicate_id: oio:hasDbXref\n", " object_id: BTO:0003811\n", " mapping_justification: semapv:UnspecifiedMatching\n", " subject_source: CL\n", " object_source: BTO\n", "- subject_id: CL:0000099\n", " predicate_id: oio:hasDbXref\n", " object_id: FBbt:00005125\n", " mapping_justification: semapv:UnspecifiedMatching\n", " subject_source: CL\n", " object_source: FBbt\n", "\n" ] } ], "source": [ "print(yaml_dumper.dumps(mappings[0:2]))" ] }, { "cell_type": "markdown", "id": "55b8c04c", "metadata": {}, "source": [ "## Text Annotation\n", "\n", "Interface: [TextAnnotatorInterface](https://incatools.github.io/ontology-access-kit/packages/interfaces/text-annotator.html)\n", "\n", "The text annotator uses the https://w3id.org/linkml/text-annotator data model. 
This models each annotation as a TextAnnotation object with fields such as subject_start and subject_end (marking the span in the text) and object_id and object_label (the matched concept):" ] }, { "cell_type": "code", "execution_count": 46, "id": "b61ee543", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18 21 CARO:0000013 cell\n", "18 21 CL:0000000 cell\n", "9 9 CHEBI:15339 A\n", "11 11 CHEBI:15428 G\n", "18 18 CHEBI:27594 C\n", "11 21 CL:0000160 goblet cell\n", "1 2 PR:000016301 TH\n", "1 2 PR:P07101 TH\n", "1 2 PR:P24529 Th\n", "1 2 UBERON:0001897 Th\n", "43 52 UBERON:0000483 epithelium\n", "32 52 UBERON:0001277 intestinal epithelium\n" ] } ], "source": [ "for ann in adapter.annotate_text(\"this is a goblet cell from the intestinal epithelium\"):\n", "    print(ann.subject_start, ann.subject_end, ann.object_id, ann.object_label)" ] }, { "cell_type": "markdown", "id": "1b38acc0", "metadata": {}, "source": [ "### OBO Graph Interface\n", "\n", "[OboGraphInterface](https://incatools.github.io/ontology-access-kit/packages/interfaces/obograph.html)" ] }, { "cell_type": "code", "execution_count": 47, "id": "dc4ba581", "metadata": {}, "outputs": [], "source": [ "graph = adapter.ancestor_graph([\"CL:0000540\"], predicates=[IS_A, PART_OF])" ] }, { "cell_type": "code", "execution_count": 48, "id": "36a599dc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "25" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(graph.nodes)" ] }, { "cell_type": "code", "execution_count": 49, "id": "09979d95", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "29" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(graph.edges)" ] }, { "cell_type": "markdown", "id": "e43d9cb1", "metadata": {}, "source": [ "### Exporting subgraphs to GraphViz\n", "\n", "See also [part 5](https://incatools.github.io/ontology-access-kit/intro/tutorial05.html) of the 
tutorial" ] }, { "cell_type": "code", "execution_count": 50, "id": "62c2e672", "metadata": {}, "outputs": [], "source": [ "from oaklib.utilities.obograph_utils import graph_to_image" ] }, { "cell_type": "code", "execution_count": 51, "id": "a3c605ec", "metadata": {}, "outputs": [], "source": "graph_to_image(graph, seeds=[\"CL:0000540\"], imgfile=\"examples/output/neuron-v1.png\")" }, { "cell_type": "markdown", "id": "3200472b", "metadata": {}, "source": [ "![img](output/neuron-v1.png)" ] }, { "cell_type": "markdown", "id": "a0aa822e", "metadata": {}, "source": [ "### Adding a stylesheet\n", "\n", "The graph above is a little plain and boring looking. We can spice it up using a StyleMap.\n", "\n", "For now we will use the standard stylemap in [src/oaklib/conf/obograph-style.json](https://github.com/INCATools/ontology-access-kit/blob/main/src/oaklib/conf/obograph-style.json):" ] }, { "cell_type": "code", "execution_count": 52, "id": "deda20f6", "metadata": {}, "outputs": [], "source": [ "from oaklib.utilities.obograph_utils import default_stylemap_path" ] }, { "cell_type": "code", "execution_count": 53, "id": "edae4b80", "metadata": {}, "outputs": [], "source": "graph_to_image(graph, seeds=[\"CL:0000540\"], imgfile=\"examples/output/neuron-v2.png\", stylemap=default_stylemap_path())" }, { "cell_type": "markdown", "id": "09c581a6", "metadata": {}, "source": [ "![img](output/neuron-v2.png)" ] }, { "cell_type": "markdown", "id": "f53d9efb", "metadata": {}, "source": [ "## Working with annotations" ] }, { "cell_type": "code", "execution_count": 54, "id": "41dabf06", "metadata": {}, "outputs": [], "source": [ "hp = get_adapter(\"src/oaklib/conf/hpoa-g2p-input-spec.yaml\")" ] }, { "cell_type": "code", "execution_count": 55, "id": "3ea4ace2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "238269" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(list(hp.associations()))" ] }, { "cell_type": "code", 
"execution_count": 56, "id": "8c929abf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Association(subject='NCBIGene:ncbi_gene_id', predicate=None, object='hpo_id', property_values=[]) None\n", "Association(subject='NCBIGene:10', predicate=None, object='HP:0000007', property_values=[]) Autosomal recessive inheritance\n", "Association(subject='NCBIGene:10', predicate=None, object='HP:0001939', property_values=[]) Abnormality of metabolism/homeostasis\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0002460', property_values=[]) Distal muscle weakness\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0002451', property_values=[]) Limb dystonia\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0010871', property_values=[]) Sensory ataxia\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0009886', property_values=[]) Trichorrhexis nodosa\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0002421', property_values=[]) Poor head control\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0001298', property_values=[]) Encephalopathy\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0001290', property_values=[]) Generalized hypotonia\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0001273', property_values=[]) Abnormal corpus callosum morphology\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0001268', property_values=[]) Mental deterioration\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0002599', property_values=[]) Head titubation\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0001284', property_values=[]) Areflexia\n", "Association(subject='NCBIGene:16', predicate=None, object='HP:0001250', property_values=[]) Seizure\n" ] } ], "source": [ "for assoc in list(hp.associations())[0:15]:\n", " print(assoc, hp.label(assoc.object))" ] }, { 
"cell_type": "code", "execution_count": 58, "id": "ae02da0c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fetch sensory ataxia genes (including those annotated to is-a descendants of the term)\n", "ataxia_assocs = list(hp.associations(objects=[\"HP:0010871\"], object_closure_predicates=[IS_A]))\n", "len(ataxia_assocs)" ] }, { "cell_type": "code", "execution_count": 59, "id": "17f238ef", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genes = list(set([assoc.subject for assoc in ataxia_assocs]))\n", "len(genes)" ] }, { "cell_type": "code", "execution_count": 60, "id": "8961221f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['NCBIGene:56652',\n", " 'NCBIGene:5428',\n", " 'NCBIGene:16',\n", " 'NCBIGene:1959',\n", " 'NCBIGene:57716']" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genes[0:5]" ] }, { "cell_type": "code", "execution_count": 61, "id": "d7773ee5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'UniProtKB:A0A024RDV7',\n", " 'UniProtKB:A0A140VJE4',\n", " 'UniProtKB:A0A2R8Y4V4',\n", " 'UniProtKB:A0A2R8Y746',\n", " 'UniProtKB:A0A2U3TZU2',\n", " 'UniProtKB:A0A514TP98',\n", " 'UniProtKB:A0A5F9ZI26',\n", " 'UniProtKB:A0A8I5KYI5',\n", " 'UniProtKB:A8KA82',\n", " 'UniProtKB:A8MU75',\n", " 'UniProtKB:B2RB38',\n", " 'UniProtKB:B4DE36',\n", " 'UniProtKB:E5KNU5',\n", " 'UniProtKB:E5KSY5',\n", " 'UniProtKB:O00505',\n", " 'UniProtKB:P06744',\n", " 'UniProtKB:P11161',\n", " 'UniProtKB:P25189',\n", " 'UniProtKB:P49588',\n", " 'UniProtKB:P54098',\n", " 'UniProtKB:P54802',\n", " 'UniProtKB:Q01453',\n", " 'UniProtKB:Q13217',\n", " 'UniProtKB:Q6FH25',\n", " 'UniProtKB:Q8TF17',\n", " 'UniProtKB:Q96K19',\n", " 'UniProtKB:Q96RR1',\n", " 'UniProtKB:Q9BXM0',\n", " 'UniProtKB:Q9H5I5',\n", " 
'UniProtKB:Q9H6V3',\n", " 'UniProtKB:Q9Y5Y0'}" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Map each NCBI gene ID to its UniProtKB identifiers\n", "node_normalizer = get_adapter(\"translator:\")\n", "uniprot_ids = set()\n", "for gene in genes:\n", "    for m in node_normalizer.sssom_mappings([gene], source=\"UniProtKB\"):\n", "        uniprot_ids.add(m.object_id)\n", "uniprot_ids" ] }, { "cell_type": "code", "execution_count": 63, "id": "b14e0520", "metadata": {}, "outputs": [], "source": [ "go = get_adapter(\"src/oaklib/conf/go-human-input-spec.yaml\")" ] }, { "cell_type": "code", "execution_count": 64, "id": "10a687cc", "metadata": {}, "outputs": [], "source": [ "results = list(go.enriched_classes(uniprot_ids, object_closure_predicates=[IS_A, PART_OF], autolabel=True))" ] }, { "cell_type": "code", "execution_count": 66, "id": "b22cb405", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GO:0042552 'myelination' 9.72e-04\n", "GO:0008366 'axon ensheathment' 1.06e-03\n", "GO:0007272 'ensheathment of neurons' 1.06e-03\n", "GO:0007422 'peripheral nervous system development' 6.38e-03\n", "GO:0014037 'Schwann cell differentiation' 2.75e-02\n" ] } ], "source": [ "for result in results:\n", "    print(f\"{result.class_id} '{result.class_label}' {result.p_value_adjusted:0.2e}\")" ] }, { "cell_type": "code", "execution_count": 67, "id": "6d382ca5", "metadata": {}, "outputs": [], "source": [ "terms = [r.class_id for r in results]\n", "graph = go.ancestor_graph(terms, predicates=[IS_A, PART_OF])" ] }, { "cell_type": "code", "execution_count": 68, "id": "a656d404", "metadata": {}, "outputs": [], "source": "graph_to_image(graph, seeds=terms, imgfile=\"examples/output/go-enrichment-from-hp.png\", stylemap=default_stylemap_path())" }, { "cell_type": "markdown", "id": "0e7d6fce", "metadata": {}, "source": [ "![img](output/go-enrichment-from-hp.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "117ca65c", "metadata": {}, "outputs": [], "source": [] } 
], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }