{ "cells": [ { "cell_type": "markdown", "id": "b3b0dbee", "metadata": {}, "source": [ "# Neuro-CL tutorial\n", "\n", "* author: Chris Mungall\n", "* created: 2022-09-9\n", "\n", "This tutorial walks through the neuro-relevant subset of the [Cell Ontology](https://obofoundry.org/ontology/cl) (CL). The goals are:\n", "\n", "- to help understand the structure of CL\n", " - to show how CL uses relationships like *has-soma-location*\n", " - to show how CL relates to Uberon and other ontologies\n", "- to show how to do advanced OAK queries and visualization (CLI and programmatic) on CL\n", "- to demonstrate rudimentary text annotation\n", "\n", "Running this notebook locally or on mybinder requires OAK [0.1.41](https://github.com/INCATools/ontology-access-kit/releases/tag/v0.1.41) or higher.\n" ] }, { "cell_type": "markdown", "id": "9aced0d9", "metadata": {}, "source": [ "## Create an alias\n", "\n", "For convenience we will set an alias (using the IPython `%alias` magic).\n", "\n", "The first time you run this, a copy of cl.db is downloaded from S3, which may take a moment; for subsequent invocations,\n", "the cached copy will be used." ] }, { "cell_type": "code", "execution_count": 47, "id": "44bd10cd", "metadata": {}, "outputs": [], "source": [ "%alias cl runoak -i sqlite:obo:cl" ] }, { "cell_type": "markdown", "id": "09595dfc", "metadata": {}, "source": [ "### Basic lookup queries\n", "\n", "Let's check it's working. We will use the [info](https://incatools.github.io/ontology-access-kit/cli.html#runoak-info) command:" ] }, { "cell_type": "code", "execution_count": 4, "id": "81399308", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:0000540 ! 
neuron\r\n" ] } ], "source": [ "cl info neuron" ] }, { "cell_type": "markdown", "id": "4c8914a2", "metadata": {}, "source": [ "Next we will try a [simple lexical search](https://incatools.github.io/ontology-access-kit/intro/tutorial01.html#search).\n", "\n", "Here `l` means use labels and `~` means inexact (partial) matches.\n", "\n", "We will do a simple lexical search for GABAergic cortical interneurons:" ] }, { "cell_type": "code", "execution_count": 5, "id": "fb640127", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:0000617 ! GABAergic neuron\n", "CL:0010011 ! cerebral cortex GABAergic interneuron\n", "CL:0011005 ! GABAergic interneuron\n", "CL:4023007 ! L2/3 bipolar vip GABAergic cortical interneuron (Mmus)\n", "CL:4023010 ! alpha7 GABAergic cortical interneuron (Mmus)\n", "CL:4023011 ! lamp5 GABAergic cortical interneuron\n", "CL:4023014 ! L5 vip cortical GABAergic interneuron (Mmus)\n", "CL:4023015 ! sncg GABAergic cortical interneuron\n", "CL:4023016 ! vip GABAergic cortical interneuron\n", "CL:4023017 ! sst GABAergic cortical interneuron\n", "CL:4023018 ! pvalb GABAergic cortical interneuron\n", "CL:4023019 ! L5/6 cck, vip cortical GABAergic interneuron (Mmus)\n", "CL:4023022 ! canopy lamp5 GABAergic cortical interneuron (Mmus)\n", "CL:4023023 ! L5,6 neurogliaform lamp5 GABAergic cortical interneuron (Mmus)\n", "CL:4023024 ! neurogliaform lamp5 GABAergic cortical interneuron (Mmus)\n", "CL:4023025 ! long-range projecting sst GABAergic cortical interneuron (Mmus)\n", "CL:4023027 ! L5 T-Martinotti sst GABAergic cortical interneuron (Mmus)\n", "CL:4023028 ! L5 non-Martinotti sst GABAergic cortical interneuron (Mmus)\n", "CL:4023030 ! L2/3/5 fan Martinotti sst GABAergic cortical interneuron (Mmus)\n", "CL:4023031 ! L4 sst GABAergic cortical interneuron (Mmus)\n", "CL:4023034 ! obsolete L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)\n", "CL:4023036 ! 
chandelier pvalb GABAergic cortical interneuron\n", "CL:4023065 ! meis2 expressing cortical GABAergic cell\n", "CL:4023067 ! obsolete Martinotti morphology L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)\n", "CL:4023069 ! medial ganglionic eminence derived GABAergic cortical interneuron\n", "CL:4023070 ! caudal ganglionic eminence derived GABAergic cortical interneuron\n", "CL:4023071 ! L5/6 cck cortical GABAergic interneuron (Mmus)\n", "CL:4023075 ! L6 tyrosine hydroxylase sst GABAergic cortical interneuron (Mmus)\n", "CL:4023078 ! obsolete basket morphology L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)\n", "CL:4023106 ! obsolete meis2 expressing cortical GABAergic cell (Callithrix jacchus)\n", "CL:4023118 ! L5/6 non-Martinotti sst GABAergic cortical interneuron (Mmus)\n", "CL:4023121 ! sst chodl GABAergic cortical interneuron\n", "CL:4023122 ! oxytocin receptor sst GABAergic cortical interneuron\n", "GO:0021853 ! cerebral cortex GABAergic interneuron migration\n", "GO:0021892 ! cerebral cortex GABAergic interneuron differentiation\n", "GO:0021894 ! cerebral cortex GABAergic interneuron development\n", "GO:0032228 ! regulation of synaptic transmission, GABAergic\n", "GO:0032229 ! negative regulation of synaptic transmission, GABAergic\n", "GO:0032230 ! positive regulation of synaptic transmission, GABAergic\n", "GO:0051932 ! synaptic transmission, GABAergic\n", "GO:0097154 ! GABAergic neuron differentiation\n" ] } ], "source": [ "cl info \"l~GABAergic cortical interneuron\"" ] }, { "cell_type": "markdown", "id": "1d1240b1", "metadata": {}, "source": [ "Of course, there are more reliable ways to do this query than relying on string matching, but string searches\n", "can be useful for initial exploration.\n", "\n", "Note there are some GO terms in the matches. This is because the release version of CL includes portions of other ontologies like GO." 
] }, { "cell_type": "markdown", "id": "f8b16045", "metadata": {}, "source": [ "## Exploring the structure of CL\n", "\n", "Next we will try exploring the graph structure of CL (see the glossary for [what we mean by graph structure](https://incatools.github.io/ontology-access-kit/glossary.html#term-Graph)).\n", "\n", "Here we are using the [relationships](https://incatools.github.io/ontology-access-kit/cli.html#runoak-relationships) command:" ] }, { "cell_type": "code", "execution_count": 16, "id": "1e122ab2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tpredicate\tobject\tsubject_label\tpredicate_label\tobject_label\r", "\r\n", "CL:4023014\tRO:0002100\tUBERON:0005394\tL5 vip cortical GABAergic interneuron (Mmus)\thas soma location\tcortical layer V\r", "\r\n", "CL:4023014\tRO:0002162\tNCBITaxon:10090\tL5 vip cortical GABAergic interneuron (Mmus)\tin taxon\tMus musculus\r", "\r\n", "CL:4023014\tRO:0002292\tPR:P32648\tL5 vip cortical GABAergic interneuron (Mmus)\texpresses\tVIP peptides (mouse)\r", "\r\n", "CL:4023014\trdfs:subClassOf\tCL:4023016\tL5 vip cortical GABAergic interneuron (Mmus)\tNone\tvip GABAergic cortical interneuron\r", "\r\n" ] } ], "source": [ "cl relationships CL:4023014" ] }, { "cell_type": "markdown", "id": "66be6c73", "metadata": {}, "source": [ "The output is a tab-separated table of relationships emanating from *L5 vip cortical GABAergic interneuron (Mmus)*.\n", "\n", "This doesn't look very pretty in the Jupyter interface. We will write a helper function here (of course, if running on the command line you could do other things to show the table)." 
] }, { "cell_type": "code", "execution_count": 13, "id": "861d7e9c", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "def show(path=\"output/tmp.tsv\"):\n", " \"\"\"helper function to turn most recent TSV output into a dataframe\"\"\"\n", " return pd.read_csv(path, sep=\"\\t\")" ] }, { "cell_type": "code", "execution_count": 17, "id": "25fafa47", "metadata": {}, "outputs": [], "source": [ "cl relationships CL:4023014 -o output/tmp.tsv" ] }, { "cell_type": "code", "execution_count": 18, "id": "50d216b2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>subject</th><th>predicate</th><th>object</th><th>subject_label</th><th>predicate_label</th><th>object_label</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>CL:4023014</td><td>RO:0002100</td><td>UBERON:0005394</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>has soma location</td><td>cortical layer V</td></tr>\n",
" <tr><th>1</th><td>CL:4023014</td><td>RO:0002162</td><td>NCBITaxon:10090</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>in taxon</td><td>Mus musculus</td></tr>\n",
" <tr><th>2</th><td>CL:4023014</td><td>RO:0002292</td><td>PR:P32648</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>expresses</td><td>VIP peptides (mouse)</td></tr>\n",
" <tr><th>3</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:4023016</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>vip GABAergic cortical interneuron</td></tr>\n",
" </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " subject predicate object \\\n", "0 CL:4023014 RO:0002100 UBERON:0005394 \n", "1 CL:4023014 RO:0002162 NCBITaxon:10090 \n", "2 CL:4023014 RO:0002292 PR:P32648 \n", "3 CL:4023014 rdfs:subClassOf CL:4023016 \n", "\n", " subject_label predicate_label \\\n", "0 L5 vip cortical GABAergic interneuron (Mmus) has soma location \n", "1 L5 vip cortical GABAergic interneuron (Mmus) in taxon \n", "2 L5 vip cortical GABAergic interneuron (Mmus) expresses \n", "3 L5 vip cortical GABAergic interneuron (Mmus) None \n", "\n", " object_label \n", "0 cortical layer V \n", "1 Mus musculus \n", "2 VIP peptides (mouse) \n", "3 vip GABAergic cortical interneuron " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "show()" ] }, { "cell_type": "markdown", "id": "b41a07e8", "metadata": {}, "source": [ "This view is more readable. We can see that there are 4 edges for which the subject matches our query. *Edges can point to nodes outside CL*.\n", "\n", "Each edge can be read as a sentence, e.g.\n", "\n", " - L5 vip cortical GABAergic interneuron (Mmus) *has soma location* cortical layer V\n", "\n", "### Linking neurons to Uberon\n", "\n", "When connecting cell types to anatomy in Uberon, CL uses *has-soma-location* rather than the stronger *part-of*. This is because, as a general rule, we can't make entire neurons part of specific regions if those neurons have projections that overlap other areas.\n", "\n", "For more background, see:\n", "\n", "- A strategy for building neuroanatomy ontologies, Osumi-Sutherland et al. https://doi.org/10.1093/bioinformatics/bts113\n", "\n", "### Transcriptomic classification of neurons\n", "\n", "Note that many newer cell types in CL may be types uncovered by RNAseq experiments and clustering. 
When these\n", "are captured in CL, we often link the cell type to a marker protein or gene via an *expresses* relationship.\n", "\n", "### Relationship query directionality\n", "\n", "By default, the `relationships` command queries in the \"up\" direction, i.e. the query is matched to the edge subject.\n", "\n", "We can use `--direction` to get the \"down\" direction edges (i.e. the query is matched to the edge object), or \"both\".\n", "\n", "Let's try this with the more general term **vip GABAergic cortical interneuron**:" ] }, { "cell_type": "code", "execution_count": 23, "id": "18950258", "metadata": {}, "outputs": [], "source": [ "cl relationships CL:4023016 --direction both -o output/tmp.tsv" ] }, { "cell_type": "code", "execution_count": 24, "id": "574641f5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>subject</th><th>predicate</th><th>object</th><th>subject_label</th><th>predicate_label</th><th>object_label</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>CL:4023016</td><td>RO:0002292</td><td>PR:000017299</td><td>vip GABAergic cortical interneuron</td><td>expresses</td><td>VIP peptides</td></tr>\n",
" <tr><th>1</th><td>CL:4023016</td><td>rdfs:subClassOf</td><td>CL:0010011</td><td>vip GABAergic cortical interneuron</td><td>None</td><td>cerebral cortex GABAergic interneuron</td></tr>\n",
" <tr><th>2</th><td>CL:4023007</td><td>rdfs:subClassOf</td><td>CL:4023016</td><td>L2/3 bipolar vip GABAergic cortical interneuro...</td><td>None</td><td>vip GABAergic cortical interneuron</td></tr>\n",
" <tr><th>3</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:4023016</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>vip GABAergic cortical interneuron</td></tr>\n",
" <tr><th>4</th><td>CL:4023019</td><td>rdfs:subClassOf</td><td>CL:4023016</td><td>L5/6 cck, vip cortical GABAergic interneuron (...</td><td>None</td><td>vip GABAergic cortical interneuron</td></tr>\n",
" </tbody>\n",
"</table>
" ], "text/plain": [ " subject predicate object \\\n", "0 CL:4023016 RO:0002292 PR:000017299 \n", "1 CL:4023016 rdfs:subClassOf CL:0010011 \n", "2 CL:4023007 rdfs:subClassOf CL:4023016 \n", "3 CL:4023014 rdfs:subClassOf CL:4023016 \n", "4 CL:4023019 rdfs:subClassOf CL:4023016 \n", "\n", " subject_label predicate_label \\\n", "0 vip GABAergic cortical interneuron expresses \n", "1 vip GABAergic cortical interneuron None \n", "2 L2/3 bipolar vip GABAergic cortical interneuro... None \n", "3 L5 vip cortical GABAergic interneuron (Mmus) None \n", "4 L5/6 cck, vip cortical GABAergic interneuron (... None \n", "\n", " object_label \n", "0 VIP peptides \n", "1 cerebral cortex GABAergic interneuron \n", "2 vip GABAergic cortical interneuron \n", "3 vip GABAergic cortical interneuron \n", "4 vip GABAergic cortical interneuron " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "show()" ] }, { "cell_type": "markdown", "id": "b79f675d", "metadata": {}, "source": [ "\n", "### Querying ancestors\n", "\n", "We will try finding all ancestors of CL:4023014\n", "\n", "__IMPORTANT__ in OAK, all graph commands are parameterized by predicate lists. Consult the OAK docs if you\n", "don't understand what this means!\n", "\n", "To find all is-a ancestors (i.e. ancestors following SubClassOf between named classes) we use `-p i`:" ] }, { "cell_type": "code", "execution_count": 25, "id": "90d3806c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BFO:0000002 ! continuant\r\n", "BFO:0000004 ! independent continuant\r\n", "BFO:0000040 ! material entity\r\n", "CARO:0000000 ! anatomical entity\r\n", "CARO:0030000 ! biological entity\r\n", "CL:0000000 ! cell\r\n", "CL:0000003 ! native cell\r\n", "CL:0000099 ! interneuron\r\n", "CL:0000117 ! CNS neuron (sensu Vertebrata)\r\n", "CL:0000151 ! secretory cell\r\n", "CL:0000161 ! acid secreting cell\r\n", "CL:0000211 ! electrically active cell\r\n", "CL:0000255 ! 
eukaryotic cell\r\n", "CL:0000393 ! electrically responsive cell\r\n", "CL:0000402 ! CNS interneuron\r\n", "CL:0000404 ! electrically signaling cell\r\n", "CL:0000498 ! inhibitory interneuron\r\n", "CL:0000540 ! neuron\r\n", "CL:0000548 ! animal cell\r\n", "CL:0000617 ! GABAergic neuron\r\n", "CL:0002319 ! neural cell\r\n", "CL:0002371 ! somatic cell\r\n", "CL:0008031 ! cortical interneuron\r\n", "CL:0010011 ! cerebral cortex GABAergic interneuron\r\n", "CL:0010012 ! cerebral cortex neuron\r\n", "CL:0011005 ! GABAergic interneuron\r\n", "CL:0012001 ! neuron of the forebrain\r\n", "CL:2000029 ! central nervous system neuron\r\n", "CL:4023014 ! L5 vip cortical GABAergic interneuron (Mmus)\r\n", "CL:4023016 ! vip GABAergic cortical interneuron\r\n" ] } ], "source": [ "cl ancestors -p i CL:4023014" ] }, { "cell_type": "markdown", "id": "b9b48cdc", "metadata": {}, "source": [ "We can also show this as a table in Jupyter:" ] }, { "cell_type": "code", "execution_count": 27, "id": "f07343f6", "metadata": {}, "outputs": [], "source": [ "cl ancestors -p i CL:4023014 -o output/tmp.tsv -O csv" ] }, { "cell_type": "code", "execution_count": 28, "id": "4907b1a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>id</th><th>label</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>BFO:0000002</td><td>continuant</td></tr>\n",
" <tr><th>1</th><td>BFO:0000004</td><td>independent continuant</td></tr>\n",
" <tr><th>2</th><td>BFO:0000040</td><td>material entity</td></tr>\n",
" <tr><th>3</th><td>CARO:0000000</td><td>anatomical entity</td></tr>\n",
" <tr><th>4</th><td>CARO:0030000</td><td>biological entity</td></tr>\n",
" <tr><th>5</th><td>CL:0000000</td><td>cell</td></tr>\n",
" <tr><th>6</th><td>CL:0000003</td><td>native cell</td></tr>\n",
" <tr><th>7</th><td>CL:0000099</td><td>interneuron</td></tr>\n",
" <tr><th>8</th><td>CL:0000117</td><td>CNS neuron (sensu Vertebrata)</td></tr>\n",
" <tr><th>9</th><td>CL:0000151</td><td>secretory cell</td></tr>\n",
" <tr><th>10</th><td>CL:0000161</td><td>acid secreting cell</td></tr>\n",
" <tr><th>11</th><td>CL:0000211</td><td>electrically active cell</td></tr>\n",
" <tr><th>12</th><td>CL:0000255</td><td>eukaryotic cell</td></tr>\n",
" <tr><th>13</th><td>CL:0000393</td><td>electrically responsive cell</td></tr>\n",
" <tr><th>14</th><td>CL:0000402</td><td>CNS interneuron</td></tr>\n",
" <tr><th>15</th><td>CL:0000404</td><td>electrically signaling cell</td></tr>\n",
" <tr><th>16</th><td>CL:0000498</td><td>inhibitory interneuron</td></tr>\n",
" <tr><th>17</th><td>CL:0000540</td><td>neuron</td></tr>\n",
" <tr><th>18</th><td>CL:0000548</td><td>animal cell</td></tr>\n",
" <tr><th>19</th><td>CL:0000617</td><td>GABAergic neuron</td></tr>\n",
" <tr><th>20</th><td>CL:0002319</td><td>neural cell</td></tr>\n",
" <tr><th>21</th><td>CL:0002371</td><td>somatic cell</td></tr>\n",
" <tr><th>22</th><td>CL:0008031</td><td>cortical interneuron</td></tr>\n",
" <tr><th>23</th><td>CL:0010011</td><td>cerebral cortex GABAergic interneuron</td></tr>\n",
" <tr><th>24</th><td>CL:0010012</td><td>cerebral cortex neuron</td></tr>\n",
" <tr><th>25</th><td>CL:0011005</td><td>GABAergic interneuron</td></tr>\n",
" <tr><th>26</th><td>CL:0012001</td><td>neuron of the forebrain</td></tr>\n",
" <tr><th>27</th><td>CL:2000029</td><td>central nervous system neuron</td></tr>\n",
" <tr><th>28</th><td>CL:4023014</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td></tr>\n",
" <tr><th>29</th><td>CL:4023016</td><td>vip GABAergic cortical interneuron</td></tr>\n",
" </tbody>\n",
"</table>
" ], "text/plain": [ " id label\n", "0 BFO:0000002 continuant\n", "1 BFO:0000004 independent continuant\n", "2 BFO:0000040 material entity\n", "3 CARO:0000000 anatomical entity\n", "4 CARO:0030000 biological entity\n", "5 CL:0000000 cell\n", "6 CL:0000003 native cell\n", "7 CL:0000099 interneuron\n", "8 CL:0000117 CNS neuron (sensu Vertebrata)\n", "9 CL:0000151 secretory cell\n", "10 CL:0000161 acid secreting cell\n", "11 CL:0000211 electrically active cell\n", "12 CL:0000255 eukaryotic cell\n", "13 CL:0000393 electrically responsive cell\n", "14 CL:0000402 CNS interneuron\n", "15 CL:0000404 electrically signaling cell\n", "16 CL:0000498 inhibitory interneuron\n", "17 CL:0000540 neuron\n", "18 CL:0000548 animal cell\n", "19 CL:0000617 GABAergic neuron\n", "20 CL:0002319 neural cell\n", "21 CL:0002371 somatic cell\n", "22 CL:0008031 cortical interneuron\n", "23 CL:0010011 cerebral cortex GABAergic interneuron\n", "24 CL:0010012 cerebral cortex neuron\n", "25 CL:0011005 GABAergic interneuron\n", "26 CL:0012001 neuron of the forebrain\n", "27 CL:2000029 central nervous system neuron\n", "28 CL:4023014 L5 vip cortical GABAergic interneuron (Mmus)\n", "29 CL:4023016 vip GABAergic cortical interneuron" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "show()" ] }, { "cell_type": "markdown", "id": "58f23581", "metadata": {}, "source": [ "## Visualization\n", "\n", "Next we will generate a visualization from this using the [viz](https://incatools.github.io/ontology-access-kit/cli.html#runoak-viz) command:" ] }, { "cell_type": "code", "execution_count": 31, "id": "c10dc41f", "metadata": {}, "outputs": [], "source": [ "cl viz -p i CL:4023014 -o output/CL_4023014.png" ] }, { "cell_type": "markdown", "id": "a354b6da", "metadata": {}, "source": [ "![img](output/CL_4023014.png)" ] }, { "cell_type": "markdown", "id": "216f2c87", "metadata": {}, "source": [ "\n", "### Other relationships\n", "\n", "The above visualization only shows the 
is-a structure of the ontology; we are missing other useful structural information.\n", "\n", "All OAK graph commands are parameterized by predicate lists, so let's include both part-of (`p`, for traversing within Uberon) and has-soma-location (`RO:0002100`):\n" ] }, { "cell_type": "code", "execution_count": 32, "id": "c2a52d6e", "metadata": {}, "outputs": [], "source": [ "cl viz -p i,p,RO:0002100 CL:4023014 -o output/CL_4023014_with_uberon.png" ] }, { "cell_type": "markdown", "id": "be928d3a", "metadata": {}, "source": [ "The graph:" ] }, { "cell_type": "markdown", "id": "4dc87998", "metadata": {}, "source": [ "![img](output/CL_4023014_with_uberon.png)" ] }, { "cell_type": "markdown", "id": "b5d13f8f", "metadata": {}, "source": [ "This graph is a lot richer - and we are only seeing a subset of connections! In fact CL connects to NCBITaxon for taxon constraints, PRO for gene expression, GO for functional classification, ...\n", "\n", "Note we are using the default OAK stylesheet, which colors CL in grey, UBERON in yellow, etc. For more info\n", "on visualization and stylesheets see [OboGraphViz](https://github.com/INCATools/obographviz/)." ] }, { "cell_type": "markdown", "id": "f29e49d9", "metadata": {}, "source": [ "## Relation graph tables\n", "\n", "Now that we have seen graphs incorporating transitive closures of certain edge types, let's return to the `relationships` command.\n", "\n", "We will use the `--include-entailed` option to include entailed relations that have been computed using [relation-graph](https://github.com/balhoff/relation-graph).\n", "\n", "(note: this option won't work with all OAK adapters - for example, if you are using OAK to connect to an obo file or a remote SPARQL endpoint that doesn't support relation-graph. 
We recommend using either the sqlite backend, as in this tutorial, or ubergraph)" ] }, { "cell_type": "code", "execution_count": 34, "id": "d6b2bc93", "metadata": {}, "outputs": [], "source": [ "cl relationships CL:4023014 --include-entailed -o output/tmp.tsv" ] }, { "cell_type": "code", "execution_count": 35, "id": "a8ff5a41", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>subject</th><th>predicate</th><th>object</th><th>subject_label</th><th>predicate_label</th><th>object_label</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>CL:4023014</td><td>BFO:0000050</td><td>BFO:0000002</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>part of</td><td>continuant</td></tr>\n",
" <tr><th>1</th><td>CL:4023014</td><td>BFO:0000050</td><td>BFO:0000004</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>part of</td><td>independent continuant</td></tr>\n",
" <tr><th>2</th><td>CL:4023014</td><td>BFO:0000050</td><td>BFO:0000040</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>part of</td><td>material entity</td></tr>\n",
" <tr><th>3</th><td>CL:4023014</td><td>BFO:0000050</td><td>CARO:0000000</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>part of</td><td>anatomical entity</td></tr>\n",
" <tr><th>4</th><td>CL:4023014</td><td>BFO:0000050</td><td>CARO:0000006</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>part of</td><td>material anatomical entity</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>655</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:0011005</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>GABAergic interneuron</td></tr>\n",
" <tr><th>656</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:0012001</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>neuron of the forebrain</td></tr>\n",
" <tr><th>657</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:2000029</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>central nervous system neuron</td></tr>\n",
" <tr><th>658</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:4023014</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td></tr>\n",
" <tr><th>659</th><td>CL:4023014</td><td>rdfs:subClassOf</td><td>CL:4023016</td><td>L5 vip cortical GABAergic interneuron (Mmus)</td><td>None</td><td>vip GABAergic cortical interneuron</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>660 rows × 6 columns</p>
" ], "text/plain": [ " subject predicate object \\\n", "0 CL:4023014 BFO:0000050 BFO:0000002 \n", "1 CL:4023014 BFO:0000050 BFO:0000004 \n", "2 CL:4023014 BFO:0000050 BFO:0000040 \n", "3 CL:4023014 BFO:0000050 CARO:0000000 \n", "4 CL:4023014 BFO:0000050 CARO:0000006 \n", ".. ... ... ... \n", "655 CL:4023014 rdfs:subClassOf CL:0011005 \n", "656 CL:4023014 rdfs:subClassOf CL:0012001 \n", "657 CL:4023014 rdfs:subClassOf CL:2000029 \n", "658 CL:4023014 rdfs:subClassOf CL:4023014 \n", "659 CL:4023014 rdfs:subClassOf CL:4023016 \n", "\n", " subject_label predicate_label \\\n", "0 L5 vip cortical GABAergic interneuron (Mmus) part of \n", "1 L5 vip cortical GABAergic interneuron (Mmus) part of \n", "2 L5 vip cortical GABAergic interneuron (Mmus) part of \n", "3 L5 vip cortical GABAergic interneuron (Mmus) part of \n", "4 L5 vip cortical GABAergic interneuron (Mmus) part of \n", ".. ... ... \n", "655 L5 vip cortical GABAergic interneuron (Mmus) None \n", "656 L5 vip cortical GABAergic interneuron (Mmus) None \n", "657 L5 vip cortical GABAergic interneuron (Mmus) None \n", "658 L5 vip cortical GABAergic interneuron (Mmus) None \n", "659 L5 vip cortical GABAergic interneuron (Mmus) None \n", "\n", " object_label \n", "0 continuant \n", "1 independent continuant \n", "2 material entity \n", "3 anatomical entity \n", "4 material anatomical entity \n", ".. ... \n", "655 GABAergic interneuron \n", "656 neuron of the forebrain \n", "657 central nervous system neuron \n", "658 L5 vip cortical GABAergic interneuron (Mmus) \n", "659 vip GABAergic cortical interneuron \n", "\n", "[660 rows x 6 columns]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "show()" ] }, { "cell_type": "markdown", "id": "6264a795", "metadata": {}, "source": [ "660 entailed relationships is quite a lot!\n", "\n", "Note a lot of these are quite trivial: every **L5 vip cortical GABAergic interneuron (Mmus)** is a *part of* SOME *material entity*. 
Duh!\n", "\n", "It's not expected that a typical user would inspect these large computed tables. Instead they are to be used\n", "\"behind the scenes\" in databases and applications - for example a gene expression database could use this table to answer questions like *what genes are expressed in the forebrain* by joining a direct *expresses* table with the relation-graph closure table, filtering on relationships like part-of or has-soma-location, or the weaker *overlaps*.\n", "\n", "Let's see what such queries might yield. First we will find the RO relationship for \"overlaps\":" ] }, { "cell_type": "code", "execution_count": 36, "id": "584fe1a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RO:0002131 ! overlaps\r\n" ] } ], "source": [ "cl info overlaps" ] }, { "cell_type": "markdown", "id": "26fa9e66", "metadata": {}, "source": [ "(remember, part of RO is distributed with CL).\n", "\n", "Next we will filter our entailed relationships to this predicate and query downwards (`--direction down`), i.e. we are asking *what overlaps the olfactory bulb*?" ] }, { "cell_type": "code", "execution_count": 40, "id": "6b28e86e", "metadata": {}, "outputs": [], "source": [ "cl relationships -p RO:0002131 \"olfactory bulb\" --direction down --include-entailed -o output/tmp.tsv" ] }, { "cell_type": "code", "execution_count": 41, "id": "9e265d69", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>subject</th><th>predicate</th><th>object</th><th>subject_label</th><th>predicate_label</th><th>object_label</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>CL:1001435</td><td>RO:0002131</td><td>UBERON:0002264</td><td>periglomerular cell</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>1</th><td>CL:1001434</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb interneuron</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>2</th><td>UBERON:0004001</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb layer</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>3</th><td>CL:0000626</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory granule cell</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>4</th><td>UBERON:0009950</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb plexiform layer</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>5</th><td>UBERON:0005377</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb glomerular layer</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>6</th><td>UBERON:0005376</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb external plexiform layer</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>7</th><td>UBERON:0004186</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb mitral cell layer</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>8</th><td>CL:1001502</td><td>RO:0002131</td><td>UBERON:0002264</td><td>mitral cell</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>9</th><td>CL:1001503</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb tufted cell</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>10</th><td>UBERON:0034730</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory tract linking bulb to ipsilateral do...</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>11</th><td>UBERON:2000238</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory tract linking bulb to ipsilateral ve...</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>12</th><td>UBERON:0002264</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory bulb</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" <tr><th>13</th><td>UBERON:0002265</td><td>RO:0002131</td><td>UBERON:0002264</td><td>olfactory tract</td><td>overlaps</td><td>olfactory bulb</td></tr>\n",
" </tbody>\n",
"</table>
" ], "text/plain": [ " subject predicate object \\\n", "0 CL:1001435 RO:0002131 UBERON:0002264 \n", "1 CL:1001434 RO:0002131 UBERON:0002264 \n", "2 UBERON:0004001 RO:0002131 UBERON:0002264 \n", "3 CL:0000626 RO:0002131 UBERON:0002264 \n", "4 UBERON:0009950 RO:0002131 UBERON:0002264 \n", "5 UBERON:0005377 RO:0002131 UBERON:0002264 \n", "6 UBERON:0005376 RO:0002131 UBERON:0002264 \n", "7 UBERON:0004186 RO:0002131 UBERON:0002264 \n", "8 CL:1001502 RO:0002131 UBERON:0002264 \n", "9 CL:1001503 RO:0002131 UBERON:0002264 \n", "10 UBERON:0034730 RO:0002131 UBERON:0002264 \n", "11 UBERON:2000238 RO:0002131 UBERON:0002264 \n", "12 UBERON:0002264 RO:0002131 UBERON:0002264 \n", "13 UBERON:0002265 RO:0002131 UBERON:0002264 \n", "\n", " subject_label predicate_label \\\n", "0 periglomerular cell overlaps \n", "1 olfactory bulb interneuron overlaps \n", "2 olfactory bulb layer overlaps \n", "3 olfactory granule cell overlaps \n", "4 olfactory bulb plexiform layer overlaps \n", "5 olfactory bulb glomerular layer overlaps \n", "6 olfactory bulb external plexiform layer overlaps \n", "7 olfactory bulb mitral cell layer overlaps \n", "8 mitral cell overlaps \n", "9 olfactory bulb tufted cell overlaps \n", "10 olfactory tract linking bulb to ipsilateral do... overlaps \n", "11 olfactory tract linking bulb to ipsilateral ve... 
overlaps \n", "12 olfactory bulb overlaps \n", "13 olfactory tract overlaps \n", "\n", " object_label \n", "0 olfactory bulb \n", "1 olfactory bulb \n", "2 olfactory bulb \n", "3 olfactory bulb \n", "4 olfactory bulb \n", "5 olfactory bulb \n", "6 olfactory bulb \n", "7 olfactory bulb \n", "8 olfactory bulb \n", "9 olfactory bulb \n", "10 olfactory bulb \n", "11 olfactory bulb \n", "12 olfactory bulb \n", "13 olfactory bulb " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "show()" ] }, { "cell_type": "markdown", "id": "b805d8b0", "metadata": {}, "source": [ "### Complex queries\n", "\n", "We can also make use of entailed edges in complex boolean queries.\n", "\n", "The following query is an *intersection* (using the `.and` operator) of:\n", "\n", " - all things that overlap the **olfactory bulb**\n", " - all subtypes of **interneuron**" ] }, { "cell_type": "code", "execution_count": 42, "id": "c8a7ad11", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:1001435 ! periglomerular cell\r\n", "CL:1001434 ! olfactory bulb interneuron\r\n", "CL:1001502 ! mitral cell\r\n" ] } ], "source": [ "cl info .desc//p=RO:0002131 \"olfactory bulb\" .and .desc//p=i \"interneuron\"" ] }, { "cell_type": "markdown", "id": "183cf38c", "metadata": {}, "source": [ "### Pairwise term similarity\n", "\n", "Next we will explore the nascent semantic similarity functions in OAK.\n", "\n", "Note that the data model and signatures may change slightly here in the future.\n", "\n", "Once again, it is important to understand how OAK handles graphs - all similarity methods are parameterized\n", "by predicate lists. Let's start with the simple case of is-a hierarchies.\n", "\n", "Here we will compare:\n", "\n", " - CL:1001435 ! periglomerular cell\n", " - CL:1001502 ! 
mitral cell" ] }, { "cell_type": "code", "execution_count": 44, "id": "c082b7b6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject_id: CL:1001435\r\n", "object_id: CL:1001502\r\n", "ancestor_id: CL:1001434\r\n", "ancestor_information_content: 13.47134302805148\r\n", "jaccard_similarity: 0.92\r\n", "phenodigm_score: 3.520459570256043\r\n" ] } ], "source": [ "cl similarity -p i CL:1001435 CL:1001502" ] }, { "cell_type": "markdown", "id": "85aa90d3", "metadata": {}, "source": [ "TODOs:\n", "\n", "- allow calculation of IC from background annotations\n", "- add an `--autolabel` option (other OAK commands have this)\n", "\n", "To see what the MRCA (most recent common ancestor, reported above as `ancestor_id`) is:" ] }, { "cell_type": "code", "execution_count": 45, "id": "fe8e4fcc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CL:1001434 ! olfactory bulb interneuron\r\n" ] } ], "source": [ "cl info CL:1001434" ] }, { "cell_type": "markdown", "id": "c6d9359a", "metadata": {}, "source": [ "This is not surprising, since we selected those terms because they are olfactory bulb (OB) interneurons!\n", "\n", "### Using queries as inputs for similarity (advanced)\n", "\n", "Next we are going to explore a (randomly chosen) example - how similar are the neurons of two cortical layers?\n", "\n", "We will use `all-similarity`, which can take as input either:\n", "\n", "- two files containing term lists\n", "- two boolean queries, each resolving to a term list\n", "\n", "Similarity is then computed for the cross-product of the two lists:" ] }, { "cell_type": "code", "execution_count": 48, "id": "0090794f", "metadata": {}, "outputs": [], "source": [ "cl all-similarity -p i .desc//p=RO:0002131 \"cortical layer II/III\" .and .desc//p=i \"neuron\" @ .desc//p=RO:0002131 \"cortical layer V\" .and .desc//p=i \"neuron\" -o output/sim.png -O seaborn" ] }, { "cell_type": "markdown", "id": "84519654", "metadata": {}, "source": [ "![img](output/sim.png)" ] }, { "cell_type": "markdown", "id": 
"592ffeff", "metadata": {}, "source": [ "As can be seen, glutaminergic cells are more similar, etc" ] }, { "cell_type": "markdown", "id": "b8c63eab", "metadata": {}, "source": [ "## Text Mining\n", "\n", "Next we will use the [annotate](https://incatools.github.io/ontology-access-kit/cli.html#runoak-annotate) command to annotate some text\n", "\n", "Up until now we have been using the sqlite adaptor, but for this we will switch to the [bioportal adaptor](https://incatools.github.io/ontology-access-kit/implementations/bioportal.html)\n", "\n", "In future it will be possible to use plugins to combine your choice of adapter with different annotators, such as SciSpacy. For now bear in mind that bioportal gives wide coverage of ontologies but can have recall issues e.g. with plurals or different orthographic forms." ] }, { "cell_type": "code", "execution_count": 50, "id": "d7ca85aa", "metadata": {}, "outputs": [], "source": [ "%alias annotate runoak -i bioportal:cl annotate" ] }, { "cell_type": "code", "execution_count": 51, "id": "f5c47a31", "metadata": {}, "outputs": [], "source": [ "annotate \"olfactory bulb interneuron projects into amygdala\" -O csv -o output/tmp.tsv" ] }, { "cell_type": "code", "execution_count": 52, "id": "0ff98b0c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>predicate_id</th><th>object_id</th><th>object_label</th><th>object_source</th><th>confidence</th><th>match_string</th><th>is_longest_match</th><th>matches_whole_text</th><th>match_type</th><th>info</th><th>subject_start</th><th>subject_end</th><th>subject_label</th><th>subject_source</th><th>subject_text_id</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>None</td><td>CL:1001434</td><td>olfactory bulb interneuron</td><td>https://data.bioontology.org/ontologies/CL</td><td>None</td><td>None</td><td>None</td><td>None</td><td>PREF</td><td>None</td><td>1</td><td>26</td><td>OLFACTORY BULB INTERNEURON</td><td>None</td><td>None</td></tr>\n",
"<tr><th>1</th><td>None</td><td>UBERON:0002264</td><td>olfactory bulb</td><td>https://data.bioontology.org/ontologies/CL</td><td>None</td><td>None</td><td>None</td><td>None</td><td>PREF</td><td>None</td><td>1</td><td>14</td><td>OLFACTORY BULB</td><td>None</td><td>None</td></tr>\n",
"<tr><th>2</th><td>None</td><td>UBERON:0001896</td><td>medulla oblongata</td><td>https://data.bioontology.org/ontologies/CL</td><td>None</td><td>None</td><td>None</td><td>None</td><td>SYN</td><td>None</td><td>11</td><td>14</td><td>BULB</td><td>None</td><td>None</td></tr>\n",
"<tr><th>3</th><td>None</td><td>CL:0000099</td><td>interneuron</td><td>https://data.bioontology.org/ontologies/CL</td><td>None</td><td>None</td><td>None</td><td>None</td><td>PREF</td><td>None</td><td>16</td><td>26</td><td>INTERNEURON</td><td>None</td><td>None</td></tr>\n",
"<tr><th>4</th><td>None</td><td>UBERON:0001876</td><td>amygdala</td><td>https://data.bioontology.org/ontologies/CL</td><td>None</td><td>None</td><td>None</td><td>None</td><td>PREF</td><td>None</td><td>42</td><td>49</td><td>AMYGDALA</td><td>None</td><td>None</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " predicate_id object_id object_label \\\n", "0 None CL:1001434 olfactory bulb interneuron \n", "1 None UBERON:0002264 olfactory bulb \n", "2 None UBERON:0001896 medulla oblongata \n", "3 None CL:0000099 interneuron \n", "4 None UBERON:0001876 amygdala \n", "\n", " object_source confidence match_string \\\n", "0 https://data.bioontology.org/ontologies/CL None None \n", "1 https://data.bioontology.org/ontologies/CL None None \n", "2 https://data.bioontology.org/ontologies/CL None None \n", "3 https://data.bioontology.org/ontologies/CL None None \n", "4 https://data.bioontology.org/ontologies/CL None None \n", "\n", " is_longest_match matches_whole_text match_type info subject_start \\\n", "0 None None PREF None 1 \n", "1 None None PREF None 1 \n", "2 None None SYN None 11 \n", "3 None None PREF None 16 \n", "4 None None PREF None 42 \n", "\n", " subject_end subject_label subject_source subject_text_id \n", "0 26 OLFACTORY BULB INTERNEURON None None \n", "1 14 OLFACTORY BULB None None \n", "2 14 BULB None None \n", "3 26 INTERNEURON None None \n", "4 49 AMYGDALA None None " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "show()" ] }, { "cell_type": "code", "execution_count": null, "id": "86e73dcf", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }