Neuro-CL tutorial
author: Chris Mungall
created: 2022-09-9
This tutorial walks through the neuro-relevant subset of the Cell Ontology (CL). The goals are:
to help understand the structure of CL
to show how CL uses relationships like has-soma-location
to show how CL relates to Uberon and other ontologies
to show how to do advanced OAK queries and visualization (CLI and programmatic) on CL
to demonstrate rudimentary text annotation
Running this notebook locally or on mybinder requires OAK 0.1.41 or higher
Create an alias
For convenience we will set an alias using the IPython %alias magic.
The first time you run this, a copy of cl.db is downloaded from S3, which may cause a delay. For subsequent invocations, the cached copy will be used.
[47]:
%alias cl runoak -i sqlite:obo:cl
Basic lookup queries
Let’s check it’s working. We will use the info command:
[4]:
cl info neuron
CL:0000540 ! neuron
Next we will try a simple lexical search for GABAergic cortical interneurons.
Here l means use labels and ~ means inexact (partial) matches:
[5]:
cl info "l~GABAergic cortical interneuron"
CL:0000617 ! GABAergic neuron
CL:0010011 ! cerebral cortex GABAergic interneuron
CL:0011005 ! GABAergic interneuron
CL:4023007 ! L2/3 bipolar vip GABAergic cortical interneuron (Mmus)
CL:4023010 ! alpha7 GABAergic cortical interneuron (Mmus)
CL:4023011 ! lamp5 GABAergic cortical interneuron
CL:4023014 ! L5 vip cortical GABAergic interneuron (Mmus)
CL:4023015 ! sncg GABAergic cortical interneuron
CL:4023016 ! vip GABAergic cortical interneuron
CL:4023017 ! sst GABAergic cortical interneuron
CL:4023018 ! pvalb GABAergic cortical interneuron
CL:4023019 ! L5/6 cck, vip cortical GABAergic interneuron (Mmus)
CL:4023022 ! canopy lamp5 GABAergic cortical interneuron (Mmus)
CL:4023023 ! L5,6 neurogliaform lamp5 GABAergic cortical interneuron (Mmus)
CL:4023024 ! neurogliaform lamp5 GABAergic cortical interneuron (Mmus)
CL:4023025 ! long-range projecting sst GABAergic cortical interneuron (Mmus)
CL:4023027 ! L5 T-Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023028 ! L5 non-Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023030 ! L2/3/5 fan Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023031 ! L4 sst GABAergic cortical interneuron (Mmus)
CL:4023034 ! obsolete L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)
CL:4023036 ! chandelier pvalb GABAergic cortical interneuron
CL:4023065 ! meis2 expressing cortical GABAergic cell
CL:4023067 ! obsolete Martinotti morphology L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)
CL:4023069 ! medial ganglionic eminence derived GABAergic cortical interneuron
CL:4023070 ! caudal ganglionic eminence derived GABAergic cortical interneuron
CL:4023071 ! L5/6 cck cortical GABAergic interneuron (Mmus)
CL:4023075 ! L6 tyrosine hydroxylase sst GABAergic cortical interneuron (Mmus)
CL:4023078 ! obsolete basket morphology L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)
CL:4023106 ! obsolete meis2 expressing cortical GABAergic cell (Callithrix jacchus)
CL:4023118 ! L5/6 non-Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023121 ! sst chodl GABAergic cortical interneuron
CL:4023122 ! oxytocin receptor sst GABAergic cortical interneuron
GO:0021853 ! cerebral cortex GABAergic interneuron migration
GO:0021892 ! cerebral cortex GABAergic interneuron differentiation
GO:0021894 ! cerebral cortex GABAergic interneuron development
GO:0032228 ! regulation of synaptic transmission, GABAergic
GO:0032229 ! negative regulation of synaptic transmission, GABAergic
GO:0032230 ! positive regulation of synaptic transmission, GABAergic
GO:0051932 ! synaptic transmission, GABAergic
GO:0097154 ! GABAergic neuron differentiation
Of course, there are more reliable ways to do this query than relying on string matching, but string searches can be useful for initial exploration.
Note there are some GO terms in the matches. This is because the release version of CL includes portions of other ontologies like GO.
Exploring the structure of CL
Next we will try exploring the graph structure of CL (see the glossary for what we mean by graph structure).
Here we are using the relationships command:
[16]:
cl relationships CL:4023014
The output is a tab-separated table of relationships emanating from L5 vip cortical GABAergic interneuron (Mmus).
This doesn’t look very pretty in the Jupyter interface. We will write a helper function here (of course, if running on the command line you could do other things to show the table).
[13]:
import pandas as pd

def show(path="output/tmp.tsv"):
    """helper function to turn most recent TSV output into a dataframe"""
    return pd.read_csv(path, sep="\t")
[17]:
cl relationships CL:4023014 -o output/tmp.tsv
[18]:
show()
[18]:
| subject | predicate | object | subject_label | predicate_label | object_label |
---|---|---|---|---|---|---|
0 | CL:4023014 | RO:0002100 | UBERON:0005394 | L5 vip cortical GABAergic interneuron (Mmus) | has soma location | cortical layer V |
1 | CL:4023014 | RO:0002162 | NCBITaxon:10090 | L5 vip cortical GABAergic interneuron (Mmus) | in taxon | Mus musculus |
2 | CL:4023014 | RO:0002292 | PR:P32648 | L5 vip cortical GABAergic interneuron (Mmus) | expresses | VIP peptides (mouse) |
3 | CL:4023014 | rdfs:subClassOf | CL:4023016 | L5 vip cortical GABAergic interneuron (Mmus) | None | vip GABAergic cortical interneuron |
This view is more readable. We can see that there are 4 edges for which the subject matches our query. Edges can point to nodes outside CL.
Each edge can be read as a sentence, e.g.
“L5 vip cortical GABAergic interneuron (Mmus) has soma location cortical layer V”
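The "edge as sentence" reading can be sketched in a few lines of plain Python. This is only an illustration: the rows are copied from the table above, `edge_to_sentence` is a hypothetical helper (not part of OAK), and rendering a missing predicate label as "is a" is an assumption made for readability.

```python
# Rows copied from the relationships table above:
# (subject_label, predicate_label, object_label).
# A predicate_label of None corresponds to rdfs:subClassOf.
edges = [
    ("L5 vip cortical GABAergic interneuron (Mmus)", "has soma location", "cortical layer V"),
    ("L5 vip cortical GABAergic interneuron (Mmus)", "in taxon", "Mus musculus"),
    ("L5 vip cortical GABAergic interneuron (Mmus)", "expresses", "VIP peptides (mouse)"),
    ("L5 vip cortical GABAergic interneuron (Mmus)", None, "vip GABAergic cortical interneuron"),
]


def edge_to_sentence(subject_label, predicate_label, object_label):
    """Render one edge as a sentence (hypothetical helper, not an OAK function)."""
    # Assumption: an unlabeled predicate (rdfs:subClassOf) is read as "is a".
    verb = predicate_label if predicate_label is not None else "is a"
    return f"{subject_label} {verb} {object_label}"


sentences = [edge_to_sentence(*edge) for edge in edges]
for sentence in sentences:
    print(sentence)
```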
Linking neurons to Uberon
When connecting cell types to anatomy in Uberon, CL uses has-soma-location rather than the stronger part-of. This is because, as a general rule, we can’t make entire neurons part of a specific region if those neurons have projections that overlap other areas.
For more background, see:
A strategy for building neuroanatomy ontologies, Osumi-Sutherland et al https://doi.org/10.1093/bioinformatics/bts113
Transcriptomic classification of neurons
Note that many newer cell types in CL may be types uncovered by RNAseq experiments and clustering. When these are captured in CL, we often link the cell type to a marker protein or gene via an expresses relationship.
Relationship query directionality
By default, the relationships command works in the “up” direction, i.e. the query is matched to the edge subject.
We can use --direction to get the “down” direction edges (i.e. the query is matched to the edge object), or “both”.
Let’s try this with a more general term, vip GABAergic cortical interneuron:
[23]:
cl relationships CL:4023016 --direction both -o output/tmp.tsv
[24]:
show()
[24]:
| subject | predicate | object | subject_label | predicate_label | object_label |
---|---|---|---|---|---|---|
0 | CL:4023016 | RO:0002292 | PR:000017299 | vip GABAergic cortical interneuron | expresses | VIP peptides |
1 | CL:4023016 | rdfs:subClassOf | CL:0010011 | vip GABAergic cortical interneuron | None | cerebral cortex GABAergic interneuron |
2 | CL:4023007 | rdfs:subClassOf | CL:4023016 | L2/3 bipolar vip GABAergic cortical interneuro... | None | vip GABAergic cortical interneuron |
3 | CL:4023014 | rdfs:subClassOf | CL:4023016 | L5 vip cortical GABAergic interneuron (Mmus) | None | vip GABAergic cortical interneuron |
4 | CL:4023019 | rdfs:subClassOf | CL:4023016 | L5/6 cck, vip cortical GABAergic interneuron (... | None | vip GABAergic cortical interneuron |
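The direction semantics described above can be sketched over an in-memory edge list. This is a minimal illustration, not OAK's implementation: the triples are copied from the table above, and `match_edges` is a hypothetical helper.

```python
# Edges copied from the table above as (subject, predicate, object) triples.
edges = [
    ("CL:4023016", "RO:0002292", "PR:000017299"),
    ("CL:4023016", "rdfs:subClassOf", "CL:0010011"),
    ("CL:4023007", "rdfs:subClassOf", "CL:4023016"),
    ("CL:4023014", "rdfs:subClassOf", "CL:4023016"),
    ("CL:4023019", "rdfs:subClassOf", "CL:4023016"),
]


def match_edges(edges, query, direction="up"):
    """Hypothetical helper mimicking the up/down/both behaviour described above."""
    if direction not in ("up", "down", "both"):
        raise ValueError(f"unknown direction: {direction}")
    selected = []
    for subject, predicate, obj in edges:
        is_up = subject == query   # "up": the query is the edge subject
        is_down = obj == query     # "down": the query is the edge object
        if (direction in ("up", "both") and is_up) or (
            direction in ("down", "both") and is_down
        ):
            selected.append((subject, predicate, obj))
    return selected


up_edges = match_edges(edges, "CL:4023016", "up")
down_edges = match_edges(edges, "CL:4023016", "down")
both_edges = match_edges(edges, "CL:4023016", "both")
print(len(up_edges), len(down_edges), len(both_edges))
```

With these five rows, "up" selects the two edges leaving CL:4023016, "down" selects the three subclass edges pointing at it, and "both" returns all five.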
Querying ancestors
We will try finding all ancestors of CL:4023014.
IMPORTANT: in OAK, all graph commands are parameterized by predicate lists. Consult the OAK docs if you don’t understand what this means!
To find all is-a ancestors (i.e. ancestors following SubClassOf between named classes) we use -p i:
[25]:
cl ancestors -p i CL:4023014
BFO:0000002 ! continuant
BFO:0000004 ! independent continuant
BFO:0000040 ! material entity
CARO:0000000 ! anatomical entity
CARO:0030000 ! biological entity
CL:0000000 ! cell
CL:0000003 ! native cell
CL:0000099 ! interneuron
CL:0000117 ! CNS neuron (sensu Vertebrata)
CL:0000151 ! secretory cell
CL:0000161 ! acid secreting cell
CL:0000211 ! electrically active cell
CL:0000255 ! eukaryotic cell
CL:0000393 ! electrically responsive cell
CL:0000402 ! CNS interneuron
CL:0000404 ! electrically signaling cell
CL:0000498 ! inhibitory interneuron
CL:0000540 ! neuron
CL:0000548 ! animal cell
CL:0000617 ! GABAergic neuron
CL:0002319 ! neural cell
CL:0002371 ! somatic cell
CL:0008031 ! cortical interneuron
CL:0010011 ! cerebral cortex GABAergic interneuron
CL:0010012 ! cerebral cortex neuron
CL:0011005 ! GABAergic interneuron
CL:0012001 ! neuron of the forebrain
CL:2000029 ! central nervous system neuron
CL:4023014 ! L5 vip cortical GABAergic interneuron (Mmus)
CL:4023016 ! vip GABAergic cortical interneuron
We can also show this as a table in Jupyter:
[27]:
cl ancestors -p i CL:4023014 -o output/tmp.tsv -O csv
[28]:
show()
[28]:
| id | label |
---|---|---|
0 | BFO:0000002 | continuant |
1 | BFO:0000004 | independent continuant |
2 | BFO:0000040 | material entity |
3 | CARO:0000000 | anatomical entity |
4 | CARO:0030000 | biological entity |
5 | CL:0000000 | cell |
6 | CL:0000003 | native cell |
7 | CL:0000099 | interneuron |
8 | CL:0000117 | CNS neuron (sensu Vertebrata) |
9 | CL:0000151 | secretory cell |
10 | CL:0000161 | acid secreting cell |
11 | CL:0000211 | electrically active cell |
12 | CL:0000255 | eukaryotic cell |
13 | CL:0000393 | electrically responsive cell |
14 | CL:0000402 | CNS interneuron |
15 | CL:0000404 | electrically signaling cell |
16 | CL:0000498 | inhibitory interneuron |
17 | CL:0000540 | neuron |
18 | CL:0000548 | animal cell |
19 | CL:0000617 | GABAergic neuron |
20 | CL:0002319 | neural cell |
21 | CL:0002371 | somatic cell |
22 | CL:0008031 | cortical interneuron |
23 | CL:0010011 | cerebral cortex GABAergic interneuron |
24 | CL:0010012 | cerebral cortex neuron |
25 | CL:0011005 | GABAergic interneuron |
26 | CL:0012001 | neuron of the forebrain |
27 | CL:2000029 | central nervous system neuron |
28 | CL:4023014 | L5 vip cortical GABAergic interneuron (Mmus) |
29 | CL:4023016 | vip GABAergic cortical interneuron |
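To make "parameterized by predicate lists" concrete, here is a small sketch of an ancestor traversal that only follows edges whose predicate is in a given set. It is an illustration under stated assumptions, not OAK's code: the edge list contains only direct edges shown in the relationship tables earlier in this tutorial (the real graph is far larger), and the result is reflexive because the listing above includes CL:4023014 itself.

```python
from collections import deque

IS_A = "rdfs:subClassOf"
HAS_SOMA_LOCATION = "RO:0002100"

# Only direct edges shown in the tables above; the real graph has many more.
edges = [
    ("CL:4023014", IS_A, "CL:4023016"),
    ("CL:4023016", IS_A, "CL:0010011"),
    ("CL:4023014", HAS_SOMA_LOCATION, "UBERON:0005394"),
]


def ancestors(start, edges, predicates):
    """Reflexive ancestors of start, following only edges with an allowed predicate."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in edges:
            if subject == node and predicate in predicates and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


is_a_only = ancestors("CL:4023014", edges, {IS_A})
with_soma = ancestors("CL:4023014", edges, {IS_A, HAS_SOMA_LOCATION})
print(sorted(is_a_only))
print(sorted(with_soma - is_a_only))
```

Adding has-soma-location to the predicate set brings the Uberon term into the result, which is why the choice of predicates changes what counts as an ancestor.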
Visualization
Next we will generate a visualization from this using the viz command:
[31]:
cl viz -p i CL:4023014 -o output/CL_4023014.png
Other relationships
The above visualization only shows the is-a structure of the ontology; we are missing other useful structural information.
All OAK graph commands are parameterized by predicates. Let’s include both part-of (for traversing within Uberon) and has-soma-location:
[32]:
cl viz -p i,p,RO:0002100 CL:4023014 -o output/CL_4023014_with_uberon.png
The graph:
This graph is a lot richer - and we are only seeing a subset of connections! In fact CL connects to NCBITaxon for taxon constraints, PRO for gene expression, GO for functional classification, …
Note we are using the default OAK stylesheet which colors CL in grey, UBERON in yellow, etc. For more info on visualization and stylesheets see OboGraphViz
Relation graph tables
Now that we have seen graphs incorporating transitive closures of certain edge types, let’s return to the relationships command.
We will use the --include-entailed option to include entailed relations that have been computed using relation-graph.
(Note: this option won’t work with all OAK adapters, for example if you are using OAK to connect to an obo file or to a remote sparql endpoint that doesn’t support relation-graph. We recommend using either the sqlite backend, as in this tutorial, or ubergraph.)
[34]:
cl relationships CL:4023014 --include-entailed -o output/tmp.tsv
[35]:
show()
[35]:
| subject | predicate | object | subject_label | predicate_label | object_label |
---|---|---|---|---|---|---|
0 | CL:4023014 | BFO:0000050 | BFO:0000002 | L5 vip cortical GABAergic interneuron (Mmus) | part of | continuant |
1 | CL:4023014 | BFO:0000050 | BFO:0000004 | L5 vip cortical GABAergic interneuron (Mmus) | part of | independent continuant |
2 | CL:4023014 | BFO:0000050 | BFO:0000040 | L5 vip cortical GABAergic interneuron (Mmus) | part of | material entity |
3 | CL:4023014 | BFO:0000050 | CARO:0000000 | L5 vip cortical GABAergic interneuron (Mmus) | part of | anatomical entity |
4 | CL:4023014 | BFO:0000050 | CARO:0000006 | L5 vip cortical GABAergic interneuron (Mmus) | part of | material anatomical entity |
... | ... | ... | ... | ... | ... | ... |
655 | CL:4023014 | rdfs:subClassOf | CL:0011005 | L5 vip cortical GABAergic interneuron (Mmus) | None | GABAergic interneuron |
656 | CL:4023014 | rdfs:subClassOf | CL:0012001 | L5 vip cortical GABAergic interneuron (Mmus) | None | neuron of the forebrain |
657 | CL:4023014 | rdfs:subClassOf | CL:2000029 | L5 vip cortical GABAergic interneuron (Mmus) | None | central nervous system neuron |
658 | CL:4023014 | rdfs:subClassOf | CL:4023014 | L5 vip cortical GABAergic interneuron (Mmus) | None | L5 vip cortical GABAergic interneuron (Mmus) |
659 | CL:4023014 | rdfs:subClassOf | CL:4023016 | L5 vip cortical GABAergic interneuron (Mmus) | None | vip GABAergic cortical interneuron |
660 rows × 6 columns
660 entailed relationships is quite a lot!
Note a lot of these are quite trivial: every L5 vip cortical GABAergic interneuron (Mmus) is a part of SOME material entity. Duh!
It’s not expected that a typical user would inspect these large computed tables. Instead they are meant to be used “behind the scenes” in databases and applications. For example, a gene expression database could answer questions like “what genes are expressed in the forebrain?” by joining a direct expresses table with the relation-graph closure table, filtering on relationships like part-of or has-soma-location, or the weaker overlaps.
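The kind of join described here might look roughly like the following sketch. Everything about the layout is an assumption for illustration: `expressed_in` is a hypothetical helper, the two "tables" are plain Python lists, and the only rows used are edges that appear in the relationship tables earlier in this tutorial. A real closure table would also contain the entailed rows (hundreds per term).

```python
# Direct "expresses" edges copied from the tables above: (cell type, protein).
expresses = [
    ("CL:4023014", "PR:P32648"),
    ("CL:4023016", "PR:000017299"),
]

# A tiny stand-in for the closure table: (subject, predicate, object).
# Only one row is used here; a real table would include entailed rows as well.
closure = [
    ("CL:4023014", "RO:0002100", "UBERON:0005394"),  # has soma location, cortical layer V
]


def expressed_in(anatomy_id, predicates, expresses, closure):
    """Hypothetical join: proteins expressed by cells linked to anatomy_id."""
    # Step 1: cells related to the anatomy term via one of the allowed predicates.
    cells = {s for s, p, o in closure if o == anatomy_id and p in predicates}
    # Step 2: join against the direct expresses table.
    return sorted(protein for cell, protein in expresses if cell in cells)


hits = expressed_in("UBERON:0005394", {"RO:0002100"}, expresses, closure)
print(hits)
```

Swapping the predicate set (for example to part-of or overlaps) changes which cells qualify, which is the filtering step mentioned above.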
Let’s see what such queries might yield. First we will find the RO relationship for “overlaps”:
[36]:
cl info overlaps
RO:0002131 ! overlaps
(remember, part of RO is distributed with CL).
Next we will filter our entailed relationships and query “down”wards, i.e. we are asking: what overlaps the olfactory bulb?
[40]:
cl relationships -p RO:0002131 "olfactory bulb" --direction down --include-entailed -o output/tmp.tsv
[41]:
show()
[41]:
| subject | predicate | object | subject_label | predicate_label | object_label |
---|---|---|---|---|---|---|
0 | CL:1001435 | RO:0002131 | UBERON:0002264 | periglomerular cell | overlaps | olfactory bulb |
1 | CL:1001434 | RO:0002131 | UBERON:0002264 | olfactory bulb interneuron | overlaps | olfactory bulb |
2 | UBERON:0004001 | RO:0002131 | UBERON:0002264 | olfactory bulb layer | overlaps | olfactory bulb |
3 | CL:0000626 | RO:0002131 | UBERON:0002264 | olfactory granule cell | overlaps | olfactory bulb |
4 | UBERON:0009950 | RO:0002131 | UBERON:0002264 | olfactory bulb plexiform layer | overlaps | olfactory bulb |
5 | UBERON:0005377 | RO:0002131 | UBERON:0002264 | olfactory bulb glomerular layer | overlaps | olfactory bulb |
6 | UBERON:0005376 | RO:0002131 | UBERON:0002264 | olfactory bulb external plexiform layer | overlaps | olfactory bulb |
7 | UBERON:0004186 | RO:0002131 | UBERON:0002264 | olfactory bulb mitral cell layer | overlaps | olfactory bulb |
8 | CL:1001502 | RO:0002131 | UBERON:0002264 | mitral cell | overlaps | olfactory bulb |
9 | CL:1001503 | RO:0002131 | UBERON:0002264 | olfactory bulb tufted cell | overlaps | olfactory bulb |
10 | UBERON:0034730 | RO:0002131 | UBERON:0002264 | olfactory tract linking bulb to ipsilateral do... | overlaps | olfactory bulb |
11 | UBERON:2000238 | RO:0002131 | UBERON:0002264 | olfactory tract linking bulb to ipsilateral ve... | overlaps | olfactory bulb |
12 | UBERON:0002264 | RO:0002131 | UBERON:0002264 | olfactory bulb | overlaps | olfactory bulb |
13 | UBERON:0002265 | RO:0002131 | UBERON:0002264 | olfactory tract | overlaps | olfactory bulb |
Complex queries
We can also make use of entailed edges in complex boolean queries.
The following query is an intersection (using the .and syntax) of:
all things that overlap the olfactory bulb
all subtypes of interneuron
[42]:
cl info .desc//p=RO:0002131 "olfactory bulb" .and .desc//p=i "interneuron"
CL:1001435 ! periglomerular cell
CL:1001434 ! olfactory bulb interneuron
CL:1001502 ! mitral cell
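Conceptually, the .and operator behaves like a set intersection over the two term lists. The sketch below illustrates that idea with plain Python sets. The first set is copied from the overlaps table above. The second set is NOT the real list of interneuron subtypes (which this tutorial does not show): it is a hand-picked stand-in made of the three terms reported above plus CL:4023014, which lists interneuron among its is-a ancestors.

```python
# Subjects from the "overlaps olfactory bulb" table above.
overlaps_olfactory_bulb = {
    "CL:1001435", "CL:1001434", "UBERON:0004001", "CL:0000626",
    "UBERON:0009950", "UBERON:0005377", "UBERON:0005376", "UBERON:0004186",
    "CL:1001502", "CL:1001503", "UBERON:0034730", "UBERON:2000238",
    "UBERON:0002264", "UBERON:0002265",
}

# Illustrative stand-in for ".desc//p=i interneuron" (assumption, see lead-in).
interneuron_subtypes = {"CL:1001435", "CL:1001434", "CL:1001502", "CL:4023014"}

# ".and" corresponds to intersecting the two result sets.
result = sorted(overlaps_olfactory_bulb & interneuron_subtypes)
print(result)
```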
Pairwise term similarity
Next we will explore the nascent semantic similarity functions in OAK.
Note that the data model and signatures may change slightly here in the future.
Once again, it is important to understand how OAK handles graphs - all similarity methods are parameterized by predicate lists. Let’s start with the simple case of is-a hierarchies.
Here we will compare:
CL:1001435 ! periglomerular cell
CL:1001502 ! mitral cell
[44]:
cl similarity -p i CL:1001435 CL:1001502
subject_id: CL:1001435
object_id: CL:1001502
ancestor_id: CL:1001434
ancestor_information_content: 13.47134302805148
jaccard_similarity: 0.92
phenodigm_score: 3.520459570256043
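The numbers above can be related to each other with a short sketch. It assumes the common definitions: Jaccard similarity is the size of the intersection of the two ancestor sets divided by the size of their union, and the phenodigm score is the square root of Jaccard similarity times the information content of the common ancestor. The ancestor sets below are placeholders (23 shared ancestors plus each term itself), chosen only because they reproduce 0.92; they are not the real ancestor lists, and this is not necessarily how OAK computes the values internally.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


# Placeholder ancestor sets (assumption): 23 shared ancestors plus each term itself.
shared = {f"shared_ancestor_{i}" for i in range(23)}
ancestors_a = shared | {"CL:1001435"}
ancestors_b = shared | {"CL:1001502"}

jaccard_similarity = jaccard(ancestors_a, ancestors_b)

# Information content of the common ancestor, copied from the output above.
ancestor_information_content = 13.47134302805148

# Assumed definition: sqrt(jaccard * information content).
phenodigm_like = math.sqrt(jaccard_similarity * ancestor_information_content)
print(jaccard_similarity, phenodigm_like)
```

Under these assumptions the computed values agree with the jaccard_similarity and phenodigm_score fields reported above.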
TODOs:
allow calculation of IC from background annotations
add an --autolabel option (other OAK commands have this)
To see what the most recent common ancestor (MRCA) is:
[45]:
cl info CL:1001434
CL:1001434 ! olfactory bulb interneuron
This is not surprising, since we selected those terms based on the fact that they are olfactory bulb (OB) interneurons!
Using queries as inputs for similarity (advanced)
Next we are going to explore a (randomly chosen) example - how similar are the neurons of two cortical layers?
We will use all-similarity, which can take as input either:
two files containing term lists
two boolean queries, each resolving to a term list
Similarity is then computed for the cross-product of the two lists:
[48]:
cl all-similarity -p i .desc//p=RO:0002131 "cortical layer II/III" .and .desc//p=i "neuron" @ .desc//p=RO:0002131 "cortical layer V" .and .desc//p=i "neuron" -o output/sim.png -O seaborn
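The cross-product step can be sketched with itertools. The two term lists below are hand-picked from the lexical search results earlier in this tutorial purely for illustration; they are not the actual results of the two boolean queries, and `all_pairs` is a hypothetical helper.

```python
from itertools import product


def all_pairs(terms_a, terms_b):
    """Every (a, b) combination; one similarity value would be computed per pair."""
    return list(product(terms_a, terms_b))


# Illustrative term lists (assumption: not the real query results).
layer_2_3_terms = ["CL:4023007", "CL:4023030"]
layer_5_terms = ["CL:4023014", "CL:4023027", "CL:4023028"]

pairs = all_pairs(layer_2_3_terms, layer_5_terms)
print(len(pairs))
for term_a, term_b in pairs:
    print(term_a, term_b)
```

The number of comparisons is the product of the two list sizes, so broad queries on both sides can produce large similarity matrices.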
As can be seen, glutamatergic cells are more similar, etc.
Text Mining
Next we will use the annotate command to annotate some text.
Up until now we have been using the sqlite adapter, but for this we will switch to the bioportal adapter.
In future it will be possible to use plugins to combine your choice of adapter with different annotators, such as SciSpacy. For now, bear in mind that bioportal gives wide coverage of ontologies but can have recall issues, e.g. with plurals or different orthographic forms.
[50]:
%alias annotate runoak -i bioportal:cl annotate
[51]:
annotate "olfactory bulb interneuron projects into amygdala" -O csv -o output/tmp.tsv
[52]:
show()
[52]:
| predicate_id | object_id | object_label | object_source | confidence | match_string | is_longest_match | matches_whole_text | match_type | info | subject_start | subject_end | subject_label | subject_source | subject_text_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | CL:1001434 | olfactory bulb interneuron | https://data.bioontology.org/ontologies/CL | None | None | None | None | PREF | None | 1 | 26 | OLFACTORY BULB INTERNEURON | None | None |
1 | None | UBERON:0002264 | olfactory bulb | https://data.bioontology.org/ontologies/CL | None | None | None | None | PREF | None | 1 | 14 | OLFACTORY BULB | None | None |
2 | None | UBERON:0001896 | medulla oblongata | https://data.bioontology.org/ontologies/CL | None | None | None | None | SYN | None | 11 | 14 | BULB | None | None |
3 | None | CL:0000099 | interneuron | https://data.bioontology.org/ontologies/CL | None | None | None | None | PREF | None | 16 | 26 | INTERNEURON | None | None |
4 | None | UBERON:0001876 | amygdala | https://data.bioontology.org/ontologies/CL | None | None | None | None | PREF | None | 42 | 49 | AMYGDALA | None | None |
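The table contains overlapping matches, including “BULB” matched to medulla oblongata via a synonym. One simple way to post-process such output is to keep only matches that are not nested inside a longer match. The sketch below assumes that subject_start and subject_end are 1-based, inclusive character positions (this is consistent with the values in the table); `span_text` and `longest_matches` are hypothetical helpers, not OAK functions.

```python
text = "olfactory bulb interneuron projects into amygdala"

# (object_id, subject_start, subject_end) copied from the table above.
matches = [
    ("CL:1001434", 1, 26),
    ("UBERON:0002264", 1, 14),
    ("UBERON:0001896", 11, 14),
    ("CL:0000099", 16, 26),
    ("UBERON:0001876", 42, 49),
]


def span_text(text, start, end):
    """Slice text using 1-based, inclusive positions (assumed convention)."""
    return text[start - 1:end]


def longest_matches(matches):
    """Drop any match whose span lies inside a strictly longer match."""
    kept = []
    for object_id, start, end in matches:
        nested = any(
            s <= start and end <= e and (e - s) > (end - start)
            for _, s, e in matches
        )
        if not nested:
            kept.append((object_id, start, end))
    return kept


kept = longest_matches(matches)
for object_id, start, end in kept:
    print(object_id, span_text(text, start, end))
```

On this input only the olfactory bulb interneuron and amygdala matches survive, which removes the spurious medulla oblongata hit.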