Neuro-CL tutorial

  • author: Chris Mungall

  • created: 2022-09-09

This tutorial walks through the neuro-relevant subset of the Cell Ontology (CL). The goals are:

  • to help understand the structure of CL

    • to show how CL uses relationships like has-soma-location

    • to show how CL relates to Uberon and other ontologies

  • to show how to do advanced OAK queries and visualization (CLI and programmatic) on CL

  • to demonstrate rudimentary text annotation

Running this notebook locally or on mybinder requires OAK (oaklib) version 0.1.41 or higher.

Create an alias

For convenience we will set an alias using the IPython %alias magic, which works much like a bash alias.

The first time you run this, a copy of cl.db is downloaded from S3, which may cause a delay. For subsequent invocations, the cached copy will be used.

[47]:
%alias cl runoak -i sqlite:obo:cl

Basic lookup queries

Let’s check it’s working. We will use the info command:

[4]:
cl info neuron
CL:0000540 ! neuron

Next we will try a simple lexical search for GABAergic cortical interneurons.

In the query below, l means search on labels and ~ means inexact (partial) matching:

[5]:
cl info "l~GABAergic cortical interneuron"
CL:0000617 ! GABAergic neuron
CL:0010011 ! cerebral cortex GABAergic interneuron
CL:0011005 ! GABAergic interneuron
CL:4023007 ! L2/3 bipolar vip GABAergic cortical interneuron (Mmus)
CL:4023010 ! alpha7 GABAergic cortical interneuron (Mmus)
CL:4023011 ! lamp5 GABAergic cortical interneuron
CL:4023014 ! L5 vip cortical GABAergic interneuron (Mmus)
CL:4023015 ! sncg GABAergic cortical interneuron
CL:4023016 ! vip GABAergic cortical interneuron
CL:4023017 ! sst GABAergic cortical interneuron
CL:4023018 ! pvalb GABAergic cortical interneuron
CL:4023019 ! L5/6 cck, vip cortical GABAergic interneuron (Mmus)
CL:4023022 ! canopy lamp5 GABAergic cortical interneuron (Mmus)
CL:4023023 ! L5,6 neurogliaform lamp5 GABAergic cortical interneuron (Mmus)
CL:4023024 ! neurogliaform lamp5 GABAergic cortical interneuron (Mmus)
CL:4023025 ! long-range projecting sst GABAergic cortical interneuron (Mmus)
CL:4023027 ! L5 T-Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023028 ! L5 non-Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023030 ! L2/3/5 fan Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023031 ! L4 sst GABAergic cortical interneuron (Mmus)
CL:4023034 ! obsolete L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)
CL:4023036 ! chandelier pvalb GABAergic cortical interneuron
CL:4023065 ! meis2 expressing cortical GABAergic cell
CL:4023067 ! obsolete Martinotti morphology L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)
CL:4023069 ! medial ganglionic eminence derived GABAergic cortical interneuron
CL:4023070 ! caudal ganglionic eminence derived GABAergic cortical interneuron
CL:4023071 ! L5/6 cck cortical GABAergic interneuron (Mmus)
CL:4023075 ! L6 tyrosine hydroxylase sst GABAergic cortical interneuron (Mmus)
CL:4023078 ! obsolete basket morphology L2/3 pvalb-like sst GABAergic cortical interneuron (Mus musculus)
CL:4023106 ! obsolete meis2 expressing cortical GABAergic cell (Callithrix jacchus)
CL:4023118 ! L5/6 non-Martinotti sst GABAergic cortical interneuron (Mmus)
CL:4023121 ! sst chodl GABAergic cortical interneuron
CL:4023122 ! oxytocin receptor sst GABAergic cortical interneuron
GO:0021853 ! cerebral cortex GABAergic interneuron migration
GO:0021892 ! cerebral cortex GABAergic interneuron differentiation
GO:0021894 ! cerebral cortex GABAergic interneuron development
GO:0032228 ! regulation of synaptic transmission, GABAergic
GO:0032229 ! negative regulation of synaptic transmission, GABAergic
GO:0032230 ! positive regulation of synaptic transmission, GABAergic
GO:0051932 ! synaptic transmission, GABAergic
GO:0097154 ! GABAergic neuron differentiation

Of course, there are more reliable ways to do this query than relying on string matching, but string searches can be useful for initial exploration.

Note there are some GO terms in the matches. This is because the release version of CL includes portions of other ontologies like GO.

Exploring the structure of CL

Next we will try exploring the graph structure of CL (see the glossary for what we mean by graph structure).

Here we are using the relationships command:

[16]:
cl relationships CL:4023014





The output is a tab-separated table of relationships emanating from L5 vip cortical GABAergic interneuron (Mmus).

This doesn’t look very pretty in the Jupyter interface. We will write a helper function here (of course, if running on the command line you could do other things to show the table).

[13]:
import pandas as pd
def show(path="output/tmp.tsv"):
    """helper function to turn most recent TSV output into a dataframe"""
    return pd.read_csv(path, sep="\t")
[17]:
cl relationships CL:4023014 -o output/tmp.tsv
[18]:
show()
[18]:
   subject     predicate        object           subject_label                                 predicate_label    object_label
0  CL:4023014  RO:0002100       UBERON:0005394   L5 vip cortical GABAergic interneuron (Mmus)  has soma location  cortical layer V
1  CL:4023014  RO:0002162       NCBITaxon:10090  L5 vip cortical GABAergic interneuron (Mmus)  in taxon           Mus musculus
2  CL:4023014  RO:0002292       PR:P32648        L5 vip cortical GABAergic interneuron (Mmus)  expresses          VIP peptides (mouse)
3  CL:4023014  rdfs:subClassOf  CL:4023016       L5 vip cortical GABAergic interneuron (Mmus)  None               vip GABAergic cortical interneuron

This view is more readable. We can see that there are 4 edges for which the subject matches our query. Edges can point to nodes outside CL.

Each edge can be read as a sentence - e.g.

  • “L5 vip cortical GABAergic interneuron (Mmus) has soma location cortical layer V”
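To make the "edge as sentence" reading concrete, here is a minimal sketch that renders rows of the relationships table as sentences. The rows are copied from the table above; the helper name edge_to_sentence is hypothetical, and the sketch assumes the predicate_label cell is empty for rdfs:subClassOf (shown as None in the dataframe), in which case it falls back to "is a".

```python
import csv
import io

# Three rows copied from the relationships table above, as tab-separated text.
# Assumption: the label cell for rdfs:subClassOf is empty in the TSV.
TSV = (
    "subject\tpredicate\tobject\tsubject_label\tpredicate_label\tobject_label\n"
    "CL:4023014\tRO:0002100\tUBERON:0005394\t"
    "L5 vip cortical GABAergic interneuron (Mmus)\thas soma location\tcortical layer V\n"
    "CL:4023014\tRO:0002162\tNCBITaxon:10090\t"
    "L5 vip cortical GABAergic interneuron (Mmus)\tin taxon\tMus musculus\n"
    "CL:4023014\trdfs:subClassOf\tCL:4023016\t"
    "L5 vip cortical GABAergic interneuron (Mmus)\t\tvip GABAergic cortical interneuron\n"
)


def edge_to_sentence(row):
    """Render one edge as 'subject_label predicate_label object_label'."""
    # Fall back to "is a" when the predicate has no label (rdfs:subClassOf).
    predicate = row["predicate_label"] or "is a"
    return f'{row["subject_label"]} {predicate} {row["object_label"]}'


reader = csv.DictReader(io.StringIO(TSV), delimiter="\t")
sentences = [edge_to_sentence(row) for row in reader]
for sentence in sentences:
    print(sentence)
```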

Linking neurons to Uberon

When connecting cell types to anatomy in Uberon, CL uses has-soma-location rather than the stronger part-of. This is because, as a general rule, we can’t make an entire neuron part of a specific region if that neuron has projections that extend into other areas.

For more background, see:

Transcriptomic classification of neurons

Note that many newer cell types in CL may be types uncovered by RNAseq experiments and clustering. When these are captured in CL, we often link the cell type to a marker protein or gene via an expresses relationship.

Relationship query directionality

By default, the relationships command queries in the “up” direction, i.e. the query is matched to the edge subject.

We can use --direction to get the “down” direction edges (i.e. the query is matched to the edge object), or “both”.

Let’s try this with a more general term, vip GABAergic cortical interneuron:

[23]:
cl relationships CL:4023016 --direction both -o output/tmp.tsv
[24]:
show()
[24]:
   subject     predicate        object        subject_label                                      predicate_label  object_label
0  CL:4023016  RO:0002292       PR:000017299  vip GABAergic cortical interneuron                 expresses        VIP peptides
1  CL:4023016  rdfs:subClassOf  CL:0010011    vip GABAergic cortical interneuron                 None             cerebral cortex GABAergic interneuron
2  CL:4023007  rdfs:subClassOf  CL:4023016    L2/3 bipolar vip GABAergic cortical interneuro...  None             vip GABAergic cortical interneuron
3  CL:4023014  rdfs:subClassOf  CL:4023016    L5 vip cortical GABAergic interneuron (Mmus)       None             vip GABAergic cortical interneuron
4  CL:4023019  rdfs:subClassOf  CL:4023016    L5/6 cck, vip cortical GABAergic interneuron (...  None             vip GABAergic cortical interneuron
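The three directions can be illustrated with a small sketch over a plain edge list. This is not how OAK implements the command; it only mirrors the matching rule described above (query matched to the subject, to the object, or to either). The edges are copied from the table, and the function name relationships is a stand-in chosen for this example.

```python
# Edges copied from the table above, as (subject, predicate, object) triples.
EDGES = [
    ("CL:4023016", "RO:0002292", "PR:000017299"),
    ("CL:4023016", "rdfs:subClassOf", "CL:0010011"),
    ("CL:4023007", "rdfs:subClassOf", "CL:4023016"),
    ("CL:4023014", "rdfs:subClassOf", "CL:4023016"),
    ("CL:4023019", "rdfs:subClassOf", "CL:4023016"),
]


def relationships(term, direction="up"):
    """Return the edges that touch `term` in the requested direction.

    up:   the term is the edge subject
    down: the term is the edge object
    both: either of the above
    """
    if direction not in ("up", "down", "both"):
        raise ValueError(f"unknown direction: {direction}")
    matched = []
    for subject, predicate, obj in EDGES:
        is_up = subject == term
        is_down = obj == term
        if (direction in ("up", "both") and is_up) or (
            direction in ("down", "both") and is_down
        ):
            matched.append((subject, predicate, obj))
    return matched


up_edges = relationships("CL:4023016", "up")
down_edges = relationships("CL:4023016", "down")
both_edges = relationships("CL:4023016", "both")
print(len(up_edges), len(down_edges), len(both_edges))
```

With the five edges above, "up" returns the two edges leaving CL:4023016, "down" the three subclass edges pointing at it, and "both" all five.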

Querying ancestors

We will try finding all ancestors of CL:4023014.

IMPORTANT: in OAK, all graph commands are parameterized by predicate lists. Consult the OAK docs if you don’t understand what this means!

To find all is-a ancestors (i.e. ancestors following SubClassOf between named classes) we use -p i:

[25]:
cl ancestors -p i CL:4023014
BFO:0000002 ! continuant
BFO:0000004 ! independent continuant
BFO:0000040 ! material entity
CARO:0000000 ! anatomical entity
CARO:0030000 ! biological entity
CL:0000000 ! cell
CL:0000003 ! native cell
CL:0000099 ! interneuron
CL:0000117 ! CNS neuron (sensu Vertebrata)
CL:0000151 ! secretory cell
CL:0000161 ! acid secreting cell
CL:0000211 ! electrically active cell
CL:0000255 ! eukaryotic cell
CL:0000393 ! electrically responsive cell
CL:0000402 ! CNS interneuron
CL:0000404 ! electrically signaling cell
CL:0000498 ! inhibitory interneuron
CL:0000540 ! neuron
CL:0000548 ! animal cell
CL:0000617 ! GABAergic neuron
CL:0002319 ! neural cell
CL:0002371 ! somatic cell
CL:0008031 ! cortical interneuron
CL:0010011 ! cerebral cortex GABAergic interneuron
CL:0010012 ! cerebral cortex neuron
CL:0011005 ! GABAergic interneuron
CL:0012001 ! neuron of the forebrain
CL:2000029 ! central nervous system neuron
CL:4023014 ! L5 vip cortical GABAergic interneuron (Mmus)
CL:4023016 ! vip GABAergic cortical interneuron

We can also show this as a table in Jupyter:

[27]:
cl ancestors -p i CL:4023014 -o output/tmp.tsv -O csv
[28]:
show()
[28]:
id label
0 BFO:0000002 continuant
1 BFO:0000004 independent continuant
2 BFO:0000040 material entity
3 CARO:0000000 anatomical entity
4 CARO:0030000 biological entity
5 CL:0000000 cell
6 CL:0000003 native cell
7 CL:0000099 interneuron
8 CL:0000117 CNS neuron (sensu Vertebrata)
9 CL:0000151 secretory cell
10 CL:0000161 acid secreting cell
11 CL:0000211 electrically active cell
12 CL:0000255 eukaryotic cell
13 CL:0000393 electrically responsive cell
14 CL:0000402 CNS interneuron
15 CL:0000404 electrically signaling cell
16 CL:0000498 inhibitory interneuron
17 CL:0000540 neuron
18 CL:0000548 animal cell
19 CL:0000617 GABAergic neuron
20 CL:0002319 neural cell
21 CL:0002371 somatic cell
22 CL:0008031 cortical interneuron
23 CL:0010011 cerebral cortex GABAergic interneuron
24 CL:0010012 cerebral cortex neuron
25 CL:0011005 GABAergic interneuron
26 CL:0012001 neuron of the forebrain
27 CL:2000029 central nervous system neuron
28 CL:4023014 L5 vip cortical GABAergic interneuron (Mmus)
29 CL:4023016 vip GABAergic cortical interneuron
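What "parameterized by predicate lists" means can be sketched with a toy traversal: only edges whose predicate is in the supplied list are followed. The graph below is a tiny hand-written fragment (three edges taken from the tables in this tutorial plus one hypothetical part-of edge to a placeholder node EX:region), and the ancestors function is an illustration, not OAK's implementation.

```python
IS_A = "rdfs:subClassOf"
PART_OF = "BFO:0000050"
HAS_SOMA_LOCATION = "RO:0002100"

EDGES = [
    ("CL:4023014", IS_A, "CL:4023016"),
    ("CL:4023016", IS_A, "CL:0010011"),
    ("CL:4023014", HAS_SOMA_LOCATION, "UBERON:0005394"),
    # Hypothetical edge to a placeholder node, for illustration only.
    ("UBERON:0005394", PART_OF, "EX:region"),
]


def ancestors(term, predicates):
    """Reflexive ancestors of `term`, following only the given predicates."""
    allowed = set(predicates)
    seen = {term}  # the query term itself is included, as in the output above
    stack = [term]
    while stack:
        node = stack.pop()
        for subject, predicate, obj in EDGES:
            if subject == node and predicate in allowed and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


isa_only = ancestors("CL:4023014", [IS_A])
richer = ancestors("CL:4023014", [IS_A, PART_OF, HAS_SOMA_LOCATION])
print(sorted(isa_only))
print(sorted(richer))
```

Widening the predicate list from is-a alone to is-a, part-of and has-soma-location is what brings the anatomical nodes into the result.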

Visualization

Next we will generate a visualization from this using the viz command:

[31]:
cl viz -p i CL:4023014 -o output/CL_4023014.png

img

Other relationships

The above visualization only shows the is-a structure of the ontology; we are missing other useful structural information.

All OAK graph commands are parameterized by predicates, so let’s include both part-of (for traversing within Uberon) and has-soma-location:

[32]:
cl viz -p i,p,RO:0002100 CL:4023014 -o output/CL_4023014_with_uberon.png

The graph:

img

This graph is a lot richer - and we are only seeing a subset of connections! In fact CL connects to NCBITaxon for taxon constraints, PRO for gene expression, GO for functional classification, …

Note we are using the default OAK stylesheet, which colors CL in grey, UBERON in yellow, etc. For more info on visualization and stylesheets, see OboGraphViz.

Relation graph tables

Now that we have seen graphs incorporating transitive closures of certain edge types, let’s return to the relationships command.

We will use the --include-entailed option to include entailed relations that have been computed using relation-graph.

(Note: this option won’t work with all OAK adapters, for example if you are using OAK to connect to an obo file or to a remote sparql endpoint that doesn’t support relation-graph. We recommend using either the sqlite backend, as in this tutorial, or ubergraph.)

[34]:
cl relationships CL:4023014 --include-entailed -o output/tmp.tsv
[35]:
show()
[35]:
     subject     predicate        object        subject_label                                 predicate_label  object_label
0    CL:4023014  BFO:0000050      BFO:0000002   L5 vip cortical GABAergic interneuron (Mmus)  part of          continuant
1    CL:4023014  BFO:0000050      BFO:0000004   L5 vip cortical GABAergic interneuron (Mmus)  part of          independent continuant
2    CL:4023014  BFO:0000050      BFO:0000040   L5 vip cortical GABAergic interneuron (Mmus)  part of          material entity
3    CL:4023014  BFO:0000050      CARO:0000000  L5 vip cortical GABAergic interneuron (Mmus)  part of          anatomical entity
4    CL:4023014  BFO:0000050      CARO:0000006  L5 vip cortical GABAergic interneuron (Mmus)  part of          material anatomical entity
...  ...         ...              ...           ...                                           ...              ...
655  CL:4023014  rdfs:subClassOf  CL:0011005    L5 vip cortical GABAergic interneuron (Mmus)  None             GABAergic interneuron
656  CL:4023014  rdfs:subClassOf  CL:0012001    L5 vip cortical GABAergic interneuron (Mmus)  None             neuron of the forebrain
657  CL:4023014  rdfs:subClassOf  CL:2000029    L5 vip cortical GABAergic interneuron (Mmus)  None             central nervous system neuron
658  CL:4023014  rdfs:subClassOf  CL:4023014    L5 vip cortical GABAergic interneuron (Mmus)  None             L5 vip cortical GABAergic interneuron (Mmus)
659  CL:4023014  rdfs:subClassOf  CL:4023016    L5 vip cortical GABAergic interneuron (Mmus)  None             vip GABAergic cortical interneuron

660 rows × 6 columns

660 entailed relationships is quite a lot!

Note a lot of these are quite trivial: every L5 vip cortical GABAergic interneuron (Mmus) is a part of SOME material entity. Duh!

It’s not expected that a typical user would inspect these large computed tables. Instead they are to be used “behind the scenes” in databases and applications. For example, a gene expression database could use this table to answer questions like “what genes are expressed in the forebrain?” by joining a direct expresses table with the relation-graph closure table, filtering on relationships like part-of or has-soma-location, or the weaker overlaps.
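The join described above can be sketched in a few lines. Everything here is toy data: only the first expresses row and the first closure row come from the tables in this tutorial, while the EX: identifiers and the remaining rows are hypothetical placeholders, and expressed_in is a made-up helper name. A real application would do the same join in SQL or a dataframe library.

```python
# Direct "expresses" table: (cell type, gene or protein).
EXPRESSES = [
    ("CL:4023014", "PR:P32648"),  # from the relationships table earlier
    ("EX:cell2", "EX:geneB"),     # hypothetical row
]

# Entailed (closure) edges: (subject, predicate, object).
CLOSURE = [
    ("CL:4023014", "RO:0002100", "UBERON:0005394"),   # has soma location
    ("CL:4023014", "RO:0002131", "EX:forebrain"),     # hypothetical overlaps edge
    ("EX:cell2", "RO:0002131", "EX:other_region"),    # hypothetical overlaps edge
]


def expressed_in(region, predicates):
    """Genes expressed by cell types related to `region` via any given predicate."""
    allowed = set(predicates)
    cells = {
        subject
        for subject, predicate, obj in CLOSURE
        if obj == region and predicate in allowed
    }
    return sorted({gene for cell, gene in EXPRESSES if cell in cells})


via_overlaps = expressed_in("EX:forebrain", ["RO:0002131"])
via_part_of = expressed_in("EX:forebrain", ["BFO:0000050"])
print(via_overlaps)
print(via_part_of)
```

The choice of predicates controls how strict the answer is: in this toy data the weaker overlaps relation finds the gene, while part-of finds nothing.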

Let’s see what such queries might yield. First we will find the RO relationship for “overlaps”:

[36]:
cl info overlaps
RO:0002131 ! overlaps

(remember, part of RO is distributed with CL).

Next we will filter our entailed relationships, and we will query “down”wards, i.e. we are asking: what overlaps the olfactory bulb?

[40]:
cl relationships -p RO:0002131 "olfactory bulb" --direction down --include-entailed -o output/tmp.tsv
[41]:
show()
[41]:
    subject         predicate   object          subject_label                                      predicate_label  object_label
0   CL:1001435      RO:0002131  UBERON:0002264  periglomerular cell                                overlaps         olfactory bulb
1   CL:1001434      RO:0002131  UBERON:0002264  olfactory bulb interneuron                         overlaps         olfactory bulb
2   UBERON:0004001  RO:0002131  UBERON:0002264  olfactory bulb layer                               overlaps         olfactory bulb
3   CL:0000626      RO:0002131  UBERON:0002264  olfactory granule cell                             overlaps         olfactory bulb
4   UBERON:0009950  RO:0002131  UBERON:0002264  olfactory bulb plexiform layer                     overlaps         olfactory bulb
5   UBERON:0005377  RO:0002131  UBERON:0002264  olfactory bulb glomerular layer                    overlaps         olfactory bulb
6   UBERON:0005376  RO:0002131  UBERON:0002264  olfactory bulb external plexiform layer            overlaps         olfactory bulb
7   UBERON:0004186  RO:0002131  UBERON:0002264  olfactory bulb mitral cell layer                   overlaps         olfactory bulb
8   CL:1001502      RO:0002131  UBERON:0002264  mitral cell                                        overlaps         olfactory bulb
9   CL:1001503      RO:0002131  UBERON:0002264  olfactory bulb tufted cell                         overlaps         olfactory bulb
10  UBERON:0034730  RO:0002131  UBERON:0002264  olfactory tract linking bulb to ipsilateral do...  overlaps         olfactory bulb
11  UBERON:2000238  RO:0002131  UBERON:0002264  olfactory tract linking bulb to ipsilateral ve...  overlaps         olfactory bulb
12  UBERON:0002264  RO:0002131  UBERON:0002264  olfactory bulb                                     overlaps         olfactory bulb
13  UBERON:0002265  RO:0002131  UBERON:0002264  olfactory tract                                    overlaps         olfactory bulb

Complex queries

We can also make use of entailed edges in complex boolean queries.

The following query is an intersection (using the .and operator) of:

  • all things that overlap the olfactory bulb

  • all subtypes of interneuron

[42]:
cl info .desc//p=RO:0002131 "olfactory bulb" .and .desc//p=i "interneuron"
CL:1001435 ! periglomerular cell
CL:1001434 ! olfactory bulb interneuron
CL:1001502 ! mitral cell
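Conceptually, the .and operator is a set intersection of two term lists. The sketch below reproduces the result above with plain Python sets. The first set is copied from the overlaps table earlier; the second is only an illustrative subset of the is-a descendants of interneuron (the three result terms plus two terms from the ancestor listing earlier, assumed here to be interneuron subtypes), not the full list OAK would compute.

```python
# Subjects of the "overlaps olfactory bulb" table shown earlier.
overlaps_olfactory_bulb = {
    "CL:1001435", "CL:1001434", "UBERON:0004001", "CL:0000626",
    "UBERON:0009950", "UBERON:0005377", "UBERON:0005376", "UBERON:0004186",
    "CL:1001502", "CL:1001503", "UBERON:0034730", "UBERON:2000238",
    "UBERON:0002264", "UBERON:0002265",
}

# Illustrative subset of interneuron subtypes (an assumption, not OAK output).
interneuron_subtypes = {
    "CL:1001435",  # periglomerular cell
    "CL:1001434",  # olfactory bulb interneuron
    "CL:1001502",  # mitral cell
    "CL:0000498",  # inhibitory interneuron
    "CL:0011005",  # GABAergic interneuron
}

# The .and operator corresponds to set intersection.
result = overlaps_olfactory_bulb & interneuron_subtypes
print(sorted(result))
```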

Pairwise term similarity

Next we will explore the nascent semantic similarity functions in OAK.

Note that the data model and signatures may change slightly here in the future.

Once again, it is important to understand how OAK handles graphs - all similarity methods are parameterized by predicate lists. Let’s start with the simple case of is-a hierarchies.

Here we will compare:

  • CL:1001435 ! periglomerular cell

  • CL:1001502 ! mitral cell

[44]:
cl similarity -p i CL:1001435 CL:1001502
subject_id: CL:1001435
object_id: CL:1001502
ancestor_id: CL:1001434
ancestor_information_content: 13.47134302805148
jaccard_similarity: 0.92
phenodigm_score: 3.520459570256043

TODOs:

  • allow calculation of IC from background annotations

  • add an --autolabel option (other OAK commands have this)

To see what the MRCA (most recent common ancestor) is:

[45]:
cl info CL:1001434
CL:1001434 ! olfactory bulb interneuron

This is not surprising, since we selected those terms based on the fact that they are OB interneurons!
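The scores reported above can be related to one another with a short worked example. Jaccard similarity is the size of the intersection of the two ancestor sets divided by the size of their union, and the numbers in the output are consistent with the phenodigm score being the square root of the Jaccard similarity multiplied by the information content of the common ancestor. The function names below are illustrative, and the toy sets in the first call are made up; only the two input numbers in the last call come from the output above.

```python
import math


def jaccard(ancestors_a, ancestors_b):
    """Jaccard similarity of two ancestor sets: |A & B| / |A | B|."""
    return len(ancestors_a & ancestors_b) / len(ancestors_a | ancestors_b)


def phenodigm(jaccard_similarity, ancestor_information_content):
    """Geometric mean of Jaccard similarity and ancestor information content."""
    return math.sqrt(jaccard_similarity * ancestor_information_content)


# Toy sets: two shared ancestors out of four distinct ones in total.
toy = jaccard({"a", "b", "c"}, {"b", "c", "d"})
print(toy)

# Values taken from the similarity output above.
score = phenodigm(0.92, 13.47134302805148)
print(score)
```

The second printed value should agree closely with the phenodigm_score in the command output.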

Using queries as inputs for similarity (advanced)

Next we are going to explore a (randomly chosen) example - how similar are the neurons of two cortical layers?

We will use all-similarity, which can take as input either:

  • two files containing term lists

  • two boolean queries, each resolving to a term list

Similarity is then computed for the cross-product of the two lists:

[48]:
cl all-similarity -p i .desc//p=RO:0002131 "cortical layer II/III" .and .desc//p=i "neuron" @ .desc//p=RO:0002131 "cortical layer V" .and .desc//p=i "neuron" -o output/sim.png -O seaborn

img

As can be seen, glutamatergic cells are more similar to one another, etc.
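The cross-product step can be sketched as follows. The term identifiers and ancestor sets here are hypothetical placeholders (EX: prefixes), and all_similarity is a toy function that only mimics the shape of the computation: every term in the first list is compared with every term in the second, here using Jaccard similarity over ancestor sets.

```python
from itertools import product

# Hypothetical ancestor sets for four made-up terms.
ANCESTORS = {
    "EX:a1": {"EX:root", "EX:neuron", "EX:a1"},
    "EX:a2": {"EX:root", "EX:neuron", "EX:glut", "EX:a2"},
    "EX:b1": {"EX:root", "EX:neuron", "EX:glut", "EX:b1"},
    "EX:b2": {"EX:root", "EX:b2"},
}


def jaccard(set_a, set_b):
    """Jaccard similarity: size of intersection over size of union."""
    return len(set_a & set_b) / len(set_a | set_b)


def all_similarity(terms_a, terms_b):
    """Score every pair in the cross-product of the two term lists."""
    return {
        (a, b): jaccard(ANCESTORS[a], ANCESTORS[b])
        for a, b in product(terms_a, terms_b)
    }


scores = all_similarity(["EX:a1", "EX:a2"], ["EX:b1", "EX:b2"])
for pair, value in sorted(scores.items()):
    print(pair, round(value, 2))
```

A heatmap like the one above is just this dictionary laid out as a matrix, with one list on each axis.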

Text Mining

Next we will use the annotate command to annotate some text.

Up until now we have been using the sqlite adapter, but for this we will switch to the bioportal adapter.

In future it will be possible to use plugins to combine your choice of adapter with different annotators, such as SciSpacy. For now bear in mind that bioportal gives wide coverage of ontologies but can have recall issues e.g. with plurals or different orthographic forms.

[50]:
%alias annotate runoak -i bioportal:cl annotate
[51]:
annotate "olfactory bulb interneuron projects into amygdala" -O csv -o output/tmp.tsv
[52]:
show()
[52]:
predicate_id object_id object_label object_source confidence match_string is_longest_match matches_whole_text match_type info subject_start subject_end subject_label subject_source subject_text_id
0 None CL:1001434 olfactory bulb interneuron https://data.bioontology.org/ontologies/CL None None None None PREF None 1 26 OLFACTORY BULB INTERNEURON None None
1 None UBERON:0002264 olfactory bulb https://data.bioontology.org/ontologies/CL None None None None PREF None 1 14 OLFACTORY BULB None None
2 None UBERON:0001896 medulla oblongata https://data.bioontology.org/ontologies/CL None None None None SYN None 11 14 BULB None None
3 None CL:0000099 interneuron https://data.bioontology.org/ontologies/CL None None None None PREF None 16 26 INTERNEURON None None
4 None UBERON:0001876 amygdala https://data.bioontology.org/ontologies/CL None None None None PREF None 42 49 AMYGDALA None None
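The table above contains overlapping matches, including a spurious one (BULB matched as a synonym of medulla oblongata). One simple post-processing step, sketched here as an assumption rather than as anything OAK does for you, is to keep only matches whose span is not contained inside a longer match. The spans are copied from the subject_start and subject_end columns; longest_matches is a hypothetical helper.

```python
# (object_id, object_label, subject_start, subject_end), copied from the table.
MATCHES = [
    ("CL:1001434", "olfactory bulb interneuron", 1, 26),
    ("UBERON:0002264", "olfactory bulb", 1, 14),
    ("UBERON:0001896", "medulla oblongata", 11, 14),
    ("CL:0000099", "interneuron", 16, 26),
    ("UBERON:0001876", "amygdala", 42, 49),
]


def longest_matches(matches):
    """Drop every match whose span lies inside a strictly longer match."""
    kept = []
    for match in matches:
        _, _, start, end = match
        contained = any(
            other is not match
            and other[2] <= start
            and end <= other[3]
            and (other[3] - other[2]) > (end - start)
            for other in matches
        )
        if not contained:
            kept.append(match)
    return kept


filtered = longest_matches(MATCHES)
for object_id, label, start, end in filtered:
    print(object_id, label, start, end)
```

On this input the filter keeps the olfactory bulb interneuron and amygdala matches and discards the three shorter matches nested inside the first span.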