OAK paths command

This notebook is intended as a supplement to the main OAK CLI docs.

This notebook provides examples of the paths command, which queries for paths between ontology terms.

Help Option

You can get help on any OAK command using --help:

[1]:
!runoak paths --help
Usage: runoak paths [OPTIONS] [TERMS]...

  List all paths between one or more start curies.

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane'

  This shows all shortest paths from nuclear membrane to all ancestors

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane' --target
      cytoplasm

  This shows shortest paths between two nodes

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane' 'thylakoid'
      --target cytoplasm 'thylakoid membrane'

  This shows all shortest paths between 4 combinations of starts and ends

  You can also use "@" to separate start node list and end node list. Like
  most OAK commands, you can pass either explicit terms, or term queries. For
  example, if you have two files of IDs, then you can do this:

      runoak -i sqlite:obo:go paths  -p i,p .idfile START_NODES.txt @ .idfile
      END_NODES.txt

  You can also pass in weights for each predicate, used when calculating
  shortest paths.

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane' --target
      cytoplasm                 --predicate-weights "{i: 0.0001, p: 999}"

  This shows all shortest paths after weighting relations

  (Note: you can use the same shorthands as in the `--predicates` option)

  This command can be combined with others to visualize the paths.

  Example:

      alias go="runoak -i sqlite:obo:go"     go paths  -p i,p 'nuclear
      membrane' --target cytoplasm --narrow | go viz --fill-gaps -

  This visualizes the path by first exporting the path as a flat list, then
  passing the results to viz, using the fill-gaps option

Options:
  --target TEXT                   end point of path
  --narrow / --no-narrow          If true then output path is written a list
                                  of terms  [default: no-narrow]
  --autolabel / --no-autolabel    If set, results will automatically have
                                  labels assigned  [default: autolabel]
  -p, --predicates TEXT           A comma-separated list of predicates
  -O, --output-type TEXT          Desired output type
  --directed / --no-directed      only show directed paths  [default: no-
                                  directed]
  --include-predicates / --no-include-predicates
                                  show predicates between nodes  [default: no-
                                  include-predicates]
  --predicate-weights TEXT        key-value pairs specified in YAML where keys
                                  are predicates or shorthands and values are
                                  weights
  -o, --output FILENAME           Output file, e.g. obo file
  --help                          Show this message and exit.
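The help text describes --predicate-weights only briefly. Conceptually, a weighted shortest path gives each edge a cost equal to the weight of its predicate and then finds the cheapest route, as in Dijkstra's algorithm. The sketch below illustrates that idea on a hypothetical toy graph (the nodes A, B, C, D and Z are made up). It is not OAK's implementation, and the name weighted_shortest_path is our own.

```python
import heapq

# Hypothetical toy graph as (subject, predicate, object) triples.
# "i" and "p" mirror the is_a / part_of shorthands accepted by --predicates.
EDGES = [
    ("A", "p", "B"),
    ("B", "p", "Z"),  # short route: two part_of hops
    ("A", "i", "C"),
    ("C", "i", "D"),
    ("D", "i", "Z"),  # long route: three is_a hops
]

def weighted_shortest_path(edges, weights, start, end):
    """Dijkstra over an undirected graph where each edge costs weights[predicate]."""
    adj = {}
    for s, pred, o in edges:
        cost = weights[pred]
        adj.setdefault(s, []).append((o, cost))
        adj.setdefault(o, []).append((s, cost))  # ignore direction, like the default --no-directed
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == end:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None  # end is unreachable from start

# Equal weights: the two part_of hops (A, B, Z) are cheapest.
print(weighted_shortest_path(EDGES, {"i": 1, "p": 1}, "A", "Z"))
# Weights from the help example: the three is_a hops (A, C, D, Z) become cheapest.
print(weighted_shortest_path(EDGES, {"i": 0.0001, "p": 999}, "A", "Z"))
```

Making is_a nearly free and part_of very expensive steers the search toward subclass chains even when they are longer, which is the effect the help example is after.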

Set up an alias

For convenience, we will set up an alias for use in this notebook:

[2]:
alias cl runoak -i sqlite:obo:cl

Note: if you want to do this on your own machine, the syntax is slightly different in bash/zsh:

alias cl="runoak -i sqlite:obo:cl"
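If you would rather call OAK from Python cells than through a shell alias, a small subprocess wrapper can play the same role. This is a minimal sketch that assumes runoak is installed and on your PATH; the helper names cl_args and cl are hypothetical, not part of OAK.

```python
import subprocess

# Python analogue of the `cl` alias: every call is prefixed with the CL SQLite adapter.
CL_PREFIX = ["runoak", "-i", "sqlite:obo:cl"]

def cl_args(*args: str) -> list[str]:
    """Build the full runoak argument list, as the `cl` alias would."""
    return [*CL_PREFIX, *args]

def cl(*args: str) -> str:
    """Run runoak against CL and return its standard output (raises on a non-zero exit)."""
    result = subprocess.run(cl_args(*args), capture_output=True, text=True, check=True)
    return result.stdout

# Usage (requires runoak):
# print(cl("paths", "--target", "cell", "interneuron"))
```

Passing arguments as a list rather than a single string avoids shell quoting issues with labels that contain spaces, such as "nuclear membrane".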

Example: simple subclass ancestor path

[3]:
cl paths --target cell interneuron
subject subject_label   object  object_label    path    path_label
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000']   interneuron|neuron|material entity|precursor cell|cell differentiation|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0000003', 'CL:0000000']   interneuron|neuron|material entity|precursor cell|native cell|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000003', 'CL:0000000']   interneuron|neuron|material entity|motile cell|native cell|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'CL:0000393', 'CL:0000211', 'CL:0000003', 'CL:0000000']    interneuron|neuron|electrically responsive cell|electrically active cell|native cell|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'CL:0000404', 'CL:0000211', 'CL:0000003', 'CL:0000000']    interneuron|neuron|electrically signaling cell|electrically active cell|native cell|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'CL:0002319', 'CL:0002371', 'CL:0000003', 'CL:0000000']    interneuron|neuron|neural cell|somatic cell|native cell|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0098916', 'GO:0098794', 'CL:0000000']    interneuron|neuron|presynapse|anterograde trans-synaptic signaling|postsynapse|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0045202', 'GO:0098794', 'CL:0000000']    interneuron|neuron|presynapse|synapse|postsynapse|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0098794', 'CL:0000000']    interneuron|neuron|presynapse|cellular anatomical entity|postsynapse|cell
CL:0000099      interneuron     CL:0000000      cell    ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0030312', 'CL:0000000']    interneuron|neuron|presynapse|cellular anatomical entity|external encapsulating structure|cell
CL:0000099      interneuron     CARO:0000013    cell    ['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0020003', 'CARO:0000013']  interneuron|neuron|nervous system|material anatomical entity|cellular anatomical structure|cell
CL:0000099      interneuron     CARO:0000013    cell    ['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0000003', 'CARO:0000013']  interneuron|neuron|nervous system|material anatomical entity|connected anatomical structure|cell
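Each row of the default output packs the whole path into two columns: path, printed as a Python-style list of CURIEs, and path_label, a pipe-separated list of labels in the same order. A sketch for pairing them up, assuming the rows are tab-separated (as when written to a .tsv file with -o) and that no label contains a "|" character; parse_path_row is a hypothetical helper.

```python
import ast

def parse_path_row(line: str) -> list[tuple[str, str]]:
    """Pair each CURIE in the `path` column with its label from `path_label`."""
    subject, subject_label, obj, obj_label, path, path_label = line.rstrip("\n").split("\t")
    curies = ast.literal_eval(path)  # the path column is printed as a Python-style list
    labels = path_label.split("|")   # labels are pipe-separated, in the same order
    return list(zip(curies, labels))

row = (
    "CL:0000099\tinterneuron\tCL:0000000\tcell\t"
    "['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000']\t"
    "interneuron|neuron|material entity|precursor cell|cell differentiation|cell"
)
for curie, label in parse_path_row(row):
    print(curie, label)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for turning the printed list back into Python data.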

You can see a similar structure using the tree command:

[4]:
cl tree interneuron -p i
* [] BFO:0000002 ! continuant
    * [i] BFO:0000004 ! independent continuant
        * [i] BFO:0000040 ! material entity
            * [i] CL:0000540 ! neuron
                * [i] **CL:0000099 ! interneuron**
        * [i] CL:0002319 ! neural cell
            * [i] CL:0000540 ! neuron
                * [i] **CL:0000099 ! interneuron**
* [] CL:0000000 ! cell
    * [i] CL:0000003 ! native cell
        * [i] CL:0000211 ! electrically active cell
            * [i] CL:0000393 ! electrically responsive cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
            * [i] CL:0000404 ! electrically signaling cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
        * [i] CL:0000255 ! eukaryotic cell
            * [i] CL:0000548 ! animal cell
                * [i] CL:0002319 ! neural cell
                    * [i] CL:0000540 ! neuron
                        * [i] **CL:0000099 ! interneuron**
        * [i] CL:0002371 ! somatic cell
            * [i] CL:0002319 ! neural cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
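The tree view is, in effect, every root-to-term path through the is_a graph drawn as an indented outline. The sketch below transcribes the is_a parents shown in the tree above into a plain dictionary (covering only the terms displayed there) and enumerates every path from interneuron up to a root. It is an illustration of the relationship, not how OAK computes either view, and PARENTS and paths_to_roots are hypothetical names.

```python
# is_a parents of each term, transcribed from the `cl tree interneuron -p i` output above.
PARENTS = {
    "CL:0000099": ["CL:0000540"],                                          # interneuron
    "CL:0000540": ["BFO:0000040", "CL:0002319", "CL:0000393", "CL:0000404"],  # neuron
    "BFO:0000040": ["BFO:0000004"],                                        # material entity
    "BFO:0000004": ["BFO:0000002"],                                        # independent continuant
    "CL:0002319": ["BFO:0000004", "CL:0000548", "CL:0002371"],             # neural cell
    "CL:0000393": ["CL:0000211"],                                          # electrically responsive cell
    "CL:0000404": ["CL:0000211"],                                          # electrically signaling cell
    "CL:0000211": ["CL:0000003"],                                          # electrically active cell
    "CL:0000548": ["CL:0000255"],                                          # animal cell
    "CL:0000255": ["CL:0000003"],                                          # eukaryotic cell
    "CL:0002371": ["CL:0000003"],                                          # somatic cell
    "CL:0000003": ["CL:0000000"],                                          # native cell
}

def paths_to_roots(term):
    """Return every is_a path from term up to a root (assumes the graph is acyclic)."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]  # a root: continuant or cell
    return [[term, *rest] for parent in parents for rest in paths_to_roots(parent)]

for path in paths_to_roots("CL:0000099"):
    print(" > ".join(reversed(path)))  # root-first, matching the tree's top-down order
```

The six paths printed correspond to the six highlighted interneuron entries in the tree output.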

Non-directed paths

By default, the paths command ignores edge direction and shows paths going both up and down the hierarchy:

[5]:
cl paths interneuron --target "T-cell"
subject subject_label   object  object_label    path    path_label
CL:0000099      interneuron     CL:0000084      T cell  ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000738', 'CL:0000842', 'CL:0000542', 'CL:0000084']       interneuron|neuron|material entity|motile cell|leukocyte|mononuclear cell|lymphocyte|T cell
CL:0000099      interneuron     CL:0000084      T cell  ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000542', 'CL:0000084']       interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|lymphocyte|T cell
CL:0000099      interneuron     CL:0000084      T cell  ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0000785', 'GO:0000792', 'CL:0000542', 'CL:0000084']        interneuron|neuron|presynapse|cellular anatomical entity|chromatin|heterochromatin|lymphocyte|T cell
CL:0000099      interneuron     CL:0000084      T cell  ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0005737', 'CL:0017500', 'CL:0000542', 'CL:0000084']        interneuron|neuron|presynapse|cellular anatomical entity|cytoplasm|neutrophillic cytoplasm|lymphocyte|T cell
CL:0000099      interneuron     CL:0000084      T cell  ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000827', 'CL:0000084']       interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|pro-T cell|T cell
CL:0000099      interneuron     CL:0000084      T cell  ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000838', 'CL:0000827', 'CL:0000084']       interneuron|neuron|material entity|precursor cell|progenitor cell|lymphoid lineage restricted progenitor cell|pro-T cell|T cell
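All six paths above have the same length (eight nodes), because the command reports all shortest paths. One standard way to enumerate all shortest paths in an undirected graph is a breadth-first search that records every predecessor reached at the minimal distance. The graph here is hypothetical (nodes S, A, M1, M2, B, E), and this is a sketch of the general technique rather than OAK's code.

```python
from collections import deque

def all_shortest_paths(edges, start, end):
    """Enumerate every shortest path between start and end, ignoring edge direction."""
    adj = {}
    for s, o in edges:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    # BFS from start, recording each node's distance and all predecessors at that distance.
    dist = {start: 0}
    preds = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj.get(node, ())):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)  # another equally short way in
    if end not in dist:
        return []

    def build(node):
        # Walk predecessor links back to start, branching at every tie.
        if node == start:
            return [[start]]
        return [p + [node] for pred in preds[node] for p in build(pred)]

    return build(end)

# Hypothetical child -> parent edges: S and E share two ancestors, M1 and M2.
EDGES = [("S", "A"), ("A", "M1"), ("A", "M2"), ("E", "B"), ("B", "M1"), ("B", "M2")]
for path in all_shortest_paths(EDGES, "S", "E"):
    print(" | ".join(path))
```

Both paths climb from S to a shared ancestor and descend to E, which mirrors how the interneuron and T cell paths above pass through common ancestors.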

Specifying --directed restricts traversal to the subject-to-object direction; in this case, there are no such paths, so the command produces no output:

[6]:
cl paths interneuron --directed --target "T-cell"
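The empty result makes sense: T cell is not an ancestor of interneuron, so no chain of subject-to-object edges leads from one to the other, and the undirected paths above only exist because they go up toward shared ancestors and then back down. A minimal reachability check over a hypothetical toy graph (edges point from child to parent; directed_path_exists is our own name) shows the same behavior.

```python
def directed_path_exists(edges, start, end):
    """Return True if end is reachable from start following subject -> object edges only."""
    out = {}
    for s, o in edges:
        out.setdefault(s, []).append(o)
    seen, stack = {start}, [start]
    while stack:  # iterative depth-first search
        node = stack.pop()
        if node == end:
            return True
        for nxt in out.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Hypothetical child -> parent edges: S and E are siblings under a shared ancestor M.
EDGES = [("S", "A"), ("A", "M"), ("E", "B"), ("B", "M")]
print(directed_path_exists(EDGES, "S", "M"))  # an ancestor: reachable
print(directed_path_exists(EDGES, "S", "E"))  # a sibling branch: not reachable
```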

Narrow table options

By default, the output has one row per path.

You can use the --narrow option to make a narrow table, with one row per path element:

[8]:
cl paths --narrow --target CL:4023061 interneuron
subject subject_label   object  object_label    path_node       path_node_label
CL:0000099      interneuron     CL:4023061      hippocampal CA4 neuron  CL:0000099      interneuron
CL:0000099      interneuron     CL:4023061      hippocampal CA4 neuron  CL:0000540      neuron
CL:0000099      interneuron     CL:4023061      hippocampal CA4 neuron  CL:4023061      hippocampal CA4 neuron
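The narrow form is the wide form unrolled: the subject and object columns are repeated once for each node on the path. A sketch of that conversion on the CA4 example above, using a hypothetical helper to_narrow and the column order shown in the narrow output:

```python
# Column order used by `paths --narrow` in the output above.
NARROW_HEADER = ("subject", "subject_label", "object", "object_label", "path_node", "path_node_label")

def to_narrow(subject, subject_label, obj, obj_label, path):
    """Unroll one path, given as (curie, label) pairs, into one narrow row per node."""
    return [(subject, subject_label, obj, obj_label, curie, label) for curie, label in path]

rows = to_narrow(
    "CL:0000099", "interneuron", "CL:4023061", "hippocampal CA4 neuron",
    [("CL:0000099", "interneuron"), ("CL:0000540", "neuron"), ("CL:4023061", "hippocampal CA4 neuron")],
)
print("\t".join(NARROW_HEADER))
for row in rows:
    print("\t".join(row))
```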
[9]:
cl paths --narrow --target CL:4023061 interneuron -o output/interneuron-CA4-path.tsv
[10]:
import pandas as pd
[11]:
df = pd.read_csv("output/interneuron-CA4-path.tsv", sep="\t")
df
[11]:
   subject     subject_label  object      object_label            path_node   path_node_label
0  CL:0000099  interneuron    CL:4023061  hippocampal CA4 neuron  CL:0000099  interneuron
1  CL:0000099  interneuron    CL:4023061  hippocampal CA4 neuron  CL:0000540  neuron
2  CL:0000099  interneuron    CL:4023061  hippocampal CA4 neuron  CL:4023061  hippocampal CA4 neuron
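Going the other way, you can collapse the narrow table back into one node list per subject/object pair. This sketch uses the standard-library csv module instead of pandas so it has no extra dependencies; collapse_narrow is a hypothetical helper. It assumes a single path per pair, as in this example: the narrow output shown here has no path identifier column, so rows from several paths between the same pair would be merged into one list.

```python
import csv
from collections import defaultdict

def collapse_narrow(rows):
    """Group narrow rows into one node list per (subject, object) pair, preserving row order."""
    paths = defaultdict(list)
    for row in rows:
        paths[(row["subject"], row["object"])].append(row["path_node"])
    return dict(paths)

# Usage with the file written in cell [9]:
# with open("output/interneuron-CA4-path.tsv", newline="") as fh:
#     print(collapse_narrow(csv.DictReader(fh, delimiter="\t")))
```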