OAK paths command
This notebook is intended as a supplement to the main OAK CLI docs.
This notebook provides examples for the paths command, which can be used to query for paths between ontology terms.
Help Option
You can get help on any OAK command using --help.
[1]:
!runoak paths --help
Usage: runoak paths [OPTIONS] [TERMS]...
List all paths between one or more start curies.
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane'
This shows all shortest paths from nuclear membrane to all ancestors
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target
cytoplasm
This shows shortest paths between two nodes
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' 'thylakoid'
--target cytoplasm 'thylakoid membrane'
This shows all shortest paths between 4 combinations of starts and ends
You can also use "@" to separate start node list and end node list. Like
most OAK commands, you can pass either explicit terms, or term queries. For
example, if you have two files of IDs, then you can do this:
runoak -i sqlite:obo:go paths -p i,p .idfile START_NODES.txt @ .idfile
END_NODES.txt
You can also pass in weights for each predicate, used when calculating
shortest paths.
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target
cytoplasm --predicate-weights "{i: 0.0001, p: 999}"
This shows all shortest paths after weighting relations
(Note: you can use the same shorthands as in the `--predicates` option)
This command can be combined with others to visualize the paths.
Example:
alias go="runoak -i sqlite:obo:go" go paths -p i,p 'nuclear
membrane' --target cytoplasm --narrow | go viz --fill-gaps -
This visualizes the path by first exporting the path as a flat list, then
passing the results to viz, using the fill-gaps option
Options:
--target TEXT end point of path
--narrow / --no-narrow If true then output path is written a list
of terms [default: no-narrow]
--autolabel / --no-autolabel If set, results will automatically have
labels assigned [default: autolabel]
-p, --predicates TEXT A comma-separated list of predicates
-O, --output-type TEXT Desired output type
--directed / --no-directed only show directed paths [default: no-
directed]
--include-predicates / --no-include-predicates
show predicates between nodes [default: no-
include-predicates]
--predicate-weights TEXT key-value pairs specified in YAML where keys
are predicates or shorthands and values are
weights
-o, --output FILENAME Output file, e.g. obo file
--help Show this message and exit.
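The --predicate-weights option described in the help text changes which path counts as shortest. The sketch below illustrates that idea only: it runs a plain Dijkstra search over a made-up four-node graph and is not OAK's own implementation. The node names, the `EDGES` list, and the `shortest_path` helper are all hypothetical; the weights mirror the ones in the help example.

```python
import heapq

# Hypothetical toy graph of (subject, predicate, object) edges; not taken from GO.
EDGES = [
    ("A", "p", "D"),  # one direct part-of style edge
    ("A", "i", "B"),  # a longer chain of is-a style edges
    ("B", "i", "C"),
    ("C", "i", "D"),
]


def shortest_path(edges, start, target, weights=None, default_weight=1.0):
    """Return one lowest-cost path, treating edges as undirected."""
    weights = weights or {}
    graph = {}
    for subj, pred, obj in edges:
        cost = weights.get(pred, default_weight)
        graph.setdefault(subj, []).append((obj, cost))
        graph.setdefault(obj, []).append((subj, cost))
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (total + cost, nxt, path + [nxt]))
    return None


# With equal weights the single "p" edge wins.
print(shortest_path(EDGES, "A", "D"))  # ['A', 'D']
# Making "p" expensive pushes the search onto the "i" chain.
print(shortest_path(EDGES, "A", "D", {"i": 0.0001, "p": 999}))  # ['A', 'B', 'C', 'D']
```

A very high weight does not forbid a predicate; it only makes any path that avoids it preferable when one exists.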
Set up an alias
For convenience we will set up an alias for use in this notebook.
[2]:
alias cl runoak -i sqlite:obo:cl
Note that if you want to do this on your own machine, the syntax is slightly different in bash/zsh:
alias cl="runoak -i sqlite:obo:cl"
Example: simple subclass ancestor path
[3]:
cl paths --target cell interneuron
subject subject_label object object_label path path_label
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000'] interneuron|neuron|material entity|precursor cell|cell differentiation|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0000003', 'CL:0000000'] interneuron|neuron|material entity|precursor cell|native cell|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000003', 'CL:0000000'] interneuron|neuron|material entity|motile cell|native cell|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'CL:0000393', 'CL:0000211', 'CL:0000003', 'CL:0000000'] interneuron|neuron|electrically responsive cell|electrically active cell|native cell|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'CL:0000404', 'CL:0000211', 'CL:0000003', 'CL:0000000'] interneuron|neuron|electrically signaling cell|electrically active cell|native cell|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'CL:0002319', 'CL:0002371', 'CL:0000003', 'CL:0000000'] interneuron|neuron|neural cell|somatic cell|native cell|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0098916', 'GO:0098794', 'CL:0000000'] interneuron|neuron|presynapse|anterograde trans-synaptic signaling|postsynapse|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0045202', 'GO:0098794', 'CL:0000000'] interneuron|neuron|presynapse|synapse|postsynapse|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0098794', 'CL:0000000'] interneuron|neuron|presynapse|cellular anatomical entity|postsynapse|cell
CL:0000099 interneuron CL:0000000 cell ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0030312', 'CL:0000000'] interneuron|neuron|presynapse|cellular anatomical entity|external encapsulating structure|cell
CL:0000099 interneuron CARO:0000013 cell ['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0020003', 'CARO:0000013'] interneuron|neuron|nervous system|material anatomical entity|cellular anatomical structure|cell
CL:0000099 interneuron CARO:0000013 cell ['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0000003', 'CARO:0000013'] interneuron|neuron|nervous system|material anatomical entity|connected anatomical structure|cell
You can see a similar structure using the tree command:
[4]:
cl tree interneuron -p i
* [] BFO:0000002 ! continuant
* [i] BFO:0000004 ! independent continuant
* [i] BFO:0000040 ! material entity
* [i] CL:0000540 ! neuron
* [i] **CL:0000099 ! interneuron**
* [i] CL:0002319 ! neural cell
* [i] CL:0000540 ! neuron
* [i] **CL:0000099 ! interneuron**
* [] CL:0000000 ! cell
* [i] CL:0000003 ! native cell
* [i] CL:0000211 ! electrically active cell
* [i] CL:0000393 ! electrically responsive cell
* [i] CL:0000540 ! neuron
* [i] **CL:0000099 ! interneuron**
* [i] CL:0000404 ! electrically signaling cell
* [i] CL:0000540 ! neuron
* [i] **CL:0000099 ! interneuron**
* [i] CL:0000255 ! eukaryotic cell
* [i] CL:0000548 ! animal cell
* [i] CL:0002319 ! neural cell
* [i] CL:0000540 ! neuron
* [i] **CL:0000099 ! interneuron**
* [i] CL:0002371 ! somatic cell
* [i] CL:0002319 ! neural cell
* [i] CL:0000540 ! neuron
* [i] **CL:0000099 ! interneuron**
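In the wide paths output shown earlier, the path column looks like a Python list literal and path_label is a pipe-separated string. If you want to post-process such rows, a minimal sketch is given below. It assumes the two fields have already been read as strings, and the `pair_path` helper is a hypothetical name, not part of OAK. The sample values are copied from one row of the output above.

```python
import ast

# Values copied from one row of the paths output above.
path_field = (
    "['CL:0000099', 'CL:0000540', 'CL:0002319', "
    "'CL:0002371', 'CL:0000003', 'CL:0000000']"
)
label_field = "interneuron|neuron|neural cell|somatic cell|native cell|cell"


def pair_path(path_text, label_text):
    """Pair each CURIE in a path with its label (hypothetical helper)."""
    ids = ast.literal_eval(path_text)  # safely parse the list literal
    labels = label_text.split("|")
    if len(ids) != len(labels):
        raise ValueError("path and path_label have different lengths")
    return list(zip(ids, labels))


for curie, label in pair_path(path_field, label_field):
    print(curie, label)
```

Splitting on "|" would misalign the pairs if a label itself contained a pipe character, which is why the helper checks that both lists have the same length.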
Non-directed paths
By default the paths command will ignore direction and show paths going both up and down:
[5]:
cl paths interneuron --target "T-cell"
subject subject_label object object_label path path_label
CL:0000099 interneuron CL:0000084 T cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000738', 'CL:0000842', 'CL:0000542', 'CL:0000084'] interneuron|neuron|material entity|motile cell|leukocyte|mononuclear cell|lymphocyte|T cell
CL:0000099 interneuron CL:0000084 T cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000542', 'CL:0000084'] interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|lymphocyte|T cell
CL:0000099 interneuron CL:0000084 T cell ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0000785', 'GO:0000792', 'CL:0000542', 'CL:0000084'] interneuron|neuron|presynapse|cellular anatomical entity|chromatin|heterochromatin|lymphocyte|T cell
CL:0000099 interneuron CL:0000084 T cell ['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0005737', 'CL:0017500', 'CL:0000542', 'CL:0000084'] interneuron|neuron|presynapse|cellular anatomical entity|cytoplasm|neutrophillic cytoplasm|lymphocyte|T cell
CL:0000099 interneuron CL:0000084 T cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000827', 'CL:0000084'] interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|pro-T cell|T cell
CL:0000099 interneuron CL:0000084 T cell ['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000838', 'CL:0000827', 'CL:0000084'] interneuron|neuron|material entity|precursor cell|progenitor cell|lymphoid lineage restricted progenitor cell|pro-T cell|T cell
Specifying --directed forces traversal of subject to object; in this case, there are no such paths:
[6]:
cl paths interneuron --directed --target "T-cell"
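To see why the directed query returns nothing while the non-directed one finds paths, consider the sketch below. It uses a breadth-first search over a deliberately simplified, hypothetical child-to-parent graph (the real CL graph has many more intermediate terms and relations), and `find_path` is an illustrative helper rather than anything from OAK.

```python
from collections import deque

# Simplified, hypothetical child -> parent edges; the real ontology is much richer.
EDGES = [
    ("interneuron", "neuron"),
    ("neuron", "cell"),
    ("T cell", "lymphocyte"),
    ("lymphocyte", "cell"),
]


def find_path(edges, start, target, directed):
    """Breadth-first search; follows edges both ways unless directed is True."""
    neighbours = {}
    for child, parent in edges:
        neighbours.setdefault(child, []).append(parent)
        if not directed:
            neighbours.setdefault(parent, []).append(child)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Ignoring direction: go up to the shared ancestor, then back down.
print(find_path(EDGES, "interneuron", "T cell", directed=False))
# ['interneuron', 'neuron', 'cell', 'lymphocyte', 'T cell']

# Respecting direction: T cell is not an ancestor of interneuron, so no path.
print(find_path(EDGES, "interneuron", "T cell", directed=True))  # None
```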
Narrow table options
The default output is one row per path. You can use the --narrow option to make a narrow table, with one row per path element:
[8]:
cl paths --narrow --target CL:4023061 interneuron
subject subject_label object object_label path_node path_node_label
CL:0000099 interneuron CL:4023061 hippocampal CA4 neuron CL:0000099 interneuron
CL:0000099 interneuron CL:4023061 hippocampal CA4 neuron CL:0000540 neuron
CL:0000099 interneuron CL:4023061 hippocampal CA4 neuron CL:4023061 hippocampal CA4 neuron
The -o option writes the same table to a file, which can then be loaded with pandas:
[9]:
cl paths --narrow --target CL:4023061 interneuron -o output/interneuron-CA4-path.tsv
[10]:
import pandas as pd
[11]:
df = pd.read_csv("output/interneuron-CA4-path.tsv", sep="\t")
df
[11]:
| | subject | subject_label | object | object_label | path_node | path_node_label |
|---|---|---|---|---|---|---|
| 0 | CL:0000099 | interneuron | CL:4023061 | hippocampal CA4 neuron | CL:0000099 | interneuron |
| 1 | CL:0000099 | interneuron | CL:4023061 | hippocampal CA4 neuron | CL:0000540 | neuron |
| 2 | CL:0000099 | interneuron | CL:4023061 | hippocampal CA4 neuron | CL:4023061 | hippocampal CA4 neuron |
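Because the narrow table has one row per path element, you may want to fold the rows back into an ordered path. The sketch below does this with the standard library only. It assumes the file is tab-separated with the column names shown above; the inline `NARROW_TSV` sample and the `collect_paths` helper are illustrative, not part of OAK.

```python
import csv
import io
from collections import defaultdict

# Inline sample mirroring the narrow table above (assumed tab-separated).
NARROW_TSV = (
    "subject\tsubject_label\tobject\tobject_label\tpath_node\tpath_node_label\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000099\tinterneuron\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000540\tneuron\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:4023061\thippocampal CA4 neuron\n"
)


def collect_paths(handle):
    """Group path_node values by (subject, object), keeping row order."""
    paths = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        paths[(row["subject"], row["object"])].append(row["path_node"])
    return dict(paths)


for (subject, obj), nodes in collect_paths(io.StringIO(NARROW_TSV)).items():
    print(subject, "->", obj, ":", " | ".join(nodes))
```

This grouping is only safe when there is a single path per subject and object pair, as in this example; if several paths share the same endpoints their nodes would be merged into one list.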