{ "cells": [ { "cell_type": "markdown", "id": "c6420d5c", "metadata": {}, "source": [ "# OAK paths command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `paths` command, which can be used to query for paths between ontology terms.\n", "\n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`." ] }, { "cell_type": "code", "execution_count": 1, "id": "136865db", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak paths [OPTIONS] [TERMS]...\n", "\n", " List all paths between one or more start curies.\n", "\n", " Example:\n", "\n", " runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane'\n", "\n", " This shows all shortest paths from nuclear membrane to all ancestors\n", "\n", " Example:\n", "\n", " runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target\n", " cytoplasm\n", "\n", " This shows shortest paths between two nodes\n", "\n", " Example:\n", "\n", " runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' 'thylakoid'\n", " --target cytoplasm 'thylakoid membrane'\n", "\n", " This shows all shortest paths between 4 combinations of starts and ends\n", "\n", " You can also use \"@\" to separate start node list and end node list. Like\n", " most OAK commands, you can pass either explicit terms, or term queries. 
For\n", " example, if you have two files of IDs, then you can do this:\n", "\n", " runoak -i sqlite:obo:go paths -p i,p .idfile START_NODES.txt @ .idfile\n", " END_NODES.txt\n", "\n", " You can also pass in weights for each predicate, used when calculating\n", " shortest paths.\n", "\n", " Example:\n", "\n", " runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target\n", " cytoplasm --predicate-weights \"{i: 0.0001, p: 999}\"\n", "\n", " This shows all shortest paths after weighting relations\n", "\n", " (Note: you can use the same shorthands as in the `--predicates` option)\n", "\n", " This command can be combined with others to visualize the paths.\n", "\n", " Example:\n", "\n", " alias go=\"runoak -i sqlite:obo:go\" go paths -p i,p 'nuclear\n", " membrane' --target cytoplasm --narrow | go viz --fill-gaps -\n", "\n", " This visualizes the path by first exporting the path as a flat list, then\n", " passing the results to viz, using the fill-gaps option\n", "\n", "Options:\n", " --target TEXT end point of path\n", " --narrow / --no-narrow If true then output path is written a list\n", " of terms [default: no-narrow]\n", " --autolabel / --no-autolabel If set, results will automatically have\n", " labels assigned [default: autolabel]\n", " -p, --predicates TEXT A comma-separated list of predicates\n", " -O, --output-type TEXT Desired output type\n", " --directed / --no-directed only show directed paths [default: no-\n", " directed]\n", " --include-predicates / --no-include-predicates\n", " show predicates between nodes [default: no-\n", " include-predicates]\n", " --predicate-weights TEXT key-value pairs specified in YAML where keys\n", " are predicates or shorthands and values are\n", " weights\n", " -o, --output FILENAME Output file, e.g. 
obo file\n", " --help Show this message and exit.\n" ] } ], "source": [ "!runoak paths --help" ] }, { "cell_type": "markdown", "id": "fce81cad", "metadata": {}, "source": [ "## Set up an alias\n", "\n", "For convenience we will set up an alias for use in this notebook" ] }, { "cell_type": "code", "execution_count": 2, "id": "bb9456d8", "metadata": {}, "outputs": [], "source": [ "alias cl runoak -i sqlite:obo:cl" ] }, { "cell_type": "markdown", "id": "21259480", "metadata": {}, "source": [ "__Note__ if you want to do this on your own machine the syntax is slightly different in bash/zsh:\n", "\n", "`alias cl=\"runoak -i sqlite:obo:cl\"`" ] }, { "cell_type": "markdown", "id": "e60d8feb", "metadata": {}, "source": [ "## Example: simple subclass ancestor path" ] }, { "cell_type": "code", "execution_count": 3, "id": "fcba12c2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tsubject_label\tobject\tobject_label\tpath\tpath_label\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000']\tinterneuron|neuron|material entity|precursor cell|cell differentiation|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0000003', 'CL:0000000']\tinterneuron|neuron|material entity|precursor cell|native cell|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000003', 'CL:0000000']\tinterneuron|neuron|material entity|motile cell|native cell|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'CL:0000393', 'CL:0000211', 'CL:0000003', 'CL:0000000']\tinterneuron|neuron|electrically responsive cell|electrically active cell|native cell|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'CL:0000404', 'CL:0000211', 'CL:0000003', 'CL:0000000']\tinterneuron|neuron|electrically signaling 
cell|electrically active cell|native cell|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'CL:0002319', 'CL:0002371', 'CL:0000003', 'CL:0000000']\tinterneuron|neuron|neural cell|somatic cell|native cell|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0098916', 'GO:0098794', 'CL:0000000']\tinterneuron|neuron|presynapse|anterograde trans-synaptic signaling|postsynapse|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0045202', 'GO:0098794', 'CL:0000000']\tinterneuron|neuron|presynapse|synapse|postsynapse|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0098794', 'CL:0000000']\tinterneuron|neuron|presynapse|cellular anatomical entity|postsynapse|cell\n", "CL:0000099\tinterneuron\tCL:0000000\tcell\t['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0030312', 'CL:0000000']\tinterneuron|neuron|presynapse|cellular anatomical entity|external encapsulating structure|cell\n", "CL:0000099\tinterneuron\tCARO:0000013\tcell\t['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0020003', 'CARO:0000013']\tinterneuron|neuron|nervous system|material anatomical entity|cellular anatomical structure|cell\n", "CL:0000099\tinterneuron\tCARO:0000013\tcell\t['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0000003', 'CARO:0000013']\tinterneuron|neuron|nervous system|material anatomical entity|connected anatomical structure|cell\n" ] } ], "source": [ "cl paths --target cell interneuron" ] }, { "cell_type": "markdown", "id": "91539b81", "metadata": {}, "source": [ "You can see a similar structure using the `tree` command:" ] }, { "cell_type": "code", "execution_count": 4, "id": "995eb689", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* [] BFO:0000002 ! continuant\r\n", " * [i] BFO:0000004 ! 
independent continuant\r\n", " * [i] BFO:0000040 ! material entity\r\n", " * [i] CL:0000540 ! neuron\r\n", " * [i] **CL:0000099 ! interneuron**\r\n", " * [i] CL:0002319 ! neural cell\r\n", " * [i] CL:0000540 ! neuron\r\n", " * [i] **CL:0000099 ! interneuron**\r\n", "* [] CL:0000000 ! cell\r\n", " * [i] CL:0000003 ! native cell\r\n", " * [i] CL:0000211 ! electrically active cell\r\n", " * [i] CL:0000393 ! electrically responsive cell\r\n", " * [i] CL:0000540 ! neuron\r\n", " * [i] **CL:0000099 ! interneuron**\r\n", " * [i] CL:0000404 ! electrically signaling cell\r\n", " * [i] CL:0000540 ! neuron\r\n", " * [i] **CL:0000099 ! interneuron**\r\n", " * [i] CL:0000255 ! eukaryotic cell\r\n", " * [i] CL:0000548 ! animal cell\r\n", " * [i] CL:0002319 ! neural cell\r\n", " * [i] CL:0000540 ! neuron\r\n", " * [i] **CL:0000099 ! interneuron**\r\n", " * [i] CL:0002371 ! somatic cell\r\n", " * [i] CL:0002319 ! neural cell\r\n", " * [i] CL:0000540 ! neuron\r\n", " * [i] **CL:0000099 ! interneuron**\r\n" ] } ], "source": [ "cl tree interneuron -p i " ] }, { "cell_type": "markdown", "id": "b832fee2", "metadata": {}, "source": [ "## Non-directed paths\n", "\n", "By default the paths command will ignore direction and show paths going both up and down:" ] }, { "cell_type": "code", "execution_count": 5, "id": "f3449146", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tsubject_label\tobject\tobject_label\tpath\tpath_label\n", "CL:0000099\tinterneuron\tCL:0000084\tT cell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000738', 'CL:0000842', 'CL:0000542', 'CL:0000084']\tinterneuron|neuron|material entity|motile cell|leukocyte|mononuclear cell|lymphocyte|T cell\n", "CL:0000099\tinterneuron\tCL:0000084\tT cell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000542', 'CL:0000084']\tinterneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|lymphocyte|T 
cell\n", "CL:0000099\tinterneuron\tCL:0000084\tT cell\t['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0000785', 'GO:0000792', 'CL:0000542', 'CL:0000084']\tinterneuron|neuron|presynapse|cellular anatomical entity|chromatin|heterochromatin|lymphocyte|T cell\n", "CL:0000099\tinterneuron\tCL:0000084\tT cell\t['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0005737', 'CL:0017500', 'CL:0000542', 'CL:0000084']\tinterneuron|neuron|presynapse|cellular anatomical entity|cytoplasm|neutrophillic cytoplasm|lymphocyte|T cell\n", "CL:0000099\tinterneuron\tCL:0000084\tT cell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000827', 'CL:0000084']\tinterneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|pro-T cell|T cell\n", "CL:0000099\tinterneuron\tCL:0000084\tT cell\t['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000838', 'CL:0000827', 'CL:0000084']\tinterneuron|neuron|material entity|precursor cell|progenitor cell|lymphoid lineage restricted progenitor cell|pro-T cell|T cell\n" ] } ], "source": [ "cl paths interneuron --target \"T-cell\"" ] }, { "cell_type": "markdown", "id": "c3194e74", "metadata": {}, "source": [ "Specifying `--directed` forces traversal of subject to object; in this case, there are no such paths:" ] }, { "cell_type": "code", "execution_count": 6, "id": "fbcf6f35", "metadata": {}, "outputs": [], "source": [ "cl paths interneuron --directed --target \"T-cell\"" ] }, { "cell_type": "markdown", "id": "dca1c24e", "metadata": {}, "source": [ "## Narrow table options\n", "\n", "The default output is one row per *path*\n", "\n", "You can use the `--narrow` option to make a narrow table, with one row per path *element*:" ] }, { "cell_type": "code", "execution_count": 8, "id": "b2a1cc04", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"subject\tsubject_label\tobject\tobject_label\tpath_node\tpath_node_label\n", "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000099\tinterneuron\n", "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000540\tneuron\n", "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:4023061\thippocampal CA4 neuron\n" ] } ], "source": [ "cl paths --narrow --target CL:4023061 interneuron" ] }, { "cell_type": "code", "execution_count": 9, "id": "e8429d2e", "metadata": {}, "outputs": [], "source": [ "cl paths --narrow --target CL:4023061 interneuron -o output/interneuron-CA4-path.tsv" ] }, { "cell_type": "code", "execution_count": 10, "id": "21506d63", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 11, "id": "a7063402", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>subject</th><th>subject_label</th><th>object</th><th>object_label</th><th>path_node</th><th>path_node_label</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>CL:0000099</td><td>interneuron</td><td>CL:4023061</td><td>hippocampal CA4 neuron</td><td>CL:0000099</td><td>interneuron</td></tr>\n",
"<tr><th>1</th><td>CL:0000099</td><td>interneuron</td><td>CL:4023061</td><td>hippocampal CA4 neuron</td><td>CL:0000540</td><td>neuron</td></tr>\n",
"<tr><th>2</th><td>CL:0000099</td><td>interneuron</td><td>CL:4023061</td><td>hippocampal CA4 neuron</td><td>CL:4023061</td><td>hippocampal CA4 neuron</td></tr>\n",
"</tbody>\n",
"</table>\n",
"