{ "cells": [ { "cell_type": "markdown", "id": "0f6c4513", "metadata": {}, "source": [ "# OAK apply command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `apply` command, which applies any *change* conforming to the [KGCL](https://w3id.org/kgcl) specification.\n", "\n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`" ] }, { "cell_type": "code", "execution_count": 1, "id": "65db4b53", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak apply [OPTIONS] [COMMANDS]...\r\n", "\r\n", " Applies a patch to an ontology. The patch should be specified using KGCL\r\n", " syntax, see https://github.com/INCATools/kgcl\r\n", "\r\n", " Example:\r\n", "\r\n", " runoak -i cl.owl.ttl apply \"rename CL:0000561 to 'amacrine neuron'\" -o\r\n", " cl.owl.ttl -O ttl\r\n", "\r\n", " On an obo format file:\r\n", "\r\n", " runoak -i simpleobo:go-edit.obo apply \"rename GO:0005634 from 'nucleus'\r\n", " to 'foo'\" -o go-edit-new.obo\r\n", "\r\n", " With URIs:\r\n", "\r\n", " runoak -i cl.owl.ttl apply \"rename\r\n", " from 'amacrine cell' to\r\n", " 'amacrine neuron'\" -o cl.owl.ttl -O ttl\r\n", "\r\n", " WARNING:\r\n", "\r\n", " This command is still experimental. Some things to bear in mind:\r\n", "\r\n", " - for some ontologies, CURIEs may not work, instead specify a full URI\r\n", " surrounded by <>s - only a subset of KGCL commands are supported by each\r\n", " backend\r\n", "\r\n", "Options:\r\n", " -o, --output TEXT\r\n", " --changes-output TEXT output file for KGCL changes\r\n", " --changes-input FILENAME Path to an input changes file\r\n", " --changes-format TEXT Format of the changes file (json or kgcl)\r\n", " --dry-run / --no-dry-run if true, only perform the parse of KCGL and\r\n", " do not apply [default: no-dry-run]\r\n", " --expand / --no-expand if true, expand complex changes to atomic\r\n", " changes [default: expand]\r\n", " --ignore-invalid-changes / --no-ignore-invalid-changes\r\n", " if true, ignore invalid changes, e.g.\r\n", " obsoletions of dependent entities [default:\r\n", " no-ignore-invalid-changes]\r\n", " --contributor TEXT CURIE for the person contributing the patch\r\n", " -O, --output-type TEXT Desired output type\r\n", " --overwrite / --no-overwrite If set, any changes applied will be saved\r\n", " back to the input file/source\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!runoak apply --help" ] }, { "cell_type": "markdown", "id": "c8878ac5", "metadata": {}, "source": [ "## Download example file\n", "\n", "A typical use case for the `apply` command is for applying changes to the source, aka *edit* version of an ontology.\n", "For our purposes here we will make a copy of the go editorial file." ] }, { "cell_type": "code", "execution_count": 7, "id": "12a41f0d", "metadata": {}, "outputs": [], "source": [ "!curl -L -s https://github.com/geneontology/go-ontology/raw/master/src/ontology/go-edit.obo > input/go-edit.obo" ] }, { "cell_type": "markdown", "id": "d57ac006", "metadata": {}, "source": [ "Note that the go edit file is in *obo* format. A number of ontologies like GO, Uberon, and Mondo use obo format as the edit format due to the fact obo was designed to make human-readable diffs.\n", "\n", "The KGCL apply command may be used with other adapters, but it has been tested most extensively on the above three ontologies." ] }, { "cell_type": "markdown", "id": "c9047f55", "metadata": {}, "source": [ "## Create a new exact synonym\n", "\n", "Next we will create a new change of type [NewSynonym](https://w3id.org/kgcl/NewSynonym), using KGCL syntax\n", "on the command line.\n", "\n", "We will try making a synonym *compartment* for `GO:0043226` (organelle)\n", "\n", "We will first run in `--dry-run` mode:" ] }, { "cell_type": "code", "execution_count": 3, "id": "d9395e32", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:root:--autosave not passed, changes are NOT saved\r\n", "create exact synonym 'compartment' for GO:0043226" ] } ], "source": [ "!runoak -i simpleobo:input/go-edit.obo apply \"create exact synonym 'compartment' for GO:0043226\" --dry-run" ] }, { "cell_type": "markdown", "id": "b1d291f2", "metadata": {}, "source": [ "This warns us that changes were not saved anywhere.\n", "\n", "next we will try the real deal, and save the output file:" ] }, { "cell_type": "code", "execution_count": 4, "id": "5f59a5fc", "metadata": {}, "outputs": [], "source": [ "!runoak -i simpleobo:input/go-edit.obo apply \"create exact synonym 'compartment' for GO:0043226\" -o output/go-edit-modified.obo" ] }, { "cell_type": "markdown", "id": "cb67adfb", "metadata": {}, "source": [ "The command doesn't produce any output on stdout, but we instructed it to save these in an external file [output/go-edit-modified.obo](output/go-edit-modified.obo).\n", "\n", "Let's double check that it did what we asked it to do. First we'll try a plain old unix diff (one advantage of OBO format is its easy diffability):" ] }, { "cell_type": "code", "execution_count": 5, "id": "2ab326a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- input/go-edit.obo\t2023-01-20 12:36:57.000000000 -0800\r\n", "+++ output/go-edit-modified.obo\t2023-01-20 12:37:07.000000000 -0800\r\n", "@@ -241846,6 +241846,7 @@\r\n", " xref: NIF_Subcellular:sao1539965131\r\n", " xref: Wikipedia:Organelle\r\n", " is_a: GO:0110165 ! cellular anatomical entity\r\n", "+synonym: \"compartment\" EXACT []\r\n", " \r\n", " [Term]\r\n", " id: GO:0043227\r\n" ] } ], "source": [ "!diff -u input/go-edit.obo output/go-edit-modified.obo" ] }, { "cell_type": "markdown", "id": "a046d1d2", "metadata": {}, "source": [ "This is also what you would see in a Pull Request implementing this change\n", "\n", "## Diff Command\n", "\n", "The unix diff is still a little low level. OAK comes with a `diff` command that we can use instead.\n", "\n", "This is the reciprocal of the `apply` command, and it will generate a set of change objects in KGCL (which can then be applied....)" ] }, { "cell_type": "code", "execution_count": 5, "id": "0c590329", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[\n", "{\n", " \"id\": \"uuid:a50afe2c-9ed4-4ee9-9a17-e80e971b072e\",\n", " \"new_value\": \"compartment\",\n", " \"about_node\": \"GO:0043226\",\n", " \"@type\": \"NewSynonym\"\n", "}\n", "]\n" ] } ], "source": [ "!runoak -i simpleobo:input/go-edit.obo diff -X simpleobo:output/go-edit-modified.obo -O json" ] }, { "cell_type": "markdown", "id": "4dc3fc80", "metadata": {}, "source": [ "(this is currently a bit slow, so be patient - we're working on optimizing this).\n", "\n", "If you prefer human-readable KGCL syntax to KGCL JSON:" ] }, { "cell_type": "code", "execution_count": 6, "id": "0d6b5576", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "create synonym 'compartment' for GO:0043226\r\n" ] } ], "source": [ "!runoak -i simpleobo:input/go-edit.obo diff -X simpleobo:output/go-edit-modified.obo -O kgcl" ] }, { "cell_type": "markdown", "id": "63d121d6", "metadata": {}, "source": [ "Note that this is the same string we used to apply the patch in the first place - this demonstrates the complementary nature of `diff` and `patch`.\n", "\n", "**TODO**: the diff should reflect the *scope* of the synonym, i.e EXACT" ] }, { "cell_type": "markdown", "id": "96ff7366", "metadata": {}, "source": [ "## Apply multiple changes\n", "\n", "You can pass in a list of multiple changes on the command line, or via a file:" ] }, { "cell_type": "code", "execution_count": 11, "id": "350df7ea", "metadata": {}, "outputs": [], "source": [ "!echo create exact synonym \\'test1\\' for GO:0043226 > input/test.kgcl" ] }, { "cell_type": "code", "execution_count": 12, "id": "b4166991", "metadata": {}, "outputs": [], "source": [ "!echo create exact synonym \\'test2\\' for GO:0043226 >> input/test.kgcl" ] }, { "cell_type": "code", "execution_count": 13, "id": "3cc36b26", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "create exact synonym 'test1' for GO:0043226\r\n", "create exact synonym 'test2' for GO:0043226\r\n" ] } ], "source": [ "!cat input/test.kgcl" ] }, { "cell_type": "code", "execution_count": 14, "id": "7bed7690", "metadata": {}, "outputs": [], "source": [ "!runoak -i simpleobo:input/go-edit.obo apply --changes-input input/test.kgcl -o output/go-edit-modified.obo" ] }, { "cell_type": "markdown", "id": "ced77e71", "metadata": {}, "source": [ "## Expanding complex changes into atomic changes\n", "\n", "Some changes represent composites of multiple smaller changes; other changes might *entail* other changes.\n", "Some of these may be variable depending on particular ontology *workflows*.\n", "\n", "For example, in many OBO workflows, the act of performing a [NodeObsoletion](https://w3id.org/kgcl/NodeObsoletion) might also involve:\n", "\n", "- *renaming* the node, preceding the label with \"`obsolete `\"\n", "- *rewiring* the surrounding nodes, such that:\n", " - the children of the obsolete nodes point directly to the parents, with the obsolete node bypassed\n", " - *deleting edges* such that there are no logical axioms that reference the obsoleted node" ] }, { "cell_type": "markdown", "id": "2b44e8a2", "metadata": {}, "source": [ "first let's try a dry run simulating what it would be like to obsolete *organelle* (GO:0043226).\n", "\n", "First let's explore the neighborhood - we will use the `viz` command to view a random child of organelle, *non-membrane-bounded organelle* (GO:0043228)" ] }, { "cell_type": "code", "execution_count": 23, "id": "5a350f97", "metadata": {}, "outputs": [], "source": [ "!runoak -i simpleobo:input/go-edit.obo viz -p i,p GO:0043228 GO:0043226 -o output/nmbo.png" ] }, { "cell_type": "markdown", "id": "1396282d", "metadata": {}, "source": [ "![img](output/nmbo.png)" ] }, { "cell_type": "markdown", "id": "a666d1f0", "metadata": {}, "source": [ "now let's try obsoleting the intermediate *organelle* class (`GO:0043226`), but in `--dry-run` mode, with `--expand`. (Note `--expand` is the default, but it helps to make this explicit).\n", "\n", "This will trigger the outputting of all expanded changes as KGCL syntax:" ] }, { "cell_type": "code", "execution_count": 17, "id": "33798efa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "obsolete GO:0043226\r\n", "rename GO:0043226 from 'organelle' to 'obsolete organelle'\r\n", "create edge GO:0005929 rdfs:subClassOf GO:0110165\r\n", "create edge GO:0043228 rdfs:subClassOf GO:0110165\r\n", "create edge GO:0043227 rdfs:subClassOf GO:0110165\r\n", "create edge GO:0043230 rdfs:subClassOf GO:0110165\r\n", "create edge GO:0099572 rdfs:subClassOf GO:0110165\r\n", "delete edge GO:0005929 rdfs:subClassOf GO:0043226\r\n", "delete edge GO:0043228 rdfs:subClassOf GO:0043226\r\n", "delete edge GO:0020004 BFO:0000050 GO:0043226\r\n", "delete edge GO:0031676 BFO:0000050 GO:0043226\r\n", "delete edge GO:0043227 rdfs:subClassOf GO:0043226\r\n", "delete edge GO:0032420 BFO:0000050 GO:0043226\r\n", "delete edge GO:0043230 rdfs:subClassOf GO:0043226\r\n", "delete edge GO:0044232 BFO:0000050 GO:0043226\r\n", "delete edge GO:0060091 BFO:0000050 GO:0043226\r\n", "delete edge GO:0060171 BFO:0000050 GO:0043226\r\n", "delete edge GO:0097591 BFO:0000050 GO:0043226\r\n", "delete edge GO:0097592 BFO:0000050 GO:0043226\r\n", "delete edge GO:0097593 BFO:0000050 GO:0043226\r\n", "delete edge GO:0097594 BFO:0000050 GO:0043226\r\n", "delete edge GO:0097595 BFO:0000050 GO:0043226\r\n", "delete edge GO:0097596 BFO:0000050 GO:0043226\r\n", "delete edge GO:0099572 rdfs:subClassOf GO:0043226\r\n", "delete edge GO:0043226 rdfs:subClassOf GO:0110165WARNING:root:--autosave not passed, changes are NOT saved\r\n" ] } ], "source": [ "!runoak -i simpleobo:input/go-edit.obo apply --expand \"obsolete GO:0043226\" --dry-run" ] }, { "cell_type": "markdown", "id": "1e034bee", "metadata": {}, "source": [ "in future it will be possible to visualize KGCL directly. For now, let's just visualize the output file after running in non-dry-run mode:" ] }, { "cell_type": "code", "execution_count": 19, "id": "2d26947f", "metadata": {}, "outputs": [], "source": [ "!runoak -i simpleobo:input/go-edit.obo apply --expand \"obsolete GO:0043226\" -o output/obsoleted-organelle.obo" ] }, { "cell_type": "code", "execution_count": 22, "id": "e8a1c9f2", "metadata": {}, "outputs": [], "source": [ "!runoak --stacktrace -i simpleobo:output/obsoleted-organelle.obo viz -p i,p GO:0043228 GO:0043226 -o output/nmbo2.png" ] }, { "cell_type": "markdown", "id": "fdf29d11", "metadata": {}, "source": [ "![img](output/nmbo2.png)" ] }, { "cell_type": "markdown", "id": "2c8fc4a6", "metadata": {}, "source": [ "## Invalid Obsolete Operations" ] }, { "cell_type": "markdown", "id": "800424dd", "metadata": {}, "source": [ "Currently the obsolete operation will not rewire certain axioms of ontology axioms like *logical definitions*, these\n", "require curator intervention.\n", "\n", "This can be seen if we try and obsolete a core term like *metabolic process* (`GO:0008152`):" ] }, { "cell_type": "code", "execution_count": 15, "id": "e258173e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ValueError: GO:0008152 used in logical definition of GO:0000023\r\n" ] } ], "source": [ "!runoak -i simpleobo:input/go-edit.obo apply --expand \"obsolete GO:0008152\" --dry-run" ] }, { "cell_type": "markdown", "id": "7b6203fa", "metadata": {}, "source": [ "In future, OAK may allow more configurability here, including the ability to do full cascading deletes. But this\n", "in general would not be recommended - if you want to obsolete a term that is commonly used in logical definitions then\n", "you need to do some manual examination of your design patterns.\n", "\n", "However, if you also want to obsolete all the dependent nodes in the same operation, you can do that by batching the obsoletes in a single file." ] }, { "cell_type": "markdown", "id": "f4576758", "metadata": {}, "source": [ "## Creating an entire ontology from change directives\n", "\n", "You can create an entire ontology from scratch using only change directives.\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "54af9032", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "create node X:1 'limb'\r\n", "create node X:2 'forelimb'\r\n", "create edge X:2 is_a X:1\r\n", "create node X:3 'hindlimb'\r\n", "create edge X:3 is_a X:1\r\n", "create related synonym 'arm' for X:2\r\n", "create related synonym 'leg' for X:3\r\n", "# foo\r\n" ] } ], "source": [ "!cat input/test-create.kgcl.txt" ] }, { "cell_type": "code", "execution_count": 19, "id": "194b3637", "metadata": {}, "outputs": [], "source": [ "!runoak -i pronto: apply --changes-input input/test-create.kgcl.txt -o output/kgcl-de-novo.obo" ] }, { "cell_type": "code", "execution_count": 20, "id": "5da11a6e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "format-version: 1.4\r\n", "\r\n", "[Term]\r\n", "id: X:1\r\n", "name: limb\r\n", "\r\n", "[Term]\r\n", "id: X:2\r\n", "name: forelimb\r\n", "synonym: \"arm\" RELATED []\r\n", "is_a: X:1\r\n", "\r\n", "[Term]\r\n", "id: X:3\r\n", "name: hindlimb\r\n", "synonym: \"leg\" RELATED []\r\n", "is_a: X:1\r\n" ] } ], "source": [ "!cat output/kgcl-de-novo.obo" ] }, { "cell_type": "markdown", "id": "55785fe4", "metadata": {}, "source": [ "the same thing but using the funowl wrapper for making an ontology in OWL functional syntax. Note here it's\n", "necessary to set the prefixes as these are not implicit like in obo:" ] }, { "cell_type": "code", "execution_count": 1, "id": "cab30a43", "metadata": {}, "outputs": [], "source": [ "!runoak --stacktrace --prefix X=http://example.org/ -i funowl: apply --changes-input input/test-create.kgcl.txt -o output/kgcl-de-novo.ofn" ] }, { "cell_type": "code", "execution_count": 2, "id": "7f02240b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Prefix( owl: = )\r\n", "Prefix( rdf: = )\r\n", "Prefix( rdfs: = )\r\n", "Prefix( xsd: = )\r\n", "Prefix( xml: = )\r\n", "\r\n", "Ontology(\r\n", " AnnotationAssertion( rdfs:label \"limb\" )\r\n", " AnnotationAssertion( rdfs:label \"forelimb\" )\r\n", " SubClassOf( )\r\n", " AnnotationAssertion( rdfs:label \"hindlimb\" )\r\n", " SubClassOf( )\r\n", " AnnotationAssertion( X:2 \"arm\" )\r\n", " AnnotationAssertion( X:3 \"leg\" )\r\n", ")" ] } ], "source": [ "!cat output/kgcl-de-novo.ofn" ] }, { "cell_type": "code", "execution_count": null, "id": "4659334b", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }