{ "cells": [ { "cell_type": "markdown", "id": "08f6dae5-2605-4bf0-be5d-d1c88a563f8b", "metadata": {}, "source": [ "# OAK associations command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `associations` command which provides ways of querying [associations](https://incatools.github.io/ontology-access-kit/glossary.html#term-Association).\n", "\n", "For more on associations, see [Associations and Curated Annotations](https://incatools.github.io/ontology-access-kit/guide/associations.html) in the OAK guide.\n", "\n", "For more on command line usage in general, see the [Command Line Tutorial](https://doi.org/10.5281/zenodo.7708963)\n", "\n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`" ] }, { "cell_type": "code", "execution_count": 1, "id": "e4e56e4f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak associations [OPTIONS] [TERMS]...\n", "\n", " Lookup associations from or to entities.\n", "\n", " Example:\n", "\n", " runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations\n", "\n", " The above will show all associations\n", "\n", " To query using an ontology term, including is-a closure, specify one or more\n", " terms or term queries, plus the closure predicate(s), e.g.\n", "\n", " Example:\n", "\n", " runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations -p i\n", " HP:0001392\n", "\n", " This shows all annotations either to \"Abnormality of the liver\"\n", " (HP:0001392), or to is-a descendants.\n", "\n", " Using input specifications:\n", "\n", " It can be awkward to specify both input ontology and association path and\n", " format. 
You can use input specifications to bundle common combinations of\n", " inputs together.\n", "\n", " For example, the go-dictybase-input-spec combines go plus dictybase\n", " associations.\n", "\n", " Example:\n", "\n", " runoak --i src/oaklib/conf/go-dictybase-input-spec.yaml associations -p\n", " i,p GO:0008104\n", "\n", " More examples:\n", "\n", " https://github.com/INCATools/ontology-access-\n", " kit/blob/main/notebooks/Commands/Associations.ipynb\n", "\n", "Options:\n", " -o, --output FILENAME Output file, e.g. obo file\n", " -p, --predicates TEXT A comma-separated list of predicates. This\n", " may be a shorthand (i, p) or CURIE\n", " --autolabel / --no-autolabel If set, results will automatically have\n", " labels assigned [default: autolabel]\n", " -O, --output-type TEXT Desired output type\n", " -o, --output FILENAME Output file, e.g. obo file\n", " --if-absent [absent-only|present-only]\n", " determines behavior when the value is not\n", " present or is empty.\n", " -S, --set-value TEXT the value to set for all terms for the given\n", " property.\n", " --association-predicates TEXT A comma-separated list of predicates for the\n", " association relation\n", " -Q, --terms-role [subject|object|both]\n", " How to interpret query terms. 
[default:\n", " object]\n", " --help Show this message and exit.\n" ] } ], "source": [ "!runoak associations --help" ] }, { "cell_type": "markdown", "id": "7ac5760e-bb37-440e-8e09-d828f59e2fe9", "metadata": {}, "source": [ "### Set up an alias\n", "\n", "We will set up an alias for running OAK bound to GO for the purposes of this notebook:" ] }, { "cell_type": "code", "execution_count": 2, "id": "a9dbd43c", "metadata": {}, "outputs": [], "source": [ "alias go runoak -i sqlite:obo:go" ] }, { "cell_type": "code", "execution_count": 3, "id": "03a7e8c2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "id:\n", "- obo:go/extensions/go-plus.owl\n", "dce:description:\n", "- The Gene Ontology (GO) provides a framework and set of concepts for describing the\n", " functions of gene products from all organisms.\n", "dce:title:\n", "- Gene Ontology\n", "dcterms:license:\n", "- \n", "oio:default-namespace:\n", "- gene_ontology\n", "oio:hasOBOFormatVersion:\n", "- '1.2'\n", "owl:versionIRI:\n", "- obo:go/releases/2023-04-01/extensions/go-plus.owl\n", "owl:versionInfo:\n", "- '2023-04-01'\n", "rdf:type:\n", "- owl:Ontology\n", "sh:prefix:\n", "- obo\n", "schema:url:\n", "- http://purl.obolibrary.org/obo/go/extensions/go-plus.owl\n", "rdfs:isDefinedBy:\n", "- http://purl.obolibrary.org/obo/obo.owl\n" ] } ], "source": [ "go ontology-metadata --all" ] }, { "cell_type": "markdown", "id": "eb483a9d-c9a4-4eee-879f-746e86272c31", "metadata": {}, "source": [ "Check that queries work" ] }, { "cell_type": "code", "execution_count": 4, "id": "65b5f090", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GO:0016301 ! 
kinase activity\n" ] } ], "source": [ "go info \"kinase activity\"" ] }, { "cell_type": "markdown", "id": "88419b98-5395-4d92-8229-d5ad3be27255", "metadata": {}, "source": [ "### Query for associations to a gene\n", "\n", "Here we query a previously downloaded GAF file for all associations to a single gene, using `-Q subject` so that the query term is interpreted as the subject of the association:" ] }, { "cell_type": "code", "execution_count": 8, "id": "0a9799ed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tpredicate\tobject\tobject_label\tproperty_values\tsubject_label\tpredicate_label\tnegated\tpublications\tprimary_knowledge_source\taggregator_knowledge_source\n", "SGD:S000004294\tNone\tGO:0003824\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000004294\tNone\tGO:0003824\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000004294\tNone\tGO:0003824\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0005737\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0005737\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148671\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0005737\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000069459|PMID:11914276\tinfores:SGD\tNone\n", "SGD:S000004294\tNone\tGO:0005737\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000069459|PMID:11914276\tinfores:SGD\tNone\n", "SGD:S000004294\tNone\tGO:0016765\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000004294\tNone\tGO:0030170\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000004294\tNone\tGO:0030170\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000185201\tinfores:GO_Central\tNone\n", "SGD:S000004294\tNone\tGO:0006520\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000004294\tNone\tGO:0008152\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", 
"SGD:S000004294\tNone\tGO:0008652\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0009086\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0019344\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0019344\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000075748|PMID:15042590\tinfores:SGD\tNone\n", "SGD:S000004294\tNone\tGO:0071266\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000204515\tinfores:GOC\tNone\n", "SGD:S000004294\tNone\tGO:0003961\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000004294\tNone\tGO:0003961\tNone\t\tMET17\tNone\tNone\tSGD_REF:S000057063|PMID:3299001\tinfores:SGD\tNone\n" ] } ], "source": [ "go -g input/gene_association.sgd.gaf -G gaf associations -Q subject SGD:S000004294 -O csv | head -20" ] }, { "cell_type": "markdown", "id": "fda241fe-6a52-4ba2-b8ef-9a30241d3f0d", "metadata": {}, "source": [ "## Query for associations to a term\n", "\n", "In contrast to gene queries, we want to make use of [ontology relationships](https://incatools.github.io/ontology-access-kit/guide/relationships-and-graphs.html) - in particular we typically want to include all is-a and part-of descendants in our query" ] }, { "cell_type": "code", "execution_count": 9, "id": "04738ae7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tpredicate\tobject\tobject_label\tproperty_values\tsubject_label\tpredicate_label\tnegated\tpublications\tprimary_knowledge_source\taggregator_knowledge_source\n", "SGD:S000001369\tNone\tGO:0016301\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", 
"SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000051318|PMID:1322693\tinfores:SGD\tNone\n", "SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000048479|PMID:1657152\tinfores:SGD\tNone\n", "SGD:S000002318\tNone\tGO:0004708\tNone\t\tSTE7\tNone\tNone\tSGD_REF:S000041791|PMID:8668180\tinfores:SGD\tNone\n", "SGD:S000002318\tNone\tGO:0004708\tNone\t\tSTE7\tNone\tNone\tSGD_REF:S000045748|PMID:8384702\tinfores:SGD\tNone\n", "SGD:S000003272\tNone\tGO:0004707\tNone\t\tKSS1\tNone\tNone\tSGD_REF:S000041791|PMID:8668180\tinfores:SGD\tNone\n", "SGD:S000003272\tNone\tGO:0004707\tNone\t\tKSS1\tNone\tNone\tSGD_REF:S000045641|PMID:8918885\tinfores:SGD\tNone\n", "SGD:S000003272\tNone\tGO:0004707\tNone\t\tKSS1\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000003272\tNone\tGO:0004707\tNone\t\tKSS1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000006124\tNone\tGO:0004672\tNone\t\tTPK2\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000006124\tNone\tGO:0004672\tNone\t\tTPK2\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000006124\tNone\tGO:0004672\tNone\t\tTPK2\tNone\tNone\tSGD_REF:S000113918|PMID:16319894\tinfores:SGD\tNone\n", "SGD:S000006124\tNone\tGO:0004672\tNone\t\tTPK2\tNone\tNone\tSGD_REF:S000113918|PMID:16319894\tinfores:SGD\tNone\n", "SGD:S000002318\tNone\tGO:0016301\tNone\t\tSTE7\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000000364\tNone\tGO:0004674\tNone\t\tCDC28\tNone\tNone\tSGD_REF:S000086178|PMID:16096060\tinfores:SGD\tNone\n", "SGD:S000000364\tNone\tGO:0004674\tNone\t\tCDC28\tNone\tNone\tSGD_REF:S000146417|PMID:21841787\tinfores:SGD\tNone\n", "SGD:S000000364\tNone\tGO:0004674\tNone\t\tCDC28\tNone\tNone\tSGD_REF:S000149310|PMID:22521784\tinfores:SGD\tNone\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "SGD:S000003820\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", 
"SGD:S000002394\tNone\tNone\tGO:0009927\thistidine phosphotransfer kinase activity\t[]\n", "SGD:S000001644\tNone\tNone\tGO:0004693\tcyclin-dependent protein serine/threonine kinase activity\t[]\n", "SGD:S000004710\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000001949\tNone\tNone\tGO:0008865\tfructokinase activity\t[]\n", "SGD:S000003607\tNone\tNone\tGO:0003991\tacetylglutamate kinase activity\t[]\n", "SGD:S000001075\tNone\tNone\tGO:0004349\tglutamate 5-kinase activity\t[]\n", "SGD:S000002924\tNone\tNone\tGO:0019158\tmannokinase activity\t[]\n", "SGD:S000003509\tNone\tNone\tGO:0004140\tdephospho-CoA kinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0008865\tfructokinase activity\t[]\n", "SGD:S000001651\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000001681\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002427\tNone\tNone\tGO:0004849\turidine kinase activity\t[]\n", "SGD:S000003866\tNone\tNone\tGO:0019200\tcarbohydrate kinase activity\t[]\n", "SGD:S000000687\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000006071\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0004340\tglucokinase activity\t[]\n", "SGD:S000001654\tNone\tNone\tGO:0009931\tcalcium-dependent protein serine/threonine kinase activity\t[]\n", "SGD:S000001949\tNone\tNone\tGO:0004340\tglucokinase activity\t[]\n", "SGD:S000005878\tNone\tNone\tGO:0004683\tcalmodulin-dependent protein kinase activity\t[]\n", "SGD:S000006074\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003126\tNone\tNone\tGO:0004683\tcalmodulin-dependent protein kinase activity\t[]\n", "SGD:S000001622\tNone\tNone\tGO:0008353\tRNA polymerase II CTD heptapeptide repeat kinase activity\t[]\n", "SGD:S000005488\tNone\tNone\tGO:0004693\tcyclin-dependent protein serine/threonine kinase activity\t[]\n", 
"SGD:S000002924\tNone\tNone\tGO:0008865\tfructokinase activity\t[]\n", "SGD:S000003222\tNone\tNone\tGO:0008865\tfructokinase activity\t[]\n", "SGD:S000002266\tNone\tNone\tGO:0004693\tcyclin-dependent protein serine/threonine kinase activity\t[]\n", "SGD:S000006043\tNone\tNone\tGO:0008353\tRNA polymerase II CTD heptapeptide repeat kinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0019158\tmannokinase activity\t[]\n", "SGD:S000000545\tNone\tNone\tGO:0019158\tmannokinase activity\t[]\n", "SGD:S000002656\tNone\tNone\tGO:0046316\tgluconokinase activity\t[]\n", "SGD:S000002924\tNone\tNone\tGO:0004340\tglucokinase activity\t[]\n", "SGD:S000001654\tNone\tNone\tGO:0004683\tcalmodulin-dependent protein kinase activity\t[]\n", "SGD:S000003222\tNone\tNone\tGO:0019158\tmannokinase activity\t[]\n", "SGD:S000000632\tNone\tNone\tGO:0019200\tcarbohydrate kinase activity\t[]\n", "SGD:S000001141\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002915\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000005776\tNone\tNone\tGO:0051731\tpolynucleotide 5'-hydroxyl-kinase activity\t[]\n", "SGD:S000000601\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003222\tNone\tNone\tGO:0004340\tglucokinase activity\t[]\n", "SGD:S000000612\tNone\tNone\tGO:0019200\tcarbohydrate kinase activity\t[]\n", "SGD:S000006130\tNone\tNone\tGO:0035174\thistone serine kinase activity\t[]\n", "SGD:S000003701\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003437\tNone\tNone\tGO:0004849\turidine kinase activity\t[]\n", "SGD:S000003818\tNone\tNone\tGO:0004550\tnucleoside diphosphate kinase activity\t[]\n", "SGD:S000002516\tNone\tNone\tGO:0019150\tD-ribulokinase activity\t[]\n", "SGD:S000002691\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001124\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000004965\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", 
"SGD:S000003631\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001297\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005242\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001910\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005376\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003324\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001531\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000000015\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000006125\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003593\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001121\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000004086\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000002266\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003942\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003272\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005251\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000002186\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000000931\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003664\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005952\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001357\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001599\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000002373\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000006074\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000004747\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005947\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001072\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", 
"SGD:S000002885\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005963\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000004354\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000000999\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003723\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003700\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001649\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000002655\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001177\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005098\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000000925\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005488\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000000364\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000036\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003818\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000605\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000224\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001949\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003222\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001649\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000001649\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002175\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003700\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002266\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002175\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000006124\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002266\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", 
"SGD:S000006124\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002318\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000002318\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002634\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002634\tNone\tNone\tGO:0019205\tnucleobase-containing compound kinase activity\t[]\n", "SGD:S000002580\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004821\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003664\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003664\tNone\tNone\tGO:0004713\tprotein tyrosine kinase activity\t[]\n", "SGD:S000002931\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002931\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003623\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000854\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001248\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001248\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001609\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001609\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004086\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003272\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002691\tNone\tNone\tGO:0016301\tkinase activity\t[]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "SGD:S000001507\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001507\tNone\tNone\tGO:0019205\tnucleobase-containing compound kinase activity\t[]\n", "SGD:S000002862\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001297\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000112\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000000112\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", 
"SGD:S000005952\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000545\tNone\tNone\tGO:0004396\thexokinase activity\t[]\n", "SGD:S000000545\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005587\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000004615\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004123\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003810\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005251\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000071\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000005127\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005127\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003324\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006074\tNone\tNone\tGO:0004713\tprotein tyrosine kinase activity\t[]\n", "SGD:S000006074\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000301\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000000301\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005376\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000005376\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004230\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001177\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005098\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004833\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000601\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000687\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000529\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000669\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000000669\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003087\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001599\tNone\tNone\tGO:0004674\tprotein 
serine/threonine kinase activity\t[]\n", "SGD:S000001599\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006125\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000015\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003942\tNone\tNone\tGO:0004712\tprotein serine/threonine/tyrosine kinase activity\t[]\n", "SGD:S000003942\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004603\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000004603\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001686\tNone\tNone\tGO:0004430\t1-phosphatidylinositol 4-kinase activity\t[]\n", "SGD:S000001686\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000767\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000105\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001664\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003723\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003723\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001915\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001915\tNone\tNone\tGO:0016307\tphosphatidylinositol phosphate kinase activity\t[]\n", "SGD:S000003827\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001681\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001654\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000001654\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001651\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001651\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001644\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000001644\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000001644\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001550\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001508\tNone\tNone\tGO:0004672\tprotein kinase 
activity\t[]\n", "SGD:S000004296\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000164\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000184\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000340\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000000340\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004747\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001003\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001075\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001124\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003701\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000931\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006130\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002616\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002616\tNone\tNone\tGO:0016307\tphosphatidylinositol phosphate kinase activity\t[]\n", "SGD:S000006130\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000002259\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005963\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000005211\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001409\tNone\tNone\tGO:0000155\tphosphorelay sensor kinase activity\t[]\n", "SGD:S000001409\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000925\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001357\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001304\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003420\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000006258\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000006258\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000006258\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", 
"SGD:S000006135\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003437\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003636\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001861\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003593\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003593\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003820\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003866\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002237\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000002237\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003051\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000003027\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003494\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005310\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005330\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005200\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005105\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000005105\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004965\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004535\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004535\tNone\tNone\tGO:0050354\ttriokinase activity\t[]\n", "SGD:S000001072\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000871\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003631\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000003631\tNone\tNone\tGO:0004713\tprotein tyrosine kinase activity\t[]\n", "SGD:S000002898\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000999\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000000999\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002604\tNone\tNone\tGO:0016301\tkinase 
activity\t[]\n", "SGD:S000001622\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002924\tNone\tNone\tGO:0004396\thexokinase activity\t[]\n", "SGD:S000002924\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002516\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004250\tNone\tNone\tGO:0008481\tsphinganine kinase activity\t[]\n", "SGD:S000004250\tNone\tNone\tGO:0003951\tNAD+ kinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0004396\thexokinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006109\tNone\tNone\tGO:0003951\tNAD+ kinase activity\t[]\n", "SGD:S000003958\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006179\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006157\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\n", "SGD:S000006157\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002325\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002427\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002183\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006071\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005645\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005645\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\n", "SGD:S000005488\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005460\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005697\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005697\tNone\tNone\tGO:0008481\tsphinganine kinase activity\t[]\n", "SGD:S000002915\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005422\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005473\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005496\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005947\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002885\tNone\tNone\tGO:0016301\tkinase activity\t[]\n" ] }, { "name": 
"stdout", "output_type": "stream", "text": [ "SGD:S000003664\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000002416\tNone\tNone\tGO:0004335\tgalactokinase activity\t[]\r", "\r\n", "SGD:S000003272\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000004818\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000005878\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000006315\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\r", "\r\n", "SGD:S000000232\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\r", "\r\n", "SGD:S000000478\tNone\tNone\tGO:0004672\tprotein kinase activity\t[]\r", "\r\n", "SGD:S000003426\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000001865\tNone\tNone\tGO:0004674\tprotein serine/threonine kinase activity\t[]\r", "\r\n", "SGD:S000005874\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000002874\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000002554\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000002939\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000005793\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000000972\tNone\tNone\tGO:0004017\tadenylate kinase activity\t[]\r", "\r\n", "SGD:S000000972\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000000972\tNone\tNone\tGO:0019205\tnucleobase-containing compound kinase activity\t[]\r", "\r\n", "SGD:S000002644\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000002655\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n", "SGD:S000002656\tNone\tNone\tGO:0016301\tkinase activity\t[]\r", "\r\n" ] } ], "source": [ "go -g input/gene_association.sgd.gaf -G gaf associations -p i,p \"kinase activity\" -O csv | head -20" ] }, { "cell_type": "markdown", "id": "3991460a-cbd1-4067-9f1a-5a373add960c", "metadata": {}, "source": [ "Note that including part of (`p`) does not make a difference with the MF hierarchy in GO, but 
does\n", "make a big difference in the other two GO branches (biological process and cellular component).\n", "\n", "### Important: closures make a big difference\n", "\n", "Let's compare the number of results with and without closures:" ] }, { "cell_type": "code", "execution_count": 10, "id": "823618e3-2881-4e0f-af17-f61a36738aa4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3209 32091 315394\n" ] } ], "source": [ "go -g input/gene_association.sgd.gaf -G gaf associations -p i,p \"kinase activity\" -O csv | wc" ] }, { "cell_type": "code", "execution_count": 11, "id": "f4ded29b-dfd9-44dd-83e1-06379b30d29c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 285 2851 26750\n" ] } ], "source": [ "go -g input/gene_association.sgd.gaf -G gaf associations \"kinase activity\" -O csv | wc" ] }, { "cell_type": "markdown", "id": "57835163-f5f0-44d4-b2ff-56b2c13c3e3d", "metadata": {}, "source": [ "## Complex Queries\n", "\n", "We can use the OAK graph query language to specify exhaustive lists of direct terms.\n", "\n", "For example, to retrieve annotations to any kinase that is not a protein kinase:" ] }, { "cell_type": "code", "execution_count": 12, "id": "93f3bcb4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tpredicate\tobject\tobject_label\tproperty_values\tsubject_label\tpredicate_label\tnegated\tpublications\tprimary_knowledge_source\taggregator_knowledge_source\n", "SGD:S000001369\tNone\tGO:0016301\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000051318|PMID:1322693\tinfores:SGD\tNone\n", 
"SGD:S000001369\tNone\tGO:0003873\tNone\t\tPFK26\tNone\tNone\tSGD_REF:S000048479|PMID:1657152\tinfores:SGD\tNone\n", "SGD:S000002318\tNone\tGO:0016301\tNone\t\tSTE7\tNone\tNone\tSGD_REF:S000148669\tinfores:UniProt\tNone\n", "SGD:S000000605\tNone\tGO:0004618\tNone\t\tPGK1\tNone\tNone\tSGD_REF:S000058483|PMID:6254992\tinfores:SGD\tNone\n", "SGD:S000000605\tNone\tGO:0004618\tNone\t\tPGK1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000000605\tNone\tGO:0004618\tNone\t\tPGK1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000000605\tNone\tGO:0004618\tNone\t\tPGK1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000000605\tNone\tGO:0004618\tNone\t\tPGK1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000000605\tNone\tGO:0004618\tNone\t\tPGK1\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000053290|PMID:6088527\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000049877|PMID:6094555\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000130762|PMID:19540237\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000130762|PMID:19540237\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000053290|PMID:6088527\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000042433|PMID:6091111\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000042433|PMID:6091111\tinfores:SGD\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", 
"SGD:S000003818\tNone\tGO:0004798\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000003818\tNone\tGO:0009041\tNone\t\tCDC8\tNone\tNone\tSGD_REF:S000049877|PMID:6094555\tinfores:SGD\tNone\n", "SGD:S000002939\tNone\tGO:0004594\tNone\t\tCAB1\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000002939\tNone\tGO:0004594\tNone\t\tCAB1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n", "SGD:S000002939\tNone\tGO:0004594\tNone\t\tCAB1\tNone\tNone\tSGD_REF:S000129524|PMID:19266201\tinfores:SGD\tNone\n", "SGD:S000002939\tNone\tGO:0004594\tNone\t\tCAB1\tNone\tNone\tSGD_REF:S000065722|PMID:9890959\tinfores:SGD\tNone\n", "SGD:S000002939\tNone\tGO:0004594\tNone\t\tCAB1\tNone\tNone\tSGD_REF:S000124037\tinfores:UniProt\tNone\n", "SGD:S000002939\tNone\tGO:0004594\tNone\t\tCAB1\tNone\tNone\tSGD_REF:S000124036\tinfores:InterPro\tNone\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "SGD:S000001357\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001304\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006258\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006135\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003437\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003636\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001861\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003593\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003820\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003866\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002237\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003027\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003494\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005310\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005330\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005200\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", 
"SGD:S000005105\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004965\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004535\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004535\tNone\tNone\tGO:0050354\ttriokinase activity\t[]\n", "SGD:S000001072\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000871\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002898\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000999\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002604\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000001622\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002924\tNone\tNone\tGO:0004396\thexokinase activity\t[]\n", "SGD:S000002924\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002516\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004250\tNone\tNone\tGO:0008481\tsphinganine kinase activity\t[]\n", "SGD:S000004250\tNone\tNone\tGO:0003951\tNAD+ kinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0004396\thexokinase activity\t[]\n", "SGD:S000004438\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006109\tNone\tNone\tGO:0003951\tNAD+ kinase activity\t[]\n", "SGD:S000003958\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006179\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006157\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002325\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002427\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002183\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000006071\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005645\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005488\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005460\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005697\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005697\tNone\tNone\tGO:0008481\tsphinganine kinase activity\t[]\n", 
"SGD:S000002915\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005422\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005473\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005496\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005947\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002885\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003664\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002416\tNone\tNone\tGO:0004335\tgalactokinase activity\t[]\n", "SGD:S000003272\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000004818\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005878\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000003426\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005874\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002874\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002554\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002939\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000005793\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000972\tNone\tNone\tGO:0004017\tadenylate kinase activity\t[]\n", "SGD:S000000972\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000000972\tNone\tNone\tGO:0019205\tnucleobase-containing compound kinase activity\t[]\n", "SGD:S000002644\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002655\tNone\tNone\tGO:0016301\tkinase activity\t[]\n", "SGD:S000002656\tNone\tNone\tGO:0016301\tkinase activity\t[]\n" ] } ], "source": [ "go -g input/gene_association.sgd.gaf -G gaf associations .desc//p=i \"kinase activity\" .not .desc//p=i \"protein kinase activity\" -O csv | head -30" ] }, { "cell_type": "markdown", "id": "0ec3c69d-e090-463c-ab03-c7a91de3719d", "metadata": {}, "source": [ "## Querying via API\n", "\n", "Some association sources provide an API, so rather than downloading an association file, you can have OAK query the API directly.\n", "\n", "Note that API 
endpoints may not support all OAK options; e.g. the amigo endpoint currently forces you to use IDs:" ] }, { "cell_type": "code", "execution_count": 13, "id": "100dff1d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject\tpredicate\tobject\tproperty_values\tsubject_label\tpredicate_label\tobject_label\tnegated\tpublications\tprimary_knowledge_source\taggregator_knowledge_source\n", "UniProtKB:Q13976\tNone\tGO:0004672\t\tPRKG1\tNone\tprotein kinase activity\tNone\tPMID:25447536\tBHF-UCL\tinfores:go\n", "UniProtKB:Q13976\tNone\tGO:0004692\t\tPRKG1\tNone\tcGMP-dependent protein kinase activity\tNone\tPMID:21402151\tUniProt\tinfores:go\n", "UniProtKB:Q13976\tNone\tGO:0004692\t\tPRKG1\tNone\tcGMP-dependent protein kinase activity\tNone\tReactome:R-HSA-418442\tReactome\tinfores:go\n", "UniProtKB:Q13976\tNone\tGO:0106310\t\tPRKG1\tNone\tprotein serine kinase activity\tNone\tGO_REF:0000116\tRHEA\tinfores:go\n", "UniProtKB:Q9HCP0\tNone\tGO:0004674\t\tCSNK1G1\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:25500533\tParkinsonsUK-UCL\tinfores:go\n", "UniProtKB:Q9HCP0\tNone\tGO:0106310\t\tCSNK1G1\tNone\tprotein serine kinase activity\tNone\tGO_REF:0000116\tRHEA\tinfores:go\n", "UniProtKB:Q9HCP0\tNone\tGO:0004674\t\tCSNK1G1\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:21873635\tGO_Central\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0004674\t\tBRSK2\tNone\tprotein serine/threonine kinase activity\tNone\tGO_REF:0000024\tARUK-UCL\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0004674\t\tBRSK2\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:14976552\tUniProt\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0050321\t\tBRSK2\tNone\ttau-protein kinase activity\tNone\tGO_REF:0000024\tUniProt\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0050321\t\tBRSK2\tNone\ttau-protein kinase activity\tNone\tPMID:21985311\tUniProt\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0050321\t\tBRSK2\tNone\ttau-protein kinase 
activity\tNone\tPMID:28386764\tARUK-UCL\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0106310\t\tBRSK2\tNone\tprotein serine kinase activity\tNone\tGO_REF:0000116\tRHEA\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0050321\t\tBRSK2\tNone\ttau-protein kinase activity\tNone\tPMID:21873635\tGO_Central\tinfores:go\n", "UniProtKB:Q8IWQ3\tNone\tGO:0004674\t\tBRSK2\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:21873635\tGO_Central\tinfores:go\n", "UniProtKB:Q96PF2\tNone\tGO:0004674\t\tTSSK2\tNone\tprotein serine/threonine kinase activity\tNone\tGO_REF:0000024\tUniProt\tinfores:go\n", "UniProtKB:Q96PF2\tNone\tGO:0004674\t\tTSSK2\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:18533145\tUniProt\tinfores:go\n", "UniProtKB:Q96PF2\tNone\tGO:0004674\t\tTSSK2\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:20729278\tUniProt\tinfores:go\n", "UniProtKB:Q96PF2\tNone\tGO:0106310\t\tTSSK2\tNone\tprotein serine kinase activity\tNone\tGO_REF:0000116\tRHEA\tinfores:go\n", "UniProtKB:Q96PF2\tNone\tGO:0004674\t\tTSSK2\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:21873635\tGO_Central\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004672\t\tEIF2AK2\tNone\tprotein kinase activity\tNone\tPMID:12882984\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004672\t\tEIF2AK2\tNone\tprotein kinase activity\tNone\tPMID:15229216\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004672\t\tEIF2AK2\tNone\tprotein kinase activity\tNone\tPMID:18835251\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004672\t\tEIF2AK2\tNone\tprotein kinase activity\tNone\tPMID:21123651\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004672\t\tEIF2AK2\tNone\tprotein kinase activity\tNone\tPMID:248628414\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004674\t\tEIF2AK2\tNone\tprotein serine/threonine kinase activity\tNone\tPMID:1695551\tPINC\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004694\t\tEIF2AK2\tNone\teukaryotic translation initiation factor 
2alpha kinase activity\tNone\tPMID:25329545\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0004715\t\tEIF2AK2\tNone\tnon-membrane spanning protein tyrosine kinase activity\tNone\tGO_REF:0000003\tUniProt\tinfores:go\n", "UniProtKB:P19525\tNone\tGO:0016301\t\tEIF2AK2\tNone\tkinase activity\tNone\tPMID:21123651\tUniProt\tinfores:go\n" ] } ], "source": [ "!runoak -i amigo:NCBITaxon:9606 associations -p i,p GO:0016301 | head -30" ] }, { "cell_type": "code", "execution_count": null, "id": "c93dcd70-2d46-41c5-b0ed-3d8655f495dd", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }