# OAK associations command

This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).

This notebook provides examples for the `associations` command, which queries [associations](https://incatools.github.io/ontology-access-kit/glossary.html#term-Association) from or to entities.

For more on associations, see [Associations and Curated Annotations](https://incatools.github.io/ontology-access-kit/guide/associations.html) in the OAK guide.

For more on command line usage in general, see the [Command Line Tutorial](https://doi.org/10.5281/zenodo.7708963).

## Help Option

You can get help on any OAK command using `--help`:

In [1]:
!runoak associations --help

Usage: runoak associations [OPTIONS] [TERMS]...

 Lookup associations from or to entities.

 Example:

 runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations

 The above will show all associations

 To query using an ontology term, including is-a closure, specify one or more
 terms or term queries, plus the closure predicate(s), e.g.

 Example:

 runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations -p i
 HP:0001392

 This shows all annotations either to "Abnormality of the liver"
 (HP:0001392), or to is-a descendants.

 Using input specifications:

 It can be awkward to specify both input ontology and association path and
 format. You can use input specifications to bundle common combinations of
 inputs together.

 For example, the go-dictybase-input-spec combines go plus dictybase
 associations.

 Example:

 runoak --i src/oaklib/conf/go-dictybase-input-spec.yaml associations -p
 i,p GO:0008104

 More examples:

 https://github.com/INCATools/ontology-access-
 kit/blob/main/no

### Set up an alias

We will set up an alias for running OAK bound to GO for the purposes of this notebook:

In [2]:
alias go runoak -i sqlite:obo:go

In [3]:
go ontology-metadata --all

id:
- obo:go/extensions/go-plus.owl
dce:description:
- The Gene Ontology (GO) provides a framework and set of concepts for describing the
 functions of gene products from all organisms.
dce:title:
- Gene Ontology
dcterms:license:
- 
oio:default-namespace:
- gene_ontology
oio:hasOBOFormatVersion:
- '1.2'
owl:versionIRI:
- obo:go/releases/2023-04-01/extensions/go-plus.owl
owl:versionInfo:
- '2023-04-01'
rdf:type:
- owl:Ontology
sh:prefix:
- obo
schema:url:
- http://purl.obolibrary.org/obo/go/extensions/go-plus.owl
rdfs:isDefinedBy:
- http://purl.obolibrary.org/obo/obo.owl


Check that queries work through the alias:

In [4]:
go info "kinase activity"

GO:0016301 ! kinase activity
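
The `alias` line above is an IPython feature, so it is not available in a plain Python script. As a minimal sketch, a script could prepend the same fixed arguments to each command instead. The helper name `go_command` is hypothetical and not part of OAK; it only builds the argument list and does not run anything.

```python
import shlex

# Same fixed prefix that the `go` alias expands to in this notebook.
GO_PREFIX = ["runoak", "-i", "sqlite:obo:go"]


def go_command(*args):
    """Return the full argument list for a runoak call bound to GO (hypothetical helper)."""
    return GO_PREFIX + list(args)


# Show the command line that would be executed.
print(shlex.join(go_command("info", "kinase activity")))

# To actually run it (requires OAK to be installed), something like:
#   import subprocess
#   subprocess.run(go_command("info", "kinase activity"), check=True)
```

Passing the arguments as a list avoids shell quoting problems with labels that contain spaces, such as "kinase activity".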


### Query for associations to a gene

Here we query a previously downloaded GAF file for all associations to a single gene, using its ID as the subject:

In [8]:
go -g input/gene_association.sgd.gaf -G gaf associations -Q subject SGD:S000004294 -O csv | head -20

subject	predicate	object	object_label	property_values	subject_label	predicate_label	negated	publications	primary_knowledge_source	aggregator_knowledge_source
SGD:S000004294	None	GO:0003824	None		MET17	None	None	SGD_REF:S000124036	infores:InterPro	None
SGD:S000004294	None	GO:0003824	None		MET17	None	None	SGD_REF:S000124036	infores:InterPro	None
SGD:S000004294	None	GO:0003824	None		MET17	None	None	SGD_REF:S000148669	infores:UniProt	None
SGD:S000004294	None	GO:0005737	None		MET17	None	None	SGD_REF:S000148669	infores:UniProt	None
SGD:S000004294	None	GO:0005737	None		MET17	None	None	SGD_REF:S000148671	infores:UniProt	None
SGD:S000004294	None	GO:0005737	None		MET17	None	None	SGD_REF:S000069459|PMID:11914276	infores:SGD	None
SGD:S000004294	None	GO:0005737	None		MET17	None	None	SGD_REF:S000069459|PMID:11914276	infores:SGD	None
SGD:S000004294	None	GO:0016765	None		MET17	None	None	SGD_REF:S000124036	infores:InterPro	None
SGD:S000004294	None	GO:0030170	None		MET17	None	None	SGD_REF:S000124036	inf
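
The output above contains several rows for the same gene and term that differ only in their supporting reference or source. As a rough sketch of how such output could be post-processed, the following groups rows by gene and term and collects the distinct values of `primary_knowledge_source`. It assumes the output is tab-separated with the header shown above; the sample rows are copied from that output, and the function name `sources_by_term` is made up for this illustration.

```python
import csv
import io
from collections import defaultdict

HEADER = [
    "subject", "predicate", "object", "object_label", "property_values",
    "subject_label", "predicate_label", "negated", "publications",
    "primary_knowledge_source", "aggregator_knowledge_source",
]

# A few rows copied from the output above (property_values is empty).
ROWS = [
    ["SGD:S000004294", "None", "GO:0003824", "None", "", "MET17", "None", "None",
     "SGD_REF:S000124036", "infores:InterPro", "None"],
    ["SGD:S000004294", "None", "GO:0003824", "None", "", "MET17", "None", "None",
     "SGD_REF:S000148669", "infores:UniProt", "None"],
    ["SGD:S000004294", "None", "GO:0005737", "None", "", "MET17", "None", "None",
     "SGD_REF:S000148671", "infores:UniProt", "None"],
    ["SGD:S000004294", "None", "GO:0005737", "None", "", "MET17", "None", "None",
     "SGD_REF:S000069459|PMID:11914276", "infores:SGD", "None"],
]

SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def sources_by_term(tsv_text):
    """Map (gene label, term ID) to the sorted list of distinct primary sources."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(set)
    for row in reader:
        key = (row["subject_label"], row["object"])
        grouped[key].add(row["primary_knowledge_source"])
    return {key: sorted(values) for key, values in grouped.items()}


for (gene, term), sources in sources_by_term(SAMPLE).items():
    print(gene, term, ", ".join(sources))
```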

## Query for associations to a term

In contrast to gene queries, term queries should make use of [ontology relationships](https://incatools.github.io/ontology-access-kit/guide/relationships-and-graphs.html). In particular, we typically want to include all is-a and part-of descendants of the query term:

In [9]:
go -g input/gene_association.sgd.gaf -G gaf associations -p i,p "kinase activity" -O csv | head -20

subject	predicate	object	object_label	property_values	subject_label	predicate_label	negated	publications	primary_knowledge_source	aggregator_knowledge_source
SGD:S000001369	None	GO:0016301	None		PFK26	None	None	SGD_REF:S000148669	infores:UniProt	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000124037	infores:UniProt	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000124036	infores:InterPro	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000051318|PMID:1322693	infores:SGD	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000048479|PMID:1657152	infores:SGD	None
SGD:S000002318	None	GO:0004708	None		STE7	None	None	SGD_REF:S000041791|PMID:8668180	infores:SGD	None
SGD:S000002318	None	GO:0004708	None		STE7	None	None	SGD_REF:S000045748|PMID:8384702	infores:SGD	None
SGD:S000003272	None	GO:0004707	None		KSS1	None	None	SGD_REF:S000041791|PMID:8668180	infores:SGD	None
SGD:S000003272	None	GO:0004707	None		KSS1	None	None	SGD

SGD:S000003820	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000002394	None	None	GO:0009927	histidine phosphotransfer kinase activity	[]
SGD:S000001644	None	None	GO:0004693	cyclin-dependent protein serine/threonine kinase activity	[]
SGD:S000004710	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000001949	None	None	GO:0008865	fructokinase activity	[]
SGD:S000003607	None	None	GO:0003991	acetylglutamate kinase activity	[]
SGD:S000001075	None	None	GO:0004349	glutamate 5-kinase activity	[]
SGD:S000002924	None	None	GO:0019158	mannokinase activity	[]
SGD:S000003509	None	None	GO:0004140	dephospho-CoA kinase activity	[]
SGD:S000004438	None	None	GO:0008865	fructokinase activity	[]
SGD:S000001651	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000001681	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000002427	None	None	GO:0004849	uridine kinase activity	[]
SGD:S000003866	None	None	GO:0019200	carbohydrate k

SGD:S000001507	None	None	GO:0016301	kinase activity	[]
SGD:S000001507	None	None	GO:0019205	nucleobase-containing compound kinase activity	[]
SGD:S000002862	None	None	GO:0016301	kinase activity	[]
SGD:S000001297	None	None	GO:0016301	kinase activity	[]
SGD:S000000112	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000000112	None	None	GO:0016301	kinase activity	[]
SGD:S000005952	None	None	GO:0016301	kinase activity	[]
SGD:S000000545	None	None	GO:0004396	hexokinase activity	[]
SGD:S000000545	None	None	GO:0016301	kinase activity	[]
SGD:S000005587	None	None	GO:0004672	protein kinase activity	[]
SGD:S000004615	None	None	GO:0016301	kinase activity	[]
SGD:S000004123	None	None	GO:0016301	kinase activity	[]
SGD:S000003810	None	None	GO:0016301	kinase activity	[]
SGD:S000005251	None	None	GO:0016301	kinase activity	[]
SGD:S000000071	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000005127	None	None	GO:0004672	protein kinase activity	[]
SGD:S000005127	None

SGD:S000003664	None	None	GO:0016301	kinase activity	[]
SGD:S000002416	None	None	GO:0004335	galactokinase activity	[]
SGD:S000003272	None	None	GO:0016301	kinase activity	[]
SGD:S000004818	None	None	GO:0016301	kinase activity	[]
SGD:S000005878	None	None	GO:0016301	kinase activity	[]
SGD:S000006315	None	None	GO:0004672	protein kinase activity	[]
SGD:S000000232	None	None	GO:0004672	protein kinase activity	[]
SGD:S000000478	None	None	GO:0004672	protein kinase activity	[]
SGD:S000003426	None	None	GO:0016301	kinase activity	[]
SGD:S000001865	None	None	GO:0004674	protein serine/threonine kinase activity	[]
SGD:S000005874	None	None	GO:0016301	kinase activity	[]
SGD:S000002874	None	None	GO:0016301	kinase activity	[]
SGD:S000002554	None	None	GO:0016301	kinase activity	[]
SGD:S000002939	None	None	GO:0016301	kinase activity	[]
SGD:S000005793	None	None	GO:0016301	kinase activity	[]
SGD:S000000972	None	None	GO:0004017	adenylate kinase activity	[]
SGD:S000000972	None	No

Note that including part-of (`p`) does not make a difference within the molecular function (MF) hierarchy in GO, but does
make a big difference in the other two hierarchies (biological process and cellular component).

### Important: closures make a big difference

Let's compare the number of results with and without closures, using `wc` to count the output lines:

In [10]:
go -g input/gene_association.sgd.gaf -G gaf associations -p i,p "kinase activity" -O csv | wc

 3209 32091 315394


In [11]:
go -g input/gene_association.sgd.gaf -G gaf associations "kinase activity" -O csv | wc

 285 2851 26750
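
To make the effect of closures concrete, here is a toy sketch of the difference between matching only the query term and matching the term plus its is-a descendants. The hierarchy is a simplified illustration (the real GO graph has many more terms and edges), the gene names are made up, and the functions are not OAK's implementation.

```python
from collections import defaultdict

# Simplified is-a edges (child, parent), for illustration only.
IS_A = [
    ("protein kinase activity", "kinase activity"),
    ("protein serine/threonine kinase activity", "protein kinase activity"),
    ("hexokinase activity", "kinase activity"),
]

# Hypothetical annotations (gene, term); the gene names are invented.
ANNOTATIONS = [
    ("geneA", "kinase activity"),
    ("geneB", "protein kinase activity"),
    ("geneC", "protein serine/threonine kinase activity"),
    ("geneD", "hexokinase activity"),
]


def descendants(term):
    """Return the term itself plus every term below it via is-a edges."""
    children = defaultdict(list)
    for child, parent in IS_A:
        children[parent].append(child)
    seen = {term}
    stack = [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_annotated_to(term, use_closure):
    """List genes annotated to the term, optionally including its descendants."""
    targets = descendants(term) if use_closure else {term}
    return sorted(gene for gene, annotated in ANNOTATIONS if annotated in targets)


print(genes_annotated_to("kinase activity", use_closure=False))
print(genes_annotated_to("kinase activity", use_closure=True))
```

Without the closure only the gene annotated directly to "kinase activity" is found; with it, genes annotated to the more specific terms are found as well, which mirrors the gap between the two counts above.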


## Complex Queries

We can use the OAK graph query language to specify exhaustive lists of direct terms.

For example, to retrieve annotations to any kinase activity that is not a protein kinase activity:

In [12]:
go -g input/gene_association.sgd.gaf -G gaf associations .desc//p=i "kinase activity" .not .desc//p=i "protein kinase activity" -O csv | head -30

subject	predicate	object	object_label	property_values	subject_label	predicate_label	negated	publications	primary_knowledge_source	aggregator_knowledge_source
SGD:S000001369	None	GO:0016301	None		PFK26	None	None	SGD_REF:S000148669	infores:UniProt	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000124037	infores:UniProt	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000124036	infores:InterPro	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000051318|PMID:1322693	infores:SGD	None
SGD:S000001369	None	GO:0003873	None		PFK26	None	None	SGD_REF:S000048479|PMID:1657152	infores:SGD	None
SGD:S000002318	None	GO:0016301	None		STE7	None	None	SGD_REF:S000148669	infores:UniProt	None
SGD:S000000605	None	GO:0004618	None		PGK1	None	None	SGD_REF:S000058483|PMID:6254992	infores:SGD	None
SGD:S000000605	None	GO:0004618	None		PGK1	None	None	SGD_REF:S000124036	infores:InterPro	None
SGD:S000000605	None	GO:0004618	None		PGK1	None	None	SGD_REF:S000124036	i

SGD:S000001357	None	None	GO:0016301	kinase activity	[]
SGD:S000001304	None	None	GO:0016301	kinase activity	[]
SGD:S000006258	None	None	GO:0016301	kinase activity	[]
SGD:S000006135	None	None	GO:0016301	kinase activity	[]
SGD:S000003437	None	None	GO:0016301	kinase activity	[]
SGD:S000003636	None	None	GO:0016301	kinase activity	[]
SGD:S000001861	None	None	GO:0016301	kinase activity	[]
SGD:S000003593	None	None	GO:0016301	kinase activity	[]
SGD:S000003820	None	None	GO:0016301	kinase activity	[]
SGD:S000003866	None	None	GO:0016301	kinase activity	[]
SGD:S000002237	None	None	GO:0016301	kinase activity	[]
SGD:S000003027	None	None	GO:0016301	kinase activity	[]
SGD:S000003494	None	None	GO:0016301	kinase activity	[]
SGD:S000005310	None	None	GO:0016301	kinase activity	[]
SGD:S000005330	None	None	GO:0016301	kinase activity	[]
SGD:S000005200	None	None	GO:0016301	kinase activity	[]
SGD:S000005105	None	None	GO:0016301	kinase activity	[]
SGD:S000004965	None	None	GO:0016301	kinase activity	[]
SGD:S00000
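
The query above can be read as a set difference: the is-a descendants of "kinase activity" minus the is-a descendants of "protein kinase activity", with annotations then matched directly against the remaining terms. The sketch below illustrates that reading on a toy hierarchy. The edges are simplified, the gene names are made up, each descendant set is assumed to include its starting term, and none of this is OAK's own code.

```python
from collections import defaultdict

# Simplified is-a edges (child, parent), for illustration only.
IS_A = [
    ("protein kinase activity", "kinase activity"),
    ("protein serine/threonine kinase activity", "protein kinase activity"),
    ("hexokinase activity", "kinase activity"),
]

# Hypothetical annotations (gene, term); the gene names are invented.
ANNOTATIONS = [
    ("geneA", "kinase activity"),
    ("geneB", "protein kinase activity"),
    ("geneC", "protein serine/threonine kinase activity"),
    ("geneD", "hexokinase activity"),
]


def descendants(term):
    """Return the term itself plus every term below it via is-a edges."""
    children = defaultdict(list)
    for child, parent in IS_A:
        children[parent].append(child)
    seen = {term}
    stack = [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Terms under "kinase activity" that are not under "protein kinase activity".
selected = descendants("kinase activity") - descendants("protein kinase activity")

# Match annotations directly against the explicit term list (no further closure).
matches = [(gene, term) for gene, term in ANNOTATIONS if term in selected]

print(sorted(selected))
print(matches)
```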

## Querying via API

Some association sources provide an API, so rather than downloading an association file, you can have OAK query the API directly.

Note that API endpoints may not support all OAK options; e.g. the amigo endpoint currently forces you to use IDs:

In [13]:
!runoak -i amigo:NCBITaxon:9606 associations -p i,p GO:0016301 | head -30

subject	predicate	object	property_values	subject_label	predicate_label	object_label	negated	publications	primary_knowledge_source	aggregator_knowledge_source
UniProtKB:Q13976	None	GO:0004672		PRKG1	None	protein kinase activity	None	PMID:25447536	BHF-UCL	infores:go
UniProtKB:Q13976	None	GO:0004692		PRKG1	None	cGMP-dependent protein kinase activity	None	PMID:21402151	UniProt	infores:go
UniProtKB:Q13976	None	GO:0004692		PRKG1	None	cGMP-dependent protein kinase activity	None	Reactome:R-HSA-418442	Reactome	infores:go
UniProtKB:Q13976	None	GO:0106310		PRKG1	None	protein serine kinase activity	None	GO_REF:0000116	RHEA	infores:go
UniProtKB:Q9HCP0	None	GO:0004674		CSNK1G1	None	protein serine/threonine kinase activity	None	PMID:25500533	ParkinsonsUK-UCL	infores:go
UniProtKB:Q9HCP0	None	GO:0106310		CSNK1G1	None	protein serine kinase activity	None	GO_REF:0000116	RHEA	infores:go
UniProtKB:Q9HCP0	None	GO:0004674		CSNK1G1	None	protein serine/threonine kinase activity	None	PMID:21873635	GO_Central	inf