Mondo Phenotypes Example (IN PROGRESS)
Help Option
You can get help on any OAK command by passing --help.
[1]:
!runoak enrichment --help
Usage: runoak enrichment [OPTIONS] [TERMS]...
Run class enrichment analysis.
Given a sample file of identifiers (e.g. gene IDs), plus a set of
associations (e.g. gene to term associations, return the terms that are
over-represented in the sample set.
Example:
runoak -i sqlite:obo:uberon -g gene2anat.txt -G g2t enrichment -U my-genes.txt -O csv
This runs an enrichment using Uberon on my-genes.txt, using the
gene2anat.txt file as the association file (assuming simple gene-to-term
format). The output is in CSV format.
It is recommended you always provide a background set, including all the
entity identifiers considered in the experiment.
You can specify --filter-redundant to filter out redundant terms. This will
block reporting of any terms that are either subsumed by or subsume a lower
p-value term that is already reported.
For a full example, see:
https://github.com/INCATools/ontology-access-kit/blob/main/notebooks/Commands/Enrichment.ipynb
Note that it is possible to run "pseudo-enrichments" on term lists only by
passing no associations and using --ontology-only. This creates a fake
association set that is simply reflexive relations between each term and
itself. This can be useful for summarizing term lists, but note that
P-values may not be meaningful.
Options:
  -o, --output FILENAME           Output file, e.g. obo file
  -p, --predicates TEXT           A comma-separated list of predicates. This
                                  may be a shorthand (i, p) or CURIE
  --autolabel / --no-autolabel    If set, results will automatically have
                                  labels assigned  [default: autolabel]
  -O, --output-type TEXT          Desired output type
  -o, --output FILENAME           Output file, e.g. obo file
  --ontology-only / --no-ontology-only
                                  If true, perform a pseudo-enrichment
                                  analysis treating each term as an
                                  association to itself.  [default:
                                  no-ontology-only]
  --cutoff FLOAT                  The cutoff for the p-value; any p-values
                                  greater than this are not reported.
                                  [default: 0.05]
  -U, --sample-file FILENAME      file containing input list of entity IDs
                                  (e.g. gene IDs)  [required]
  -B, --background-file FILENAME  file containing background list of entity
                                  IDs (e.g. gene IDs)
  --association-predicates TEXT   A comma-separated list of predicates for the
                                  association relation
  --filter-redundant / --no-filter-redundant
                                  If true, filter out redundant terms
  --allow-labels / --no-allow-labels
                                  If true, allow labels as well as CURIEs in
                                  the input files
  --help                          Show this message and exit.
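The help text describes enrichment as finding terms that are over-represented in a sample relative to a background. As a rough illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for over-representation analysis; the help text does not say which statistic OAK actually uses. It also mimics the --cutoff and --filter-redundant behaviour described above. The function names, the T:... term IDs, the p-values in the filtering example, and the ancestor map are all hypothetical and are not part of OAK.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """P(X >= k) under a hypergeometric model (assumed test, not OAK's code).

    N: background size, K: background entities annotated to the term,
    n: sample size, k: sample entities annotated to the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def filter_redundant(results, ancestors):
    """Keep a term only if no already-kept, lower p-value term is its
    ancestor or descendant. `ancestors` maps term -> set of strict ancestors."""
    kept = []
    for term, p in sorted(results, key=lambda tp: tp[1]):
        related = any(
            other in ancestors.get(term, set()) or term in ancestors.get(other, set())
            for other, _ in kept
        )
        if not related:
            kept.append((term, p))
    return kept


# Toy numbers: 20 background genes, 5 annotated to the term,
# 4 genes in the sample, 3 of which are annotated to the term.
p = overrepresentation_p(k=3, n=4, K=5, N=20)

# Hypothetical (term, p-value) results and a hypothetical hierarchy.
results = [("T:parent", 0.01), ("T:child", 0.001), ("T:other", 0.02), ("T:weak", 0.2)]
ancestors = {"T:child": {"T:parent"}}

cutoff = 0.05  # same default as --cutoff
significant = [(t, pv) for t, pv in results if pv <= cutoff]
kept = filter_redundant(significant, ancestors)
print(round(p, 4), kept)
```

Here T:parent is dropped because it subsumes T:child, which has a lower p-value, and T:weak is dropped by the cutoff.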
Download example file and setup
We will use the HPO association file (phenotype.hpoa), which links diseases to HPO phenotype terms.
[3]:
!mkdir -p input output
!curl -L -s http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa > input/hpoa.tsv
Next, we will set up aliases for HPO and Mondo.
[4]:
alias hp runoak -i sqlite:obo:hp
[5]:
alias mondo runoak -i sqlite:obo:mondo
Test this out by querying for associations for a particular Orphanet disease.
We need to pass in the association file we downloaded (with -g) and specify its file type (with -G):
[6]:
hp -G hpoa -g input/hpoa.tsv associations -Q subject ORPHA:1899 -O csv | head
subject predicate object object_label property_values subject_label predicate_label negated publications evidence_type supporting_objects primary_knowledge_source aggregator_knowledge_source subject_closure subject_closure_label object_closure object_closure_label comments
ORPHA:1899 None HP:0000963 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0000974 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0001001 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0001252 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0001373 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0001385 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0001387 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0002300 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
ORPHA:1899 None HP:0002381 None Arthrochalasia Ehlers-Danlos syndrome None None None None None
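The same lookup can be sketched in plain Python, which may help when checking what the association file contains. This is only an illustration: it assumes the file is tab-separated, starts with comment lines beginning with "#", and has a header row with database_id and hpo_id columns (check the downloaded file, since the layout may differ between releases). The helper name and the EXAMPLE:0001 row are made up; the ORPHA:1899 rows reuse IDs from the output above.

```python
import csv
import io


def hpo_terms_for_disease(lines, disease_id,
                          id_col="database_id", term_col="hpo_id"):
    """Return HPO term IDs annotated to disease_id.

    `lines` is any iterable of text lines (e.g. an open file).
    The column names are assumptions about the HPOA header.
    """
    # Skip leading metadata/comment lines, then let DictReader use the header row.
    data = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t")
    return [row[term_col] for row in reader if row[id_col] == disease_id]


# Tiny in-memory stand-in for input/hpoa.tsv (the last row is invented).
sample = io.StringIO(
    "#description: toy excerpt\n"
    "database_id\tdisease_name\tqualifier\thpo_id\n"
    "ORPHA:1899\tArthrochalasia Ehlers-Danlos syndrome\t\tHP:0000963\n"
    "ORPHA:1899\tArthrochalasia Ehlers-Danlos syndrome\t\tHP:0000974\n"
    "EXAMPLE:0001\tHypothetical disease\t\tHP:0000118\n"
)

terms = hpo_terms_for_disease(sample, "ORPHA:1899")
print(terms)  # ['HP:0000963', 'HP:0000974']
```

To run it on the downloaded file, pass open("input/hpoa.tsv", encoding="utf-8") instead of the in-memory sample.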
Rollup
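Next we will roll up annotations. We choose two representations of the same Ehlers-Danlos syndrome (EDS) concept, one from Orphanet and one from OMIM (note that we can provide as many diseases as we like).
We will use HPO terms roughly inspired by the OMIM clinical synopsis (https://www.omim.org/clinicalSynopsis/130060).
First, we list the genes linked through RO:0004003 (has material basis in germline mutation in) to any is-a descendant of EDS in Mondo, and save them to a file.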
[7]:
mondo labels .parents//p=RO:0004003 [ .desc//p=i EDS ] -O csv > output/EDS-genes.tsv
[9]:
!head output/EDS-genes.tsv
id label
HGNC:11976 TNXB
HGNC:1246 C1R
HGNC:1247 C1S
HGNC:17978 B3GALT6
HGNC:18625 FKBP14
HGNC:20859 SLC39A13
HGNC:21144 DSE
HGNC:218 ADAMTS2
HGNC:2188 COL12A1
[10]:
!runoak -i translator: normalize -M NCBIGene [ .parents//p=RO:0004003 [ .desc//p=i EDS ] ]
NotImplementedError