Mondo Phenotypes Example (IN PROGRESS)

Help Option

You can get help on any OAK command using the --help option.

[1]:
!runoak enrichment --help
Usage: runoak enrichment [OPTIONS] [TERMS]...

  Run class enrichment analysis.

  Given a sample file of identifiers (e.g. gene IDs), plus a set of
  associations (e.g. gene to term associations, return the terms that are
  over-represented in the sample set.

  Example:

      runoak -i sqlite:obo:uberon -g gene2anat.txt -G g2t enrichment -U my-genes.txt -O csv

  This runs an enrichment using Uberon on my-genes.txt, using the
  gene2anat.txt file as the association file (assuming simple gene-to-term
  format). The output is in CSV format.

  It is recommended you always provide a background set, including all the
  entity identifiers considered in the experiment.

  You can specify --filter-redundant to filter out redundant terms. This will
  block reporting of any terms that are either subsumed by or subsume a lower
  p-value term that is already reported.

  For a full example, see:

     https://github.com/INCATools/ontology-access-kit/blob/main/notebooks/Commands/Enrichment.ipynb

  Note that it is possible to run "pseudo-enrichments" on term lists only by
  passing no associations and using --ontology-only. This creates a fake
  association set that is simply reflexive relations between each term and
  itself. This can be useful for summarizing term lists, but note that
  P-values may not be meaningful.

Options:
  -o, --output FILENAME           Output file, e.g. obo file
  -p, --predicates TEXT           A comma-separated list of predicates. This
                                  may be a shorthand (i, p) or CURIE
  --autolabel / --no-autolabel    If set, results will automatically have
                                  labels assigned  [default: autolabel]
  -O, --output-type TEXT          Desired output type
  -o, --output FILENAME           Output file, e.g. obo file
  --ontology-only / --no-ontology-only
                                  If true, perform a pseudo-enrichment
                                  analysis treating each term as an
                                  association to itself.  [default: no-
                                  ontology-only]
  --cutoff FLOAT                  The cutoff for the p-value; any p-values
                                  greater than this are not reported.
                                  [default: 0.05]
  -U, --sample-file FILENAME      file containing input list of entity IDs
                                  (e.g. gene IDs)  [required]
  -B, --background-file FILENAME  file containing background list of entity
                                  IDs (e.g. gene IDs)
  --association-predicates TEXT   A comma-separated list of predicates for the
                                  association relation
  --filter-redundant / --no-filter-redundant
                                  If true, filter out redundant terms
  --allow-labels / --no-allow-labels
                                  If true, allow labels as well as CURIEs in
                                  the input files
  --help                          Show this message and exit.
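To make "over-represented" concrete, here is a minimal Python sketch of a one-sided hypergeometric test, which is one common choice for this kind of analysis. It is an illustration under stated assumptions, not OAK's implementation: the function names (hypergeom_sf, enrich) and the toy gene and term IDs are hypothetical, associations are given as a dict from entity ID to a set of term IDs, and no multiple-testing correction is applied. As with --cutoff, terms whose p-value is greater than the cutoff are not reported.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: population M, n annotated, N drawn."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrich(sample, background, assoc, cutoff=0.05):
    """Return (term, p_value) pairs with p_value <= cutoff, lowest first.

    sample and background are sets of entity IDs (sample within background);
    assoc maps each entity ID to the set of terms it is annotated with.
    """
    M, N = len(background), len(sample)
    terms = set().union(*(assoc.get(e, set()) for e in background))
    results = []
    for term in terms:
        n = sum(term in assoc.get(e, set()) for e in background)  # annotated in background
        k = sum(term in assoc.get(e, set()) for e in sample)      # annotated in sample
        if k == 0:
            continue  # a term absent from the sample cannot be over-represented
        p = hypergeom_sf(k, M, n, N)
        if p <= cutoff:
            results.append((term, p))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Toy data (hypothetical IDs): g1-g3 share T:1, the other genes have T:2.
    background = {f"g{i}" for i in range(1, 11)}
    assoc = {g: ({"T:1"} if g in {"g1", "g2", "g3"} else {"T:2"}) for g in background}
    print(enrich({"g1", "g2", "g3"}, background, assoc))
```

In the toy data all three sampled genes carry T:1, and only three of the ten background genes do, so T:1 gets p = 1/120 (about 0.0083) and is reported, while T:2 never appears in the sample.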

Download the example file and set up

We will use the HPO annotation file (phenotype.hpoa), which links diseases to HPO terms.

[3]:
!mkdir -p input output
!curl -L -s http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa > input/hpoa.tsv

Next, we will set up aliases for running OAK against HPO and Mondo:

[4]:
alias hp runoak -i sqlite:obo:hp
[5]:
alias mondo runoak -i sqlite:obo:mondo

Test this out by querying the associations for a particular Orphanet disease (ORPHA:1899, arthrochalasia Ehlers-Danlos syndrome).

We need to pass in the association file we downloaded (with -g), as well as specify its format (with -G):

[6]:
hp -G hpoa -g input/hpoa.tsv associations -Q subject ORPHA:1899 -O csv | head
subject predicate       object  object_label    property_values subject_label   predicate_label negated publications    evidence_type   supporting_objects      primary_knowledge_source        aggregator_knowledge_source     subject_closure subject_closure_label   object_closure  object_closure_label    comments
ORPHA:1899      None    HP:0000963      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0000974      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0001001      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0001252      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0001373      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0001385      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0001387      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0002300      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
ORPHA:1899      None    HP:0002381      None            Arthrochalasia Ehlers-Danlos syndrome   None    None            None            None    None
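For comparison, the same lookup can be sketched directly over the downloaded file in plain Python. This assumes the current HPOA layout: metadata lines starting with #, then a tab-separated header row with database_id, qualifier and hpo_id columns, where a qualifier of NOT marks a negated annotation. disease_phenotypes is a hypothetical helper, not part of OAK.

```python
import csv


def disease_phenotypes(hpoa_text: str, disease_id: str, include_negated: bool = False) -> list[str]:
    """Return the sorted HPO IDs annotated to disease_id in HPOA-formatted text.

    Assumes metadata lines start with '#' and are followed by a tab-separated
    header row with database_id, qualifier and hpo_id columns.
    """
    # Drop '#' metadata lines and blank lines; the first remaining line is the header.
    rows = [line for line in hpoa_text.splitlines() if line and not line.startswith("#")]
    hpo_ids = set()
    for row in csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["database_id"] != disease_id:
            continue
        # A qualifier of NOT marks a negated annotation (the phenotype is absent).
        if row.get("qualifier") == "NOT" and not include_negated:
            continue
        hpo_ids.add(row["hpo_id"])
    return sorted(hpo_ids)
```

For example, disease_phenotypes(open("input/hpoa.tsv").read(), "ORPHA:1899") lists the non-negated HPO terms recorded for ORPHA:1899, provided the file follows the layout assumed above.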

Rollup

Next, we will roll up annotations. We choose two representations of the same Ehlers-Danlos syndrome (EDS) concept, one from Orphanet and one from OMIM (any number of diseases can be provided).

We will use HPO terms roughly inspired by https://www.omim.org/clinicalSynopsis/130060
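One way to read "roll up" is to propagate each disease's HPO annotations to all ancestor terms (the true-path rule) and then count how many diseases fall under each term. The Python sketch below assumes that interpretation; the is_a hierarchy is a plain dict from each term to its parents, and the term and disease IDs in the example are hypothetical placeholders, not real HPO or Mondo identifiers.

```python
from collections import Counter


def ancestors(term, parents):
    """Reflexive ancestors of term; parents maps each term to its is_a parents."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roll_up(disease_terms, parents):
    """Count, per term, the diseases annotated to that term or any descendant."""
    counts = Counter()
    for terms in disease_terms.values():
        closure = set()
        for term in terms:
            closure |= ancestors(term, parents)
        counts.update(closure)  # a disease counts at most once per term
    return counts


if __name__ == "__main__":
    # Hypothetical toy hierarchy: B and C are children of A; D is a child of B.
    parents = {"B": {"A"}, "C": {"A"}, "D": {"B"}}
    diseases = {"d1": {"D", "C"}, "d2": {"B"}, "d3": {"C"}}
    print(roll_up(diseases, parents).most_common())
```

In the toy example, A is counted for all three diseases even though none is annotated to it directly, which is the point of rolling up to more general terms.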

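To get started, we collect the genes underlying EDS. In the query below, .desc//p=i EDS selects EDS and its descendants over is_a (p=i), and .parents//p=RO:0004003 then follows the RO:0004003 (has material basis in germline mutation in) relation from each of those diseases to its causal gene. The labels command writes the resulting IDs with their labels to output/EDS-genes.tsv.

[7]: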
mondo labels .parents//p=RO:0004003 [ .desc//p=i EDS ] -O csv > output/EDS-genes.tsv
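The query above composes two graph operations. As a rough mental model, here is a Python sketch over a toy list of (subject, predicate, object) edges: first collect the descendants of a disease over is_a (rdfs:subClassOf), then take the direct RO:0004003 objects of those diseases. The edge list and IDs are hypothetical, and treating the descendant step as reflexive is an assumption; OAK evaluates the real query against the Mondo SQLite database with its own implementation.

```python
IS_A = "rdfs:subClassOf"
HAS_GERMLINE_BASIS = "RO:0004003"  # has material basis in germline mutation in


def descendants(term, edges, predicate):
    """Reflexive descendants of term over edges (child, predicate, parent)."""
    result = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in edges:
            if pred == predicate and obj == current and subj not in result:
                result.add(subj)
                frontier.append(subj)
    return result


def direct_parents(terms, edges, predicate):
    """One-hop objects of edges with the given predicate, starting from any of terms."""
    return {obj for subj, pred, obj in edges if pred == predicate and subj in terms}


if __name__ == "__main__":
    # Hypothetical toy graph: two subtypes of D:eds, one nested subtype, one unrelated disease.
    edges = [
        ("D:sub1", IS_A, "D:eds"),
        ("D:sub2", IS_A, "D:eds"),
        ("D:sub1a", IS_A, "D:sub1"),
        ("D:sub1", HAS_GERMLINE_BASIS, "G:1"),
        ("D:sub1a", HAS_GERMLINE_BASIS, "G:2"),
        ("D:other", HAS_GERMLINE_BASIS, "G:3"),
    ]
    # Mirrors the shape of: .parents//p=RO:0004003 [ .desc//p=i EDS ]
    print(sorted(direct_parents(descendants("D:eds", edges, IS_A), edges, HAS_GERMLINE_BASIS)))
```

Here G:1 and G:2 are returned, while G:3 is excluded because its disease is not under D:eds.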
[9]:
!head output/EDS-genes.tsv
id      label
HGNC:11976      TNXB
HGNC:1246       C1R
HGNC:1247       C1S
HGNC:17978      B3GALT6
HGNC:18625      FKBP14
HGNC:20859      SLC39A13
HGNC:21144      DSE
HGNC:218        ADAMTS2
HGNC:2188       COL12A1
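To reuse this gene list outside OAK, the table can be loaded in Python. This minimal sketch assumes the file is tab-separated with an id and a label column, as the head output above suggests; read_labels_tsv is a hypothetical helper.

```python
import csv


def read_labels_tsv(text: str) -> dict[str, str]:
    """Map each ID to its label in a tab-separated table with id and label columns."""
    reader = csv.DictReader(text.splitlines(), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["id"]: row["label"] for row in reader}
```

For example, read_labels_tsv(open("output/EDS-genes.tsv").read()) gives a dict that maps HGNC:1246 to C1R, and its keys could be written one per line as a plain ID list.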
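We then try to map the gene IDs to NCBIGene IDs using the translator: adapter (normalize -M NCBIGene). As the output below shows, this currently fails with NotImplementedError.

[10]: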
!runoak -i translator: normalize -M NCBIGene [ .parents//p=RO:0004003 [ .desc//p=i EDS ] ]
NotImplementedError