OAK termset-similarity command

This notebook is intended as a supplement to the main OAK CLI docs.

This notebook provides examples for the termset-similarity command, which performs an aggregate comparison between two sets of terms (term profiles).

Use cases include:

  • comparing two genes based on their GO annotations, or their expression profiles (using Uberon)

  • comparing two patients based on their HPO annotations

  • comparing a patient’s HPO profile against a mouse allele’s MP profile, using PhenIO as a background

  • comparing two people based on their favorite bands

Note that this command isn’t aware of the actual associations themselves; it relies on you to assemble each profile.

The command is general and makes no assumptions about the ontology used. You can control which predicates are used in traversal with the -p option.
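To make the idea of an aggregate comparison concrete, here is a minimal toy sketch in Python. It is not OAK's implementation: the hierarchy, the term names, and the helper functions are all hypothetical, and it scores pairs with Jaccard similarity over ancestor sets purely for simplicity. It assumes the aggregate is the mean of each term's best match in the other set, taken in both directions.

```python
from statistics import mean

# Hypothetical toy hierarchy (child -> parents); not taken from any real ontology.
PARENTS = {
    "thrash metal": ["metal"],
    "doom metal": ["metal"],
    "metal": ["rock"],
    "punk": ["rock"],
    "rock": ["music"],
    "bebop": ["jazz"],
    "jazz": ["music"],
    "music": [],
}


def ancestors(term):
    """Return the term itself plus everything reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def jaccard(a, b):
    """Jaccard similarity of the two terms' ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def best_match_average(subjects, objects):
    """Mean of every term's best pairwise score against the other set."""
    subject_best = [max(jaccard(s, o) for o in objects) for s in subjects]
    object_best = [max(jaccard(o, s) for s in subjects) for o in objects]
    return mean(subject_best + object_best)


# Two people described by their favorite genres (hypothetical profiles).
profile_a = ["thrash metal", "bebop"]
profile_b = ["doom metal", "punk"]
score = best_match_average(profile_a, profile_b)
print(round(score, 2))  # approximately 0.45
```

The profiles are assembled by hand here, mirroring the point above that the command only ever sees the two term lists.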

Help Option

You can get help on any OAK command using --help.

[1]:
!runoak termset-similarity --help
Usage: runoak termset-similarity [OPTIONS] [TERMS]...

  Termset similarity.

  This calculates a similarity matrix for two sets of terms.

  Example:

      runoak -i go.db termset-similarity -p i,p nucleus membrane @ "nuclear
      membrane" vacuole -p i,p

  Python API:

     https://incatools.github.io/ontology-access-kit/interfaces/semantic-
     similarity

  Data model:

     https://w3id.org/oak/similarity

Options:
  -p, --predicates TEXT         A comma-separated list of predicates. This may
                                be a shorthand (i, p) or CURIE
  -o, --output FILENAME         Output file, e.g. obo file
  -O, --output-type TEXT        Desired output type
  --autolabel / --no-autolabel  If set, results will automatically have labels
                                assigned  [default: autolabel]
  --help                        Show this message and exit.

Set up an alias for HPO

[2]:
alias hp runoak -i sqlite:obo:hp

Compare two phenotype profiles

[3]:
hp termset-similarity "Abnormal liver lobulation" "Focal white matter lesions" @ "Diffuse hepatic steatosis" "Hypoplastic hippocampus"
subject_termset:
  HP:0100752:
    id: HP:0100752
    label: Abnormal liver lobulation
  HP:0007042:
    id: HP:0007042
    label: Focal white matter lesions
object_termset:
  HP:0006555:
    id: HP:0006555
    label: Diffuse hepatic steatosis
  HP:0025517:
    id: HP:0025517
    label: Hypoplastic hippocampus
subject_best_matches:
  HP:0007042:
    match_source: HP:0007042
    score: 6.775984316965229
    similarity:
      subject_id: HP:0007042
      object_id: HP:0025517
      ancestor_id: HP:0100547
      ancestor_label: Abnormal forebrain morphology
      ancestor_information_content: 6.775984316965229
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406499282814792
    match_source_label: Focal white matter lesions
    match_target: HP:0025517
    match_target_label: Hypoplastic hippocampus
  HP:0100752:
    match_source: HP:0100752
    score: 8.632074905566515
    similarity:
      subject_id: HP:0100752
      object_id: HP:0006555
      ancestor_id: HP:0410042
      ancestor_label: Abnormal liver morphology
      ancestor_information_content: 8.632074905566515
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775075096815554
    match_source_label: Abnormal liver lobulation
    match_target: HP:0006555
    match_target_label: Diffuse hepatic steatosis
object_best_matches:
  HP:0006555:
    match_source: HP:0006555
    score: 8.632074905566515
    similarity:
      subject_id: HP:0100752
      object_id: HP:0006555
      ancestor_id: HP:0410042
      ancestor_label: Abnormal liver morphology
      ancestor_information_content: 8.632074905566515
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775075096815554
    match_source_label: Diffuse hepatic steatosis
    match_target: HP:0100752
    match_target_label: Abnormal liver lobulation
  HP:0025517:
    match_source: HP:0025517
    score: 6.775984316965229
    similarity:
      subject_id: HP:0007042
      object_id: HP:0025517
      ancestor_id: HP:0100547
      ancestor_label: Abnormal forebrain morphology
      ancestor_information_content: 6.775984316965229
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406499282814792
    match_source_label: Hypoplastic hippocampus
    match_target: HP:0007042
    match_target_label: Focal white matter lesions
average_score: 7.704029611265872
best_score: 8.632074905566515
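As a sanity check on how the fields relate, the numbers in this output can be recomputed by hand. The sketch below copies values from the output above and assumes, rather than asserts from OAK's source, that average_score is the mean of the best-match scores, that best_score is their maximum, and that phenodigm_score is the square root of the ancestor information content times the Jaccard similarity. The printed values are consistent with those assumptions.

```python
import math

# Best-match scores copied from the output above (both directions).
best_match_scores = [
    6.775984316965229,  # HP:0007042 -> HP:0025517
    8.632074905566515,  # HP:0100752 -> HP:0006555
    8.632074905566515,  # HP:0006555 -> HP:0100752
    6.775984316965229,  # HP:0025517 -> HP:0007042
]

# Assumed aggregation: mean and maximum of the best-match scores.
average_score = sum(best_match_scores) / len(best_match_scores)
best_score = max(best_match_scores)


def phenodigm(information_content, jaccard_similarity):
    """Assumed relation: geometric mean of IC and Jaccard similarity."""
    return math.sqrt(information_content * jaccard_similarity)


forebrain = phenodigm(6.775984316965229, 0.5)  # ancestor HP:0100547
liver = phenodigm(8.632074905566515, 0.5)      # ancestor HP:0410042

print(average_score, best_score)
print(forebrain, liver)
```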

Faster comparisons using Rust

OAK can use semsimian, a more efficient semantic similarity implementation, under the hood. To enable it, prefix the input selector with semsimian:

[4]:
!runoak -i semsimian:sqlite:obo:hp termset-similarity -p i "Abnormal liver lobulation" "Focal white matter lesions" @ "Diffuse hepatic steatosis" "Hypoplastic hippocampus"
[00:00:00] Building (all subjects X all objects) pairwise similarity: ████████████████████████████████████████ 100%
WARNING:root:Adding labels not yet implemented in SemsimianImplementation.
subject_termset:
  HP:0007042:
    id: HP:0007042
    label: Focal white matter lesions
  HP:0100752:
    id: HP:0100752
    label: Abnormal liver lobulation
object_termset:
  HP:0025517:
    id: HP:0025517
    label: Hypoplastic hippocampus
  HP:0006555:
    id: HP:0006555
    label: Diffuse hepatic steatosis
subject_best_matches:
  HP:0007042:
    match_source: HP:0007042
    score: 6.7759382869726945
    similarity:
      subject_id: HP:0007042
      object_id: HP:0025517
      ancestor_id: HP:0100547
      ancestor_label: ''
      ancestor_information_content: 6.7759382869726945
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406436764040854
    match_source_label: Focal white matter lesions
    match_target: HP:0025517
    match_target_label: Hypoplastic hippocampus
  HP:0100752:
    match_source: HP:0100752
    score: 8.632028875573981
    similarity:
      subject_id: HP:0100752
      object_id: HP:0006555
      ancestor_id: HP:0410042
      ancestor_label: ''
      ancestor_information_content: 8.632028875573981
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775019705855855
    match_source_label: Abnormal liver lobulation
    match_target: HP:0006555
    match_target_label: Diffuse hepatic steatosis
object_best_matches:
  HP:0006555:
    match_source: HP:0006555
    score: 8.632028875573981
    similarity:
      subject_id: HP:0006555
      object_id: HP:0100752
      ancestor_id: HP:0410042
      ancestor_label: ''
      ancestor_information_content: 8.632028875573981
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775019705855855
    match_source_label: Diffuse hepatic steatosis
    match_target: HP:0100752
    match_target_label: Abnormal liver lobulation
  HP:0025517:
    match_source: HP:0025517
    score: 6.7759382869726945
    similarity:
      subject_id: HP:0025517
      object_id: HP:0007042
      ancestor_id: HP:0100547
      ancestor_label: ''
      ancestor_information_content: 6.7759382869726945
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406436764040854
    match_source_label: Hypoplastic hippocampus
    match_target: HP:0007042
    match_target_label: Focal white matter lesions
average_score: 7.703983581273338
best_score: 8.632028875573981
metric: ancestor_information_content
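The two runs pick the same best matches and common ancestors, but the scores differ slightly in the later decimal places. The sketch below simply measures that gap using values copied from the two outputs above. It makes no claim about why the implementations differ, and the dictionary keys are labels chosen here for illustration.

```python
# Scores copied from the two outputs above: (default implementation, semsimian).
paired_scores = {
    "HP:0007042 best match": (6.775984316965229, 6.7759382869726945),
    "HP:0100752 best match": (8.632074905566515, 8.632028875573981),
    "average_score": (7.704029611265872, 7.703983581273338),
}

# Absolute difference between the two implementations for each score.
differences = {
    name: abs(default - semsimian)
    for name, (default, semsimian) in paired_scores.items()
}
max_difference = max(differences.values())

for name, diff in differences.items():
    print(f"{name}: {diff:.2e}")
```

If you compare results across the two backends programmatically, a tolerance-based comparison is therefore safer than exact equality.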