OAK termset-similarity command
This notebook supplements the main OAK CLI docs.
It provides examples for the termset-similarity
command, which performs an aggregate comparison between two sets of terms (term profiles).
Use cases include:
comparing two genes based on their GO annotations, or their expression profiles (using Uberon)
comparing two patients based on their HPO annotations
comparing a patient’s HPO profile against a mouse allele using its MP profile, using PhenIO as a background
comparing two people based on their favorite bands
Note that this command isn’t aware of the actual associations themselves - it relies on you to assemble the profile.
The command is general and doesn’t make any assumptions about the ontology used. The user can control which predicates to use in traversal.
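Because assembling the profiles is left to you, it can help to build the command line programmatically. The following is a minimal sketch, assuming each profile is already available as a list of term labels or IDs; the helper name build_termset_similarity_command is hypothetical and not part of OAK. It relies only on the CLI conventions used in this notebook: the two term sets are separated by `@`, and predicates are passed with `-p` as a comma-separated list.

```python
import shlex


def build_termset_similarity_command(subjects, objects,
                                     source="sqlite:obo:hp",
                                     predicates=None):
    """Assemble an argument list for runoak termset-similarity.

    Hypothetical helper for illustration; it only builds the arguments
    and does not call OAK itself.
    """
    args = ["runoak", "-i", source, "termset-similarity"]
    if predicates:
        # Predicates are given as a comma-separated list, e.g. "i,p"
        args += ["-p", ",".join(predicates)]
    # The "@" token separates the first term set from the second
    args += list(subjects) + ["@"] + list(objects)
    return args


command = build_termset_similarity_command(
    ["Abnormal liver lobulation", "Focal white matter lesions"],
    ["Diffuse hepatic steatosis", "Hypoplastic hippocampus"],
)
# shlex.join quotes multi-word labels so the line can be pasted into a shell
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run if you want to execute it from Python rather than from a shell.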
Help Option
You can get help on any OAK command using --help
[1]:
!runoak termset-similarity --help
Usage: runoak termset-similarity [OPTIONS] [TERMS]...
Termset similarity.
This calculates a similarity matrix for two sets of terms.
Example:
runoak -i go.db termset-similarity -p i,p nucleus membrane @ "nuclear
membrane" vacuole -p i,p
Python API:
https://incatools.github.io/ontology-access-kit/interfaces/semantic-
similarity
Data model:
https://w3id.org/oak/similarity
Options:
-p, --predicates TEXT A comma-separated list of predicates. This may
be a shorthand (i, p) or CURIE
-o, --output FILENAME Output file, e.g. obo file
-O, --output-type TEXT Desired output type
--autolabel / --no-autolabel If set, results will automatically have labels
assigned [default: autolabel]
--help Show this message and exit.
Set up an alias for HPO
[2]:
alias hp runoak -i sqlite:obo:hp
Compare two phenotype profiles
[3]:
hp termset-similarity "Abnormal liver lobulation" "Focal white matter lesions" @ "Diffuse hepatic steatosis" "Hypoplastic hippocampus"
subject_termset:
HP:0100752:
id: HP:0100752
label: Abnormal liver lobulation
HP:0007042:
id: HP:0007042
label: Focal white matter lesions
object_termset:
HP:0006555:
id: HP:0006555
label: Diffuse hepatic steatosis
HP:0025517:
id: HP:0025517
label: Hypoplastic hippocampus
subject_best_matches:
HP:0007042:
match_source: HP:0007042
score: 6.775984316965229
similarity:
subject_id: HP:0007042
object_id: HP:0025517
ancestor_id: HP:0100547
ancestor_label: Abnormal forebrain morphology
ancestor_information_content: 6.775984316965229
jaccard_similarity: 0.5
phenodigm_score: 1.8406499282814792
match_source_label: Focal white matter lesions
match_target: HP:0025517
match_target_label: Hypoplastic hippocampus
HP:0100752:
match_source: HP:0100752
score: 8.632074905566515
similarity:
subject_id: HP:0100752
object_id: HP:0006555
ancestor_id: HP:0410042
ancestor_label: Abnormal liver morphology
ancestor_information_content: 8.632074905566515
jaccard_similarity: 0.5
phenodigm_score: 2.0775075096815554
match_source_label: Abnormal liver lobulation
match_target: HP:0006555
match_target_label: Diffuse hepatic steatosis
object_best_matches:
HP:0006555:
match_source: HP:0006555
score: 8.632074905566515
similarity:
subject_id: HP:0100752
object_id: HP:0006555
ancestor_id: HP:0410042
ancestor_label: Abnormal liver morphology
ancestor_information_content: 8.632074905566515
jaccard_similarity: 0.5
phenodigm_score: 2.0775075096815554
match_source_label: Diffuse hepatic steatosis
match_target: HP:0100752
match_target_label: Abnormal liver lobulation
HP:0025517:
match_source: HP:0025517
score: 6.775984316965229
similarity:
subject_id: HP:0007042
object_id: HP:0025517
ancestor_id: HP:0100547
ancestor_label: Abnormal forebrain morphology
ancestor_information_content: 6.775984316965229
jaccard_similarity: 0.5
phenodigm_score: 1.8406499282814792
match_source_label: Hypoplastic hippocampus
match_target: HP:0007042
match_target_label: Focal white matter lesions
average_score: 7.704029611265872
best_score: 8.632074905566515
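The summary fields can be related back to the per-term best matches. The sketch below is inferred from the numbers in the output above rather than taken from the OAK documentation: it assumes that average_score is the mean of the best-match scores across both directions, that best_score is their maximum, and that phenodigm_score is the square root of jaccard_similarity times ancestor_information_content.

```python
import math

# (ancestor_information_content, jaccard_similarity) per subject term,
# copied from the output above
best_matches = {
    "HP:0007042": (6.775984316965229, 0.5),
    "HP:0100752": (8.632074905566515, 0.5),
}

# In this example the object-side best matches carry the same scores as
# the subject-side ones, so each score is counted twice
scores = [ic for ic, _ in best_matches.values()] * 2

average_score = sum(scores) / len(scores)
best_score = max(scores)

# Assumed relationship: phenodigm = sqrt(jaccard * information content)
phenodigm = {
    term: math.sqrt(jaccard * ic)
    for term, (ic, jaccard) in best_matches.items()
}

print("average_score:", average_score)
print("best_score:", best_score)
for term, value in phenodigm.items():
    print(term, "phenodigm_score:", value)
```

If these assumptions hold, the printed values should agree with the reported average_score, best_score, and phenodigm_score fields up to floating-point rounding.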
Faster comparisons using Rust
OAK can use semsimian, a Rust-based implementation, to perform semantic similarity calculations more efficiently under the hood.
[4]:
!runoak -i semsimian:sqlite:obo:hp termset-similarity -p i "Abnormal liver lobulation" "Focal white matter lesions" @ "Diffuse hepatic steatosis" "Hypoplastic hippocampus"
[00:00:00] Building (all subjects X all objects) pairwise similarity: ████████████████████████████████████████ 100%
WARNING:root:Adding labels not yet implemented in SemsimianImplementation.
subject_termset:
HP:0007042:
id: HP:0007042
label: Focal white matter lesions
HP:0100752:
id: HP:0100752
label: Abnormal liver lobulation
object_termset:
HP:0025517:
id: HP:0025517
label: Hypoplastic hippocampus
HP:0006555:
id: HP:0006555
label: Diffuse hepatic steatosis
subject_best_matches:
HP:0007042:
match_source: HP:0007042
score: 6.7759382869726945
similarity:
subject_id: HP:0007042
object_id: HP:0025517
ancestor_id: HP:0100547
ancestor_label: ''
ancestor_information_content: 6.7759382869726945
jaccard_similarity: 0.5
phenodigm_score: 1.8406436764040854
match_source_label: Focal white matter lesions
match_target: HP:0025517
match_target_label: Hypoplastic hippocampus
HP:0100752:
match_source: HP:0100752
score: 8.632028875573981
similarity:
subject_id: HP:0100752
object_id: HP:0006555
ancestor_id: HP:0410042
ancestor_label: ''
ancestor_information_content: 8.632028875573981
jaccard_similarity: 0.5
phenodigm_score: 2.0775019705855855
match_source_label: Abnormal liver lobulation
match_target: HP:0006555
match_target_label: Diffuse hepatic steatosis
object_best_matches:
HP:0006555:
match_source: HP:0006555
score: 8.632028875573981
similarity:
subject_id: HP:0006555
object_id: HP:0100752
ancestor_id: HP:0410042
ancestor_label: ''
ancestor_information_content: 8.632028875573981
jaccard_similarity: 0.5
phenodigm_score: 2.0775019705855855
match_source_label: Diffuse hepatic steatosis
match_target: HP:0100752
match_target_label: Abnormal liver lobulation
HP:0025517:
match_source: HP:0025517
score: 6.7759382869726945
similarity:
subject_id: HP:0025517
object_id: HP:0007042
ancestor_id: HP:0100547
ancestor_label: ''
ancestor_information_content: 6.7759382869726945
jaccard_similarity: 0.5
phenodigm_score: 1.8406436764040854
match_source_label: Hypoplastic hippocampus
match_target: HP:0007042
match_target_label: Focal white matter lesions
average_score: 7.703983581273338
best_score: 8.632028875573981
metric: ancestor_information_content
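The scores from the two runs are close but not identical. As a quick check, the sketch below compares the values copied from the two outputs in this notebook and reports the relative difference for each; the variable names are illustrative, and no claim is made here about why the values differ.

```python
# Scores copied from the two runs above: (default adapter, semsimian)
reported = {
    "HP:0007042 score": (6.775984316965229, 6.7759382869726945),
    "HP:0100752 score": (8.632074905566515, 8.632028875573981),
    "average_score": (7.704029611265872, 7.703983581273338),
    "best_score": (8.632074905566515, 8.632028875573981),
}

# Relative difference of the semsimian value with respect to the default one
relative_diffs = {
    name: abs(default - semsimian) / default
    for name, (default, semsimian) in reported.items()
}

for name, diff in relative_diffs.items():
    print(f"{name}: relative difference {diff:.2e}")
```

For these profiles the term pairings and the common ancestors are the same in both runs; only the numeric scores shift slightly.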