{ "cells": [ { "cell_type": "markdown", "id": "658e80e6-0ce8-4576-97ed-0d95f245748a", "metadata": {}, "source": [ "# OAK termset-similarity command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `termset-similarity` command, which performs an aggregate comparison between\n", "two sets of terms (term profiles).\n", "\n", "Use cases include:\n", "\n", "- comparing two genes based on their GO annotations, or their expression profiles (using Uberon)\n", "- comparing two patients based on their HPO annotations\n", "- comparing a patient's HPO profile against a mouse allele using its MP profile, using PhenIO as a background\n", "- comparing two people based on their favorite bands\n", "\n", "Note that this command isn't aware of the associations themselves; it relies on you to assemble each profile.\n", "\n", "The command is general and makes no assumptions about the ontology used. The user can control which predicates to use in traversal.\n", "\n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`." ] }, { "cell_type": "code", "execution_count": 1, "id": "a89d8429-5175-4eec-a15f-b9920e949831", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak termset-similarity [OPTIONS] [TERMS]...\n", "\n", " Termset similarity.\n", "\n", " This calculates a similarity matrix for two sets of terms.\n", "\n", " Example:\n", "\n", " runoak -i go.db termset-similarity -p i,p nucleus membrane @ \"nuclear\n", " membrane\" vacuole -p i,p\n", "\n", " Python API:\n", "\n", " https://incatools.github.io/ontology-access-kit/interfaces/semantic-\n", " similarity\n", "\n", " Data model:\n", "\n", " https://w3id.org/oak/similarity\n", "\n", "Options:\n", " -p, --predicates TEXT A comma-separated list of predicates. 
This may\n", " be a shorthand (i, p) or CURIE\n", " -o, --output FILENAME Output file, e.g. obo file\n", " -O, --output-type TEXT Desired output type\n", " --autolabel / --no-autolabel If set, results will automatically have labels\n", " assigned [default: autolabel]\n", " --help Show this message and exit.\n" ] } ], "source": [ "!runoak termset-similarity --help" ] }, { "cell_type": "markdown", "id": "e2645351-06f6-4c89-9125-b38a682adb39", "metadata": {}, "source": [ "## Set up an alias for HPO" ] }, { "cell_type": "code", "execution_count": 2, "id": "c87fdb82", "metadata": {}, "outputs": [], "source": [ "alias hp runoak -i sqlite:obo:hp" ] }, { "cell_type": "markdown", "id": "ba203aeb-d935-49b6-b5bf-b27b55d40f2a", "metadata": {}, "source": [ "## Compare two phenotype profiles" ] }, { "cell_type": "code", "execution_count": 3, "id": "70661358", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject_termset:\n", " HP:0100752:\n", " id: HP:0100752\n", " label: Abnormal liver lobulation\n", " HP:0007042:\n", " id: HP:0007042\n", " label: Focal white matter lesions\n", "object_termset:\n", " HP:0006555:\n", " id: HP:0006555\n", " label: Diffuse hepatic steatosis\n", " HP:0025517:\n", " id: HP:0025517\n", " label: Hypoplastic hippocampus\n", "subject_best_matches:\n", " HP:0007042:\n", " match_source: HP:0007042\n", " score: 6.775984316965229\n", " similarity:\n", " subject_id: HP:0007042\n", " object_id: HP:0025517\n", " ancestor_id: HP:0100547\n", " ancestor_label: Abnormal forebrain morphology\n", " ancestor_information_content: 6.775984316965229\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 1.8406499282814792\n", " match_source_label: Focal white matter lesions\n", " match_target: HP:0025517\n", " match_target_label: Hypoplastic hippocampus\n", " HP:0100752:\n", " match_source: HP:0100752\n", " score: 8.632074905566515\n", " similarity:\n", " subject_id: HP:0100752\n", " object_id: HP:0006555\n", " ancestor_id: 
HP:0410042\n", " ancestor_label: Abnormal liver morphology\n", " ancestor_information_content: 8.632074905566515\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 2.0775075096815554\n", " match_source_label: Abnormal liver lobulation\n", " match_target: HP:0006555\n", " match_target_label: Diffuse hepatic steatosis\n", "object_best_matches:\n", " HP:0006555:\n", " match_source: HP:0006555\n", " score: 8.632074905566515\n", " similarity:\n", " subject_id: HP:0100752\n", " object_id: HP:0006555\n", " ancestor_id: HP:0410042\n", " ancestor_label: Abnormal liver morphology\n", " ancestor_information_content: 8.632074905566515\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 2.0775075096815554\n", " match_source_label: Diffuse hepatic steatosis\n", " match_target: HP:0100752\n", " match_target_label: Abnormal liver lobulation\n", " HP:0025517:\n", " match_source: HP:0025517\n", " score: 6.775984316965229\n", " similarity:\n", " subject_id: HP:0007042\n", " object_id: HP:0025517\n", " ancestor_id: HP:0100547\n", " ancestor_label: Abnormal forebrain morphology\n", " ancestor_information_content: 6.775984316965229\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 1.8406499282814792\n", " match_source_label: Hypoplastic hippocampus\n", " match_target: HP:0007042\n", " match_target_label: Focal white matter lesions\n", "average_score: 7.704029611265872\n", "best_score: 8.632074905566515\n" ] } ], "source": [ "hp termset-similarity \"Abnormal liver lobulation\" \"Focal white matter lesions\" @ \"Diffuse hepatic steatosis\" \"Hypoplastic hippocampus\"" ] }, { "cell_type": "markdown", "id": "e9839eb8-9dbb-445b-b0d5-168ad92b38b0", "metadata": {}, "source": [ "## Faster comparisons using Rust\n", "\n", "OAK can use semsimian as a more efficient semantic similarity implementation under the hood. To enable it, prefix the input selector with `semsimian:`." ] }, { "cell_type": "code", "execution_count": 4, "id": "8838921c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"WARNING:root:Adding labels not yet implemented in SemsimianImplementation.\n", "subject_termset:\n", " HP:0007042:\n", " id: HP:0007042\n", " label: Focal white matter lesions\n", " HP:0100752:\n", " id: HP:0100752\n", " label: Abnormal liver lobulation\n", "object_termset:\n", " HP:0025517:\n", " id: HP:0025517\n", " label: Hypoplastic hippocampus\n", " HP:0006555:\n", " id: HP:0006555\n", " label: Diffuse hepatic steatosis\n", "subject_best_matches:\n", " HP:0007042:\n", " match_source: HP:0007042\n", " score: 6.7759382869726945\n", " similarity:\n", " subject_id: HP:0007042\n", " object_id: HP:0025517\n", " ancestor_id: HP:0100547\n", " ancestor_label: ''\n", " ancestor_information_content: 6.7759382869726945\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 1.8406436764040854\n", " match_source_label: Focal white matter lesions\n", " match_target: HP:0025517\n", " match_target_label: Hypoplastic hippocampus\n", " HP:0100752:\n", " match_source: HP:0100752\n", " score: 8.632028875573981\n", " similarity:\n", " subject_id: HP:0100752\n", " object_id: HP:0006555\n", " ancestor_id: HP:0410042\n", " ancestor_label: ''\n", " ancestor_information_content: 8.632028875573981\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 2.0775019705855855\n", " match_source_label: Abnormal liver lobulation\n", " match_target: HP:0006555\n", " match_target_label: Diffuse hepatic steatosis\n", "object_best_matches:\n", " HP:0006555:\n", " match_source: HP:0006555\n", " score: 8.632028875573981\n", " similarity:\n", " subject_id: HP:0006555\n", " object_id: HP:0100752\n", " ancestor_id: HP:0410042\n", " ancestor_label: ''\n", " ancestor_information_content: 
8.632028875573981\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 2.0775019705855855\n", " match_source_label: Diffuse hepatic steatosis\n", " match_target: HP:0100752\n", " match_target_label: Abnormal liver lobulation\n", " HP:0025517:\n", " match_source: HP:0025517\n", " score: 6.7759382869726945\n", " similarity:\n", " subject_id: HP:0025517\n", " object_id: HP:0007042\n", " ancestor_id: HP:0100547\n", " ancestor_label: ''\n", " ancestor_information_content: 6.7759382869726945\n", " jaccard_similarity: 0.5\n", " phenodigm_score: 1.8406436764040854\n", " match_source_label: Hypoplastic hippocampus\n", " match_target: HP:0007042\n", " match_target_label: Focal white matter lesions\n", "average_score: 7.703983581273338\n", "best_score: 8.632028875573981\n", "metric: ancestor_information_content\n" ] } ], "source": [ "!runoak -i semsimian:sqlite:obo:hp termset-similarity -p i \"Abnormal liver lobulation\" \"Focal white matter lesions\" @ \"Diffuse hepatic steatosis\" \"Hypoplastic hippocampus\"" ] }, { "cell_type": "code", "execution_count": null, "id": "0f43896e", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }