{ "cells": [ { "cell_type": "markdown", "id": "f73ba39a", "metadata": {}, "source": [ "# COMPLoinc Example\n", "\n", "See https://github.com/loinc/comp-loinc for a description of the COMPLoinc project. The project\n", "creates an OWL version of LOINC that can be used in OAK to explore relationships between codes\n", "and components.\n", "\n", "Currently this notebook relies almost entirely on the command line functionality of OAK. It\n", "should therefore be accessible to non-programmers (although it helps to have\n", "a good understanding of the command line, and some advanced OAK query concepts are introduced).\n", "\n", "In the future we may extend this notebook with Python examples." ] }, { "cell_type": "markdown", "id": "38d0f1e1", "metadata": {}, "source": [ "## Creating an alias\n", "\n", "First we create an alias `comploinc` for the [runoak command](https://incatools.github.io/ontology-access-kit/cli.html#runoak), using a sqlite selector for the comploinc resource." ] }, { "cell_type": "code", "execution_count": 1, "id": "75a54fa8", "metadata": {}, "outputs": [], "source": [ "%alias comploinc runoak -i sqlite:obo:comploinc" ] }, { "cell_type": "markdown", "id": "15e0d701", "metadata": {}, "source": [ "## Basic Lookup\n", "\n", "The `info` command looks up a given entity or set of entities. These can be specified as\n", "lists of CURIEs or labels on the command line after the `info` command:" ] }, { "cell_type": "code", "execution_count": 2, "id": "9a4d62f0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:11145-0 ! 
5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:\r\n" ] } ], "source": [ "comploinc info loinc:11145-0" ] }, { "cell_type": "markdown", "id": "a301dee8", "metadata": {}, "source": [ "## Graph Visualization\n", "\n", "Here we show the `viz` command with a single term (multiple terms can be passed, but we illustrate\n", "with one for now)" ] }, { "cell_type": "code", "execution_count": 7, "id": "8754a950", "metadata": {}, "outputs": [], "source": [ "comploinc viz loinc:11145-0 -o output/loinc-11145-0.png" ] }, { "cell_type": "markdown", "id": "886678a3", "metadata": {}, "source": [ "![img](output/loinc-11145-0.png)" ] }, { "cell_type": "markdown", "id": "5d176d09", "metadata": {}, "source": [ "## Lookup by label\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "878fe6fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:LP14693-3 ! Serotonin\r\n" ] } ], "source": [ "comploinc info Serotonin" ] }, { "cell_type": "code", "execution_count": 5, "id": "7e67e216", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:LP14693-3 ! Serotonin\r\n", "loinc:LP15097-6 ! 5-Hydroxyindoleacetate\r\n", "loinc:LP36806-5 ! 5-Hydroxyindoleacetate & Creatinine\r\n" ] } ], "source": [ "comploinc descendants -p i Serotonin" ] }, { "cell_type": "code", "execution_count": 6, "id": "f35df0e3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:\r\n", "loinc:12172-3 ! 5-Hydroxyindoleacetate:PrThr:24H:Urine:Ord:\r\n", "loinc:14573-0 ! 5-Hydroxyindoleacetate:SRat:24H:Urine:Qn:\r\n", "loinc:14910-4 ! Serotonin:SCnc:Pt:Ser:Qn:\r\n", "loinc:15009-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Urine:Qn:\r\n", "loinc:1692-3 ! 5-Hydroxyindoleacetate:MCnc:Pt:CSF:Qn:\r\n", "loinc:1693-1 ! 5-Hydroxyindoleacetate:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:\r\n", "loinc:1695-6 ! 
5-Hydroxyindoleacetate:MRat:24H:Urine:Qn:\r\n", "loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:\r\n", "loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:\r\n", "loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:\r\n", "loinc:25524-0 ! Serotonin:SCnc:Pt:Bld:Qn:\r\n", "loinc:25971-3 ! 5-Hydroxyindoleacetate:SCnc:24H:Urine:Qn:\r\n", "loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:\r\n", "loinc:26035-6 ! Serotonin:SCnc:Pt:Plas:Qn:\r\n", "loinc:27057-9 ! Serotonin:MCnc:Pt:Ser:Qn:\r\n", "loinc:2939-7 ! Serotonin:MCnc:Pt:Bld:Qn:\r\n", "loinc:2940-5 ! Serotonin:MCnc:Pt:Plas:Qn:\r\n", "loinc:2941-3 ! Serotonin:MCnc:Pt:Platelets:Qn:\r\n", "loinc:29520-4 ! 5-Hydroxyindoleacetate/Creatinine:SRto:Pt:Urine:Qn:\r\n", "loinc:31203-3 ! 5-Hydroxyindoleacetate:MCnc:24H:Urine:Qn:\r\n", "loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:\r\n", "loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:\r\n", "loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:\r\n", "loinc:42671-8 ! Serotonin:EntMass:Pt:Platelets:Qn:\r\n", "loinc:44288-9 ! 5-Hydroxyindoleacetate/Creatinine:MRto:24H:Urine:Qn:\r\n", "loinc:44909-0 ! 5-Hydroxyindoleacetate & Creatinine:Imp:Pt:Urine:Nom:\r\n", "loinc:47544-2 ! 5-Hydroxyindoleacetate:SCnc:Pt:CSF:Qn:\r\n", "loinc:47545-9 ! 5-Hydroxyindoleacetate/Creatinine:SRto:24H:Urine:Qn:\r\n", "loinc:48168-9 ! 5-Hydroxyindoleacetate:PrThr:Pt:Urine:Ord:\r\n", "loinc:50149-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:\r\n", "loinc:71804-9 ! Serotonin:EntSub:Pt:Platelets:Qn:\r\n", "loinc:74769-1 ! 5-Hydroxyindoleacetate:SCnc:Pt:PRP:Qn:\r\n", "loinc:LP14693-3 ! Serotonin\r\n", "loinc:LP15097-6 ! 5-Hydroxyindoleacetate\r\n", "loinc:LP36806-5 ! 
5-Hydroxyindoleacetate & Creatinine\r\n" ] } ], "source": [ "comploinc descendants -p i,loinc:hasComponent Serotonin" ] }, { "cell_type": "markdown", "id": "efdd8b25", "metadata": {}, "source": [ "## Text Search" ] }, { "cell_type": "code", "execution_count": 7, "id": "23639579", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:LP14693-3 ! Serotonin\r\n", "loinc:14910-4 ! Serotonin:SCnc:Pt:Ser:Qn:\r\n", "loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:\r\n", "loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:\r\n", "loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:\r\n", "loinc:25524-0 ! Serotonin:SCnc:Pt:Bld:Qn:\r\n", "loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:\r\n", "loinc:26035-6 ! Serotonin:SCnc:Pt:Plas:Qn:\r\n", "loinc:27057-9 ! Serotonin:MCnc:Pt:Ser:Qn:\r\n", "loinc:2939-7 ! Serotonin:MCnc:Pt:Bld:Qn:\r\n", "loinc:2940-5 ! Serotonin:MCnc:Pt:Plas:Qn:\r\n", "loinc:2941-3 ! Serotonin:MCnc:Pt:Platelets:Qn:\r\n", "loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:\r\n", "loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:\r\n", "loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:\r\n", "loinc:42671-8 ! Serotonin:EntMass:Pt:Platelets:Qn:\r\n", "loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:\r\n", "loinc:71804-9 ! Serotonin:EntSub:Pt:Platelets:Qn:\r\n" ] } ], "source": [ "comploinc info l~Serotonin" ] }, { "cell_type": "markdown", "id": "90d8522e", "metadata": {}, "source": [ "## Boolean Graph Queries\n", "\n", "Find all codes that have a component of \"Serotonin\" (or one of its descendants) and have a system of \"Urine\" (or one of its descendants):" ] }, { "cell_type": "code", "execution_count": 8, "id": "d8a5d547", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:15009-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Urine:Qn:\n", "loinc:1695-6 ! 5-Hydroxyindoleacetate:MRat:24H:Urine:Qn:\n", "loinc:44909-0 ! 5-Hydroxyindoleacetate & Creatinine:Imp:Pt:Urine:Nom:\n", "loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:\n", "loinc:25971-3 ! 
5-Hydroxyindoleacetate:SCnc:24H:Urine:Qn:\n", "loinc:31203-3 ! 5-Hydroxyindoleacetate:MCnc:24H:Urine:Qn:\n", "loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:\n", "loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:\n", "loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:\n", "loinc:29520-4 ! 5-Hydroxyindoleacetate/Creatinine:SRto:Pt:Urine:Qn:\n", "loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:\n", "loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:\n", "loinc:44288-9 ! 5-Hydroxyindoleacetate/Creatinine:MRto:24H:Urine:Qn:\n", "loinc:12172-3 ! 5-Hydroxyindoleacetate:PrThr:24H:Urine:Ord:\n", "loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:\n", "loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:\n", "loinc:48168-9 ! 5-Hydroxyindoleacetate:PrThr:Pt:Urine:Ord:\n", "loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:\n", "loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:\n", "loinc:47545-9 ! 5-Hydroxyindoleacetate/Creatinine:SRto:24H:Urine:Qn:\n", "loinc:14573-0 ! 5-Hydroxyindoleacetate:SRat:24H:Urine:Qn:\n" ] } ], "source": [ "comploinc info .descendant//p=i,loinc:hasComponent Serotonin .and .descendant//p=i,loinc:hasSystem Urine" ] }, { "cell_type": "markdown", "id": "873a3651", "metadata": {}, "source": [ "## Semantic Similarity (Term Wise)" ] }, { "cell_type": "code", "execution_count": 16, "id": "acbf1cc6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak similarity [OPTIONS] [TERMS]...\r\n", "\r\n", " All by all similarity\r\n", "\r\n", " This calculates a similarity matrix for two sets of terms.\r\n", "\r\n", " Input sets of a terms can be specified in different ways:\r\n", "\r\n", " - via a file - via explicit lists of terms or queries\r\n", "\r\n", " Example:\r\n", "\r\n", " runoak -i hp.db all-similarity -p i --set1-file HPO-TERMS1 --set2-file\r\n", " HPO-TERMS2 -O csv\r\n", "\r\n", " This will compare every term in TERMS1 vs TERMS2\r\n", "\r\n", " Alternatively standard OAK term queries can be used, with \"@\" 
separating the\r\n", " two lists\r\n", "\r\n", " Example:\r\n", "\r\n", " runoak -i hp.db all-similarity -p i TERM_1 TERM_2 ... TERM_N @ TERM_N+1\r\n", " ... TERM_M\r\n", "\r\n", " The .all term syntax can be used to select all terms in an ontology\r\n", "\r\n", " Example:\r\n", "\r\n", " runoak -i ma.db all-similarity -p i,p .all @ .all\r\n", "\r\n", " This can be mixed with other term selectors; for example to calculate the\r\n", " similarity of \"neuron\" vs all terms in CL:\r\n", "\r\n", " runoak -i cl.db all-similarity -p i,p .all @ neuron\r\n", "\r\n", " An example pipeline to do all by all over all phenotypes in HPO:\r\n", "\r\n", " Explicit:\r\n", "\r\n", " runoak -i hp.db descendants -p i HP:0000118 > HPO runoak -i hp.db\r\n", " all-similarity -p i --set1-file HPO --set2-file HPO -O csv -o\r\n", " RESULTS.tsv\r\n", "\r\n", " The same thing can be done more compactly with term queries:\r\n", "\r\n", " runoak -i hp.db all-similarity -p i .desc//p=i HP:0000118 @ .desc//p=i\r\n", " HP:0000118\r\n", "\r\n", "Options:\r\n", " -p, --predicates TEXT A comma-separated list of predicates\r\n", " --set1-file TEXT ID file for set1\r\n", " --set2-file TEXT ID file for set2\r\n", " --jaccard-minimum FLOAT Minimum value for jaccard score\r\n", " --ic-minimum FLOAT Minimum value for information content\r\n", " -o, --output TEXT path to output\r\n", " --main-score-field TEXT Score used for summarization [default:\r\n", " phenodigm_score]\r\n", " --autolabel / --no-autolabel If set, results will automatically have labels\r\n", " assigned [default: autolabel]\r\n", " -O, --output-type TEXT Desired output type\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "comploinc similarity --help" ] }, { "cell_type": "code", "execution_count": 15, "id": "2cfa585e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ancestor_id: loinc:LP65098-3\r\n", "ancestor_information_content: 6.066089190457772\r\n", "ancestor_label: Sugar\r\n", 
"jaccard_similarity: 0.5\r\n", "object_id: loinc:3134-4\r\n", "object_label: 'Xylose:MCnc:Pt:Bld:Qn:'\r\n", "phenodigm_score: 1.7415638361050352\r\n", "subject_id: loinc:2341-6\r\n", "subject_label: Glucose:MCnc:Pt:Bld:Qn:Test strip manual\r\n" ] } ], "source": [ "comploinc similarity loinc:2341-6 @ loinc:3134-4" ] }, { "cell_type": "markdown", "id": "592e74fd", "metadata": {}, "source": [ "## Value Sets\n", "\n", "The COMPLoinc project doesn't define any value sets. Here we use two arbitrary hard-coded lists of LOINC codes\n", "(`input/valueset1.txt` and `input/valueset2.txt`) for illustration purposes." ] }, { "cell_type": "code", "execution_count": 2, "id": "0a995ad7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:5914-7 ! Glucose:PrThr:Pt:Bld:Ord:Test strip\r\n", "loinc:2339-0 ! Glucose:MCnc:Pt:Bld:Qn:\r\n", "loinc:50216-1 ! Glucose^6th specimen:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:6777-7 ! Glucose:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:77145-1 ! Glucose^post CFst:SCnc:Pt:Ser/Plas/Bld:Qn:\r\n", "loinc:54085-6 ! Galactose:SCnc:Pt:Bld.dot:Qn:\r\n", "loinc:50218-7 ! Glucose^9th specimen:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:25426-8 ! Galactose:SCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:32016-8 ! Glucose:MCnc:Pt:BldC:Qn:\r\n", "loinc:77135-2 ! Glucose:SCnc:Pt:Ser/Plas/Bld:Qn:\r\n", "loinc:2307-7 ! Galactose:MCnc:Pt:Bld:Qn:\r\n" ] } ], "source": [ "comploinc info .idfile input/valueset1.txt" ] }, { "cell_type": "code", "execution_count": 3, "id": "edf8273e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loinc:54495-7 ! Glucose^post dialysis:SCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:2308-5 ! Galactose:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:76629-5 ! Glucose^post CFst:SCnc:Pt:Bld:Qn:\r\n", "loinc:27353-2 ! Estimated average glucose:MCnc:Pt:Bld:Qn:Estimated from glycated hemoglobin\r\n", "loinc:2552-8 ! Lactose:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:12611-0 ! Glucose^4H specimen:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:51596-5 ! Glucose:SCnc:Pt:BldC:Qn:\r\n", "loinc:93791-2 ! 
Glucose:MCnc:Stdy^mean:Ser/Plas:Qn:\r\n", "loinc:50215-3 ! Glucose^5th specimen:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:50208-8 ! Glucose^10th specimen:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:3134-4 ! Xylose:MCnc:Pt:Bld:Qn:\r\n", "loinc:29999-0 ! Xylose:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:5914-7 ! Glucose:PrThr:Pt:Bld:Ord:Test strip\r\n", "loinc:2339-0 ! Glucose:MCnc:Pt:Bld:Qn:\r\n", "loinc:50216-1 ! Glucose^6th specimen:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:6777-7 ! Glucose:MCnc:Pt:Ser/Plas:Qn:\r\n", "loinc:77145-1 ! Glucose^post CFst:SCnc:Pt:Ser/Plas/Bld:Qn:\r\n" ] } ], "source": [ "comploinc info .idfile input/valueset2.txt" ] }, { "cell_type": "code", "execution_count": 2, "id": "08959cfe", "metadata": {}, "outputs": [], "source": [ "comploinc termset-similarity .idfile input/valueset1.txt @ .idfile input/valueset2.txt -o output/sim-out.yaml" ] }, { "cell_type": "code", "execution_count": 4, "id": "811ee466", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "average_score: 9.623542661061256\r\n", "best_score: 13.738514532429267\r\n", "object_best_matches:\r\n", " loinc:12611-0:\r\n", " match_source: loinc:12611-0\r\n", " match_source_label: 'Glucose^4H specimen:MCnc:Pt:Ser/Plas:Qn:'\r\n", " match_target: loinc:2339-0\r\n", " match_target_label: 'Glucose:MCnc:Pt:Bld:Qn:'\r\n", " score: 6.519346011967107\r\n", " similarity:\r\n", " ancestor_id: loinc:LP14635-4\r\n", " ancestor_information_content: 6.519346011967107\r\n", " ancestor_label: Glucose\r\n", " jaccard_similarity: 0.84\r\n", " object_id: loinc:12611-0\r\n", " phenodigm_score: 2.3401390236591437\r\n", " subject_id: loinc:2339-0\r\n", " loinc:2308-5:\r\n", " match_source: loinc:2308-5\r\n", " match_source_label: 'Galactose:MCnc:Pt:Ser/Plas:Qn:'\r\n" ] } ], "source": [ "!head -20 output/sim-out.yaml" ] }, { "cell_type": "markdown", "id": "d19f3b9a", "metadata": {}, "source": [ "## Logical Definitions\n", "\n", "Currently the `logical-definitions` command returns no output for COMPLoinc codes. The best way to fix this is to address:\n", "\n", 
"https://github.com/loinc/comp-loinc/issues/17" ] }, { "cell_type": "code", "execution_count": 9, "id": "5502729c", "metadata": {}, "outputs": [], "source": [ "comploinc logical-definitions loinc:14573-0 loinc:47545-9" ] }, { "cell_type": "code", "execution_count": 10, "id": "89cc20cf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak lexmatch [OPTIONS] [TERMS]...\r\n", "\r\n", " Performs lexical matching between pairs of terms in one more more\r\n", " ontologies.\r\n", "\r\n", " Examples:\r\n", "\r\n", " runoak -i foo.obo lexmatch -o foo.sssom.tsv\r\n", "\r\n", " In this example, the input ontology file is assumed to contain all pairs of\r\n", " terms to be mapped.\r\n", "\r\n", " It is more common to map between all pairs of terms in two ontology files.\r\n", " In this case, you can merge the ontologies using a tool like ROBOT; or, to\r\n", " avoid a merge preprocessing step, use the --addl (-a) option to specify a\r\n", " second ontology file.\r\n", "\r\n", " runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv\r\n", "\r\n", " By default, this command will compare all terms in all ontologies. 
You can\r\n", " use the OAK term query syntax to pass in the set of all terms to be\r\n", " compared.\r\n", "\r\n", " For example, to compare all terms in union of FOO and BAR namespaces:\r\n", "\r\n", " runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: i^BAR:\r\n", "\r\n", " All members of the set are compared (including FOO to FOO matches and BAR to\r\n", " BAR matches), omitting trivial reciprocal matches.\r\n", "\r\n", " Use an \"@\" separator between two queries to feed in two explicit sets:\r\n", "\r\n", " runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: @\r\n", " i^BAR:\r\n", "\r\n", " ALGORITHM: lexmatch implements a simple algorithm:\r\n", "\r\n", " - create a lexical index, keyed by normalized strings of labels, synonyms -\r\n", " report all pairs of entities that have the same key\r\n", "\r\n", " The lexical index can be exported (in native YAML) using -L:\r\n", "\r\n", " runoak -i foo.obo lexmatch -L foo.index.yaml -o foo.sssom.tsv\r\n", "\r\n", " Note: if you run the above command a second time it will be faster as the\r\n", " index will be reused.\r\n", "\r\n", " RULES: Using custom rules:\r\n", "\r\n", " runoak -i foo.obo lexmatch -R match_rules.yaml -L foo.index.yaml -o\r\n", " foo.sssom.tsv\r\n", "\r\n", " Full documentation:\r\n", "\r\n", " - https://incatools.github.io/ontology-access-\r\n", " kit/src/oaklib.utilities.lexical.lexical_indexer.html# module-\r\n", " oaklib.utilities.lexical.lexical_indexer\r\n", "\r\n", "Options:\r\n", " -R, --rules-file TEXT path to rules file. Conforms to\r\n", " rules_datamodel. e.g.\r\n", " https://github.com/INCATools/ontology-\r\n", " access-\r\n", " kit/blob/main/tests/input/matcher_rules.yaml\r\n", " --add-labels / --no-add-labels Populate empty labels with URI fragments or\r\n", " CURIE local IDs, for ontologies that use\r\n", " semantic IDs [default: no-add-labels]\r\n", " -L, --lexical-index-file TEXT path to lexical index. 
This is recreated\r\n", " each time unless --no-recreate is passed\r\n", " --recreate / --no-recreate if true and lexical index is specified,\r\n", " always recreate, otherwise load from index\r\n", " [default: recreate]\r\n", " -o, --output FILENAME Output file, e.g. obo file\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "comploinc lexmatch --help" ] }, { "cell_type": "markdown", "id": "30b2be47", "metadata": {}, "source": [ "## Lexical Matching" ] }, { "cell_type": "code", "execution_count": 11, "id": "bfa38220", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n", "WARNING:root:Skipping as it is not a valid CURIE\r\n" ] } ], "source": [ "comploinc -a sqlite:obo:uberon lexmatch -L output/loinc-uberon-lexical-index.yaml -o output/loinc-uberon.sssom.tsv 
i^UBERON: @ i^loinc:" ] }, { "cell_type": "code", "execution_count": 12, "id": "9f47c239", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 13, "id": "a6c1dd6a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
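<table><thead><tr><th></th><th>subject_id</th><th>subject_label</th><th>predicate_id</th><th>object_id</th><th>object_label</th><th>mapping_justification</th><th>mapping_tool</th><th>subject_match_field</th><th>object_match_field</th><th>match_string</th></tr></thead><tbody>
<tr><th>0</th><td>UBERON:0000004</td><td>nose</td><td>skos:closeMatch</td><td>loinc:LP7443-7</td><td>Nose</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasExactSynonym</td><td>rdfs:label</td><td>nose</td></tr>
<tr><th>1</th><td>UBERON:0000004</td><td>nose</td><td>skos:closeMatch</td><td>loinc:LP7443-7</td><td>Nose</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>rdfs:label</td><td>rdfs:label</td><td>nose</td></tr>
<tr><th>2</th><td>UBERON:0000014</td><td>zone of skin</td><td>skos:closeMatch</td><td>loinc:LP36760-4</td><td>Skin</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasExactSynonym</td><td>rdfs:label</td><td>skin</td></tr>
<tr><th>3</th><td>UBERON:0000019</td><td>camera-type eye</td><td>skos:closeMatch</td><td>loinc:LP7797-6</td><td>EYE</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasBroadSynonym</td><td>rdfs:label</td><td>eye</td></tr>
<tr><th>4</th><td>UBERON:0000019</td><td>camera-type eye</td><td>skos:closeMatch</td><td>loinc:LP7218-3</td><td>Eye</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasBroadSynonym</td><td>rdfs:label</td><td>eye</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>74</th><td>UBERON:2000673</td><td>hypobranchial artery</td><td>skos:closeMatch</td><td>loinc:LP28800-8</td><td>HA</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasExactSynonym</td><td>rdfs:label</td><td>ha</td></tr>
<tr><th>75</th><td>UBERON:3011048</td><td>genital system</td><td>skos:closeMatch</td><td>loinc:LP7555-8</td><td>Reproductive system</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasBroadSynonym</td><td>rdfs:label</td><td>reproductive system</td></tr>
<tr><th>76</th><td>UBERON:3011048</td><td>genital system</td><td>skos:closeMatch</td><td>loinc:LP7264-7</td><td>Genitalia</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasBroadSynonym</td><td>rdfs:label</td><td>genitalia</td></tr>
<tr><th>77</th><td>UBERON:6110636</td><td>insect adult cerebral ganglion</td><td>skos:closeMatch</td><td>loinc:LP7084-9</td><td>Brain</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasRelatedSynonym</td><td>rdfs:label</td><td>brain</td></tr>
<tr><th>78</th><td>UBERON:8420000</td><td>hair of scalp</td><td>skos:closeMatch</td><td>loinc:LP7280-3</td><td>Hair</td><td>semapv:LexicalMatching</td><td>oaklib</td><td>oio:hasBroadSynonym</td><td>rdfs:label</td><td>hair</td></tr></tbody></table>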
\n", "

79 rows × 10 columns

\n", "
" ], "text/plain": [ " subject_id subject_label predicate_id \\\n", "0 UBERON:0000004 nose skos:closeMatch \n", "1 UBERON:0000004 nose skos:closeMatch \n", "2 UBERON:0000014 zone of skin skos:closeMatch \n", "3 UBERON:0000019 camera-type eye skos:closeMatch \n", "4 UBERON:0000019 camera-type eye skos:closeMatch \n", ".. ... ... ... \n", "74 UBERON:2000673 hypobranchial artery skos:closeMatch \n", "75 UBERON:3011048 genital system skos:closeMatch \n", "76 UBERON:3011048 genital system skos:closeMatch \n", "77 UBERON:6110636 insect adult cerebral ganglion skos:closeMatch \n", "78 UBERON:8420000 hair of scalp skos:closeMatch \n", "\n", " object_id object_label mapping_justification mapping_tool \\\n", "0 loinc:LP7443-7 Nose semapv:LexicalMatching oaklib \n", "1 loinc:LP7443-7 Nose semapv:LexicalMatching oaklib \n", "2 loinc:LP36760-4 Skin semapv:LexicalMatching oaklib \n", "3 loinc:LP7797-6 EYE semapv:LexicalMatching oaklib \n", "4 loinc:LP7218-3 Eye semapv:LexicalMatching oaklib \n", ".. ... ... ... ... \n", "74 loinc:LP28800-8 HA semapv:LexicalMatching oaklib \n", "75 loinc:LP7555-8 Reproductive system semapv:LexicalMatching oaklib \n", "76 loinc:LP7264-7 Genitalia semapv:LexicalMatching oaklib \n", "77 loinc:LP7084-9 Brain semapv:LexicalMatching oaklib \n", "78 loinc:LP7280-3 Hair semapv:LexicalMatching oaklib \n", "\n", " subject_match_field object_match_field match_string \n", "0 oio:hasExactSynonym rdfs:label nose \n", "1 rdfs:label rdfs:label nose \n", "2 oio:hasExactSynonym rdfs:label skin \n", "3 oio:hasBroadSynonym rdfs:label eye \n", "4 oio:hasBroadSynonym rdfs:label eye \n", ".. ... ... ... 
\n", "74 oio:hasExactSynonym rdfs:label ha \n", "75 oio:hasBroadSynonym rdfs:label reproductive system \n", "76 oio:hasBroadSynonym rdfs:label genitalia \n", "77 oio:hasRelatedSynonym rdfs:label brain \n", "78 oio:hasBroadSynonym rdfs:label hair \n", "\n", "[79 rows x 10 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"output/loinc-uberon.sssom.tsv\", sep=\"\\t\", comment=\"#\")\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "375dbacf", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }