COMPLoinc Example
See https://github.com/loinc/comp-loinc for a description of the COMPLoinc project. COMPLoinc creates an OWL version of LOINC that can be used in OAK to explore relationships between codes and components.
This notebook currently relies almost entirely on the command line functionality of OAK. It should therefore be accessible to non-programmers, although a good understanding of the command line helps, and some advanced OAK query concepts are introduced.
In the future we may extend this notebook with Python examples.
Creating an alias
First we create an alias, comploinc, for the runoak command, using a SQLite selector for the comploinc resource.
[1]:
%alias comploinc runoak -i sqlite:obo:comploinc
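Outside a notebook, the same prefix can be reused from a plain Python script. The following is a minimal sketch using only the standard library; the helper names comploinc_cmd and run_comploinc are hypothetical and are not part of OAK. It assumes the runoak executable is on the PATH when run_comploinc is called.

```python
import shlex
import shutil
import subprocess

# Same prefix as the %alias above: runoak with the comploinc SQLite selector.
OAK_PREFIX = ["runoak", "-i", "sqlite:obo:comploinc"]


def comploinc_cmd(*args):
    """Return the argument list for `runoak -i sqlite:obo:comploinc <args>`."""
    return OAK_PREFIX + list(args)


def run_comploinc(*args):
    """Run the command and return its stdout (hypothetical helper).

    Assumes runoak is installed; raises if it is not on the PATH.
    """
    if shutil.which("runoak") is None:
        raise RuntimeError("runoak not found on PATH")
    result = subprocess.run(
        comploinc_cmd(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command line that would be executed, without running it.
print(shlex.join(comploinc_cmd("info", "loinc:11145-0")))
```

Building the argument list separately from running it makes it easy to check the command before anything is executed.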
Basic Lookup
The info command looks up a given entity or set of entities. These are specified as a list of CURIEs or labels on the command line after the info command:
[2]:
comploinc info loinc:11145-0
loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:
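Each line of info output has the form "CURIE ! label", and for LOINC codes the label is the six-part name (component, property, time, system, scale, method) joined by colons. The sketch below splits such a line into its pieces. The function and class names are hypothetical, and it assumes that no part itself contains a colon, which may not hold for every LOINC name.

```python
from typing import NamedTuple


class LoincName(NamedTuple):
    """The six parts of a LOINC name; `prop` is the property part."""
    component: str
    prop: str
    time: str
    system: str
    scale: str
    method: str


def parse_info_line(line):
    """Split 'CURIE ! label' into (curie, label)."""
    curie, _, label = line.partition(" ! ")
    return curie.strip(), label.strip()


def split_loinc_name(label):
    """Split a colon-separated LOINC name into its six parts.

    Assumption: no part contains a colon of its own.
    """
    parts = label.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, got {len(parts)}: {label!r}")
    return LoincName(*parts)


line = "loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:"
curie, label = parse_info_line(line)
name = split_loinc_name(label)
print(curie, name.component, name.system)
```

The trailing colon in the label means the method part is an empty string for this code.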
Graph Visualization
Here we show the viz command with a single term (multiple terms can be passed, but we illustrate with one for now).
[7]:
comploinc viz loinc:11145-0 -o output/loinc-11145-0.png
Lookup by label
[4]:
comploinc info Serotonin
loinc:LP14693-3 ! Serotonin
[5]:
comploinc descendants -p i Serotonin
loinc:LP14693-3 ! Serotonin
loinc:LP15097-6 ! 5-Hydroxyindoleacetate
loinc:LP36806-5 ! 5-Hydroxyindoleacetate & Creatinine
[6]:
comploinc descendants -p i,loinc:hasComponent Serotonin
loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:
loinc:12172-3 ! 5-Hydroxyindoleacetate:PrThr:24H:Urine:Ord:
loinc:14573-0 ! 5-Hydroxyindoleacetate:SRat:24H:Urine:Qn:
loinc:14910-4 ! Serotonin:SCnc:Pt:Ser:Qn:
loinc:15009-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Urine:Qn:
loinc:1692-3 ! 5-Hydroxyindoleacetate:MCnc:Pt:CSF:Qn:
loinc:1693-1 ! 5-Hydroxyindoleacetate:MCnc:Pt:Ser/Plas:Qn:
loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:
loinc:1695-6 ! 5-Hydroxyindoleacetate:MRat:24H:Urine:Qn:
loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:
loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:25524-0 ! Serotonin:SCnc:Pt:Bld:Qn:
loinc:25971-3 ! 5-Hydroxyindoleacetate:SCnc:24H:Urine:Qn:
loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:
loinc:26035-6 ! Serotonin:SCnc:Pt:Plas:Qn:
loinc:27057-9 ! Serotonin:MCnc:Pt:Ser:Qn:
loinc:2939-7 ! Serotonin:MCnc:Pt:Bld:Qn:
loinc:2940-5 ! Serotonin:MCnc:Pt:Plas:Qn:
loinc:2941-3 ! Serotonin:MCnc:Pt:Platelets:Qn:
loinc:29520-4 ! 5-Hydroxyindoleacetate/Creatinine:SRto:Pt:Urine:Qn:
loinc:31203-3 ! 5-Hydroxyindoleacetate:MCnc:24H:Urine:Qn:
loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:
loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:
loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:
loinc:42671-8 ! Serotonin:EntMass:Pt:Platelets:Qn:
loinc:44288-9 ! 5-Hydroxyindoleacetate/Creatinine:MRto:24H:Urine:Qn:
loinc:44909-0 ! 5-Hydroxyindoleacetate & Creatinine:Imp:Pt:Urine:Nom:
loinc:47544-2 ! 5-Hydroxyindoleacetate:SCnc:Pt:CSF:Qn:
loinc:47545-9 ! 5-Hydroxyindoleacetate/Creatinine:SRto:24H:Urine:Qn:
loinc:48168-9 ! 5-Hydroxyindoleacetate:PrThr:Pt:Urine:Ord:
loinc:50149-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Ser/Plas:Qn:
loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:
loinc:71804-9 ! Serotonin:EntSub:Pt:Platelets:Qn:
loinc:74769-1 ! 5-Hydroxyindoleacetate:SCnc:Pt:PRP:Qn:
loinc:LP14693-3 ! Serotonin
loinc:LP15097-6 ! 5-Hydroxyindoleacetate
loinc:LP36806-5 ! 5-Hydroxyindoleacetate & Creatinine
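The difference between the two descendants calls above is the set of predicates that may be traversed. With -p i only is_a edges are followed, so the result contains only part terms. Adding loinc:hasComponent also pulls in the codes that point at those parts. The sketch below illustrates the idea on a toy in-memory graph. The three edges are assumptions chosen to be consistent with the output above, not an extract of the real ontology, and the function is not OAK's implementation.

```python
from collections import defaultdict

IS_A = "rdfs:subClassOf"
HAS_COMPONENT = "loinc:hasComponent"

# Toy (subject, predicate, object) edges. Assumed for illustration only.
EDGES = [
    ("loinc:LP15097-6", IS_A, "loinc:LP14693-3"),        # 5-HIAA is_a Serotonin
    ("loinc:14910-4", HAS_COMPONENT, "loinc:LP14693-3"),  # code -> Serotonin part
    ("loinc:1694-9", HAS_COMPONENT, "loinc:LP15097-6"),   # code -> 5-HIAA part
]


def descendants(start, predicates, edges=EDGES):
    """Return start plus every node that reaches it over the given predicates."""
    incoming = defaultdict(list)
    for subject, predicate, obj in edges:
        if predicate in predicates:
            incoming[obj].append(subject)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for child in incoming[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


parts_only = descendants("loinc:LP14693-3", {IS_A})
parts_and_codes = descendants("loinc:LP14693-3", {IS_A, HAS_COMPONENT})
print(sorted(parts_only))
print(sorted(parts_and_codes))
```

As in the output above, the start term itself is included in the result.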
Text Search
[7]:
comploinc info l~Serotonin
loinc:LP14693-3 ! Serotonin
loinc:14910-4 ! Serotonin:SCnc:Pt:Ser:Qn:
loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:
loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:25524-0 ! Serotonin:SCnc:Pt:Bld:Qn:
loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:
loinc:26035-6 ! Serotonin:SCnc:Pt:Plas:Qn:
loinc:27057-9 ! Serotonin:MCnc:Pt:Ser:Qn:
loinc:2939-7 ! Serotonin:MCnc:Pt:Bld:Qn:
loinc:2940-5 ! Serotonin:MCnc:Pt:Plas:Qn:
loinc:2941-3 ! Serotonin:MCnc:Pt:Platelets:Qn:
loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:
loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:
loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:
loinc:42671-8 ! Serotonin:EntMass:Pt:Platelets:Qn:
loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:
loinc:71804-9 ! Serotonin:EntSub:Pt:Platelets:Qn:
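The l~ prefix appears to select entities whose label contains the given text, which is why the 5-Hydroxyindoleacetate codes from the graph query do not show up here. A minimal sketch of that kind of filter follows. The case-insensitive comparison is an assumption, and the three entries are copied from the outputs above.

```python
# (CURIE, label) pairs copied from the outputs above.
ENTRIES = [
    ("loinc:LP14693-3", "Serotonin"),
    ("loinc:14910-4", "Serotonin:SCnc:Pt:Ser:Qn:"),
    ("loinc:1694-9", "5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:"),
]


def label_contains(entries, text):
    """Return the CURIEs whose label contains text.

    Assumption: matching ignores case; OAK's behavior may differ.
    """
    needle = text.casefold()
    return [curie for curie, label in entries if needle in label.casefold()]


hits = label_contains(ENTRIES, "Serotonin")
print(hits)
```

A lexical filter like this finds nothing that is related to Serotonin only through the hierarchy, which is the main reason to prefer graph queries for that purpose.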
Boolean Graph Queries
Find all codes that have a component of “Serotonin” (or one of its descendants) and a system of “Urine”:
[8]:
comploinc info .descendant//p=i,loinc:hasComponent Serotonin .and .descendant//p=i,loinc:hasSystem Urine
loinc:15009-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Urine:Qn:
loinc:1695-6 ! 5-Hydroxyindoleacetate:MRat:24H:Urine:Qn:
loinc:44909-0 ! 5-Hydroxyindoleacetate & Creatinine:Imp:Pt:Urine:Nom:
loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:
loinc:25971-3 ! 5-Hydroxyindoleacetate:SCnc:24H:Urine:Qn:
loinc:31203-3 ! 5-Hydroxyindoleacetate:MCnc:24H:Urine:Qn:
loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:
loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:
loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:
loinc:29520-4 ! 5-Hydroxyindoleacetate/Creatinine:SRto:Pt:Urine:Qn:
loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:
loinc:44288-9 ! 5-Hydroxyindoleacetate/Creatinine:MRto:24H:Urine:Qn:
loinc:12172-3 ! 5-Hydroxyindoleacetate:PrThr:24H:Urine:Ord:
loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:
loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:48168-9 ! 5-Hydroxyindoleacetate:PrThr:Pt:Urine:Ord:
loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:
loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:
loinc:47545-9 ! 5-Hydroxyindoleacetate/Creatinine:SRto:24H:Urine:Qn:
loinc:14573-0 ! 5-Hydroxyindoleacetate:SRat:24H:Urine:Qn:
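Conceptually, the .and operator intersects the result sets of the two sub-queries. The sketch below shows this with plain Python sets. The members of the first set are taken from the outputs above, while the "urine only" entry in the second set is a hypothetical placeholder, not a real LOINC code.

```python
# Result of the component sub-query (a small subset of the real output).
has_serotonin_component = {
    "loinc:14910-4",  # Serotonin:SCnc:Pt:Ser:Qn:
    "loinc:34373-1",  # Serotonin:SCnc:Pt:Urine:Qn:
    "loinc:1694-9",   # 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:
    "loinc:1692-3",   # 5-Hydroxyindoleacetate:MCnc:Pt:CSF:Qn:
}

# Result of the system sub-query. The last entry is a hypothetical
# placeholder for a urine code with an unrelated component.
has_urine_system = {
    "loinc:34373-1",
    "loinc:1694-9",
    "example:urine-only-code",
}

# .and corresponds to set intersection.
both = has_serotonin_component & has_urine_system
print(sorted(both))
```

Because the result is a set, the order of the output lines carries no meaning, which matches the unsorted listing above.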
Semantic Similarity (Term Wise)
[16]:
comploinc similarity --help
Usage: runoak similarity [OPTIONS] [TERMS]...
All by all similarity
This calculates a similarity matrix for two sets of terms.
Input sets of a terms can be specified in different ways:
- via a file - via explicit lists of terms or queries
Example:
runoak -i hp.db all-similarity -p i --set1-file HPO-TERMS1 --set2-file
HPO-TERMS2 -O csv
This will compare every term in TERMS1 vs TERMS2
Alternatively standard OAK term queries can be used, with "@" separating the
two lists
Example:
runoak -i hp.db all-similarity -p i TERM_1 TERM_2 ... TERM_N @ TERM_N+1
... TERM_M
The .all term syntax can be used to select all terms in an ontology
Example:
runoak -i ma.db all-similarity -p i,p .all @ .all
This can be mixed with other term selectors; for example to calculate the
similarity of "neuron" vs all terms in CL:
runoak -i cl.db all-similarity -p i,p .all @ neuron
An example pipeline to do all by all over all phenotypes in HPO:
Explicit:
runoak -i hp.db descendants -p i HP:0000118 > HPO runoak -i hp.db
all-similarity -p i --set1-file HPO --set2-file HPO -O csv -o
RESULTS.tsv
The same thing can be done more compactly with term queries:
runoak -i hp.db all-similarity -p i .desc//p=i HP:0000118 @ .desc//p=i
HP:0000118
Options:
-p, --predicates TEXT A comma-separated list of predicates
--set1-file TEXT ID file for set1
--set2-file TEXT ID file for set2
--jaccard-minimum FLOAT Minimum value for jaccard score
--ic-minimum FLOAT Minimum value for information content
-o, --output TEXT path to output
--main-score-field TEXT Score used for summarization [default:
phenodigm_score]
--autolabel / --no-autolabel If set, results will automatically have labels
assigned [default: autolabel]
-O, --output-type TEXT Desired output type
--help Show this message and exit.
[15]:
comploinc similarity loinc:2341-6 @ loinc:3134-4
ancestor_id: loinc:LP65098-3
ancestor_information_content: 6.066089190457772
ancestor_label: Sugar
jaccard_similarity: 0.5
object_id: loinc:3134-4
object_label: 'Xylose:MCnc:Pt:Bld:Qn:'
phenodigm_score: 1.7415638361050352
subject_id: loinc:2341-6
subject_label: Glucose:MCnc:Pt:Bld:Qn:Test strip manual
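The numbers reported above are consistent with phenodigm_score being the square root of the Jaccard similarity multiplied by the information content of the most informative common ancestor. The sketch below checks that relationship against the values in this notebook. The function name is hypothetical, and the formula is inferred from the reported numbers rather than quoted from OAK.

```python
import math


def phenodigm_score(jaccard, ancestor_ic):
    """sqrt(jaccard * IC); inferred from the reported values, not from OAK."""
    return math.sqrt(jaccard * ancestor_ic)


# Values copied from the similarity output above (Glucose vs. Xylose codes).
score = phenodigm_score(0.5, 6.066089190457772)
print(score)
```

The same relationship holds for the pair shown in the termset similarity output later in this notebook (Jaccard 0.84, information content 6.519346011967107).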
Value Sets
The COMPLoinc project doesn’t define any value sets. Here we use two arbitrary hardcoded ones for illustration purposes.
[2]:
comploinc info .idfile input/valueset1.txt
loinc:5914-7 ! Glucose:PrThr:Pt:Bld:Ord:Test strip
loinc:2339-0 ! Glucose:MCnc:Pt:Bld:Qn:
loinc:50216-1 ! Glucose^6th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:6777-7 ! Glucose:MCnc:Pt:Ser/Plas:Qn:
loinc:77145-1 ! Glucose^post CFst:SCnc:Pt:Ser/Plas/Bld:Qn:
loinc:54085-6 ! Galactose:SCnc:Pt:Bld.dot:Qn:
loinc:50218-7 ! Glucose^9th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:25426-8 ! Galactose:SCnc:Pt:Ser/Plas:Qn:
loinc:32016-8 ! Glucose:MCnc:Pt:BldC:Qn:
loinc:77135-2 ! Glucose:SCnc:Pt:Ser/Plas/Bld:Qn:
loinc:2307-7 ! Galactose:MCnc:Pt:Bld:Qn:
[3]:
comploinc info .idfile input/valueset2.txt
loinc:54495-7 ! Glucose^post dialysis:SCnc:Pt:Ser/Plas:Qn:
loinc:2308-5 ! Galactose:MCnc:Pt:Ser/Plas:Qn:
loinc:76629-5 ! Glucose^post CFst:SCnc:Pt:Bld:Qn:
loinc:27353-2 ! Estimated average glucose:MCnc:Pt:Bld:Qn:Estimated from glycated hemoglobin
loinc:2552-8 ! Lactose:MCnc:Pt:Ser/Plas:Qn:
loinc:12611-0 ! Glucose^4H specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:51596-5 ! Glucose:SCnc:Pt:BldC:Qn:
loinc:93791-2 ! Glucose:MCnc:Stdy^mean:Ser/Plas:Qn:
loinc:50215-3 ! Glucose^5th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:50208-8 ! Glucose^10th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:3134-4 ! Xylose:MCnc:Pt:Bld:Qn:
loinc:29999-0 ! Xylose:MCnc:Pt:Ser/Plas:Qn:
loinc:5914-7 ! Glucose:PrThr:Pt:Bld:Ord:Test strip
loinc:2339-0 ! Glucose:MCnc:Pt:Bld:Qn:
loinc:50216-1 ! Glucose^6th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:6777-7 ! Glucose:MCnc:Pt:Ser/Plas:Qn:
loinc:77145-1 ! Glucose^post CFst:SCnc:Pt:Ser/Plas/Bld:Qn:
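Before computing semantic similarity, it can be useful to see how much the two value sets overlap as plain ID sets. The sketch below does this with the IDs listed in the two outputs above. It is a purely set-based comparison and is not what the termset-similarity command computes.

```python
def with_prefix(ids):
    """Turn a whitespace-separated string of local IDs into a set of CURIEs."""
    return {"loinc:" + local_id for local_id in ids.split()}


# IDs copied from the two info outputs above.
valueset1 = with_prefix(
    "5914-7 2339-0 50216-1 6777-7 77145-1 54085-6 "
    "50218-7 25426-8 32016-8 77135-2 2307-7"
)
valueset2 = with_prefix(
    "54495-7 2308-5 76629-5 27353-2 2552-8 12611-0 51596-5 93791-2 "
    "50215-3 50208-8 3134-4 29999-0 5914-7 2339-0 50216-1 6777-7 77145-1"
)

shared = valueset1 & valueset2
# Jaccard index over the ID sets: |intersection| / |union|.
id_jaccard = len(shared) / len(valueset1 | valueset2)
print(sorted(shared))
print(round(id_jaccard, 3))
```

Codes that are not shared may still be closely related in the hierarchy, which is what the semantic comparison captures.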
[2]:
comploinc termset-similarity .idfile input/valueset1.txt @ .idfile input/valueset2.txt -o output/sim-out.yaml
[4]:
!head -20 output/sim-out.yaml
average_score: 9.623542661061256
best_score: 13.738514532429267
object_best_matches:
loinc:12611-0:
match_source: loinc:12611-0
match_source_label: 'Glucose^4H specimen:MCnc:Pt:Ser/Plas:Qn:'
match_target: loinc:2339-0
match_target_label: 'Glucose:MCnc:Pt:Bld:Qn:'
score: 6.519346011967107
similarity:
ancestor_id: loinc:LP14635-4
ancestor_information_content: 6.519346011967107
ancestor_label: Glucose
jaccard_similarity: 0.84
object_id: loinc:12611-0
phenodigm_score: 2.3401390236591437
subject_id: loinc:2339-0
loinc:2308-5:
match_source: loinc:2308-5
match_source_label: 'Galactose:MCnc:Pt:Ser/Plas:Qn:'
Logical Definitions
Logical definitions are not currently visible, so the command below returns nothing. The best way to fix this is to address:
https://github.com/loinc/comp-loinc/issues/17
[9]:
comploinc logical-definitions loinc:14573-0 loinc:47545-9
[10]:
comploinc lexmatch --help
Usage: runoak lexmatch [OPTIONS] [TERMS]...
Performs lexical matching between pairs of terms in one more more
ontologies.
Examples:
runoak -i foo.obo lexmatch -o foo.sssom.tsv
In this example, the input ontology file is assumed to contain all pairs of
terms to be mapped.
It is more common to map between all pairs of terms in two ontology files.
In this case, you can merge the ontologies using a tool like ROBOT; or, to
avoid a merge preprocessing step, use the --addl (-a) option to specify a
second ontology file.
runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv
By default, this command will compare all terms in all ontologies. You can
use the OAK term query syntax to pass in the set of all terms to be
compared.
For example, to compare all terms in union of FOO and BAR namespaces:
runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: i^BAR:
All members of the set are compared (including FOO to FOO matches and BAR to
BAR matches), omitting trivial reciprocal matches.
Use an "@" separator between two queries to feed in two explicit sets:
runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: @
i^BAR:
ALGORITHM: lexmatch implements a simple algorithm:
- create a lexical index, keyed by normalized strings of labels, synonyms -
report all pairs of entities that have the same key
The lexical index can be exported (in native YAML) using -L:
runoak -i foo.obo lexmatch -L foo.index.yaml -o foo.sssom.tsv
Note: if you run the above command a second time it will be faster as the
index will be reused.
RULES: Using custom rules:
runoak -i foo.obo lexmatch -R match_rules.yaml -L foo.index.yaml -o
foo.sssom.tsv
Full documentation:
- https://incatools.github.io/ontology-access-
kit/src/oaklib.utilities.lexical.lexical_indexer.html# module-
oaklib.utilities.lexical.lexical_indexer
Options:
-R, --rules-file TEXT path to rules file. Conforms to
rules_datamodel. e.g.
https://github.com/INCATools/ontology-
access-
kit/blob/main/tests/input/matcher_rules.yaml
--add-labels / --no-add-labels Populate empty labels with URI fragments or
CURIE local IDs, for ontologies that use
semantic IDs [default: no-add-labels]
-L, --lexical-index-file TEXT path to lexical index. This is recreated
each time unless --no-recreate is passed
--recreate / --no-recreate if true and lexical index is specified,
always recreate, otherwise load from index
[default: recreate]
-o, --output FILENAME Output file, e.g. obo file
--help Show this message and exit.
Lexical Matching
[11]:
comploinc -a sqlite:obo:uberon lexmatch -L output/loinc-uberon-lexical-index.yaml -o output/loinc-uberon.sssom.tsv i^UBERON: @ i^loinc:
WARNING:root:Skipping <urn:swrl#A> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#B> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#C> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#D> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#a1> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#a2> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#d> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#e> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#eff> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#g1> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#g2> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#in> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#mf2> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#mf> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#p> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#w> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#x> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#y> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#z> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl:var#x> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl:var#y> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl:var#z> as it is not a valid CURIE
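The lexmatch help text describes the algorithm as building a lexical index keyed by normalized strings and then reporting all pairs of entities that share a key. The sketch below illustrates that idea on a handful of terms. The labels and synonym come from the lexmatch results in this notebook, but the normalization (lowercasing and collapsing whitespace) is an assumption, and the function names are hypothetical rather than OAK's own.

```python
from collections import defaultdict
from itertools import combinations

# (entity, field, text) triples taken from the lexmatch results in this notebook.
TERMS = [
    ("UBERON:0000004", "rdfs:label", "nose"),
    ("UBERON:0000014", "rdfs:label", "zone of skin"),
    ("UBERON:0000014", "oio:hasExactSynonym", "skin"),
    ("loinc:LP7443-7", "rdfs:label", "Nose"),
    ("loinc:LP36760-4", "rdfs:label", "Skin"),
]


def normalize(text):
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def build_index(terms):
    """Map each normalized string to the set of entities that carry it."""
    index = defaultdict(set)
    for entity, _field, text in terms:
        index[normalize(text)].add(entity)
    return index


def find_matches(index):
    """Report every pair of distinct entities that share an index key."""
    pairs = set()
    for key, entities in index.items():
        for first, second in combinations(sorted(entities), 2):
            pairs.add((first, second, key))
    return sorted(pairs)


matches = find_matches(build_index(TERMS))
for match in matches:
    print(match)
```

Note that "zone of skin" matches the LOINC part "Skin" only through its synonym, which is why the field that produced a match is worth recording in real output.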
[12]:
import pandas as pd
[13]:
df = pd.read_csv("output/loinc-uberon.sssom.tsv", sep="\t", comment="#")
df
[13]:
| | subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | mapping_tool | subject_match_field | object_match_field | match_string |
---|---|---|---|---|---|---|---|---|---|---|
0 | UBERON:0000004 | nose | skos:closeMatch | loinc:LP7443-7 | Nose | semapv:LexicalMatching | oaklib | oio:hasExactSynonym | rdfs:label | nose |
1 | UBERON:0000004 | nose | skos:closeMatch | loinc:LP7443-7 | Nose | semapv:LexicalMatching | oaklib | rdfs:label | rdfs:label | nose |
2 | UBERON:0000014 | zone of skin | skos:closeMatch | loinc:LP36760-4 | Skin | semapv:LexicalMatching | oaklib | oio:hasExactSynonym | rdfs:label | skin |
3 | UBERON:0000019 | camera-type eye | skos:closeMatch | loinc:LP7797-6 | EYE | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | eye |
4 | UBERON:0000019 | camera-type eye | skos:closeMatch | loinc:LP7218-3 | Eye | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | eye |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
74 | UBERON:2000673 | hypobranchial artery | skos:closeMatch | loinc:LP28800-8 | HA | semapv:LexicalMatching | oaklib | oio:hasExactSynonym | rdfs:label | ha |
75 | UBERON:3011048 | genital system | skos:closeMatch | loinc:LP7555-8 | Reproductive system | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | reproductive system |
76 | UBERON:3011048 | genital system | skos:closeMatch | loinc:LP7264-7 | Genitalia | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | genitalia |
77 | UBERON:6110636 | insect adult cerebral ganglion | skos:closeMatch | loinc:LP7084-9 | Brain | semapv:LexicalMatching | oaklib | oio:hasRelatedSynonym | rdfs:label | brain |
78 | UBERON:8420000 | hair of scalp | skos:closeMatch | loinc:LP7280-3 | Hair | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | hair |
79 rows × 10 columns
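The first two rows of the table describe the same subject and object pair, matched once through a synonym and once through the label, so the number of distinct mappings is smaller than the number of rows. The sketch below counts distinct pairs using only the standard library. The sample rows are copied from the table above, the comment line stands in for the metadata header of the real file, and the function name is hypothetical.

```python
import csv
import io

COLUMNS = [
    "subject_id", "subject_label", "predicate_id", "object_id", "object_label",
    "mapping_justification", "mapping_tool", "subject_match_field",
    "object_match_field", "match_string",
]
ROWS = [
    ["UBERON:0000004", "nose", "skos:closeMatch", "loinc:LP7443-7", "Nose",
     "semapv:LexicalMatching", "oaklib", "oio:hasExactSynonym", "rdfs:label", "nose"],
    ["UBERON:0000004", "nose", "skos:closeMatch", "loinc:LP7443-7", "Nose",
     "semapv:LexicalMatching", "oaklib", "rdfs:label", "rdfs:label", "nose"],
    ["UBERON:0000014", "zone of skin", "skos:closeMatch", "loinc:LP36760-4", "Skin",
     "semapv:LexicalMatching", "oaklib", "oio:hasExactSynonym", "rdfs:label", "skin"],
]
# Placeholder comment line; the real file starts with '#'-prefixed metadata.
SAMPLE = "\n".join(
    ["# placeholder for the metadata header"]
    + ["\t".join(COLUMNS)]
    + ["\t".join(row) for row in ROWS]
) + "\n"


def read_sssom(handle):
    """Read tab-separated mapping rows, skipping lines that start with '#'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


rows = read_sssom(io.StringIO(SAMPLE))
unique_pairs = {(row["subject_id"], row["object_id"]) for row in rows}
print(len(rows), len(unique_pairs))
```

To apply this to the real output, pass an open file handle for output/loinc-uberon.sssom.tsv to read_sssom instead of the in-memory sample.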