COMPLoinc Example

See https://github.com/loinc/comp-loinc for a description of the COMPLoinc project, which creates an OWL version of LOINC that can be used in OAK to explore relationships between codes and components.

This notebook currently relies almost entirely on the command-line functionality of OAK, so it should be accessible to non-programmers, although a good understanding of the command line helps and some advanced OAK query concepts are introduced.

In the future we may extend this notebook with Python examples.

Creating an alias

First we create an alias, comploinc, for the runoak command, using a SQLite selector for the comploinc resource.

[1]:
%alias comploinc runoak -i sqlite:obo:comploinc
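
For readers who prefer plain Python over the %alias magic, a rough equivalent is a small wrapper that builds the same runoak command line. This is only a sketch: the names comploinc_cmd and run_comploinc are hypothetical helpers (not part of OAK), and actually running a command assumes runoak is installed and on the PATH.

```python
import subprocess

SELECTOR = "sqlite:obo:comploinc"  # same selector as the alias above


def comploinc_cmd(*args):
    """Build the argument list for: runoak -i sqlite:obo:comploinc <args>."""
    return ["runoak", "-i", SELECTOR, *args]


def run_comploinc(*args):
    """Run the command and return its stdout (assumes runoak is on the PATH)."""
    result = subprocess.run(
        comploinc_cmd(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


# Building the command does not require OAK to be installed:
print(comploinc_cmd("info", "loinc:11145-0"))
```

Keeping command construction separate from execution makes it easy to inspect what would be run before running it.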

Basic Lookup

The info command looks up a given entity or set of entities. These are specified as a list of CURIEs or labels on the command line after the info command:

[2]:
comploinc info loinc:11145-0
loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:
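
The label returned for a LOINC code is the six-part LOINC name, with the parts (Component, Property, Time, System, Scale, Method) separated by colons. A minimal sketch for splitting it in Python follows. The function name parse_loinc_label is illustrative, and the split assumes that no colon appears inside an individual part.

```python
from typing import Dict

PARTS = ["component", "property", "time", "system", "scale", "method"]


def parse_loinc_label(label: str) -> Dict[str, str]:
    """Split a colon-delimited LOINC name into its six named parts."""
    fields = label.split(":")
    # Pad so that short labels still yield six fields.
    fields += [""] * (len(PARTS) - len(fields))
    return dict(zip(PARTS, fields))


parsed = parse_loinc_label("5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:")
print(parsed["component"], "|", parsed["system"])
```

An empty final field, as in this label, means that no method is specified for the code.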

Graph Visualization

Here we show the viz command with a single term. Multiple terms can be passed, but we illustrate with one for now.

[7]:
comploinc viz loinc:11145-0  -o output/loinc-11145-0.png

[Image: graph visualization of loinc:11145-0, saved to output/loinc-11145-0.png]

Lookup by label

[4]:
comploinc info Serotonin
loinc:LP14693-3 ! Serotonin
[5]:
comploinc descendants -p i Serotonin
loinc:LP14693-3 ! Serotonin
loinc:LP15097-6 ! 5-Hydroxyindoleacetate
loinc:LP36806-5 ! 5-Hydroxyindoleacetate & Creatinine
[6]:
comploinc descendants -p i,loinc:hasComponent Serotonin
loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:
loinc:12172-3 ! 5-Hydroxyindoleacetate:PrThr:24H:Urine:Ord:
loinc:14573-0 ! 5-Hydroxyindoleacetate:SRat:24H:Urine:Qn:
loinc:14910-4 ! Serotonin:SCnc:Pt:Ser:Qn:
loinc:15009-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Urine:Qn:
loinc:1692-3 ! 5-Hydroxyindoleacetate:MCnc:Pt:CSF:Qn:
loinc:1693-1 ! 5-Hydroxyindoleacetate:MCnc:Pt:Ser/Plas:Qn:
loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:
loinc:1695-6 ! 5-Hydroxyindoleacetate:MRat:24H:Urine:Qn:
loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:
loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:25524-0 ! Serotonin:SCnc:Pt:Bld:Qn:
loinc:25971-3 ! 5-Hydroxyindoleacetate:SCnc:24H:Urine:Qn:
loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:
loinc:26035-6 ! Serotonin:SCnc:Pt:Plas:Qn:
loinc:27057-9 ! Serotonin:MCnc:Pt:Ser:Qn:
loinc:2939-7 ! Serotonin:MCnc:Pt:Bld:Qn:
loinc:2940-5 ! Serotonin:MCnc:Pt:Plas:Qn:
loinc:2941-3 ! Serotonin:MCnc:Pt:Platelets:Qn:
loinc:29520-4 ! 5-Hydroxyindoleacetate/Creatinine:SRto:Pt:Urine:Qn:
loinc:31203-3 ! 5-Hydroxyindoleacetate:MCnc:24H:Urine:Qn:
loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:
loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:
loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:
loinc:42671-8 ! Serotonin:EntMass:Pt:Platelets:Qn:
loinc:44288-9 ! 5-Hydroxyindoleacetate/Creatinine:MRto:24H:Urine:Qn:
loinc:44909-0 ! 5-Hydroxyindoleacetate & Creatinine:Imp:Pt:Urine:Nom:
loinc:47544-2 ! 5-Hydroxyindoleacetate:SCnc:Pt:CSF:Qn:
loinc:47545-9 ! 5-Hydroxyindoleacetate/Creatinine:SRto:24H:Urine:Qn:
loinc:48168-9 ! 5-Hydroxyindoleacetate:PrThr:Pt:Urine:Ord:
loinc:50149-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Ser/Plas:Qn:
loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:
loinc:71804-9 ! Serotonin:EntSub:Pt:Platelets:Qn:
loinc:74769-1 ! 5-Hydroxyindoleacetate:SCnc:Pt:PRP:Qn:
loinc:LP14693-3 ! Serotonin
loinc:LP15097-6 ! 5-Hydroxyindoleacetate
loinc:LP36806-5 ! 5-Hydroxyindoleacetate & Creatinine
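
The difference between the two queries above comes down to which edge types are followed. With -p i only is-a (subclass) edges are traversed, so the result stays within the part hierarchy; adding loinc:hasComponent also follows edges from codes to their component parts. The toy traversal below illustrates the idea. The edge list is a hand-written assumption chosen to be consistent with the output above, not data extracted from COMPLoinc, and the function is a sketch rather than OAK's implementation.

```python
from collections import deque

# (subject, predicate, object) edges; illustrative assumptions only.
EDGES = [
    ("loinc:LP15097-6", "i", "loinc:LP14693-3"),
    ("loinc:LP36806-5", "i", "loinc:LP15097-6"),
    ("loinc:14910-4", "loinc:hasComponent", "loinc:LP14693-3"),
    ("loinc:1694-9", "loinc:hasComponent", "loinc:LP15097-6"),
]


def descendants(start, predicates):
    """Reflexive descendants of start, following only the given predicates."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subj, pred, obj in EDGES:
            if obj == node and pred in predicates and subj not in seen:
                seen.add(subj)
                queue.append(subj)
    return seen


SEROTONIN = "loinc:LP14693-3"
print(sorted(descendants(SEROTONIN, {"i"})))
print(sorted(descendants(SEROTONIN, {"i", "loinc:hasComponent"})))
```

As in the real output, the starting term is included in its own descendants, and the second call returns the codes in addition to the parts.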

Boolean Graph Queries

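Find all codes that have a component of “Serotonin” and a system of “Urine”. Because both sub-queries use .descendant, codes whose component is a descendant of Serotonin (such as 5-Hydroxyindoleacetate) are included.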

[8]:
comploinc info .descendant//p=i,loinc:hasComponent Serotonin .and .descendant//p=i,loinc:hasSystem Urine
loinc:15009-4 ! 5-Hydroxyindoleacetate:SCnc:Pt:Urine:Qn:
loinc:1695-6 ! 5-Hydroxyindoleacetate:MRat:24H:Urine:Qn:
loinc:44909-0 ! 5-Hydroxyindoleacetate & Creatinine:Imp:Pt:Urine:Nom:
loinc:34373-1 ! Serotonin:SCnc:Pt:Urine:Qn:
loinc:25971-3 ! 5-Hydroxyindoleacetate:SCnc:24H:Urine:Qn:
loinc:31203-3 ! 5-Hydroxyindoleacetate:MCnc:24H:Urine:Qn:
loinc:18253-5 ! Serotonin:MRat:24H:Urine:Qn:
loinc:34374-9 ! Serotonin/Creatinine:SRto:Pt:Urine:Qn:
loinc:32339-4 ! Serotonin:SRat:24H:Urine:Qn:
loinc:29520-4 ! 5-Hydroxyindoleacetate/Creatinine:SRto:Pt:Urine:Qn:
loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:25981-2 ! Serotonin:SCnc:24H:Urine:Qn:
loinc:44288-9 ! 5-Hydroxyindoleacetate/Creatinine:MRto:24H:Urine:Qn:
loinc:12172-3 ! 5-Hydroxyindoleacetate:PrThr:24H:Urine:Ord:
loinc:56978-0 ! Serotonin:MCnc:24H:Urine:Qn:
loinc:18375-6 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:48168-9 ! 5-Hydroxyindoleacetate:PrThr:Pt:Urine:Ord:
loinc:11145-0 ! 5-Hydroxyindoleacetate/Creatinine:MRto:Pt:Urine:Qn:
loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:
loinc:47545-9 ! 5-Hydroxyindoleacetate/Creatinine:SRto:24H:Urine:Qn:
loinc:14573-0 ! 5-Hydroxyindoleacetate:SRat:24H:Urine:Qn:
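
The .and operator intersects the result sets of the two sub-queries. As a rough cross-check that needs no ontology access, the result can be approximated from the labels alone by keeping entries whose System part (the fourth colon-separated field) is Urine. This is a label-based approximation only: unlike the graph query, it would miss codes whose system is a descendant of Urine. The sample lines are copied from the hasComponent output earlier in this notebook.

```python
# A few lines copied from the hasComponent query output.
OUTPUT = """\
loinc:14910-4 ! Serotonin:SCnc:Pt:Ser:Qn:
loinc:1692-3 ! 5-Hydroxyindoleacetate:MCnc:Pt:CSF:Qn:
loinc:1694-9 ! 5-Hydroxyindoleacetate:MCnc:Pt:Urine:Qn:
loinc:17003-5 ! Serotonin:MCnc:Pt:Urine:Qn:
loinc:LP14693-3 ! Serotonin
"""


def parse_line(line):
    """Split 'CURIE ! label' into a (curie, label) pair."""
    curie, label = line.split(" ! ", 1)
    return curie, label


def system_of(label):
    """Return the System part of a LOINC name, or None for part labels."""
    fields = label.split(":")
    return fields[3] if len(fields) > 3 else None


labels = dict(parse_line(line) for line in OUTPUT.splitlines() if line)
urine_codes = {curie for curie, label in labels.items() if system_of(label) == "Urine"}
print(sorted(urine_codes))
```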

Semantic Similarity (Term Wise)

[16]:
comploinc similarity --help
Usage: runoak similarity [OPTIONS] [TERMS]...

  All by all similarity

  This calculates a similarity matrix for two sets of terms.

  Input sets of a terms can be specified in different ways:

  - via a file - via explicit lists of terms or queries

  Example:

      runoak -i hp.db all-similarity -p i --set1-file HPO-TERMS1 --set2-file
      HPO-TERMS2 -O csv

  This will compare every term in TERMS1 vs TERMS2

  Alternatively standard OAK term queries can be used, with "@" separating the
  two lists

  Example:

      runoak -i hp.db all-similarity -p i TERM_1 TERM_2 ... TERM_N @ TERM_N+1
      ... TERM_M

  The .all term syntax can be used to select all terms in an ontology

  Example:

      runoak -i ma.db all-similarity -p i,p .all @ .all

  This can be mixed with other term selectors; for example to calculate the
  similarity of "neuron" vs all terms in CL:

      runoak -i cl.db all-similarity -p i,p .all @ neuron

  An example pipeline to do all by all over all phenotypes in HPO:

  Explicit:

      runoak -i hp.db descendants -p i HP:0000118 > HPO     runoak -i hp.db
      all-similarity -p i --set1-file HPO --set2-file HPO -O csv -o
      RESULTS.tsv

  The same thing can be done more compactly with term queries:

      runoak -i hp.db all-similarity -p i .desc//p=i HP:0000118 @ .desc//p=i
      HP:0000118

Options:
  -p, --predicates TEXT         A comma-separated list of predicates
  --set1-file TEXT              ID file for set1
  --set2-file TEXT              ID file for set2
  --jaccard-minimum FLOAT       Minimum value for jaccard score
  --ic-minimum FLOAT            Minimum value for information content
  -o, --output TEXT             path to output
  --main-score-field TEXT       Score used for summarization  [default:
                                phenodigm_score]
  --autolabel / --no-autolabel  If set, results will automatically have labels
                                assigned  [default: autolabel]
  -O, --output-type TEXT        Desired output type
  --help                        Show this message and exit.
[15]:
comploinc similarity loinc:2341-6 @ loinc:3134-4
ancestor_id: loinc:LP65098-3
ancestor_information_content: 6.066089190457772
ancestor_label: Sugar
jaccard_similarity: 0.5
object_id: loinc:3134-4
object_label: 'Xylose:MCnc:Pt:Bld:Qn:'
phenodigm_score: 1.7415638361050352
subject_id: loinc:2341-6
subject_label: Glucose:MCnc:Pt:Bld:Qn:Test strip manual
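
The fields in this output are related. In the results shown in this notebook, phenodigm_score equals the square root of jaccard_similarity multiplied by ancestor_information_content, where the ancestor is the most informative common ancestor of the two terms (here, Sugar). The check below reproduces the reported score from the other two fields; all numbers are copied from the output above.

```python
import math

jaccard_similarity = 0.5
ancestor_information_content = 6.066089190457772
reported_phenodigm_score = 1.7415638361050352

# Geometric mean of the Jaccard similarity and the ancestor's information content.
computed = math.sqrt(jaccard_similarity * ancestor_information_content)

print(computed)
print(math.isclose(computed, reported_phenodigm_score, rel_tol=1e-6))
```

A Jaccard similarity of 0.5 means that half of the combined ancestors of the two codes are shared between them.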

Value Sets

The COMPLoinc project doesn’t define any value sets. Here we use two arbitrary hardcoded ones for illustration.

[2]:
comploinc info .idfile input/valueset1.txt
loinc:5914-7 ! Glucose:PrThr:Pt:Bld:Ord:Test strip
loinc:2339-0 ! Glucose:MCnc:Pt:Bld:Qn:
loinc:50216-1 ! Glucose^6th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:6777-7 ! Glucose:MCnc:Pt:Ser/Plas:Qn:
loinc:77145-1 ! Glucose^post CFst:SCnc:Pt:Ser/Plas/Bld:Qn:
loinc:54085-6 ! Galactose:SCnc:Pt:Bld.dot:Qn:
loinc:50218-7 ! Glucose^9th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:25426-8 ! Galactose:SCnc:Pt:Ser/Plas:Qn:
loinc:32016-8 ! Glucose:MCnc:Pt:BldC:Qn:
loinc:77135-2 ! Glucose:SCnc:Pt:Ser/Plas/Bld:Qn:
loinc:2307-7 ! Galactose:MCnc:Pt:Bld:Qn:
[3]:
comploinc info .idfile input/valueset2.txt
loinc:54495-7 ! Glucose^post dialysis:SCnc:Pt:Ser/Plas:Qn:
loinc:2308-5 ! Galactose:MCnc:Pt:Ser/Plas:Qn:
loinc:76629-5 ! Glucose^post CFst:SCnc:Pt:Bld:Qn:
loinc:27353-2 ! Estimated average glucose:MCnc:Pt:Bld:Qn:Estimated from glycated hemoglobin
loinc:2552-8 ! Lactose:MCnc:Pt:Ser/Plas:Qn:
loinc:12611-0 ! Glucose^4H specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:51596-5 ! Glucose:SCnc:Pt:BldC:Qn:
loinc:93791-2 ! Glucose:MCnc:Stdy^mean:Ser/Plas:Qn:
loinc:50215-3 ! Glucose^5th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:50208-8 ! Glucose^10th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:3134-4 ! Xylose:MCnc:Pt:Bld:Qn:
loinc:29999-0 ! Xylose:MCnc:Pt:Ser/Plas:Qn:
loinc:5914-7 ! Glucose:PrThr:Pt:Bld:Ord:Test strip
loinc:2339-0 ! Glucose:MCnc:Pt:Bld:Qn:
loinc:50216-1 ! Glucose^6th specimen:MCnc:Pt:Ser/Plas:Qn:
loinc:6777-7 ! Glucose:MCnc:Pt:Ser/Plas:Qn:
loinc:77145-1 ! Glucose^post CFst:SCnc:Pt:Ser/Plas/Bld:Qn:
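
The two lists above are not disjoint: five codes appear in both. Plain set arithmetic on the IDs makes this visible before running a term-set comparison. The IDs below are copied from the two outputs above; in practice they would be read from input/valueset1.txt and input/valueset2.txt, which are assumed here to contain one CURIE per line.

```python
valueset1 = {
    "loinc:5914-7", "loinc:2339-0", "loinc:50216-1", "loinc:6777-7",
    "loinc:77145-1", "loinc:54085-6", "loinc:50218-7", "loinc:25426-8",
    "loinc:32016-8", "loinc:77135-2", "loinc:2307-7",
}
valueset2 = {
    "loinc:54495-7", "loinc:2308-5", "loinc:76629-5", "loinc:27353-2",
    "loinc:2552-8", "loinc:12611-0", "loinc:51596-5", "loinc:93791-2",
    "loinc:50215-3", "loinc:50208-8", "loinc:3134-4", "loinc:29999-0",
    "loinc:5914-7", "loinc:2339-0", "loinc:50216-1", "loinc:6777-7",
    "loinc:77145-1",
}

shared = valueset1 & valueset2       # codes in both value sets
only_in_1 = valueset1 - valueset2    # codes unique to the first
only_in_2 = valueset2 - valueset1    # codes unique to the second

print(len(valueset1), len(valueset2), len(shared))
print(sorted(shared))
```

Exact overlap only captures identical codes; the term-set similarity that follows also gives credit to codes that are related through the ontology.
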
[2]:
comploinc termset-similarity .idfile input/valueset1.txt @ .idfile input/valueset2.txt -o output/sim-out.yaml
[4]:
!head -20 output/sim-out.yaml
average_score: 9.623542661061256
best_score: 13.738514532429267
object_best_matches:
  loinc:12611-0:
    match_source: loinc:12611-0
    match_source_label: 'Glucose^4H specimen:MCnc:Pt:Ser/Plas:Qn:'
    match_target: loinc:2339-0
    match_target_label: 'Glucose:MCnc:Pt:Bld:Qn:'
    score: 6.519346011967107
    similarity:
      ancestor_id: loinc:LP14635-4
      ancestor_information_content: 6.519346011967107
      ancestor_label: Glucose
      jaccard_similarity: 0.84
      object_id: loinc:12611-0
      phenodigm_score: 2.3401390236591437
      subject_id: loinc:2339-0
  loinc:2308-5:
    match_source: loinc:2308-5
    match_source_label: 'Galactose:MCnc:Pt:Ser/Plas:Qn:'

Logical Definitions

Currently these are not visible (the command below produces no output). The best way to fix this is to address the following issue:

https://github.com/loinc/comp-loinc/issues/17

[9]:
comploinc logical-definitions loinc:14573-0 loinc:47545-9
[10]:
comploinc lexmatch --help
Usage: runoak lexmatch [OPTIONS] [TERMS]...

  Performs lexical matching between pairs of terms in one more more
  ontologies.

  Examples:

      runoak -i foo.obo lexmatch -o foo.sssom.tsv

  In this example, the input ontology file is assumed to contain all pairs of
  terms to be mapped.

  It is more common to map between all pairs of terms in two ontology files.
  In this case, you can merge the ontologies using a tool like ROBOT; or,  to
  avoid a merge preprocessing step, use the --addl (-a) option to specify a
  second ontology file.

      runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv

  By default, this command will compare all terms in all ontologies. You can
  use the OAK term query syntax to pass in the set of all terms to be
  compared.

  For example, to compare all terms in union of FOO and BAR namespaces:

      runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: i^BAR:

  All members of the set are compared (including FOO to FOO matches and BAR to
  BAR matches), omitting trivial reciprocal matches.

  Use an "@" separator between two queries to feed in two explicit sets:

      runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: @
      i^BAR:

  ALGORITHM: lexmatch implements a simple algorithm:

  - create a lexical index, keyed by normalized strings of labels, synonyms -
  report all pairs of entities that have the same key

  The lexical index can be exported (in native YAML) using -L:

      runoak -i foo.obo lexmatch -L foo.index.yaml -o foo.sssom.tsv

  Note: if you run the above command a second time it will be faster as the
  index will be reused.

  RULES: Using custom rules:

      runoak  -i foo.obo lexmatch -R match_rules.yaml -L foo.index.yaml -o
      foo.sssom.tsv

  Full documentation:

  - https://incatools.github.io/ontology-access-
  kit/src/oaklib.utilities.lexical.lexical_indexer.html# module-
  oaklib.utilities.lexical.lexical_indexer

Options:
  -R, --rules-file TEXT           path to rules file. Conforms to
                                  rules_datamodel.        e.g.
                                  https://github.com/INCATools/ontology-
                                  access-
                                  kit/blob/main/tests/input/matcher_rules.yaml
  --add-labels / --no-add-labels  Populate empty labels with URI fragments or
                                  CURIE local IDs, for ontologies that use
                                  semantic IDs  [default: no-add-labels]
  -L, --lexical-index-file TEXT   path to lexical index. This is recreated
                                  each time unless --no-recreate is passed
  --recreate / --no-recreate      if true and lexical index is specified,
                                  always recreate, otherwise load from index
                                  [default: recreate]
  -o, --output FILENAME           Output file, e.g. obo file
  --help                          Show this message and exit.
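
The ALGORITHM section of the help text above can be sketched in a few lines of Python. This is a simplified illustration, not oaklib's implementation: normalization here is just lower-casing and whitespace collapsing (the real normalization rules may differ), and the entries are taken from the match table shown later in this notebook.

```python
from collections import defaultdict
from itertools import combinations

# (entity id, field, text)
TERMS = [
    ("UBERON:0000004", "rdfs:label", "nose"),
    ("loinc:LP7443-7", "rdfs:label", "Nose"),
    ("UBERON:0000014", "rdfs:label", "zone of skin"),
    ("UBERON:0000014", "oio:hasExactSynonym", "skin"),
    ("loinc:LP36760-4", "rdfs:label", "Skin"),
]


def normalize(text):
    """Lower-case and collapse whitespace (a simplifying assumption)."""
    return " ".join(text.lower().split())


# Step 1: build a lexical index keyed by normalized string.
index = defaultdict(list)
for entity, field, text in TERMS:
    index[normalize(text)].append((entity, field))

# Step 2: report pairs of distinct entities that share a key.
matches = [
    (key, a, b)
    for key, entries in index.items()
    for a, b in combinations(entries, 2)
    if a[0] != b[0]
]
for match in matches:
    print(match)
```

Recording the field alongside each entity is what allows a match to be reported as, for example, an exact synonym on one side against a label on the other.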

Lexical Matching

[11]:
comploinc -a sqlite:obo:uberon lexmatch -L output/loinc-uberon-lexical-index.yaml -o output/loinc-uberon.sssom.tsv i^UBERON: @ i^loinc:
WARNING:root:Skipping <urn:swrl#A> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#B> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#C> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#D> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#a1> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#a2> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#d> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#e> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#eff> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#g1> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#g2> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#in> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#mf2> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#mf> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#p> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#w> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#x> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#y> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl#z> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl:var#x> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl:var#y> as it is not a valid CURIE
WARNING:root:Skipping <urn:swrl:var#z> as it is not a valid CURIE
[12]:
import pandas as pd
[13]:
df = pd.read_csv("output/loinc-uberon.sssom.tsv", sep="\t", comment="#")
df
[13]:
   | subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | mapping_tool | subject_match_field | object_match_field | match_string
0 | UBERON:0000004 | nose | skos:closeMatch | loinc:LP7443-7 | Nose | semapv:LexicalMatching | oaklib | oio:hasExactSynonym | rdfs:label | nose
1 | UBERON:0000004 | nose | skos:closeMatch | loinc:LP7443-7 | Nose | semapv:LexicalMatching | oaklib | rdfs:label | rdfs:label | nose
2 | UBERON:0000014 | zone of skin | skos:closeMatch | loinc:LP36760-4 | Skin | semapv:LexicalMatching | oaklib | oio:hasExactSynonym | rdfs:label | skin
3 | UBERON:0000019 | camera-type eye | skos:closeMatch | loinc:LP7797-6 | EYE | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | eye
4 | UBERON:0000019 | camera-type eye | skos:closeMatch | loinc:LP7218-3 | Eye | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | eye
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
74 | UBERON:2000673 | hypobranchial artery | skos:closeMatch | loinc:LP28800-8 | HA | semapv:LexicalMatching | oaklib | oio:hasExactSynonym | rdfs:label | ha
75 | UBERON:3011048 | genital system | skos:closeMatch | loinc:LP7555-8 | Reproductive system | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | reproductive system
76 | UBERON:3011048 | genital system | skos:closeMatch | loinc:LP7264-7 | Genitalia | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | genitalia
77 | UBERON:6110636 | insect adult cerebral ganglion | skos:closeMatch | loinc:LP7084-9 | Brain | semapv:LexicalMatching | oaklib | oio:hasRelatedSynonym | rdfs:label | brain
78 | UBERON:8420000 | hair of scalp | skos:closeMatch | loinc:LP7280-3 | Hair | semapv:LexicalMatching | oaklib | oio:hasBroadSynonym | rdfs:label | hair

79 rows × 10 columns
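
Lexical matches need review before use. Two simple triage heuristics (suggestions, not part of lexmatch) are to prefer rows where both sides matched on rdfs:label, and to flag very short match strings such as ha, which tend to be abbreviation collisions. The sketch below applies both to three rows copied from the table above, using only the standard library so that it runs without the output file; with the DataFrame loaded above, the same filters could be written as boolean masks on df.

```python
import csv
import io

# Three rows copied from the table above (tab-separated, subset of columns).
TSV = (
    "subject_id\tsubject_label\tobject_id\tobject_label\t"
    "subject_match_field\tobject_match_field\tmatch_string\n"
    "UBERON:0000004\tnose\tloinc:LP7443-7\tNose\trdfs:label\trdfs:label\tnose\n"
    "UBERON:0000014\tzone of skin\tloinc:LP36760-4\tSkin\t"
    "oio:hasExactSynonym\trdfs:label\tskin\n"
    "UBERON:2000673\thypobranchial artery\tloinc:LP28800-8\tHA\t"
    "oio:hasExactSynonym\trdfs:label\tha\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

# Heuristic 1: label-to-label matches are the most direct evidence.
label_to_label = [
    r for r in rows
    if r["subject_match_field"] == "rdfs:label"
    and r["object_match_field"] == "rdfs:label"
]

# Heuristic 2: very short match strings deserve manual review.
suspicious = [r for r in rows if len(r["match_string"]) <= 2]

print([r["subject_id"] for r in label_to_label])
print([r["subject_id"] for r in suspicious])
```

The two-character threshold is an arbitrary choice for illustration and should be tuned against the full set of 79 rows.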
