{ "cells": [ { "cell_type": "markdown", "id": "32a2e447", "metadata": {}, "source": [ "# OAK mappings command\n", "\n", "This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n", "\n", "This notebook provides examples for the `mappings` command, which can be used to lookup mappings that are bundled with ontologies.\n", "\n", "Overall background on the concepts here can be found in the [OAK Guide to Mappings](https://incatools.github.io/ontology-access-kit/guide/mappings.html).\n", "\n", "## Help Option\n", "\n", "You can get help on any OAK command using `--help`" ] }, { "cell_type": "code", "execution_count": 1, "id": "97ed8cee", "metadata": { "ExecuteTime": { "end_time": "2024-04-25T16:36:45.881638Z", "start_time": "2024-04-25T16:36:42.970458Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: runoak mappings [OPTIONS] [TERMS]...\r\n", "\r\n", " List all mappings encoded in the ontology\r\n", "\r\n", " Example:\r\n", "\r\n", " runoak -i sqlite:obo:envo mappings\r\n", "\r\n", " The default output is SSSOM YAML. To use the (canonical) csv format:\r\n", "\r\n", " runoak -i sqlite:obo:envo mappings -O sssom\r\n", "\r\n", " By default, labels are not included. Use --autolabel to include labels (but\r\n", " note that if the label is not in the source ontology, then no label will be\r\n", " retrieved)\r\n", "\r\n", " runoak -i sqlite:obo:envo mappings -O sssom\r\n", "\r\n", " To constrain the mapped object source:\r\n", "\r\n", " runoak -i sqlite:obo:foodon mappings -O sssom --maps-to-source\r\n", " SUBSET_SIREN\r\n", "\r\n", " Python API:\r\n", "\r\n", " https://incatools.github.io/ontology-access-kit/interfaces/mapping-\r\n", " provider\r\n", "\r\n", " Data model:\r\n", "\r\n", " https://w3id.org/oak/mapping-provider\r\n", "\r\n", "Options:\r\n", " -o, --output FILENAME Output file, e.g. obo file\r\n", " -O, --output-type TEXT Desired output type\r\n", " --autolabel / --no-autolabel If set, results will automatically have labels\r\n", " assigned [default: autolabel]\r\n", " -M, --maps-to-source TEXT Return only mappings with subject or object\r\n", " source equal to this\r\n", " --mapper TEXT A selector for an adapter that is to be used\r\n", " for the main lookup operation\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!runoak mappings --help" ] }, { "cell_type": "markdown", "id": "1f933146", "metadata": {}, "source": [ "## Set up an alias\n", "\n", "For convenience we will set up an alias for use in this notebook. This will allow us to use `uberon ...` rather than `runoak -i sqlite:obo:uberon ...` for the rest of the notebook.\n", "\n", "We use Uberon as an example, as Uberon bundles a lot of diverse mappings. See [Uberon docs](https://obophenotype.github.io/uberon/bridges/)." ] }, { "cell_type": "code", "execution_count": 2, "id": "29d2249a", "metadata": { "ExecuteTime": { "end_time": "2024-04-25T16:38:06.765885Z", "start_time": "2024-04-25T16:38:06.757549Z" } }, "outputs": [], "source": [ "alias uberon runoak -i sqlite:obo:uberon" ] }, { "cell_type": "markdown", "id": "1a7c69d7", "metadata": {}, "source": [ "## Direct mappings for a subject term\n", "\n", "First we will look up the mappings for the Uberon term for the CA4 region of the hippocampus" ] }, { "cell_type": "code", "execution_count": 4, "id": "2c406bc1", "metadata": { "ExecuteTime": { "end_time": "2024-04-25T16:39:33.552868Z", "start_time": "2024-04-25T16:39:30.732480Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: DHBA:10300\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: DHBA\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: EFO:0002457\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: EFO\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: EMAPA:32771\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: EMAPA\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: FMA:75741\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: FMA\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: HBA:12895\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: HBA\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: MA:0000953\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: MA\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: NCIT:C32249\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: NCIT\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: PBA:10074\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: PBA\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: UMLS:C2328406\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: UMLS\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: Wikipedia:Region_IV_of_hippocampus_proper\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: Wikipedia\r\n", "\r\n", "---\r\n", "subject_id: UBERON:0003884\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: neuronames:181\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: CA4 field of hippocampus\r\n", "subject_source: UBERON\r\n", "object_source: neuronames\r\n" ] } ], "source": [ "uberon mappings UBERON:0003884" ] }, { "cell_type": "markdown", "source": [ "The above YAML follow the SSSOM datamodel (https://w3id.org/sssom).\n", "\n", "We can get the results back in SSSOM tsv format (this time querying for \"brain\"). Here we will view it via pandas:" ], "metadata": { "collapsed": false }, "id": "2ccbc738447cd6e5" }, { "cell_type": "code", "execution_count": 11, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oaklib-OeQZizwE-py3.9/lib/python3.9/site-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\r\n", " df.replace(\"\", np.nan, inplace=True)\r\n" ] } ], "source": [ "uberon mappings UBERON:0000955 -o output/brain-mappings.tsv -O sssom" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-04-25T16:48:50.217198Z", "start_time": "2024-04-25T16:48:46.515220Z" } }, "id": "cd60a25935944f7" }, { "cell_type": "code", "execution_count": 13, "outputs": [ { "data": { "text/plain": " subject_id subject_label predicate_id object_id object_label \\\n0 UBERON:0000955 brain oio:hasDbXref AAO:0010478 NaN \n1 UBERON:0000955 brain oio:hasDbXref ABA:Brain NaN \n2 UBERON:0000955 brain oio:hasDbXref BAMS:Br NaN \n3 UBERON:0000955 brain oio:hasDbXref BAMS:Brain NaN \n4 UBERON:0000955 brain oio:hasDbXref BILA:0000135 NaN \n5 UBERON:0000955 brain oio:hasDbXref BIRNLEX:796 NaN \n6 UBERON:0000955 brain oio:hasDbXref BTO:0000142 NaN \n7 UBERON:0000955 brain oio:hasDbXref CALOHA:TS-0095 NaN \n8 UBERON:0000955 brain oio:hasDbXref DHBA:10155 NaN \n9 UBERON:0000955 brain oio:hasDbXref EFO:0000302 NaN \n10 UBERON:0000955 brain oio:hasDbXref EHDAA2:0000183 NaN \n11 UBERON:0000955 brain oio:hasDbXref EHDAA:2641 NaN \n12 UBERON:0000955 brain oio:hasDbXref EHDAA:6485 NaN \n13 UBERON:0000955 brain oio:hasDbXref EMAPA:16894 NaN \n14 UBERON:0000955 brain oio:hasDbXref EV:0100164 NaN \n15 UBERON:0000955 brain oio:hasDbXref FBbt:00005095 NaN \n16 UBERON:0000955 brain oio:hasDbXref FMA:50801 NaN \n17 UBERON:0000955 brain oio:hasDbXref GAID:571 NaN \n18 UBERON:0000955 brain oio:hasDbXref HBA:4005 NaN \n19 UBERON:0000955 brain oio:hasDbXref MA:0000168 NaN \n20 UBERON:0000955 brain oio:hasDbXref MAT:0000098 NaN \n21 UBERON:0000955 brain oio:hasDbXref MBA:8 NaN \n22 UBERON:0000955 brain oio:hasDbXref MBA:997 NaN \n23 UBERON:0000955 brain oio:hasDbXref MESH:D001921 NaN \n24 UBERON:0000955 brain oio:hasDbXref MIAA:0000098 NaN \n25 UBERON:0000955 brain oio:hasDbXref NCIT:C12439 NaN \n26 UBERON:0000955 brain oio:hasDbXref PBA:3999 NaN \n27 UBERON:0000955 brain oio:hasDbXref SCTID:258335003 NaN \n28 UBERON:0000955 brain oio:hasDbXref TAO:0000008 NaN \n29 UBERON:0000955 brain oio:hasDbXref UMLS:C0006104 NaN \n30 UBERON:0000955 brain oio:hasDbXref UMLS:C1269537 NaN \n31 UBERON:0000955 brain oio:hasDbXref VHOG:0000157 NaN \n32 UBERON:0000955 brain oio:hasDbXref Wikipedia:Brain NaN \n33 UBERON:0000955 brain oio:hasDbXref XAO:0000010 NaN \n34 UBERON:0000955 brain oio:hasDbXref ZFA:0000008 NaN \n35 UBERON:0000955 brain oio:hasDbXref galen:Brain NaN \n36 UBERON:0000955 brain oio:hasDbXref neuronames:21 NaN \n37 _:riog00027434 NaN oio:hasDbXref UBERON:0000955 brain \n\n mapping_justification subject_source object_source \n0 semapv:UnspecifiedMatching UBERON AAO \n1 semapv:UnspecifiedMatching UBERON ABA \n2 semapv:UnspecifiedMatching UBERON BAMS \n3 semapv:UnspecifiedMatching UBERON BAMS \n4 semapv:UnspecifiedMatching UBERON BILA \n5 semapv:UnspecifiedMatching UBERON BIRNLEX \n6 semapv:UnspecifiedMatching UBERON BTO \n7 semapv:UnspecifiedMatching UBERON CALOHA \n8 semapv:UnspecifiedMatching UBERON DHBA \n9 semapv:UnspecifiedMatching UBERON EFO \n10 semapv:UnspecifiedMatching UBERON EHDAA2 \n11 semapv:UnspecifiedMatching UBERON EHDAA \n12 semapv:UnspecifiedMatching UBERON EHDAA \n13 semapv:UnspecifiedMatching UBERON EMAPA \n14 semapv:UnspecifiedMatching UBERON EV \n15 semapv:UnspecifiedMatching UBERON FBbt \n16 semapv:UnspecifiedMatching UBERON FMA \n17 semapv:UnspecifiedMatching UBERON GAID \n18 semapv:UnspecifiedMatching UBERON HBA \n19 semapv:UnspecifiedMatching UBERON MA \n20 semapv:UnspecifiedMatching UBERON MAT \n21 semapv:UnspecifiedMatching UBERON MBA \n22 semapv:UnspecifiedMatching UBERON MBA \n23 semapv:UnspecifiedMatching UBERON MESH \n24 semapv:UnspecifiedMatching UBERON MIAA \n25 semapv:UnspecifiedMatching UBERON NCIT \n26 semapv:UnspecifiedMatching UBERON PBA \n27 semapv:UnspecifiedMatching UBERON SCTID \n28 semapv:UnspecifiedMatching UBERON TAO \n29 semapv:UnspecifiedMatching UBERON UMLS \n30 semapv:UnspecifiedMatching UBERON UMLS \n31 semapv:UnspecifiedMatching UBERON VHOG \n32 semapv:UnspecifiedMatching UBERON Wikipedia \n33 semapv:UnspecifiedMatching UBERON XAO \n34 semapv:UnspecifiedMatching UBERON ZFA \n35 semapv:UnspecifiedMatching UBERON galen \n36 semapv:UnspecifiedMatching UBERON neuronames \n37 semapv:UnspecifiedMatching _ UBERON ", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
subject_idsubject_labelpredicate_idobject_idobject_labelmapping_justificationsubject_sourceobject_source
0UBERON:0000955brainoio:hasDbXrefAAO:0010478NaNsemapv:UnspecifiedMatchingUBERONAAO
1UBERON:0000955brainoio:hasDbXrefABA:BrainNaNsemapv:UnspecifiedMatchingUBERONABA
2UBERON:0000955brainoio:hasDbXrefBAMS:BrNaNsemapv:UnspecifiedMatchingUBERONBAMS
3UBERON:0000955brainoio:hasDbXrefBAMS:BrainNaNsemapv:UnspecifiedMatchingUBERONBAMS
4UBERON:0000955brainoio:hasDbXrefBILA:0000135NaNsemapv:UnspecifiedMatchingUBERONBILA
5UBERON:0000955brainoio:hasDbXrefBIRNLEX:796NaNsemapv:UnspecifiedMatchingUBERONBIRNLEX
6UBERON:0000955brainoio:hasDbXrefBTO:0000142NaNsemapv:UnspecifiedMatchingUBERONBTO
7UBERON:0000955brainoio:hasDbXrefCALOHA:TS-0095NaNsemapv:UnspecifiedMatchingUBERONCALOHA
8UBERON:0000955brainoio:hasDbXrefDHBA:10155NaNsemapv:UnspecifiedMatchingUBERONDHBA
9UBERON:0000955brainoio:hasDbXrefEFO:0000302NaNsemapv:UnspecifiedMatchingUBERONEFO
10UBERON:0000955brainoio:hasDbXrefEHDAA2:0000183NaNsemapv:UnspecifiedMatchingUBERONEHDAA2
11UBERON:0000955brainoio:hasDbXrefEHDAA:2641NaNsemapv:UnspecifiedMatchingUBERONEHDAA
12UBERON:0000955brainoio:hasDbXrefEHDAA:6485NaNsemapv:UnspecifiedMatchingUBERONEHDAA
13UBERON:0000955brainoio:hasDbXrefEMAPA:16894NaNsemapv:UnspecifiedMatchingUBERONEMAPA
14UBERON:0000955brainoio:hasDbXrefEV:0100164NaNsemapv:UnspecifiedMatchingUBERONEV
15UBERON:0000955brainoio:hasDbXrefFBbt:00005095NaNsemapv:UnspecifiedMatchingUBERONFBbt
16UBERON:0000955brainoio:hasDbXrefFMA:50801NaNsemapv:UnspecifiedMatchingUBERONFMA
17UBERON:0000955brainoio:hasDbXrefGAID:571NaNsemapv:UnspecifiedMatchingUBERONGAID
18UBERON:0000955brainoio:hasDbXrefHBA:4005NaNsemapv:UnspecifiedMatchingUBERONHBA
19UBERON:0000955brainoio:hasDbXrefMA:0000168NaNsemapv:UnspecifiedMatchingUBERONMA
20UBERON:0000955brainoio:hasDbXrefMAT:0000098NaNsemapv:UnspecifiedMatchingUBERONMAT
21UBERON:0000955brainoio:hasDbXrefMBA:8NaNsemapv:UnspecifiedMatchingUBERONMBA
22UBERON:0000955brainoio:hasDbXrefMBA:997NaNsemapv:UnspecifiedMatchingUBERONMBA
23UBERON:0000955brainoio:hasDbXrefMESH:D001921NaNsemapv:UnspecifiedMatchingUBERONMESH
24UBERON:0000955brainoio:hasDbXrefMIAA:0000098NaNsemapv:UnspecifiedMatchingUBERONMIAA
25UBERON:0000955brainoio:hasDbXrefNCIT:C12439NaNsemapv:UnspecifiedMatchingUBERONNCIT
26UBERON:0000955brainoio:hasDbXrefPBA:3999NaNsemapv:UnspecifiedMatchingUBERONPBA
27UBERON:0000955brainoio:hasDbXrefSCTID:258335003NaNsemapv:UnspecifiedMatchingUBERONSCTID
28UBERON:0000955brainoio:hasDbXrefTAO:0000008NaNsemapv:UnspecifiedMatchingUBERONTAO
29UBERON:0000955brainoio:hasDbXrefUMLS:C0006104NaNsemapv:UnspecifiedMatchingUBERONUMLS
30UBERON:0000955brainoio:hasDbXrefUMLS:C1269537NaNsemapv:UnspecifiedMatchingUBERONUMLS
31UBERON:0000955brainoio:hasDbXrefVHOG:0000157NaNsemapv:UnspecifiedMatchingUBERONVHOG
32UBERON:0000955brainoio:hasDbXrefWikipedia:BrainNaNsemapv:UnspecifiedMatchingUBERONWikipedia
33UBERON:0000955brainoio:hasDbXrefXAO:0000010NaNsemapv:UnspecifiedMatchingUBERONXAO
34UBERON:0000955brainoio:hasDbXrefZFA:0000008NaNsemapv:UnspecifiedMatchingUBERONZFA
35UBERON:0000955brainoio:hasDbXrefgalen:BrainNaNsemapv:UnspecifiedMatchingUBERONgalen
36UBERON:0000955brainoio:hasDbXrefneuronames:21NaNsemapv:UnspecifiedMatchingUBERONneuronames
37_:riog00027434NaNoio:hasDbXrefUBERON:0000955brainsemapv:UnspecifiedMatching_UBERON
\n
" }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"output/brain-mappings.tsv\", sep=\"\\t\", comment=\"#\")\n", "df" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-04-25T16:48:50.385307Z", "start_time": "2024-04-25T16:48:50.361656Z" } }, "id": "97e4ed6ac44bb3a7" }, { "cell_type": "markdown", "source": [ "If we are only interested in a particular source we can use `--maps-to-source` (`-M`).\n", "\n", "E.g to filter to the Allen institute Developmental Human Brain Atlas (DHBA):" ], "metadata": { "collapsed": false }, "id": "55247dc5319b3d40" }, { "cell_type": "code", "execution_count": 15, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject_id: UBERON:0000955\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: DHBA:10155\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: brain\r\n", "subject_source: UBERON\r\n", "object_source: DHBA\r\n" ] } ], "source": [ " uberon mappings UBERON:0000955 -M DHBA" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-04-25T16:51:15.119155Z", "start_time": "2024-04-25T16:51:12.057014Z" } }, "id": "9e40062dd23a73c2" }, { "cell_type": "markdown", "source": [ "In theory all mappings should be to CURIEs registered in bioregistry.io, but in practice different ontologies may have\n", "a number of ad-hoc unmapped targets," ], "metadata": { "collapsed": false }, "id": "84617090ea0d0d97" }, { "cell_type": "markdown", "source": [ "## Mapping via reciprocal term\n", "\n", "We can also query Uberon for mappings to an external term:" ], "metadata": { "collapsed": false }, "id": "bb5a58e0646e6b00" }, { "cell_type": "code", "execution_count": 17, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject_id: UBERON:0000955\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: DHBA:10155\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: brain\r\n", "subject_source: UBERON\r\n", "object_source: DHBA\r\n" ] } ], "source": [ " uberon mappings DHBA:10155" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-04-25T16:52:59.593030Z", "start_time": "2024-04-25T16:52:56.793686Z" } }, "id": "5425083fbfa5323d" }, { "cell_type": "markdown", "source": [ "## Complex queries\n", "\n", "Like most OAK commands, the `mapping` command can take lists of labels, lists of IDs, or even complex query terms (which might themselves involve graphs).\n", "\n", "For example, we can look up mappings for all brain regions:" ], "metadata": { "collapsed": false }, "id": "860ef175aa0e1bb8" }, { "cell_type": "code", "execution_count": 18, "id": "70acf6ae", "metadata": { "ExecuteTime": { "end_time": "2024-04-25T16:55:29.165307Z", "start_time": "2024-04-25T16:55:19.722462Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oaklib-OeQZizwE-py3.9/lib/python3.9/site-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\r\n", " df.replace(\"\", np.nan, inplace=True)\r\n" ] } ], "source": [ "uberon mappings .desc//p=i,p brain -M ZFA -O sssom -o output/all-brain-zfa-mappings.tsv" ] }, { "cell_type": "code", "execution_count": 19, "id": "c68535b0", "metadata": { "ExecuteTime": { "end_time": "2024-04-25T16:55:43.131812Z", "start_time": "2024-04-25T16:55:43.117901Z" } }, "outputs": [ { "data": { "text/plain": " subject_id subject_label predicate_id \\\n0 UBERON:0000007 pituitary gland oio:hasDbXref \n1 UBERON:0000203 pallium oio:hasDbXref \n2 UBERON:0000204 ventral part of telencephalon oio:hasDbXref \n3 UBERON:0000430 ventral intermediate nucleus of thalamus oio:hasDbXref \n4 UBERON:0000935 anterior commissure oio:hasDbXref \n.. ... ... ... \n280 UBERON:2005340 nucleus of the posterior recess oio:hasDbXref \n281 UBERON:2007001 dorso-rostral cluster oio:hasDbXref \n282 UBERON:2007002 ventro-rostral cluster oio:hasDbXref \n283 UBERON:2007003 ventro-caudal cluster oio:hasDbXref \n284 UBERON:2007004 epiphysial cluster oio:hasDbXref \n\n object_id mapping_justification subject_source object_source \n0 ZFA:0000118 semapv:UnspecifiedMatching UBERON ZFA \n1 ZFA:0000505 semapv:UnspecifiedMatching UBERON ZFA \n2 ZFA:0000304 semapv:UnspecifiedMatching UBERON ZFA \n3 ZFA:0000370 semapv:UnspecifiedMatching UBERON ZFA \n4 ZFA:0001108 semapv:UnspecifiedMatching UBERON ZFA \n.. ... ... ... ... \n280 ZFA:0005340 semapv:UnspecifiedMatching UBERON ZFA \n281 ZFA:0007001 semapv:UnspecifiedMatching UBERON ZFA \n282 ZFA:0007002 semapv:UnspecifiedMatching UBERON ZFA \n283 ZFA:0007003 semapv:UnspecifiedMatching UBERON ZFA \n284 ZFA:0007004 semapv:UnspecifiedMatching UBERON ZFA \n\n[285 rows x 7 columns]", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
subject_idsubject_labelpredicate_idobject_idmapping_justificationsubject_sourceobject_source
0UBERON:0000007pituitary glandoio:hasDbXrefZFA:0000118semapv:UnspecifiedMatchingUBERONZFA
1UBERON:0000203palliumoio:hasDbXrefZFA:0000505semapv:UnspecifiedMatchingUBERONZFA
2UBERON:0000204ventral part of telencephalonoio:hasDbXrefZFA:0000304semapv:UnspecifiedMatchingUBERONZFA
3UBERON:0000430ventral intermediate nucleus of thalamusoio:hasDbXrefZFA:0000370semapv:UnspecifiedMatchingUBERONZFA
4UBERON:0000935anterior commissureoio:hasDbXrefZFA:0001108semapv:UnspecifiedMatchingUBERONZFA
........................
280UBERON:2005340nucleus of the posterior recessoio:hasDbXrefZFA:0005340semapv:UnspecifiedMatchingUBERONZFA
281UBERON:2007001dorso-rostral clusteroio:hasDbXrefZFA:0007001semapv:UnspecifiedMatchingUBERONZFA
282UBERON:2007002ventro-rostral clusteroio:hasDbXrefZFA:0007002semapv:UnspecifiedMatchingUBERONZFA
283UBERON:2007003ventro-caudal clusteroio:hasDbXrefZFA:0007003semapv:UnspecifiedMatchingUBERONZFA
284UBERON:2007004epiphysial clusteroio:hasDbXrefZFA:0007004semapv:UnspecifiedMatchingUBERONZFA
\n

285 rows × 7 columns

\n
" }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"output/all-brain-zfa-mappings.tsv\", sep=\"\\t\", comment=\"#\")\n", "df" ] }, { "cell_type": "markdown", "source": [ "## Predicates\n", "\n", "At the time of writing most ontologies bundle their mappings as oio:hasDbXref in the ontology. Some ontologies are starting to release richer SSSOM files. Other ontologies include both xref mappings and mappings with richer skos predicates as a part of the ontology release (this allows for backwards compatibility with tools that expect xrefs, but allows more modern tools to use the richer mappings).\n", "\n", "We will use mondo as an example here" ], "metadata": { "collapsed": false }, "id": "ab44faf418e56af6" }, { "cell_type": "code", "execution_count": 20, "outputs": [], "source": [ "alias mondo runoak -i sqlite:obo:mondo" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-04-25T16:57:43.499957Z", "start_time": "2024-04-25T16:57:43.486107Z" } }, "id": "9f9c2c7af301fc4d" }, { "cell_type": "code", "execution_count": 22, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ERROR:root:Skipping statements(subject=MONDO:0000179,predicate=skos:exactMatch,object=,value=None,datatype=None,language=None,); ValueError: is not a valid URI or CURIE\r\n", "subject_id: MONDO:0000179\r\n", "predicate_id: oio:hasDbXref\r\n", "object_id: NCIT:C14089\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: Neu-Laxova syndrome\r\n", "subject_source: MONDO\r\n", "object_source: NCIT\r\n", "\r\n", "---\r\n", "subject_id: MONDO:0000179\r\n", "predicate_id: skos:exactMatch\r\n", "object_id: NCIT:C14089\r\n", "mapping_justification: semapv:UnspecifiedMatching\r\n", "subject_label: Neu-Laxova syndrome\r\n", "subject_source: MONDO\r\n", "object_source: NCIT\r\n" ] } ], "source": [ "mondo mappings MONDO:0000179 -M NCIT" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-04-25T16:59:31.248740Z", "start_time": "2024-04-25T16:59:28.226262Z" } }, "id": "5d47e1f618fc412e" }, { "cell_type": "markdown", "source": [ "Here we can see what appears to be a duplicate mapping - but this is on purpose, Mondo includes the xref for backwards compatibility, and the skos:exactMatch for more modern tools." ], "metadata": { "collapsed": false }, "id": "da18145b15196206" }, { "cell_type": "markdown", "source": [ "## Generating Mappings\n", "\n", "The `lexmatch` command can be used to generate mappings between ontologies. This is a complex topic and is covered in the [OAK Guide to Mappings](https://incatools.github.io/ontology-access-kit/guide/mappings.html).\n", "\n", "See also [OBO Academy section of lexmatch](https://oboacademy.github.io/obook/tutorial/lexmatch-tutorial/)" ], "metadata": { "collapsed": false }, "id": "1d3e29c11ebca7fa" }, { "cell_type": "markdown", "source": [ "## Validating Mappings\n", "\n", "See the [ValidateMappings](ValidateMappings.ipynb) notebook for details on how to validate mappings using rule-based and LLM methods." ], "metadata": { "collapsed": false }, "id": "25696d60b3797d56" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false }, "id": "ecd99d366172e439" } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }