{ "cells": [ { "cell_type": "markdown", "id": "6c9a188d", "metadata": {}, "source": [ "# Basic Ontology Interface Demo\n", "\n", "this demonstrates the use of the BasicOntologyInterface which provides simplified access to local or remote\n", "ontologies.\n", "\n", "We demonstrate the use of different backends, but in practice you will likely only use one depending on your use case.\n", "\n", "- pronto or sqlite for working with ontologies which you have a local copy of, and can trade startup time for generally faster operations\n", "- ubergraph or ontobee or ols or bioportal for an ontology that has been loaded into a remote server" ] }, { "cell_type": "markdown", "id": "667f0480", "metadata": {}, "source": [ "## Loading from obo files using Pronto" ] }, { "cell_type": "code", "execution_count": 1, "id": "d0255749", "metadata": {}, "outputs": [], "source": [ "from oaklib.implementations.pronto.pronto_implementation import ProntoImplementation\n", "from oaklib.resource import OntologyResource" ] }, { "cell_type": "markdown", "id": "6ef2643a", "metadata": {}, "source": [ "### Local files\n", "\n", "First we demonstrate loading from a file on the filesystem" ] }, { "cell_type": "code", "execution_count": 2, "id": "3b911252", "metadata": {}, "outputs": [], "source": [ "resource = OntologyResource(slug='go-nucleus.obo', directory='input', local=True)\n", "oi = ProntoImplementation(resource)" ] }, { "cell_type": "code", "execution_count": 4, "id": "af7bc82b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " rdfs:subClassOf ! subClassOf\n", " GO:0043231 ! intracellular membrane-bounded organelle\n", " BFO:0000050 ! part_of\n", " GO:0005737 ! cytoplasm\n" ] } ], "source": [ "rels = oi.outgoing_relationship_map('GO:0005773')\n", "for rel, parents in rels.items():\n", " print(f' {rel} ! {oi.label(rel)}')\n", " for parent in parents:\n", " print(f' {parent} ! {oi.label(parent)}')" ] }, { "cell_type": "markdown", "id": "74879ed0", "metadata": {}, "source": [ "### Remote (downloading from OBO)\n", "\n", "Next we use pronto's load from obo library feature" ] }, { "cell_type": "code", "execution_count": 5, "id": "33fd0318", "metadata": {}, "outputs": [], "source": [ "oi = ProntoImplementation(OntologyResource(local=False, slug='go.obo'))" ] }, { "cell_type": "markdown", "id": "82b8e42a", "metadata": {}, "source": [ "note this slight lag in executing the command above - while this method relieves\n", "the need to manage and synchronize files locally there is an initial network startup\n", "penalty" ] }, { "cell_type": "code", "execution_count": 6, "id": "6bb78662", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " rdfs:subClassOf ! subClassOf\n", " GO:0043231 ! intracellular membrane-bounded organelle\n", " BFO:0000050 ! part of\n", " GO:0005737 ! cytoplasm\n" ] } ], "source": [ "rels = oi.outgoing_relationship_map('GO:0005773')\n", "for rel, parents in rels.items():\n", " print(f' {rel} ! {oi.label(rel)}')\n", " for parent in parents:\n", " print(f' {parent} ! {oi.label(parent)}')" ] }, { "cell_type": "markdown", "id": "77e03f3f", "metadata": {}, "source": [ "## SQL Database access\n", "\n", "We can load from a SQL Database following Semantic SQL patterns (see [docs](https://github.com/cmungall/semantic-sql/#download-sqlite-dbs) for how to download ready-made sqlite dbs for all of OBO)" ] }, { "cell_type": "code", "execution_count": 7, "id": "fcb703b1", "metadata": {}, "outputs": [], "source": [ "from oaklib.implementations.sqldb.sql_implementation import SqlImplementation\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "90615183", "metadata": {}, "outputs": [], "source": [ "oi = SqlImplementation(OntologyResource(slug=f'sqlite:///input/go-nucleus.db'))" ] }, { "cell_type": "code", "execution_count": 9, "id": "b21e10d2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " rdfs:subClassOf ! subClassOf\n", " GO:0043231 ! intracellular membrane-bounded organelle\n", " BFO:0000050 ! part of\n", " GO:0005737 ! cytoplasm\n" ] } ], "source": [ "rels = oi.outgoing_relationship_map('GO:0005773')\n", "for rel, parents in rels.items():\n", " print(f' {rel} ! {oi.label(rel)}')\n", " for parent in parents:\n", " print(f' {parent} ! {oi.label(parent)}')" ] }, { "cell_type": "code", "execution_count": 10, "id": "38eeb8bb", "metadata": {}, "outputs": [], "source": [ "for curie in oi.basic_search('intracellular'):\n", " print(f' MATCH: {curie} ! {oi.label(curie)} ')" ] }, { "cell_type": "markdown", "id": "803c5777", "metadata": {}, "source": [ "### Determining which interfaces an implementation implements" ] }, { "cell_type": "code", "execution_count": 11, "id": "7db300b9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[oaklib.interfaces.validator_interface.ValidatorInterface,\n", " oaklib.interfaces.rdf_interface.RdfInterface,\n", " oaklib.interfaces.obograph_interface.OboGraphInterface,\n", " oaklib.interfaces.search_interface.SearchInterface,\n", " oaklib.interfaces.mapping_provider_interface.MappingProviderInterface,\n", " oaklib.interfaces.patcher_interface.PatcherInterface]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "oi.interfaces_implemented()" ] }, { "cell_type": "markdown", "id": "a43eab8d", "metadata": {}, "source": [ "## Loading from OWL ontologies using owlfun\n", "\n", "TODO" ] }, { "cell_type": "markdown", "id": "b8fa28d5", "metadata": {}, "source": [ "## Wrapping remote ontology portals\n", "\n", "### OLS\n", "\n", "TODO\n", "\n", "### BioPortal\n", "\n", "TODO" ] }, { "cell_type": "markdown", "id": "166a6345", "metadata": {}, "source": [ "## Wrapping SPARQL Endpoints\n", "\n", "### Ubergraph" ] }, { "cell_type": "code", "execution_count": 12, "id": "ffa5badd", "metadata": {}, "outputs": [], "source": [ "from oaklib.implementations.ubergraph.ubergraph_implementation import UbergraphImplementation\n", "oi = UbergraphImplementation()" ] }, { "cell_type": "code", "execution_count": 13, "id": "675cb043", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:Multiple labels for BFO:0000050 = ['part of', 'part_of']\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " BFO:0000050 ! part of\n", " GO:0005737 ! cytoplasm\n", " COB:0000072 ! part of\n", " GO:0005737 ! cytoplasm\n", " rdfs:subClassOf ! None\n", " GO:0043231 ! intracellular membrane-bounded organelle\n" ] } ], "source": [ "rels = oi.outgoing_relationship_map('GO:0005773')\n", "for rel, parents in rels.items():\n", " print(f' {rel} ! {oi.label(rel)}')\n", " for parent in parents:\n", " print(f' {parent} ! {oi.label(parent)}')" ] }, { "cell_type": "markdown", "id": "29d7d5bb", "metadata": {}, "source": [ "### notes\n", "\n", "Notice some of the differences with some of the other mechanisms:\n", "\n", " - ubergraph includes multiple ontologies, one of which 'injects' an legacy caro#part_of relationship\n", " - similarly there are different injected labels for the part-of relation\n", " - note also that the ubergraph implementation uses the actual predicate CURIE, currently pronto uses the shortname\n", " \n", "This also involves multiple iterative calls to the API which is inefficient.\n", "\n", "In future there will be an interface for 'bigger' operations that can be implemented more efficiently" ] }, { "cell_type": "markdown", "id": "b1925ff5", "metadata": {}, "source": [ "### Ontobee\n", "\n", "Currently the ontobee implementation doesn't allow the selection of a specific ontology\n", "within the triplestore -- instead the whole store is treated as if it were one giant\n", "ontology with everything merged together.\n", "\n", "This can be confusing when one ontology contains a stale part of another - e.g. if an ontology\n", "used to have a parent term but it has since been obsoleted." ] }, { "cell_type": "code", "execution_count": 14, "id": "2d793147", "metadata": {}, "outputs": [], "source": [ "from oaklib.implementations.ontobee.ontobee_implementation import OntobeeImplementation\n", "oi = OntobeeImplementation()" ] }, { "cell_type": "code", "execution_count": 26, "id": "19cf2710", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " rdfs:subClassOf ! subClassOf\n", " GO:0043231 ! intracellular membrane-bounded organelle\n", " BFO:0000050 ! part of\n", " GO:0005737 ! cytoplasm\n" ] } ], "source": [ "rels = oi.outgoing_relationship_map('GO:0005773')\n", "for rel, parents in rels.items():\n", " print(f' {rel} ! {oi.label(rel)}')\n", " for parent in parents:\n", " print(f' {parent} ! {oi.label(parent)}')" ] }, { "cell_type": "markdown", "id": "3818f6c6", "metadata": {}, "source": [ "# Graph Operations" ] }, { "cell_type": "code", "execution_count": 16, "id": "46df98cd", "metadata": {}, "outputs": [], "source": [ "oi = ProntoImplementation(OntologyResource(local=False, slug='go.obo'))" ] }, { "cell_type": "code", "execution_count": 17, "id": "a6c60020", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GO:0016020 ! membrane\n", "GO:0005773 ! vacuole\n", "GO:0005622 ! intracellular anatomical structure\n", "GO:0043227 ! membrane-bounded organelle\n", "GO:0043229 ! intracellular organelle\n", "GO:0043226 ! organelle\n", "GO:0043231 ! intracellular membrane-bounded organelle\n", "GO:0005737 ! cytoplasm\n", "GO:0110165 ! cellular anatomical entity\n", "GO:0005575 ! cellular_component\n" ] } ], "source": [ "ancs = oi.ancestors('GO:0005773')\n", "for anc in list(ancs):\n", " print(f'{anc} ! {oi.label(anc)}')" ] }, { "cell_type": "code", "execution_count": 19, "id": "6f5fc417", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GO:0005773 ! vacuole\n", "GO:0005622 ! intracellular anatomical structure\n", "GO:0043227 ! membrane-bounded organelle\n", "GO:0043229 ! intracellular organelle\n", "GO:0043226 ! organelle\n", "GO:0043231 ! intracellular membrane-bounded organelle\n", "GO:0005737 ! cytoplasm\n", "GO:0110165 ! cellular anatomical entity\n", "GO:0005575 ! cellular_component\n" ] } ], "source": [ "from oaklib.datamodels.vocabulary import IS_A, PART_OF\n", "\n", "ancs = oi.ancestors('GO:0005773', predicates=[IS_A, PART_OF])\n", "for anc in list(ancs):\n", " print(f'{anc} ! {oi.label(anc)}')" ] }, { "cell_type": "code", "execution_count": 20, "id": "41652e82", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GO:0005773 \"vacuole\" -rdfs:subClassOf \"subClassOf\"-> GO:0043231 \"intracellular membrane-bounded organelle\"\n", "GO:0005773 \"vacuole\" -BFO:0000050 \"part of\"-> GO:0005737 \"cytoplasm\"\n", "GO:0005737 \"cytoplasm\" -rdfs:subClassOf \"subClassOf\"-> GO:0110165 \"cellular anatomical entity\"\n", "GO:0005737 \"cytoplasm\" -BFO:0000050 \"part of\"-> GO:0005622 \"intracellular anatomical structure\"\n", "GO:0005622 \"intracellular anatomical structure\" -rdfs:subClassOf \"subClassOf\"-> GO:0110165 \"cellular anatomical entity\"\n", "GO:0110165 \"cellular anatomical entity\" -rdfs:subClassOf \"subClassOf\"-> GO:0005575 \"cellular_component\"\n", "GO:0043231 \"intracellular membrane-bounded organelle\" -rdfs:subClassOf \"subClassOf\"-> GO:0043227 \"membrane-bounded organelle\"\n", "GO:0043231 \"intracellular membrane-bounded organelle\" -rdfs:subClassOf \"subClassOf\"-> GO:0043229 \"intracellular organelle\"\n", "GO:0043229 \"intracellular organelle\" -rdfs:subClassOf \"subClassOf\"-> GO:0043226 \"organelle\"\n", "GO:0043229 \"intracellular organelle\" -BFO:0000050 \"part of\"-> GO:0005622 \"intracellular anatomical structure\"\n", "GO:0043226 \"organelle\" -rdfs:subClassOf \"subClassOf\"-> GO:0110165 \"cellular anatomical entity\"\n", "GO:0043227 \"membrane-bounded organelle\" -rdfs:subClassOf \"subClassOf\"-> GO:0043226 \"organelle\"\n" ] } ], "source": [ "def render(curie):\n", " return f'{curie} \"{oi.label(curie)}\"'\n", "for s,p,o in oi.walk_up_relationship_graph('GO:0005773', predicates=[IS_A, PART_OF]):\n", " print(f'{render(s)} -{render(p)}-> {render(o)}')" ] }, { "cell_type": "markdown", "id": "3041bfb2", "metadata": {}, "source": [ "## Rendering using GraphViz\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "7017dc0e", "metadata": {}, "outputs": [], "source": [ "from oaklib.utilities.obograph_utils import graph_to_image, default_stylemap_path\n", "stylemap = default_stylemap_path()" ] }, { "cell_type": "code", "execution_count": 22, "id": "eda4ad03", "metadata": {}, "outputs": [], "source": [ "graph = oi.ancestor_graph('GO:0005773', predicates=[IS_A, PART_OF])" ] }, { "cell_type": "code", "execution_count": 23, "id": "eeaef328", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "ERROR:root:No og2dot\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "You need to install a node package to be able to visualize results\n", "\n", "npm install -g obographviz\n", "Then set your path to include og2dot\n" ] }, { "ename": "Exception", "evalue": "Cannot find og2dot on path. Install from https://github.com/INCATools/obographviz", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mException\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [23]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mgraph_to_image\u001b[49m\u001b[43m(\u001b[49m\u001b[43mgraph\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mGO:0005773\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstylemap\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstylemap\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mimgfile\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43moutput/vacuole.png\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Documents/src/ontology-access-kit/src/oaklib/utilities/obograph_utils.py:102\u001b[0m, in \u001b[0;36mgraph_to_image\u001b[0;34m(graph, seeds, configure, stylemap, imgfile)\u001b[0m\n\u001b[1;32m 100\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnpm install -g obographviz\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 101\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mThen set your path to include og2dot\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 102\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m(\n\u001b[1;32m 103\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot find \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mexec\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m on path. Install from https://github.com/INCATools/obographviz\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 104\u001b[0m )\n\u001b[1;32m 105\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m tempfile\u001b[38;5;241m.\u001b[39mNamedTemporaryFile(\u001b[38;5;28mdir\u001b[39m\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m/tmp\u001b[39m\u001b[38;5;124m\"\u001b[39m, mode\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mw\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;28;01mas\u001b[39;00m tmpfile:\n\u001b[1;32m 106\u001b[0m style \u001b[38;5;241m=\u001b[39m {}\n", "\u001b[0;31mException\u001b[0m: Cannot find og2dot on path. Install from https://github.com/INCATools/obographviz" ] } ], "source": [ "graph_to_image(graph, ['GO:0005773'], stylemap=stylemap, imgfile='output/vacuole.png')" ] }, { "cell_type": "markdown", "id": "71bbcc3c", "metadata": {}, "source": [ "### Output\n", "\n", "![img](output/vacuole.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "81adbff6", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }