{ "cells": [ { "cell_type": "markdown", "id": "7373e90b82b3695e", "metadata": {}, "source": [ "# Summarizing with LLMs\n", "\n", "This notebook demonstrates how to summarize the output of OAK commands with LLMs, using the [datasette LLM command line tool](https://llm.datasette.io/en/stable/).\n", "\n", "See also:\n", "\n", "- [How to use LLMs with OAK](https://incatools.github.io/ontology-access-kit/howtos/use-llms.html)\n" ] }, { "cell_type": "markdown", "id": "7e17ec75eca57b86", "metadata": {}, "source": [ "## Install the LLM command line tool\n", "\n", "```\n", "pip install llm\n", "```\n", "\n", "You may also want to install plugins for your models of choice:\n", "\n", "```\n", "pip install llm-deepseek\n", "```" ] }, { "cell_type": "markdown", "id": "be5a6ffbc0277213", "metadata": {}, "source": [ "## Summarize outputs\n", "\n", "You can pipe the output of any command to `llm`. For example, consider this OAK query, which fetches the definitions of all kinds of hearts (terms under `circulatory organ`) in Uberon. The cell after it pipes the same output to `llm` with a system prompt (`-s`):" ] }, { "cell_type": "code", "execution_count": 6, "id": "65ca9da0284cc307", "metadata": { "ExecuteTime": { "end_time": "2025-02-08T02:23:24.381674Z", "start_time": "2025-02-08T02:23:19.484155Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "id\tlabel\tdefinition\n", "UBERON:0000948\theart\tA myogenic muscular circulatory organ found in the vertebrate cardiovascular system composed of chambers of cardiac muscle. 
It is the primary circulatory organ.\n", "UBERON:0007100\tprimary circulatory organ\tA hollow, muscular organ, which, by contracting rhythmically, keeps up the circulation of the blood or analogs[GO,modified].\n", "UBERON:0015202\tlymph heart\tA circulatory organ that is reponsible for pumping lymph throughout the body.\n", "UBERON:0015227\tperistaltic circulatory vessel\tA vessel down which passes a wave of muscular contraction, that forces the flow of haemolymphatic fluid.\n", "UBERON:0015228\tcirculatory organ\tA hollow, muscular organ, which, by contracting rhythmically, contributes to the circulation of lymph, blood or analogs. Examples: a chambered vertebrate heart; the tubular peristaltic heart of ascidians; the dorsal vessel of an insect; the lymoh heart of a reptile.\n", "UBERON:0015229\taccessory circulatory organ\tA circulatory organ that is not responsible for primary circulation.\n", "UBERON:0015230\tdorsal vessel heart\tThe caudal, pulsatile region of the dorsal vessel of the arthropod circulatory system.\n", "UBERON:0034961\tembryonic lymph heart\tA lymph heart that is part of an embryo.\n", "UBERON:0034962\tcopulatory lymph heart\tA lymph heart that assists in the return of lymph from the penis to the venous system.\n", "UBERON:0036259\tcardial lymph propulsor\tA lymphatic propulsor that lies tightly against the truncus arteriosus, the major outflow tract of the amphibian heart.\n", "UBERON:0034959\tright lymph heart\tNone\n", "UBERON:0034960\tleft lymph heart\tNone\n" ] } ], "source": [ "!runoak -i sqlite:obo:uberon definitions .sub \"circulatory organ\"" ] }, { "cell_type": "code", "execution_count": 7, "id": "bd010c282460f7a0", "metadata": { "ExecuteTime": { "end_time": "2025-02-08T02:24:35.943675Z", "start_time": "2025-02-08T02:24:22.150332Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This dataset contains definitions and critical comments on various anatomical terms related to circulatory and lymphatic organs. 
Here's a summary of the terms listed:\n", "\n", "1. **Heart (UBERON:0000948):** Defined as a myogenic muscular organ in vertebrates, responsible for circulating blood through its chambers of cardiac muscle. It is characterized as the primary circulatory organ.\n", "\n", "2. **Primary Circulatory Organ (UBERON:0007100):** Described as a hollow, muscular organ that rhythmically contracts to maintain blood circulation. This definition emphasizes the functional role of the heart or equivalent structures in different organisms.\n", "\n", "3. **Lymph Heart (UBERON:0015202):** A type of circulatory organ whose main function is to pump lymph throughout the body, highlighting its role in the lymphatic system rather than the blood circulatory system.\n", "\n", "4. **Peristaltic Circulatory Vessel (UBERON:0015227):** A vessel that uses waves of muscular contraction to move haemolymphatic fluid, commonly found in invertebrates.\n", "\n", "5. **Circulatory Organ (UBERON:0015228):** Describes a general category of muscle-based organs that contribute to the circulation of lymph, blood, or analogous fluids. Examples cover a range of biological structures beyond vertebrate hearts, such as insect dorsal vessels and reptilian lymph hearts.\n", "\n", "6. **Accessory Circulatory Organ (UBERON:0015229):** Defined as any circulatory organ that does not play a central role in primary circulation, indicating auxiliary support components within circulatory systems.\n", "\n", "7. **Dorsal Vessel Heart (UBERON:0015230):** This is explained as the pulsatile section of an arthropod's dorsal vessel, emphasizing its role within the insect circulatory system.\n", "\n", "8. **Embryonic Lymph Heart (UBERON:0034961):** A lymph heart that functions within an embryo, which likely plays a role in early circulatory system development.\n", "\n", "9. 
**Copulatory Lymph Heart (UBERON:0034962):** This organ assists in returning lymph from the penis to the venous system, underlining a specific physiological function.\n", "\n", "10. **Cardial Lymph Propulsor (UBERON:0036259):** Located against the truncus arteriosus in amphibians, it functions in conjunction with the main outflow tract of the heart.\n", "\n", "11. **Right Lymph Heart (UBERON:0034959) & Left Lymph Heart (UBERON:0034960):** These entries lack definitions, suggesting areas where further information and research are needed.\n", "\n", "**Critical Comments on Definitions:**\n", "The definitions provided in this dataset are generally concise and specify the structural and functional characteristics of each organ. However, some definitions could benefit from more context about their biological significance and differences across species. The absence of information for the \"Right Lymph Heart\" and \"Left Lymph Heart\" suggests that these terms might either be very specific anatomical components not yet well-characterized, or they may need further clarification and research to accurately define their roles and significance within the biological taxonomy provided by UBERON.\n" ] } ], "source": [ "!runoak -i sqlite:obo:uberon definitions .sub \"circulatory organ\" | llm -m 4o -s \"give a summary of these terms and critical comments on definitions\"" ] }, { "cell_type": "markdown", "id": "ddab5a52efb184b1", "metadata": {}, "source": [ "## Templates\n", "\n", "The `llm` tool lets you save a prompt as a named template and reuse it with `-t`, so you do not have to retype the system prompt each time.\n", "\n", "`llm templates edit summarize-definitions` \n", "\n", "Then in your editor:\n", "\n", "```yaml\n", "system: give a summary of these terms and critical comments on definitions\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "id": "a7dc56ace685baa7", "metadata": { "ExecuteTime": { "end_time": "2025-02-08T02:27:39.062864Z", "start_time": "2025-02-08T02:27:22.621187Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This 
dataset provides definitions for various anatomical terms related to circulatory organs across different species, especially focusing on aspects of their structure and functions.\n", "\n", "1. **Heart (UBERON:0000948)**: Defined as a myogenic muscular organ in vertebrates responsible for circulating blood, it's depicted as the primary organ in the cardiovascular system. Critical insight could involve the need for clarification on variations in structure and function across different vertebrate species.\n", "\n", "2. **Primary Circulatory Organ (UBERON:0007100)**: Essentially an organ responsible for keeping blood or similar substances circulating via rhythmic contractions. The definition emphasizes its hollow and muscular nature. Critiques might focus on the broad definition that necessitates specifying how it differs from accessory organs in terms of function.\n", "\n", "3. **Lymph Heart (UBERON:0015202)**: A specialized organ pumping lymph, reflecting its distinct role from blood-circulating hearts. The definition broadly indicates its role but leaves out specific biological distinctions that might benefit more from context about lymphatic roles and species specifics.\n", "\n", "4. **Peristaltic Circulatory Vessel (UBERON:0015227)**: Characterized by its peristalsis-driven movement of haemolymphatic fluid, typical in certain invertebrates. Critical comments could address potential confusion with non-circulatory peristaltic functions and necessitate additional context on organism diversity.\n", "\n", "5. **Circulatory Organ (UBERON:0015228)**: A broader category encompassing organs that use rhythmic contractions to aid circulation, including various heart types across species. This definition could be clearer with distinctions in organ origins (e.g., evolutionary paths) and functioning mechanisms.\n", "\n", "6. **Accessory Circulatory Organ (UBERON:0015229)**: These are defined as supportive rather than primary organs in aiding circulation. 
Critical comments may suggest defining criteria for what constitutes \"accessory\" and potential overlap with primary functions in certain conditions.\n", "\n", "7. **Dorsal Vessel Heart (UBERON:0015230)**: Specific to arthropods, it describes the pulsatile nature of their dorsal vessel system. Insights could focus on comparing this with vertebrate systems and clarifying its unique contributions to arthropod physiology.\n", "\n", "8. **Embryonic Lymph Heart (UBERON:0034961)**: This represents a lymph heart in an embryonic stage, underscoring developmental stages in lymphatic systems. Further definition could include comparative development timing across species.\n", "\n", "9. **Copulatory Lymph Heart (UBERON:0034962)**: Highlights a role in returning lymph from the penis to the venous system, particularly in reproductive context. Critical engagement might probe its existence across species and correlation to reproductive strategies.\n", "\n", "10. **Cardial Lymph Propulsor (UBERON:0036259)**: Applies to amphibians, lying against the truncus arteriosus, emphasizing a distinct lymphatic role. Definitions could improve by providing more detail on its structural uniqueness and interaction with other circulatory elements.\n", "\n", "11. 
**Right/Left Lymph Heart (UBERON:0034959 and UBERON:0034960)**: Currently lack definitions, necessitating clarification or investigation into whether these terms need unique definitions or if they are context-specific adaptations or roles.\n", "\n", "Overall, while the dataset adequately defines the basic function and category of each term, critical comments might emphasize the need to offer more biological diversity context, specify differences between analogous structures across species, and clarify evolutionary or developmental stages mentioned but not detailed.\n" ] } ], "source": [ "!runoak -i sqlite:obo:uberon definitions .sub \"circulatory organ\" | llm -m 4o -t summarize-definitions\n" ] }, { "cell_type": "markdown", "id": "8d110396a9060620", "metadata": {}, "source": [ "## Gene summaries\n", "\n", "Create a template for summarizing gene annotations:\n", "\n", "`llm templates edit summarize-gaf-for-gene` \n", "\n", "```yaml\n", "system: I will provide you with GAF for a gene. Summarize the function of the gene.\n", " Give one short description a biologist would understand.\n", " You may weave together multiple terms where there is redundancy.\n", " You should aim to be faithful to the GAF, but be aware that mistakes and over-annotation happen.\n", " If you see things that are unlikely, you can omit these.\n", " You may also produce some commentary at the end\n", " (e.g. 'the GAF showed annotation to X but this contradicts what is known about the gene')\n", " Do not focus on the evidence, or names, or IDs, or metadata about the annotation,\n", " just write the biological narrative.\n", " The exception is if this is really relevant (e.g. 
you may call into question a very old annotation if it\n", " does not make sense).\n", " Be aware that historically there has been over-annotation with experimental codes, for example, phenotypes from downstream effects.\n", " These are less relevant, and you should focus on the core activity, cellular process, and localization.\n", " You may however choose to briefly summarize phenotypic annotations (e.g. the role of G in process P has downstream effects E1, ...).\n", " Use your judgment to explain the story biologically rather than simply regurgitating terms.\n", " Note that the IBA code (inferred from biological aspect of ancestor) reflects high quality annotations in many species because these terms\n", " have been reviewed in a phylogenetic context and checked for over-annotation.\n", " But note that IBAs may sometimes be less complete, especially for organism-specific knowledge.\n", " Use your own biological knowledge.\n", " If aspects of the model are not clear, or you think there are errors, then at the end of your summary report on problems or anything that was not clear.\n", "```" ] }, { "cell_type": "code", "execution_count": 9, "id": "4e3830b8ff13215b", "metadata": { "ExecuteTime": { "end_time": "2025-02-11T01:31:44.007957Z", "start_time": "2025-02-11T01:31:39.246698Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# Query IDs: GO:0009229\n", "# Ontology closure predicates: rdfs:subClassOf, BFO:0000050\n", "#\n", "# The results include a round of expansion\n", "#\n", "subject\tpredicate\tobject\tproperty_values\tsubject_label\tpredicate_label\tobject_label\tnegated\tpublications\tevidence_type\tsupporting_objects\tprimary_knowledge_source\taggregator_knowledge_source\tsubject_closure\tsubject_closure_label\tobject_closure\tobject_closure_label\tcomments\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0009229\t\tSLC19A3\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000041\tIEA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tPMID:11342111\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tPMID:38547260\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0009229\t\tSLC25A19\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0009229\t\tSLC19A2\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0009229\t\tTHTPA\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005515\t\tSLC19A3\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015234\t\tSLC19A3\tNone\tthiamine transmembrane transporter activity\tFalse\tGO_REF:0000024\tISS\t\tBHF-UCL\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015234\t\tSLC19A3\tNone\tthiamine transmembrane transporter activity\tFalse\tReactome:R-HSA-199626\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0009229\t\tSLC19A3\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015888\t\tSLC19A3\tNone\tthiamine transport\tFalse\tPMID:11731220\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015888\t\tSLC19A3\tNone\tthiamine transport\tFalse\tPMID:33008889\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015888\t\tSLC19A3\tNone\tthiamine transport\tFalse\tPMID:35512554\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015888\t\tSLC19A3\tNone\tthiamine 
transport\tFalse\tPMID:35724964\tIMP\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0031923\t\tSLC19A3\tNone\tpyridoxine transport\tFalse\tPMID:33008889\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0031923\t\tSLC19A3\tNone\tpyridoxine transport\tFalse\tPMID:35512554\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0031923\t\tSLC19A3\tNone\tpyridoxine transport\tFalse\tPMID:35724964\tIMP\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0031923\t\tSLC19A3\tNone\tpyridoxine transport\tFalse\tPMID:36456177\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0042723\t\tSLC19A3\tNone\tthiamine-containing compound metabolic process\tFalse\tReactome:R-HSA-196819\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0071934\t\tSLC19A3\tNone\tthiamine transmembrane transport\tFalse\tGO_REF:0000024\tISS\t\tBHF-UCL\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005886\t\tSLC19A3\tNone\tplasma membrane\tFalse\tReactome:R-HSA-199626\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0016020\t\tSLC19A3\tNone\tmembrane\tFalse\tPMID:11136550\tNAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0005886\t\tSLC19A3\tNone\tplasma membrane\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0055085\t\tSLC19A3\tNone\ttransmembrane transport\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BZV2\tbiolink:related_to\tGO:0015234\t\tSLC19A3\tNone\tthiamine transmembrane transporter activity\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0004788\t\tTPK1\tNone\tthiamine diphosphokinase activity\tFalse\tPMID:11342111\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0004788\t\tTPK1\tNone\tthiamine diphosphokinase activity\tFalse\tPMID:38547260\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0005515\t\tTPK1\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0005524\t\tTPK1\tNone\tATP binding\tFalse\tGO_REF:0000043\tIEA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0016301\t\tTPK1\tNone\tkinase activity\tFalse\tGO_REF:0000043\tIEA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0030975\t\tTPK1\tNone\tthiamine binding\tFalse\tGO_REF:0000002\tIEA\t\tInterPro\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0042802\t\tTPK1\tNone\tidentical protein binding\tFalse\tPMID:25502805\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0042802\t\tTPK1\tNone\tidentical protein binding\tFalse\tPMID:29892012\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0042802\t\tTPK1\tNone\tidentical protein binding\tFalse\tPMID:31515488\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0042802\t\tTPK1\tNone\tidentical protein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0141200\t\tTPK1\tNone\tUTP thiamine diphosphokinase activity\tFalse\tPMID:38547260\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0006772\t\tTPK1\tNone\tthiamine metabolic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000041\tIEA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic 
process\tFalse\tPMID:11342111\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tPMID:38547260\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0010510\t\tTPK1\tNone\tregulation of acetyl-CoA biosynthetic process from pyruvate\tFalse\tPMID:38547260\tIMP\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0005829\t\tTPK1\tNone\tcytosol\tFalse\tReactome:R-HSA-196761\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0004788\t\tTPK1\tNone\tthiamine diphosphokinase activity\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9H3S4\tbiolink:related_to\tGO:0009229\t\tTPK1\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:not\tGO:0030233\t\tSLC25A19\tNone\tdeoxynucleotide transmembrane transporter activity\tTrue\tPMID:15539640\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:not\tGO:0030233\t\tSLC25A19\tNone\tdeoxynucleotide transmembrane transporter activity\tTrue\tPMID:17035501\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:not\tGO:0030302\t\tSLC25A19\tNone\tdeoxynucleotide transport\tTrue\tPMID:15539640\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0015297\t\tSLC25A19\tNone\tantiporter activity\tFalse\tGO_REF:0000043\tIEA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0030233\t\tSLC25A19\tNone\tdeoxynucleotide transmembrane transporter activity\tFalse\tPMID:11226231\tTAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0090422\t\tSLC25A19\tNone\tthiamine pyrophosphate transmembrane transporter activity\tFalse\tPMID:17035501\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:Q9HC21\tbiolink:related_to\tGO:0090422\t\tSLC25A19\tNone\tthiamine pyrophosphate transmembrane transporter activity\tFalse\tReactome:R-HSA-8875838\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0009229\t\tSLC25A19\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0030302\t\tSLC25A19\tNone\tdeoxynucleotide transport\tFalse\tPMID:11226231\tNAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0030974\t\tSLC25A19\tNone\tthiamine pyrophosphate transmembrane transport\tFalse\tPMID:17035501\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0042723\t\tSLC25A19\tNone\tthiamine-containing compound metabolic process\tFalse\tReactome:R-HSA-196819\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005634\t\tSLC25A19\tNone\tnucleus\tFalse\tPMID:21630459\tHDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005739\t\tSLC25A19\tNone\tmitochondrion\tFalse\tGO_REF:0000052\tIDA\t\tHPA\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005739\t\tSLC25A19\tNone\tmitochondrion\tFalse\tGO_REF:0000052\tIDA\t\tHPA\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005739\t\tSLC25A19\tNone\tmitochondrion\tFalse\tGO_REF:0000052\tIDA\t\tHPA\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005739\t\tSLC25A19\tNone\tmitochondrion\tFalse\tPMID:15539640\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005739\t\tSLC25A19\tNone\tmitochondrion\tFalse\tPMID:31506564\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005739\t\tSLC25A19\tNone\tmitochondrion\tFalse\tPMID:34800366\tHTP\t\tFlyBase\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005743\t\tSLC25A19\tNone\tmitochondrial inner membrane\tFalse\tReactome:R-HSA-8875838\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0030974\t\tSLC25A19\tNone\tthiamine pyrophosphate transmembrane transport\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0015234\t\tSLC25A19\tNone\tthiamine transmembrane transporter activity\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9HC21\tbiolink:related_to\tGO:0005743\t\tSLC25A19\tNone\tmitochondrial inner membrane\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0005515\t\tSLC19A2\tNone\tprotein binding\tFalse\tPMID:21836059\tIPI\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0005515\t\tSLC19A2\tNone\tprotein binding\tFalse\tPMID:21836059\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0008517\t\tSLC19A2\tNone\tfolic acid transmembrane transporter activity\tFalse\tPMID:10542220\tNAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015234\t\tSLC19A2\tNone\tthiamine transmembrane transporter activity\tFalse\tGO_REF:0000024\tISS\t\tBHF-UCL\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015234\t\tSLC19A2\tNone\tthiamine transmembrane transporter activity\tFalse\tPMID:10542220\tTAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015234\t\tSLC19A2\tNone\tthiamine transmembrane transporter activity\tFalse\tPMID:21836059\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015234\t\tSLC19A2\tNone\tthiamine transmembrane transporter activity\tFalse\tReactome:R-HSA-199626\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:O60779\tbiolink:related_to\tGO:0007283\t\tSLC19A2\tNone\tspermatogenesis\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0009229\t\tSLC19A2\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015884\t\tSLC19A2\tNone\tfolic acid transport\tFalse\tGO_REF:0000108\tIEA\t\tGOC\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tPMID:10391222\tIMP\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tPMID:10542220\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tPMID:10542220\tNAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tPMID:33008889\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tPMID:35512554\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tPMID:35724964\tIMP\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0031923\t\tSLC19A2\tNone\tpyridoxine transport\tFalse\tPMID:33008889\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0031923\t\tSLC19A2\tNone\tpyridoxine transport\tFalse\tPMID:35512554\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0042723\t\tSLC19A2\tNone\tthiamine-containing compound metabolic process\tFalse\tReactome:R-HSA-196819\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0071934\t\tSLC19A2\tNone\tthiamine transmembrane 
transport\tFalse\tGO_REF:0000024\tISS\t\tBHF-UCL\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0005886\t\tSLC19A2\tNone\tplasma membrane\tFalse\tPMID:21836059\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0005886\t\tSLC19A2\tNone\tplasma membrane\tFalse\tReactome:R-HSA-199626\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0016020\t\tSLC19A2\tNone\tmembrane\tFalse\tPMID:10542220\tNAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0005886\t\tSLC19A2\tNone\tplasma membrane\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015888\t\tSLC19A2\tNone\tthiamine transport\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0015234\t\tSLC19A2\tNone\tthiamine transmembrane transporter activity\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:O60779\tbiolink:related_to\tGO:0055085\t\tSLC19A2\tNone\ttransmembrane transport\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0000287\t\tTHTPA\tNone\tmagnesium ion binding\tFalse\tGO_REF:0000024\tISS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0005515\t\tTHTPA\tNone\tprotein binding\tFalse\tPMID:32296183\tIPI\t\tIntAct\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0016787\t\tTHTPA\tNone\thydrolase activity\tFalse\tPMID:11827967\tTAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0050333\t\tTHTPA\tNone\tthiamine triphosphate phosphatase activity\tFalse\tPMID:11827967\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0006091\t\tTHTPA\tNone\tgeneration of precursor metabolites and energy\tFalse\tPMID:11827967\tNAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", 
"UniProtKB:Q9BU02\tbiolink:related_to\tGO:0006772\t\tTHTPA\tNone\tthiamine metabolic process\tFalse\tPMID:11827967\tTAS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0009229\t\tTHTPA\tNone\tthiamine diphosphate biosynthetic process\tFalse\tGO_REF:0000107\tIEA\t\tEnsembl\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0016311\t\tTHTPA\tNone\tdephosphorylation\tFalse\tPMID:11827967\tIDA\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0042357\t\tTHTPA\tNone\tthiamine diphosphate metabolic process\tFalse\tGO_REF:0000024\tISS\t\tUniProt\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0005829\t\tTHTPA\tNone\tcytosol\tFalse\tReactome:R-HSA-965067\tTAS\t\tReactome\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0000287\t\tTHTPA\tNone\tmagnesium ion binding\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0050333\t\tTHTPA\tNone\tthiamine triphosphate phosphatase activity\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n", "UniProtKB:Q9BU02\tbiolink:related_to\tGO:0042357\t\tTHTPA\tNone\tthiamine diphosphate metabolic process\tFalse\tGO_REF:0000033\tIBA\t\tGO_Central\tinfores:go\t\t\t\t\t\n" ] } ], "source": [ "!runoak -i amigo:NCBITaxon:9606 associations -p i,p -H --expand GO:0009229 " ] }, { "cell_type": "code", "execution_count": 10, "id": "c1b7855cebdc47ca", "metadata": { "ExecuteTime": { "end_time": "2025-02-11T01:58:57.530226Z", "start_time": "2025-02-11T01:58:38.140634Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The gene annotated with the process \"thiamine diphosphate biosynthetic process\" is involved in the synthesis of thiamine diphosphate (TDP), a coenzyme form of thiamine (vitamin B1) critical for various enzymatic reactions. 
Here's a summary of the functions of related gene products:\n", "\n", "1. **TPK1 (Thiamine Pyrophosphokinase 1)**: TPK1 plays a direct role in the thiamine diphosphate biosynthetic process by catalyzing the conversion of thiamine (vitamin B1) into thiamine diphosphate. This enzyme exhibits thiamine diphosphokinase activity, utilizing ATP in the phosphorylation process. It is also involved in the regulation of acetyl-CoA biosynthesis from pyruvate, a crucial step in energy metabolism. It predominantly localizes in the cytosol.\n", "\n", "2. **SLC19A2 and SLC19A3 (Thiamine Transporters)**: These are integral membrane proteins that primarily facilitate the transmembrane transport of thiamine and its derivatives. They exhibit thiamine transmembrane transporter activity and localize to the plasma membrane. They also partake in transport processes of other vitamin B compounds, like pyridoxine. SLC19A2 is also implicated in folic acid transport and localized to the plasma membrane.\n", "\n", "3. **SLC25A19**: This gene encodes a transporter responsible for the transmembrane transport of thiamine pyrophosphate (TPP), particularly across the mitochondrial inner membrane. Although associated with thiamine diphosphate biosynthetic processes, it primarily functions as a thiamine pyrophosphate transmembrane transporter, ensuring TPP availability within mitochondria for enzymatic processes.\n", "\n", "4. **THTPA (Thiamine Triphosphate Phosphatase)**: This enzyme is involved in the breakdown of thiamine triphosphate to thiamine diphosphate, contributing to the overall maintenance of thiamine phosphate balance within the cell. 
THTPA exhibits thiamine triphosphate phosphatase and general hydrolase activity with a prominent role in the cytosol.\n", "\n", "These proteins coordinate in the transport, synthesis, and utilization of thiamine derivatives, ensuring the bioavailability of thiamine diphosphate for critical metabolic pathways.\n", "\n", "**Commentary**: The annotations broadly reflect the core function of these genes in thiamine metabolism and transport. The GAF suggests some erroneous and overlapping annotations, such as the inclusion of folic acid transport for SLC19A2, which primarily functions as a thiamine transporter. Additionally, the \"thiamine diphosphate biosynthetic process\" might not directly apply to the transporters like SLC19A2 and SLC19A3, as these are more focused on the transport aspect rather than direct synthesis.\n" ] } ], "source": [ "!runoak -i amigo:NCBITaxon:9606 associations -p i,p -H --expand GO:0009229 | llm -m 4o -t summarize-gaf-for-gene" ] }, { "cell_type": "code", "execution_count": null, "id": "e8ccdf0b3998b48d", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }