Command Line
Note
We follow the CLIG guidelines as far as possible.
General Guidelines
Note
If you are running this as an internal OAK developer, first run poetry shell (or prefix each command with poetry run).
The general structure is:
runoak --input HANDLE COMMAND [COMMAND ARGS AND OPTIONS]
The value for --input (which can be shortened to -i) is specified in the Ontology Implementation Selectors documentation.
Examples:
runoak --input ubergraph: COMMAND [COMMAND ARGS AND OPTIONS]
runoak --input fbbt.obo COMMAND [COMMAND ARGS AND OPTIONS]
runoak --input cl.db COMMAND [COMMAND ARGS AND OPTIONS]
runoak --input sqlite:obo:cl COMMAND [COMMAND ARGS AND OPTIONS]
It can be useful to create aliases for individual ontologies. For example, to create an alias for the Uberon ontology:
alias uberon='runoak -i sqlite:obo:uberon'
You can specify further implementations with -a, which will create an aggregator implementation that wraps multiple implementations. For example, you can multiplex queries over different endpoints.
Common Patterns
Term Lists
Many commands take a term or a list of terms as their primary argument. These are typically one of:
- a CURIE such as UBERON:0000955
- a Search Syntax term, which is either:
  - an exact match to a label; for example “limb” or “plasma membrane”
  - a compound search term such as t~limb, which finds terms with partial matches to limb
Search terms are expanded to matching CURIEs, and then fed into the command.
For example, (assuming the alias above) the following command will look up two terms using their labels:
uberon info hand foot
This is equivalent to:
uberon info UBERON:0002398 UBERON:0002397
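The expansion step can be illustrated with a small sketch. It resolves a mixed list of CURIEs, exact labels, and t~ partial matches against a toy in-memory label table. The table, the resolve_terms function, and the CURIE test are all assumptions made for this example (the EX: identifiers are made up); this is not how OAK itself implements search.

```python
import re

# Toy label table standing in for an ontology. The two UBERON entries follow
# the hand/foot example above; the EX: entries are invented for illustration.
LABELS = {
    "UBERON:0002398": "hand",
    "UBERON:0002397": "foot",
    "EX:0000001": "limb",
    "EX:0000002": "forelimb",
}

# Assumption: anything shaped like PREFIX:LOCAL_ID is treated as a CURIE.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")


def resolve_terms(terms, labels):
    """Expand a list of CURIEs and search terms into a flat list of CURIEs."""
    resolved = []
    for term in terms:
        if term.startswith("t~"):
            # Compound search term: partial, case-insensitive match on the label.
            needle = term[2:].lower()
            resolved.extend(c for c, label in labels.items() if needle in label.lower())
        elif CURIE_PATTERN.match(term):
            # Already a CURIE: pass it through unchanged.
            resolved.append(term)
        else:
            # Plain string: exact match to a label.
            resolved.extend(c for c, label in labels.items() if label == term)
    return resolved


print(resolve_terms(["hand", "foot"], LABELS))
# ['UBERON:0002398', 'UBERON:0002397']
```

The resolved CURIEs are then what the command itself operates on, which is why the two invocations above give the same result.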
Predicates
Many commands take a --predicates option (shortened to -p). This specifies a list of predicates (aka relationship types, see Predicates) to be used in filtering. The list is specified as a comma-delimited list (no spaces) of CURIEs.
For many biological ontologies, it can be useful to filter on is_a (rdfs:subClassOf) and part_of (BFO:0000050) so the command line interface understands shortcuts for these:
- i: is-a (i.e. rdfs:subClassOf between two named classes)
- p: part-of
For example, to draw the subgraph of terms starting from “hand” and “foot” and tracing upwards through is_a and part_of relationships:
uberon viz -p i,p hand foot
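A minimal sketch of how such a comma-delimited value could be expanded into CURIEs is shown below. The expand_predicates function and the PREDICATE_SHORTCUTS table are hypothetical names for this example; only the two shortcuts described above are covered.

```python
# Shortcut table, limited to the two shortcuts described in this section.
PREDICATE_SHORTCUTS = {
    "i": "rdfs:subClassOf",  # is-a
    "p": "BFO:0000050",      # part-of
}


def expand_predicates(option_value):
    """Split a comma-delimited -p value and expand any known shortcuts."""
    return [PREDICATE_SHORTCUTS.get(token, token) for token in option_value.split(",")]


print(expand_predicates("i,p"))
# ['rdfs:subClassOf', 'BFO:0000050']
print(expand_predicates("i,BFO:0000050"))
# ['rdfs:subClassOf', 'BFO:0000050']
```

Shortcuts and full CURIEs can be mixed in the same list, as the second call shows.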
Commands
The following section is autogenerated from the inline docs. You should get the same results by running:
runoak COMMAND --help
runoak
Run the oaklib Command Line.
A subcommand must be passed - for example: ancestors, terms, …
Most commands require an input ontology to be specified:
runoak -i <INPUT SPECIFICATION> SUBCOMMAND <SUBCOMMAND OPTIONS AND ARGUMENTS>
Get help on any command, e.g:
runoak viz -h
runoak [OPTIONS] COMMAND [ARGS]...
Options
- -v, --verbose
- -q, --quiet <quiet>
- --stacktrace, --no-stacktrace
If set then show full stacktrace on error
- Default
False
- --save-as <save_as>
For commands that mutate the ontology, this specifies where changes are saved to
- --autosave, --no-autosave
For commands that mutate the ontology, this determines if these are automatically saved in place
- --named-prefix-map <named_prefix_map>
the name of a prefix map, e.g. obo, prefixcc
- --prefix <prefix>
prefix=expansion pair
- --metamodel-mappings <metamodel_mappings>
overrides for metamodel properties such as rdfs:label
- --import-depth <import_depth>
Maximum depth in the import tree to traverse. Currently this is only used by the pronto adapter
- -g, --associations <associations>
Location of ontology associations
- -G, --associations-type <associations_type>
Syntax of associations input
- -i, --input <input>
input implementation specification. This is either a path to a file, or an ontology selector
- -I, --input-type <input_type>
Input format. Permissible values vary depending on the context
- -a, --add <add>
additional implementation specification.
aliases
List aliases for a term or set of terms
Example:
runoak -i ubergraph:uberon aliases UBERON:0001988
TERMS should be either an explicit list of terms or queries, or can be a selector query, such as ‘.all’ to fetch all terms in the ontology
Show all aliases:
runoak -i db/envo.db aliases .all
Currently the core behavior of this command assumes a simple datamodel for aliases, where an alias is a simple property-value tuple, with the property being from some standard vocabulary (e.g. skos:altLabel, oboInOwl, etc.)
If you know the synonyms follow the OBO/oboInOwl datamodel, you can pass --obo-model; this will give back richer data if present in the ontology, including synonym categories/types and synonym provenance.
In future, this may become the default
runoak aliases [OPTIONS] [TERMS]...
Options
- --obo-model, --no-obo-model
If true, assume the OBO synonym datamodel, including provenance and synonym types
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
ancestors
List all ancestors of a given term or terms.
Here ancestor means the transitive closure of the parent relationship, where a parent includes all relationship types, not just is-a.
Example:
runoak -i cl.owl ancestors CL:4023094
This will show ancestry over the full relationship graph. Like any relational OAK command, this can be filtered by relationship type (predicate), using --predicates (-p). For example, constrained to is-a and part-of:
runoak -i cl.owl ancestors CL:4023094 -p i,BFO:0000050
Multiple backends can be used, including ubergraph:
runoak -i ubergraph: ancestors CL:4023094 -p i,BFO:0000050
Search terms can also be used:
runoak -i cl.owl ancestors 'goblet cell'
Multiple terms can be passed:
runoak -i sqlite:go.db ancestors GO:0005773 GO:0005737 -p i,p
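To make the notion of "transitive closure of the parent relationship, optionally filtered by predicate" concrete, here is a small sketch over an in-memory edge list. The edges, identifiers, and the ancestors function are invented for illustration and do not reflect how any OAK adapter computes the result. Note also that this sketch does not include the start term in its own ancestor set; that is a choice made for the example.

```python
from collections import deque

# (subject, predicate, object) edges; all identifiers are made up.
EDGES = [
    ("EX:finger", "BFO:0000050", "EX:hand"),
    ("EX:hand", "rdfs:subClassOf", "EX:limb_segment"),
    ("EX:hand", "BFO:0000050", "EX:forelimb"),
    ("EX:forelimb", "rdfs:subClassOf", "EX:limb"),
    ("EX:hand", "EX:develops_from", "EX:hand_bud"),
]


def ancestors(start, edges, predicates=None):
    """Return every node reachable from start by following edges upwards.

    If predicates is None, all relationship types are followed; otherwise
    only edges whose predicate is in the given set are followed.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in edges:
            if subject != node:
                continue
            if predicates is not None and predicate not in predicates:
                continue
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


# Full relationship graph versus is-a only.
print(sorted(ancestors("EX:hand", EDGES)))
print(sorted(ancestors("EX:hand", EDGES, {"rdfs:subClassOf"})))
```

With no predicate filter, EX:hand reaches four nodes, including the develops-from parent; restricted to rdfs:subClassOf it reaches only EX:limb_segment.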
More background:
runoak ancestors [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- -O, --output-type <output_type>
Desired output type
- --statistics, --no-statistics
For each ancestor, show statistics.
- Default
False
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
annotate
Annotate a piece of text using a Named Entity Recognition annotation
Example:
runoak -i bioportal: annotate "enlarged nucleus in T-cells from peripheral blood"
Currently most implementations do not yet support annotation.
See the ontorunner framework for plugins for SciSpacy and OGER - these will later become plugins.
If gilda is installed as an extra, it can be used, but --matches-whole-text (-W) must be specified, as gilda only performs grounding.
Example:
runoak -i gilda: annotate -W BRCA2
Programmatic usage:
This command is a wrapper onto the annotate_text method, which is provided as part of the TextAnnotator interface:
https://incatools.github.io/ontology-access-kit/interfaces/text-annotator
Aliases can be listed in the output by setting the flag --include-aliases to true (default: false).
Example (using the plugin oakx-spacy):
runoak -i spacy:sqlite:obo:bero annotate Myeloid derived suppressor cells. --include-aliases
will yield:
confidence: 0.8
object_aliases:
- Myeloid-Derived Suppressor Cells
- MDSCs
- mdscs
- myeloid-derived suppressor cells
object_id: obo:MESH_D000072737
object_label: Myeloid-Derived Suppressor Cells
subject_end: 30
subject_start: 0
runoak annotate [OPTIONS] [WORDS]...
Options
- -W, --matches-whole-text, --no-W, --no-matches-whole-text
if true, then only show matches that span the entire input text
- Default
False
- --include-aliases, --no-include-aliases
Include alias maps in output.
- Default
False
- --text-file <text_file>
Text file to annotate. Each newline separated entry is a distinct text.
- -L, --lexical-index-file <lexical_index_file>
path to lexical index. This is recreated each time unless --no-recreate is passed
- -m, --model <model>
Name of trained model to use for annotation, e.g. ‘en_ner_craft_md’.
- -x, --exclude-tokens <exclude_tokens>
Text file or list of tokens to filter from input prior to annotation. If passed as text file, each newline separated entry is a distinct text.
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
Arguments
- WORDS
Optional argument(s)
apply
Applies a patch to an ontology. The patch should be specified using KGCL syntax, see https://github.com/INCATools/kgcl
Example:
runoak -i cl.owl.ttl apply "rename CL:0000561 to 'amacrine neuron'" -o cl.owl.ttl -O ttl
On an obo format file:
runoak -i simpleobo:go-edit.obo apply "rename GO:0005634 from 'nucleus' to 'foo'" -o go-edit-new.obo
With URIs:
runoak -i cl.owl.ttl apply "rename <http://purl.obolibrary.org/obo/CL_0000561> from 'amacrine cell' to 'amacrine neuron'" -o cl.owl.ttl -O ttl
WARNING:
This command is still experimental. Some things to bear in mind:
- For some ontologies, CURIEs may not work; instead, specify a full URI surrounded by <>s
- Only a subset of KGCL commands is supported by each backend
runoak apply [OPTIONS] [COMMANDS]...
Options
- -o, --output <output>
- --changes-output <changes_output>
output file for KGCL changes
- --changes-input <changes_input>
Path to an input changes file
- --changes-format <changes_format>
Format of the changes file (json or kgcl)
- --dry-run, --no-dry-run
if true, only perform the parse of KGCL and do not apply
- Default
False
- --expand, --no-expand
if true, expand complex changes to atomic changes
- Default
True
- --ignore-invalid-changes, --no-ignore-invalid-changes
if true, ignore invalid changes, e.g. obsoletions of dependent entities
- Default
False
- --contributor <contributor>
CURIE for the person contributing the patch
- -O, --output-type <output_type>
Desired output type
- --overwrite, --no-overwrite
If set, any changes applied will be saved back to the input file/source
Arguments
- COMMANDS
Optional argument(s)
apply-obsolete
Sets an ontology element to be obsolete
Example:
runoak -i my.obo apply-obsolete MY:0002200 -o my-modified.obo
Multiple terms can be passed, as labels, IDs, or using OAK queries:
runoak -i my.obo apply-obsolete MY:1 MY:2 MY:3 … -o my-modified.obo
This may be chained, for example to take all terms matching a search query and then obsolete them all:
runoak -i my.db search 'l/^Foo/' | runoak -i my.db --autosave apply-obsolete -
This command is partially redundant with the more general “apply” command
runoak apply-obsolete [OPTIONS] [TERMS]...
Options
- -o, --output <output>
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
associations
Look up associations from or to entities.
Example:
runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations
The above will show all associations.
To query using an ontology term, including is-a closure, specify one or more terms or term queries, plus the closure predicate(s).
Example:
runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations -p i HP:0001392
This shows all annotations either to “Abnormality of the liver” (HP:0001392), or to is-a descendants
runoak associations [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -O, --output-type <output_type>
Desired output type
- --if-absent <if_absent>
determines behavior when the value is not present or is empty.
- Options
absent-only | present-only
- -S, --set-value <set_value>
the value to set for all terms for the given property.
- --association-predicates <association_predicates>
A comma-separated list of predicates for the association relation
- -Q, --terms-role <terms_role>
How to interpret query terms.
- Default
object
- Options
subject | object | both
Arguments
- TERMS
Optional argument(s)
axioms
Filters axioms
Example:
runoak -i cl.ofn axioms
The above will write all axioms.
You can filter by axiom type:
Example:
runoak -i cl.ofn axioms --axiom-type SubClassOf
Note this currently only works with the funowl adapter, on functional syntax files
runoak axioms [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- --axiom-type <axiom_type>
Type of axiom, e.g. SubClassOf
- --about <about>
CURIE that the axiom is about
- --references <references>
CURIEs that the axiom references
Arguments
- TERMS
Optional argument(s)
cache-clear
Clear the contents of the pystow oaklib cache.
runoak cache-clear [OPTIONS]
Options
- --days-old <days_old>
Clear anything more than this number of days old
- Default
100
cache-ls
List the contents of the pystow oaklib cache.
TODO: this currently only works on unix-based systems.
runoak cache-ls [OPTIONS]
definitions
Show textual definitions for term or set of terms
Example:
runoak -i sqlite:obo:envo definitions 'tropical biome' 'temperate biome'
You can use the “.all” selector to show all definitions for all terms in the ontology:
Example:
runoak -i sqlite:obo:envo definitions .all
You can also include definition metadata, such as provenance and source:
runoak -i sqlite:obo:cl definitions --additional-metadata neuron
runoak definitions [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -D, --display <display>
A comma-separated list of display options. Use ‘all’ for all
- -O, --output-type <output_type>
Desired output type
- Options
obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl
- --if-absent <if_absent>
determines behavior when the value is not present or is empty.
- Options
absent-only | present-only
- --additional-metadata, --no-additional-metadata
if true then fetch additional metadata about statements stored as OWL reification
- Default
False
- -S, --set-value <set_value>
the value to set for all terms for the given property.
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
Arguments
- TERMS
Optional argument(s)
descendants
List all descendants of a term
Example:
runoak -i sqlite:obo:obi descendants assay -p i
Example:
runoak -i sqlite:obo:uberon descendants heart -p i,p
This is the inverse of the ‘ancestors’ command; see the documentation for that command. Note, however, that ‘descendants’ commands have the potential to be more “explosive” than ‘ancestors’ commands, especially for high-level terms and when predicates are not specified.
More background:
runoak descendants [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- -D, --display <display>
A comma-separated list of display options. Use ‘all’ for all
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
diff
Compute difference between two ontologies.
Example:
runoak -i foo.obo diff -X bar.obo -o diff.yaml
This will produce a list of Changes that are required to go from the main input ontology (--input) to the other ontology (--other-ontology, or -X).
The output follows the KGCL data model. See https://incatools.github.io/ontology-access-kit/datamodels/kgcl/index.html
You can use --output-type to control the output format.
KGCL controlled natural language:
runoak -i foo.obo diff -X bar.obo -o diff.txt --output-type kgcl
KGCL JSON:
runoak -i foo.obo diff -X bar.obo -o diff.json --output-type json
YAML (default):
runoak -i foo.obo diff -X bar.obo -o diff.yaml --output-type yaml
The --statistics option can be used to generate summary statistics for the changes. These are grouped according to the --group-by-property option. For example, GO uses the oio:hasOBONamespace property to partition classes into 3 categories.
Example:
runoak -i go.obo diff -X go-new.obo -o diff.yaml --statistics --group-by-property oio:hasOBONamespace
This will produce a YAML dictionary, with outer keys being the values of the oio:hasOBONamespace property, and inner keys being the change types.
If --group-by-property is not specified, or there is no value for this property, then the outer key will be “__RESIDUAL__”.
For summary statistics, you can also specify --output-type csv to get tabular output.
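The shape of the grouped summary described above can be sketched as follows. The summarize_changes function, the input tuples, and the EX: identifiers are assumptions for this example, and the change type names are used purely as illustrative strings; the sketch only shows the nesting (outer key: property value or residual key, inner key: change type), not how OAK computes the diff.

```python
from collections import Counter, defaultdict

RESIDUAL_KEY = "__RESIDUAL__"

# Illustrative input: (change type, CURIE the change is about).
CHANGES = [
    ("NodeRename", "EX:1"),
    ("NodeRename", "EX:2"),
    ("NodeObsoletion", "EX:3"),
    ("EdgeDeletion", "EX:4"),
]

# Illustrative values of the grouping property; EX:4 has no value.
PROPERTY_VALUES = {
    "EX:1": "biological_process",
    "EX:2": "biological_process",
    "EX:3": "molecular_function",
}


def summarize_changes(changes, property_values):
    """Count changes by change type, grouped by the grouping property value."""
    summary = defaultdict(Counter)
    for change_type, about in changes:
        # Entities with no value for the property fall into the residual group.
        group = property_values.get(about) or RESIDUAL_KEY
        summary[group][change_type] += 1
    return {group: dict(counts) for group, counts in summary.items()}


print(summarize_changes(CHANGES, PROPERTY_VALUES))
# {'biological_process': {'NodeRename': 2}, 'molecular_function': {'NodeObsoletion': 1}, '__RESIDUAL__': {'EdgeDeletion': 1}}
```

Passing an empty property table corresponds to not specifying a grouping property at all: every change then lands under the residual key.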
Limitations:
This does not do a diff over every axiom in each ontology. For a complete OWL diff, you should use ROBOT.
runoak diff [OPTIONS]
Options
- -X, --other-ontology <other_ontology>
other ontology
- --simple, --no-simple
perform a quick difference showing only terms that differ
- Default
False
- --statistics, --no-statistics
show summary statistics only
- Default
False
- --group-by-property <group_by_property>
group summaries by a metadata property, e.g. rdfs:isDefinedBy
- --group-by-obo-namespace, --no-group-by-obo-namespace
shortcut for --group-by-property oio:hasOBONamespace (note this is distinct from the ID namespace)
- Default
False
- --group-by-defined-by, --no-group-by-defined-by
shortcut for --group-by-property rdfs:isDefinedBy. This may be inferred from prefix if not set explicitly
- Default
False
- --group-by-prefix, --no-group-by-prefix
shortcut for --group-by-property sh:prefix. Groups by the prefix of the CURIE
- Default
False
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
diff-associations
Diffs two association sources. EXPERIMENTAL.
This functionality may move out of core
runoak diff-associations [OPTIONS]
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -O, --output-type <output_type>
Desired output type
- -g, --associations <associations>
associations
- -X, --other-associations <other_associations>
other associations
diff-terms
Compares a pair of terms in two ontologies
EXPERIMENTAL
runoak diff-terms [OPTIONS] [TERMS]...
Options
- --other-ontology <other_ontology>
other ontology
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
diff-via-mappings
Calculates cross-ontology diff using mappings
Given a pair of ontologies, and mappings that connect terms in both ontologies, this command will perform a structural comparison of all mapped pairs of terms
Example:
runoak -i sqlite:obo:uberon diff-via-mappings --other-input sqlite:obo:zfa --source UBERON --source ZFA -O csv
Note the above command does not have any mapping file specified; the mappings that are distributed within each ontology are used (in this case, Uberon contains mappings to ZFA).
If the mappings are provided externally:
runoak -i ont1.obo diff-via-mappings --other-input ont2.obo --mapping-input mappings.sssom.tsv
(in the above example, --source is not passed, so all mappings are tested)
If there are no existing mappings, you can use the lexmatch command to generate them:
runoak -i ont1.obo -a ont2.obo lexmatch -o mappings.sssom.tsv
runoak -i ont1.obo diff-via-mappings --other-input ont2.obo --mapping-input mappings.sssom.tsv
The output from this command follows the cross-ontology-diff data model (https://incatools.github.io/ontology-access-kit/datamodels/cross-ontology-diff/index.html)
This can be serialized in YAML or TSV form
runoak diff-via-mappings [OPTIONS] [TERMS]...
Options
- -S, --source <source>
ontology prefixes e.g. HP, MP
- --mapping-input <mapping_input>
File of mappings in SSSOM format. If not provided then mappings in ontology(ies) are used
- -X, --other-input <other_input>
Additional input file
- --other-input-type <other_input_type>
Type of additional input file
- --intra, --no-intra
If true, then all sources are in the main input ontology
- Default
False
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- --include-identity-mappings, --no-include-identity-mappings
Use identity relation as mapping; use this for two versions of the same ontology
- Default
False
- --filter-category-identical, --no-filter-category-identical
Do not report cases where a relationship has not changed
- Default
False
- --bidirectional, --no-bidirectional
Show diff from both left and right perspectives
- Default
True
- -p, --predicates <predicates>
A comma-separated list of predicates
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
dump
Exports (dumps) the entire contents of an ontology.
Example:
runoak -i pato.obo dump -o pato.json -O json
Example:
runoak -i pato.owl dump -o pato.ttl -O turtle
Currently each implementation only supports a subset of formats.
Some dumpers accept additional options. For example, dumping to fhirjson accepts --include-all-predicates, which changes the default behavior from only exporting IS_A to all mappable predicates.
The dump command is also blocked for remote endpoints such as Ubergraph, to avoid killer queries.
runoak dump [OPTIONS] [TERMS]...
Options
- -o, --output <output>
- --include-all-predicates, --no-include-all-predicates
For formats that export only IS_A by default, this will include all possible predicates
- Default
False
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
enrichment
Run class enrichment analysis.
runoak enrichment [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -O, --output-type <output_type>
Desired output type
- --if-absent <if_absent>
determines behavior when the value is not present or is empty.
- Options
absent-only | present-only
- -S, --set-value <set_value>
the value to set for all terms for the given property.
- --cutoff <cutoff>
The cutoff for the p-value
- Default
0.05
- -S, --sample-file <sample_file>
file containing input list of entity IDs (e.g. gene IDs)
- -B, --background-file <background_file>
file containing background list of entity IDs (e.g. gene IDs)
- --association-predicates <association_predicates>
A comma-separated list of predicates for the association relation
Arguments
- TERMS
Optional argument(s)
eval-taxon-constraints
Test candidate taxon constraints
Multiple candidate constraints can be passed as arguments. These are in the form of triples separated by periods.
Example:
runoak -i db/go.db eval-taxon-constraints -p i,p GO:0005743 only NCBITaxon:2759 never NCBITaxon:2 . GO:0005634 only NCBITaxon:2
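One way to read the argument list in the example above is as period-separated groups, each consisting of a subject term followed by (predicate, taxon) pairs. The sketch below parses the arguments under that reading. This interpretation and the parse_constraints function are assumptions made for illustration; OAK's own parser may differ.

```python
def parse_constraints(tokens):
    """Turn a flat argument list into (subject, predicate, taxon) triples.

    Assumed layout: SUBJECT (PREDICATE TAXON)+ [ "." SUBJECT (PREDICATE TAXON)+ ]...
    """
    triples = []
    group = []
    # A trailing "." is appended so that the final group is flushed too.
    for token in list(tokens) + ["."]:
        if token != ".":
            group.append(token)
            continue
        if not group:
            continue
        subject, rest = group[0], group[1:]
        if not rest or len(rest) % 2 != 0:
            raise ValueError(f"expected predicate/taxon pairs after {subject!r}")
        for i in range(0, len(rest), 2):
            triples.append((subject, rest[i], rest[i + 1]))
        group = []
    return triples


ARGS = "GO:0005743 only NCBITaxon:2759 never NCBITaxon:2 . GO:0005634 only NCBITaxon:2".split()
for triple in parse_constraints(ARGS):
    print(triple)
```

Under this reading, the example above expands to three candidate constraints: two for GO:0005743 and one for GO:0005634.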
The --evolution-file (-E) option can be used to pass in a file of candidates. This should follow the format used in https://arxiv.org/abs/1802.06004
E.g.
GO:0000229,Gain|NCBITaxon:1(root);>Loss|NCBITaxon:2759(Eukaryota);
Example:
runoak -i db/go.db eval-taxon-constraints -p i,p -E tests/input/go-evo-gains-losses.csv
runoak eval-taxon-constraints [OPTIONS] [CONSTRAINTS]...
Options
- -E, --evolution-file <evolution_file>
path to file containing gains and losses
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
Arguments
- CONSTRAINTS
Optional argument(s)
expand-subsets
For each subset provide a mapping of each term in the ontology to a subset
Example:
runoak -i db/pato.db expand-subsets attribute_slim value_slim
runoak expand-subsets [OPTIONS] [SUBSETS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
Arguments
- SUBSETS
Optional argument(s)
extract-triples
Extracts a subontology as triples
Currently the only endpoint to implement this is ubergraph. Ontobee seems to have performance issues with the query
This will soon be supported in the SqlDatabase/Sqlite endpoint
Example:
runoak -v -i ubergraph: extract-triples GO:0005635 CL:0000099 -o test.ttl -O ttl
runoak extract-triples [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
fill-table
Fills missing values in a table of ontology elements
See https://incatools.github.io/ontology-access-kit/src/oaklib.utilities.table_filler
Given a TSV with a populated ID column, and unpopulated columns for definition, label, mappings, ancestors, this will iterate through each row filling in each missing value by performing ontology lookups.
In some cases, this can also perform reverse lookups; i.e given a table with labels populated and blank IDs, then fill in the IDs
In the most basic scenario, you have a table with two columns ‘id’ and ‘label’. These are the “conventional” column headers for a table of ontology elements (see later for configuration when you don’t follow conventions)
Example:
runoak -i cl.owl.ttl fill-table my-table.tsv
(any implementation can be used)
The same command will work for the reverse scenario - when you have labels populated, but IDs are not populated
By default this will throw an error if a lookup is not successful; this can be relaxed
Relaxed:
runoak -i cl.owl.ttl fill-table --allow-missing my-table.tsv
In this case missing values that cannot be populated will remain empty
To explicitly populate a value:
runoak -i cl.owl.ttl fill-table --missing-value-token NO_DATA my-table.tsv
Currently the following columns are recognized:
id – the unique identifier of the element
label – the rdfs:label of the element
definition – the definition of the element
mappings – mappings for the element
ancestors – ancestors for the element (this can be parameterized)
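The basic behavior described above (forward lookup from id to label, reverse lookup from label to id, and the strict, relaxed, and token-filling modes) can be sketched for the two conventional columns as follows. The fill_table function and its keyword arguments are hypothetical names that mirror the command-line options; the lookup table is a one-entry stand-in for an ontology.

```python
# One-entry stand-in for an ontology, using a term mentioned earlier on this page.
ID_TO_LABEL = {"CL:0000561": "amacrine cell"}

ROWS = [
    {"id": "CL:0000561", "label": ""},      # label missing: forward lookup
    {"id": "", "label": "amacrine cell"},   # id missing: reverse lookup
    {"id": "EX:unknown", "label": ""},      # lookup will fail (made-up id)
]


def fill_table(rows, id_to_label, allow_missing=False, missing_value_token=None):
    """Fill blank 'id' or 'label' cells; rows are dicts, input is not modified."""
    label_to_id = {label: curie for curie, label in id_to_label.items()}
    filled = []
    for row in rows:
        row = dict(row)
        if row.get("id") and not row.get("label"):
            row["label"] = id_to_label.get(row["id"])
        elif row.get("label") and not row.get("id"):
            row["id"] = label_to_id.get(row["label"])
        for column in ("id", "label"):
            if row.get(column):
                continue
            if missing_value_token is not None:
                row[column] = missing_value_token  # explicit placeholder
            elif allow_missing:
                row[column] = ""                   # relaxed: leave empty
            else:
                raise ValueError(f"could not fill {column!r} in row {row}")
        filled.append(row)
    return filled


print(fill_table(ROWS, ID_TO_LABEL, missing_value_token="NO_DATA"))
```

Calling fill_table(ROWS, ID_TO_LABEL) with no extra arguments raises an error on the third row, which corresponds to the default strict behavior.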
The metadata inference procedure will also work for when you have denormalized TSV files with columns such as “foo_id” and “foo_name”. This will be recognized as an implicit normalized label relation between id and name of a foo element.
You can be more explicit in one of two ways:
Pass in a YAML structure (on command line or in a YAML file) listing relations
Pass in a LinkML data definitions YAML file
For the first method, you can pass in multiple relations using the --relation arg. For example, given a TSV with columns cl_identifier and cl_display_label you can say:
Example:
runoak -i cl.owl.ttl fill-table --relation "{primary_key: cl_identifier, dependent_column: cl_display_label, relation: label}"
You can also specify this in a YAML file
For the 2nd method, you need to specify a LinkML schema with a class where (1) at least one field is annotated as being an identifier (2) one or more slots have slot_uri elements mapping them to standard metadata elements such as rdfs:label.
For example, my-schema.yaml:
classes:
  Person:
    attributes:
      id:
        identifier: true
      name:
        slot_uri: rdfs:label
This is a powerful command with many ways of configuring it. We will add separate docs for this soon; for now, please file an issue on GitHub with any questions.
TODO: allow for an option that will perform fuzzy matches of labels
TODO: reverse lookup is not provided for all fields, such as definitions
TODO: add an option to detect inconsistencies
TODO: add logic for obsoletion/replaced by
TODO: use most optimized method for whichever backend
runoak fill-table [OPTIONS] TABLE_FILE
Options
- --allow-missing, --no-allow-missing
Allow some dependent values to be blank, post-processing
- Default
False
- --missing-value-token <missing_value_token>
Populate all missing values with this token
- --schema <schema>
Path to linkml schema
- --delimiter <delimiter>
Delimiter between columns in input and output
- Default
- --comment <comment>
Comment indicator at the beginning of a row.
- Default
#
- --relation <relation>
Serialized YAML string corresponding to a normalized relation between two columns
- --relation-file <relation_file>
Path to YAML file corresponding to a list of normalized relation between two columns
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TABLE_FILE
Required argument
info
Show information on term or set of terms
Example:
runoak -i sqlite:obo:cl info CL:4023094
The default output is minimal, showing only ID and label
The --output-type (-O) option can be used to specify other formats for the output.
Currently only a few output types are supported. More will be provided in future.
In OBO format:
runoak -i cl.owl info CL:4023094 -O obo
As CSV:
runoak -i cl.obo info CL:4023094 -O csv
The info output format can be parameterized with --display (-D)
With xrefs and definitions:
runoak -i cl.owl info CL:4023094 -D x,d
With all information:
runoak -i cl.owl info CL:4023094 -D all
Like all OAK commands, input term lists can be multivalued, a mixture of IDs and labels, as well as queries that can be combined using boolean logic
Info on two STATO terms:
runoak -i ontobee:stato info STATO:0000286 STATO:0000287 -O obo
All terms in ENVO with the string “forest” in them:
runoak -i sqlite:obo:envo info l~forest
Info on all subtypes of “statistical hypothesis test” in STATO:
runoak -i sqlite:obo:stato info .desc//p=i 'statistical hypothesis test'
runoak info [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -D, --display <display>
A comma-separated list of display options. Use ‘all’ for all
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
labels
Show labels for term or list of terms
Example:
runoak -i cl.owl labels CL:4023093 CL:4023094
You can use the “.all” selector to show all labels:
Example:
runoak -i cl.owl labels .all
(this may be blocked for remote endpoints)
You can restrict the output to terms that have no label, or to terms that do have one:
Nodes with no labels:
runoak -i cl.owl labels .all --if-absent absent-only
runoak labels [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -D, --display <display>
A comma-separated list of display options. Use ‘all’ for all
- -O, --output-type <output_type>
Desired output type
- Options
obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl
- --if-absent <if_absent>
determines behavior when the value is not present or is empty.
- Options
absent-only | present-only
- -S, --set-value <set_value>
the value to set for all terms for the given property.
Arguments
- TERMS
Optional argument(s)
leafs
List all leaf nodes in the ontology
Like all OAK relational commands, this is parameterized by --predicates (-p). Note that the default is to return the leaves of the relation graph over all predicates.
Example:
runoak -i db/cob.db leafs
This command is a wrapper onto the “leafs” command in the BasicOntologyInterface.
https://incatools.github.io/ontology-access-kit/interfaces/basic.html#oaklib.interfaces.basic_ontology_interface.BasicOntologyInterface.leafs
runoak leafs [OPTIONS]
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- --filter-obsoletes, --no-filter-obsoletes
If set, results will exclude obsoletes
- Default
True
lexmatch
Performs lexical matching between pairs of terms in one or more ontologies.
Examples:
runoak -i foo.obo lexmatch -o foo.sssom.tsv
In this example, the input ontology file is assumed to contain all pairs of terms to be mapped.
It is more common to map between all pairs of terms in two ontology files. In this case, you can merge the ontologies using a tool like ROBOT; or, to avoid a merge preprocessing step, use the --add (-a) option to specify a second ontology file.
runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv
By default, this command will compare all terms in all ontologies. You can use the OAK term query syntax to pass in the set of all terms to be compared.
For example, to compare all terms in union of FOO and BAR namespaces:
runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: i^BAR:
All members of the set are compared (including FOO to FOO matches and BAR to BAR matches), omitting trivial reciprocal matches.
Use an “@” separator between two queries to feed in two explicit sets:
runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: @ i^BAR:
ALGORITHM: lexmatch implements a simple algorithm:
create a lexical index, keyed by normalized strings of labels, synonyms
report all pairs of entities that have the same key
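The two steps above can be sketched in a few lines of Python. This is only an illustration: the normalization used here (lowercasing and collapsing whitespace), the helper names `normalize`, `build_index` and `match_pairs`, and the toy FOO/BAR terms are all assumptions made for the example, not OAK's actual implementation or rule set.

```python
from collections import defaultdict
from itertools import combinations


def normalize(text):
    """Hypothetical normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_index(terms):
    """Map each normalized label or synonym to the set of term IDs that carry it."""
    index = defaultdict(set)
    for term_id, strings in terms.items():
        for s in strings:
            index[normalize(s)].add(term_id)
    return index


def match_pairs(index):
    """Report every unordered pair of distinct terms that share an index key."""
    pairs = set()
    for key, ids in index.items():
        # sorting the IDs means (A, B) and (B, A) are reported only once
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b, key))
    return sorted(pairs)


# Toy input: term ID -> labels and synonyms (made-up identifiers)
terms = {
    "FOO:1": ["Hind limb", "hindlimb"],
    "BAR:7": ["hind  limb"],
    "BAR:8": ["forelimb"],
}

matches = match_pairs(build_index(terms))
for subject, obj, key in matches:
    print(subject, obj, key)
```

Because each pair is built from a sorted ID list, the reciprocal match is never emitted twice, which mirrors the "omitting trivial reciprocal matches" behavior described earlier.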
The lexical index can be exported (in native YAML) using -L:
runoak -i foo.obo lexmatch -L foo.index.yaml -o foo.sssom.tsv
Note: if you run the above command a second time it will be faster as the index will be reused.
RULES: Using custom rules:
runoak -i foo.obo lexmatch -R match_rules.yaml -L foo.index.yaml -o foo.sssom.tsv
Full documentation:
module-oaklib.utilities.lexical.lexical_indexer
runoak lexmatch [OPTIONS] [TERMS]...
Options
- -R, --rules-file <rules_file>
path to rules file. Conforms to rules_datamodel. e.g. https://github.com/INCATools/ontology-access-kit/blob/main/tests/input/matcher_rules.yaml
- --add-labels, --no-add-labels
Populate empty labels with URI fragments or CURIE local IDs, for ontologies that use semantic IDs
- Default
False
- -L, --lexical-index-file <lexical_index_file>
path to lexical index. This is recreated each time unless --no-recreate is passed
- --recreate, --no-recreate
if true and lexical index is specified, always recreate, otherwise load from index
- Default
True
- --ensure-strict-prefixes, --no-ensure-strict-prefixes
Clean prefix map and mappings before generating an output.
- Default
True
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
lint
Lints an ontology, applying changes in place.
The current implementation is highly incomplete, and only handles linting of syntactic patterns (chains of whitespace, trailing whitespace) in labels and definitions.
The output is a list of changes, in a KGCL-compliant syntax.
By default, changes will be applied.
Example:
runoak -i my.obo lint
This can be executed in dry-run mode, in which case changes are not applied:
runoak -i my.obo lint --dry-run
One common workflow is to emit the changes to a KGCL file, which is manually checked and then applied as a separate step.
Example workflow:
runoak -i my.obo lint --dry-run -o changes.kgcl
# examine and edit changes.kgcl
runoak -i my.obo apply --changes-input changes.kgcl
runoak lint [OPTIONS]
Options
- -o, --output <output>
- --report-format <report_format>
Output format for reporting proposed/applied changes
- --dry-run, --no-dry-run
If true, nothing will be modified by executing command
- -O, --output-type <output_type>
Desired output type
logical-definitions
Show all logical definitions for a term or terms.
To show all logical definitions in an ontology, pass the “.all” query term
Example; first create an alias:
alias pato="runoak -i obo:sqlite:pato"
Then run the query:
pato logical-definitions .all
By default, “.all” will query all axioms for all terms including merged terms; to restrict to only the current terms, use an ID query:
pato logical-definitions i^PATO
You can also restrict to branches:
pato logical-definitions .desc//p=i "physical object quality"
By default, the output is a subset of OboGraph datamodel rendered as YAML, e.g.
- definedClassId: PATO:0045071
  genusIds:
  - PATO:0001439
  restrictions:
  - fillerId: PATO:0000461
    propertyId: RO:0015010
You can also specify CSV to generate a flattened form of this.
Example:
pato logical-definitions .all --output-type csv
You can optionally choose to “unmelt” or flatten this, such that:
Each property/predicate is a column
For repeated properties, columns of the form prop_1, prop_2, … are generated
Example:
pato logical-definitions .all --unmelt --output-type csv
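The flattening idea can be illustrated with a small Python sketch. The `unmelt` helper below is hypothetical, and two details are assumptions rather than documented behavior: multiple genus IDs are joined with a pipe, and only properties that occur more than once receive the numeric suffix.

```python
from collections import Counter


def unmelt(logical_definition):
    """Flatten one logical definition into a single wide row (a dict)."""
    row = {
        "definedClassId": logical_definition["definedClassId"],
        "genusIds": "|".join(logical_definition.get("genusIds", [])),
    }
    restrictions = logical_definition.get("restrictions", [])
    totals = Counter(r["propertyId"] for r in restrictions)
    seen = Counter()
    for r in restrictions:
        prop = r["propertyId"]
        seen[prop] += 1
        # Assumption: only repeated properties get a numeric suffix (prop_1, prop_2, ...)
        column = f"{prop}_{seen[prop]}" if totals[prop] > 1 else prop
        row[column] = r["fillerId"]
    return row


# The logical definition shown in the YAML example above
ldef = {
    "definedClassId": "PATO:0045071",
    "genusIds": ["PATO:0001439"],
    "restrictions": [{"propertyId": "RO:0015010", "fillerId": "PATO:0000461"}],
}
row = unmelt(ldef)
print(row)
```

Each restriction becomes its own column, so a table of such rows has one line per defined class and can be read in a spreadsheet.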
Limitations:
Currently this only works for definitions that follow a basic genus-differentia pattern, which is what is currently represented in the OboGraph datamodel.
Consider using the “axioms” command for inspection of complex nested OWL axioms.
runoak logical-definitions [OPTIONS] [TERMS]...
Options
- --unmelt, --no-unmelt
Flatten to a wide table
- Default
False
- -p, --predicates <predicates>
A comma-separated list of predicates
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
- --if-absent <if_absent>
determines behavior when the value is not present or is empty.
- Options
absent-only | present-only
- -S, --set-value <set_value>
the value to set for all terms for the given property.
Arguments
- TERMS
Optional argument(s)
mappings
List all mappings encoded in the ontology
Example:
runoak -i sqlite:obo:envo mappings
The default output is SSSOM YAML. To use the (canonical) SSSOM TSV format:
runoak -i sqlite:obo:envo mappings -O sssom
By default, labels are not included. Use --autolabel to include labels (but note that if the label is not in the source ontology, then no label will be retrieved):
runoak -i sqlite:obo:envo mappings -O sssom --autolabel
To constrain the mapped object source:
runoak -i sqlite:obo:foodon mappings -O sssom --maps-to-source SUBSET_SIREN
runoak mappings [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- --maps-to-source <maps_to_source>
Return only mappings with subject or object source equal to this
Arguments
- TERMS
Optional argument(s)
migrate-curies
Rewires an ontology replacing all instances of an ID or IDs
Note: the specified ontology is modified in place
The input for this command is a list of equals-separated pairs, specifying the source and the target.
Example:
runoak -i db/uberon.db migrate-curies --replace SRC1=TGT1 SRC2=TGT2
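As a rough illustration of what the equals-separated pairs mean, the sketch below parses them into a mapping and rewires a toy edge list. The function names `parse_curie_pairs` and `rewire` are hypothetical, and the triples are made up; this is not the PatcherInterface implementation.

```python
def parse_curie_pairs(pairs):
    """Turn ["SRC1=TGT1", "SRC2=TGT2"] into {"SRC1": "TGT1", "SRC2": "TGT2"}."""
    mapping = {}
    for pair in pairs:
        source, sep, target = pair.partition("=")
        if not sep or not source or not target:
            raise ValueError(f"expected SOURCE=TARGET, got {pair!r}")
        mapping[source] = target
    return mapping


def rewire(edges, mapping):
    """Replace every occurrence of a source ID in (subject, predicate, object) triples."""
    return [tuple(mapping.get(x, x) for x in edge) for edge in edges]


mapping = parse_curie_pairs(["SRC1=TGT1", "SRC2=TGT2"])
edges = [("SRC1", "rdfs:subClassOf", "X:1"), ("X:2", "BFO:0000050", "SRC2")]
rewired = rewire(edges, mapping)
print(rewired)
```

Validating the pair format up front means a malformed argument fails before anything is modified, which matters for a command that changes the ontology in place.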
This command is a wrapper onto the “migrate_curies” command in the PatcherInterface
oaklib.interfaces.patcher_interface.PatcherInterface.migrate_curies
runoak migrate-curies [OPTIONS] [CURIE_PAIRS]...
Options
- --replace, --no-replace
If true, will update in place
- Default
False
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
Arguments
- CURIE_PAIRS
Optional argument(s)
obsoletes
Shows all obsolete entities.
Example:
runoak -i obolibrary:go.obo obsoletes
To exclude merged terms, use the --no-include-merged flag.
Example:
runoak -i obolibrary:go.obo obsoletes --no-include-merged
To show migration relationships, use the --show-migration-relationships flag.
Example:
runoak -i obolibrary:go.obo obsoletes --show-migration-relationships
You can also specify terms to show obsoletes for:
Example:
runoak -i obolibrary:go.obo obsoletes --show-migration-relationships GO:0000187 GO:0000188
runoak obsoletes [OPTIONS] [TERMS]...
Options
- --include-merged, --no-include-merged
Include merged terms in output
- Default
True
- --show-migration-relationships, --no-show-migration-relationships
Show migration relationships (e.g. replaced_by, consider)
- Default
False
- -O, --output-type <output_type>
Desired output type
- Options
obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
ontologies
Shows all ontologies
If the input is a pre-merged ontology, then the output of this command is trivially a single line, with the name of the input ontology
This command is more meaningful when the input is a multi-ontology endpoint, e.g.
runoak -i ubergraph: ontologies
In future this command will be expanded to allow showing more metadata about each ontology
runoak ontologies [OPTIONS]
Options
- -o, --output <output>
Output file, e.g. obo file
ontology-metadata
Shows ontology metadata
Example:
runoak -i bioportal: ontology-metadata obi uberon foodon
Use the --all option to show all ontologies.
Example:
runoak -i bioportal: ontology-metadata --all
By default the output is YAML. You can get the results as TSV:
Example:
runoak -i bioportal: ontology-metadata --all -O csv
Warning
The output data model is not yet standardized – this may change in future
runoak ontology-metadata [OPTIONS] [ONTOLOGIES]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- --all, --no-all
If true, show all ontologies. Use in place of passing an explicit list
- Default
False
Arguments
- ONTOLOGIES
Optional argument(s)
ontology-versions
Shows ontology versions
Currently only implemented for BioPortal
Example:
runoak -i bioportal: ontology-versions mp
All ontologies:
runoak -i bioportal: ontology-versions --all
runoak ontology-versions [OPTIONS] [ONTOLOGIES]...
Options
- -o, --output <output>
Output file, e.g. obo file
- --all, --no-all
If true, show all ontologies. Use in place of passing an explicit list
- Default
False
Arguments
- ONTOLOGIES
Optional argument(s)
paths
List all paths between one or more start curies
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane'
This shows all shortest paths from nuclear membrane to all ancestors
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target cytoplasm
This shows shortest paths between two nodes
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' 'thylakoid' --target cytoplasm 'thylakoid membrane'
This shows all shortest paths between the 4 combinations of starts and ends
You can also use “@” to separate start node list and end node list. Like most OAK commands, you can pass either explicit terms, or term queries. For example, if you have two files of IDs, then you can do this:
runoak -i sqlite:obo:go paths -p i,p .idfile START_NODES.txt @ .idfile END_NODES.txt
You can also pass in weights for each predicate, used when calculating shortest paths.
Example:
runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target cytoplasm --predicate-weights "{i: 0.0001, p: 999}"
This shows all shortest paths after weighting relations
(Note: you can use the same shorthands as in the --predicates option)
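To see why predicate weights change the result, here is a minimal sketch of a weighted shortest-path search (Dijkstra's algorithm) over a toy edge list. It is an illustration under stated assumptions: the `shortest_path` function, the node names and the rule that unweighted predicates are skipped are all invented for the example, and OAK may compute paths differently.

```python
import heapq


def shortest_path(edges, start, target, weights):
    """Dijkstra over (subject, predicate, object) edges; cost = weight of the predicate."""
    graph = {}
    for s, p, o in edges:
        if p in weights:  # assumption: predicates without a weight are not traversed
            graph.setdefault(s, []).append((o, weights[p]))
    queue = [(0.0, start, [start])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue
        best[node] = cost
        for nxt, weight in graph.get(node, []):
            heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


# Toy graph: two is_a hops (i) or a single part_of hop (p) lead from X:a to X:d
edges = [("X:a", "i", "X:b"), ("X:b", "i", "X:d"), ("X:a", "p", "X:d")]

unweighted = shortest_path(edges, "X:a", "X:d", {"i": 1, "p": 1})
weighted = shortest_path(edges, "X:a", "X:d", {"i": 0.0001, p_key: 999} if (p_key := "p") else {})
print(unweighted[1])
print(weighted[1])
```

With equal weights the single part_of hop wins; with the weights from the example above, the two-hop is_a route becomes far cheaper and is returned instead.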
This command can be combined with others to visualize the paths.
Example:
alias go="runoak -i sqlite:obo:go"
go paths -p i,p 'nuclear membrane' --target cytoplasm --flat | go viz --fill-gaps -
This visualizes the path by first exporting the path as a flat list, then passing the results to viz, using the fill-gaps option
runoak paths [OPTIONS] [TERMS]...
Options
- --target <target>
end point of path
- --flat, --no-flat
If true then output path is written as a list of terms
- Default
False
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -p, --predicates <predicates>
A comma-separated list of predicates
- -O, --output-type <output_type>
Desired output type
- --predicate-weights <predicate_weights>
key-value pairs specified in YAML where keys are predicates or shorthands and values are weights
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
prefixes
Shows prefix declarations.
All standard prefixes:
runoak prefixes
Specific prefixes:
runoak prefixes GO CL oio skos
By default, prefix maps are exported as simple pairwise TSVs.
Prefixes can also be exported in different formats, such as YAML and JSON, where they are simple dictionaries:
In yaml:
runoak prefixes -O yaml
In turtle:
runoak prefixes -O rdf
For RDF exports, each prefix appears BOTH as a standard prefix declaration AND as an instance of SHACL PrefixDeclaration, e.g.
@prefix CL: <http://purl.obolibrary.org/obo/CL_> .
…
[] a sh:PrefixDeclaration ;
    sh:namespace CL: ;
    sh:prefix "CL" .
The default prefixmap is always used, unless options are passed specifying additional prefix maps.
Example:
runoak --named-prefix-map prefixcc prefixes
If an ontology is loaded, then --used-only can be used to restrict to prefixes for entities in that ontology
runoak -i sqlite:obo:cl prefixes --used-only
runoak prefixes [OPTIONS] [TERMS]...
Options
- -o, --output <output>
- --used-only, --no-used-only
If True, show only prefixes used in ontology
- Default
False
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
relationships
Show all relationships for a term or terms
By default, this shows all relationships where the input term(s) are the subjects
Example:
runoak -i cl.db relationships CL:4023094
Like all OAK commands, a label can be passed instead of a CURIE
Example:
runoak -i cl.db relationships neuron
To reverse the direction, and query where the search term(s) are objects, use the --direction option:
Example:
runoak -i cl.db relationships --direction down neuron
Multiple terms can be passed
Example:
runoak -i uberon.db relationships heart liver lung
And like all OAK commands, a query can be passed rather than an explicit term list
The following query lists all arteries in the limb together with the structures they supply
Query:
runoak -i uberon.db relationships -p RO:0002178 .desc//p=i "artery" .and .desc//p=i,p "limb"
runoak relationships [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- --direction <direction>
direction of traversal over edges, where up is subject to object and down is object to subject.
- Options
up | down | both
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
- --if-absent <if_absent>
determines behavior when the value is not present or is empty.
- Options
absent-only | present-only
- -S, --set-value <set_value>
the value to set for all terms for the given property.
- --include-entailed, --no-include-entailed
Include entailed indirect relationships
- Default
False
- --include-tbox, --no-include-tbox
Include class-class relationships (subclass and existentials)
- Default
True
- --include-abox, --no-include-abox
Include instance relationships (class and object property assertions)
- Default
True
Arguments
- TERMS
Optional argument(s)
roots
List all root nodes in the ontology
Like all OAK relational commands, this is parameterized by --predicates (-p). Note that the default is to return the roots of the relation graph over all predicates. This can sometimes give unintuitive results, so we recommend always being explicit and passing a predicate list.
Example:
runoak -i db/cob.db roots
This command is a wrapper onto the “roots” command in the BasicOntologyInterface.
https://incatools.github.io/ontology-access-kit/interfaces/basic.html#oaklib.interfaces.basic_ontology_interface.BasicOntologyInterface.roots
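The effect of the predicate list on roots (and on the related leafs and singletons commands) can be seen in a small self-contained sketch. The `graph_extremes` function and the toy X: terms are hypothetical; the definitions used here (a root has no outgoing edge, a leaf has no incoming edge, a singleton has neither) are assumptions for illustration and may differ in detail from the BasicOntologyInterface.

```python
IS_A = "rdfs:subClassOf"
PART_OF = "BFO:0000050"


def graph_extremes(nodes, edges, predicates=None):
    """Return (roots, leafs, singletons) of the graph restricted to `predicates`."""
    kept = [(s, o) for s, p, o in edges if predicates is None or p in predicates]
    subjects = {s for s, _ in kept}  # nodes with an outgoing (child -> parent) edge
    objects = {o for _, o in kept}   # nodes with an incoming edge
    roots = sorted(n for n in nodes if n not in subjects)
    leafs = sorted(n for n in nodes if n not in objects)
    singletons = sorted(n for n in nodes if n not in subjects and n not in objects)
    return roots, leafs, singletons


# Toy ontology with made-up identifiers
nodes = ["X:entity", "X:organism", "X:cell", "X:neuron", "X:orphan"]
edges = [
    ("X:organism", IS_A, "X:entity"),
    ("X:neuron", IS_A, "X:cell"),
    ("X:cell", PART_OF, "X:organism"),
]

is_a_only = graph_extremes(nodes, edges, predicates=[IS_A])
all_predicates = graph_extremes(nodes, edges)
print(is_a_only)
print(all_predicates)
```

In this toy graph X:cell is a root of the is_a hierarchy, but it stops being a root once part_of edges are also considered, which is the kind of surprise an explicit -p avoids.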
runoak roots [OPTIONS]
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- -P, --has-prefix <has_prefix>
filter based on a prefix, e.g. OBI
- -O, --output-type <output_type>
Desired output type
- -A, --annotated-roots, --no-annotated-roots, --no-A
If true, use annotated roots, if present
- Default
False
search
Searches ontology for entities that have a label, alias, or other property matching a search term.
Example:
runoak -i uberon.obo search limb
This uses the Pronto implementation to load uberon from disk, and does a basic substring search over the labels and synonyms - results are not ranked
Bioportal (all ontologies):
runoak -i bioportal: search limb
(You need to set your API key first)
This uses the Bioportal API to search over a broad set of ontologies, returning a list ranked by relevance. There may be many results; they are streamed, so press Ctrl-C to stop.
Ubergraph (all ontologies):
runoak -i ubergraph: search limb
Ubergraph (one ontology):
runoak -i ubergraph:uberon search limb
For more on search, see https://incatools.github.io/ontology-access-kit/interfaces/search.html
Warning
The behavior of search is not yet fully unified across endpoints
runoak search [OPTIONS] [TERMS]...
Options
- -O, --output-type <output_type>
Desired output type
- Options
obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
set-apikey
Sets an API key
Example:
runoak set-apikey -e bioportal MY-KEY-VALUE
This is stored in an OS-dependent path
runoak set-apikey [OPTIONS] KEYVAL
Options
- -e, --endpoint <endpoint>
Required Name of endpoint, e.g. bioportal
Arguments
- KEYVAL
Required argument
siblings
List all siblings of a specified term or terms
Example:
runoak -i cl.owl siblings CL:4023094
Note that siblings is by default over ALL relationship types, so we recommend always being explicit and passing a predicate using -p (--predicates).
runoak siblings [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- Options
obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl
Arguments
- TERMS
Optional argument(s)
similarity
All by all similarity
This calculates a similarity matrix for two sets of terms.
Input sets of terms can be specified in different ways:
via a file
via explicit lists of terms or queries
Example:
runoak -i hp.db similarity -p i --set1-file HPO-TERMS1 --set2-file HPO-TERMS2 -O csv
This will compare every term in HPO-TERMS1 against every term in HPO-TERMS2
Alternatively standard OAK term queries can be used, with “@” separating the two lists
Example:
runoak -i hp.db similarity -p i TERM_1 TERM_2 … TERM_N @ TERM_N+1 … TERM_M
The .all term syntax can be used to select all terms in an ontology
Example:
runoak -i ma.db similarity -p i,p .all @ .all
This can be mixed with other term selectors; for example to calculate the similarity of “neuron” vs all terms in CL:
runoak -i cl.db similarity -p i,p .all @ neuron
An example pipeline to do all by all over all phenotypes in HPO:
Explicit:
runoak -i hp.db descendants -p i HP:0000118 > HPO
runoak -i hp.db similarity -p i --set1-file HPO --set2-file HPO -O csv -o RESULTS.tsv
The same thing can be done more compactly with term queries:
runoak -i hp.db similarity -p i .desc//p=i HP:0000118 @ .desc//p=i HP:0000118
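The all-by-all comparison can be pictured with the sketch below, which scores every pair from two term sets by Jaccard similarity of their ancestor sets and applies a minimum-score cutoff. This is a simplified illustration: the function names, the toy T: hierarchy and the use of reflexive ancestors are assumptions, and the real command also reports other scores (for example information content) that are not modeled here.

```python
def ancestors(term, parents):
    """Reflexive ancestors of `term` given a child -> parents mapping."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, parents):
    """Size of the shared ancestor set divided by the size of the combined set."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def all_by_all(set1, set2, parents, jaccard_minimum=0.0):
    """Score every (set1, set2) pair, keeping those at or above the cutoff."""
    rows = []
    for a in set1:
        for b in set2:
            score = jaccard(a, b, parents)
            if score >= jaccard_minimum:
                rows.append((a, b, score))
    return rows


# Toy is_a hierarchy with made-up identifiers
parents = {"T:neuron": ["T:cell"], "T:glia": ["T:cell"], "T:cell": ["T:root"]}

rows = all_by_all(["T:neuron", "T:glia"], ["T:neuron", "T:cell"], parents, jaccard_minimum=0.6)
for a, b, score in rows:
    print(a, b, round(score, 3))
```

The number of comparisons grows with the product of the two set sizes, which is why a cutoff is useful when both sides are large.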
runoak similarity [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- --set1-file <set1_file>
ID file for set1
- --set2-file <set2_file>
ID file for set2
- --jaccard-minimum <jaccard_minimum>
Minimum value for jaccard score
- --ic-minimum <ic_minimum>
Minimum value for information content
- -o, --output <output>
path to output
- --main-score-field <main_score_field>
Score used for summarization
- Default
phenodigm_score
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
similarity-pair
Determine pairwise similarity between two terms using a variety of metrics
NOTE: this command may be deprecated, consider using similarity
Note: We recommend always specifying explicit predicate lists
Example:
runoak -i ubergraph: similarity-pair -p i,p CL:0000540 CL:0000000
You can omit predicates if you like but be warned this may yield hard to interpret results.
E.g.
runoak -i ubergraph: similarity-pair CL:0000540 GO:0001750
yields “fully formed stage” (i.e. these are both found in the adult) as the MRCA
For phenotype ontologies, UPHENO relationship types connect phenotype terms to anatomy, etc:
runoak -i ubergraph: similarity-pair MP:0010922 HP:0010616 -p i,p,UPHENO:0000001
Background: https://incatools.github.io/ontology-access-kit/interfaces/semantic-similarity.html
runoak similarity-pair [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
Arguments
- TERMS
Optional argument(s)
singletons
List all singleton nodes in the ontology
Like all OAK relational commands, this is parameterized by --predicates (-p). Note that the default is to return the singletons of the relation graph over all predicates.
Obsoletes are filtered by default
Example:
runoak -i db/cob.db singletons
This command is a wrapper onto the “singletons” command in the BasicOntologyInterface.
https://incatools.github.io/ontology-access-kit/interfaces/basic.html#oaklib.interfaces.basic_ontology_interface.BasicOntologyInterface.singletons
runoak singletons [OPTIONS]
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- --filter-obsoletes, --no-filter-obsoletes
If set, results will exclude obsoletes
- Default
True
statistics
Shows all descriptive/summary statistics
Example:
runoak -i sqlite:obo:pr statistics
By default, this will show combined summary statistics for all terms
You can also break down the statistics in three ways:
by a collection of branch roots
by a metadata property (e.g. oio:hasOBONamespace, rdfs:isDefinedBy)
by prefix (e.g. GO, PR, CL, OBI)
Example:
runoak -i sqlite:obo:pr statistics --group-by-property oio:hasOBONamespace
Note: the oio:hasOBONamespace is not the same as the ID prefix, it is a field that is used by a subset of ontologies to partition classes into broad groupings, similar to subsets. Its use is non-standard, yet a lot of ontologies use this as the main partitioning mechanism.
A note on bundled ontologies:
The standard release of many OBO ontologies “bundles” parts of other ontologies (formally, the release product includes a merged imports closure of import modules). This can complicate generation of statistics. A naive count of all classes in the main OBI release will include not only “native” OBI classes, but also classes from other ontologies that are bundled in the release.
For bundled ontologies, we recommend some kind of partitioning, such as via defined roots, or via the CURIE prefix, using the --group-by-prefix option.
Output formats:
The recommended output types for this command are yaml, json, or csv. The default output type is yaml, following the SummaryStatistics data model. This is naturally nested, as the statistics include faceted groupings (e.g. edge counts are broken down by predicate). When specifying a flat format like csv, this is flattened into a single table, with dynamic column names.
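The sketch below illustrates two ideas from this section: partitioning counts by CURIE prefix, and flattening a nested summary into a single row with dynamically generated column names. The key names (`class_count`, `class_count_by_prefix`), the underscore-joined column naming and the sample CURIEs are hypothetical; the actual SummaryStatistics data model and csv column names may differ.

```python
from collections import Counter


def count_by_prefix(curies):
    """Count terms per CURIE prefix (the part before the first colon)."""
    return Counter(curie.split(":", 1)[0] for curie in curies)


def flatten(nested, parent=""):
    """Flatten a nested dict into one level, joining keys with underscores."""
    flat = {}
    for key, value in nested.items():
        name = f"{parent}_{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Sample CURIEs standing in for the classes of a bundled release
curies = ["OBI:0000070", "OBI:0000011", "GO:0008150", "CL:0000540"]

stats = {
    "class_count": len(curies),
    "class_count_by_prefix": count_by_prefix(curies),
}
flat = flatten(stats)
print(flat)
```

The naive total counts all four classes, while the per-prefix breakdown shows that only two of them belong to the OBI namespace.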
Change statistics:
You can optionally combine the ontology statistics with a change summary relative to another ontology, using the --compare-with option.
Example:
runoak -i v2.obo statistics --group-by-obo-namespace --compare-with v1.obo
This will also include change stats broken down by KGCL change types. If a group-by option is specified, these will be grouped accordingly.
runoak statistics [OPTIONS] [BRANCHES]...
Options
- -O, --output-type <output_type>
Desired output type
- Options
obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl
- --group-by-property <group_by_property>
group summaries by a metadata property, e.g. rdfs:isDefinedBy
- --group-by-obo-namespace, --no-group-by-obo-namespace
shortcut for --group-by-property oio:hasOBONamespace (note this is distinct from the ID namespace)
- Default
False
- --group-by-prefix, --no-group-by-prefix
shortcut for --group-by-property sh:prefix. Groups by the prefix of the CURIE
- Default
False
- --group-by-defined-by, --no-group-by-defined-by
shortcut for --group-by-property rdfs:isDefinedBy. This may be inferred from prefix if not set explicitly
- Default
False
- --include-residuals, --no-include-residuals
If true include an OTHER category for terms that do not have the property
- -X, --compare-with <compare_with>
Compare with another ontology
- -P, --has-prefix <has_prefix>
filter based on a prefix, e.g. OBI
- -o, --output <output>
Output file, e.g. obo file
Arguments
- BRANCHES
Optional argument(s)
subsets
Shows information on subsets
Example:
runoak -i obolibrary:go.obo subsets
Example:
runoak -i cl.owl subsets
For background on subsets, see https://incatools.github.io/ontology-access-kit/concepts.html#subsets
Note you can use subsets in selector queries for other commands; e.g. to fetch all terms (directly) in goslim_generic in GO:
Example:
runoak -i sqlite:obo:go info .in goslim_generic
See also:
term-subsets command, which shows relationships of terms to subsets
runoak subsets [OPTIONS]
Options
- -o, --output <output>
Output file, e.g. obo file
synonymize
Applies synonymizer rules from the rules file to generate KGCL syntax; see https://github.com/INCATools/kgcl.
Example:
runoak -i foo.obo synonymize -R foo_rules.yaml --patch patch.kgcl --apply-patch
runoak synonymize [OPTIONS] [TERMS]...
Options
- -R, --rules-file <rules_file>
path to rules file. Conforms to rules_datamodel. e.g. https://github.com/INCATools/ontology-access-kit/blob/main/tests/input/matcher_rules.yaml
- --apply-patch, --no-apply-patch
Apply KGCL syntax generated based on the synonymizer rules file.
- Default
False
- --patch <patch>
Output patch file containing KGCL commands.
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
taxon-constraints
Compute all taxon constraints for a term or terms.
This will apply rules using the inferred ancestors of subject terms, as well as inferred ancestors/descendants of taxon terms.
The input ontology MUST include both the taxon constraint relationships AND the relevant portion of NCBI Taxonomy
Example:
runoak -i db/go.db taxon-constraints GO:0034357 --include-redundant -p i,p
Example:
runoak -i sqlite:obo:uberon taxon-constraints UBERON:0003884 UBERON:0003941 -p i,p
This command is a wrapper onto taxon_constraints_utils:
runoak taxon-constraints [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -p, --predicates <predicates>
A comma-separated list of predicates
- -A, --all, --no-A, --no-all
if specified then perform for all terms
- Default
False
- --include-redundant, --no-include-redundant
if specified then include redundant taxon constraints from ancestral subjects
- Default
False
Arguments
- TERMS
Optional argument(s)
term-categories
List categories for a term or set of terms
TODO
runoak term-categories [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- --category-system <category_system>
Example: biolink, cob, bfo, dbpedia, …
Arguments
- TERMS
Optional argument(s)
term-metadata
Shows term metadata.
Example:
runoak -i sqlite:obo:uberon term-metadata lung heart
You can filter the results for only selected predicates:
runoak -i sqlite:obo:uberon term-metadata lung heart -p id,oio:hasDbXref
The default output is YAML documents, where each YAML document is a term, with keys representing selected predicates. Values are always lists of atoms, even when there is typically one value (e.g. rdfs:label)
runoak term-metadata [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- -p, --predicates <predicates>
A comma-separated list of predicates
- --additional-metadata, --no-additional-metadata
if true then fetch additional metadata about statements stored as OWL reification
- Default
False
Arguments
- TERMS
Optional argument(s)
term-subsets
List subsets for a term or set of terms
runoak term-subsets [OPTIONS] [TERMS]...
Options
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
Arguments
- TERMS
Optional argument(s)
terms
List all terms in the ontology
Example:
runoak -i db/cob.db terms
All terms without obsoletes:
runoak -i prontolib:cl.obo terms --filter-obsoletes
By default “terms” is considered to be any entity type in the ontology. Use --owl-type to constrain this:
Classes:
runoak -i sqlite:obo:ro terms --owl-type owl:Class
Relationship types (Object properties):
runoak -i sqlite:obo:ro terms --owl-type owl:ObjectProperty
Annotation properties:
runoak -i sqlite:obo:omo terms --owl-type owl:AnnotationProperty
runoak terms [OPTIONS]
Options
- --filter-obsoletes, --no-filter-obsoletes
If set, results will exclude obsoletes
- Default
True
- -o, --output <output>
Output file, e.g. obo file
- --owl-type <owl_type>
only include entities of this type, e.g. owl:Class, rdf:Property
termset-similarity
Termset similarity
This calculates a similarity matrix for two sets of terms.
Example:
runoak -i go.db termset-similarity -p i,p nucleus membrane @ "nuclear membrane" vacuole
runoak termset-similarity [OPTIONS] [TERMS]...
Options
- -p, --predicates <predicates>
A comma-separated list of predicates
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
- --autolabel, --no-autolabel
If set, results will automatically have labels assigned
- Default
True
Arguments
- TERMS
Optional argument(s)
tree
Display an ancestor graph as an ascii/markdown tree
For general instructions, see the viz command, to which this is analogous
Example:
runoak -i envo.db tree ENVO:00000372 -p i,p
This produces output like:
* [i] ENVO:00000094 ! volcanic feature
    * [i] ENVO:00000247 ! volcano
        * [i] ENVO:00000403 ! shield volcano
            * [i] **ENVO:00000372 ! pyroclastic shield volcano**
Note: for many ontologies the tree view will explode, especially if no predicates are specified. You may wish to start with the is-a tree (-p i).
You can use the --gap-fill option to create a minimal tree:
Example:
runoak -i envo.db tree --gap-fill 'pyroclastic shield volcano' 'subglacial volcano' volcano -p i
This will show the tree containing only these terms, and the most direct inferred relationships between them.
You can also give a list of leaf terms and specify --add-mrcas alongside --gap-fill to fill in the most informative intermediate classes:
Example:
runoak -i envo.db tree --add-mrcas --gap-fill 'pyroclastic shield volcano' 'subglacial volcano' 'mud volcano' -p i
This will fill in the term “volcano”, as it is the most recent common ancestor of the specified terms
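The notion of a most recent common ancestor can be sketched as follows. The toy is_a hierarchy is reduced from the examples in this section (the placement of 'mud volcano' directly under 'volcano' is an assumption), labels are used in place of CURIEs, and the `mrcas` function is a hypothetical illustration rather than OAK's implementation.

```python
def ancestors(term, parents):
    """Reflexive ancestors of `term` given a child -> parents mapping."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mrcas(a, b, parents):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    )


# Toy is_a hierarchy (child -> parents), keyed by label for readability
parents = {
    "pyroclastic shield volcano": ["shield volcano"],
    "shield volcano": ["volcano"],
    "subglacial volcano": ["volcano"],
    "mud volcano": ["volcano"],  # assumed placement
    "volcano": ["volcanic feature"],
}

result = mrcas("pyroclastic shield volcano", "subglacial volcano", parents)
print(result)
```

'volcanic feature' is also a common ancestor of both terms, but it is discarded because 'volcano' lies below it and is therefore more informative.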
The --max-hops option can be used to limit the distance:
runoak -i envo.db tree 'pyroclastic shield volcano' 'subglacial volcano' --max-hops 1 -p i
This will generate:
* [] ENVO:00000247 ! volcano
    * [i] ENVO:00000403 ! shield volcano
        * [i] **ENVO:00000372 ! pyroclastic shield volcano**
    * [i] **ENVO:00000407 ! subglacial volcano**
Note that 'volcano' is the root: even though it is 2 hops from one of the terms, it can be connected to at least one of the seeds (highlighted with asterisks) by a path of length 1.
runoak tree [OPTIONS] [TERMS]...
Options
- --down, --no-down
traverse down
- Default
False
- --gap-fill, --no-gap-fill
If set then find the minimal graph that spans all input curies
- Default
False
- --add-mrcas, --no-add-mrcas
If set then extend input seed list to include all pairwise MRCAs
- Default
False
- -S, --stylemap <stylemap>
a json file to configure visualization. See https://berkeleybop.github.io/kgviz-model/
- -C, --configure <configure>
overrides for stylemap, specified as yaml. E.g. `-C "styles: [filled, rounded]"`
- --max-hops <max_hops>
Trim nodes that are equal to or greater than this distance from terms
- --skip <skip>
Exclude paths that contain this node
- --root <root>
Use this node or nodes as roots
- -p, --predicates <predicates>
A comma-separated list of predicates
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
validate
Validate an ontology against ontology metadata
Implementation notes: Currently only works on SQLite
Example:
runoak -i db/ecto.db validate -o results.tsv
The default validation performed is structural (conformance to the ontology_metadata schema)
There is experimental support for additional ontology rules, which includes heuristic methods such as aligning text and logical definitions. These are off by default.
To run these, pass --no-skip-ontology-rules
Example:
runoak -i db/uberon.db validate --skip-structural-validation --no-skip-ontology-rules
For more information, see the OAK how-to guide:
runoak validate [OPTIONS]
Options
- --cutoff <cutoff>
maximum results to report for any (type, predicate) pair
- Default
50
- --skip-structural-validation, --no-skip-structural-validation
If true, main structural validation checks are skipped
- Default
False
- --skip-ontology-rules, --no-skip-ontology-rules
If true, ontology rules are skipped
- Default
True
- -R, --rule <rule>
A rule to run. Can be specified multiple times. If not specified, all rules are run.
- -o, --output <output>
Output file, e.g. obo file
- -O, --output-type <output_type>
Desired output type
validate-definitions
Checks presence and structure of text definitions.
To run:
runoak -i db/uberon.db validate-definitions -o results.tsv
By default, this applies basic text mining to text definitions to check them against the machine-actionable OBO text definition guideline rules. This can cause an initial lag; to skip it and ONLY check for the presence of definitions, use --skip-text-annotation.
Example:
runoak -i db/uberon.db validate-definitions --skip-text-annotation
Like most OAK commands, this accepts lists of terms or term queries as arguments. You can pass in a CURIE list to selectively validate individual classes.
Example:
runoak -i db/cl.db validate-definitions CL:0002053
Only on CL identifiers:
runoak -i db/cl.db validate-definitions i^CL:
Only on neuron hierarchy:
runoak -i db/cl.db validate-definitions .desc//p=i neuron
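The presence-only mode and the term queries shown above can be used together. As a minimal sketch (the function name oak_defs_present is hypothetical and not part of OAK), the wrapper below skips text annotation and forwards any terms or term queries unchanged, following the general `runoak --input HANDLE COMMAND` structure.

```shell
# Hypothetical helper (not part of OAK): check only for the presence of
# definitions, restricted to the given terms or term queries.
#   $1: input handle (e.g. db/cl.db)
#   remaining arguments: terms or term queries, passed through as-is
oak_defs_present() {
    oak_input="$1"
    shift
    runoak -i "$oak_input" validate-definitions --skip-text-annotation "$@"
}
```

For example, `oak_defs_present db/cl.db .desc//p=i neuron` would restrict the check to the neuron hierarchy.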
Output format:
This command emits objects conforming to the OAK validation datamodel. See https://incatools.github.io/ontology-access-kit/datamodels for more on OAK datamodels.
The default serialization of the datamodel is CSV.
Notes:
This command is largely redundant with the validate command, but it is useful for targeted validation focused solely on definitions.
runoak validate-definitions [OPTIONS] [TERMS]...
Options
- --skip-text-annotation, --no-skip-text-annotation
If true, do not parse text annotations
- Default
False
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Output file, e.g. obo file
Arguments
- TERMS
Optional argument(s)
validate-multiple
Validate multiple ontologies against ontology metadata
See the validate command; this is the same, except that you can pass a list of databases.
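No example is given for this command, so the following is a minimal sketch based only on the usage line and options listed for it. The function name oak_validate_dbs is hypothetical and not part of OAK, and the database paths in the usage note are placeholders.

```shell
# Hypothetical helper (not part of OAK): validate several databases in one
# call and write the combined report to a single file.
#   $1: output file
#   remaining arguments: paths to the databases to validate
oak_validate_dbs() {
    oak_report="$1"
    shift
    runoak validate-multiple -o "$oak_report" "$@"
}
```

It could be called as `oak_validate_dbs results.tsv db/cl.db db/uberon.db`.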
For more information, see the OAK how-to guide.
runoak validate-multiple [OPTIONS] [DBS]...
Options
- --cutoff <cutoff>
maximum results to report for any (type, predicate) pair
- Default
50
- -s, --schema <schema>
Path to schema (if you want to override the bundled OMO schema)
- -o, --output <output>
Output file, e.g. obo file
Arguments
- DBS
Optional argument(s)
viz
Visualize an ancestor graph using obographviz
For general background on what is meant by a graph in OAK, see https://incatools.github.io/ontology-access-kit/interfaces/obograph
Note
This requires that obographviz is installed.
Example:
runoak -i sqlite:cl.db viz CL:4023094
Same query on ubergraph:
runoak -i ubergraph: viz CL:4023094
Example, showing only is-a:
runoak -i sqlite:cl.db viz CL:4023094 -p i
Example, showing only is-a and part-of, to include Uberon:
runoak -i sqlite:cl.db viz CL:4023094 -p i,p
As above, including develops-from:
runoak -i sqlite:cl.db viz CL:4023094 -p i,p,RO:0002202
With abbreviation:
runoak -i sqlite:cl.db viz CL:4023094 -p i,p,d
We can also limit the number of "hops" from the seed terms; for example, all is-a and develops-from ancestors of T cell, limiting to a distance of 2:
runoak -i sqlite:cl.db viz 'T cell' -p i,d --max-hops 2
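When rendering in a script or on a machine without a display, it may be preferable not to open the image. The sketch below is an assumption-laden wrapper, not part of OAK: the function name oak_render is hypothetical, it fixes the predicates to is-a and part-of, and it relies on --no-view and -o as listed in the options for this command. How the image format is chosen for the output path is not stated in this reference.

```shell
# Hypothetical helper (not part of OAK): render an is-a/part-of ancestor
# graph to a file without opening a viewer.
#   $1: input handle (e.g. sqlite:cl.db)
#   $2: path to the output file
#   remaining arguments: seed terms (CURIEs or labels)
oak_render() {
    oak_input="$1"
    oak_image="$2"
    shift 2
    runoak -i "$oak_input" viz --no-view -p i,p -o "$oak_image" "$@"
}
```

For example: `oak_render sqlite:cl.db neuron.png CL:4023094`.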
runoak viz [OPTIONS] [TERMS]...
Options
- --view, --no-view
if view is set then open the image after rendering
- Default
True
- --down, --no-down
traverse down
- Default
False
- --gap-fill, --no-gap-fill
If set then find the minimal graph that spans all input curies
- Default
False
- --add-mrcas, --no-add-mrcas
If set then extend input seed list to include all pairwise MRCAs
- Default
False
- -S, --stylemap <stylemap>
a json file to configure visualization. See https://berkeleybop.github.io/kgviz-model/
- -C, --configure <configure>
overrides for stylemap, specified as yaml. E.g. `-C "styles: [filled, rounded]"`
- --max-hops <max_hops>
Trim nodes that are equal to or greater than this distance from terms
- --meta, --no-meta
Add metadata object to graph nodes, including xrefs, definitions
- Default
False
- -p, --predicates <predicates>
A comma-separated list of predicates
- -O, --output-type <output_type>
Desired output type
- -o, --output <output>
Path to output file
Arguments
- TERMS
Optional argument(s)