Command Line

Note

We follow the CLIG guidelines as far as possible.

General Guidelines

Note

If you are running this as an internal OAK developer, you need to precede the command with poetry shell.

The general structure is:

runoak --input HANDLE COMMAND [COMMAND ARGS AND OPTIONS]

The value for --input (which can be shortened to -i) is specified in the Ontology Implementation Selectors documentation.

Examples:

runoak --input ubergraph: COMMAND [COMMAND ARGS AND OPTIONS]
runoak --input fbbt.obo COMMAND [COMMAND ARGS AND OPTIONS]
runoak --input cl.db COMMAND [COMMAND ARGS AND OPTIONS]
runoak --input sqlite:obo:cl COMMAND [COMMAND ARGS AND OPTIONS]

It can be useful to create aliases for individual ontologies. For example, to create an alias for the Uberon ontology:

alias uberon='runoak -i sqlite:obo:uberon'

You can specify further implementations with -a which will create an aggregator implementation that wraps multiple implementations. For example, you can multiplex queries over different endpoints.

Common Patterns

Term Lists

Many commands take a term or a list of terms as their primary argument. These are typically one of:

  • a CURIE such as UBERON:0000955

  • a Search Syntax term, which is either:

    • an exact match to a label; for example “limb” or “plasma membrane”

    • a compound search term such as t~limb which finds terms with partial matches to limb

Search terms are expanded to matching CURIEs, and then fed into the command.

For example, (assuming the alias above) the following command will look up two terms using their labels:

uberon info hand foot

This is equivalent to:

uberon info UBERON:0002398 UBERON:0002397
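
As a rough illustration of how a term list might be sorted into CURIEs and search terms before expansion, the Python sketch below classifies each argument. The function name classify_term and both regular expressions are assumptions made for this example; they are not OAK's actual parser, and real search syntax is richer than this.

```python
import re

# Hypothetical patterns, for illustration only (not taken from oaklib).
CURIE_PATTERN = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")
COMPOUND_PATTERN = re.compile(r"^[a-z][~=/^]")  # e.g. t~limb


def classify_term(arg: str) -> str:
    """Return 'compound', 'curie', or 'label' for one command-line term."""
    if COMPOUND_PATTERN.match(arg):
        return "compound"
    if CURIE_PATTERN.match(arg):
        return "curie"
    # Anything else is treated as an exact label match.
    return "label"


for arg in ["UBERON:0000955", "limb", "plasma membrane", "t~limb"]:
    print(arg, "->", classify_term(arg))
```

In this sketch, anything that is neither a compound search term nor a CURIE falls through to an exact label lookup, mirroring the order in which the term types are described above.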

Predicates

Many commands take a --predicates option (shortened to -p). This specifies a list of predicates (aka relationship types, see Predicates) to be used in filtering. The list is specified as a comma-delimited list (no spaces) of CURIEs.

For many biological ontologies, it can be useful to filter on is_a (rdfs:subClassOf) and part_of (BFO:0000050) so the command line interface understands shortcuts for these:

  • i: is-a (i.e. rdfs:subClassOf between two named classes)

  • p: part-of

For example, to draw the subgraph of terms starting from “hand” and “foot” and tracing upwards through is_a and part_of relationships:

uberon viz -p i,p hand foot
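
The shortcut expansion can be pictured with a small Python sketch. The dictionary and the function name expand_predicates are hypothetical and only illustrate the documented behavior; any value that is not a shortcut is assumed to already be a CURIE.

```python
# Shortcuts described above; any other value is assumed to already be a CURIE.
PREDICATE_SHORTCUTS = {
    "i": "rdfs:subClassOf",  # is-a
    "p": "BFO:0000050",      # part-of
}


def expand_predicates(option_value: str) -> list[str]:
    """Expand a comma-delimited -p value such as 'i,p' into predicate CURIEs."""
    return [PREDICATE_SHORTCUTS.get(token, token) for token in option_value.split(",")]


print(expand_predicates("i,p"))
print(expand_predicates("i,BFO:0000050"))
```

Both calls yield the same two predicates, which is why shortcuts and full CURIEs can be mixed freely in one -p value.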

Commands

The following section is autogenerated from the inline docs. You should get the same results by running:

runoak COMMAND --help

runoak

Run the oaklib Command Line.

A subcommand must be passed - for example: ancestors, terms, …

Most commands require an input ontology to be specified:

runoak -i <INPUT SPECIFICATION> SUBCOMMAND <SUBCOMMAND OPTIONS AND ARGUMENTS>

Get help on any command, e.g.:

runoak viz -h

runoak [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose
-q, --quiet <quiet>
--stacktrace, --no-stacktrace

If set then show full stacktrace on error

Default

False

--save-as <save_as>

For commands that mutate the ontology, this specifies where changes are saved to

--autosave, --no-autosave

For commands that mutate the ontology, this determines if these are automatically saved in place

--named-prefix-map <named_prefix_map>

the name of a prefix map, e.g. obo, prefixcc

--prefix <prefix>

prefix=expansion pair

--metamodel-mappings <metamodel_mappings>

overrides for metamodel properties such as rdfs:label

--import-depth <import_depth>

Maximum depth in the import tree to traverse. Currently this is only used by the pronto adapter

-g, --associations <associations>

Location of ontology associations

-G, --associations-type <associations_type>

Syntax of associations input

-i, --input <input>

input implementation specification. This is either a path to a file, or an ontology selector

-I, --input-type <input_type>

Input format. Permissible values vary depending on the context

-a, --add <add>

additional implementation specification.

aliases

List aliases for a term or set of terms

Example:

runoak -i ubergraph:uberon aliases UBERON:0001988

TERMS should be either an explicit list of terms or queries, or can be a selector query, such as ‘.all’ to fetch all terms in the ontology

Show all aliases:

runoak -i db/envo.db aliases .all

Currently the core behavior of this command assumes a simple datamodel for aliases, where an alias is a simple property-value tuple, with the property being from some standard vocabulary (e.g. skos:altLabel, oboInOwl, etc.).

If you know the synonyms follow the OBO/oboInOwl datamodel, you can pass --obo-model. This will give back richer data if present in the ontology, including synonym categories/types and synonym provenance.

In future, this may become the default

runoak aliases [OPTIONS] [TERMS]...

Options

--obo-model, --no-obo-model

If true, assume the OBO synonym datamodel, including provenance and synonym types

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

ancestors

List all ancestors of a given term or terms.

Here ancestor means the transitive closure of the parent relationship, where a parent includes all relationship types, not just is-a.

Example:

runoak -i cl.owl ancestors CL:4023094

This will show ancestry over the full relationship graph. Like any relational OAK command, this can be filtered by relationship type (predicate), using --predicates (-p). For example, constrained to is-a and part-of:

runoak -i cl.owl ancestors CL:4023094 -p i,BFO:0000050

Multiple backends can be used, including ubergraph:

runoak -i ubergraph: ancestors CL:4023094 -p i,BFO:0000050

Search terms can also be used:

runoak -i cl.owl ancestors 'goblet cell'

Multiple terms can be passed:

runoak -i sqlite:go.db ancestors GO:0005773 GO:0005737 -p i,p
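
To make the notion of "transitive closure of the parent relationship, optionally filtered by predicate" concrete, here is a minimal Python sketch over a toy edge list. The edges, the EX: identifiers, and the ancestors function are all made up for this example and do not reflect how any OAK backend computes ancestors. As a simplification, the start terms themselves are left out of the result.

```python
from collections import deque

# Toy edges (child, predicate, parent); the IDs are made up for illustration.
EDGES = [
    ("EX:finger", "BFO:0000050", "EX:hand"),
    ("EX:hand", "rdfs:subClassOf", "EX:limb_segment"),
    ("EX:hand", "BFO:0000050", "EX:forelimb"),
    ("EX:forelimb", "rdfs:subClassOf", "EX:limb"),
    ("EX:hand", "EX:develops_from", "EX:hand_bud"),
]


def ancestors(start_terms, predicates=None):
    """Return every term reachable by following parent edges from start_terms.

    If predicates is None all relationship types are followed; otherwise only
    edges whose predicate is in the given collection are traversed.
    """
    seen = set()
    queue = deque(start_terms)
    while queue:
        term = queue.popleft()
        for child, predicate, parent in EDGES:
            if child != term:
                continue
            if predicates is not None and predicate not in predicates:
                continue
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors(["EX:finger"])))
print(sorted(ancestors(["EX:finger"], {"rdfs:subClassOf", "BFO:0000050"})))
```

The second call drops EX:hand_bud, because the only path to it uses a predicate outside the filter.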

More background:

runoak ancestors [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

-O, --output-type <output_type>

Desired output type

--statistics, --no-statistics

For each ancestor, show statistics.

Default

False

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

annotate

Annotate a piece of text using a Named Entity Recognition annotation

Example:

runoak -i bioportal: annotate "enlarged nucleus in T-cells from peripheral blood"

Currently most implementations do not yet support annotation.

See the ontorunner framework for plugins for SciSpacy and OGER - these will later become plugins.

If gilda is installed as an extra, it can be used, but --matches-whole-text (-W) must be specified, as gilda only performs grounding.

Example:

runoak -i gilda: annotate -W BRCA2

Programmatic usage:

This command is a wrapper onto the annotate_text method, which is provided as part of the TextAnnotator interface:

https://incatools.github.io/ontology-access-kit/interfaces/text-annotator

Aliases can be listed in the output by setting the flag --include-aliases to true (default: false).

Example (using the plugin oakx-spacy):

runoak -i spacy:sqlite:obo:bero annotate Myeloid derived suppressor cells. --include-aliases

will yield:

confidence: 0.8
object_aliases:
- Myeloid-Derived Suppressor Cells
- MDSCs
- mdscs
- myeloid-derived suppressor cells
object_id: obo:MESH_D000072737
object_label: Myeloid-Derived Suppressor Cells
subject_end: 30
subject_start: 0

runoak annotate [OPTIONS] [WORDS]...

Options

-W, --matches-whole-text, --no-W, --no-matches-whole-text

if true, then only show matches that span the entire input text

Default

False

--include-aliases, --no-include-aliases

Include alias maps in output.

Default

False

--text-file <text_file>

Text file to annotate. Each newline separated entry is a distinct text.

-L, --lexical-index-file <lexical_index_file>

path to lexical index. This is recreated each time unless --no-recreate is passed

-m, --model <model>

Name of trained model to use for annotation, e.g. ‘en_ner_craft_md’.

-x, --exclude-tokens <exclude_tokens>

Text file or list of tokens to filter from input prior to annotation. If passed as text file, each newline separated entry is a distinct text.

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

Arguments

WORDS

Optional argument(s)

apply

Applies a patch to an ontology. The patch should be specified using KGCL syntax, see https://github.com/INCATools/kgcl

Example:

runoak -i cl.owl.ttl apply "rename CL:0000561 to 'amacrine neuron'" -o cl.owl.ttl -O ttl

On an obo format file:

runoak -i simpleobo:go-edit.obo apply "rename GO:0005634 from 'nucleus' to 'foo'" -o go-edit-new.obo

With URIs:

runoak -i cl.owl.ttl apply "rename <http://purl.obolibrary.org/obo/CL_0000561> from 'amacrine cell' to 'amacrine neuron'" -o cl.owl.ttl -O ttl

WARNING:

This command is still experimental. Some things to bear in mind:

  • for some ontologies, CURIEs may not work, instead specify a full URI surrounded by <>s

  • only a subset of KGCL commands are supported by each backend

runoak apply [OPTIONS] [COMMANDS]...

Options

-o, --output <output>
--changes-output <changes_output>

output file for KGCL changes

--changes-input <changes_input>

Path to an input changes file

--changes-format <changes_format>

Format of the changes file (json or kgcl)

--dry-run, --no-dry-run

if true, only perform the parse of KGCL and do not apply

Default

False

--expand, --no-expand

if true, expand complex changes to atomic changes

Default

True

--ignore-invalid-changes, --no-ignore-invalid-changes

if true, ignore invalid changes, e.g. obsoletions of dependent entities

Default

False

--contributor <contributor>

CURIE for the person contributing the patch

-O, --output-type <output_type>

Desired output type

--overwrite, --no-overwrite

If set, any changes applied will be saved back to the input file/source

Arguments

COMMANDS

Optional argument(s)

apply-obsolete

Sets an ontology element to be obsolete

Example:

runoak -i my.obo apply-obsolete MY:0002200 -o my-modified.obo

Multiple terms can be passed, as labels, IDs, or using OAK queries:

runoak -i my.obo apply-obsolete MY:1 MY:2 MY:3 … -o my-modified.obo

This may be chained, for example to take all terms matching a search query and then obsolete them all:

runoak -i my.db search 'l/^Foo/' | runoak -i my.db --autosave apply-obsolete -

This command is partially redundant with the more general “apply” command

runoak apply-obsolete [OPTIONS] [TERMS]...

Options

-o, --output <output>
-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

associations

Lookup associations from or to entities.

Example:

runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations

The above will show all associations

To query using an ontology term, including is-a closure, specify one or more terms or term queries, plus the closure predicate(s), e.g.

Example:

runoak -i sqlite:obo:hp -g test.hpoa -G hpoa associations -p i HP:0001392

This shows all annotations either to “Abnormality of the liver” (HP:0001392), or to is-a descendants
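
The effect of combining a query term with an is-a closure can be sketched in Python as follows. The toy hierarchy, the associations, and the helper names are hypothetical; the sketch only illustrates the idea of matching associations whose object is the query term or one of its is-a descendants.

```python
# Toy is-a edges (child -> parents) and associations; all IDs are made up.
IS_A = {
    "EX:hepatomegaly": ["EX:liver_abnormality"],
    "EX:liver_abnormality": ["EX:abdominal_abnormality"],
}
ASSOCIATIONS = [
    ("EX:disease1", "EX:hepatomegaly"),
    ("EX:disease2", "EX:liver_abnormality"),
    ("EX:disease3", "EX:abdominal_abnormality"),
]


def is_a_ancestors(term):
    """Return the term itself plus all of its is-a ancestors."""
    result = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def associations_for(query_term):
    """Keep associations whose object is the query term or an is-a descendant."""
    return [(s, o) for s, o in ASSOCIATIONS if query_term in is_a_ancestors(o)]


print(associations_for("EX:liver_abnormality"))
```

The association to the more general EX:abdominal_abnormality is not returned, since the closure only extends downwards from the query term.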

runoak associations [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-O, --output-type <output_type>

Desired output type

--if-absent <if_absent>

determines behavior when the value is not present or is empty.

Options

absent-only | present-only

-S, --set-value <set_value>

the value to set for all terms for the given property.

--association-predicates <association_predicates>

A comma-separated list of predicates for the association relation

-Q, --terms-role <terms_role>

How to interpret query terms.

Default

object

Options

subject | object | both

Arguments

TERMS

Optional argument(s)

axioms

Filters axioms

Example:

runoak -i cl.ofn axioms

The above will write all axioms.

You can filter by axiom type:

Example:

runoak -i cl.ofn axioms --axiom-type SubClassOf

Note this currently only works with the funowl adapter, on functional syntax files

runoak axioms [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

--axiom-type <axiom_type>

Type of axiom, e.g. SubClassOf

--about <about>

CURIE that the axiom is about

--references <references>

CURIEs that the axiom references

Arguments

TERMS

Optional argument(s)

cache-clear

Clear the contents of the pystow oaklib cache.

runoak cache-clear [OPTIONS]

Options

--days-old <days_old>

Clear anything more than this number of days old

Default

100

cache-ls

List the contents of the pystow oaklib cache.

TODO: this currently only works on unix-based systems.

runoak cache-ls [OPTIONS]

definitions

Show textual definitions for term or set of terms

Example:

runoak -i sqlite:obo:envo definitions 'tropical biome' 'temperate biome'

You can use the “.all” selector to show all definitions for all terms in the ontology:

Example:

runoak -i sqlite:obo:envo definitions .all

You can also include definition metadata, such as provenance and source:

runoak -i sqlite:obo:cl definitions --additional-metadata neuron

runoak definitions [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-D, --display <display>

A comma-separated list of display options. Use ‘all’ for all

-O, --output-type <output_type>

Desired output type

Options

obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl

--if-absent <if_absent>

determines behavior when the value is not present or is empty.

Options

absent-only | present-only

--additional-metadata, --no-additional-metadata

if true then fetch additional metadata about statements stored as OWL reification

Default

False

-S, --set-value <set_value>

the value to set for all terms for the given property.

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

Arguments

TERMS

Optional argument(s)

descendants

List all descendants of a term

Example:

runoak -i sqlite:obo:obi descendants assay -p i

Example:

runoak -i sqlite:obo:uberon descendants heart -p i,p

This is the inverse of the ‘ancestors’ command; see the documentation for that command. Note, however, that ‘descendants’ queries can be far more “explosive” than ancestors queries, especially for high-level terms and when no predicates are specified.

More background:

runoak descendants [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

-D, --display <display>

A comma-separated list of display options. Use ‘all’ for all

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

diff

Compute difference between two ontologies.

Example:

runoak -i foo.obo diff -X bar.obo -o diff.yaml

This will produce a list of Changes that are required to go from the main input ontology (--input) to the other ontology (--other-ontology, or -X).

The output follows the KGCL data model. See https://incatools.github.io/ontology-access-kit/datamodels/kgcl/index.html

You can use --output-type to control the output format.

KGCL controlled natural language:

runoak -i foo.obo diff -X bar.obo -o diff.txt --output-type kgcl

KGCL JSON:

runoak -i foo.obo diff -X bar.obo -o diff.json --output-type json

YAML (default):

runoak -i foo.obo diff -X bar.obo -o diff.yaml --output-type yaml

The --statistics option can be used to generate summary statistics for the changes. These are grouped according to the --group-by-property option. For example, the GO uses the oio:hasOBONamespace property to partition classes into 3 categories.

Example:

runoak -i go.obo diff -X go-new.obo -o diff.yaml --statistics --group-by-property oio:hasOBONamespace

This will produce a YAML dictionary, with outer keys being the values of the oio:hasOBONamespace property, and inner keys being the change types.

If --group-by-property is not specified, or there is no value for this property, then the outer key will be “__RESIDUAL__”
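
The shape of the grouped summary can be sketched in Python. The change records, the change type names, and the summarize function are hypothetical and only illustrate the nesting described above (outer key: property value, inner key: change type), including the residual bucket.

```python
from collections import defaultdict

RESIDUAL = "__RESIDUAL__"

# Hypothetical change records: (change type, term the change is about).
CHANGES = [
    ("NodeRename", "GO:1"),
    ("NodeRename", "GO:2"),
    ("NodeObsoletion", "GO:2"),
    ("NodeCreation", "GO:3"),
]
# Hypothetical values of the grouping property; GO:3 has none.
NAMESPACE = {"GO:1": "biological_process", "GO:2": "cellular_component"}


def summarize(changes, group_values=None):
    """Count changes per change type, grouped by a per-term property value."""
    summary = defaultdict(lambda: defaultdict(int))
    for change_type, term in changes:
        # Terms without a value (or no grouping at all) go to the residual key.
        group = (group_values or {}).get(term, RESIDUAL)
        summary[group][change_type] += 1
    return {group: dict(counts) for group, counts in summary.items()}


print(summarize(CHANGES, NAMESPACE))
print(summarize(CHANGES))
```

When no grouping is supplied, every change lands under the residual key, so the output keeps the same two-level structure either way.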

For summary statistics, you can also specify --output-type csv to get tabular output.

Limitations:

This does not do a diff over every axiom in each ontology. For a complete OWL diff, you should use ROBOT.

runoak diff [OPTIONS]

Options

-X, --other-ontology <other_ontology>

other ontology

--simple, --no-simple

perform a quick difference showing only terms that differ

Default

False

--statistics, --no-statistics

show summary statistics only

Default

False

--group-by-property <group_by_property>

group summaries by a metadata property, e.g. rdfs:isDefinedBy

--group-by-obo-namespace, --no-group-by-obo-namespace

shortcut for --group-by-property oio:hasOBONamespace (note this is distinct from the ID namespace)

Default

False

--group-by-defined-by, --no-group-by-defined-by

shortcut for --group-by-property rdfs:isDefinedBy. This may be inferred from prefix if not set explicitly

Default

False

--group-by-prefix, --no-group-by-prefix

shortcut for --group-by-property sh:prefix. Groups by the prefix of the CURIE

Default

False

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

diff-associations

Diffs two association sources. EXPERIMENTAL.

This functionality may move out of core

runoak diff-associations [OPTIONS]

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-O, --output-type <output_type>

Desired output type

-g, --associations <associations>

associations

-X, --other-associations <other_associations>

other associations

diff-terms

Compares a pair of terms in two ontologies

EXPERIMENTAL

runoak diff-terms [OPTIONS] [TERMS]...

Options

--other-ontology <other_ontology>

other ontology

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

diff-via-mappings

Calculates cross-ontology diff using mappings

Given a pair of ontologies, and mappings that connect terms in both ontologies, this command will perform a structural comparison of all mapped pairs of terms

Example:

runoak -i sqlite:obo:uberon diff-via-mappings --other-input sqlite:obo:zfa --source UBERON --source ZFA -O csv

Note the above command does not have any mapping file specified; the mappings that are distributed within each ontology are used (in this case, Uberon contains mappings to ZFA).

If the mappings are provided externally:

runoak -i ont1.obo diff-via-mappings --other-input ont2.obo --mapping-input mappings.sssom.tsv

(in the above example, --source is not passed, so all mappings are tested)

If there are no existing mappings, you can use the lexmatch command to generate them:

runoak -i ont1.obo -a ont2.obo lexmatch -o mappings.sssom.tsv

runoak -i ont1.obo diff-via-mappings --other-input ont2.obo --mapping-input mappings.sssom.tsv

The output from this command follows the cross-ontology-diff data model (https://incatools.github.io/ontology-access-kit/datamodels/cross-ontology-diff/index.html)

This can be serialized in YAML or TSV form

runoak diff-via-mappings [OPTIONS] [TERMS]...

Options

-S, --source <source>

ontology prefixes e.g. HP, MP

--mapping-input <mapping_input>

File of mappings in SSSOM format. If not provided then mappings in ontology(ies) are used

-X, --other-input <other_input>

Additional input file

--other-input-type <other_input_type>

Type of additional input file

--intra, --no-intra

If true, then all sources are in the main input ontology

Default

False

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

--include-identity-mappings, --no-include-identity-mappings

Use identity relation as mapping; use this for two versions of the same ontology

Default

False

--filter-category-identical, --no-filter-category-identical

Do not report cases where a relationship has not changed

Default

False

--bidirectional, --no-bidirectional

Show diff from both left and right perspectives

Default

True

-p, --predicates <predicates>

A comma-separated list of predicates

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

dump

Exports (dumps) the entire contents of an ontology.

Example:

runoak -i pato.obo dump -o pato.json -O json

Example:

runoak -i pato.owl dump -o pato.ttl -O turtle

Currently each implementation only supports a subset of formats.

Some dumpers accept additional options. For example, dumping to fhirjson accepts --include-all-predicates, which changes the default behavior from only exporting IS_A to all mappable predicates.

The dump command is also blocked for remote endpoints such as Ubergraph, to avoid killer queries.

runoak dump [OPTIONS] [TERMS]...

Options

-o, --output <output>
--include-all-predicates, --no-include-all-predicates

For formats that export only IS_A by default, this will include all possible predicates

Default

False

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

enrichment

Run class enrichment analysis.
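
As background, class enrichment is commonly framed as an over-representation test. The Python sketch below computes a one-sided hypergeometric p-value for a single class and compares it with a cutoff. This is an assumption-laden illustration: the documentation here does not state which statistical test or multiple-testing correction OAK applies, and the function name and numbers are invented for the example.

```python
from math import comb


def enrichment_p_value(sample_hits, sample_size, background_hits, background_size):
    """One-sided hypergeometric p-value: P(at least sample_hits annotated items)."""
    total = comb(background_size, sample_size)
    upper = min(sample_size, background_hits)
    return sum(
        comb(background_hits, k) * comb(background_size - background_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    ) / total


CUTOFF = 0.05  # same value as the documented default for --cutoff
# 3 of 10 background entities are annotated to the class; all 3 are in the sample.
p = enrichment_p_value(sample_hits=3, sample_size=3, background_hits=3, background_size=10)
print(p, p <= CUTOFF)
```

Here the sample plays the role of the entities in the sample file and the background that of the background file.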

runoak enrichment [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-O, --output-type <output_type>

Desired output type

--if-absent <if_absent>

determines behavior when the value is not present or is empty.

Options

absent-only | present-only

-S, --set-value <set_value>

the value to set for all terms for the given property.

--cutoff <cutoff>

The cutoff for the p-value

Default

0.05

-S, --sample-file <sample_file>

file containing input list of entity IDs (e.g. gene IDs)

-B, --background-file <background_file>

file containing background list of entity IDs (e.g. gene IDs)

--association-predicates <association_predicates>

A comma-separated list of predicates for the association relation

Arguments

TERMS

Optional argument(s)

eval-taxon-constraints

Test candidate taxon constraints

Multiple candidate constraints can be passed as arguments. These are in the form of triples separated by periods.

Example:

runoak -i db/go.db eval-taxon-constraints -p i,p GO:0005743 only NCBITaxon:2759 never NCBITaxon:2 . GO:0005634 only NCBITaxon:2
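
One way to read the argument list in the example above is as groups separated by a lone period, each consisting of a subject term followed by predicate/taxon pairs. The Python sketch below parses the tokens under that assumption; parse_constraints is a hypothetical helper, not the parser OAK uses.

```python
def parse_constraints(args):
    """Group command-line tokens into (subject, predicate, taxon) triples.

    A lone '.' starts a new subject; within a group the subject is followed
    by one or more 'only TAXON' / 'never TAXON' pairs.
    """
    triples = []
    group = []
    for token in list(args) + ["."]:  # trailing '.' flushes the last group
        if token != ".":
            group.append(token)
            continue
        if group:
            subject, rest = group[0], group[1:]
            if len(rest) % 2 != 0:
                raise ValueError(f"Dangling token in constraints for {subject}")
            for predicate, taxon in zip(rest[0::2], rest[1::2]):
                triples.append((subject, predicate, taxon))
        group = []
    return triples


tokens = "GO:0005743 only NCBITaxon:2759 never NCBITaxon:2 . GO:0005634 only NCBITaxon:2".split()
for triple in parse_constraints(tokens):
    print(triple)
```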

The --evolution-file (-E) option can be used to pass in a file of candidates. This should follow the format used in https://arxiv.org/abs/1802.06004

E.g.

GO:0000229,Gain|NCBITaxon:1(root);>Loss|NCBITaxon:2759(Eukaryota);

Example:

runoak -i db/go.db eval-taxon-constraints -p i,p -E tests/input/go-evo-gains-losses.csv

runoak eval-taxon-constraints [OPTIONS] [CONSTRAINTS]...

Options

-E, --evolution-file <evolution_file>

path to file containing gains and losses

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

Arguments

CONSTRAINTS

Optional argument(s)

expand-subsets

For each subset provide a mapping of each term in the ontology to a subset

Example:

runoak -i db/pato.db expand-subsets attribute_slim value_slim

runoak expand-subsets [OPTIONS] [SUBSETS]...

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

Arguments

SUBSETS

Optional argument(s)

extract-triples

Extracts a subontology as triples

Currently the only endpoint to implement this is ubergraph. Ontobee seems to have performance issues with the query

This will soon be supported in the SqlDatabase/Sqlite endpoint

Example:

runoak -v -i ubergraph: extract-triples GO:0005635 CL:0000099 -o test.ttl -O ttl

runoak extract-triples [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

fill-table

Fills missing values in a table of ontology elements

See https://incatools.github.io/ontology-access-kit/src/oaklib.utilities.table_filler

Given a TSV with a populated ID column, and unpopulated columns for definition, label, mappings, ancestors, this will iterate through each row filling in each missing value by performing ontology lookups.

In some cases, this can also perform reverse lookups; i.e. given a table with labels populated and blank IDs, it will fill in the IDs.

In the most basic scenario, you have a table with two columns ‘id’ and ‘label’. These are the “conventional” column headers for a table of ontology elements (see later for configuration when you don’t follow conventions)

Example:

runoak -i cl.owl.ttl fill-table my-table.tsv

(any implementation can be used)

The same command will work for the reverse scenario - when you have labels populated, but IDs are not populated

By default this will throw an error if a lookup is not successful; this can be relaxed

Relaxed:

runoak -i cl.owl.ttl fill-table --allow-missing my-table.tsv

In this case missing values that cannot be populated will remain empty

To explicitly populate a value:

runoak -i cl.owl.ttl fill-table --missing-value-token NO_DATA my-table.tsv
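
The three behaviors described above (strict, relaxed, and explicit token) can be summarized in a small Python sketch. The lookup dictionaries stand in for an ontology, and fill_row is a hypothetical helper written for this example; it is not the table_filler implementation.

```python
# Toy lookup table standing in for an ontology; the entries are made up.
LABELS = {"EX:1": "neuron", "EX:2": "astrocyte"}
IDS_BY_LABEL = {label: curie for curie, label in LABELS.items()}


def fill_row(row, allow_missing=False, missing_value_token=None):
    """Fill a blank 'label' from 'id', or a blank 'id' from 'label'."""
    filled = dict(row)
    if filled.get("id") and not filled.get("label"):
        filled["label"] = LABELS.get(filled["id"], "")
    elif filled.get("label") and not filled.get("id"):
        filled["id"] = IDS_BY_LABEL.get(filled["label"], "")
    for column in ("id", "label"):
        if not filled.get(column):
            if missing_value_token is not None:
                filled[column] = missing_value_token
            elif not allow_missing:
                # Strict mode: an unsuccessful lookup is an error.
                raise ValueError(f"No value found for {column!r} in {row}")
    return filled


print(fill_row({"id": "EX:1", "label": ""}))
print(fill_row({"id": "", "label": "astrocyte"}))
print(fill_row({"id": "EX:9", "label": ""}, allow_missing=True))
print(fill_row({"id": "EX:9", "label": ""}, missing_value_token="NO_DATA"))
```

Calling fill_row on the unknown ID without either keyword argument raises an error, which corresponds to the default behavior described above.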

Currently the following columns are recognized:

  • id – the unique identifier of the element

  • label – the rdfs:label of the element

  • definition – the definition of the element

  • mappings – mappings for the element

  • ancestors – ancestors for the element (this can be parameterized)

The metadata inference procedure will also work for when you have denormalized TSV files with columns such as “foo_id” and “foo_name”. This will be recognized as an implicit normalized label relation between id and name of a foo element.
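
A minimal Python sketch of this kind of column-name inference is shown below. The function infer_label_relations is hypothetical and handles only the "_id"/"_name" pattern mentioned above; the dictionary keys mirror those used with the relation option of this command.

```python
def infer_label_relations(columns):
    """Pair each '<prefix>_id' column with a '<prefix>_name' column, if present."""
    relations = []
    for column in columns:
        if not column.endswith("_id"):
            continue
        prefix = column[: -len("_id")]
        name_column = f"{prefix}_name"
        if name_column in columns:
            relations.append(
                {"primary_key": column, "dependent_column": name_column, "relation": "label"}
            )
    return relations


print(infer_label_relations(["foo_id", "foo_name", "bar_id", "comment"]))
```

In this sketch, bar_id is skipped because there is no matching bar_name column to fill.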

You can be more explicit in one of two ways:

  1. Pass in a YAML structure (on command line or in a YAML file) listing relations

  2. Pass in a LinkML data definitions YAML file

For the first method, you can pass in multiple relations using the --relation arg. For example, given a TSV with columns cl_identifier and cl_display_label you can say:

Example:

runoak -i cl.owl.ttl fill-table --relation "{primary_key: cl_identifier, dependent_column: cl_display_label, relation: label}"

You can also specify this in a YAML file

For the 2nd method, you need to specify a LinkML schema with a class where (1) at least one field is annotated as being an identifier (2) one or more slots have slot_uri elements mapping them to standard metadata elements such as rdfs:label.

For example, my-schema.yaml:

classes:
  Person:
    attributes:
      id:
        identifier: true
      name:
        slot_uri: rdfs:label

This is a powerful command with many ways of configuring it. We will add separate docs for this soon; for now, please file an issue on GitHub with any questions.

  • TODO: allow for an option that will perform fuzzy matches of labels

  • TODO: reverse lookup is not provided for all fields, such as definitions

  • TODO: add an option to detect inconsistencies

  • TODO: add logic for obsoletion/replaced by

  • TODO: use most optimized method for whichever backend

runoak fill-table [OPTIONS] TABLE_FILE

Options

--allow-missing, --no-allow-missing

Allow some dependent values to be blank, post-processing

Default

False

--missing-value-token <missing_value_token>

Populate all missing values with this token

--schema <schema>

Path to linkml schema

--delimiter <delimiter>

Delimiter between columns in input and output

Default

--comment <comment>

Comment indicator at the beginning of a row.

Default

#

--relation <relation>

Serialized YAML string corresponding to a normalized relation between two columns

--relation-file <relation_file>

Path to YAML file corresponding to a list of normalized relation between two columns

-o, --output <output>

Output file, e.g. obo file

Arguments

TABLE_FILE

Required argument

info

Show information on term or set of terms

Example:

runoak -i sqlite:obo:cl info CL:4023094

The default output is minimal, showing only ID and label

The --output-type (-O) option can be used to specify other formats for the output.

Currently only a few output types are supported. More will be provided in future.

In OBO format:

runoak -i cl.owl info CL:4023094 -O obo

As CSV:

runoak -i cl.obo info CL:4023094 -O csv

The info output format can be parameterized with --display (-D)

With xrefs and definitions:

runoak -i cl.owl info CL:4023094 -D x,d

With all information:

runoak -i cl.owl info CL:4023094 -D all

Like all OAK commands, input term lists can be multivalued, a mixture of IDs and labels, as well as queries that can be combined using boolean logic

Info on two STATO terms:

runoak -i ontobee:stato info STATO:0000286 STATO:0000287 -O obo

All terms in ENVO with the string “forest” in them:

runoak -i sqlite:obo:envo info l~forest

Info on all subtypes of “statistical hypothesis test” in STATO:

runoak -i sqlite:obo:stato info .desc//p=i 'statistical hypothesis test'

runoak info [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-D, --display <display>

A comma-separated list of display options. Use ‘all’ for all

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

labels

Show labels for term or list of terms

Example:

runoak -i cl.owl labels CL:4023093 CL:4023094

You can use the “.all” selector to show all labels:

Example:

runoak -i cl.owl labels .all

(this may be blocked for remote endpoints)

You can restrict the results to terms that have no label, or to terms that have a label:

Nodes with no labels:

runoak -i cl.owl labels .all --if-absent absent-only

runoak labels [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-D, --display <display>

A comma-separated list of display options. Use ‘all’ for all

-O, --output-type <output_type>

Desired output type

Options

obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl

--if-absent <if_absent>

determines behavior when the value is not present or is empty.

Options

absent-only | present-only

-S, --set-value <set_value>

the value to set for all terms for the given property.

Arguments

TERMS

Optional argument(s)

leafs

List all leaf nodes in the ontology

Like all OAK relational commands, this is parameterized by --predicates (-p). Note that the default is to return the leaves of the relation graph over all predicates.

Example:

runoak -i db/cob.db leafs

This command is a wrapper onto the “leafs” command in the BasicOntologyInterface.

runoak leafs [OPTIONS]

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

--filter-obsoletes, --no-filter-obsoletes

If set, results will exclude obsoletes

Default

True

lexmatch

Performs lexical matching between pairs of terms in one or more ontologies.

Examples:

runoak -i foo.obo lexmatch -o foo.sssom.tsv

In this example, the input ontology file is assumed to contain all pairs of terms to be mapped.

It is more common to map between all pairs of terms in two ontology files. In this case, you can merge the ontologies using a tool like ROBOT; or, to avoid a merge preprocessing step, use the --add (-a) option to specify a second ontology file.

runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv

By default, this command will compare all terms in all ontologies. You can use the OAK term query syntax to pass in the set of all terms to be compared.

For example, to compare all terms in the union of the FOO and BAR namespaces:

runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: i^BAR:

All members of the set are compared (including FOO to FOO matches and BAR to BAR matches), omitting trivial reciprocal matches.

Use an “@” separator between two queries to feed in two explicit sets:

runoak -i foo.obo --add bar.obo lexmatch -o foo.sssom.tsv i^FOO: @ i^BAR:
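The difference between the single-set and the "@"-separated form can be sketched in Python. This is only an illustration, assuming the term queries have already been expanded to CURIEs; the function name pairs_to_compare is hypothetical and not part of OAK.

```python
from itertools import combinations, product
from typing import List, Tuple


def pairs_to_compare(args: List[str]) -> List[Tuple[str, str]]:
    """Return the candidate pairs implied by an argument list.

    With an "@" separator, every term on the left is paired with every
    term on the right. Without it, every unordered pair within the single
    set is produced once, so (a, b) is kept and (b, a) is omitted.
    """
    if "@" in args:
        i = args.index("@")
        left, right = args[:i], args[i + 1:]
        return list(product(left, right))
    return list(combinations(args, 2))
```

With three terms and no separator this yields three pairs, including the FOO to FOO pair; with the separator only the cross-set pairs remain.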

ALGORITHM: lexmatch implements a simple algorithm:

  • create a lexical index, keyed by normalized strings of labels, synonyms

  • report all pairs of entities that have the same key
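The two steps above can be sketched in a few lines of Python. This is a simplified illustration rather than the lexmatch implementation: the normalization used here (lowercasing and whitespace collapsing) is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def build_index(terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized label or synonym to the entities that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for curie, strings in terms.items():
        for s in strings:
            index[normalize(s)].add(curie)
    return index


def matches(index: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Report each unordered pair of distinct entities that share a key."""
    results = []
    for key, curies in sorted(index.items()):
        for a, b in combinations(sorted(curies), 2):
            results.append((a, b, key))
    return results
```

For example, an index built from {"FOO:1": ["Hand", "manus"], "BAR:9": ["hand"]} pairs FOO:1 with BAR:9 under the key "hand".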

The lexical index can be exported (in native YAML) using -L:

runoak -i foo.obo lexmatch -L foo.index.yaml -o foo.sssom.tsv

Note: if you run the above command a second time with --no-recreate, it will be faster as the index will be reused rather than rebuilt.

RULES: Using custom rules:

runoak -i foo.obo lexmatch -R match_rules.yaml -L foo.index.yaml -o foo.sssom.tsv

Full documentation:

module-oaklib.utilities.lexical.lexical_indexer

runoak lexmatch [OPTIONS] [TERMS]...

Options

-R, --rules-file <rules_file>

path to rules file. Conforms to rules_datamodel. e.g. https://github.com/INCATools/ontology-access-kit/blob/main/tests/input/matcher_rules.yaml

--add-labels, --no-add-labels

Populate empty labels with URI fragments or CURIE local IDs, for ontologies that use semantic IDs

Default

False

-L, --lexical-index-file <lexical_index_file>

path to lexical index. This is recreated each time unless --no-recreate is passed

--recreate, --no-recreate

if true and lexical index is specified, always recreate, otherwise load from index

Default

True

--ensure-strict-prefixes, --no-ensure-strict-prefixes

Clean prefix map and mappings before generating an output.

Default

True

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

lint

Lints an ontology, applying changes in place.

The current implementation is highly incomplete, and only handles linting of syntactic patterns (chains of whitespace, trailing whitespace) in labels and definitions.
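As a rough illustration of the kind of syntactic check described above, the following Python sketch collapses chains of whitespace and strips trailing whitespace in labels, then lists the labels that would change. The function names and the tuple-based change record are hypothetical; the real command reports changes in KGCL.

```python
import re
from typing import Dict, List, Tuple


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace to one space and strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def propose_label_changes(labels: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (curie, old_label, new_label) for every label that would change."""
    changes = []
    for curie, label in labels.items():
        fixed = normalize_whitespace(label)
        if fixed != label:
            changes.append((curie, label, fixed))
    return changes
```

Keeping the proposal step separate from any write step mirrors the dry-run workflow: the list of proposed changes can be inspected before anything is modified.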

The output is a list of changes, in a KGCL-compliant syntax.

By default, changes will be applied.

Example:

runoak -i my.obo lint

This can be executed in dry-run mode, in which case changes are not applied:

runoak -i my.obo lint --dry-run

One common workflow is to emit the changes to a KGCL file which is manually checked, then applied as a separate step.

Example workflow:

runoak -i my.obo lint --dry-run -o changes.kgcl
# examine and edit changes.kgcl
runoak -i my.obo apply --changes-input changes.kgcl

runoak lint [OPTIONS]

Options

-o, --output <output>

--report-format <report_format>

Output format for reporting proposed/applied changes

--dry-run, --no-dry-run

If true, nothing will be modified by executing command

-O, --output-type <output_type>

Desired output type

logical-definitions

Show all logical definitions for a term or terms.

To show all logical definitions in an ontology, pass the “.all” query term

Example; first create an alias:

alias pato="runoak -i sqlite:obo:pato"

Then run the query:

pato logical-definitions .all

By default, “.all” will query all axioms for all terms including merged terms; to restrict to only the current terms, use an ID query:

pato logical-definitions i^PATO

You can also restrict to branches:

pato logical-definitions .desc//p=i "physical object quality"

By default, the output is a subset of OboGraph datamodel rendered as YAML, e.g.

definedClassId: PATO:0045071
genusIds:
- PATO:0001439
restrictions:
- fillerId: PATO:0000461
  propertyId: RO:0015010

You can also specify CSV to generate a flattened form of this.

Example:

pato logical-definitions .all --output-type csv

You can optionally choose to “unmelt” or flatten this, such that:

  • Each property/predicate is a column

  • For repeated properties, columns of the form prop_1, prop_2, … are generated

Example:

pato logical-definitions .all --unmelt --output-type csv
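The unmelt step can be pictured with a short Python sketch that turns one logical definition, in the shape shown above, into a single wide row. This is an illustration only: the column names defined_class and genus are hypothetical, and the actual column naming used by OAK may differ.

```python
from collections import defaultdict
from typing import Dict, List


def unmelt(defn: Dict) -> Dict[str, str]:
    """Flatten one logical definition into a single wide row.

    A property that occurs once becomes a column named after the property;
    a repeated property becomes prop_1, prop_2, and so on.
    """
    row = {"defined_class": defn["definedClassId"]}
    fillers: Dict[str, List[str]] = defaultdict(list)
    for genus in defn.get("genusIds", []):
        fillers["genus"].append(genus)
    for restriction in defn.get("restrictions", []):
        fillers[restriction["propertyId"]].append(restriction["fillerId"])
    for prop, values in fillers.items():
        if len(values) == 1:
            row[prop] = values[0]
        else:
            for n, value in enumerate(values, start=1):
                row[f"{prop}_{n}"] = value
    return row
```

Applied to the PATO:0045071 definition above, this gives one row with a genus column and an RO:0015010 column.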

Limitations:

Currently this only works for definitions that follow a basic genus-differentia pattern, which is what is currently represented in the OboGraph datamodel.

Consider using the “axioms” command for inspection of complex nested OWL axioms.

runoak logical-definitions [OPTIONS] [TERMS]...

Options

--unmelt, --no-unmelt

Flatten to a wide table

Default

False

-p, --predicates <predicates>

A comma-separated list of predicates

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

--if-absent <if_absent>

determines behavior when the value is not present or is empty.

Options

absent-only | present-only

-S, --set-value <set_value>

the value to set for all terms for the given property.

Arguments

TERMS

Optional argument(s)

mappings

List all mappings encoded in the ontology

Example:

runoak -i sqlite:obo:envo mappings

The default output is SSSOM YAML. To use the (canonical) SSSOM TSV format:

runoak -i sqlite:obo:envo mappings -O sssom

By default, labels are not included. Use --autolabel to include labels (but note that if the label is not in the source ontology, then no label will be retrieved):

runoak -i sqlite:obo:envo mappings -O sssom --autolabel

To constrain the mapped object source:

runoak -i sqlite:obo:foodon mappings -O sssom --maps-to-source SUBSET_SIREN

runoak mappings [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

--maps-to-source <maps_to_source>

Return only mappings with subject or object source equal to this

Arguments

TERMS

Optional argument(s)

migrate-curies

Rewires an ontology replacing all instances of an ID or IDs

Note: the specified ontology is modified in place

The input for this command is a list of equals-separated pairs, each specifying the source and the target.

Example:

runoak -i db/uberon.db migrate-curies --replace SRC1=TGT1 SRC2=TGT2
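The equals-separated pairs amount to a source-to-target mapping. A minimal Python sketch of that parsing step follows; parse_curie_pairs is a hypothetical helper used for illustration, not an OAK function.

```python
from typing import Dict, List


def parse_curie_pairs(pairs: List[str]) -> Dict[str, str]:
    """Turn ["SRC1=TGT1", "SRC2=TGT2"] into {"SRC1": "TGT1", "SRC2": "TGT2"}."""
    mapping: Dict[str, str] = {}
    for pair in pairs:
        source, sep, target = pair.partition("=")
        if not sep or not source or not target:
            raise ValueError(f"Expected SRC=TGT, got: {pair!r}")
        mapping[source] = target
    return mapping
```

Rejecting malformed pairs up front is a reasonable precaution for a command that modifies the ontology in place.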

This command is a wrapper onto the “migrate_curies” command in the PatcherInterface

oaklib.interfaces.patcher_interface.PatcherInterface.migrate_curies

runoak migrate-curies [OPTIONS] [CURIE_PAIRS]...

Options

--replace, --no-replace

If true, will update in place

Default

False

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

Arguments

CURIE_PAIRS

Optional argument(s)

obsoletes

Shows all obsolete entities.

Example:

runoak -i obolibrary:go.obo obsoletes

To exclude merged terms, use the --no-include-merged flag

Example:

runoak -i obolibrary:go.obo obsoletes --no-include-merged

To show migration relationships, use the --show-migration-relationships flag

Example:

runoak -i obolibrary:go.obo obsoletes --show-migration-relationships

You can also specify terms to show obsoletes for:

Example:

runoak -i obolibrary:go.obo obsoletes --show-migration-relationships GO:0000187 GO:0000188

runoak obsoletes [OPTIONS] [TERMS]...

Options

--include-merged, --no-include-merged

Include merged terms in output

Default

True

--show-migration-relationships, --no-show-migration-relationships

Show migration relationships (e.g. replaced_by, consider)

Default

False

-O, --output-type <output_type>

Desired output type

Options

obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

ontologies

Shows all ontologies

If the input is a pre-merged ontology, then the output of this command is trivially a single line, with the name of the input ontology

This command is more meaningful when the input is a multi-ontology endpoint, e.g.

runoak -i ubergraph: ontologies

In future this command will be expanded to allow showing more metadata about each ontology

runoak ontologies [OPTIONS]

Options

-o, --output <output>

Output file, e.g. obo file

ontology-metadata

Shows ontology metadata

Example:

runoak -i bioportal: ontology-metadata obi uberon foodon

Use the --all option to show all ontologies

Example:

runoak -i bioportal: ontology-metadata --all

By default the output is YAML. You can get the results as TSV:

Example:

runoak -i bioportal: ontology-metadata --all -O csv

Warning

The output data model is not yet standardized; this may change in future.

runoak ontology-metadata [OPTIONS] [ONTOLOGIES]...

Options

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

--all, --no-all

If true, show all ontologies. Use in place of passing an explicit list

Default

False

Arguments

ONTOLOGIES

Optional argument(s)

ontology-versions

Shows ontology versions

Currently only implemented for BioPortal

Example:

runoak -i bioportal: ontology-versions mp

All ontologies:

runoak -i bioportal: ontology-versions --all

runoak ontology-versions [OPTIONS] [ONTOLOGIES]...

Options

-o, --output <output>

Output file, e.g. obo file

--all, --no-all

If true, show all ontologies. Use in place of passing an explicit list

Default

False

Arguments

ONTOLOGIES

Optional argument(s)

paths

List all paths between one or more start CURIEs and either their ancestors or explicit targets.

Example:

runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane'

This shows all shortest paths from nuclear membrane to all ancestors

Example:

runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target cytoplasm

This shows shortest paths between two nodes

Example:

runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' 'thylakoid' --target cytoplasm 'thylakoid membrane'

This shows all shortest paths between 4 combinations of starts and ends

You can also use “@” to separate start node list and end node list. Like most OAK commands, you can pass either explicit terms, or term queries. For example, if you have two files of IDs, then you can do this:

runoak -i sqlite:obo:go paths -p i,p .idfile START_NODES.txt @ .idfile END_NODES.txt

You can also pass in weights for each predicate, used when calculating shortest paths.

Example:

runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target cytoplasm --predicate-weights "{i: 0.0001, p: 999}"

This shows all shortest paths after weighting relations

(Note: you can use the same shorthands as in the --predicates option)
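To see why weights change which path counts as shortest, here is a small Python sketch of Dijkstra's algorithm over (subject, predicate, object) edges. It is an illustration under stated assumptions, not the OAK implementation: edges are followed from subject to object only, predicates without an explicit weight default to 1.0, and the function name is hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def shortest_path(
    edges: List[Edge], start: str, target: str, weights: Dict[str, float]
) -> Optional[List[str]]:
    """Dijkstra over subject-to-object edges; unlisted predicates weigh 1.0."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for s, p, o in edges:
        graph.setdefault(s, []).append((o, weights.get(p, 1.0)))
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None
```

With edges A-i->B, B-i->C and A-p->C, the unweighted shortest path from A to C is the direct p edge; with weights {i: 0.0001, p: 999} the two-step is-a route becomes cheaper.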

This command can be combined with others to visualize the paths.

Example:

alias go="runoak -i sqlite:obo:go"
go paths -p i,p 'nuclear membrane' --target cytoplasm --flat | go viz --fill-gaps -

This visualizes the path by first exporting the path as a flat list, then passing the results to viz, using the fill-gaps option

runoak paths [OPTIONS] [TERMS]...

Options

--target <target>

end point of path

--flat, --no-flat

If true then the output path is written as a list of terms

Default

False

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-p, --predicates <predicates>

A comma-separated list of predicates

-O, --output-type <output_type>

Desired output type

--predicate-weights <predicate_weights>

key-value pairs specified in YAML where keys are predicates or shorthands and values are weights

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

prefixes

Shows prefix declarations.

All standard prefixes:

runoak prefixes

Specific prefixes:

runoak prefixes GO CL oio skos

By default, prefix maps are exported as simple pairwise TSVs.

Prefixes can also be exported in different formats, such as YAML and JSON, where they are simple dictionaries:

In yaml:

runoak prefixes -O yaml

In turtle:

runoak prefixes -O rdf

For RDF exports, the prefix declaration should appear in BOTH prefix declarations, AND also as instances of SHACL PrefixDeclarations, e.g.

@prefix CL: <http://purl.obolibrary.org/obo/CL_> .
…
[] a sh:PrefixDeclaration ;
    sh:namespace CL: ;
    sh:prefix "CL" .
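The shape of the RDF export can be mimicked with a few lines of Python that render a prefix map in the same two forms. This is only a sketch of the layout shown above, not the code OAK uses; the function name is hypothetical, and the sh: prefix is bound to the standard SHACL namespace.

```python
from typing import Dict


def prefixes_to_turtle(prefix_map: Dict[str, str]) -> str:
    """Render each prefix as an @prefix line and as a sh:PrefixDeclaration."""
    lines = ["@prefix sh: <http://www.w3.org/ns/shacl#> ."]
    for prefix, namespace in prefix_map.items():
        lines.append(f"@prefix {prefix}: <{namespace}> .")
    for prefix in prefix_map:
        lines.append("[] a sh:PrefixDeclaration ;")
        lines.append(f"    sh:namespace {prefix}: ;")
        lines.append(f'    sh:prefix "{prefix}" .')
    return "\n".join(lines)
```

Calling prefixes_to_turtle({"CL": "http://purl.obolibrary.org/obo/CL_"}) produces the CL declarations in both forms.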

The default prefixmap is always used, unless options are passed specifying additional prefix maps.

Example:

runoak --named-prefix-map prefixcc prefixes

If an ontology is loaded, then --used-only can be used to restrict to prefixes for entities in that ontology:

runoak -i sqlite:obo:cl prefixes --used-only

runoak prefixes [OPTIONS] [TERMS]...

Options

-o, --output <output>

--used-only, --no-used-only

If True, show only prefixes used in ontology

Default

False

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

relationships

Show all relationships for a term or terms

By default, this shows all relationships where the input term(s) are the subjects

Example:

runoak -i cl.db relationships CL:4023094

Like all OAK commands, a label can be passed instead of a CURIE

Example:

runoak -i cl.db relationships neuron

To reverse the direction, and query where the search term(s) are objects, use the --direction option:

Example:

runoak -i cl.db relationships --direction down neuron
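The effect of the direction setting can be illustrated on a plain edge list in Python. The sketch below is not OAK code; the function name and the EX: identifiers are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def relationships(edges: List[Edge], term: str, direction: str = "up") -> List[Edge]:
    """Select edges by the role the term plays in them.

    up:   the term is the subject (edges pointing towards its parents)
    down: the term is the object (edges pointing at it from its children)
    both: either of the above
    """
    if direction == "up":
        return [e for e in edges if e[0] == term]
    if direction == "down":
        return [e for e in edges if e[2] == term]
    if direction == "both":
        return [e for e in edges if term in (e[0], e[2])]
    raise ValueError(f"Unknown direction: {direction}")
```

For a term in the middle of a hierarchy, up returns its outgoing edges, down returns the edges of terms that point to it, and both returns the union.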

Multiple terms can be passed

Example:

runoak -i uberon.db relationships heart liver lung

And like all OAK commands, a query can be passed rather than an explicit term list

The following query lists all arteries in the limb together with the structures they supply

Query:

runoak -i uberon.db relationships -p RO:0002178 .desc//p=i "artery" .and .desc//p=i,p "limb"

runoak relationships [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

--direction <direction>

direction of traversal over edges, where up is subject to object and down is object to subject.

Options

up | down | both

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

--if-absent <if_absent>

determines behavior when the value is not present or is empty.

Options

absent-only | present-only

-S, --set-value <set_value>

the value to set for all terms for the given property.

--include-entailed, --no-include-entailed

Include entailed indirect relationships

Default

False

--include-tbox, --no-include-tbox

Include class-class relationships (subclass and existentials)

Default

True

--include-abox, --no-include-abox

Include instance relationships (class and object property assertions)

Default

True

Arguments

TERMS

Optional argument(s)

roots

List all root nodes in the ontology

Like all OAK relational commands, this is parameterized by --predicates (-p). Note that the default is to return the roots of the relation graph over all predicates. This can sometimes give unintuitive results, so we recommend always being explicit and passing a predicate list.

Example:

runoak -i db/cob.db roots

This command is a wrapper onto the “roots” command in the BasicOntologyInterface.
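The reason the choice of predicates matters can be seen in a small Python sketch that computes roots, leaf nodes and singletons from an edge list. It assumes that a root is a node with no outgoing (parent) edge, a leaf is a node with no incoming edge, and a singleton has neither; the function names are hypothetical and this is not the BasicOntologyInterface code.

```python
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def _kept(edges: Iterable[Edge], predicates: Optional[Set[str]]) -> List[Tuple[str, str]]:
    """Keep (subject, object) for edges whose predicate is selected."""
    return [(s, o) for s, p, o in edges if predicates is None or p in predicates]


def roots(nodes, edges, predicates=None) -> List[str]:
    subjects = {s for s, _ in _kept(edges, predicates)}
    return sorted(n for n in nodes if n not in subjects)


def leafs(nodes, edges, predicates=None) -> List[str]:
    objects = {o for _, o in _kept(edges, predicates)}
    return sorted(n for n in nodes if n not in objects)


def singletons(nodes, edges, predicates=None) -> List[str]:
    linked = {n for pair in _kept(edges, predicates) for n in pair}
    return sorted(n for n in nodes if n not in linked)
```

With edges B-i->A and A-p->C, the only root over all predicates is C, whereas restricting to is-a gives A and C as roots.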

runoak roots [OPTIONS]

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

-P, --has-prefix <has_prefix>

filter based on a prefix, e.g. OBI

-O, --output-type <output_type>

Desired output type

-A, --annotated-roots, --no-annotated-roots, --no-A

If true, use annotated roots, if present

Default

False

set-apikey

Sets an API key

Example:

runoak set-apikey -e bioportal MY-KEY-VALUE

This is stored in an OS-dependent path

runoak set-apikey [OPTIONS] KEYVAL

Options

-e, --endpoint <endpoint>

Required. Name of endpoint, e.g. bioportal

Arguments

KEYVAL

Required argument

siblings

List all siblings of a specified term or terms

Example:

runoak -i cl.owl siblings CL:4023094

Note that siblings is by default over ALL relationship types, so we recommend always being explicit and passing a predicate using -p (--predicates)
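A short Python sketch shows how the sibling set grows when all relationship types are used. It assumes siblings are terms that share at least one parent over the selected predicates; the function name and the example terms are illustrative only.

```python
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def siblings(edges: Iterable[Edge], term: str, predicates: Optional[Set[str]] = None) -> List[str]:
    """Terms sharing at least one parent with `term` over the chosen predicates."""
    kept = [(s, o) for s, p, o in edges if predicates is None or p in predicates]
    parents = {o for s, o in kept if s == term}
    return sorted({s for s, o in kept if o in parents and s != term})
```

If hand is_a autopod and part_of forelimb, then without a predicate filter anything that is part of the forelimb counts as a sibling of hand, alongside its is-a siblings.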

runoak siblings [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

Options

obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl

Arguments

TERMS

Optional argument(s)

similarity

All by all similarity

This calculates a similarity matrix for two sets of terms.

Input sets of terms can be specified in different ways:

  • via a file

  • via explicit lists of terms or queries

Example:

runoak -i hp.db similarity -p i --set1-file HPO-TERMS1 --set2-file HPO-TERMS2 -O csv

This will compare every term in HPO-TERMS1 with every term in HPO-TERMS2

Alternatively standard OAK term queries can be used, with “@” separating the two lists

Example:

runoak -i hp.db similarity -p i TERM_1 TERM_2 … TERM_N @ TERM_N+1 … TERM_M

The .all term syntax can be used to select all terms in an ontology

Example:

runoak -i ma.db similarity -p i,p .all @ .all

This can be mixed with other term selectors; for example to calculate the similarity of “neuron” vs all terms in CL:

runoak -i cl.db similarity -p i,p .all @ neuron

An example pipeline to do all by all over all phenotypes in HPO:

Explicit:

runoak -i hp.db descendants -p i HP:0000118 > HPO
runoak -i hp.db similarity -p i --set1-file HPO --set2-file HPO -O csv -o RESULTS.tsv

The same thing can be done more compactly with term queries:

runoak -i hp.db similarity -p i .desc//p=i HP:0000118 @ .desc//p=i HP:0000118
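The all-by-all comparison can be sketched in Python using a Jaccard score over ancestor sets. This is an illustration, not the OAK implementation: it assumes the score is the Jaccard index of the reflexive ancestor sets of the two terms, computed over a parent map that has already been filtered to the chosen predicates. All names and identifiers below are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive ancestors of a term over a pre-filtered parent map."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def all_by_all(
    set1: List[str], set2: List[str], parents: Dict[str, Set[str]]
) -> List[Tuple[str, str, float]]:
    """Score every term in set1 against every term in set2."""
    return [
        (s, t, jaccard(ancestors(s, parents), ancestors(t, parents)))
        for s in set1
        for t in set2
    ]
```

The number of rows is the product of the two set sizes, which is why thresholds such as --jaccard-minimum are useful for large inputs.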

runoak similarity [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

--set1-file <set1_file>

ID file for set1

--set2-file <set2_file>

ID file for set2

--jaccard-minimum <jaccard_minimum>

Minimum value for jaccard score

--ic-minimum <ic_minimum>

Minimum value for information content

-o, --output <output>

path to output

--main-score-field <main_score_field>

Score used for summarization

Default

phenodigm_score

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

similarity-pair

Determine pairwise similarity between two terms using a variety of metrics

NOTE: this command may be deprecated, consider using similarity

Note: We recommend always specifying explicit predicate lists

Example:

runoak -i ubergraph: similarity-pair -p i,p CL:0000540 CL:0000000

You can omit predicates if you like but be warned this may yield hard to interpret results.

E.g.

runoak -i ubergraph: similarity-pair CL:0000540 GO:0001750

yields “fully formed stage” (i.e. these are both found in the adult) as the MRCA

For phenotype ontologies, UPHENO relationship types connect phenotype terms to anatomy, etc:

runoak -i ubergraph: similarity-pair MP:0010922 HP:0010616 -p i,p,UPHENO:0000001

Background: https://incatools.github.io/ontology-access-kit/interfaces/semantic-similarity.html

runoak similarity-pair [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

Arguments

TERMS

Optional argument(s)

singletons

List all singleton nodes in the ontology

Like all OAK relational commands, this is parameterized by --predicates (-p). Note that the default is to return the singletons of the relation graph over all predicates.

Obsoletes are filtered by default

Example:

runoak -i db/cob.db singletons

This command is a wrapper onto the “singletons” command in the BasicOntologyInterface.

runoak singletons [OPTIONS]

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

--filter-obsoletes, --no-filter-obsoletes

If set, results will exclude obsoletes

Default

True

statistics

Shows all descriptive/summary statistics

Example:

runoak -i sqlite:obo:pr statistics

By default, this will show combined summary statistics for all terms

You can also break down the statistics in three ways:

  • by a collection of branch roots

  • by a metadata property (e.g. oio:hasOBONamespace, rdfs:isDefinedBy)

  • by prefix (e.g. GO, PR, CL, OBI)

Example:

runoak -i sqlite:obo:pr statistics --group-by-property oio:hasOBONamespace

Note: the oio:hasOBONamespace is not the same as the ID prefix, it is a field that is used by a subset of ontologies to partition classes into broad groupings, similar to subsets. Its use is non-standard, yet a lot of ontologies use this as the main partitioning mechanism.

A note on bundled ontologies:

The standard release of many OBO ontologies “bundles” parts of other ontologies (formally, the release product includes a merged imports closure of import modules). This can complicate generation of statistics. A naive count of all classes in the main OBI release will include not only “native” OBI classes, but also classes from other ontologies that are bundled in the release.

For bundled ontologies, we recommend some kind of partitioning, such as via defined roots, or via the CURIE prefix, using the --group-by-prefix option.

Output formats:

The recommended output types for this command are yaml, json, or csv. The default output type is yaml, following the SummaryStatistics data model. This is naturally nested, as the statistics include faceted groupings (e.g. edge counts are broken down by predicate). When specifying a flat format like csv, this is flattened into a single table, with dynamic column names.
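The flattening of a nested structure into dynamically named columns can be sketched in Python as follows. The key names in the example and the underscore-joined column naming are illustrative assumptions; the actual column names produced by OAK may differ.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into one level, joining keys with "_"."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # A faceted grouping becomes one column per facet value
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat
```

Because the columns depend on which facet values occur, two ontologies can yield tables with different headers.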

Change statistics:

You can optionally combine the ontology statistics with a change summary relative to another ontology, using the --compare-with option.

Example:

runoak -i v2.obo statistics --group-by-obo-namespace --compare-with v1.obo

This will also include change stats broken down by KGCL change types. If a group-by option is specified, these will be grouped accordingly.

runoak statistics [OPTIONS] [BRANCHES]...

Options

-O, --output-type <output_type>

Desired output type

Options

obo | obojson | ofn | rdf | json | yaml | fhirjson | csv | nl

--group-by-property <group_by_property>

group summaries by a metadata property, e.g. rdfs:isDefinedBy

--group-by-obo-namespace, --no-group-by-obo-namespace

shortcut for --group-by-property oio:hasOBONamespace (note this is distinct from the ID namespace)

Default

False

--group-by-prefix, --no-group-by-prefix

shortcut for --group-by-property sh:prefix. Groups by the prefix of the CURIE

Default

False

--group-by-defined-by, --no-group-by-defined-by

shortcut for --group-by-property rdfs:isDefinedBy. This may be inferred from prefix if not set explicitly

Default

False

--include-residuals, --no-include-residuals

If true include an OTHER category for terms that do not have the property

-X, --compare-with <compare_with>

Compare with another ontology

-P, --has-prefix <has_prefix>

filter based on a prefix, e.g. OBI

-o, --output <output>

Output file, e.g. obo file

Arguments

BRANCHES

Optional argument(s)

subsets

Shows information on subsets

Example:

runoak -i obolibrary:go.obo subsets

Example:

runoak -i cl.owl subsets

For background on subsets, see https://incatools.github.io/ontology-access-kit/concepts.html#subsets

Note you can use subsets in selector queries for other commands; e.g. to fetch all terms (directly) in goslim_generic in GO:

Example:

runoak -i sqlite:obo:go info .in goslim_generic

See also:

term-subsets command, which shows relationships of terms to subsets

runoak subsets [OPTIONS]

Options

-o, --output <output>

Output file, e.g. obo file

synonymize

Applies synonymizer rules from the rules file to generate KGCL syntax; see https://github.com/INCATools/kgcl.

Example:

runoak -i foo.obo synonymize -R foo_rules.yaml --patch patch.kgcl --apply-patch

runoak synonymize [OPTIONS] [TERMS]...

Options

-R, --rules-file <rules_file>

path to rules file. Conforms to rules_datamodel. e.g. https://github.com/INCATools/ontology-access-kit/blob/main/tests/input/matcher_rules.yaml

--apply-patch, --no-apply-patch

Apply KGCL syntax generated based on the synonymizer rules file.

Default

False

--patch <patch>

Output patch file containing KGCL commands.

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

taxon-constraints

Compute all taxon constraints for a term or terms.

This will apply rules using the inferred ancestors of subject terms, as well as inferred ancestors/descendants of taxon terms.

The input ontology MUST include both the taxon constraint relationships AND the relevant portion of NCBI Taxonomy

Example:

runoak -i db/go.db taxon-constraints GO:0034357 --include-redundant -p i,p

Example:

runoak -i sqlite:obo:uberon taxon-constraints UBERON:0003884 UBERON:0003941 -p i,p

This command is a wrapper onto taxon_constraints_utils.

runoak taxon-constraints [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-p, --predicates <predicates>

A comma-separated list of predicates

-A, --all, --no-A, --no-all

if specified then perform for all terms

Default

False

--include-redundant, --no-include-redundant

if specified then include redundant taxon constraints from ancestral subjects

Default

False

Arguments

TERMS

Optional argument(s)

term-categories

List categories for a term or set of terms

TODO

runoak term-categories [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

--category-system <category_system>

Example: biolink, cob, bfo, dbpedia, …

Arguments

TERMS

Optional argument(s)

term-metadata

Shows term metadata.

Example:

runoak -i sqlite:obo:uberon term-metadata lung heart

You can filter the results for only selected predicates:

runoak -i sqlite:obo:uberon term-metadata lung heart -p id,oio:hasDbXref

The default output is YAML documents, where each YAML document is a term, with keys representing selected predicates. Values are always lists of atoms, even when there is typically one value (e.g. rdfs:label)

runoak term-metadata [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

-p, --predicates <predicates>

A comma-separated list of predicates

--additional-metadata, --no-additional-metadata

if true then fetch additional metadata about statements stored as OWL reification

Default

False

Arguments

TERMS

Optional argument(s)

term-subsets

List subsets for a term or set of terms

runoak term-subsets [OPTIONS] [TERMS]...

Options

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

Arguments

TERMS

Optional argument(s)

terms

List all terms in the ontology

Example:

runoak -i db/cob.db terms

All terms without obsoletes:

runoak -i prontolib:cl.obo terms --filter-obsoletes

By default “terms” is considered to be any entity type in the ontology. Use --owl-type to constrain this:

Classes:

runoak -i sqlite:obo:ro terms --owl-type owl:Class

Relationship types (Object properties):

runoak -i sqlite:obo:ro terms --owl-type owl:ObjectProperty

Annotation properties:

runoak -i sqlite:obo:omo terms --owl-type owl:AnnotationProperty

runoak terms [OPTIONS]

Options

--filter-obsoletes, --no-filter-obsoletes

If set, results will exclude obsoletes

Default

True

-o, --output <output>

Output file, e.g. obo file

--owl-type <owl_type>

only include entities of this type, e.g. owl:Class, rdf:Property

termset-similarity

Termset similarity

This calculates a similarity matrix for two sets of terms.

Example:

runoak -i go.db termset-similarity -p i,p nucleus membrane @ "nuclear membrane" vacuole

runoak termset-similarity [OPTIONS] [TERMS]...

Options

-p, --predicates <predicates>

A comma-separated list of predicates

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

--autolabel, --no-autolabel

If set, results will automatically have labels assigned

Default

True

Arguments

TERMS

Optional argument(s)

tree

Display an ancestor graph as an ascii/markdown tree

For general instructions, see the viz command, to which this is analogous

Example:

runoak -i envo.db tree ENVO:00000372 -p i,p

This produces output like:

* [i] ENVO:00000094 ! volcanic feature
    * [i] ENVO:00000247 ! volcano
        * [i] ENVO:00000403 ! shield volcano
            * [i] **ENVO:00000372 ! pyroclastic shield volcano**

Note: for many ontologies the tree view will explode, especially if no predicates are specified. You may wish to start with the is-a tree (-p i).

You can use the --gap-fill option to create a minimal tree:

Example:

runoak -i envo.db tree --gap-fill 'pyroclastic shield volcano' 'subglacial volcano' volcano -p i

This will show the tree containing only these terms, and the most direct inferred relationships between them.

You can also give a list of leaf terms and specify --add-mrcas alongside --gap-fill to fill in the most informative intermediate classes:

Example:

runoak -i envo.db tree --add-mrcas --gap-fill 'pyroclastic shield volcano' 'subglacial volcano' 'mud volcano' -p i

This will fill in the term “volcano”, as it is the most recent common ancestor of the specified terms
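The notion of a most recent common ancestor can be sketched in Python: take the ancestors shared by two terms and keep those that are not themselves ancestors of another shared ancestor. This is an illustration over a hand-written is-a parent map, not the OAK implementation; the function names are hypothetical and the parent of 'mud volcano' is assumed for the example.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive ancestors of a term over the given parent map."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mrcas(a: str, b: str, parents: Dict[str, Set[str]]) -> List[str]:
    """Common ancestors that are not proper ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return sorted(
        c
        for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    )
```

On the volcano hierarchy shown earlier, 'volcanic feature' is also a common ancestor of the seed terms, but it is dropped because 'volcano' lies below it.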

The --max-hops option can control the distance:

runoak -i envo.db tree 'pyroclastic shield volcano' 'subglacial volcano' --max-hops 1 -p i

This will generate:

* [] ENVO:00000247 ! volcano
    * [i] ENVO:00000403 ! shield volcano
        * [i] **ENVO:00000372 ! pyroclastic shield volcano**
    * [i] **ENVO:00000407 ! subglacial volcano**

Note that 'volcano' is the root: even though it is 2 hops from one of the terms, it can be connected to at least one of the seeds (highlighted with asterisks) by a path of length 1.

runoak tree [OPTIONS] [TERMS]...

Options

--down, --no-down

traverse down

Default

False

--gap-fill, --no-gap-fill

If set then find the minimal graph that spans all input curies

Default

False

--add-mrcas, --no-add-mrcas

If set then extend input seed list to include all pairwise MRCAs

Default

False

-S, --stylemap <stylemap>

a json file to configure visualization. See https://berkeleybop.github.io/kgviz-model/

-C, --configure <configure>

overrides for stylemap, specified as yaml. E.g. `-C "styles: [filled, rounded]"`

--max-hops <max_hops>

Trim nodes that are equal to or greater than this distance from terms

--skip <skip>

Exclude paths that contain this node

--root <root>

Use this node or nodes as roots

-p, --predicates <predicates>

A comma-separated list of predicates

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

validate

Validate an ontology against ontology metadata

Implementation notes: Currently only works on SQLite

Example:

runoak -i db/ecto.db validate -o results.tsv

The default validation performed is structural (conformance to the ontology_metadata schema)

There is experimental support for additional ontology rules, which includes heuristic methods such as aligning text and logical definitions. These are off by default.

To run these, pass --no-skip-ontology-rules

Example:

runoak -i db/uberon.db validate --skip-structural-validation --no-skip-ontology-rules

For more information, see the OAK how-to guide:

runoak validate [OPTIONS]

Options

--cutoff <cutoff>

maximum results to report for any (type, predicate) pair

Default

50

--skip-structural-validation, --no-skip-structural-validation

If true, main structural validation checks are skipped

Default

False

--skip-ontology-rules, --no-skip-ontology-rules

If true, ontology rules are skipped

Default

True

-R, --rule <rule>

A rule to run. Can be specified multiple times. If not specified, all rules are run.

-o, --output <output>

Output file, e.g. obo file

-O, --output-type <output_type>

Desired output type

validate-definitions

Checks presence and structure of text definitions.

To run:

runoak validate-definitions -i db/uberon.db -o results.tsv

By default this will apply basic text mining of text definitions to check against machine actionable OBO text definition guideline rules. This can result in an initial lag - to skip this, and ONLY perform checks for presence of definitions, use –skip-text-annotation:

Example:

runoak -i db/uberon.db validate-definitions --skip-text-annotation

Like most OAK commands, this accepts lists of terms or term queries as arguments. You can pass in a list of CURIEs to validate individual classes selectively.

Example:

runoak -i db/cl.db validate-definitions CL:0002053

Only on CL identifiers:

runoak -i db/cl.db validate-definitions i^CL:

Only on neuron hierarchy:

runoak -i db/cl.db validate-definitions .desc//p=i neuron

Output format:

This command emits objects conforming to the OAK validation datamodel. See https://incatools.github.io/ontology-access-kit/datamodels for more on OAK datamodels.

The default serialization of the datamodel is CSV.
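Because the results are written as delimited text, they are easy to summarize in a short script. The sketch below counts rows per value of a chosen column. It is illustrative only: count_by_column is a hypothetical helper, and the column names and rows in SAMPLE are assumptions made for the example, so check the header of your own results file before relying on them.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str, delimiter: str = ",") -> Counter:
    """Count rows per distinct value of `column` in a delimited results file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row[column] for row in reader)


# Hypothetical sample: the column names and values are assumptions,
# not output copied from OAK.
SAMPLE = """subject,severity,info
EX:0000001,ERROR,missing definition
EX:0000002,WARNING,definition does not match expected pattern
EX:0000003,ERROR,missing definition
"""

if __name__ == "__main__":
    counts = count_by_column(io.StringIO(SAMPLE), "severity")
    for value, n in counts.most_common():
        print(value, n)
```

For a tab-separated file, open it with open(path, newline="") and pass delimiter="\t".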

Notes:

This command is largely redundant with the validate command, but it is useful for targeted validation focused solely on definitions.

runoak validate-definitions [OPTIONS] [TERMS]...

Options

--skip-text-annotation, --no-skip-text-annotation

If true, do not parse text annotations

Default

False

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Output file, e.g. obo file

Arguments

TERMS

Optional argument(s)

validate-multiple

Validate multiple ontologies against ontology metadata

See the validate command. This command is the same, except that you can pass a list of databases.

For more information, see the OAK how-to guide.

runoak validate-multiple [OPTIONS] [DBS]...

Options

--cutoff <cutoff>

Maximum number of results to report for any (type, predicate) pair

Default

50

-s, --schema <schema>

Path to schema (if you want to override the bundled OMO schema)

-o, --output <output>

Output file, e.g. obo file

Arguments

DBS

Optional argument(s)

viz

Visualize an ancestor graph using obographviz

For general background on what is meant by a graph in OAK, see https://incatools.github.io/ontology-access-kit/interfaces/obograph

Note

This requires that obographviz is installed.

Example:

runoak -i sqlite:cl.db viz CL:4023094

Same query on ubergraph:

runoak -i ubergraph: viz CL:4023094

Example, showing only is-a:

runoak -i sqlite:cl.db viz CL:4023094 -p i

Example, showing only is-a and part-of, to include Uberon:

runoak -i sqlite:cl.db viz CL:4023094 -p i,p

As above, including develops-from:

runoak -i sqlite:cl.db viz CL:4023094 -p i,p,RO:0002202

With abbreviation:

runoak -i sqlite:cl.db viz CL:4023094 -p i,p,d
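The two commands above are equivalent because d stands for RO:0002202, just as i and p stand for is_a and part_of. The following sketch shows how such a comma-delimited value could be expanded into full CURIEs. It is an illustration of the idea only, not OAK's own code: PREDICATE_SHORTCUTS and expand_predicates are hypothetical names, and the table holds just the three shortcuts that appear in this document.

```python
from typing import Dict, List

# Shortcuts taken from the examples in this document; OAK may recognise others.
PREDICATE_SHORTCUTS: Dict[str, str] = {
    "i": "rdfs:subClassOf",  # is_a
    "p": "BFO:0000050",      # part_of
    "d": "RO:0002202",       # develops_from
}


def expand_predicates(value: str) -> List[str]:
    """Expand a comma-delimited -p value, leaving full CURIEs untouched."""
    return [PREDICATE_SHORTCUTS.get(token, token) for token in value.split(",")]


if __name__ == "__main__":
    print(expand_predicates("i,p,d"))
    print(expand_predicates("i,p,RO:0002202"))
```

Both calls yield the same list, which mirrors the equivalence of the abbreviated and unabbreviated commands.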

We can also limit the number of "hops" from the seed terms. For example, to show all is-a and develops-from ancestors of T cell, limited to a distance of 2:

runoak -i sqlite:cl.db viz 'T cell' -p i,d --max-hops 2

runoak viz [OPTIONS] [TERMS]...

Options

--view, --no-view

If set, open the image after rendering

Default

True

--down, --no-down

traverse down

Default

False

--gap-fill, --no-gap-fill

If set then find the minimal graph that spans all input curies

Default

False

--add-mrcas, --no-add-mrcas

If set then extend input seed list to include all pairwise MRCAs

Default

False

-S, --stylemap <stylemap>

A JSON file to configure the visualization. See https://berkeleybop.github.io/kgviz-model/

-C, --configure <configure>

Overrides for the stylemap, specified as YAML. E.g. `-C "styles: [filled, rounded]"`

--max-hops <max_hops>

Trim nodes that are equal to or greater than this distance from terms

--meta, --no-meta

Add metadata object to graph nodes, including xrefs, definitions

Default

False

-p, --predicates <predicates>

A comma-separated list of predicates

-O, --output-type <output_type>

Desired output type

-o, --output <output>

Path to output file

Arguments

TERMS

Optional argument(s)