Using the OAK Expression Language

The OAK Expression Language is a concise and powerful way to query and navigate ontologies from the command line. It provides a set of operators and syntax that allow you to perform advanced searches, filter results, and combine queries using boolean logic.

It is designed for easy use on the command line, and intentionally uses characters that compose well with the shell.

Some experience in using the CLI is recommended.

For the full list of OAK command line commands, see:

Basics

The basic building blocks of the OAK Expression Language are:

  • IDs (CURIEs)

  • Lists of IDs

  • Search queries

  • Graph queries

  • Boolean combinations

IDs (CURIEs)

You can specify a single term by providing its ID (CURIE). For example:

runoak -i sqlite:obo:fbbt info FBbt:00004751

This will retrieve information about the term with ID FBbt:00004751 (wing vein).

Note that you can use any of the examples in this guide with any OAK command that accepts term lists; for example:

  • Fetching definitions: runoak -i sqlite:obo:fbbt definitions FBbt:00004751

  • Fetching aliases: runoak -i sqlite:obo:fbbt aliases FBbt:00004751

  • Fetching is-a ancestors: runoak -i sqlite:obo:fbbt ancestors -p i FBbt:00004751

Lists of IDs

You can provide a list of terms; these are implicitly combined with an OR operation. For example:

runoak -i sqlite:obo:fbbt info FBbt:00004751 FBbt:00004754

This will retrieve information about the terms with IDs “FBbt:00004751” or “FBbt:00004754”.

Label Queries

You can also search for terms based on their Label. For example:

runoak -i sqlite:obo:fbbt info 'wing vein'

Remember to quote labels that contain spaces. If you instead type:

runoak -i sqlite:obo:fbbt info wing vein

This will be interpreted as an OR, and you will get back two terms: one for “wing” and one for “vein”.
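The difference comes from how the shell splits the command line into arguments before runoak ever sees it. A minimal sketch, using Python's standard shlex module (which follows POSIX-style splitting rules) to show the arguments each form produces:

```python
import shlex

# Unquoted: the shell hands over two separate arguments, "wing" and "vein".
unquoted = shlex.split("runoak -i sqlite:obo:fbbt info wing vein")

# Quoted: the shell hands over a single argument, "wing vein".
quoted = shlex.split("runoak -i sqlite:obo:fbbt info 'wing vein'")

# Everything after the "info" command is the term list.
print(unquoted[4:])  # ['wing', 'vein']
print(quoted[4:])    # ['wing vein']
```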

Using label queries assumes a certain amount of familiarity with the names used in the ontology you are querying. If you are not sure of the exact name, you can use the search functionality:

Search Queries

OAK supports various search queries to find terms based on their labels, definitions, or other properties.

Exact Match

To search for an exact match of a term, use the t= prefix followed by the search term. For example:

runoak -i sqlite:obo:fbbt info 't=wing vein'

This will find terms that exactly match “wing vein”.

Starts With

To search for terms that start with a specific string, use the t^ prefix. For example:

runoak -i sqlite:obo:fbbt info 't^wing vein'

This will find terms that start with “wing vein”.

Contains

To search for terms that contain a specific string anywhere in their label or definition, use the t~ prefix. For example:

runoak -i sqlite:obo:fbbt info 't~wing vein'

This will find terms that contain “wing vein” anywhere in their label or definition.

Regular Expressions

To perform a regular expression search, use the t/ prefix followed by a valid regular expression pattern. For example:

runoak -i sqlite:obo:fbbt info 't/^wing vein L\d+$'

This will find terms that match the regular expression pattern “^wing vein L\d+$”.
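The four search modes can be compared side by side on a toy list of labels. This is only an illustration of the matching semantics described above, not OAK's implementation: the labels are invented, the helper name match_labels is hypothetical, matching here is case-sensitive, and the sketch looks only at labels.

```python
import re

# Toy labels, invented for illustration (not taken from FBbt).
LABELS = ["wing vein", "wing vein L1", "wing vein L2", "posterior wing vein", "wing"]

def match_labels(expr, labels=LABELS):
    """Apply a t=, t^, t~ or t/ style expression to a list of labels."""
    op, text = expr[:2], expr[2:]
    if op == "t=":   # exact match
        return [x for x in labels if x == text]
    if op == "t^":   # starts with
        return [x for x in labels if x.startswith(text)]
    if op == "t~":   # contains
        return [x for x in labels if text in x]
    if op == "t/":   # regular expression
        pattern = re.compile(text)
        return [x for x in labels if pattern.search(x)]
    raise ValueError(f"unsupported search expression: {expr}")

print(match_labels("t=wing vein"))          # ['wing vein']
print(match_labels("t^wing vein"))          # ['wing vein', 'wing vein L1', 'wing vein L2']
print(match_labels("t~wing vein"))          # adds 'posterior wing vein'
print(match_labels(r"t/^wing vein L\d+$"))  # ['wing vein L1', 'wing vein L2']
```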

Graph Queries

OAK allows you to traverse and query the ontology Graph using relationships between terms.

Ancestors

To find all ancestors of a term, use the .anc operator, optionally parameterized by the relationship type(s), also known as predicates. For example:

runoak -i sqlite:obo:fbbt info .anc//p=i "wing vein"

This will find all ancestors of “wing vein” over the is-a (i) relationship.

The // syntax can be used with many prefix operators to provide arguments. The general syntax is

.<operator>//k1=<arg1>,<arg2>,...//k2=<arg1>,<arg2>,...//...

Note

OAK also includes a separate command ancestors, but being able to use the .anc operator in the term list allows you to combine graph querying with other commands.


Definitions for all is-a ancestors of “wing vein”:

runoak -i sqlite:obo:fbbt definitions .anc//p=i "wing vein"
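To make the // parameter syntax concrete, here is a small sketch that splits an operator token into its name and its keyword arguments. The function name parse_operator is hypothetical and this is not how OAK itself parses expressions; it only mirrors the general syntax given above.

```python
def parse_operator(token):
    """Split a token like '.desc//p=i,p' into ('desc', {'p': ['i', 'p']})."""
    if not token.startswith("."):
        raise ValueError(f"not an operator: {token}")
    # The first piece is the operator name; each later piece is key=v1,v2,...
    name, *param_parts = token[1:].split("//")
    params = {}
    for part in param_parts:
        key, _, value = part.partition("=")
        params[key] = value.split(",")
    return name, params

print(parse_operator(".anc//p=i"))     # ('anc', {'p': ['i']})
print(parse_operator(".desc//p=i,p"))  # ('desc', {'p': ['i', 'p']})
print(parse_operator(".rand//n=10"))   # ('rand', {'n': ['10']})
print(parse_operator(".mrca"))         # ('mrca', {})
```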

Descendants

To find all descendants of a term, use the .desc operator followed by the relationship type(s). For example:

runoak -i sqlite:obo:fbbt search .desc//p=i,p nucleus

This will find all descendants of “nucleus” using the is-a (i) and part-of (p) relationships.
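The effect of the predicate parameter on ancestor and descendant queries can be sketched with a toy edge list. The edges and function names below are invented for illustration and are not real FBbt content or OAK code. The sketch also leaves the starting term out of the result, which the real operators may handle differently.

```python
# Toy edges (subject, predicate, object); "i" is is-a, "p" is part-of.
EDGES = [
    ("wing vein", "i", "vein"),
    ("wing vein", "p", "wing"),
    ("vein", "i", "anatomical structure"),
    ("wing", "i", "appendage"),
    ("appendage", "i", "anatomical structure"),
]

def ancestors(term, predicates, edges=EDGES):
    """Collect everything reachable from term over edges with the given predicates."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for subj, pred, obj in edges:
            if subj == current and pred in predicates and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found

def descendants(term, predicates, edges=EDGES):
    """Same walk in the opposite direction, done by flipping every edge."""
    flipped = [(o, p, s) for s, p, o in edges]
    return ancestors(term, predicates, flipped)

print(sorted(ancestors("wing vein", {"i"})))       # ['anatomical structure', 'vein']
print(sorted(ancestors("wing vein", {"i", "p"})))  # ['anatomical structure', 'appendage', 'vein', 'wing']
print(sorted(descendants("wing", {"i", "p"})))     # ['wing vein']
print(sorted(descendants("wing", {"i"})))          # []
```

Restricting the walk to is-a drops the part-of link from “wing vein” to “wing”, which is why the same term gives different results for p=i and p=i,p.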

MRCAs

To find the most recent common ancestors (MRCAs) of a set of terms, use the .mrca operator.

For example:

runoak -i sqlite:obo:fbbt info .mrca//p=i,p 'wing vein' wing

Terms can also be given as IDs grouped in square brackets:

runoak -i sqlite:obo:fbbt info .mrca//p=i [ FBbt:00052481 FBbt:00100571 ]

This finds the most recent common ancestor of “FBbt:00052481” and “FBbt:00100571” (which is “neuroblast”).

Note that, as with any graph operation, for most ontologies this is only meaningful if parameterized by a predicate. Some ontologies contain edges such as “overlaps” or “adjacent to” that yield trivial, uninformative MRCAs.

Boolean Combinations

OAK allows you to combine queries using boolean operators such as AND, OR, and NOT.

AND

To perform an AND operation between two queries, use the .and operator. For example:

runoak -i sqlite:obo:fbbt info .desc//p=i,p wing .and .desc//p=i vein

This will find terms that are both is-a or part-of descendants of “wing” and is-a descendants of “vein”.

OR

To perform an OR operation between two queries, use the .or operator.

runoak -i sqlite:obo:fbbt info .desc//p=i,p antenna .or .desc//p=i,p wing

This will find terms that are is-a or part-of descendants of either “antenna” or “wing”.

Note that .or is already assumed for term lists, so you can also write:

runoak -i sqlite:obo:fbbt info .desc//p=i,p antenna .desc//p=i,p wing

NOT

To exclude terms that match a specific query, use the .not operator. For example:

runoak -i sqlite:obo:fbbt info .desc//p=i vein .not .desc//p=i,p wing

You should think of this more like a “minus” operator: expressions are currently evaluated in order, so the .not (exclusion list) should come after the inclusion list.

Nesting

You can nest queries using square brackets to create more complex expressions, or to explicitly control the order of precedence.

Always leave spaces around the square brackets.

runoak -i sqlite:obo:fbbt info [ .desc//p=i cell .not .desc//p=i neuron ] .and [ .desc//p=i,p head .or .desc//p=i,p thorax ]

This will find all terms that are descendants of “cell” but not descendants of “neuron”, and are either parts of “head” or “thorax”.

Note that prefix operators such as .desc bind more tightly than infix operators such as .and.
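The nested expression above can be read as ordinary set algebra: each bracketed group is evaluated first, .not acts as set difference, .or as union, and .and as intersection. A minimal sketch with invented result sets standing in for the four .desc queries (the members are made up, not real query output):

```python
# Toy result sets; the members are invented for illustration.
cell_desc = {"neuron", "motor neuron", "muscle cell", "epithelial cell"}
neuron_desc = {"neuron", "motor neuron"}
head_parts = {"motor neuron", "epithelial cell", "eye"}
thorax_parts = {"muscle cell", "wing"}

# [ .desc//p=i cell .not .desc//p=i neuron ]
left = cell_desc - neuron_desc

# [ .desc//p=i,p head .or .desc//p=i,p thorax ]
right = head_parts | thorax_parts

# left .and right
result = left & right

print(sorted(result))  # ['epithelial cell', 'muscle cell']
```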

Using files and redirects

IDFILE

Use this to read a list of term IDs from a file. For example:

runoak -i sqlite:obo:fbbt info .idfile my_terms.txt

STDIN

You can also use the special file name - to read from standard input. For example:

cat my_terms.txt | runoak -i sqlite:obo:fbbt info -

Other Operators

IN

The .in operator allows you to query by subset. For example:

runoak -i sqlite:obo:fbbt info .in cellxgene_subset

FILTER

The .filter operator allows you to provide arbitrary Python filters.

QUERY

The .query operator allows you to pass through a query to the underlying store (SPARQL, SQL).

For example, the sqlite backend uses SQL, so you can pass through SQL:

runoak -i sqlite:obo:uberon info .query \
 "SELECT subject from has_dbxref_statement where value like 'ZFA:%'"

This is equivalent to:

runoak -i sqlite:obo:uberon info x^ZFA:

NR

The .nr operator takes a set of terms and returns the non-redundant set of terms from that list (parameterized by a predicate or predicates).

MRCA

The .mrca operator takes a set of terms and returns their most recent common ancestors (MRCAs). It is parameterized by a predicate or predicates.

Example:

runoak -i sqlite:obo:uberon info .mrca//p=i,p .idfile my_terms.txt
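One common way to define MRCAs is: take the ancestors shared by all input terms, then keep only those that have no more specific shared ancestor beneath them. The sketch below follows that definition on a toy is-a hierarchy. It is an assumption about the general idea, not OAK's implementation, and the edges and function names are invented.

```python
# Toy is-a edges (child, parent), invented for illustration.
IS_A = [
    ("wing vein", "vein"),
    ("leg vein", "vein"),
    ("vein", "anatomical structure"),
    ("wing", "appendage"),
    ("appendage", "anatomical structure"),
]

def ancestors(term, edges=IS_A):
    """Reflexive ancestors: the term itself plus everything above it."""
    found, stack = {term}, [term]
    while stack:
        current = stack.pop()
        for child, parent in edges:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def mrca(terms, edges=IS_A):
    """Shared ancestors that are not an ancestor of another shared ancestor."""
    common = set.intersection(*(ancestors(t, edges) for t in terms))
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, edges) for other in common)
    }

print(mrca(["wing vein", "leg vein"]))  # {'vein'}
print(mrca(["wing vein", "wing"]))      # {'anatomical structure'}
```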

RAND

Pick a random subset of terms. Parameterized by n (number of terms).

Definitions for random terms in the Cell Ontology:

runoak -i sqlite:obo:cl definitions .rand

For 10 random terms:

runoak -i sqlite:obo:cl definitions .rand//n=10

Note

The .rand operator will sample from all terms in the ontology. This could include terms imported and merged from other ontologies. For finer-grained control, use the .sample operator, which allows the combination of a sample operator with the results of evaluating any OAK expression.

SAMPLE

The .sample operator takes a random sample of terms. It is parameterized by n (number of terms in sample).

Definitions for 3 random terms:

runoak -i sqlite:obo:obi definitions .sample//n=3 i^OBI:

To compare 3 random terms with 3 other random terms:

runoak -i sqlite:obo:cl similarity .sample//n=3 i^CL: @ .sample//n=3 i^CL:
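The difference between .rand and .sample can be pictured as a difference in what is sampled from. In this sketch the term IDs are invented, the "OTHER:" prefix stands in for terms imported from another ontology, and Python's random module is used only to illustrate the idea:

```python
import random

# Toy term IDs, invented for illustration.
ALL_TERMS = [f"CL:{i:07d}" for i in range(1, 21)] + [f"OTHER:{i:07d}" for i in range(1, 21)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Like .rand//n=3: sample from every term, imported ones included.
rand_like = rng.sample(ALL_TERMS, 3)

# Like .sample//n=3 i^CL: first evaluate the expression, then sample from its results.
cl_only = [t for t in ALL_TERMS if t.startswith("CL:")]
sample_like = rng.sample(cl_only, 3)

print(rand_like)
print(sample_like)
```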

Others

  • .is_obsolete: all obsolete terms

  • .non_obsolete: all non-obsolete terms

  • .dangling: all dangling terms

  • .child: non-transitive version of .desc. Also parameterized by predicate.

  • .parent: non-transitive version of .anc. Also parameterized by predicate.

  • .sib: all siblings of a term. Also parameterized by predicate.

  • .all: all terms

  • .classes: all classes

  • .relations: all relations