Creating an ontology subset

The odk:subset command creates an ontology subset. It is intended to replace several OWLTools commands with a consistent behaviour.

Subset definition

The command offers several ways of defining which classes should be included in the subset.

Using a DL query

Use the --query <QUERY> option to define a subset from a DL query, as in:

robot odk:subset --input uberon.owl \
                 --query "'part of' some 'nervous system'"

The subset will include all equivalent classes and subclasses matching the query (to include superclasses as well, add the --ancestors true option).

The query can use either quoted labels (as in the example above) or short-form identifiers (e.g. --query UBERON:0001016) or a mix of both (e.g. --query "BFO:0000050 some 'nervous system'"). When using quoted labels, if the query consists of a single class, be mindful that you will most likely need to quote it twice, as in --query "'nervous system'", because the outer quotes will be stripped by your command interpreter.
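The effect of the command interpreter on the quotes can be checked without running ROBOT at all. The following sketch, which assumes a POSIX-like shell, uses printf to show what a command would actually receive as its argument in each case; the variable names are purely illustrative.

```shell
# What a command receives when the label is quoted only once:
single=$(printf '%s' 'nervous system')
# What it receives when the label is quoted twice:
double=$(printf '%s' "'nervous system'")

printf 'quoted once:  %s\n' "$single"   # prints: quoted once:  nervous system
printf 'quoted twice: %s\n' "$double"   # prints: quoted twice: 'nervous system'
```

Only the second form leaves the inner quotes in place, so only that form reaches the query parser as a quoted label.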

Also be mindful that not all reasoners allow querying using a class expression. ELK, which the odk:subset command uses by default, does not, so you might want to use WHELK instead (--reasoner WHELK).

Using a subset name or IRI

Use the --subset <IRI> option to select all classes that are marked with an oboInOwl:inSubset annotation whose value is the specified IRI. For example:

robot odk:subset --input cl.owl \
                 --subset http://purl.obolibrary.org/obo/cl#BDS_subset

For compatibility with OWLTools’ --extract-ontology-subset command, if the argument is not an IRI, this will select all classes with an oboInOwl:inSubset annotation whose value ends with the specified argument prefixed with a # character, regardless of the namespace. For example,

robot odk:subset --input cl.owl --subset BDS_subset

will select the same classes as the previous example (as well as any class carrying an oboInOwl:inSubset annotation ending with #BDS_subset, if such classes exist).
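The matching rule for a non-IRI argument can be pictured as a simple suffix test. The sketch below is only an illustrative model written for a POSIX-like shell, not the command's actual implementation; the matches_subset_name function and the example.org annotation value are hypothetical.

```shell
# Illustrative model only: does an oboInOwl:inSubset value match a
# non-IRI --subset argument? (The value must end with "#" + argument.)
matches_subset_name() {
    case "$1" in
        *"#$2") return 0 ;;
        *)      return 1 ;;
    esac
}

matches_subset_name "http://purl.obolibrary.org/obo/cl#BDS_subset" BDS_subset && echo "match"
# A hypothetical annotation value in another namespace matches too:
matches_subset_name "http://example.org/other#BDS_subset" BDS_subset && echo "match"
# The "#" is part of the test, so a longer name does not match:
matches_subset_name "http://purl.obolibrary.org/obo/cl#my_BDS_subset" BDS_subset || echo "no match"
```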

Using an explicit list of terms

Any class whose ID is given on the command line with the --term <ID> option will be included in the subset, as will any class listed in the file passed to the --term-file <FILE> option. That file is expected to contain one ID per line; blank lines and lines starting with # are ignored.
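As an illustration of the expected file layout, the following sketch writes a small term file and previews the lines that would count as IDs. The file name terms.txt and its contents are hypothetical, and the grep filter is only a rough approximation of the rule described above (in particular, how whitespace-only lines are treated is not covered here).

```shell
# Write a hypothetical term file: comments and blank lines are allowed.
cat > terms.txt <<'EOF'
# Terms of interest
UBERON:0000955

UBERON:0001016
EOF

# Preview the lines that count as IDs (non-blank, not starting with "#").
ids=$(grep -v -e '^#' -e '^$' terms.txt)
printf '%s\n' "$ids"
```

Such a file could then be passed to the command with --term-file terms.txt, alongside any of the other selection options.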

Combining several definitions

The --query, --subset, --term, and --term-file options can be mixed freely and used repeatedly. Their effects are cumulative. For example:

robot odk:subset --input uberon.owl \
                 --reasoner WHELK \
                 --query "'nervous system'" \
                 --query "'part of' some 'nervous system'" \
                 --term UBERON:0000955

will create a subset from (1) ‘nervous system’ and all its descendants and equivalents, (2) all classes that are ‘part of’ the ‘nervous system’, and (3) the UBERON:0000955 class.

Expanding the subset

By default, the subset generated by the odk:subset command contains only the classes defined using any of the methods shown in the previous section, plus all the object and annotation properties used by those classes.

Use the --fill-gaps true option to expand the subset so that it contains all the classes that are referenced from within the initial subset.

Several options control how the subset is expanded.

Following only selected relations

By default, the expanded subset will include all classes referenced by any class expression from within the initial subset.

If the --follow-property <PROPERTY> option is used (where PROPERTY is the IRI of an object property), only class expressions that use the indicated object property will be considered. The option may be used several times to follow several object properties.

Following only in some namespaces

When the --follow-in <NAMESPACE> option is used, only classes that are in the indicated namespace will be included in the expanded subset. Axioms that refer to a class outside of the followed namespace will be excluded from the subset. The option may be used several times to include classes from several namespaces.

For example, to create an expanded subset from classes that are part of the nervous system while staying entirely within the Uberon and CL namespaces:

robot odk:subset --input uberon.owl \
                 --reasoner WHELK \
                 --query "'part of' some 'nervous system'" \
                 --fill-gaps true \
                 --follow-in UBERON: --follow-in CL:

Not following in some namespaces

The --not-follow-in <NAMESPACE> option does the opposite of the previous option. It prevents the inclusion of any class that is in the indicated namespace. Axioms that refer to a class within the not-followed namespace will be excluded from the subset. The option may be used repeatedly to exclude classes from several namespaces.

For example, by default an expanded subset created from the “life stage” terms of Uberon will include several hundred seemingly unrelated terms about the central nervous system or the blood. This is because the term neurula stage (UBERON:0000110) has a relationship to GO’s neural tube formation (GO:0001841), which in turn is related to Uberon’s neural tube (UBERON:0001049), and from there to a whole bunch of other Uberon terms. One way to avoid the inclusion of all those terms is therefore to prevent any expansion of the subset into GO territory:

robot odk:subset --input uberon.owl \
                 --query "'life cycle stage'" \
                 --fill-gaps true \
                 --not-follow-in GO:

The --follow-in and --not-follow-in options are mutually exclusive. If both are used in the same odk:subset command, the --follow-in option(s) will take precedence and any --not-follow-in option will be ignored.
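The interaction between the two options can be pictured with a small shell function. This is only an illustrative model, not the command's implementation: the is_followed function and the FOLLOW_IN and NOT_FOLLOW_IN variables are hypothetical names, and namespaces are assumed here to be matched as plain prefixes of short-form IDs.

```shell
# Illustrative model only: would a class reached during expansion be kept?
# $1 = class ID; FOLLOW_IN / NOT_FOLLOW_IN = space-separated namespace prefixes.
is_followed() {
    id=$1
    if [ -n "$FOLLOW_IN" ]; then
        # --follow-in takes precedence; --not-follow-in is ignored.
        for ns in $FOLLOW_IN; do
            case "$id" in "$ns"*) return 0 ;; esac
        done
        return 1
    fi
    for ns in $NOT_FOLLOW_IN; do
        case "$id" in "$ns"*) return 1 ;; esac
    done
    return 0
}

# Both options given: NOT_FOLLOW_IN is ignored, so the Uberon class is kept.
FOLLOW_IN="UBERON: CL:"
NOT_FOLLOW_IN="UBERON:"
is_followed UBERON:0001049 && echo "UBERON:0001049 kept"
is_followed GO:0001841 || echo "GO:0001841 excluded"
```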

Including dangling classes

By default, “dangling” classes are not considered for inclusion when expanding the subset. In the context of this command, a class is dangling if the ontology contains no annotation assertion axioms about it and no defining axioms other than disjointness axioms. If a class from within the initial subset references a dangling class, that reference will not be included.

Use the --no-dangling false option to reverse that behaviour and allow the inclusion of dangling classes into the expanded subset.

Initial subset vs expanded subset

Note that none of the options discussed in the previous sections affect the initial subset (the subset defined by any of the --query, --subset, --term, or --term-file options). They only affect how the subset is expanded.

For example, if the initial subset contains a class in the GO: namespace, that class will be present in the final subset even if the --not-follow-in GO: option is used. To force the exclusion of any GO class, either make sure that the initial subset does not list any such class, or forcibly remove all GO classes from the ontology (e.g. with robot remove or robot filter) before creating the subset.

Likewise, if a dangling class is explicitly added to a subset through the --term or --term-file options, that class will be present in the final subset regardless of the value of the --no-dangling option.

Writing the subset

By default, once the subset is created, it becomes the main ontology that is being manipulated by the ROBOT pipeline (replacing the input ontology). This means that:

  • it can be saved to file using the traditional --output option;
  • it will be passed down to any further ROBOT command.

If you use the --write-to <FILE> option, the subset will be saved into the indicated file, and will not be passed down to the rest of the ROBOT pipeline (the unmodified input ontology will be passed down instead). This allows creating several subsets from the same ontology:

robot merge -i my-ontology.owl \
      odk:subset --subset MY_SUBSET --write-to my-subset.owl \
      odk:subset --subset ANOTHER_SUBSET --write-to another-subset.owl

Internals and comparison with OWLTools/ROBOT extract

This section briefly explains how the odk:subset command works and how it relates to some existing OWLTools and ROBOT commands.

odk:subset works in four main steps: (1) creating the initial list of classes to include (the so-called “initial subset”), (2) adding any object and annotation properties used within the subset, (3) optionally (if --fill-gaps true is used) expanding the subset to closure, and (4) pruning any axiom referring to entities outside of the subset.

For the creation of the initial list of classes (step 1), odk:subset allows the use of

  • oboInOwl:inSubset annotations (as OWLTools’ --extract-ontology-subset command);
  • a DL query (as OWLTools’ --reasoner-query --make-ontology-from-results commands);
  • an explicit list of terms (as ROBOT’s extract command).

For the second step, odk:subset differs from OWLTools in that it will systematically (a) include all the properties used within the initial subset, and (b) include only the properties used within the initial subset. With OWLTools, the behaviour was dependent on the exact command used: for example, --extract-ontology-subset would include all object and annotation properties from the source ontology, regardless of whether they are actually needed in the subset or not; by contrast, --make-ontology-from-results would only include the properties that are effectively used within the subset. With ROBOT’s extract -m subset command, object and annotation properties are only included if they are part of the explicit list of terms ROBOT is asked to extract.

The optional third step (expanding the subset to closure) is roughly similar to the behaviour of OWLTools’ --extract-ontology-subset --fill-gaps, except that the subset is expanded not only for classes but also for object and annotation properties. That is, if, say, an object property (included in the subset as a result of the second step) references another property (for example a super property) or another class (for example in a domain or range restriction), that other property or that other class will be included in the subset as well.

OWLTools’ --extract-ontology-subset command, when used without the --fill-gaps option, works in “gap spanning” mode instead. In that mode, the initial subset is not expanded, but indirect relationships between classes of the subset (involving intermediate classes that are not part of the subset) are preserved by the fabrication of equivalent direct relationships instead. That mode is not covered at all by this odk:subset command, because it is already available in the standard distribution of ROBOT with the extract -m subset command.

Of note however, while extract -m subset implements the core logic of OWLTools’ “gap spanning” mode, it behaves slightly differently in two respects: (1) it does not take a subset name as a source, and instead requires an explicit list of terms to extract, and (2) as already mentioned above it does not automatically include properties, but instead only includes the properties explicitly mentioned in the list of terms to extract. Whether this is an advantage or a drawback is a matter of point of view; on the plus side, it allows you to control very precisely which properties are present in the subset; on the minus side, it requires you to know in advance which properties you are interested in preserving, something you didn’t have to know with the original --extract-ontology-subset command of OWLTools.