Creating an ontology subset
The odk:subset command creates an ontology subset. It is intended to
replace several OWLTools commands with a consistent behaviour.
Subset definition
The command offers several ways of defining which classes should be included in the subset.
Using a DL query
Use the --query <QUERY> option to define a subset from a DL query, as
in:
robot odk:subset --input uberon.owl \
--query "'part of' some 'nervous system'"
The subset will include all equivalent classes and subclasses matching
the query (to include superclasses as well, add the --ancestors true
option).
The query can use either quoted labels (as in the example above),
short-form identifiers (e.g. --query UBERON:0001016), or a mix of both
(e.g. --query "BFO:0000050 some 'nervous system'"). When using quoted
labels, if the query consists of a single class, you will most likely
need to quote it twice, as in --query "'nervous system'": the outer
quotes will be stripped by your command interpreter.
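The effect of the double quoting can be checked without running ROBOT at all. The sketch below uses a small helper, show_arg (a hypothetical name, not part of ROBOT or the ODK), that prints an argument exactly as the shell hands it to a program:

```shell
# show_arg is a hypothetical helper: it prints its first argument between
# brackets, exactly as a program such as robot would receive it.
show_arg() {
  printf '[%s]\n' "$1"
}

# One level of quoting: the shell removes the quotes, so the query parser
# would see two bare words instead of one quoted label.
show_arg 'nervous system'

# Two levels of quoting: the shell removes only the outer double quotes,
# and the single quotes around the label survive.
show_arg "'nervous system'"
```

The first call prints [nervous system] and the second prints ['nervous system']; only the second form reaches the query parser as a quoted label.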
Also be aware that not all reasoners support querying with a class
expression. ELK, which the odk:subset command uses by default, does
not, so you might want to use WHELK instead (--reasoner WHELK).
Using a subset name or IRI
Use the --subset <IRI> option to select all classes that are marked
with an oboInOwl:inSubset annotation whose value is the specified IRI.
For example:
robot odk:subset --input cl.owl \
--subset http://purl.obolibrary.org/obo/cl#BDS_subset
For compatibility with OWLTools’ --extract-ontology-subset command, if
the argument is not an IRI, this will select all classes with an
oboInOwl:inSubset annotation whose value ends with the specified
argument prefixed with a # character, regardless of the namespace. For
example,
robot odk:subset --input cl.owl --subset BDS_subset
will select the same classes as the previous example (as well as any
class carrying an oboInOwl:inSubset annotation in another namespace
whose value ends with #BDS_subset, if such classes exist).
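The compatibility rule for bare subset names amounts to a suffix match on the annotation value. Here is a minimal sketch of that rule in shell, assuming the annotation values are available as plain strings. The function name matches_subset_name, the example.org IRI and the other_subset name are made up for the illustration and are not part of the command.

```shell
# Hypothetical helper mirroring the documented rule for non-IRI arguments:
# succeeds when the inSubset value ends with "#" followed by the name.
matches_subset_name() {
  value=$1
  name=$2
  case "$value" in
    *"#$name") return 0 ;;
    *) return 1 ;;
  esac
}

# The second and third values are invented for the illustration.
for value in \
  "http://purl.obolibrary.org/obo/cl#BDS_subset" \
  "http://example.org/other#BDS_subset" \
  "http://purl.obolibrary.org/obo/cl#other_subset"
do
  if matches_subset_name "$value" BDS_subset; then
    echo "selected: $value"
  fi
done
```

The loop reports the first two values as selected, because both end with #BDS_subset even though they live in different namespaces.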
Using an explicit list of terms
Any class whose ID is explicitly specified on the command line with the
--term ID option, or is listed in the file given as the argument to the
--term-file <FILE> option, will be included in the subset. The file is
expected to contain a list of IDs, one per line; blank lines and lines
starting with # are ignored.
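As an illustration of the expected file format, the sketch below writes a small term file and then prints the lines that count as IDs under the rule described above. The file name terms.txt is arbitrary, and the grep filter only imitates the documented rule; it is not how odk:subset itself reads the file.

```shell
# Write a sample term file (terms.txt is an arbitrary, illustrative name).
cat > terms.txt <<'EOF'
# Terms of interest
UBERON:0000955

UBERON:0001016
EOF

# Imitate the documented rule: drop comment lines (starting with '#')
# and blank lines; what remains is the list of IDs.
grep -v -e '^#' -e '^$' terms.txt
```

Only the two UBERON lines are printed. The same file could then be passed to the command with --term-file terms.txt.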
Combining several definitions
The --query, --subset, --term, and --term-file options can be mixed
freely and used repeatedly. Their effects are cumulative. For example:
robot odk:subset --input uberon.owl --reasoner WHELK \
--query "'nervous system'" \
--query "'part of' some 'nervous system'" \
--term UBERON:0000955
will create a subset from (1) ‘nervous system’ and all its descendants and equivalents, (2) all classes that are ‘part of’ the ‘nervous system’, and (3) the UBERON:0000955 class.
Expanding the subset
By default, the subset generated by the odk:subset command contains
only the classes defined using any of the methods shown in the
previous section, plus all the object and annotation properties used by
those classes.
Use the --fill-gaps true option to expand the subset so that it
contains all the classes that are referenced from within the initial
subset.
Several options control how the subset is expanded.
Following only selected relations
By default, the expanded subset will include all classes referenced by any class expression from within the initial subset.
If the --follow-property <PROPERTY> option is used (where PROPERTY is
the IRI of an object property), only class expressions that use the
indicated object property will be considered. The option may be used
several times to follow several object properties.
Following only in some namespaces
When the --follow-in <NAMESPACE> option is used, only classes that are
in the indicated namespace will be included in the expanded subset.
Axioms that refer to a class outside of the followed namespace will be
excluded from the subset. The option may be used several times to
include classes from several namespaces.
For example, to create an expanded subset from classes that are part of the nervous system while staying entirely within the Uberon and CL namespaces:
robot odk:subset --input uberon.owl \
--reasoner WHELK \
--query "'part of' some 'nervous system'" \
--fill-gaps true \
--follow-in UBERON: --follow-in CL:
Not following in some namespaces
The --not-follow-in <NAMESPACE> option does the opposite of the
previous option. It prevents the inclusion of any class that is in the
indicated namespace. Axioms that refer to a class within the
not-followed namespace will be excluded from the subset. The option may
be used repeatedly to exclude classes from several namespaces.
For example, by default an expanded subset created from the “life stage”
terms of Uberon will include several hundred seemingly unrelated
terms about the central nervous system or the blood. This is because the
term neurula stage (UBERON:0000110) has a relationship to GO’s
neural tube formation (GO:0001841), which in turn is related to
Uberon’s neural tube (UBERON:0001049), and from there to a whole bunch
of other Uberon terms. One way to avoid the inclusion of all those
terms is therefore to prevent any expansion of the subset into GO
territory:
robot odk:subset --input uberon.owl \
--query "'life cycle stage'" \
--fill-gaps true \
--not-follow-in GO:
The --follow-in and --not-follow-in options are mutually exclusive. If
both are used in the same odk:subset command, the --follow-in option(s)
will take precedence and any --not-follow-in option will be ignored.
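The interaction between the two options can be summarised as a small decision function. The sketch below is only a model of the behaviour described above, not the plugin's code. It assumes that namespaces are matched as prefixes of short-form IDs; the function name should_follow and its argument layout are made up for the illustration.

```shell
# should_follow ID FOLLOW_IN NOT_FOLLOW_IN
# FOLLOW_IN and NOT_FOLLOW_IN are space-separated namespace prefixes
# (either may be empty). Succeeds if the class would be followed.
should_follow() {
  id=$1
  follow_in=$2
  not_follow_in=$3
  if [ -n "$follow_in" ]; then
    # --follow-in takes precedence: --not-follow-in is ignored entirely.
    for ns in $follow_in; do
      case "$id" in "$ns"*) return 0 ;; esac
    done
    return 1
  fi
  for ns in $not_follow_in; do
    case "$id" in "$ns"*) return 1 ;; esac
  done
  return 0
}

# With only a not-follow-in on GO:, the GO class is skipped.
for id in UBERON:0001049 GO:0001841; do
  if should_follow "$id" "" "GO:"; then
    echo "$id: followed"
  else
    echo "$id: skipped"
  fi
done
```

In this model, calling should_follow GO:0001841 "GO:" "GO:" succeeds: as soon as a followed namespace is given, the not-followed list no longer plays any role.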
Including dangling classes
By default, “dangling” classes are not considered for inclusion when expanding the subset. In the context of this command, a dangling class is one for which the ontology contains no defining axioms (disjointness axioms do not count) and no annotation assertion axioms. If a class from within the initial subset references a dangling class, that reference will not be included.
Use the --no-dangling false option to reverse that behaviour and
allow the inclusion of dangling classes into the expanded subset.
Initial subset vs expanded subset
Note that none of the options discussed in the previous sections affect
the initial subset (the subset defined by any of the --query, --subset,
--term, or --term-file options). They only affect how the subset is
expanded.
For example, if the initial subset contains a class in the GO:
namespace, that class will be present in the final subset even if the
--not-follow-in GO: option is used. To force the exclusion of any GO
class, either make sure that the initial subset does not list any such
class, or forcibly remove all GO classes from the ontology (e.g. with
robot remove or robot filter) before creating the subset.
Likewise, if a dangling class is explicitly added to a subset through
the --term or --term-file options, that class will be present in the
final subset regardless of the value of the --no-dangling option.
Writing the subset
By default, once the subset is created, it becomes the main ontology that is being manipulated by the ROBOT pipeline (replacing the input ontology). This means that:
- it can be saved to file using the traditional --output option;
- it will be passed down to any further ROBOT command.
If you use the --write-to FILE option, the subset will be saved into
the indicated file, and will not be passed down to the rest of the
ROBOT pipeline (the unmodified input ontology will be passed down
instead). This allows creating several subsets from the same ontology:
robot merge -i my-ontology.owl \
odk:subset --subset MY_SUBSET --write-to my-subset.owl \
odk:subset --subset ANOTHER_SUBSET --write-to another-subset.owl
Internals and comparison with OWLTools/ROBOT extract
This section briefly explains how the odk:subset command works and how
it relates to some existing OWLTools and ROBOT commands.
odk:subset works in four main steps: (1) creating the initial list of
classes to include (the so-called “initial subset”), (2) adding any
object and annotation properties used within the subset, (3) optionally
(if --fill-gaps true is used) expanding the subset to closure, and (4)
pruning any axiom referring to entities outside of the subset.
For the creation of the initial list of classes (step 1), odk:subset
allows the use of:
- oboInOwl:inSubset annotations (as OWLTools’ --extract-ontology-subset command);
- a DL query (as OWLTools’ --reasoner-query --make-ontology-from-results commands);
- an explicit list of terms (as ROBOT’s extract command).
For the second step, odk:subset differs from OWLTools in that it will
systematically (a) include all the properties used within the initial
subset, and (b) include only those properties. With OWLTools, the
behaviour depended on the exact command used: for example,
--extract-ontology-subset would include all object and annotation
properties from the source ontology, regardless of whether they are
actually needed in the subset; by contrast,
--make-ontology-from-results would only include the properties that
are actually used within the subset. With ROBOT’s extract -m subset
command, object and annotation properties are only included if they are
part of the explicit list of terms ROBOT is asked to extract.
The optional third step (expanding the subset to closure) is roughly
similar to the behaviour of OWLTools’
--extract-ontology-subset --fill-gaps, except that the subset is
expanded not only for classes but also for object and annotation
properties. That is, if, say, an object property (included in the subset
as a result of the second step) references another property (for example
a super property) or another class (for example in a domain or range
restriction), that other property or that other class will be included
in the subset as well.
OWLTools’ --extract-ontology-subset command, when used without the
--fill-gaps option, works in “gap spanning” mode instead. In that
mode, the initial subset is not expanded, but indirect relationships
between classes of the subset (involving intermediate classes that are
not part of the subset) are preserved by the fabrication of equivalent
direct relationships instead. That mode is not covered at all by this
odk:subset command, because it is already available in the standard
distribution of ROBOT with the extract -m subset command.
Of note, however, while extract -m subset implements the core logic of OWLTools’ “gap spanning” mode, it behaves slightly differently in two respects: (1) it does not take a subset name as a source, and instead requires an explicit list of terms to extract, and (2) as already mentioned above, it does not automatically include properties, but only includes the properties explicitly mentioned in the list of terms to extract. Whether this is an advantage or an inconvenience is a matter of point of view: on the plus side, it allows you to control very precisely which properties are present in the subset; on the minus side, it requires you to know in advance which properties you are interested in preserving, something you didn’t have to know with the original --extract-ontology-subset command of OWLTools.