Part 7: SQLite files

The most efficient way to work with OAK is through SQLite files. OAK accepts SQLite files that follow the Semantic SQL schema.

The SQL Database Adapter wraps SQLite or any relational database.
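The same adapter can be used from Python via get_adapter. The following is a minimal sketch; the path is a placeholder for any Semantic SQL SQLite file, and selectors follow the same syntax as the -i option of runoak.

from itertools import islice
from oaklib import get_adapter

# "path/to/ontology.db" is a placeholder for any Semantic SQL SQLite file
adapter = get_adapter("path/to/ontology.db")

# The adapter implements OAK's standard interfaces; for example, list a few entities
for curie in islice(adapter.entities(), 5):
    print(curie, adapter.label(curie))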

Hint

You may also want to try the Semantic-SQL tutorial.

Download a SQLite file

You can download ready-made SQLite files for any OBO Library ontology.

For example: the Cell Ontology (CL) is available from https://s3.amazonaws.com/bbop-sqlite/cl.db.gz

Example

wget https://s3.amazonaws.com/bbop-sqlite/cl.db.gz
gzip -d cl.db.gz
runoak -i cl.db relationships "enteric neuron"

This will show all relationships where the subject is CL:0007011 (enteric neuron):


subject    | predicate       | object         | subject_label  | predicate_label   | object_label
CL:0007011 | BFO:0000050     | UBERON:0002005 | enteric neuron | part of           | enteric nervous system
CL:0007011 | RO:0002100      | UBERON:0002005 | enteric neuron | has soma location | enteric nervous system
CL:0007011 | RO:0002202      | CL:0002607     | enteric neuron | develops from     | migratory enteric neural crest cell
CL:0007011 | rdfs:subClassOf | CL:0000029     | enteric neuron | None              | neural crest derived neuron
CL:0007011 | rdfs:subClassOf | CL:0000107     | enteric neuron | None              | autonomic neuron
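The same query can be issued from Python. This is a minimal sketch, assuming cl.db has been downloaded and unpacked as above; relationships() yields (subject, predicate, object) tuples.

from oaklib import get_adapter

# Open the downloaded Cell Ontology SQLite file
adapter = get_adapter("cl.db")

# Show all relationships whose subject is CL:0007011 (enteric neuron)
for subject, predicate, obj in adapter.relationships(subjects=["CL:0007011"]):
    print(subject, predicate, obj, adapter.label(obj))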

Hint

OAK will automatically treat any input with a .db suffix as a SQLite database.

You can be more explicit and force the sqlite adapter to be used, regardless of suffix, by using a sqlite selector:

wget https://s3.amazonaws.com/bbop-sqlite/cl.db.gz
gzip -d cl.db.gz
runoak -i sqlite:cl.db relationships "enteric neuron"
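The same explicit selector works in Python (a minimal sketch, assuming cl.db is in the current directory):

from oaklib import get_adapter

# The "sqlite:" prefix forces the SQL Database Adapter, regardless of file suffix
adapter = get_adapter("sqlite:cl.db")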

Fetching ready-made SQLite files

You can also specify that the SQLite file should be fetched from the repository of ready-made OBO Library files:

runoak -i sqlite:obo:pato search t~shape

This will download the pato.db SQLite file once and cache it.

PyStow is used to cache the file, and the default location is ~/.data/oaklib.

By default, a cached SQLite file will be automatically refreshed (downloaded again) if it is older than 7 days. For details on how to alter the behavior of the cache, see the Cache Control section in the CLI documentation.
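The sqlite:obo: selector can also be used from Python, with the same download-and-cache behavior. The following is a minimal sketch; note that the default semantics of basic_search may differ slightly from the t~ partial-match syntax used on the command line.

from oaklib import get_adapter

# "sqlite:obo:pato" downloads the ready-made pato.db file once and caches it
adapter = get_adapter("sqlite:obo:pato")

# Search for terms matching "shape" (roughly analogous to: runoak ... search t~shape)
for curie in adapter.basic_search("shape"):
    print(curie, adapter.label(curie))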

Building your own SQLite files

You can use the semsql command, which should be pre-installed with OAK.

There are two paths:

  • using ODK docker

  • without docker, with dependencies pre-installed

With docker

If you have an OWL file in ./path/to/obi.owl

Then you can do this:

docker run -w /work -v `pwd`:/work --rm -ti obolibrary/odkfull:dev semsql make path/to/obi.db

This will do a one-time build of obi.db, using the ODK Docker image. You will need Docker installed (but you don’t need to do anything else).

You can then query the file as normal:

runoak -i path/to/obi.db info assay

Warning

For this to work, the OWL file must be in RDF/XML format. Also, imports will NOT be merged by default; merge them in advance using ROBOT if that is the behavior you want.

Note

The recipe above works for any OWL file located under your current folder. If you wish to use a file outside of your current folder, then change the option from -v `pwd`:/work to -v /path/to/:/work/.

Without docker

Prerequisites

For this to work, you will need to install the following dependencies and ensure they are available on your PATH.

  1. relation-graph

  2. rdftab

riot - on macOS, this can be installed using Homebrew via: brew install jena

Then, run:

semsql make path/to/obi.db

Consult the SemSQL docs for more details.

In the future, we hope to wrap these more seamlessly in Python.

Validating an ontology

The SQLite implementation is the most efficient way to validate an ontology:

runoak -i sqlite:obo:cl validate
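Validation is also available through the Python API. The sketch below is illustrative; each yielded result is a validation problem whose exact fields depend on the validation datamodel, but typically include a subject and a problem type.

from oaklib import get_adapter

adapter = get_adapter("sqlite:obo:cl")

# validate() yields one result object per problem found
for result in adapter.validate():
    print(result.subject, result.type, result.info)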

Other RDBMSs

We avoid SQLite-specific features, so in theory OAK should work with any RDBMS that follows the semantic-sql schema. However, SQLite is currently the focus of development and testing.

Python ORM

OAK abstracts away the details of the underlying database and the ways of accessing it, but for some purposes you may wish to write SQL directly or use the ORM layer. Consult the SemSQL docs for details.
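If you do want to drop down to raw SQL, a Semantic SQL file can be opened with any SQLite client. The sketch below uses Python's built-in sqlite3 module and assumes the standard semantic-sql schema, in which asserted relationships are exposed through an edge table with subject, predicate, and object columns.

import sqlite3

# Connect directly to a Semantic SQL SQLite file (e.g. the cl.db downloaded earlier)
conn = sqlite3.connect("cl.db")

# List asserted edges for CL:0007011 (enteric neuron)
query = "SELECT subject, predicate, object FROM edge WHERE subject = ?"
for row in conn.execute(query, ("CL:0007011",)):
    print(row)

conn.close()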