Part 7: SQLite files

The most efficient way to work with OAK is through SQLite files. OAK accepts SQLite files that follow the Semantic SQL schema.

The SQL Database Adapter wraps SQLite or any relational database.
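The same adapter can be used from Python via get_adapter. The following is a minimal sketch; the path is a placeholder for any Semantic SQL SQLite file, and selectors follow the same syntax as the -i option of runoak.

from itertools import islice
from oaklib import get_adapter

# "path/to/ontology.db" is a placeholder for any Semantic SQL SQLite file
adapter = get_adapter("path/to/ontology.db")

# The adapter implements OAK's standard interfaces; for example, list a few entities
for curie in islice(adapter.entities(), 5):
    print(curie, adapter.label(curie))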

Hint

You may also want to try the Semantic-SQL tutorial.

Download a SQLite file

You can download ready-made SQLite files for any OBO Library ontology.

For example: the Cell Ontology (CL) is available from https://s3.amazonaws.com/bbop-sqlite/cl.db.gz

Example

wget https://s3.amazonaws.com/bbop-sqlite/cl.db.gz
gzip -d cl.db.gz
runoak -i cl.db relationships "enteric neuron"

This will show all relationships where the subject is CL:0007011 (enteric neuron):


subject    | predicate       | object         | subject_label  | predicate_label   | object_label
CL:0007011 | BFO:0000050     | UBERON:0002005 | enteric neuron | part of           | enteric nervous system
CL:0007011 | RO:0002100      | UBERON:0002005 | enteric neuron | has soma location | enteric nervous system
CL:0007011 | RO:0002202      | CL:0002607     | enteric neuron | develops from     | migratory enteric neural crest cell
CL:0007011 | rdfs:subClassOf | CL:0000029     | enteric neuron | None              | neural crest derived neuron
CL:0007011 | rdfs:subClassOf | CL:0000107     | enteric neuron | None              | autonomic neuron
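The same query can be issued from Python. This is a minimal sketch, assuming cl.db has been downloaded and unpacked as above; relationships() yields (subject, predicate, object) tuples.

from oaklib import get_adapter

# Open the downloaded Cell Ontology SQLite file
adapter = get_adapter("cl.db")

# Show all relationships whose subject is CL:0007011 (enteric neuron)
for subject, predicate, obj in adapter.relationships(subjects=["CL:0007011"]):
    print(subject, predicate, obj, adapter.label(obj))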

Hint

OAK will automatically treat any input with a .db suffix as a SQLite database.

You can be more explicit and force the sqlite adapter to be used, regardless of suffix, by using a sqlite selector:

wget https://s3.amazonaws.com/bbop-sqlite/cl.db.gz
gzip -d cl.db.gz
runoak -i sqlite:cl.db relationships "enteric neuron"
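The same explicit selector works in Python (a minimal sketch, assuming cl.db is in the current directory):

from oaklib import get_adapter

# The "sqlite:" prefix forces the SQL Database Adapter, regardless of file suffix
adapter = get_adapter("sqlite:cl.db")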

Fetching ready-made SQLite files

You can also specify that the SQLite file should be fetched from the repository of ready-made OBO Library files:

runoak -i sqlite:obo:pato search t~shape

This will download the pato.db SQLite file once and cache it.

PyStow is used to cache the file, and the default location is ~/.data/oaklib.

By default, a cached SQLite file will be automatically refreshed (downloaded again) if it is older than 7 days. For details on how to alter the behavior of the cache, see the Cache Control section in the CLI documentation.
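The sqlite:obo: selector can also be used from Python, with the same download-and-cache behavior. The following is a minimal sketch; note that the default semantics of basic_search may differ slightly from the t~ partial-match syntax used on the command line.

from oaklib import get_adapter

# "sqlite:obo:pato" downloads the ready-made pato.db file once and caches it
adapter = get_adapter("sqlite:obo:pato")

# Search for terms matching "shape" (roughly analogous to: runoak ... search t~shape)
for curie in adapter.basic_search("shape"):
    print(curie, adapter.label(curie))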

Building your own SQLite files

You can use the semsql command, which should be pre-installed with OAK.

There are two paths:

  • using ODK docker

  • without docker, with dependencies pre-installed

With docker

If you have an OWL file in ./path/to/obi.owl

Then you can do this:

docker run -w /work -v `pwd`:/work --rm -ti obolibrary/odkfull:dev semsql make path/to/obi.db

This will do a one-time build of obi.db, using the ODK Docker image. You will need Docker installed (but you don’t need to do anything else).

You can then query the file as normal:

runoak -i path/to/obi.db info assay

Warning

For this to work, the OWL file must be in RDF/XML format. Also, imports will NOT be merged by default; merge them in advance using ROBOT if that is the behavior you want.

Note

The recipe above works for any OWL file located under your current folder. If you wish to use a file outside of your current folder, then change the option from -v `pwd`:/work to -v /path/to/:/work/.

Without docker

Prerequisites

For this to work, you will need to install the following dependencies and ensure they are available on your PATH.

  1. relation-graph

  2. rdftab

riot - on macOS, this can be installed using Homebrew via: brew install jena

Then, run:

semsql make path/to/obi.db

Consult the SemSQL docs for more details.

In the future, we hope to wrap these more seamlessly in Python.

Validating an ontology

The SQLite implementation is the most efficient way to validate an ontology:

runoak -i sqlite:obo:cl validate
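Validation is also available through the Python API. The sketch below is illustrative; each yielded result is a validation problem whose exact fields depend on the validation datamodel, but typically include a subject and a problem type.

from oaklib import get_adapter

adapter = get_adapter("sqlite:obo:cl")

# validate() yields one result object per problem found
for result in adapter.validate():
    print(result.subject, result.type, result.info)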

Other RDBMSs

We avoid SQLite-specific features, so in theory OAK should work with any RDBMS that follows the semantic-sql schema. However, SQLite is currently the focus of development and testing.

Python ORM

OAK abstracts away the details of the underlying database and the ways of accessing it, but for some purposes you may wish to write SQL directly or use the ORM layer. Consult the SemSQL docs for details.
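If you do want to drop down to raw SQL, a Semantic SQL file can be opened with any SQLite client. The sketch below uses Python's built-in sqlite3 module and assumes the standard semantic-sql schema, in which asserted relationships are exposed through an edge table with subject, predicate, and object columns.

import sqlite3

# Connect directly to a Semantic SQL SQLite file (e.g. the cl.db downloaded earlier)
conn = sqlite3.connect("cl.db")

# List asserted edges for CL:0007011 (enteric neuron)
query = "SELECT subject, predicate, object FROM edge WHERE subject = ?"
for row in conn.execute(query, ("CL:0007011",)):
    print(row)

conn.close()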