How to Validate an OBO Ontology Using the OBO Metadata Ontology (OMO) Schema
Step 1: Obtain the SQLite version of the ontology
Currently, SQLite versions of ontologies are not distributed alongside the ontologies themselves in OBO.
You can either:
(a) build the SQLite file yourself (see INCATools/semantic-sql), or
(b) download a ready-made file from the S3 bucket.
The second option is likely the easiest.
For example:
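Whichever option you choose, it can help to confirm that the file opens as a SQLite database before running validation. The sketch below uses only the Python standard library. The function name `looks_like_semsql` is hypothetical, and it assumes (based on the semantic-sql layout) that a usable database contains a table or view named `statements`.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def looks_like_semsql(db_path: str) -> bool:
    """Return True if db_path is a readable SQLite file with a `statements` table or view.

    ASSUMPTION: semantic-sql databases expose `statements`; adjust the
    name if your build differs.
    """
    path = Path(db_path)
    if not path.is_file():
        return False
    # Open read-only via a URI so a mistyped path is never created as an empty DB.
    uri = path.resolve().as_uri() + "?mode=ro"
    try:
        with closing(sqlite3.connect(uri, uri=True)) as con:
            row = con.execute(
                "SELECT 1 FROM sqlite_master "
                "WHERE name = 'statements' AND type IN ('table', 'view')"
            ).fetchone()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a SQLite database at all
        # (such as a file that is still compressed).
        return False
    return row is not None


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(arg, "ok" if looks_like_semsql(arg) else "NOT a usable semantic-sql file")
```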
Step 2: Install oaklib
pip install oaklib
Check that your installation works:
runoak --help
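The same check can be scripted, for example in a CI job. This is a minimal sketch using only the standard library; the helper name `check_oak_install` is hypothetical, and it assumes the import name `oaklib` (matching the pip package) and the `runoak` entry point used above.

```python
import importlib.util
import shutil
from typing import Dict


def check_oak_install() -> Dict[str, bool]:
    """Report whether oaklib is importable and the runoak CLI is on PATH."""
    return {
        # find_spec returns None (rather than raising) for a missing top-level package.
        "oaklib_importable": importlib.util.find_spec("oaklib") is not None,
        # shutil.which returns None if no executable named runoak is on PATH.
        "runoak_on_path": shutil.which("runoak") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_oak_install().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```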
Step 3: Validate an individual ontology
runoak -i sqlite:uberon.db validate
This streams YAML output. Each result is a LinkML object that uses the SHACL Validation vocabulary, for example:
severity: ERROR
subject: CARO:0000003
predicate: rdfs:label
info: Missing slot (label) for CARO:0000003
severity: ERROR
subject: CARO:0000006
predicate: rdfs:label
info: Missing slot (label) for CARO:0000006
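If you want to post-process results from a script, one option is to run the same command through Python's `subprocess` module and group the streamed lines into records. This is a minimal sketch with hypothetical helper names (`stream_lines`, `parse_results`). It assumes each result is a flat run of `key: value` lines like the sample above, possibly separated by `---` document markers; for nested YAML, a real parser such as PyYAML's `yaml.safe_load_all` is the safer choice.

```python
import subprocess
from typing import Dict, Iterable, Iterator, Sequence


def stream_lines(cmd: Sequence[str]) -> Iterator[str]:
    """Run cmd and yield each stdout line (without the newline) as it arrives."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Popen.__exit__ has waited for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(cmd))


def parse_results(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group flat `key: value` lines into one dict per validation result.

    A new record starts at a `---` separator or when a key repeats within
    the current record. Nested YAML structures are NOT handled.
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "---":
            if record:
                yield record
            record = {}
            continue
        # Split on the first colon only, so CURIE values such as CARO:0000003 survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key: value line
        key = key.strip()
        if key in record:
            yield record
            record = {}
        record[key] = value.strip()
    if record:
        yield record


if __name__ == "__main__":
    from collections import Counter

    cmd = ["runoak", "-i", "sqlite:uberon.db", "validate"]  # same command as Step 3
    counts: Counter = Counter()
    for result in parse_results(stream_lines(cmd)):
        counts[(result.get("severity"), result.get("predicate"))] += 1
    for (severity, predicate), n in counts.most_common():
        print(f"{severity}\t{predicate}\t{n}")
```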
Step 3 (alternative): Validate multiple ontologies
runoak validate-multiple db/*.db -o obo-validation.tsv
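The resulting TSV can be summarized with the standard `csv` module. This is a sketch with a hypothetical `summarize_tsv` helper; it assumes the file has a header row with a `severity` column, which you should confirm against the actual header of `obo-validation.tsv` (pass a different `column` if needed).

```python
import csv
from collections import Counter
from typing import TextIO


def summarize_tsv(handle: TextIO, column: str = "severity") -> Counter:
    """Count rows of a tab-separated validation report by the value in `column`.

    ASSUMPTION: the first row is a header that contains `column`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # fieldnames reads the header row; fail loudly if the assumed column is absent.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not in header {reader.fieldnames!r}")
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    with open("obo-validation.tsv", newline="") as fh:
        for value, count in summarize_tsv(fh).most_common():
            print(f"{value}\t{count}")
```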
Currently, only the following checks are implemented:
MinCountConstraintComponent checks (required or recommended slots)
MaxCountConstraintComponent checks
DatatypeConstraintComponent: basic type checks (literal vs. object only; specific literal datatypes are not yet checked)
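The three kinds of check can be illustrated with a small in-memory sketch. The names `SlotRule`, `Literal` and `check_entity` are hypothetical and not part of oaklib. Modelling required slots as `min_count=1` with severity ERROR and recommended slots as `min_count=1` with severity WARNING is an assumption about how the two are distinguished. As in the current implementation, the datatype check only distinguishes literals from object references.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Literal:
    """A literal value, as opposed to a reference to another entity."""
    value: str


# An object reference is represented here as a plain CURIE string.
Value = Union[str, Literal]


@dataclass(frozen=True)
class SlotRule:
    predicate: str
    min_count: int = 0
    max_count: Optional[int] = None
    expects_literal: Optional[bool] = None  # None means "do not check"
    severity: str = "ERROR"  # e.g. "WARNING" for recommended slots


def check_entity(
    subject: str, values: Dict[str, List[Value]], rules: List[SlotRule]
) -> List[Dict[str, str]]:
    """Apply min-count, max-count and literal-vs-object checks to one entity."""
    results: List[Dict[str, str]] = []

    def report(rule: SlotRule, component: str, info: str) -> None:
        results.append({
            "severity": rule.severity,
            "type": f"sh:{component}",
            "subject": subject,
            "predicate": rule.predicate,
            "info": info,
        })

    for rule in rules:
        found = values.get(rule.predicate, [])
        if len(found) < rule.min_count:
            report(rule, "MinCountConstraintComponent",
                   f"{len(found)} value(s), expected at least {rule.min_count}")
        if rule.max_count is not None and len(found) > rule.max_count:
            report(rule, "MaxCountConstraintComponent",
                   f"{len(found)} value(s), expected at most {rule.max_count}")
        if rule.expects_literal is not None:
            for v in found:
                # Only literal vs. object is checked; specific literal
                # datatypes (xsd:string, xsd:date, ...) are ignored.
                if isinstance(v, Literal) != rule.expects_literal:
                    expected = "a literal" if rule.expects_literal else "an object reference"
                    report(rule, "DatatypeConstraintComponent", f"{v!r} is not {expected}")
    return results
```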
Using your own schema
TODO: add an option to pass in your own YAML schema file
How this works
The Python API is described here:
Currently there is only one implementation: the SqlDatabase implementation.
The validation is driven entirely by a LinkML schema.
Currently this schema lives within this repository, but the goal is to move it outside the repository and import it.
Different implementations are free to use the schema in different ways.
The SqlDatabase implementation aims to be performant by running whole-database, predicate-based queries.
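To illustrate what whole-database, predicate-based querying means, the sketch below runs one SQL query per predicate over the entire database instead of looping over entities in Python. It uses a hypothetical, simplified `statements(subject, predicate, object, value)` table loosely modelled on semantic-sql (CURIEs in `object`, literals in `value`). The real schema has more columns, and this is not the SqlDatabase implementation's actual SQL.

```python
import sqlite3
from typing import List

# HYPOTHETICAL simplified table loosely modelled on semantic-sql's `statements`.
SCHEMA = "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, value TEXT)"


def subjects_missing(con: sqlite3.Connection, predicate: str, rdf_type: str = "owl:Class") -> List[str]:
    """Min-count check: every subject of type rdf_type with no `predicate` statement."""
    sql = """
        SELECT DISTINCT t.subject
        FROM statements AS t
        WHERE t.predicate = 'rdf:type' AND t.object = ?
          AND NOT EXISTS (
              SELECT 1 FROM statements AS s
              WHERE s.subject = t.subject AND s.predicate = ?
          )
        ORDER BY t.subject
    """
    return [row[0] for row in con.execute(sql, (rdf_type, predicate))]


def subjects_exceeding(con: sqlite3.Connection, predicate: str, max_count: int) -> List[str]:
    """Max-count check: subjects with more than max_count `predicate` statements."""
    sql = """
        SELECT subject
        FROM statements
        WHERE predicate = ?
        GROUP BY subject
        HAVING COUNT(*) > ?
        ORDER BY subject
    """
    return [row[0] for row in con.execute(sql, (predicate, max_count))]


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany(
        "INSERT INTO statements VALUES (?, ?, ?, ?)",
        [
            ("CARO:0000003", "rdf:type", "owl:Class", None),
            ("CARO:0000006", "rdf:type", "owl:Class", None),
            ("EX:1", "rdf:type", "owl:Class", None),
            ("EX:1", "rdfs:label", None, "first label"),
            ("EX:1", "rdfs:label", None, "second label"),
        ],
    )
    print(subjects_missing(con, "rdfs:label"))       # ['CARO:0000003', 'CARO:0000006']
    print(subjects_exceeding(con, "rdfs:label", 1))  # ['EX:1']
```

Each check costs one query per slot regardless of how many entities the ontology has, leaving the join and aggregation work to SQLite rather than making a round trip per entity.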
Validation results use the Validator Datamodel, which reuses many URIs from SHACL.
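The shape of a result can be sketched as a small Python data class. The field names `severity`, `subject`, `predicate` and `info` mirror the sample output shown earlier; the `Severity` enum and its mapping onto the SHACL severities `sh:Violation`, `sh:Warning` and `sh:Info` are an illustrative assumption, not taken from the Validator Datamodel itself.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict


class Severity(Enum):
    # ASSUMPTION: illustrative mapping onto the three SHACL severity IRIs.
    ERROR = "sh:Violation"
    WARNING = "sh:Warning"
    INFO = "sh:Info"


@dataclass(frozen=True)
class ValidationResult:
    severity: Severity
    subject: str
    predicate: str
    info: str = ""

    def to_record(self) -> Dict[str, str]:
        """Flatten to the key/value form shown in the streamed output."""
        record = asdict(self)
        record["severity"] = self.severity.name
        # Drop empty optional fields.
        return {key: value for key, value in record.items() if value}

    def to_yaml(self) -> str:
        # Naive serialization: values are not quoted or escaped.
        return "\n".join(f"{k}: {v}" for k, v in self.to_record().items())


if __name__ == "__main__":
    r = ValidationResult(Severity.ERROR, "CARO:0000003", "rdfs:label",
                         "Missing slot (label) for CARO:0000003")
    print(r.to_yaml())
```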
See the notebooks folder in