Quick Start

This page walks through the most common workflow: validating a list of CURIEs, enriching them, and inspecting the results.

See also

For a high-level summary, read Overview.

Installation

Pandasaurus requires Python 3.9–3.11. Install with pip or Poetry:

pip install pandasaurus

or

poetry add pandasaurus

Validate CURIEs

Use pandasaurus.curie_validator.CurieValidator to confirm that your seed terms exist and aren’t obsoleted:

from pandasaurus.curie_validator import CurieValidator

seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
terms = CurieValidator.construct_term_list(seeds)
CurieValidator.get_validation_report(terms)  # raises if invalid or obsoleted

Handle Invalid Terms

If pandasaurus.utils.pandasaurus_exceptions.InvalidTerm is raised, check the exception message for the invalid IRIs, fix or remove the corresponding entries in your seed list, and rerun the validation.
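
Before rerunning, it can help to screen the seed list for entries that are not even well-formed CURIEs, since those can be fixed without a validator round trip. The sketch below is a local syntax check only. The helper name split_well_formed and the PREFIX:digits pattern are assumptions for illustration, not part of Pandasaurus, and passing the check says nothing about whether a term exists or is obsoleted.

```python
import re

# Assumed shape for OBO-style CURIEs such as "CL:0000084": a prefix of letters,
# digits or underscores, a colon, then a numeric local ID. Adjust as needed.
CURIE_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*:\d+")


def split_well_formed(seeds):
    """Return (well_formed, malformed) lists, preserving input order."""
    well_formed, malformed = [], []
    for seed in seeds:
        candidate = seed.strip()  # tolerate stray whitespace around a CURIE
        if CURIE_PATTERN.fullmatch(candidate):
            well_formed.append(candidate)
        else:
            malformed.append(seed)
    return well_formed, malformed


good, bad = split_well_formed(["CL:0000084", " CL:0000787", "CL_0000636", "0000636"])
print(good)  # ['CL:0000084', 'CL:0000787']
print(bad)   # ['CL_0000636', '0000636']
```

Only the well-formed list would then be passed on to CurieValidator, which remains the authority on whether each term is valid.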

Run an Enrichment

Instantiate pandasaurus.query.Query with your validated CURIEs and call an enrichment method, e.g. simple_enrichment():

from pandasaurus.query import Query

query = Query(seeds, force_fail=True)
df = query.simple_enrichment()
print(df.head())

force_fail=True ensures the constructor raises immediately on invalid or obsoleted terms.

Review Obsoleted Terms

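If the seed list contains obsoleted CURIEs, use update_obsoleted_terms() to replace them with their suggested alternatives. Because a Query built with force_fail=True raises on obsoleted terms before this method can be called, construct the Query without force_fail=True in this case: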

query.update_obsoleted_terms()
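
Conceptually, the update amounts to swapping each obsoleted seed for its replacement while leaving the other seeds in place. The plain-Python sketch below illustrates that idea only. The function replace_obsoleted, the replaced_by mapping, and the EX: CURIEs are all hypothetical and do not reflect how Pandasaurus looks up or applies replacements.

```python
def replace_obsoleted(seeds, replaced_by):
    """Swap seeds found in replaced_by for their replacement, keeping order.

    replaced_by is a hypothetical {obsoleted CURIE: replacement CURIE} mapping.
    Duplicates created by the swap are dropped, keeping the first occurrence.
    """
    updated = []
    for seed in seeds:
        term = replaced_by.get(seed, seed)
        if term not in updated:
            updated.append(term)
    return updated


# Made-up CURIEs for illustration only.
example_seeds = ["EX:0000001", "EX:0000002", "EX:0000003"]
example_replaced_by = {"EX:0000002": "EX:0000003"}
print(replace_obsoleted(example_seeds, example_replaced_by))
# ['EX:0000001', 'EX:0000003']
```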

Generate a Graph

Every enrichment populates pandasaurus.query.Query.graph_df. The related graph attribute exposes the result as an rdflib.Graph, and either form can be converted into a NetworkX-compatible graph:

graph = query.graph  # rdflib.Graph after transitive reduction
# or export query.graph_df for plotting
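
For quick inspection or plotting, the rows of graph_df can be turned into a plain adjacency mapping. This is a minimal sketch that assumes each row carries subject and object columns named s and o (check query.graph_df.columns for the actual names). The helper build_adjacency is hypothetical, and the sample rows are made up rather than real enrichment output.

```python
from collections import defaultdict


def build_adjacency(rows, source="s", target="o"):
    """Map each source node to the sorted list of its direct targets.

    rows is an iterable of dicts, e.g. query.graph_df.to_dict("records").
    The column names "s" and "o" are assumptions; pass your own if they differ.
    """
    adjacency = defaultdict(set)
    for row in rows:
        adjacency[row[source]].add(row[target])
    return {node: sorted(targets) for node, targets in adjacency.items()}


# Made-up rows standing in for graph_df records.
rows = [
    {"s": "EX:0000002", "p": "rdfs:subClassOf", "o": "EX:0000001"},
    {"s": "EX:0000003", "p": "rdfs:subClassOf", "o": "EX:0000001"},
    {"s": "EX:0000003", "p": "rdfs:subClassOf", "o": "EX:0000002"},
]
print(build_adjacency(rows))
# {'EX:0000002': ['EX:0000001'], 'EX:0000003': ['EX:0000001', 'EX:0000002']}
```

A dict of lists in this shape is one of the inputs that networkx.DiGraph accepts, so the mapping can serve as a bridge to NetworkX if that library is installed.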

Next Steps