Semantic Similarity Interface
- class oaklib.interfaces.semsim_interface.SemanticSimilarityInterface(resource: ~oaklib.resource.OntologyResource | None = None, strict: bool = False, _multilingual: bool | None = None, autosave: bool = <factory>, exclude_owl_top_and_bottom: bool = <factory>, ontology_metamodel_mapper: ~oaklib.mappers.ontology_metadata_mapper.OntologyMetadataMapper | None = None, _converter: ~curies.api.Converter | None = None, auto_relax_axioms: bool | None = None, cache_lookups: bool = False, property_cache: ~oaklib.utilities.keyval_cache.KeyValCache = <factory>, _edge_index: ~oaklib.indexes.edge_index.EdgeIndex | None = None, _entailed_edge_index: ~oaklib.indexes.edge_index.EdgeIndex | None = None, _prefix_map: ~typing.Mapping[str, str] | None = None)[source]
An interface for calculating similarity measures between pairs of terms or collections of terms
- cached_information_content_map: Dict[str, float] = None
Mapping from term to information content
- most_recent_common_ancestors(subject: str, object: str, predicates: List[str] | None = None, include_owl_thing: bool = True) Iterable[str] [source]
Most recent common ancestors (MRCAs) for a pair of entities
The MRCAs are the set of Common Ancestors (CAs) that are not themselves proper ancestors of another CA
- Parameters:
subject
object
predicates
include_owl_thing
- Returns:
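The MRCA definition above can be illustrated on a toy graph. This is a minimal sketch, not the library's implementation: the `PARENTS` map and the `ancestors` and `mrcas` helpers are hypothetical, and ancestors are assumed to be reflexive (a term counts as its own ancestor).

```python
from typing import Dict, Set

# Hypothetical toy ontology: child -> direct parents (is-a edges only).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "limb": {"root"},
    "forelimb": {"limb"},
    "hindlimb": {"limb"},
    "hand": {"forelimb"},
    "foot": {"hindlimb"},
}


def ancestors(term: str, reflexive: bool = True) -> Set[str]:
    """Return all ancestors of a term, including the term itself when reflexive."""
    seen: Set[str] = {term} if reflexive else set()
    stack = list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def mrcas(subject: str, obj: str) -> Set[str]:
    """Common ancestors that are not proper ancestors of another common ancestor."""
    common = ancestors(subject) & ancestors(obj)
    redundant: Set[str] = set()
    for ca in common:
        # Proper (non-reflexive) ancestors of a CA cannot be "most recent".
        redundant |= ancestors(ca, reflexive=False)
    return common - redundant


print(sorted(mrcas("hand", "foot")))      # ['limb']
print(sorted(mrcas("hand", "forelimb")))  # ['forelimb']
```

In the first call the common ancestors are `limb` and `root`; `root` is a proper ancestor of `limb`, so only `limb` remains.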
- setwise_most_recent_common_ancestors(subjects: List[str], predicates: List[str] | None = None, include_owl_thing: bool = True) Iterable[str] [source]
Most recent common ancestors (MRCAs) for a set of entities
The MRCAs are the set of Common Ancestors (CAs) that are not themselves proper ancestors of another CA
- Parameters:
subjects
predicates
include_owl_thing
- Returns:
- multiset_most_recent_common_ancestors(subjects: List[str], predicates: List[str] | None = None, asymmetric=True) Iterable[Tuple[str, str, str]] [source]
All pairwise most recent common ancestors for all pairs in a set of terms
- Parameters:
subjects
predicates
asymmetric
- Returns:
- common_ancestors(subject: str, object: str, predicates: List[str] | None = None, subject_ancestors: List[str] | None = None, object_ancestors: List[str] | None = None, include_owl_thing: bool = True) Iterable[str] [source]
Common ancestors of a subject-object pair
- Parameters:
subject
object
predicates
subject_ancestors – optional pre-generated ancestor list
object_ancestors – optional pre-generated ancestor list
include_owl_thing
- Returns:
- load_information_content_scores(source: str) None [source]
Load term information content values from file
- Parameters:
source
- Returns:
- set_information_content_scores(scores: Iterable[Tuple[str, float]]) None [source]
Set term information content values from an iterable of (term, score) pairs
- Parameters:
scores
- Returns:
- get_information_content(curie: str, predicates: List[str] | None = None) float | None [source]
Returns the information content of a term.
IC(t) = -log2(Pr(t))
- Parameters:
curie
predicates
- Returns:
- information_content_scores(curies: Iterable[str] | None = None, predicates: List[str] | None = None, object_closure_predicates: List[str] | None = None, use_associations: bool | None = None, term_to_entities_map: Dict[str, List[str]] | None = None, **kwargs) Iterator[Tuple[str, float]] [source]
Yields entity-score pairs for a given collection of entities.
The Information Content (IC) score for a term t is determined by:
IC(t) = -log2(Pr(t))
Where the probability Pr(t) is determined by the frequency of that term against the whole corpus:
Pr(t) = freq(t)/|items|
- Parameters:
curies
predicates
object_closure_predicates
use_associations
term_to_entities_map
kwargs
- Returns:
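As a worked example of the two formulas above, the following sketch computes IC from a toy corpus. The `TERM_TO_ENTITIES` map and the `ic_scores` helper are hypothetical; the sketch assumes |items| is the number of distinct entities in the corpus and that each term's entity list already includes entities inherited from its descendants.

```python
import math
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical corpus: term -> entities annotated to it (closure already applied).
TERM_TO_ENTITIES: Dict[str, List[str]] = {
    "root": ["g1", "g2", "g3", "g4"],
    "limb": ["g1", "g2"],
    "hand": ["g1"],
}


def ic_scores(term_to_entities: Dict[str, List[str]]) -> Iterator[Tuple[str, float]]:
    """Yield (term, IC) pairs, with IC(t) = -log2(freq(t) / |items|)."""
    all_items: Set[str] = set()
    for entities in term_to_entities.values():
        all_items.update(entities)
    total = len(all_items)
    for term, entities in term_to_entities.items():
        freq = len(set(entities))
        if freq == 0:
            continue  # Pr(t) = 0 has no finite IC; skipped in this sketch
        # log2(total / freq) equals -log2(freq / total) and avoids printing -0.0
        yield term, math.log2(total / freq)


scores = dict(ic_scores(TERM_TO_ENTITIES))
print(scores)  # {'root': 0.0, 'limb': 1.0, 'hand': 2.0}
```

The root term annotates every entity, so Pr(root) = 1 and its IC is 0; rarer terms score higher.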
- pairwise_similarity(subject: str, object: str, predicates: List[str] | None = None, subject_ancestors: List[str] | None = None, object_ancestors: List[str] | None = None, min_jaccard_similarity: float | None = None, min_ancestor_information_content: float | None = None) TermPairwiseSimilarity | None [source]
Pairwise similarity between a pair of ontology terms
- Parameters:
subject
object
predicates
subject_ancestors – optional pre-generated ancestor list
object_ancestors – optional pre-generated ancestor list
min_jaccard_similarity – minimum Jaccard similarity for a pair to be considered
min_ancestor_information_content – minimum IC for a common ancestor to be considered
- Returns:
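The two thresholds can be read as filters on a pair. The sketch below is an assumption about how such filters might behave, not the library's code: it computes Jaccard similarity over ancestor sets, picks the common ancestor with the highest IC, and returns `None` when either minimum is not met. `ANCESTORS`, `IC`, and `toy_similarity` are hypothetical names, and the IC values are made up for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical reflexive ancestor sets and IC values for a toy ontology.
ANCESTORS: Dict[str, Set[str]] = {
    "hand": {"hand", "forelimb", "limb", "root"},
    "foot": {"foot", "hindlimb", "limb", "root"},
}
IC: Dict[str, float] = {
    "root": 0.0, "limb": 1.0, "forelimb": 2.0,
    "hindlimb": 2.0, "hand": 3.0, "foot": 3.0,
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def toy_similarity(
    subject: str,
    obj: str,
    min_jaccard: Optional[float] = None,
    min_ancestor_ic: Optional[float] = None,
) -> Optional[Tuple[float, str, float]]:
    """Return (jaccard, best common ancestor, its IC), or None if filtered out."""
    sa, oa = ANCESTORS[subject], ANCESTORS[obj]
    score = jaccard(sa, oa)
    if min_jaccard is not None and score < min_jaccard:
        return None
    common = sa & oa
    if not common:
        return None
    # sorted() makes the choice deterministic if two ancestors tie on IC
    best = max(sorted(common), key=lambda t: IC[t])
    if min_ancestor_ic is not None and IC[best] < min_ancestor_ic:
        return None
    return score, best, IC[best]


print(toy_similarity("hand", "foot"))                   # Jaccard 2/6, ancestor 'limb'
print(toy_similarity("hand", "foot", min_jaccard=0.5))  # None
```

Here `hand` and `foot` share two of six ancestors, so the pair is dropped once the Jaccard minimum is raised to 0.5.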
- all_by_all_pairwise_similarity(subjects: Iterable[str], objects: Iterable[str], predicates: List[str] | None = None, min_jaccard_similarity: float | None = None, min_ancestor_information_content: float | None = None) Iterator[TermPairwiseSimilarity] [source]
Compute similarity for all combinations of terms in subjects vs all terms in objects
- Parameters:
subjects
objects
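predicates
min_jaccard_similarity – minimum Jaccard similarity for a pair to be considered
min_ancestor_information_content – minimum IC for a common ancestor to be considered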
- Returns:
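The all-by-all pattern amounts to iterating over the Cartesian product of subjects and objects and yielding results lazily. A minimal sketch under stated assumptions: `ANCESTORS` and `all_by_all` are hypothetical, and Jaccard similarity over ancestor sets stands in for the full similarity object.

```python
from itertools import product
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

# Hypothetical reflexive ancestor sets for a toy ontology.
ANCESTORS: Dict[str, Set[str]] = {
    "hand": {"hand", "forelimb", "limb", "root"},
    "forelimb": {"forelimb", "limb", "root"},
    "foot": {"foot", "hindlimb", "limb", "root"},
}


def all_by_all(
    subjects: Iterable[str],
    objects: Iterable[str],
    min_jaccard: Optional[float] = None,
) -> Iterator[Tuple[str, str, float]]:
    """Yield (subject, object, jaccard) for every pair that passes the filter."""
    for s, o in product(subjects, objects):
        a, b = ANCESTORS[s], ANCESTORS[o]
        score = len(a & b) / len(a | b)
        if min_jaccard is not None and score < min_jaccard:
            continue  # pair is below the threshold, so it is not reported
        yield s, o, score


results = list(all_by_all(["hand", "forelimb"], ["hand", "foot"], min_jaccard=0.5))
print(results)  # [('hand', 'hand', 1.0), ('forelimb', 'hand', 0.75)]
```

Four pairs are evaluated, and the two involving `foot` fall below the 0.5 threshold.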