Semantic Similarity Interface

class oaklib.interfaces.semsim_interface.SemanticSimilarityInterface(resource: oaklib.resource.OntologyResource | None = None, strict: bool = False, _multilingual: bool | None = None, autosave: bool = <factory>, exclude_owl_top_and_bottom: bool = <factory>, ontology_metamodel_mapper: oaklib.mappers.ontology_metadata_mapper.OntologyMetadataMapper | None = None, _converter: curies.api.Converter | None = None, auto_relax_axioms: bool | None = None, cache_lookups: bool = False, property_cache: oaklib.utilities.keyval_cache.KeyValCache = <factory>, _edge_index: oaklib.indexes.edge_index.EdgeIndex | None = None, _entailed_edge_index: oaklib.indexes.edge_index.EdgeIndex | None = None, _prefix_map: typing.Mapping[str, str] | None = None)

An interface for calculating similarity measures between pairs of terms or collections of terms

cached_information_content_map: Dict[str, float] = None

Mapping from term to information content

most_recent_common_ancestors(subject: str, object: str, predicates: List[str] | None = None, include_owl_thing: bool = True) -> Iterable[str]

Most recent common ancestors (MRCAs) for a pair of entities

The MRCAs are the set of Common Ancestors (CAs) that are not themselves proper ancestors of another CA
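The definition above can be illustrated on a toy hierarchy. The following is a minimal sketch in plain Python, not the oaklib implementation: the PARENTS mapping, the term names, and the helpers ancestors and mrcas are all hypothetical. It also assumes reflexive ancestors (a term counts as its own ancestor) and ignores the predicates and include_owl_thing options.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: child -> direct parents (not real ontology data).
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "organ": ["root"],
    "limb": ["root"],
    "forelimb": ["limb"],
    "hindlimb": ["limb"],
    "hand": ["forelimb"],
    "foot": ["hindlimb"],
}


def ancestors(term: str) -> Set[str]:
    """Reflexive ancestors of a term (assumption: a term is its own ancestor)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def mrcas(subject: str, obj: str) -> Set[str]:
    """Common ancestors that are not proper ancestors of another common ancestor."""
    common = ancestors(subject) & ancestors(obj)
    # Collect every proper ancestor of any common ancestor, then discard them.
    redundant = {a for ca in common for a in ancestors(ca) - {ca}}
    return common - redundant


print(sorted(ancestors("hand") & ancestors("foot")))  # all common ancestors
print(sorted(mrcas("hand", "foot")))  # only the most recent ones
```

In this toy data "hand" and "foot" share both "limb" and "root", but "root" is a proper ancestor of "limb", so only "limb" survives the filter.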

Parameters:
  • subject

  • object

  • predicates

  • include_owl_thing

Returns:

setwise_most_recent_common_ancestors(subjects: List[str], predicates: List[str] | None = None, include_owl_thing: bool = True) -> Iterable[str]

Most recent common ancestors (MRCAs) for a set of entities

The MRCAs are the set of Common Ancestors (CAs) that are not themselves proper ancestors of another CA
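For more than two entities, the same idea applies to the intersection of all ancestor sets. The sketch below is illustrative only and assumes the reflexive ancestor sets have already been computed. The ANCESTORS table and the setwise_mrcas helper are hypothetical names, not part of oaklib.

```python
from functools import reduce
from typing import Dict, FrozenSet, List, Set

# Hypothetical precomputed reflexive ancestor sets (illustrative data only).
ANCESTORS: Dict[str, FrozenSet[str]] = {
    "root": frozenset({"root"}),
    "limb": frozenset({"limb", "root"}),
    "forelimb": frozenset({"forelimb", "limb", "root"}),
    "hand": frozenset({"hand", "forelimb", "limb", "root"}),
    "finger": frozenset({"finger", "hand", "forelimb", "limb", "root"}),
    "foot": frozenset({"foot", "limb", "root"}),
}


def setwise_mrcas(subjects: List[str]) -> Set[str]:
    """MRCAs shared by every subject, using the precomputed ancestor sets."""
    if not subjects:
        return set()
    first, *rest = subjects
    # Ancestors common to all subjects: fold set intersection over the list.
    common = reduce(lambda acc, s: acc & ANCESTORS[s], rest, ANCESTORS[first])
    # Keep a common ancestor only if it is not a proper ancestor of another one.
    return {
        ca
        for ca in common
        if not any(ca != other and ca in ANCESTORS[other] for other in common)
    }


print(setwise_mrcas(["hand", "finger"]))
print(setwise_mrcas(["hand", "finger", "foot"]))
```

Adding a more distantly related subject can only shrink the set of common ancestors, so the MRCAs move towards the root.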

Parameters:
  • subjects

  • predicates

  • include_owl_thing

Returns:

multiset_most_recent_common_ancestors(subjects: List[str], predicates: List[str] | None = None, asymmetric=True) -> Iterable[Tuple[str, str, str]]

Pairwise most recent common ancestors (MRCAs) for all pairs in a set of terms

Parameters:
  • subjects

  • predicates

  • asymmetric

Returns:

common_ancestors(subject: str, object: str, predicates: List[str] | None = None, subject_ancestors: List[str] | None = None, object_ancestors: List[str] | None = None, include_owl_thing: bool = True) -> Iterable[str]

Common ancestors of a subject-object pair

Parameters:
  • subject

  • object

  • predicates

  • subject_ancestors – optional pre-generated ancestor list

  • object_ancestors – optional pre-generated ancestor list

  • include_owl_thing

Returns:

load_information_content_scores(source: str) -> None

Load term information content values from file

Parameters:

source

Returns:

set_information_content_scores(scores: Iterable[Tuple[str, float]]) -> None

Set term information content values from an iterable of (term, score) pairs

Parameters:

scores

Returns:

get_information_content(curie: str, predicates: List[str] | None = None) -> float | None

Returns the information content of a term.

IC(t) = -log2(Pr(t))

Parameters:
  • curie

  • predicates

Returns:

information_content_scores(curies: Iterable[str] | None = None, predicates: List[str] | None = None, object_closure_predicates: List[str] | None = None, use_associations: bool | None = None, term_to_entities_map: Dict[str, List[str]] | None = None, **kwargs) -> Iterator[Tuple[str, float]]

Yields entity-score pairs for a given collection of entities.

The Information Content (IC) score for a term t is determined by:

IC(t) = -log2(Pr(t))

Where the probability Pr(t) is determined by the frequency of that term against the whole corpus:

Pr(t) = freq(t)/|items|
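As a worked illustration of the two formulas, the sketch below computes IC over a small made-up corpus. It is not the oaklib implementation: the ITEM_TERMS data and the information_content helper are hypothetical, each item is assumed to already list its terms together with their ancestors, and returning None for a term that never occurs is an assumption made for this example.

```python
import math
from typing import Dict, List, Optional

# Hypothetical corpus: item -> terms annotated to it, ancestors already included.
ITEM_TERMS: Dict[str, List[str]] = {
    "item1": ["root", "limb", "hand"],
    "item2": ["root", "limb", "foot"],
    "item3": ["root", "limb"],
    "item4": ["root"],
}


def information_content(term: str) -> Optional[float]:
    """IC(t) = -log2(freq(t) / |items|); None if the term never occurs."""
    freq = sum(1 for terms in ITEM_TERMS.values() if term in terms)
    if freq == 0:
        return None  # Pr(t) = 0 has no finite IC (assumed handling)
    # log2(|items| / freq) equals -log2(freq / |items|) and avoids printing -0.0.
    return math.log2(len(ITEM_TERMS) / freq)


for term in ["root", "limb", "hand"]:
    print(term, information_content(term))
```

A term that annotates every item ("root" here) has Pr(t) = 1 and therefore an IC of 0, while rarer terms score higher.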

Parameters:
  • curies

  • predicates

  • object_closure_predicates

  • use_associations

  • term_to_entities_map

  • kwargs

Returns:

pairwise_similarity(subject: str, object: str, predicates: List[str] | None = None, subject_ancestors: List[str] | None = None, object_ancestors: List[str] | None = None, min_jaccard_similarity: float | None = None, min_ancestor_information_content: float | None = None) -> TermPairwiseSimilarity | None

Pairwise similarity between a pair of ontology terms

Parameters:
  • subject

  • object

  • predicates

  • subject_ancestors – optional pre-generated ancestor list

  • object_ancestors – optional pre-generated ancestor list

  • min_jaccard_similarity – minimum Jaccard similarity for a pair to be considered

  • min_ancestor_information_content – minimum IC for a common ancestor to be considered
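The effect of a Jaccard threshold can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes the Jaccard similarity is taken over the two ancestor sets, the helper names jaccard and filtered_jaccard are hypothetical, and the real method returns a TermPairwiseSimilarity object rather than a bare float.

```python
from typing import Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filtered_jaccard(
    subject_ancestors: Set[str],
    object_ancestors: Set[str],
    min_jaccard_similarity: Optional[float] = None,
) -> Optional[float]:
    """Return the Jaccard score, or None when it falls below the threshold."""
    score = jaccard(subject_ancestors, object_ancestors)
    if min_jaccard_similarity is not None and score < min_jaccard_similarity:
        return None  # pair is not considered (assumed behaviour)
    return score


# Hypothetical pre-generated ancestor sets for two terms.
hand = {"hand", "forelimb", "limb", "root"}
foot = {"foot", "hindlimb", "limb", "root"}
print(filtered_jaccard(hand, foot))
print(filtered_jaccard(hand, foot, min_jaccard_similarity=0.5))
```

The two sets share 2 of 6 distinct ancestors, so the pair passes without a threshold but is dropped when the minimum is 0.5.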

Returns:

all_by_all_pairwise_similarity(subjects: Iterable[str], objects: Iterable[str], predicates: List[str] | None = None, min_jaccard_similarity: float | None = None, min_ancestor_information_content: float | None = None) -> Iterator[TermPairwiseSimilarity]

Compute similarity for all combinations of terms in subjects vs all terms in objects

Parameters:
  • subjects

  • objects

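  • predicates

  • min_jaccard_similarity – minimum Jaccard similarity for a pair to be considered

  • min_ancestor_information_content – minimum IC for a common ancestor to be considered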

Returns: