cluster

A cluster is an ordered set of hits related to a model which satisfy the model distance constraints.

cluster API reference

cluster

class macsypy.cluster.Cluster(hits: list[CoreHit | ModelHit], model, hit_weights)

Handles the hits, relative to a model, that collocate.

__contains__(m_hit: ModelHit) → bool
Parameters:

m_hit – The hit to test

Returns:

True if the hit is in the cluster hits, False otherwise
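As a rough illustration of this membership test, the sketch below uses a hypothetical stand-in class (SketchCluster, not part of macsypy) and plain strings in place of ModelHit objects. It only assumes that membership is checked against the stored hits.

```python
class SketchCluster:
    """Hypothetical stand-in for macsypy.cluster.Cluster (illustration only)."""

    def __init__(self, hits):
        self.hits = list(hits)

    def __contains__(self, m_hit):
        # Membership is delegated to the underlying list of hits
        return m_hit in self.hits


# Strings stand in for ModelHit objects
cluster = SketchCluster(["hit_a", "hit_b"])
in_cluster = "hit_a" in cluster
not_in_cluster = "hit_z" in cluster
```

With such a method defined, the usual Python idiom `hit in cluster` can be used directly.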

__init__(hits: list[CoreHit | ModelHit], model, hit_weights) → None
Parameters:
  • hits – the hits constituting this cluster

  • model – the model associated to this cluster

  • hit_weights – the weights of the hits, used to compute the score

__str__() → str
Returns:

a string representation of this cluster

__weakref__

list of weak references to the object (if defined)

_check_replicon_consistency() → None
Raises:

MacsypyError – if the hits of the cluster are not all related to the same replicon
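The check described above can be sketched as follows. This is a minimal illustration, not the library code: SketchError is a hypothetical stand-in for MacsypyError, and the hits are reduced to their replicon names.

```python
class SketchError(Exception):
    """Hypothetical stand-in for MacsypyError (illustration only)."""


def check_replicon_consistency(replicon_names):
    """Raise SketchError unless all hits lie on the same replicon.

    replicon_names: one replicon name per hit of the cluster.
    """
    if len(set(replicon_names)) > 1:
        raise SketchError("hits of a cluster must come from the same replicon")


# Consistent hits pass silently
check_replicon_consistency(["chromosome_1", "chromosome_1"])

# Mixed replicons are rejected
try:
    check_replicon_consistency(["chromosome_1", "plasmid_1"])
    rejected = False
except SketchError:
    rejected = True
```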

fulfilled_function(*genes: ModelGene | str) → frozenset[str]
Parameters:

genes – The genes which must be tested.

Returns:

the common functions between genes and this cluster.
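One way to picture the "common functions" is as a set intersection. The sketch below assumes genes are given as plain names (the real method also accepts ModelGene objects) and passes the cluster functions explicitly instead of reading them from a Cluster instance.

```python
def fulfilled_function(cluster_functions, *genes):
    """Return the functions shared by the given gene names and the cluster.

    cluster_functions: frozenset of function names encoded by the cluster.
    genes: gene names to test (plain strings in this sketch).
    """
    return frozenset(genes) & cluster_functions


cluster_functions = frozenset({"a", "b"})
common = fulfilled_function(cluster_functions, "b", "d")
nothing = fulfilled_function(cluster_functions, "x")
```

An empty frozenset is falsy, so the result can be used directly in a condition.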

property functions: frozenset[str]
Returns:

The set of functions encoded by this cluster. Here, "function" means the gene name or, for exchangeable genes, the name of the reference gene. For instance, given the model:

    <model vers="2.0">
        <gene a presence="mandatory"/>
        <gene b presence="accessory">
            <exchangeable>
                <gene c />
            </exchangeable>
        </gene>
    </model>

the functions for a cluster corresponding to this model will be {'a', 'b'}.
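The model example above can be mimicked with a small mapping from each exchangeable gene to its reference gene. This is only a sketch under that assumption: the dictionary and the helper name are hypothetical, and the real property works on hit objects rather than gene names.

```python
# Exchangeable gene -> reference gene, following the model example above
reference_of = {"c": "b"}


def functions(hit_gene_names):
    """Map each gene name to its function (its reference gene, if any)."""
    return frozenset(reference_of.get(name, name) for name in hit_gene_names)


# A cluster with hits for genes 'a' and 'c' encodes functions 'a' and 'b'
encoded = functions(["a", "c"])
```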

property hit_weights: HitWeight
Returns:

the different weight for the hits used to compute the score

property loner: bool
Returns:

True if this cluster is made only of hits representing the same gene and this gene is tagged as loner. False otherwise, that is, if the cluster:

  • contains several hits coding for different genes

  • contains one hit but the gene is not tagged as loner (max_gene_required = 1)
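The loner condition can be sketched as below. SketchHit and is_loner_cluster are hypothetical names introduced for illustration; the sketch assumes each hit exposes its gene name and whether that gene is tagged as loner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchHit:
    """Hypothetical, minimal stand-in for a hit (illustration only)."""
    gene_name: str
    is_loner: bool


def is_loner_cluster(hits):
    """True if all hits represent the same gene and that gene is a loner."""
    gene_names = {hit.gene_name for hit in hits}
    return len(gene_names) == 1 and all(hit.is_loner for hit in hits)


same_loner_gene = is_loner_cluster([SketchHit("a", True), SketchHit("a", True)])
different_genes = is_loner_cluster([SketchHit("a", True), SketchHit("b", True)])
single_not_loner = is_loner_cluster([SketchHit("a", False)])
```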

merge(cluster: Cluster, before: bool = False) → None

Merge the given cluster into this one, in place.

Parameters:
  • cluster – the cluster to merge into this one

  • before (bool) – If False, the hits of the given cluster are appended after the hits of this one; otherwise they are inserted before them.

Raises:

MacsypyError – if the two clusters do not have the same model
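The effect of the before flag can be illustrated with a hypothetical stand-in class. In this sketch, strings stand in for hits and models, and ValueError stands in for MacsypyError.

```python
class SketchCluster:
    """Hypothetical stand-in for macsypy.cluster.Cluster (illustration only)."""

    def __init__(self, hits, model):
        self.hits = list(hits)
        self.model = model

    def merge(self, cluster, before=False):
        """Merge the hits of `cluster` into this one, in place."""
        if cluster.model != self.model:
            # The real method raises MacsypyError here
            raise ValueError("cannot merge clusters from different models")
        if before:
            self.hits = cluster.hits + self.hits
        else:
            self.hits.extend(cluster.hits)


appended = SketchCluster(["h1", "h2"], model="T2SS")
appended.merge(SketchCluster(["h3"], model="T2SS"))

prepended = SketchCluster(["h1", "h2"], model="T2SS")
prepended.merge(SketchCluster(["h0"], model="T2SS"), before=True)
```

The method returns None; only the hits of the receiving cluster change.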

property multi_system: bool
Returns:

True if this cluster is made of only one hit representing a multi_system gene. False otherwise, that is, if the cluster:

  • contains several hits

  • contains one hit but the gene is not tagged as multi_system

replace(old: ModelHit, new: ModelHit) → None

Replace the hit old in this cluster with the hit new, in place.

Parameters:
  • old – the hit to replace

  • new – the new hit

Returns:

None
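An in-place replacement that keeps the position of the hit can be sketched on a plain list. This is an assumption about the behaviour, not the library code; in this sketch a missing old hit raises ValueError (from list.index).

```python
def replace_hit(hits, old, new):
    """Replace `old` by `new` in the list `hits`, keeping its position."""
    index = hits.index(old)  # raises ValueError if `old` is absent
    hits[index] = new


hits = ["h1", "h2", "h3"]  # strings stand in for ModelHit objects
result = replace_hit(hits, "h2", "h2_bis")
```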

property replicon_name: str
Returns:

The name of the replicon where this cluster is located

Return type:

str

property score: float
Returns:

The score for this cluster

build_clusters

macsypy.cluster.build_clusters(hits: list[ModelHit], rep_info: RepliconInfo, model: Model, hit_weights: HitWeight) → tuple[list[Cluster], dict[str, macsypy.hit.Loner | macsypy.hit.LonerMultiSystem]]

From a list of filtered hits and the replicon information (topology, length), build all the lists of hits that satisfy the constraints:

  • max_gene_inter_space

  • loner

  • multi_system

Each list that satisfies them becomes a cluster. A cluster contains at least two hits separated by at most max_gene_inter_space, except for loner genes, which are allowed to be alone in a cluster.

Parameters:
  • hits – list of filtered hits

  • rep_info – the replicon to analyse

  • model – the model to study

  • hit_weights – the hit weight needed to compute the cluster score

Returns:

the list of regular clusters and the special clusters (loners not in a cluster, and multi systems)

Return type:

tuple with 2 elements:

  • true_clusters: a list of Cluster objects

  • true_loners: a dict {str function: macsypy.hit.Loner | macsypy.hit.LonerMultiSystem object}
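The distance constraint used by build_clusters can be pictured with a simplified grouping of hit positions. The sketch below is not the library algorithm. It assumes a linear replicon, measures spacing as the number of genes lying between two consecutive hits, and ignores multi_system genes; the function name and the loner_positions parameter are hypothetical.

```python
def group_by_spacing(positions, max_gene_inter_space, loner_positions=frozenset()):
    """Group hit positions into clusters according to inter-gene spacing.

    positions: gene positions of the hits on a linear replicon.
    max_gene_inter_space: maximum number of genes allowed between two
        consecutive hits of the same cluster.
    loner_positions: positions of hits whose gene may form a cluster alone.
    """
    groups = []
    current = []
    for pos in sorted(positions):
        # Start a new group when the gap to the previous hit is too large
        if current and pos - current[-1] - 1 > max_gene_inter_space:
            groups.append(current)
            current = []
        current.append(pos)
    if current:
        groups.append(current)
    # Keep groups of at least two hits, plus isolated loners
    return [g for g in groups if len(g) >= 2 or g[0] in loner_positions]


clusters = group_by_spacing(
    [1, 2, 10, 11, 12, 30, 50],
    max_gene_inter_space=2,
    loner_positions={50},
)
```

In this example the isolated hit at position 30 is discarded, while the isolated hit at position 50 is kept because it is declared as a loner.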