hit

This module implements the classes related to hits and some functions that perform computations on hit objects.

macsypy.hit.CoreHit

Models an HMM hit on the replicon. There is only one CoreHit for a CoreGene.

macsypy.hit.ModelHit

Models a hit and its relation to the Model.

macsypy.hit.AbstractCounterpartHit

Parent class of Loner and MultiSystem. It inherits from ModelHit.

macsypy.hit.Loner

Models a “true” Loner.

macsypy.hit.MultiSystem

Models a hit that can be used in several Systems (of the same model).

macsypy.hit.LonerMultiSystem

Models a hit representing a gene that is both Loner and MultiSystem.

macsypy.hit.HitWeight

The weights applied to the hits to compute the score.

macsypy.hit.get_best_hit_4_func()

Return the best hit for a given function

macsypy.hit.sort_model_hits()

Sort hits

macsypy.hit.compute_best_MSHit()

Choose the best one among several multisystem hits.

macsypy.hit.get_best_hits()

If several profiles hit the same gene, return the best hit.

A Hit is created when hmmsearch finds similarities between a profile and a protein of the input dataset.

Below is the inheritance diagram of the Hit classes.

Inheritance diagram of macsypy.hit.CoreHit, macsypy.hit.ModelHit, macsypy.hit.AbstractCounterpartHit, macsypy.hit.Loner, macsypy.hit.MultiSystem, macsypy.hit.LonerMultiSystem

And a diagram showing the interactions between CoreGene, ModelGene, Model, Hit, Loner, …

../../_images/gene_obj_interaction.svg

The diagram above represents the models, genes and hits generated from the definitions below.

<model name="A" inter_gene_max_space="2">
    <gene name="abc" presence="mandatory"/>
    <gene name="def" presence="accessory"/>
</model>

<model name="B" inter_gene_max_space="5">
    <gene name="def" presence="mandatory">
        <exchangeables>
            <gene name="abc"/>
        </exchangeables>
    </gene>
    <gene name="ghj" presence="accessory"/>
</model>
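
As a rough illustration of how such a definition maps genes to the functions they can fulfil, the sketch below parses model B with the Python standard library. It assumes that the exchangeables element is nested inside the gene it can replace; the `functions` dictionary is a hypothetical helper structure, not part of macsypy.

```python
import xml.etree.ElementTree as ET

# Model B from the definitions above, written with <exchangeables>
# nested inside the gene it can replace (assumption of this sketch).
MODEL_B = """
<model name="B" inter_gene_max_space="5">
    <gene name="def" presence="mandatory">
        <exchangeables>
            <gene name="abc"/>
        </exchangeables>
    </gene>
    <gene name="ghj" presence="accessory"/>
</model>
""".strip()

root = ET.fromstring(MODEL_B)

# Map each function (named after the reference gene) to every gene
# that can fulfil it: the gene itself, then its exchangeables.
functions = {}
for gene in root.findall("gene"):
    alternates = [g.get("name") for g in gene.findall("exchangeables/gene")]
    functions[gene.get("name")] = [gene.get("name")] + alternates

print(functions)
```

In this reading, a hit on gene abc can play the role of def in model B, while in model A it simply stands for abc itself.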

hit API reference

CoreHit

class macsypy.hit.CoreHit(gene: CoreGene, hit_id: str, hit_seq_length: int, replicon_name: str, position_hit: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)

Handles the hits filtered from the Hmmer search. The hits are instantiated by the HMMReport.extract() method. In one run of MacSyFinder, there exists only one CoreHit per gene. These hits are independent of any macsypy.model.Model instance.

__eq__(other: CoreHit) → bool

Return True if two hits are totally equivalent, False otherwise.

Parameters:

other – the hit to compare to the current object

Returns:

the result of the comparison

__gt__(other: CoreHit) → bool

Compare two Hits. If the sequence identifier is the same, the comparison is done on the score. Otherwise, it is an alphabetical comparison of the sequence identifiers.

Parameters:

other – the hit to compare to the current object

Returns:

True if self is > other, False otherwise

__hash__() → int

Makes the object hashable, which is required to put it in a set or use it as a dict key.

__init__(gene: CoreGene, hit_id: str, hit_seq_length: int, replicon_name: str, position_hit: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) → None
Parameters:
  • gene – the gene corresponding to this profile

  • hit_id – the identifier of the hit

  • hit_seq_length – the length of the hit sequence

  • replicon_name – the name of the replicon

  • position_hit – the rank of the sequence matched in the input dataset file

  • i_eval – the best-domain evalue (i-evalue, “independent evalue”)

  • score – the score of the hit

  • profile_coverage – percentage of the profile that matches the hit sequence

  • sequence_coverage – percentage of the hit sequence that matches the profile

  • begin_match – the position in the sequence where the match with the profile starts

  • end_match – the position in the sequence where the match with the profile ends

__lt__(other: CoreHit) → bool

Compare two Hits. If the sequence identifier is the same, the comparison is done on the score. Otherwise, it is an alphabetical comparison of the sequence identifiers.

Parameters:

other – the hit to compare to the current object

Returns:

True if self is < other, False otherwise
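
The ordering rule described for `__lt__` and `__gt__` can be sketched with a small stand-in class. `SketchHit` is hypothetical and keeps only the two fields involved in the comparison; it is not the actual CoreHit implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen + eq gives value equality and hashing
class SketchHit:
    """Hypothetical stand-in for CoreHit with only the compared fields."""
    hit_id: str    # sequence identifier
    score: float   # hit score

    def __lt__(self, other: "SketchHit") -> bool:
        # Same sequence identifier: compare on the score.
        if self.hit_id == other.hit_id:
            return self.score < other.score
        # Different identifiers: alphabetical comparison.
        return self.hit_id < other.hit_id

    def __gt__(self, other: "SketchHit") -> bool:
        if self.hit_id == other.hit_id:
            return self.score > other.score
        return self.hit_id > other.hit_id


hits = [
    SketchHit("prot_B", 10.0),
    SketchHit("prot_A", 50.0),
    SketchHit("prot_A", 20.0),
]
ordered = sorted(hits)  # sorted() relies on __lt__
print([(h.hit_id, h.score) for h in ordered])
```

With this rule, hits sharing a sequence identifier end up adjacent and ranked by score, which makes it straightforward to pick the best hit per sequence afterwards.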

__str__() → str
Returns:

Useful information on the CoreHit: regarding Hmmer statistics, and sequence information

Return type:

str

__weakref__

list of weak references to the object (if defined)

get_position() → int
Returns:

the position of the hit (rank in the input dataset file)

ModelHit

class macsypy.hit.ModelHit(hit: CoreHit, gene_ref: ModelGene, gene_status: GeneStatus)

Encapsulates a macsypy.hit.CoreHit. This class stores a CoreHit that has been attributed to a putative system. Thus, it also stores:

  • the system,

  • the status of the gene in this system (‘mandatory’, ‘accessory’, …),

  • the gene in the model for which it is an occurrence.

For one gene, several ModelHit instances can exist, one for each Model containing this gene.
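
The relation between one model-independent hit and its per-model wrappers can be pictured as follows. This is a minimal sketch with hypothetical stand-in classes (`SketchCoreHit`, `SketchModelHit`); the real classes take CoreGene, ModelGene and GeneStatus objects rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchCoreHit:
    """Hypothetical stand-in for a model-independent hit."""
    hit_id: str
    score: float


@dataclass(frozen=True)
class SketchModelHit:
    """Hypothetical stand-in: a core hit seen in the context of one model."""
    hit: SketchCoreHit   # the wrapped, model-independent hit
    model_name: str      # stands in for the ModelGene / Model link
    gene_status: str     # e.g. "mandatory" or "accessory"


# One hit on gene "def", which is accessory in model A and mandatory in model B.
core = SketchCoreHit("prot_42", 87.5)
as_model_a = SketchModelHit(core, "A", "accessory")
as_model_b = SketchModelHit(core, "B", "mandatory")

# Both wrappers share the very same underlying hit object.
print(as_model_a.hit is as_model_b.hit)
```

Keeping the Hmmer statistics in a single shared object and the model-specific information in a thin wrapper avoids duplicating the hit for each model that uses the gene.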

__eq__(other: ModelHit) → bool

Return self==value.

__gt__(other: ModelHit) → bool

Return self>value.

__hash__() → int

Makes the object hashable, which is required to put it in a set or use it as a dict key.

__init__(hit: CoreHit, gene_ref: ModelGene, gene_status: GeneStatus) → None
Parameters:
  • hit – a match between a hmm profile and a replicon

  • gene_ref

    The ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (in several Models). To know which gene this hit plays the role of, use macsypy.gene.ModelGene.alternate_of():

    hit.gene_ref.alternate_of()
    

  • gene_status

__lt__(other: ModelHit) → bool

Return self<value.

__str__() → str

Return str(self).

__weakref__

list of weak references to the object (if defined)

property hit: CoreHit
Returns:

The CoreHit below this ModelHit

property loner: bool
Returns:

True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit representing a gene with the loner attribute that is not included in a cluster.

  • a hit representing a loner gene but included in a cluster is not a true Loner

  • a hit that is not included in a cluster with other genes but does not represent a loner gene is not a true Loner (this situation may happen when min_genes_required = 1)
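
The two conditions above combine into a simple rule, sketched here as a standalone function. `is_true_loner` and its boolean parameters are hypothetical names used for illustration only; they are not part of the macsypy API.

```python
def is_true_loner(gene_is_loner: bool, in_cluster: bool) -> bool:
    """Return True only for a hit on a loner gene that sits outside any cluster."""
    return gene_is_loner and not in_cluster


# The four possible situations:
cases = {
    "loner gene, outside a cluster": is_true_loner(True, False),
    "loner gene, inside a cluster": is_true_loner(True, True),
    "regular gene, alone (min_genes_required = 1)": is_true_loner(False, False),
    "regular gene, inside a cluster": is_true_loner(False, True),
}
for label, result in cases.items():
    print(f"{label}: {result}")
```

Only the first case qualifies as a true Loner.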

property multi_model: bool
Returns:

True if the hit represents a multi_model macsypy.gene.ModelGene, False otherwise.

property multi_system: bool
Returns:

True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.

AbstractCounterpartHit

class macsypy.hit.AbstractCounterpartHit(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: set[ModelHit] | None = None)

Abstract class to handle ModelHits that have equivalents, for instance Loner or MultiSystem hits.

__init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: set[ModelHit] | None = None) → None
Parameters:
  • hit – a match between a hmm profile and a replicon

  • gene_ref

    The ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (in several Models). To know which gene this hit plays the role of, use macsypy.gene.ModelGene.alternate_of():

    hit.gene_ref.alternate_of()
    

  • gene_status

__str__() → str

Return str(self).

property counterpart: set[ModelHit]
Returns:

The set of hits that can play the same role

property loner: bool
Returns:

True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit representing a gene with the loner attribute that is not included in a cluster.

  • a hit representing a loner gene but included in a cluster is not a true Loner

  • a hit that is not included in a cluster with other genes but does not represent a loner gene is not a true Loner (this situation may happen when min_genes_required = 1)

property multi_system: bool
Returns:

True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.

Loner

class macsypy.hit.Loner(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)

Handles a hit that encodes a gene tagged as loner and that does not cluster with other hits.

__init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None) → None

A hit that is outside a cluster; the gene_ref is a loner.

Parameters:
  • hit – a match between a hmm profile and a replicon

  • gene_ref

    The ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (in several Models). To know which gene this hit plays the role of, use macsypy.gene.ModelGene.alternate_of():

    hit.gene_ref.alternate_of()
    

  • gene_status

  • counterpart – the other occurrences of the gene or of its exchangeables in the replicon

property loner
Returns:

True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit representing a gene with the loner attribute that is not included in a cluster.

  • a hit representing a loner gene but included in a cluster is not a true Loner

  • a hit that is not included in a cluster with other genes but does not represent a loner gene is not a true Loner (this situation may happen when min_genes_required = 1)

MultiSystem

class macsypy.hit.MultiSystem(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)

Handles a hit that encodes a gene tagged as multi_system, that is, a hit that can be used in several Systems of the same model.

__init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)

A hit that is outside a cluster; the gene_ref is multi_system.

Parameters:
  • hit – a match between a hmm profile and a replicon

  • gene_ref

    The ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (in several Models). To know which gene this hit plays the role of, use macsypy.gene.ModelGene.alternate_of():

    hit.gene_ref.alternate_of()
    

  • gene_status

  • counterpart – the other occurrences of the gene or of its exchangeables in the replicon

property multi_system: bool
Returns:

True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.

LonerMultiSystem

class macsypy.hit.LonerMultiSystem(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)

Handles a hit for which all of the following hold:

  • the gene is tagged as multi-system,

  • the gene is also tagged as loner,

  • the hit does not cluster with other hits.

__init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)

A hit that is outside a cluster; the gene_ref is both loner and multi_system.

Parameters:
  • hit – a match between a hmm profile and a replicon

  • gene_ref (macsypy.gene.ModelGene object) –

    The ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (in several Models). To know which gene this hit plays the role of, use macsypy.gene.ModelGene.alternate_of():

    hit.gene_ref.alternate_of()
    

  • gene_status (macsypy.gene.GeneStatus object) –

  • counterpart (list of macsypy.hit.CoreHit) – the other occurrences of the gene or of its exchangeables in the replicon

HitWeight

class macsypy.hit.HitWeight(itself: float = 1, exchangeable: float = 0.8, mandatory: float = 1, accessory: float = 0.5, neutral: float = 0, out_of_cluster: float = 0.7)

The weights used to compute the cluster and system scores. See the user documentation on MacSyFinder functioning for further details. The default values are:

  • itself = 1

  • exchangeable = 0.8

  • mandatory = 1

  • accessory = 0.5

  • neutral = 0

  • out_of_cluster = 0.7
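
To make the role of these weights more concrete, here is a minimal sketch using a frozen dataclass with the same default values. `SketchHitWeight` and `sketch_hit_score` are hypothetical; in particular, the way the weights are multiplied together below is an assumption made for illustration, and the user documentation remains the reference for the actual scoring rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # immutable, comparable and hashable
class SketchHitWeight:
    """Hypothetical stand-in carrying the documented default weights."""
    itself: float = 1
    exchangeable: float = 0.8
    mandatory: float = 1
    accessory: float = 0.5
    neutral: float = 0
    out_of_cluster: float = 0.7


def sketch_hit_score(weights: SketchHitWeight, status: str,
                     is_exchangeable: bool = False,
                     is_out_of_cluster: bool = False) -> float:
    """Combine the weights for one hit (assumed rule, for illustration only)."""
    if status not in ("mandatory", "accessory", "neutral"):
        raise ValueError(f"unknown gene status: {status!r}")
    score = getattr(weights, status)  # weight of the gene status
    # The gene fulfils the function itself or as an exchangeable.
    score *= weights.exchangeable if is_exchangeable else weights.itself
    if is_out_of_cluster:
        score *= weights.out_of_cluster  # penalty for out-of-cluster hits
    return score


default_weights = SketchHitWeight()
mandatory_score = sketch_hit_score(default_weights, "mandatory")
penalized_score = sketch_hit_score(default_weights, "accessory",
                                   is_exchangeable=True, is_out_of_cluster=True)
print(mandatory_score, round(penalized_score, 2))
```

Under this assumed rule, a mandatory gene found in the cluster counts fully, while an accessory function fulfilled by an exchangeable located outside the cluster is penalized three times.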

__delattr__(name)

Implement delattr(self, name).

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(itself: float = 1, exchangeable: float = 0.8, mandatory: float = 1, accessory: float = 0.5, neutral: float = 0, out_of_cluster: float = 0.7) → None

__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

__weakref__

list of weak references to the object (if defined)

get_best_hit_4_func

macsypy.hit.get_best_hit_4_func(function: str, hits: Iterable[ModelHit], key: str = 'score') → ModelHit

Select the best Loner among several that encode the same function, based on one of:

  • score

  • i_evalue

  • profile_coverage

Parameters:
  • function – the name of the function fulfilled by the hits (all hits must have the same function)

  • hits – the hits to filter.

  • key – The criterion used to select the best hit: ‘score’, ‘i_evalue’, ‘profile_coverage’

Returns:

the best hit
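
A selection of this kind can be sketched as below. `FuncHit` and `sketch_best_hit` are hypothetical stand-ins; the key name `i_eval` mirrors the CoreHit parameter of the same name and is an assumption here. Note that a higher score or profile coverage is better, whereas a lower e-value is better.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class FuncHit:
    """Hypothetical stand-in for a hit fulfilling a given function."""
    hit_id: str
    score: float
    i_eval: float
    profile_coverage: float


def sketch_best_hit(hits, key="score"):
    """Return the best hit according to the chosen criterion."""
    hits = list(hits)
    if key == "score":
        return max(hits, key=attrgetter("score"))             # higher is better
    if key == "i_eval":
        return min(hits, key=attrgetter("i_eval"))            # lower is better
    if key == "profile_coverage":
        return max(hits, key=attrgetter("profile_coverage"))  # higher is better
    raise ValueError(f"unsupported key: {key!r}")


candidates = [
    FuncHit("prot_1", score=120.0, i_eval=1e-30, profile_coverage=0.80),
    FuncHit("prot_2", score=95.0, i_eval=1e-35, profile_coverage=0.95),
]
best_by_score = sketch_best_hit(candidates)
best_by_eval = sketch_best_hit(candidates, key="i_eval")
print(best_by_score.hit_id, best_by_eval.hit_id)
```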

sort_model_hits

macsypy.hit.sort_model_hits(model_hits: Iterable[ModelHit]) → dict[str, list[macsypy.hit.ModelHit]]

Sort macsypy.hit.ModelHit objects per function.

Parameters:

model_hits – a sequence of macsypy.hit.ModelHit

Returns:

dict {str function name: [model_hit, …] }
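
The shape of the returned dictionary can be illustrated with a small grouping sketch. Here each hit is reduced to a hypothetical `(function_name, hit_id)` tuple; the real function works on ModelHit objects and derives the function name from the gene the hit stands for.

```python
from collections import defaultdict


def sketch_sort_by_function(model_hits):
    """Group hits by the name of the function they fulfil."""
    by_function = defaultdict(list)
    for function_name, hit_id in model_hits:
        by_function[function_name].append(hit_id)
    return dict(by_function)


# Hypothetical hits: two of them fulfil the function "def".
example_hits = [("def", "prot_1"), ("ghj", "prot_7"), ("def", "prot_3")]
grouped = sketch_sort_by_function(example_hits)
print(grouped)
```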

compute_best_MSHit

macsypy.hit.compute_best_MSHit(ms_registry: dict[str, list[macsypy.hit.MultiSystem | macsypy.hit.LonerMultiSystem]]) → list[MultiSystem | LonerMultiSystem]
Parameters:

ms_registry

Returns:

get_best_hits

macsypy.hit.get_best_hits(hits: Iterable[CoreHit | ModelHit], key: Literal['score', 'i_eval', 'profile_coverage'] = 'score') → list[CoreHit | ModelHit]

If several hits match the same protein, keep only the best match based either on

  • score

  • i_evalue

  • profile_coverage

Parameters:
  • hits ([ macsypy.hit.CoreHit object, …]) – the hits to filter, all hits must match the same protein.

  • key (str) – The criterion used to select the best hit: ‘score’, ‘i_eval’, ‘profile_coverage’

Returns:

the list of the best hits

Return type:

[ macsypy.hit.CoreHit object, …]
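
The idea of keeping one hit per protein can be sketched as follows. `ProfileHit` and `sketch_best_hits` are hypothetical stand-ins written for illustration; the sketch groups hits by sequence identifier and keeps the best one per group, which may differ in detail from the actual implementation.

```python
from typing import NamedTuple


class ProfileHit(NamedTuple):
    """Hypothetical stand-in for a hit of one profile on one protein."""
    gene_name: str
    hit_id: str
    score: float
    i_eval: float
    profile_coverage: float


def sketch_best_hits(hits, key="score"):
    """Keep, for each protein, only the best hit according to key."""
    if key not in ("score", "i_eval", "profile_coverage"):
        raise ValueError(f"unsupported key: {key!r}")

    def is_better(new, old):
        if key == "i_eval":
            return new.i_eval < old.i_eval          # lower e-value wins
        return getattr(new, key) > getattr(old, key)  # higher value wins

    best = {}  # hit_id -> best hit seen so far for that protein
    for hit in hits:
        current = best.get(hit.hit_id)
        if current is None or is_better(hit, current):
            best[hit.hit_id] = hit
    return list(best.values())


profile_hits = [
    ProfileHit("abc", "prot_1", 10.0, 1e-3, 0.9),
    ProfileHit("def", "prot_1", 30.0, 1e-9, 0.5),
    ProfileHit("ghj", "prot_2", 5.0, 1e-2, 0.7),
]
kept_by_score = sketch_best_hits(profile_hits)
kept_by_coverage = sketch_best_hits(profile_hits, key="profile_coverage")
print([h.gene_name for h in kept_by_score])
```

Here two profiles match prot_1; the choice between them depends on the criterion, since the hit with the best score is not the one with the best profile coverage.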