hit
This module implements the classes related to hits and some functions that perform computations on hit objects.
CoreHit | Model an HMM hit on the replicon. There is only one CoreHit per CoreGene.
ModelHit | Model a hit and its relation to the Model.
AbstractCounterpartHit | Parent class of Loner and MultiSystem. It inherits from ModelHit.
Loner | Model a “true” Loner.
MultiSystem | Model a hit that can be used in several Systems (same model).
LonerMultiSystem | Model a hit representing a gene that is Loner and MultiSystem at the same time.
HitWeight | The weights applied to the hits to compute the score.
get_best_hit_4_func | Return the best hit for a given function.
sort_model_hits | Sort hits.
compute_best_MSHit | Choose the best one among several multisystem hits.
get_best_hits | If several profiles hit the same gene, return the best hit.
A Hit is created when hmmsearch finds similarities between a profile and a protein of the input dataset.
Below is the inheritance diagram of Hits.
And a diagram showing the interactions between CoreGene, ModelGene, Model, Hit, Loner, …
hit API reference
CoreHit
- class macsypy.hit.CoreHit(gene: CoreGene, hit_id: str, hit_seq_length: int, replicon_name: str, position_hit: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)[source]
Handle the hits filtered from the Hmmer search. The hits are instantiated by the HMMReport.extract() method. In one run of MacSyFinder, there exists only one CoreHit per gene. These hits are independent of any macsypy.model.Model instance.
- __eq__(other: CoreHit) bool [source]
Return True if two hits are totally equivalent, False otherwise.
- Parameters:
other – the hit to compare to the current object
- Returns:
the result of the comparison
- __gt__(other: CoreHit) bool [source]
Compare two Hits. If the sequence identifier is the same, compare the scores. Otherwise, compare the sequence identifiers alphabetically.
- Parameters:
other – the hit to compare to the current object
- Returns:
True if self is > other, False otherwise
- __init__(gene: CoreGene, hit_id: str, hit_seq_length: int, replicon_name: str, position_hit: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) None [source]
- Parameters:
gene – the gene corresponding to this profile
hit_id – the identifier of the hit
hit_seq_length – the length of the hit sequence
replicon_name – the name of the replicon
position_hit – the rank of the sequence matched in the input dataset file
i_eval – the best-domain evalue (i-evalue, “independent evalue”)
score – the score of the hit
profile_coverage – percentage of the profile that matches the hit sequence
sequence_coverage – percentage of the hit sequence that matches the profile
begin_match – where the hit with the profile starts in the sequence
end_match – where the hit with the profile ends in the sequence
- __lt__(other: CoreHit) bool [source]
Compare two Hits. If the sequence identifier is the same, compare the scores. Otherwise, compare the sequence identifiers alphabetically.
- Parameters:
other – the hit to compare to the current object
- Returns:
True if self is < other, False otherwise
- __str__() str [source]
- Returns:
Useful information on the CoreHit: regarding Hmmer statistics, and sequence information
- Return type:
str
- __weakref__
list of weak references to the object (if defined)
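The ordering described for __gt__ and __lt__ above can be illustrated with a minimal sketch. HitSketch is a hypothetical stand-in, not the real CoreHit: it keeps only an identifier and a score, and the attribute names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitSketch:
    """Hypothetical stand-in for CoreHit, reduced to the fields used for ordering."""
    id: str       # sequence identifier (assumed attribute name)
    score: float  # score of the hit

    def __lt__(self, other: "HitSketch") -> bool:
        # Same sequence: compare the scores; otherwise compare the identifiers.
        if self.id == other.id:
            return self.score < other.score
        return self.id < other.id

    def __gt__(self, other: "HitSketch") -> bool:
        if self.id == other.id:
            return self.score > other.score
        return self.id > other.id


hits = [HitSketch("seq_b", 10.0), HitSketch("seq_a", 50.0), HitSketch("seq_a", 20.0)]
ordered = sorted(hits)  # sorted() relies on __lt__
```

With this rule, hits on the same sequence end up next to each other, ranked by increasing score.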
ModelHit
- class macsypy.hit.ModelHit(hit: CoreHit, gene_ref: ModelGene, gene_status: GeneStatus)[source]
Encapsulates a macsypy.hit.CoreHit. This class stores a CoreHit that has been attributed to a putative system. Thus, it also stores:
the system,
the status of the gene in this system (‘mandatory’, ‘accessory’, …),
the gene in the model for which it is an occurrence.
For one gene, several ModelHit instances can exist, one for each Model containing this gene.
- __init__(hit: CoreHit, gene_ref: ModelGene, gene_status: GeneStatus) None [source]
- Parameters:
hit – a match between a hmm profile and a replicon
gene_ref – the ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (several Models). To know which gene this hit stands for, use macsypy.gene.ModelGene.alternate_of():
hit.gene_ref.alternate_of()
gene_status – the status of the gene in the model (‘mandatory’, ‘accessory’, …)
- __weakref__
list of weak references to the object (if defined)
- property loner: bool
- Returns:
True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit that represents a gene with the loner attribute and that is not included in a cluster:
a hit representing a loner gene but included in a cluster is not a true Loner;
a hit that is not clustered with other genes but does not represent a loner gene is not a true Loner (this situation may happen when min_genes_required = 1).
- property multi_model: bool
- Returns:
True if the hit represents a multi_model macsypy.gene.ModelGene, False otherwise.
- property multi_system: bool
- Returns:
True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.
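The definition of a true Loner given for the loner property above combines two conditions. A minimal sketch follows; the function name and its two boolean arguments are hypothetical and only restate the rule, they are not part of macsypy.

```python
def is_true_loner(gene_is_loner: bool, in_cluster: bool) -> bool:
    """Return True only for a hit of a loner gene that lies outside any cluster."""
    return gene_is_loner and not in_cluster


# The four possible situations, keyed by (gene_is_loner, in_cluster).
truth_table = {
    (True, False): is_true_loner(True, False),    # loner gene, outside a cluster
    (True, True): is_true_loner(True, True),      # loner gene, inside a cluster
    (False, False): is_true_loner(False, False),  # isolated hit, gene not tagged loner
    (False, True): is_true_loner(False, True),    # ordinary clustered hit
}
```

Only the first situation yields a true Loner; the third one is the case that may occur when min_genes_required = 1.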
AbstractCounterpartHit
- class macsypy.hit.AbstractCounterpartHit(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: set[ModelHit] | None = None)[source]
Abstract class to handle a ModelHit with equivalents (counterparts), for instance a Loner or MultiSystem hit.
- __init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: set[ModelHit] | None = None) None [source]
- Parameters:
hit – a match between a hmm profile and a replicon
gene_ref – the ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (several Models). To know which gene this hit stands for, use macsypy.gene.ModelGene.alternate_of():
hit.gene_ref.alternate_of()
gene_status – the status of the gene in the model (‘mandatory’, ‘accessory’, …)
- property loner: bool
- Returns:
True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit that represents a gene with the loner attribute and that is not included in a cluster:
a hit representing a loner gene but included in a cluster is not a true Loner;
a hit that is not clustered with other genes but does not represent a loner gene is not a true Loner (this situation may happen when min_genes_required = 1).
- property multi_system: bool
- Returns:
True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.
Loner
- class macsypy.hit.Loner(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)[source]
Handle a hit that encodes a gene tagged as loner and that does not cluster with other hits.
- __init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None) None [source]
hit that is outside a cluster, the gene_ref is a loner
- Parameters:
hit – a match between a hmm profile and a replicon
gene_ref – the ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (several Models). To know which gene this hit stands for, use macsypy.gene.ModelGene.alternate_of():
hit.gene_ref.alternate_of()
gene_status – the status of the gene in the model (‘mandatory’, ‘accessory’, …)
counterpart – the other occurrences of the gene or of its exchangeables in the replicon
- property loner
- Returns:
True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit that represents a gene with the loner attribute and that is not included in a cluster:
a hit representing a loner gene but included in a cluster is not a true Loner;
a hit that is not clustered with other genes but does not represent a loner gene is not a true Loner (this situation may happen when min_genes_required = 1).
MultiSystem
- class macsypy.hit.MultiSystem(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)[source]
Handle a hit that encodes a gene tagged as multi_system, that is, a hit that can be used in several Systems (same model).
- __init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)[source]
hit that is outside a cluster, the gene_ref is multi_system
- Parameters:
hit – a match between a hmm profile and a replicon
gene_ref – the ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (several Models). To know which gene this hit stands for, use macsypy.gene.ModelGene.alternate_of():
hit.gene_ref.alternate_of()
gene_status – the status of the gene in the model (‘mandatory’, ‘accessory’, …)
counterpart – the other occurrences of the gene or of its exchangeables in the replicon
- property multi_system: bool
- Returns:
True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.
LonerMultiSystem
- class macsypy.hit.LonerMultiSystem(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)[source]
Handle a hit that encodes a gene tagged as multi-system and also tagged as loner, and that does not cluster with other hits.
- __init__(hit: CoreHit | ModelHit, gene_ref: ModelGene | None = None, gene_status: GeneStatus | None = None, counterpart: Iterable[CoreHit] | None = None)[source]
hit that is outside a cluster, the gene_ref is loner and multi_system
- Parameters:
hit – a match between a hmm profile and a replicon
gene_ref (macsypy.gene.ModelGene object) – the ModelGene linked to this hit. The ModelGene has the same name as the CoreGene, but one hit can be linked to several ModelGenes (several Models). To know which gene this hit stands for, use macsypy.gene.ModelGene.alternate_of():
hit.gene_ref.alternate_of()
gene_status (macsypy.gene.GeneStatus object) – the status of the gene in the model (‘mandatory’, ‘accessory’, …)
counterpart (list of macsypy.hit.CoreHit) – the other occurrences of the gene or of its exchangeables in the replicon
HitWeight
- class macsypy.hit.HitWeight(itself: float = 1, exchangeable: float = 0.8, mandatory: float = 1, accessory: float = 0.5, neutral: float = 0, out_of_cluster: float = 0.7)[source]
The weights used to compute the cluster and system scores. See the user documentation (MacSyFinder functioning) for further details. The defaults are:
itself = 1
exchangeable = 0.8
mandatory = 1
accessory = 0.5
neutral = 0
out_of_cluster = 0.7
- __delattr__(name)
Implement delattr(self, name).
- __eq__(other)
Return self==value.
- __hash__()
Return hash(self).
- __init__(itself: float = 1, exchangeable: float = 0.8, mandatory: float = 1, accessory: float = 0.5, neutral: float = 0, out_of_cluster: float = 0.7) None
- __repr__()
Return repr(self).
- __setattr__(name, value)
Implement setattr(self, name, value).
- __weakref__
list of weak references to the object (if defined)
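To show how such weights might be used, here is a minimal sketch. HitWeightSketch mirrors the default values listed above as a frozen dataclass. The hit_score helper is hypothetical: the way it multiplies the weights together is an assumption made for illustration, and the user documentation remains the reference for the real scoring rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitWeightSketch:
    """Stand-in for HitWeight with the documented default values."""
    itself: float = 1
    exchangeable: float = 0.8
    mandatory: float = 1
    accessory: float = 0.5
    neutral: float = 0
    out_of_cluster: float = 0.7


def hit_score(weights: HitWeightSketch, status: str,
              is_exchangeable: bool = False, in_cluster: bool = True) -> float:
    """Hypothetical helper: combine the weights for one hit (assumed formula)."""
    score = getattr(weights, status)  # 'mandatory', 'accessory' or 'neutral'
    score *= weights.exchangeable if is_exchangeable else weights.itself
    if not in_cluster:
        score *= weights.out_of_cluster  # penalty for hits outside the cluster
    return score


weights = HitWeightSketch()
mandatory_in_cluster = hit_score(weights, "mandatory")
accessory_exchangeable = hit_score(weights, "accessory", is_exchangeable=True)
mandatory_out_of_cluster = hit_score(weights, "mandatory", in_cluster=False)
```

Making the dataclass frozen keeps the weights immutable once created, which suits values that are set once per run.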
get_best_hit_4_func
- macsypy.hit.get_best_hit_4_func(function: str, hits: Iterable[ModelHit], key: str = 'score') ModelHit [source]
Select the best hit among several hits encoding the same function, based on one of the following criteria:
score
i_evalue
profile_coverage
- Parameters:
function – the name of the function fulfilled by the hits (all hits must have the same function)
hits – the hits to filter.
key – the criterion used to select the best hit: ‘score’, ‘i_eval’ or ‘profile_coverage’
- Returns:
the best hit
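The selection by criterion described above can be sketched as follows. FuncHit and pick_best are hypothetical names, not the macsypy implementation; the sketch assumes that a higher score or profile coverage is better and that a lower i-evalue is better.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class FuncHit:
    """Hypothetical stand-in for a ModelHit, keeping only the ranking attributes."""
    id: str
    score: float
    i_eval: float
    profile_coverage: float


def pick_best(hits, key="score"):
    """Return the best hit according to one criterion (illustrative helper)."""
    hits = list(hits)
    if not hits:
        raise ValueError("no hit to choose from")
    if key in ("score", "profile_coverage"):
        return max(hits, key=attrgetter(key))  # higher is better
    if key == "i_eval":
        return min(hits, key=attrgetter(key))  # lower e-value is better
    raise ValueError(f"unknown criterion: {key!r}")


# Made-up sample data.
candidates = [
    FuncHit("prot_1", score=120.5, i_eval=1e-30, profile_coverage=0.80),
    FuncHit("prot_7", score=98.0, i_eval=1e-35, profile_coverage=0.95),
]
best_by_score = pick_best(candidates)
best_by_i_eval = pick_best(candidates, key="i_eval")
```

The two criteria can disagree, as in this sample: the hit with the highest score is not the one with the lowest i-evalue.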
sort_model_hits
- macsypy.hit.sort_model_hits(model_hits: Iterable[ModelHit]) dict[str, list[macsypy.hit.ModelHit]] [source]
Sort macsypy.hit.ModelHit objects per function.
- Parameters:
model_hits – a sequence of macsypy.hit.ModelHit
- Returns:
dict {str function name: [model_hit, …] }
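The shape of the returned dictionary can be illustrated with a minimal sketch. Here each hit is reduced to a hypothetical (function name, hit identifier) pair; how the real function derives the function name from a ModelHit is not shown, and the sample names are made up.

```python
from collections import defaultdict


def group_by_function(model_hits):
    """Group (function_name, hit_id) pairs into {function_name: [hit_id, ...]}."""
    groups = defaultdict(list)
    for function_name, hit_id in model_hits:
        groups[function_name].append(hit_id)
    return dict(groups)


pairs = [("gspD", "prot_3"), ("sctN", "prot_9"), ("gspD", "prot_12")]
grouped = group_by_function(pairs)
```

Within each function, the hits keep the order in which they were supplied.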
compute_best_MSHit
- macsypy.hit.compute_best_MSHit(ms_registry: dict[str, list[macsypy.hit.MultiSystem | macsypy.hit.LonerMultiSystem]]) list[MultiSystem | LonerMultiSystem] [source]
- Parameters:
ms_registry –
- Returns:
get_best_hits
- macsypy.hit.get_best_hits(hits: Iterable[CoreHit | ModelHit], key: Literal['score', 'i_eval', 'profile_coverage'] = 'score') list[CoreHit | ModelHit] [source]
If several hits match the same protein, keep only the best match based either on
score
i_evalue
profile_coverage
- Parameters:
hits ([macsypy.hit.CoreHit object, …]) – the hits to filter, all hits must match the same protein.
key (str) – the criterion used to select the best hit: ‘score’, ‘i_eval’ or ‘profile_coverage’
- Returns:
the list of the best hits
- Return type:
[macsypy.hit.CoreHit object, …]
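Keeping one match per protein, as described above, can be sketched like this. ProfileHit and keep_best_per_protein are hypothetical names and the data are made up; the sketch assumes hits are grouped by protein identifier and that a lower i-evalue is better while higher values win for the other two criteria.

```python
from operator import attrgetter
from typing import NamedTuple


class ProfileHit(NamedTuple):
    """Hypothetical stand-in for a CoreHit."""
    id: str    # protein identifier
    gene: str  # name of the profile (gene) that matched
    score: float
    i_eval: float
    profile_coverage: float


def keep_best_per_protein(hits, key="score"):
    """Keep, for each protein identifier, the best hit for the given criterion."""
    if key not in ("score", "i_eval", "profile_coverage"):
        raise ValueError(f"unknown criterion: {key!r}")
    getter = attrgetter(key)
    lower_is_better = key == "i_eval"
    best = {}
    for hit in hits:
        current = best.get(hit.id)
        if current is None:
            best[hit.id] = hit
        elif lower_is_better and getter(hit) < getter(current):
            best[hit.id] = hit
        elif not lower_is_better and getter(hit) > getter(current):
            best[hit.id] = hit
    return list(best.values())


hits = [
    ProfileHit("prot_1", "geneA", 50.0, 1e-10, 0.6),
    ProfileHit("prot_1", "geneB", 80.0, 1e-20, 0.9),
    ProfileHit("prot_2", "geneA", 30.0, 1e-5, 0.7),
]
best = keep_best_per_protein(hits)
```

The result lists one hit per protein, in the order in which each protein was first seen.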