… MacSyFinder - Detection of macromolecular systems in protein datasets: using systems modelling and similarity search. Authors: Sophie Abby, Bertrand Néron. Copyright © 2014-2023 Institut Pasteur (Paris), and CNRS. See the COPYRIGHT file for details. MacSyFinder is distributed under the terms of the GNU General Public License (GPLv3). See the COPYING file for details.

search_genes

Manages the parallelization of the code that ultimately runs hmmsearch to find the genes that make up the models in the input dataset.

search_genes API reference

search_genes

Manage the HMM step (run hmmsearch or recover the results of a previous run) in parallel.

macsypy.search_genes.search_genes(genes: list[ModelGene], cfg: Config) → list[HMMReport]

For each gene in the list, use the corresponding profile to perform a HMMER search, then parse the output to generate an HMMReport, which is saved to a file after CoreHit filtering. These tasks are performed in parallel using threads. The number of workers can be limited with the worker_nb directive in the config object or with the “-w” option on the command line.

Parameters:

genes – the genes to search in the input sequence dataset
cfg – the configuration object
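
The thread-based pattern described above can be illustrated with a minimal sketch. It is not the macsypy implementation: `search_one` and `search_all` are hypothetical names, and `search_one` is a stand-in for the real per-gene work (running hmmsearch and parsing its output), which is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def search_one(gene_name: str) -> str:
    """Hypothetical stand-in for the per-gene work.

    In the real module this step would run hmmsearch with the gene's
    profile and parse the output into a report; here it only builds a label.
    """
    return f"report:{gene_name}"


def search_all(gene_names: list[str], worker_nb: int) -> list[str]:
    """Run search_one on every gene using at most worker_nb threads."""
    if not gene_names:
        return []
    # pool.map returns results in the same order as the input list.
    with ThreadPoolExecutor(max_workers=worker_nb) as pool:
        return list(pool.map(search_one, gene_names))


if __name__ == "__main__":
    # "geneA" and "geneB" are placeholder names, not real profiles.
    print(search_all(["geneA", "geneB"], worker_nb=2))
```

Threads are a reasonable fit for this kind of task when the heavy computation happens in an external process, since each thread then mostly waits for that process to finish.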

macsypy.search_genes.worker_cpu(genes_nb: int, cfg: Config) → tuple[int, int]

Compute the optimum number of workers and of CPUs per worker. The number of workers is set by the user (1 by default; 0 means all available workers).

One worker is used per gene. If the number of workers is greater than the number of genes, several CPUs can be used by each hmmsearch to speed up the search step.

Parameters:

genes_nb – the number of genes to search
cfg – The macsyfinder configuration

Returns:

the number of workers and the number of CPUs per worker to use

Return type:

tuple (int worker_nb, int cpu_per_worker)
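
One plausible reading of the rule described above is sketched below. It is an assumption-laden illustration, not the macsypy code: `worker_cpu_sketch` is a hypothetical name, it takes the requested worker count directly instead of a Config object, it resolves 0 with `os.cpu_count()`, and it shares surplus workers evenly between genes with integer division. The actual function may differ on each of these points.

```python
import os
from typing import Optional


def worker_cpu_sketch(genes_nb: int,
                      requested_workers: int,
                      available_cpus: Optional[int] = None) -> tuple[int, int]:
    """Return (worker_nb, cpu_per_worker) for genes_nb genes.

    requested_workers == 0 is taken to mean "use everything available"
    (assumption: resolved with os.cpu_count()).
    """
    if genes_nb < 1:
        raise ValueError("genes_nb must be at least 1")
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1

    worker_nb = available_cpus if requested_workers == 0 else requested_workers

    if worker_nb > genes_nb:
        # More workers than genes: cap at one worker per gene and give
        # the surplus to each search as extra CPUs.
        cpu_per_worker = worker_nb // genes_nb
        worker_nb = genes_nb
    else:
        cpu_per_worker = 1
    return worker_nb, cpu_per_worker


if __name__ == "__main__":
    print(worker_cpu_sketch(genes_nb=4, requested_workers=8))
```

With 4 genes and 8 requested workers, this sketch yields 4 workers with 2 CPUs each. With 10 genes and 4 requested workers, it yields 4 workers with 1 CPU each.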