- … MacSyFinder - Detection of macromolecular systems in protein datasets
using systems modelling and similarity search. Authors: Sophie Abby, Bertrand Néron. Copyright © 2014-2023 Institut Pasteur (Paris), and CNRS. See the COPYRIGHT file for details. MacSyFinder is distributed under the terms of the GNU General Public License (GPLv3). See the COPYING file for details.
search_genes
Manages the parallelization of the code that ultimately runs hmmsearch to find the genes constituting the models in the input dataset.
search_genes API reference
search_genes
Manage the HMM step (run hmmsearch or recover the results of a previous run) in parallel.
- macsypy.search_genes.search_genes(genes: list[ModelGene], cfg: Config) -> list[HMMReport]
For each gene in the list, use the corresponding profile to perform an HMMER search, then parse the output to generate an HMMReport, which is saved to a file after CoreHit filtering. These tasks are performed in parallel using threads. The number of workers can be limited with the worker_nb directive in the config object or with the “-w” option on the command line.
- Parameters:
genes – the genes to search in the input sequence dataset
cfg – the configuration object
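The thread-based, one-task-per-gene pattern described above can be sketched with the standard library. This is only an illustration, not the macsypy implementation: `fake_hmm_search` and `search_all` are hypothetical names, the "report" is a plain dict standing in for an HMMReport, and no hmmsearch process is actually started.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def fake_hmm_search(gene_name: str) -> dict:
    """Hypothetical stand-in for one per-gene search.

    A real task would run hmmsearch with the gene's profile and parse
    its output; here we only return an empty placeholder report.
    """
    return {"gene": gene_name, "hits": []}


def search_all(gene_names: list[str], worker_nb: int) -> list[dict]:
    """Run one search task per gene in a pool of at most worker_nb threads."""
    if not gene_names:
        return []
    # Never start more threads than there are genes, and at least one.
    max_workers = max(1, min(worker_nb, len(gene_names)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input genes.
        return list(pool.map(fake_hmm_search, gene_names))


reports = search_all(["gspD", "gspE", "gspF"], worker_nb=2)
```

Threads are a reasonable fit for this kind of task when the heavy work happens in an external process, since the Python side mostly waits for that process to finish.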
- macsypy.search_genes.worker_cpu(genes_nb: int, cfg: Config) -> tuple[int, int]
Compute the optimal number of workers and the number of CPUs per worker. The number of workers is set by the user (1 by default; 0 means all available workers).
One worker is used per gene. If the number of workers is greater than the number of genes, hmmsearch can use several CPUs to speed up the search step.
- Parameters:
genes_nb – the number of genes to search
cfg – the MacSyFinder configuration
- Returns:
the number of workers and the number of CPUs per worker to use
- Return type:
tuple (int worker_nb, int cpu_per_worker)
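The allocation rule described for worker_cpu can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the library's code: `split_workers` and its parameters are hypothetical, the user's setting is passed as a plain integer instead of a Config object, "all available workers" is approximated with `os.cpu_count()`, and spare CPUs are shared by integer division.

```python
from __future__ import annotations

import os
from typing import Optional


def split_workers(
    genes_nb: int,
    requested_workers: int,
    available_cpus: Optional[int] = None,
) -> tuple[int, int]:
    """Return (worker_nb, cpu_per_worker) for genes_nb searches.

    requested_workers mirrors the user setting: 0 means "use everything
    available". All names here are illustrative assumptions.
    """
    if genes_nb < 1:
        raise ValueError("genes_nb must be at least 1")
    if available_cpus is None:
        # Assumption: the machine's CPU count stands in for "all workers".
        available_cpus = os.cpu_count() or 1
    budget = requested_workers if requested_workers > 0 else available_cpus
    if budget > genes_nb:
        # More workers than genes: one worker per gene, and the spare
        # capacity is handed to each search as extra CPUs.
        return genes_nb, budget // genes_nb
    # Otherwise each worker runs its searches on a single CPU.
    return budget, 1


few_genes = split_workers(genes_nb=3, requested_workers=8)
many_genes = split_workers(genes_nb=10, requested_workers=4)
```

With 8 workers requested for 3 genes, this sketch yields 3 workers with 2 CPUs each; with 4 workers for 10 genes, it yields 4 single-CPU workers.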