MacSyFinder - Detection of macromolecular systems in protein datasets using systems modelling and similarity search.

Authors: Sophie Abby, Bertrand Néron
Copyright © 2014-2023 Institut Pasteur (Paris), and CNRS. See the COPYRIGHT file for details.
MacSyFinder is distributed under the terms of the GNU General Public License (GPLv3). See the COPYING file for details.

search_genes

Manages the parallelization of the code that ultimately runs hmmsearch to find the genes constituting the models in the input dataset.

search_genes API reference

search_genes

Manage the HMM step (run hmmsearch, or recover the results of a previous run) in parallel.
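As an illustration of the per-gene choice between running hmmsearch and recovering a previous result, here is a minimal sketch. The function plan_hmm_step, its parameters and the "<gene>.search_hmm.out" report file name are hypothetical and are not part of the macsypy API; only the hmmsearch options --cpu and -o are real HMMER options. To stay side-effect free, the sketch returns the planned action instead of launching hmmsearch.

```python
from pathlib import Path


def plan_hmm_step(gene_name: str, profile: str, sequence_db: str,
                  out_dir: str, cpu: int, previous_run: bool):
    """Decide whether to reuse a previous hmmsearch output or run a new search.

    Returns ("reuse", report_path) or ("run", command_line).
    The "<gene>.search_hmm.out" file name pattern is a hypothetical convention.
    """
    report = Path(out_dir) / f"{gene_name}.search_hmm.out"
    if previous_run and report.exists():
        # Results of an earlier run are available: skip hmmsearch for this gene
        return "reuse", report
    # Otherwise build an hmmsearch command line; options must come before
    # the profile file and the sequence database
    cmd = ["hmmsearch", "--cpu", str(cpu), "-o", str(report), profile, sequence_db]
    return "run", cmd
```

A "run" command list could then be executed with subprocess.run(cmd, check=True).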

macsypy.search_genes.search_genes(genes: list[ModelGene], cfg: Config) -> list[HMMReport]

For each gene in the list, use the corresponding profile to perform an HMMER search, then parse the output to generate an HMMReport, which is saved to a file after CoreHit filtering. These tasks are performed in parallel using threads. The number of workers can be limited by the worker_nb directive in the config object, or on the command line with the “-w” option.

Parameters:
  • genes – the genes to search in the input sequence dataset

  • cfg – the configuration object
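The thread-based fan-out described above can be pictured with the minimal sketch below. It is not macsypy's implementation: ReportSketch stands in for HMMReport, and search_one_gene is a hypothetical placeholder for the real "run hmmsearch, parse, filter, save" step. Only the use of concurrent.futures.ThreadPoolExecutor, a standard-library pool whose size is capped by worker_nb, is meant to be illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ReportSketch:
    """Hypothetical stand-in for macsypy's HMMReport."""
    gene_name: str
    hits: list[str] = field(default_factory=list)


def search_one_gene(gene_name: str) -> ReportSketch:
    # Hypothetical placeholder: the real step would run hmmsearch with the
    # gene's profile, parse its output, filter the hits and save the report.
    return ReportSketch(gene_name=gene_name)


def search_genes_sketch(gene_names: list[str], worker_nb: int) -> list[ReportSketch]:
    """Run one search task per gene on a pool of at most worker_nb threads."""
    if worker_nb < 1:
        raise ValueError("worker_nb must be at least 1")
    with ThreadPoolExecutor(max_workers=worker_nb) as executor:
        # map() submits every task and yields the results in input order
        return list(executor.map(search_one_gene, gene_names))
```

For example, search_genes_sketch(["gspD", "gspE", "gspF"], worker_nb=2) processes three genes with at most two running at once. Threads are a reasonable fit for this pattern because each task mostly waits on an external hmmsearch process, so Python's global interpreter lock is not the bottleneck.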

macsypy.search_genes.worker_cpu(genes_nb: int, cfg: Config) -> tuple[int, int]

Compute the optimal number of workers and of CPUs per worker. The number of workers is set by the user (1 by default; 0 means use all available workers).

One worker is used per gene. If the number of workers is greater than the number of genes, several CPUs can be used by each hmmsearch run to speed up the search step.

Parameters:
  • genes_nb – the number of genes to search

  • cfg – the MacSyFinder configuration

Returns:

the number of workers and the number of CPUs per worker (cpu_per_worker) to use

Return type:

tuple (int worker_nb, int cpu_per_worker)
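The rule described above can be sketched as follows. This is a minimal illustration, not macsypy's code: worker_cpu_sketch receives the requested number of workers as a plain integer instead of reading it from a Config object, and its available_cpus parameter is a hypothetical override of os.cpu_count().

```python
import os
from typing import Optional


def worker_cpu_sketch(genes_nb: int, requested_workers: int,
                      available_cpus: Optional[int] = None) -> tuple[int, int]:
    """Return (worker_nb, cpu_per_worker) for genes_nb hmmsearch tasks.

    requested_workers plays the role of the user's "-w" value (0 = use all);
    available_cpus is a hypothetical override, defaulting to os.cpu_count().
    """
    if genes_nb < 1:
        raise ValueError("genes_nb must be at least 1")
    if requested_workers == 0:
        # 0 means "all available": ask the OS, falling back to a single CPU
        requested_workers = available_cpus or os.cpu_count() or 1
    # Never start more workers than there are genes to search...
    worker_nb = min(genes_nb, requested_workers)
    # ...and give each hmmsearch run a share of any surplus as extra CPUs
    cpu_per_worker = max(1, requested_workers // genes_nb)
    return worker_nb, cpu_per_worker
```

For example, with 3 genes and "-w 8" the sketch yields 3 workers with 2 CPUs each; integer division rounds down, so 2 of the 8 requested CPUs stay unused. The cpu_per_worker value would presumably be forwarded to hmmsearch through its --cpu option.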