scripts

There are 4 entry points.

  • macsyfinder: the main script

  • macsydata: manages the models

  • macsyconfig: an interactive, conversational utility that generates a macsyfinder configuration file

  • macsyprofile: a utility dedicated to modelers, which gathers information about HMMER output

API reference

macsyfinder

Main entrypoint to macsyfinder

macsypy.scripts.macsyfinder._loner_warning(systems: list[System]) list[str][source]
Parameters:

systems – sequence of systems

Returns:

a warning for each loner that has fewer occurrences than the system occurrences in which it is used, except if the loner is also multi-system

macsypy.scripts.macsyfinder._outfile_header(models_fam_name: str, models_version: str, skipped_replicons: list[str] | None = None) str[source]
Returns:

The first two lines of each result file

macsypy.scripts.macsyfinder._search_in_ordered_replicon(hits_by_replicon: dict[str, list[CoreHit]], models_to_detect: list[Model], config: Config, logger: Logger) tuple[list[System], list[RejectedCandidate]][source]
Parameters:
  • hits_by_replicon – the hits sorted by replicon and position

  • models_to_detect – the models to search

  • config – MSF configuration

  • logger – the logger

macsypy.scripts.macsyfinder._search_in_unordered_replicon(hits_by_replicon: dict[str, list[CoreHit]], models_to_detect: list[Model], logger: Logger) tuple[list[LikelySystem], list[UnlikelySystem]][source]
Parameters:
  • hits_by_replicon – the hits sorted by replicon and position

  • models_to_detect – the models to search

  • logger – the logger

macsypy.scripts.macsyfinder.alarm_handler(signum: Signals, frame) None[source]

Handle the alarm signal and flush the loggers.

Parameters:
  • signum – the signal number

  • frame – the current stack frame

Raises:

Timeout
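
The pattern described for alarm_handler can be sketched as follows. This is an illustration only, not the macsypy code: the Timeout class, the logger name "sketch" and the run_with_timeout helper are assumptions introduced here. It relies on SIGALRM, so it only applies on Unix and in the main thread.

```python
import logging
import signal


class Timeout(Exception):
    """Hypothetical exception raised when the alarm fires."""


def alarm_handler(signum, frame):
    """Flush every handler of the (hypothetical) 'sketch' logger, then raise Timeout."""
    for handler in logging.getLogger("sketch").handlers:
        handler.flush()
    raise Timeout(f"signal {signum} received")


def run_with_timeout(func, seconds):
    """Run func(); raise Timeout if it takes longer than `seconds` (whole seconds)."""
    previous = signal.signal(signal.SIGALRM, alarm_handler)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler
```

Flushing before raising means that log records already emitted are not lost when the search is interrupted.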

macsypy.scripts.macsyfinder.get_version_message() str[source]
Returns:

the long description of the macsyfinder version

macsypy.scripts.macsyfinder.likely_systems_to_tsv(models_fam_name: str, models_version: str, likely_systems: list[LikelySystem], hit_system_tracker: HitSystemTracker, sys_file: IO) None[source]

print likely systems occurrences (from unordered replicon) in a file in tab-separated values (tsv) format

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • likely_systems – list of systems found

  • hit_system_tracker – a filled HitSystemTracker.

  • sys_file – The file where to write down the systems occurrences

Returns:

None

macsypy.scripts.macsyfinder.likely_systems_to_txt(models_fam_name: str, models_version: str, likely_systems: list[LikelySystem], hit_system_tracker: HitSystemTracker, sys_file: IO)[source]

print likely systems occurrences (from unordered replicon) in a file in human-readable text format

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • likely_systems – list of systems found

  • hit_system_tracker – a filled HitSystemTracker.

  • sys_file – file object

Returns:

None

macsypy.scripts.macsyfinder.list_models(args: Namespace) str[source]
Parameters:

args – The command line argument once parsed

Returns:

a string representation of all models and submodels installed.

macsypy.scripts.macsyfinder.loners_to_tsv(models_fam_name: str, models_version: str, systems: list[System], sys_file: IO)[source]

get loners from valid systems and save them to a file

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • systems – the systems from which the loners are extracted

  • sys_file – the file where loners are saved

macsypy.scripts.macsyfinder.main(args: list[str] | None = None, loglevel: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int | None = None)[source]

Main entry point to MacSyFinder. Performs some checks before launching main_search_systems(), which is the function that actually performs the search.

Parameters:
  • args – the arguments passed on the command line without the program name

  • loglevel – the output verbosity

macsypy.scripts.macsyfinder.multisystems_to_tsv(models_fam_name: str, models_version: str, systems: list[System], sys_file: IO)[source]

get multisystems from valid systems and save them on file

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • systems – the systems from which the multisystems are extracted

  • sys_file – the file where multisystems are saved

macsypy.scripts.macsyfinder.parse_args(args: list[str]) tuple[ArgumentParser, Namespace][source]
Parameters:

args – The arguments provided on the command line

Returns:

The arguments parsed

macsypy.scripts.macsyfinder.rejected_candidates_to_tsv(models_fam_name: str, models_version: str, rejected_candidates: list[RejectedCandidate], cand_file: IO, skipped_replicons: list[str] | None = None)[source]

print rejected clusters in a file

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • rejected_candidates – list of candidates which do not constitute a system

  • cand_file – The file where to write down the rejected candidates

  • skipped_replicons – the names of the replicons for which msf reached the timeout

Returns:

None

macsypy.scripts.macsyfinder.rejected_candidates_to_txt(models_fam_name: str, models_version: str, rejected_candidates: list[RejectedCandidate], cand_file: IO, skipped_replicons: list[str] | None = None)[source]

print rejected clusters in a file

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • rejected_candidates – list of candidates which do not constitute a system

  • cand_file – The file where to write down the rejected candidates

  • skipped_replicons – the names of the replicons for which msf reached the timeout

Returns:

None

macsypy.scripts.macsyfinder.search_systems(config: Config, model_registry: ModelRegistry, models_def_to_detect: list[DefinitionLocation], logger: Logger) tuple[list[System | LikelySystem], list[RejectedCandidate] | list[UnlikelySystem]][source]

Do the job: this function orchestrates all the macsyfinder mechanics. At the end, several files containing the results are produced:

  • macsyfinder.conf: The set of variables used to run this job

  • macsyfinder.systems: The list of the potential systems

  • macsyfinder.rejected_cluster: The list of all clusters and cluster combinations which have been rejected, and the reason

  • macsyfinder.log: the copy of the standard output

Returns:

the systems and rejected clusters found

Return type:

([macsypy.system.System, …], [macsypy.cluster.RejectedCandidate, …])

macsypy.scripts.macsyfinder.solutions_to_tsv(models_fam_name: str, models_version: str, solutions: list[Solution], hit_system_tracker: HitSystemTracker, sys_file: IO, skipped_replicons: list[str] | None = None) None[source]

Print solutions in a file in tabulated format. A solution is a set of systems which represents an optimal combination of systems to maximize the score.

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • solutions – list of systems found

  • hit_system_tracker – a filled HitSystemTracker.

  • sys_file – The file where to write down the systems occurrences

  • skipped_replicons – the names of the replicons for which msf reached the timeout

Returns:

None

macsypy.scripts.macsyfinder.summary_best_solution(models_fam_name: str, models_version: str, best_solution_path: str, sys_file: IO, models_fqn: list[str], replicon_names: list[str], skipped_replicons: list[str] | None = None) None[source]

Summarize the best solution found in best_solution_path and write the summary to sys_file. The summary gives the number of system occurrences for each model and each replicon:

replicon        model_fqn_1  model_fqn_2  ....
rep_name_1           1           2
rep_name_2           2           0

columns are separated by a tab character

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • best_solution_path (str) – the path to the best_solution file in tsv format

  • sys_file – the file where to save the summary

  • models_fqn – the fully qualified names of the models

  • replicon_names – the names of the replicons used

  • skipped_replicons – the names of the replicons for which msf reached the timeout
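
The kind of table described for summary_best_solution can be sketched as below. This is a hedged illustration, not the macsypy code: the column names replicon, model_fqn and sys_id, the summarize function and the sample rows are assumptions made for the example.

```python
import csv
import io
from collections import Counter


def summarize(best_solution_tsv, models_fqn, replicon_names):
    """Count system occurrences per (replicon, model) and return a tab-separated table.

    Assumes hypothetical column names 'replicon', 'model_fqn' and 'sys_id'.
    """
    reader = csv.DictReader(io.StringIO(best_solution_tsv), delimiter="\t")
    # One system spans several rows (one per hit), so count distinct system ids.
    seen = {(row["replicon"], row["model_fqn"], row["sys_id"]) for row in reader}
    counts = Counter((rep, model) for rep, model, _ in seen)
    lines = ["\t".join(["replicon", *models_fqn])]
    for rep in replicon_names:
        lines.append("\t".join([rep, *(str(counts[(rep, m)]) for m in models_fqn)]))
    return "\n".join(lines) + "\n"


# Made-up input: two rows of the same system, then three other systems.
example = (
    "replicon\tmodel_fqn\tsys_id\n"
    "rep_1\tfam/A\trep_1_A_1\n"
    "rep_1\tfam/A\trep_1_A_1\n"
    "rep_1\tfam/B\trep_1_B_1\n"
    "rep_2\tfam/A\trep_2_A_1\n"
    "rep_2\tfam/A\trep_2_A_2\n"
)
table = summarize(example, ["fam/A", "fam/B"], ["rep_1", "rep_2"])
```

Passing the model and replicon names explicitly keeps rows and columns with zero occurrences in the table.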

macsypy.scripts.macsyfinder.systems_to_tsv(models_fam_name: str, models_version: str, systems: list[System], hit_system_tracker: HitSystemTracker, sys_file: IO, skipped_replicons: list[str] | None = None) None[source]

print systems occurrences in a file in tabulated format

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • systems – list of systems found

  • hit_system_tracker – a filled HitSystemTracker.

  • sys_file – The file where to write down the systems occurrences

  • skipped_replicons – the names of the replicons for which msf reached the timeout

Returns:

None

macsypy.scripts.macsyfinder.systems_to_txt(models_fam_name: str, models_version: str, systems: list[System], hit_system_tracker: HitSystemTracker, sys_file: IO, skipped_replicons: list[str] | None = None) None[source]

print systems occurrences in a file in human-readable format

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • systems – list of systems found

  • hit_system_tracker – a filled HitSystemTracker.

  • sys_file – The file where to write down the systems occurrences

  • skipped_replicons – the names of the replicons for which msf reached the timeout

Returns:

None

macsypy.scripts.macsyfinder.unlikely_systems_to_txt(models_fam_name: str, models_version: str, unlikely_systems: list[UnlikelySystem], sys_file: IO)[source]

print the hits (from unordered replicon) which probably do not make a system occurrence in a file in human-readable format

Parameters:
  • models_fam_name – the family name of the models (Conj, CrisprCAS, …)

  • models_version – the version of the models

  • unlikely_systems – list of macsypy.system.UnlikelySystem objects

  • sys_file – The file where to write down the systems occurrences

Returns:

None

macsydata

This is the entrypoint to the macsydata command. macsydata allows the user to manage the MacSyFinder models.

macsypy.scripts.macsydata._find_all_installed_packages(models_dir: list[str] | None = None) ModelRegistry[source]
Returns:

all models installed

macsypy.scripts.macsydata._find_installed_package(pack_name: str, models_dir: list[str] | None = None) ModelLocation | None[source]

check whether a package named pack_name is already installed

Parameters:

pack_name – the name of the family model to search

Returns:

The model location corresponding to the pack_name

macsypy.scripts.macsydata._get_remote_available_versions(pack_name: str, org: str) list[str][source]

Ask the organization org for the available versions of the package pack_name.

Parameters:
  • pack_name – the name of the package

  • org – The remote organization to query

Returns:

list of available versions for the package

macsypy.scripts.macsydata._search_in_desc(pattern: str, remote: RemoteModelIndex, packages: list[str], match_case: bool = False) tuple[str, str, str][source]
Parameters:
  • pattern – the substring to search packages descriptions

  • remote – the uri of the macsy-models index

  • packages – list of packages to search in

  • match_case – True if the search is case-sensitive, False otherwise

Returns:

macsypy.scripts.macsydata._search_in_pack_name(pattern: str, remote: RemoteModelIndex, packages: list[str], match_case: bool = False) list[tuple[str, str, dict]][source]
Parameters:
  • pattern – the substring to search packages names

  • remote – the uri of the macsy-models index

  • packages – list of packages to search in

  • match_case – True if the search is case-sensitive, False otherwise

Returns:

macsypy.scripts.macsydata.build_arg_parser() ArgumentParser[source]

Build argument parser.

macsypy.scripts.macsydata.cmd_name(args: Namespace) str[source]

Return the name of the command being executed (scriptname + operation).

Example

macsydata uninstall

Parameters:

args – the arguments passed on the command line

macsypy.scripts.macsydata.do_available(args: Namespace) None[source]

List the models available on macsy-models.

Parameters:

args – the arguments passed on the command line

Returns:

None

macsypy.scripts.macsydata.do_check(args: Namespace) None[source]
Parameters:

args – the arguments passed on the command line

Return type:

None

macsypy.scripts.macsydata.do_cite(args: Namespace) None[source]

How to cite an installed model.

Parameters:

args – the arguments passed on the command line

macsypy.scripts.macsydata.do_download(args: Namespace) str[source]

Download tarball from remote models’ repository.

Parameters:

args (argparse.Namespace object) – the arguments passed on the command line

macsypy.scripts.macsydata.do_freeze(args: Namespace) None[source]

display all models installed with their respective version, in requirement format.

Parameters:

args – the arguments passed on the command line
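
As an illustration of what a "requirement format" listing may look like, here is a minimal sketch. It assumes a pip-like `name==version` layout; the freeze helper, the dictionary input and the version numbers are hypothetical and not taken from macsypy.

```python
def freeze(installed):
    """Return one 'name==version' line per package, sorted by name (assumed format)."""
    return [f"{name}=={version}" for name, version in sorted(installed.items())]


# Hypothetical installed packages and versions, for illustration only.
lines = freeze({"TXSS": "1.0.0", "CasFinder": "3.1.0"})
```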

macsypy.scripts.macsydata.do_help(args: Namespace) None[source]

Display the content of the readme file on stdout. If the readme file does not exist, display a message to the user. See macsypy.package.help()

Parameters:

args – the arguments passed on the command line (the package name)

Returns:

None

Raises:

ValueError – if the package name is not known.

macsypy.scripts.macsydata.do_info(args: Namespace) None[source]

Show information about installed model.

Parameters:

args – the arguments passed on the command line

Raises:

ValueError – if the package is not found locally

macsypy.scripts.macsydata.do_init_package(args: Namespace) None[source]

Create a template for data package

  • skeleton for metadata.yml

  • definitions directory with a skeleton of models.xml

  • profiles directory

  • skeleton for README.md file

  • COPYRIGHT file (if holders option is set)

  • LICENSE file (if license option is set)

Parameters:

args – The parsed commandline subcommand arguments

Returns:

None
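
The template listed for do_init_package can be pictured with the sketch below. It is an assumption-laden illustration: the init_package helper, the file contents and the model_example.xml file name are hypothetical, and only the overall layout follows the list above.

```python
import tempfile
from pathlib import Path


def init_package(root, name, holders=None, license_text=None):
    """Create a skeleton data package under root/name and return its path."""
    pack = Path(root) / name
    (pack / "definitions").mkdir(parents=True)
    (pack / "profiles").mkdir()
    (pack / "metadata.yml").write_text("# TODO: fill in the package metadata\n")
    (pack / "README.md").write_text(f"# {name}\n")
    # Hypothetical file name for the model definition skeleton.
    (pack / "definitions" / "model_example.xml").write_text("<!-- TODO: model definition -->\n")
    if holders:  # optional, mirrors the 'holders' option
        (pack / "COPYRIGHT").write_text(f"Copyright: {holders}\n")
    if license_text:  # optional, mirrors the 'license' option
        (pack / "LICENSE").write_text(license_text)
    return pack


with tempfile.TemporaryDirectory() as tmp:
    pack = init_package(tmp, "my_models", holders="Jane Doe")
    created = sorted(p.relative_to(pack).as_posix() for p in pack.rglob("*"))
```

Here `created` lists the COPYRIGHT file because holders was given, and no LICENSE file because license_text was left unset.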

macsypy.scripts.macsydata.do_install(args: Namespace) None[source]

Install new models in macsyfinder local models repository.

Parameters:

args – the arguments passed on the command line

Raises:
  • RuntimeError – if there is a problem with the installed package

  • ValueError – if the package and/or version is not found

macsypy.scripts.macsydata.do_list(args: Namespace) None[source]

List installed models.

Parameters:

args – the arguments passed on the command line

Search macsy-models for a model in a remote index. By default the search is done on the package name; if option -S is set, it is also done on the description. By default the search is case-insensitive, unless option --match-case is set.

Parameters:

args – the arguments passed on the command line
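
The search rules described above (name only by default, description on request, case-insensitive unless asked otherwise) can be sketched as follows. The search function and the name-to-description mapping are assumptions for illustration; the real code queries a remote index.

```python
def search(pattern, packages, in_description=False, match_case=False):
    """Return the names of packages whose name (or description) contains pattern.

    `packages` maps a package name to its description (hypothetical structure).
    """
    def norm(text):
        return text if match_case else text.lower()

    needle = norm(pattern)
    hits = []
    for name, desc in packages.items():
        if needle in norm(name) or (in_description and needle in norm(desc)):
            hits.append(name)
    return hits


# Made-up descriptions, for illustration only.
index = {"TXSScan": "Secretion systems", "CasFinder": "CRISPR-Cas systems"}
by_name = search("txss", index)
by_name_case = search("txss", index, match_case=True)
by_desc = search("crispr", index, in_description=True)
```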

macsypy.scripts.macsydata.do_show_definition(args: Namespace) None[source]

Display the model definitions on stdout. If only a package or sub-package is specified, display all the model definitions in the corresponding package or sub-package.

for instance

TXSS+/bacterial T6SSii T6SSiii

display models TXSS+/bacterial/T6SSii and TXSS+/bacterial/T6SSiii

TXSS+/bacterial all or TXSS+/bacterial

display all models contained in the TXSS+/bacterial subpackage

Parameters:

args – the arguments passed on the command line
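
The naming convention in the examples above can be sketched like this. The expand_model_names helper is hypothetical; it only shows how a package path and a list of model names could be combined into fully qualified names.

```python
def expand_model_names(args):
    """Turn ['TXSS+/bacterial', 'T6SSii', ...] into fully qualified model names.

    Returns None when no model (or the keyword 'all') is given, meaning
    'every model of the package or sub-package'.
    """
    package, *models = args
    if not models or models == ["all"]:
        return None
    return [f"{package}/{model}" for model in models]


selected = expand_model_names(["TXSS+/bacterial", "T6SSii", "T6SSiii"])
```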

macsypy.scripts.macsydata.do_uninstall(args: Namespace) None[source]

Remove models from macsyfinder local models repository.

Parameters:

args – the arguments passed on the command line

Raises:

ValueError – if the package is not found locally

macsypy.scripts.macsydata.get_version_message() str[source]
Returns:

the long description of the macsyfinder version

Return type:

str

macsypy.scripts.macsydata.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True) Logger[source]
Parameters:
  • level – The logger threshold: either a positive int or one of the strings ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’

  • out – if the log message must be displayed

Returns:

logger

macsypy.scripts.macsydata.main(args: list[str] | None = None) None[source]

Main entry point.

Parameters:

args – the arguments passed on the command line (before parsing)

macsypy.scripts.macsydata.verbosity_to_log_level(verbosity: int) int[source]

Transform the number of -v options into a log level.

Parameters:

verbosity – number of -v options on the command line

Returns:

an int corresponding to a logging level
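
One common way to map a count of -v options to a logging level is sketched below. The starting level (WARNING), the lower bound (DEBUG) and the function name verbosity_to_level are assumptions of this sketch; the actual mapping used by macsydata may differ.

```python
import logging


def verbosity_to_level(verbosity):
    """Lower the threshold by one logging step (10) per -v, starting from WARNING."""
    return max(logging.WARNING - 10 * verbosity, logging.DEBUG)
```

Clamping with max() keeps the result a valid level even when the user passes more -v options than there are levels.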

macsyconfig

Entrypoint for the macsyconfig command, which generates a MacSyFinder config file

class macsypy.scripts.macsyconfig.ConfigParserWithComments(defaults=None, dict_type=<class 'dict'>, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section='DEFAULT', interpolation=<object object>, converters=<object object>)[source]

Extend ConfigParser to allow comments in serialization

add_comment(section: str, option: str, comment: str, comment_nb: int = count(1), add_space_before: bool = False, add_space_after: bool = True) None[source]

Write a comment in .ini-format (start line with #)

Parameters:
  • section – the name of the section

  • option – the name of the option

  • comment – the comment linked to this option

  • comment_nb – the identifier of the comment; by default an integer

  • add_space_before

  • add_space_after

write(file: IO) None[source]

Write an .ini-format representation of the configuration state.

Parameters:

file (file) – the file object where to write the configuration
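
The idea of a ConfigParser that keeps comments when it is written can be sketched as follows. This is not the macsypy class: CommentedConfig, its _option_comments attribute and its simplified write() are assumptions chosen to keep the example short, and the option shown is only illustrative.

```python
import io
from configparser import ConfigParser


class CommentedConfig(ConfigParser):
    """ConfigParser sketch that writes '#' comment lines before options."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._option_comments = {}  # (section, option) -> list of comment lines

    def add_comment(self, section, option, comment):
        """Attach a comment to an option; multi-line comments are split."""
        key = (section, self.optionxform(option))
        self._option_comments.setdefault(key, []).extend(comment.splitlines())

    def write(self, file):
        """Write sections, comments and options in .ini format."""
        for section in self.sections():
            file.write(f"[{section}]\n")
            for option, value in self.items(section, raw=True):
                for line in self._option_comments.get((section, option), []):
                    file.write(f"# {line}\n")
                file.write(f"{option} = {value}\n")
            file.write("\n")


cfg = CommentedConfig()
cfg.add_section("base")
cfg.set("base", "db_type", "gembase")
cfg.add_comment("base", "db_type", "The type of dataset to deal with")
buffer = io.StringIO()
cfg.write(buffer)
```

Because the comment lines start with #, a plain ConfigParser can read the generated file back and simply ignores them.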

class macsypy.scripts.macsyconfig.Theme(ERROR: str = '\x1b[1m\x1b[31m', WARN: str = '\x1b[33m', SECTION: str = '\x1b[35m', RESET: str = '\x1b[0m', RETRY: str = '\x1b[33m', QUESTION: str = '\x1b[32m', EMPHASIZE: str = '\x1b[1m', EXPLANATION: str = '\x1b[0m', DEFAULT: str = '\x1b[1m\x1b[32m')[source]

Handle color combinations to highlight interactive questions

__delattr__(name)

Implement delattr(self, name).

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(ERROR: str = '\x1b[1m\x1b[31m', WARN: str = '\x1b[33m', SECTION: str = '\x1b[35m', RESET: str = '\x1b[0m', RETRY: str = '\x1b[33m', QUESTION: str = '\x1b[32m', EMPHASIZE: str = '\x1b[1m', EXPLANATION: str = '\x1b[0m', DEFAULT: str = '\x1b[1m\x1b[32m') None
__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

__weakref__

list of weak references to the object (if defined)
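
The generated methods listed above (__eq__, __hash__, __setattr__, __delattr__) suggest, without confirming it, that Theme is a frozen dataclass. A minimal sketch under that assumption follows; SketchTheme and colorize are hypothetical names, and only four of the colour defaults from the signature are reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTheme:
    """ANSI escape sequences used to colour prompts."""
    ERROR: str = "\x1b[1m\x1b[31m"  # bold red
    WARN: str = "\x1b[33m"          # yellow
    QUESTION: str = "\x1b[32m"      # green
    RESET: str = "\x1b[0m"          # back to the terminal default


def colorize(text, color, theme=SketchTheme()):
    """Wrap text in a colour code and reset the style afterwards."""
    return f"{color}{text}{theme.RESET}"


prompt = colorize("Continue?", SketchTheme().QUESTION)
```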

macsypy.scripts.macsyconfig._validator(cast_func: Callable, raw: Any, default: Any, sequence: bool = False) Any[source]
Parameters:
  • cast_func – the function which will cast the raw value

  • raw – the raw value

  • default – the default value

  • sequence – True if the value is a sequence, False otherwise

Returns:

The cast Value

Raises:

MacsypyError – if the raw value cannot be cast
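
The casting step described for _validator can be sketched as below. The CastError class stands in for MacsypyError, and falling back to the default on an empty answer is an assumption of this sketch, not documented behavior.

```python
class CastError(ValueError):
    """Hypothetical stand-in for MacsypyError."""


def validate(cast_func, raw, default, sequence=False):
    """Cast raw with cast_func; fall back to default when raw is empty.

    With sequence=True, raw is split on commas and every item is cast.
    """
    if raw == "":
        return default
    try:
        if sequence:
            return [cast_func(item.strip()) for item in raw.split(",")]
        return cast_func(raw)
    except ValueError as err:
        raise CastError(f"cannot cast {raw!r}: {err}") from err
```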

macsypy.scripts.macsyconfig.ask(question: str, validator: Callable, default: Any | None = None, expected: Any | None = None, explanation: str = '', sequence: bool = False, question_color: str | None = None, retry: int = 2)[source]

Ask a question on the terminal and return the user response, after checking that the response is allowed (right type, among allowed values, …)

Parameters:
  • question – The question to prompt to the user on the terminal

  • validator – what validator to be used to check the user response

  • default – the default value

  • expected – the values allowed (can be a list of values)

  • explanation – some explanation about the option

  • sequence – True if the parameter accepts a sequence of values (comma separated values)

  • question_color – the color of the question displayed to the user

  • retry – The number of times to repeat the question if the response is rejected

Returns:

the value cast to the right type
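
A prompt-and-retry loop of the kind described for ask can be sketched as follows. This is an illustration under assumptions: the read parameter (added so the loop can run without a terminal), the positive_int validator, the RuntimeError on exhaustion and the sample question are all hypothetical.

```python
def ask(question, validator, default=None, retry=2, read=input):
    """Prompt until validator accepts the answer, at most 1 + retry times."""
    for _ in range(retry + 1):
        raw = read(f"{question} [{default}]: ").strip()
        try:
            return validator(raw, default)
        except ValueError as err:
            print(f"Invalid answer: {err}")
    raise RuntimeError(f"too many invalid answers to {question!r}")


def positive_int(raw, default):
    """Validator sketch: empty answer -> default, otherwise an integer >= 0."""
    if raw == "":
        return default
    value = int(raw)
    if value < 0:
        raise ValueError(f"{value} is negative")
    return value


# Simulated user: two rejected answers, then a valid one.
answers = iter(["abc", "-1", "4"])
result = ask("How many workers?", positive_int, default=1,
             read=lambda prompt: next(answers))
```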

macsypy.scripts.macsyconfig.check_bool(raw: str, default: bool, expected, sequence: bool = False) bool[source]

Check if value can be cast to bool

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_choice(raw: str, default: str, expected: list[str], sequence: bool = False) str[source]

Check if value is in list of expected values

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – the allowed values for this option

  • sequence – True if the parameter accepts a sequence of values, False otherwise

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_dir(raw: str, default: str, expected, sequence: bool = False) str[source]

Check if value points to a directory

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_exe(raw: str, default: str, expected, sequence: bool = False) str[source]

Check if value points to an executable

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_file(raw: str, default: str, expected, sequence: bool = False) str[source]

Check if value points to a file

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_float(raw: str, default: float, expected, sequence: bool = False) float[source]

Check if value can be cast to float

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_positive_int(raw: str, default: int, expected, sequence: bool = False) int[source]

Check if value can be cast to an integer >= 0

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_str(raw: str, default: str, expected, sequence: bool = False) str[source]

Check if value can be cast to str

Parameters:
  • raw – the value returned by the user

  • default – the default value for the option

  • expected – not used here; present so that all check_xxx functions share the same signature

Returns:

value

Raises:

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.epilog(path: str) str[source]
Returns:

the text displayed to the user before starting the configuration

macsypy.scripts.macsyconfig.main(args: list[str] | None = None) None[source]

The main entrypoint of the script

Parameters:

args – the command line arguments.

macsypy.scripts.macsyconfig.parse_args(args: list[str]) Namespace[source]

parse command line

Parameters:

args – the command line arguments

Returns:

macsypy.scripts.macsyconfig.prolog() str[source]
Returns:

the text displayed to the user when the configuration file is generated

macsypy.scripts.macsyconfig.serialize(config: ConfigParserWithComments, path: str) None[source]

save the configuration to a file

Parameters:
  • config – the config to save

  • path (str) – where to store the configuration

macsypy.scripts.macsyconfig.set_base_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) None[source]

Options for base section

Parameters:
  • config – The config to set up

  • defaults – the macsyfinder default values

  • use_defaults (bool) – If True, do not ask any question and use the default values

macsypy.scripts.macsyconfig.set_general_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) None[source]

Options for general section

Parameters:
  • config – The config to set up

  • defaults – the macsyfinder default values

  • use_defaults (bool) – If True, do not ask any question and use the default values

macsypy.scripts.macsyconfig.set_hmmer_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) None[source]

Options for hmmer section

Parameters:
  • config – The config to set up

  • defaults – the macsyfinder default values

  • use_defaults (bool) – If True, do not ask any question and use the default values

macsypy.scripts.macsyconfig.set_path_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) None[source]

Options for directories section

Parameters:
  • config – The config to set up

  • defaults – the macsyfinder default values

  • use_defaults (bool) – If True, do not ask any question and use the default values

macsypy.scripts.macsyconfig.set_score_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) None[source]

Options for scoring section

Parameters:
  • config – The config to set up

  • defaults – the macsyfinder default values

  • use_defaults (bool) – If True, do not ask any question and use the default values

macsypy.scripts.macsyconfig.set_section(sec_name: str, options: dict[str, Any], config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) ConfigParserWithComments[source]

Iterate over the options of a section, ask a question for each option, and set this option in the config

Parameters:
  • sec_name – the name of the section

  • options – a dictionary with the options to set up for this section

  • config – The config to fill in.

  • defaults – the macsyfinder default values

  • use_defaults – True if the user skipped this section, in which case the defaults are used to set the config object

Returns:

configuration

macsyprofile

class macsypy.scripts.macsyprofile.HmmProfile(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)[source]

Handle the HMM output files

__init__(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)[source]
Parameters:
  • gene_name – the name of the gene corresponding to the profile search reported here

  • hmmer_output – The path to the raw Hmmer output file

  • cfg – the configuration object

__weakref__

list of weak references to the object (if defined)

_build_my_db(hmm_output: str) dict[str, None][source]

Build the keys of a dictionary object to store sequence identifiers of hits.

Parameters:

hmm_output – the path to the hmmsearch output to parse.

Returns:

a dictionary containing a key for each sequence id of the hits

_fill_my_db(db: dict[str, tuple[int, int]]) None[source]

Fill the dictionary with information on the matched sequences

Parameters:

db – the database containing all sequence id of the hits.

_hit_start(line: str) bool[source]
Parameters:

line – the line to parse

Returns:

True if it’s the beginning of a new hit in Hmmer raw output files. False otherwise

_parse_hmm_body(hit_id: str, gene_profile_lg: int, seq_lg: int, coverage_threshold: float, replicon_name: str, position_hit: int, i_evalue_sel: float, b_grp: list[list[str]]) list[CoreHit][source]

Parse the raw Hmmer output to extract the hits, and filter them with threshold criteria selected (“coverage_profile” and “i_evalue_select” command-line parameters)

Parameters:
  • hit_id – the sequence identifier

  • gene_profile_lg – the length of the profile matched

  • seq_lg – the length of the sequence

  • coverage_threshold – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection.

  • replicon_name – the identifier of the replicon

  • position_hit – the rank of the sequence matched in the input dataset file

  • i_evalue_sel – the maximal i-evalue (independent evalue) for hit selection

  • b_grp – the Hmmer output lines to deal with (grouped by hit)

Returns:

a sequence of hits
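
The filtering step described for _parse_hmm_body can be sketched as follows. The keep_hit function is hypothetical, and both the coverage formula (fraction of profile positions covered by the alignment) and the default thresholds are assumptions of this sketch rather than the documented behavior.

```python
def keep_hit(hmm_from, hmm_to, profile_length, i_evalue,
             coverage_threshold=0.5, i_evalue_sel=0.001):
    """Return (keep, coverage) for one domain of an hmmsearch hit."""
    # Assumed definition: aligned profile positions divided by the profile length.
    coverage = (hmm_to - hmm_from + 1) / profile_length
    keep = i_evalue <= i_evalue_sel and coverage >= coverage_threshold
    return keep, coverage


kept, cov = keep_hit(1, 80, 100, 1e-10)
```

A hit is kept only when both criteria are met, so a well-covered profile with a poor i-evalue is still discarded.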

_parse_hmm_header(h_grp: str) str[source]
Parameters:

h_grp – the sequence of strings returned by the groupby function, representing the header of a hit

Returns:

the sequence identifier from a set of lines that corresponds to a single hit

parse() list[LightHit][source]

Parse an hmm output file, extract all hits, and do some basic computation (profile coverage)

Returns:

The list of extracted hits
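
Splitting a raw hmmsearch output into one block per hit with itertools.groupby can be sketched as below. It relies on the HMMER3 convention that each per-sequence block opens with a line starting with ">>"; the hit_start and split_hits functions are hypothetical, and the sample lines are simplified, made-up output.

```python
from itertools import groupby


def hit_start(line):
    """True for the line that opens a per-sequence block ('>> <sequence id> ...')."""
    return line.startswith(">>")


def split_hits(lines):
    """Yield (sequence_id, body_lines) for every hit block."""
    hit_id = None
    for is_header, group in groupby(lines, key=hit_start):
        if is_header:
            # Consecutive '>>' lines fall in the same group; keep the last one.
            hit_id = list(group)[-1].split()[1]
        elif hit_id is not None:
            # Lines before the first '>>' are skipped because hit_id is still None.
            yield hit_id, [line for line in group if line.strip()]


sample = [
    "Scores for complete sequences ...",
    ">> seq_1  hypothetical protein",
    "   1 !  100.0   0.1   1e-30   2e-30",
    "",
    ">> seq_2  hypothetical protein",
    "   1 !   50.0   0.2   1e-10   3e-10",
]
hits = list(split_hits(sample))
```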

class macsypy.scripts.macsyprofile.LightHit(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)[source]

Handle hmm hits

__eq__(other)

Return self==value.

__hash__ = None
__init__(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) None
__repr__()

Return repr(self).

__str__() str[source]

Return str(self).

__weakref__

list of weak references to the object (if defined)

macsypy.scripts.macsyprofile.get_gene_name(path: str, suffix: str) str[source]
Parameters:
  • path – The path to the hmm output to analyse

  • suffix – the suffix of the hmm output file

Returns:

the name of the analysed gene

macsypy.scripts.macsyprofile.get_profile_len(path: str) int[source]

Parse the HMM profile to extract the length and the presence of GA bit threshold

Parameters:

path – The path to the hmm profile used to produce the hmm search output to analyse

Returns:

the length, presence of ga bit threshold
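
Reading the length and the GA threshold from a profile can be sketched as follows. It assumes the HMMER3 text format, where the header carries a LENG line and, optionally, a GA line before the model table that starts with an "HMM" line; the profile_length_and_ga function and the sample header are hypothetical.

```python
def profile_length_and_ga(text):
    """Return (length, has_ga) parsed from the header of an HMMER3 profile."""
    length, has_ga = None, False
    for line in text.splitlines():
        if line.startswith("LENG"):
            length = int(line.split()[1])
        elif line.startswith("GA"):
            has_ga = True
        elif line.startswith("HMM "):
            break  # the header ends where the model table starts
    return length, has_ga


# Made-up, truncated header for illustration.
header_text = (
    "HMMER3/f [3.1b2 | February 2015]\n"
    "NAME  example\n"
    "LENG  120\n"
    "ALPH  amino\n"
    "GA    25.00 25.00;\n"
    "HMM          A        C\n"
)
length, has_ga = profile_length_and_ga(header_text)
```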

macsypy.scripts.macsyprofile.get_version_message() str[source]
Returns:

the long description of the macsyfinder version

macsypy.scripts.macsyprofile.header(cmd: list[str]) str[source]
Parameters:

cmd – the command used to launch this analysis

Returns:

The header of the result file

macsypy.scripts.macsyprofile.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True)[source]
Parameters:
  • level – The logger threshold: either a positive int or one of the strings ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’

  • out – if the log message must be displayed

Returns:

logger

macsypy.scripts.macsyprofile.main(args: list[str] | None = None, log_level: str | int | None = None) None[source]

main entry point to macsyprofile

Parameters:
  • args – the arguments passed on the command line without the program name

  • log_level – the output verbosity

macsypy.scripts.macsyprofile.parse_args(args: list[str]) Namespace[source]
Parameters:

args – The arguments provided on the command line

Returns:

The arguments parsed

macsypy.scripts.macsyprofile.verbosity_to_log_level(verbosity: int) int[source]

Transform the number of -v options into a log level.

Parameters:

verbosity – number of -v options on the command line

Returns:

an int corresponding to a logging level