scripts
There are 4 entry points:
macsyfinder: the main script
macsydata: manages the models
macsyconfig: an interactive utility to generate a MacSyFinder configuration file
macsyprofile: a utility dedicated to modelers that gathers information about HMMER output
API reference
macsyfinder
Main entrypoint to macsyfinder
- macsypy.scripts.macsyfinder._loner_warning(systems: list[System]) → list[str]
- Parameters:
systems – sequence of systems
- Returns:
warnings for loners that have fewer occurrences than the system occurrences in which they are used, unless the loner is also multi-system
- macsypy.scripts.macsyfinder._outfile_header(models_fam_name: str, models_version: str, skipped_replicons: list[str] | None = None) → str
- Returns:
The first 2 lines of each result file
- macsypy.scripts.macsyfinder._search_in_ordered_replicon(hits_by_replicon: dict[str, list[CoreHit]], models_to_detect: list[Model], config: Config, logger: Logger) → tuple[list[System], list[RejectedCandidate]]
- Parameters:
hits_by_replicon – the hits sorted by replicon and position
models_to_detect – the models to search
config – MSF configuration
logger – the logger
- macsypy.scripts.macsyfinder._search_in_unordered_replicon(hits_by_replicon: dict[str, list[CoreHit]], models_to_detect: list[Model], logger: Logger) → tuple[list[LikelySystem], list[UnlikelySystem]]
- Parameters:
hits_by_replicon – the hits sorted by replicon and position
models_to_detect – the models to search
logger – the logger
- macsypy.scripts.macsyfinder.alarm_handler(signum: Signals, frame) → None
Handle the alarm signal: flush the loggers, then raise Timeout.
- Parameters:
signum – the signal number
frame – the current stack frame
- Raises:
Timeout
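Handlers of this kind are typically installed for SIGALRM to bound the running time of a search. The following is a minimal sketch of that pattern with the standard signal module, not the macsypy code: it assumes a Unix platform and the main thread, Timeout is a local stand-in, and run_with_timeout is a hypothetical helper.

```python
import logging
import signal
import time


class Timeout(Exception):
    """Local stand-in for the Timeout exception raised by the handler."""


def alarm_handler(signum, frame):
    # Flush all handlers of the root logger so no buffered message is lost
    for handler in logging.getLogger().handlers:
        handler.flush()
    raise Timeout(f"received signal {signum}")


def run_with_timeout(func, seconds):
    """Hypothetical helper: return func(), or None if it runs longer than `seconds`."""
    previous = signal.signal(signal.SIGALRM, alarm_handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # deliver SIGALRM after `seconds`
    try:
        return func()
    except Timeout:
        return None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


if __name__ == "__main__":
    print(run_with_timeout(lambda: "fast", 1.0))
    print(run_with_timeout(lambda: (time.sleep(2), "slow")[1], 0.1))
```

Cancelling the timer and restoring the previous handler in the finally clause keeps a fast call from leaving a stray alarm behind.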
- macsypy.scripts.macsyfinder.get_version_message() → str
- Returns:
the long description of the macsyfinder version
- macsypy.scripts.macsyfinder.likely_systems_to_tsv(models_fam_name: str, models_version: str, likely_systems: list[LikelySystem], hit_system_tracker: HitSystemTracker, sys_file: IO) → None
Print likely system occurrences (from an unordered replicon) to a file in tab-separated values (TSV) format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
likely_systems – list of systems found
hit_system_tracker – a filled HitSystemTracker.
sys_file – The file in which to write the system occurrences
- Returns:
None
- macsypy.scripts.macsyfinder.likely_systems_to_txt(models_fam_name: str, models_version: str, likely_systems: list[LikelySystem], hit_system_tracker: HitSystemTracker, sys_file: IO)
Print likely system occurrences (from an unordered replicon) to a file in human-readable text format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
likely_systems – list of systems found
hit_system_tracker – a filled HitSystemTracker.
sys_file – file object
- Returns:
None
- macsypy.scripts.macsyfinder.list_models(args: Namespace) → str
- Parameters:
args – The command line argument once parsed
- Returns:
a string representation of all models and submodels installed.
- macsypy.scripts.macsyfinder.loners_to_tsv(models_fam_name: str, models_version: str, systems: list[System], sys_file: IO)
Get the loners from valid systems and save them to a file
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
systems – the systems from which the loners are extracted
sys_file – the file where loners are saved
- macsypy.scripts.macsyfinder.main(args: list[str] | None = None, loglevel: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int | None = None)
Main entry point to MacSyFinder: performs some checks before launching main_search_systems(), which is the function that actually performs the search.
- Parameters:
args – the arguments passed on the command line without the program name
loglevel – the output verbosity
- macsypy.scripts.macsyfinder.multisystems_to_tsv(models_fam_name: str, models_version: str, systems: list[System], sys_file: IO)
Get the multisystems from valid systems and save them to a file
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
systems – the systems from which the multisystems are extracted
sys_file – the file where multisystems are saved
- macsypy.scripts.macsyfinder.parse_args(args: list[str]) → tuple[ArgumentParser, Namespace]
- Parameters:
args – The arguments provided on the command line
- Returns:
the argument parser and the parsed arguments
- macsypy.scripts.macsyfinder.rejected_candidates_to_tsv(models_fam_name: str, models_version: str, rejected_candidates: list[RejectedCandidate], cand_file: IO, skipped_replicons: list[str] | None = None)
Print rejected clusters to a file in tabulated format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
rejected_candidates – list of candidates that do not constitute a system
cand_file – The file in which to write the rejected candidates
skipped_replicons – the names of the replicons for which MacSyFinder reached the timeout
- Returns:
None
- macsypy.scripts.macsyfinder.rejected_candidates_to_txt(models_fam_name: str, models_version: str, rejected_candidates: list[RejectedCandidate], cand_file: IO, skipped_replicons: list[str] | None = None)
Print rejected clusters to a file in human-readable format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
rejected_candidates – list of candidates that do not constitute a system
cand_file – The file in which to write the rejected candidates
skipped_replicons – the names of the replicons for which MacSyFinder reached the timeout
- Returns:
None
- macsypy.scripts.macsyfinder.search_systems(config: Config, model_registry: ModelRegistry, models_def_to_detect: list[DefinitionLocation], logger: Logger) → tuple[list[System | LikelySystem], list[RejectedCandidate] | list[UnlikelySystem]]
Do the job: this function orchestrates all the MacSyFinder mechanics. At the end, several files containing the results are produced:
macsyfinder.conf: the set of variables used to run this job
macsyfinder.systems: the list of the potential systems
macsyfinder.rejected_cluster: the list of all clusters and cluster combinations that have been rejected, with the reason
macsyfinder.log: a copy of the standard output
- Parameters:
config (macsypy.config.Config object) – the MacSyFinder configuration
model_registry (macsypy.registries.ModelRegistry object) – the registry of all models
models_def_to_detect (list of macsypy.registries.DefinitionLocation objects) – the definitions to detect
logger (colorlog.Logger object) – the logger used to display information to the user. It must be initialized; see macsypy.init_logger()
- Returns:
the systems and rejected clusters found
- Return type:
([macsypy.system.System, …], [macsypy.cluster.RejectedCandidate, …])
- macsypy.scripts.macsyfinder.solutions_to_tsv(models_fam_name: str, models_version: str, solutions: list[Solution], hit_system_tracker: HitSystemTracker, sys_file: IO, skipped_replicons: list[str] | None = None) → None
Print solutions to a file in tabulated format. A solution is a set of systems that represents an optimal combination of systems maximizing the score.
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
solutions – the list of solutions found
hit_system_tracker – a filled HitSystemTracker.
sys_file – The file in which to write the system occurrences
skipped_replicons – the names of the replicons for which MacSyFinder reached the timeout
- Returns:
None
- macsypy.scripts.macsyfinder.summary_best_solution(models_fam_name: str, models_version: str, best_solution_path: str, sys_file: IO, models_fqn: list[str], replicon_names: list[str], skipped_replicons: list[str] | None = None) → None
Summarize the best solution stored in best_solution_path and write the summary to sys_file. The summary counts the number of system occurrences for each model and each replicon:
replicon      model_fqn_1   model_fqn_2   ....
rep_name_1    1             2
rep_name_2    2             0
Columns are separated by a tab character.
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
best_solution_path (str) – the path to the best_solution file in tsv format
sys_file – the file where to save the summary
models_fqn – the fully qualified names of the models
replicon_names – the names of the replicons used
skipped_replicons – the names of the replicons for which MacSyFinder reached the timeout
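The summary above amounts to counting distinct systems per (replicon, model) pair. A minimal sketch with the standard csv module, assuming (hypothetically) that the best solution file has a header row with replicon, sys_id and model_fqn columns, one row per hit, and comment lines starting with #. summarize_best_solution is an illustrative name, not the macsypy function.

```python
import csv
import io
from collections import defaultdict


def summarize_best_solution(best_solution_lines, models_fqn, replicon_names):
    """Count distinct system occurrences per replicon and per model.

    Assumption: tab-separated input with a header row containing 'replicon',
    'sys_id' and 'model_fqn' columns, one row per hit, and '#' comment lines.
    """
    data_lines = (line for line in best_solution_lines
                  if line.strip() and not line.startswith("#"))
    systems = defaultdict(set)  # (replicon, model_fqn) -> distinct system ids
    for row in csv.DictReader(data_lines, delimiter="\t"):
        systems[(row["replicon"], row["model_fqn"])].add(row["sys_id"])

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["replicon", *models_fqn])
    for replicon in replicon_names:
        # replicons without any system still get a row of zeros
        writer.writerow([replicon, *(len(systems[(replicon, m)]) for m in models_fqn)])
    return out.getvalue()
```

Counting distinct sys_id values rather than rows matters because a single system spans several hits.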
- macsypy.scripts.macsyfinder.systems_to_tsv(models_fam_name: str, models_version: str, systems: list[System], hit_system_tracker: HitSystemTracker, sys_file: IO, skipped_replicons: list[str] | None = None) → None
Print system occurrences to a file in tabulated format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
systems – list of systems found
hit_system_tracker – a filled HitSystemTracker.
sys_file – The file in which to write the system occurrences
skipped_replicons – the names of the replicons for which MacSyFinder reached the timeout
- Returns:
None
- macsypy.scripts.macsyfinder.systems_to_txt(models_fam_name: str, models_version: str, systems: list[System], hit_system_tracker: HitSystemTracker, sys_file: IO, skipped_replicons: list[str] | None = None) → None
Print system occurrences to a file in human-readable format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
systems – list of systems found
hit_system_tracker – a filled HitSystemTracker.
sys_file – The file in which to write the system occurrences
skipped_replicons – the names of the replicons for which MacSyFinder reached the timeout
- Returns:
None
- macsypy.scripts.macsyfinder.unlikely_systems_to_txt(models_fam_name: str, models_version: str, unlikely_systems: list[UnlikelySystem], sys_file: IO)
Print hits (from an unordered replicon) that probably do not form a system occurrence to a file in human-readable format
- Parameters:
models_fam_name – the family name of the models (Conj, CrisprCAS, …)
models_version – the version of the models
unlikely_systems – list of macsypy.system.UnlikelySystem objects
sys_file – The file in which to write the system occurrences
- Returns:
None
macsydata
This is the entry point to the macsydata command. macsydata allows the user to manage the MacSyFinder models.
- macsypy.scripts.macsydata._find_all_installed_packages(models_dir: list[str] | None = None) → ModelRegistry
- Returns:
all models installed
- macsypy.scripts.macsydata._find_installed_package(pack_name: str, models_dir: list[str] | None = None) → ModelLocation | None
Check whether a package named pack_name is already installed
- Parameters:
pack_name – the name of the model family to search for
- Returns:
The model location corresponding to the pack_name
- macsypy.scripts.macsydata._get_remote_available_versions(pack_name: str, org: str) → list[str]
Ask the organization org for the available versions of the package pack_name
- Parameters:
pack_name – the name of the package
org – the remote organization to query
- Returns:
the list of available versions for the package
- macsypy.scripts.macsydata._search_in_desc(pattern: str, remote: RemoteModelIndex, packages: list[str], match_case: bool = False) → tuple[str, str, str]
- Parameters:
pattern – the substring to search for in package descriptions
remote – the uri of the macsy-models index
packages – list of packages to search in
match_case – True if the search is case-sensitive, False otherwise
- Returns:
- macsypy.scripts.macsydata._search_in_pack_name(pattern: str, remote: RemoteModelIndex, packages: list[str], match_case: bool = False) → list[tuple[str, str, dict]]
- Parameters:
pattern – the substring to search for in package names
remote – the uri of the macsy-models index
packages – list of packages to search in
match_case – True if the search is case-sensitive, False otherwise
- Returns:
- macsypy.scripts.macsydata.cmd_name(args: Namespace) → str
Return the name of the command being executed (scriptname + operation).
- Example
macsydata uninstall
- Parameters:
args – the arguments passed on the command line
- macsypy.scripts.macsydata.do_available(args: Namespace) → None
List the models available on macsy-models.
- Parameters:
args – the arguments passed on the command line
- Returns:
None
- macsypy.scripts.macsydata.do_check(args: Namespace) → None
- Parameters:
args – the arguments passed on the command line
- Return type:
None
- macsypy.scripts.macsydata.do_cite(args: Namespace) → None
How to cite an installed model.
- Parameters:
args – the arguments passed on the command line
- macsypy.scripts.macsydata.do_download(args: Namespace) → str
Download a tarball from the remote models repository.
- Parameters:
args (argparse.Namespace object) – the arguments passed on the command line
- macsypy.scripts.macsydata.do_freeze(args: Namespace) → None
Display all installed models with their respective versions, in requirements format.
- Parameters:
args – the arguments passed on the command line
- macsypy.scripts.macsydata.do_help(args: Namespace) → None
Display the content of the README file on stdout. If the README file does not exist, display a message to the user. See macsypy.package.help()
- Parameters:
args – the arguments passed on the command line (the package name)
- Returns:
None
- Raises:
ValueError – if the package name is not known.
- macsypy.scripts.macsydata.do_info(args: Namespace) → None
Show information about an installed model.
- Parameters:
args – the arguments passed on the command line
- Raises:
ValueError – if the package is not found locally
- macsypy.scripts.macsydata.do_init_package(args: Namespace) → None
Create a template for a data package:
skeleton for metadata.yml
definitions directory with a skeleton of models.xml
profiles directory
skeleton for README.md file
COPYRIGHT file (if holders option is set)
LICENSE file (if license option is set)
- Parameters:
args – The parsed commandline subcommand arguments
- Returns:
None
- macsypy.scripts.macsydata.do_install(args: Namespace) → None
Install new models in the MacSyFinder local models repository.
- Parameters:
args – the arguments passed on the command line
- Raises:
RuntimeError – if there is a problem with the installed package
ValueError – if the package and/or version is not found
- macsypy.scripts.macsydata.do_list(args: Namespace) → None
List installed models.
- Parameters:
args – the arguments passed on the command line
- macsypy.scripts.macsydata.do_search(args: Namespace) → None
Search macsy-models for models in a remote index. By default, only package names are searched; if the -S option is set, descriptions are searched as well. The search is case-insensitive unless the --match-case option is set.
- Parameters:
args – the arguments passed on the command line
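The matching rules described above can be sketched as follows. This is an illustration only: search_packages and its descriptions mapping are hypothetical, whereas macsydata works on a RemoteModelIndex.

```python
def search_packages(pattern, packages, descriptions=None, match_case=False):
    """Return the package names matching `pattern`.

    `descriptions` (hypothetical) maps a package name to its description;
    when given, descriptions are searched too, mimicking the -S option.
    """
    def norm(text):
        # case-insensitive unless match_case is set, as with --match-case
        return text if match_case else text.lower()

    needle = norm(pattern)
    matches = []
    for name in packages:
        in_name = needle in norm(name)
        in_desc = descriptions is not None and needle in norm(descriptions.get(name, ""))
        if in_name or in_desc:
            matches.append(name)
    return matches
```

For instance, with the sample names ["TXSSCAN", "CasFinder"], "txss" matches TXSSCAN only when match_case is False, and "crispr" matches CasFinder only if a description mentioning CRISPR is supplied.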
- macsypy.scripts.macsydata.do_show_definition(args: Namespace) → None
Display the definition on stdout. If only a package or sub-package is specified, display all model definitions in the corresponding package or sub-package.
For instance:
TXSS+/bacterial T6SSii T6SSiii
displays models TXSS+/bacterial/T6SSii and TXSS+/bacterial/T6SSiii
TXSS+/bacterial all or TXSS+/bacterial
displays all models contained in the TXSS+/bacterial sub-package
- Parameters:
args – the arguments passed on the command line
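Expanding the command-line arguments into fully qualified model names could look like the sketch below. expand_definitions and the available list are hypothetical, used only to illustrate the two forms shown above.

```python
def expand_definitions(package_path, names, available):
    """Return the fully qualified model names to display.

    `available` is a hypothetical list of all installed fully qualified
    model names; `names` are the model names given after the package path.
    """
    prefix = package_path.rstrip("/") + "/"
    if not names or names == ["all"]:
        # whole package or sub-package requested
        return [fqn for fqn in available if fqn.startswith(prefix)]
    return [prefix + name for name in names]
```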
- macsypy.scripts.macsydata.do_uninstall(args: Namespace) → None
Remove models from the MacSyFinder local models repository.
- Parameters:
args – the arguments passed on the command line
- Raises:
ValueError – if the package is not found locally
- macsypy.scripts.macsydata.get_version_message() → str
- Returns:
the long description of the macsyfinder version
- Return type:
str
- macsypy.scripts.macsydata.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True) → Logger
- Parameters:
level – The logger threshold: a positive int or one of the strings ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’
out – whether the log messages must be displayed
- Returns:
logger
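A comparable logger can be set up with the standard logging module alone. This sketch is a stand-in under stated assumptions, not macsypy's implementation (which may add colors); the default logger name is hypothetical.

```python
import logging
import sys


def init_logger(level="INFO", out=True, name="macsydata_sketch"):
    """Configure and return a named logger (hypothetical sketch, no colors)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicated handlers on repeated calls
    logger.setLevel(level)  # accepts 'DEBUG', 'INFO', ... or an int
    # out=False attaches a NullHandler, so records are silently dropped
    handler = logging.StreamHandler(sys.stderr) if out else logging.NullHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # do not also forward records to the root logger
    return logger
```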
macsyconfig
Entry point for the macsyconfig command, which generates a MacSyFinder configuration file
- class macsypy.scripts.macsyconfig.ConfigParserWithComments(defaults=None, dict_type=<class 'dict'>, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section='DEFAULT', interpolation=<object object>, converters=<object object>)
Extend ConfigParser to allow comments in serialization
- add_comment(section: str, option: str, comment: str, comment_nb: int = count(1), add_space_before: bool = False, add_space_after: bool = True) → None
Write a comment in .ini format (a line starting with #)
- Parameters:
section – the name of the section
option – the name of the option
comment – the comment linked to this option
comment_nb – the identifier of the comment, by default an integer
add_space_before –
add_space_after –
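One common way to serialize comments with the standard configparser is to store each comment as a value-less option whose key starts with #, which ConfigParser.write() then emits as a bare line. The sketch below illustrates that technique under this assumption; it is not necessarily how ConfigParserWithComments works, and CommentedConfigParser is a hypothetical name. Two identical comments in one section would collide on the same key, so a real implementation needs some way to keep comment keys unique.

```python
import configparser
import io


class CommentedConfigParser(configparser.ConfigParser):
    """Hypothetical sketch: a comment is stored as a value-less option whose
    key starts with '#', so write() emits it as a plain comment line."""

    def __init__(self):
        super().__init__(allow_no_value=True)
        self.optionxform = str  # keep keys (and thus comment text) case-sensitive

    def add_comment(self, section, comment):
        if not self.has_section(section):
            self.add_section(section)
        self.set(section, f"# {comment}", None)


cfg = CommentedConfigParser()
cfg.add_comment("hmmer", "E-value threshold for the search")
cfg.set("hmmer", "e_value_search", "0.1")
buffer = io.StringIO()
cfg.write(buffer)
```

When the file is read back, the # lines are treated as ordinary comments and skipped.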
- class macsypy.scripts.macsyconfig.Theme(ERROR: str = '\x1b[1m\x1b[31m', WARN: str = '\x1b[33m', SECTION: str = '\x1b[35m', RESET: str = '\x1b[0m', RETRY: str = '\x1b[33m', QUESTION: str = '\x1b[32m', EMPHASIZE: str = '\x1b[1m', EXPLANATION: str = '\x1b[0m', DEFAULT: str = '\x1b[1m\x1b[32m')
Handle color combinations used to highlight interactive questions
- __delattr__(name)
Implement delattr(self, name).
- __eq__(other)
Return self==value.
- __hash__()
Return hash(self).
- __init__(ERROR: str = '\x1b[1m\x1b[31m', WARN: str = '\x1b[33m', SECTION: str = '\x1b[35m', RESET: str = '\x1b[0m', RETRY: str = '\x1b[33m', QUESTION: str = '\x1b[32m', EMPHASIZE: str = '\x1b[1m', EXPLANATION: str = '\x1b[0m', DEFAULT: str = '\x1b[1m\x1b[32m') → None
- __repr__()
Return repr(self).
- __setattr__(name, value)
Implement setattr(self, name, value).
- __weakref__
list of weak references to the object (if defined)
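The generated __setattr__, __delattr__, __eq__ and __hash__ members listed above are what the dataclasses module produces for a frozen dataclass, so Theme is presumably declared that way. A reduced sketch with three of the colors; colorize is a hypothetical helper.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Theme:
    """Reduced sketch: only three of the colors listed above."""
    QUESTION: str = "\x1b[32m"
    ERROR: str = "\x1b[1m\x1b[31m"
    RESET: str = "\x1b[0m"


def colorize(text, color, theme=Theme()):
    # wrap the text in an ANSI color code and reset the terminal afterwards
    return f"{color}{text}{theme.RESET}"
```

Freezing the theme makes it hashable and guards the color codes against accidental reassignment (FrozenInstanceError is raised instead).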
- macsypy.scripts.macsyconfig._validator(cast_func: Callable, raw: Any, default: Any, sequence: bool = False) → Any
- Parameters:
cast_func – the function that casts the raw value
raw – the raw value
default – the default value
sequence – True if the value is a sequence, False otherwise
- Returns:
The cast value
- Raises:
MacsypyError – if the raw value cannot be cast
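A minimal sketch of such a validator, assuming that an empty answer selects the default and that sequences are comma-separated (as described for ask() below). MacsypyError is a local stand-in, and validate and positive_int are hypothetical names.

```python
class MacsypyError(Exception):
    """Local stand-in for the MacsypyError raised by the validators."""


def positive_int(text):
    """Cast to an int >= 0, the range check_positive_int is documented to accept."""
    value = int(text)
    if value < 0:
        raise ValueError(f"{value} is negative")
    return value


def validate(cast_func, raw, default, sequence=False):
    """Cast `raw` with `cast_func`; an empty answer selects `default`."""
    if not raw.strip():
        return default
    try:
        if sequence:
            # comma-separated answer -> list of cast items
            return [cast_func(item.strip()) for item in raw.split(",") if item.strip()]
        return cast_func(raw.strip())
    except (TypeError, ValueError) as err:
        raise MacsypyError(f"invalid value {raw!r}: {err}") from err
```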
- macsypy.scripts.macsyconfig.ask(question: str, validator: Callable, default: Any | None = None, expected: Any | None = None, explanation: str = '', sequence: bool = False, question_color: str | None = None, retry: int = 2)
Ask a question on the terminal and return the user response, after checking that the response is allowed (right type, among allowed values, …)
- Parameters:
question – The question to prompt to the user on the terminal
validator – the validator used to check the user response
default – the default value
expected – the allowed values (can be a list of values)
explanation – some explanation about the option
sequence – True if the parameter accepts a sequence of values (comma-separated values)
question_color – the color of the question displayed to the user
retry – the number of times to repeat the question if the response is rejected
- Returns:
the value cast to the right type
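The retry behavior can be sketched with an injectable input function, which also makes it testable. Assumptions: the question is asked once plus retry more times, the validator raises ValueError on a rejected answer, and the last error is re-raised. This ask is a simplified, hypothetical signature.

```python
def ask(question, validator, default=None, retry=2, input_func=input):
    """Prompt until `validator(raw, default)` accepts the answer."""
    for attempt in range(retry + 1):
        raw = input_func(f"{question} [{default}] ")
        try:
            return validator(raw, default)
        except ValueError as err:
            if attempt == retry:
                raise  # no retry left: propagate the last error
            print(f"Invalid answer ({err}), please retry.")
```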
- macsypy.scripts.macsyconfig.check_bool(raw: str, default: bool, expected, sequence: bool = False) → bool
Check if the value can be cast to bool
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_choice(raw: str, default: str, expected: list[str], sequence: bool = False) → str
Check if the value is in the list of expected values
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – the allowed values for this option
sequence – True if the parameter accepts a sequence of values, False otherwise
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_dir(raw: str, default: str, expected, sequence: bool = False) → str
Check if the value points to a directory
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_exe(raw: str, default: str, expected, sequence: bool = False) → str
Check if the value points to an executable
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_file(raw: str, default: str, expected, sequence: bool = False) → str
Check if the value points to a file
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_float(raw: str, default: float, expected, sequence: bool = False) → float
Check if the value can be cast to float
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_positive_int(raw: str, default: int, expected, sequence: bool = False) → int
Check if the value can be cast to an integer >= 0
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.check_str(raw: str, default: str, expected, sequence: bool = False) → str
Check if the value can be cast to str
- Parameters:
raw – the value returned by the user
default – the default value for the option
expected – not used here; kept so that all check_xxx functions share the same signature
- Returns:
value
- Raises:
MacsypyError – if the value cannot be cast to the right type
- macsypy.scripts.macsyconfig.epilog(path: str) → str
- Returns:
the text displayed to the user before starting the configuration
- macsypy.scripts.macsyconfig.main(args: list[str] | None = None) → None
The main entry point of the script
- Parameters:
args – the command line arguments.
- macsypy.scripts.macsyconfig.parse_args(args: list[str]) → Namespace
Parse the command line
- Parameters:
args – the command line arguments
- Returns:
- macsypy.scripts.macsyconfig.prolog() → str
- Returns:
the text displayed to the user when the configuration file is generated
- macsypy.scripts.macsyconfig.serialize(config: ConfigParserWithComments, path: str) → None
Save the configuration to a file
- Parameters:
config – the config to save
path (str) – where to store the configuration
- macsypy.scripts.macsyconfig.set_base_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) → None
Options for the base section
- Parameters:
config – The config to set up
defaults – the macsyfinder default values
use_defaults (bool) – If True, do not ask any question; use the default values
- macsypy.scripts.macsyconfig.set_general_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) → None
Options for the general section
- Parameters:
config – The config to set up
defaults – the macsyfinder default values
use_defaults (bool) – If True, do not ask any question; use the default values
- macsypy.scripts.macsyconfig.set_hmmer_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) → None
Options for the hmmer section
- Parameters:
config – The config to set up
defaults – the macsyfinder default values
use_defaults (bool) – If True, do not ask any question; use the default values
- macsypy.scripts.macsyconfig.set_path_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) → None
Options for the directories section
- Parameters:
config – The config to set up
defaults – the macsyfinder default values
use_defaults (bool) – If True, do not ask any question; use the default values
- macsypy.scripts.macsyconfig.set_score_options(config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) → None
Options for the scoring section
- Parameters:
config – The config to set up
defaults – the macsyfinder default values
use_defaults (bool) – If True, do not ask any question; use the default values
- macsypy.scripts.macsyconfig.set_section(sec_name: str, options: dict[str, Any], config: ConfigParserWithComments, defaults: MacsyDefaults, use_defaults: bool = False) → ConfigParserWithComments
Iterate over the options of a section, ask a question for each option, and set this option in the config
- Parameters:
sec_name – the name of the section
options – a dictionary with the options to set up for this section
config – The config to fill in.
defaults – the macsyfinder default values
use_defaults – True if the user skipped this section, in which case the default values are set in the config object
- Returns:
configuration
macsyprofile
- class macsypy.scripts.macsyprofile.HmmProfile(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)
Handle the HMM output files
- __init__(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)
- Parameters:
gene_name – the name of the gene corresponding to the profile search reported here
hmmer_output – The path to the raw Hmmer output file
cfg – the configuration object
- __weakref__
list of weak references to the object (if defined)
- _build_my_db(hmm_output: str) → dict[str, None]
Build the keys of a dictionary object to store sequence identifiers of hits.
- Parameters:
hmm_output – the path to the hmmsearch output to parse.
- Returns:
a dictionary containing a key for each sequence id of the hits
- _fill_my_db(db: dict[str, tuple[int, int]]) → None
Fill the dictionary with information on the matched sequences
- Parameters:
db – the database containing all sequence id of the hits.
- _hit_start(line: str) → bool
- Parameters:
line – the line to parse
- Returns:
True if it’s the beginning of a new hit in Hmmer raw output files. False otherwise
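In HMMER3 hmmsearch text output, each per-sequence block of domain annotations begins with a line starting with >>, which is what a test like the one above has to recognize. A minimal sketch, with group_hits as a hypothetical helper that also stops at the final pipeline statistics section.

```python
def hit_start(line):
    """True if `line` opens a new per-sequence block in hmmsearch output."""
    return line.startswith(">>")


def group_hits(lines):
    """Group output lines into one list per hit (hypothetical helper)."""
    groups = []
    for line in lines:
        if line.startswith("Internal pipeline statistics summary"):
            break  # end of the per-hit section
        if hit_start(line):
            groups.append([line])
        elif groups:
            groups[-1].append(line)
    return groups
```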
- _parse_hmm_body(hit_id: str, gene_profile_lg: int, seq_lg: int, coverage_threshold: float, replicon_name: str, position_hit: int, i_evalue_sel: float, b_grp: list[list[str]]) → list[CoreHit]
Parse the raw Hmmer output to extract the hits and filter them with the selected threshold criteria (“coverage_profile” and “i_evalue_select” command-line parameters)
- Parameters:
hit_id – the sequence identifier
gene_profile_lg – the length of the profile matched
seq_lg – the length of the sequence
coverage_threshold – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection.
replicon_name – the identifier of the replicon
position_hit – the rank of the sequence matched in the input dataset file
i_evalue_sel – the maximal i-evalue (independent evalue) for hit selection
b_grp – the Hmmer output lines to deal with (grouped by hit)
- Returns:
a sequence of hits
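The docstring states the selection criteria but not the formulas. A minimal sketch, assuming coverage is the aligned length divided by the full length (of the profile for profile coverage, of the sequence for sequence coverage) and that a hit must satisfy both thresholds. Domain and select_domains are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical subset of one hmmsearch domain line (1-based, inclusive)."""
    i_evalue: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int


def select_domains(domains, profile_len, seq_len, coverage_threshold, i_evalue_sel):
    """Keep domains passing both thresholds; return (domain, profile_cov, seq_cov)."""
    selected = []
    for dom in domains:
        profile_cov = (dom.hmm_to - dom.hmm_from + 1) / profile_len
        seq_cov = (dom.ali_to - dom.ali_from + 1) / seq_len
        if profile_cov >= coverage_threshold and dom.i_evalue <= i_evalue_sel:
            selected.append((dom, profile_cov, seq_cov))
    return selected
```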
- class macsypy.scripts.macsyprofile.LightHit(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)
Handle hmm hits
- __eq__(other)
Return self==value.
- __hash__ = None
- __init__(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) → None
- __repr__()
Return repr(self).
- __weakref__
list of weak references to the object (if defined)
- macsypy.scripts.macsyprofile.get_gene_name(path: str, suffix: str) → str
- Parameters:
path – The path to the hmm output to analyse
suffix – the suffix of the hmm output file
- Returns:
the name of the analysed gene
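Assuming the gene name is simply the file name with the suffix removed, get_gene_name can be sketched as below. The path and the .search_hmm.out suffix in the test are illustrative values.

```python
import os


def get_gene_name(path, suffix):
    """Strip the directory and `suffix` from a HMMER output path (sketch)."""
    file_name = os.path.basename(path)
    if suffix and file_name.endswith(suffix):
        return file_name[:-len(suffix)]
    return file_name  # suffix absent: keep the bare file name
```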
- macsypy.scripts.macsyprofile.get_profile_len(path: str) → int
Parse the HMM profile to extract the length and the presence of GA bit threshold
- Parameters:
path – The path to the hmm profile used to produce the hmm search output to analyse
- Returns:
the length, presence of ga bit threshold
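The docstring and the annotation disagree here (the text mentions a length and the presence of a GA threshold, the annotation says int); the sketch follows the docstring and returns both. It relies on the HMMER3 profile format, in which the header carries a mandatory LENG line and an optional GA (gathering threshold) line before the line starting with HMM that opens the model section. profile_len_and_ga is a hypothetical name that reads lines rather than a path.

```python
def profile_len_and_ga(lines):
    """Read LENG and detect a GA line in the header of a HMMER3 profile."""
    length = None
    has_ga = False
    for line in lines:
        if line.startswith("LENG"):
            length = int(line.split()[1])
        elif line.startswith("GA "):
            has_ga = True
        elif line.startswith("HMM "):
            break  # the header ends where the model section begins
    return length, has_ga
```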
- macsypy.scripts.macsyprofile.get_version_message() → str
- Returns:
the long description of the macsyfinder version
- macsypy.scripts.macsyprofile.header(cmd: list[str]) → str
- Parameters:
cmd – the command used to launch this analysis
- Returns:
The header of the result file
- macsypy.scripts.macsyprofile.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True)
- Parameters:
level – The logger threshold: a positive int or one of the strings ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’
out – whether the log messages must be displayed
- Returns:
logger
- macsypy.scripts.macsyprofile.main(args: list[str] | None = None, log_level: str | int | None = None) → None
main entry point to macsyprofile
- Parameters:
args – the arguments passed on the command line without the program name
log_level – the output verbosity