What’s new in MacSyFinder v2?
A group of hits that respects the distance constraints, but in which every hit represents the same gene of the model, is not considered a cluster.
A group of hits that respects the distance constraints, but in which all hits represent neutral genes of the model, is not considered a cluster.
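The two rules above can be illustrated with a minimal Python sketch. The Hit class, its fields and the status strings are hypothetical simplifications, not MacSyFinder's internal API. The sketch assumes the group already satisfies the distance constraints and does not model loners or other special cases.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified hit: the model gene it matches and that gene's status."""
    gene_name: str
    status: str  # e.g. "mandatory", "accessory" or "neutral"


def is_cluster(hits: Sequence[Hit]) -> bool:
    """Decide whether a group of hits, assumed to respect the distance
    constraints, should be kept as a cluster."""
    # Rule 1: fewer than two distinct genes means every hit represents
    # the same model gene (or the group is empty): not a cluster.
    if len({hit.gene_name for hit in hits}) < 2:
        return False
    # Rule 2: if all hits represent neutral genes, it is not a cluster.
    if all(hit.status == "neutral" for hit in hits):
        return False
    return True
```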
If a replicon is skipped because the timeout is reached during the best_solution phase, no results are produced for this replicon, but a warning stating that MSF skipped it appears in the outputs.
Update the MSF citation, fix minor bugs and add a few features.
Force MSF to run even if the output directory already exists and is not empty. Use this option with caution: MSF erases everything in the output directory before running. https://github.com/gem-pasteur/macsyfinder/issues/61
MacSyFinder with Python subprocess kills the main process on error
If an error occurred during the HMM phase, all processes, including the parent process, were killed, but MSF stopped with an ugly traceback. https://github.com/gem-pasteur/macsyfinder/issues/60
In Gembase format parsing
For draft genomes, genes were not correctly grouped by contig.
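As a hedged illustration of grouping genes by contig, the sketch below assumes the Gembase convention that each sequence identifier is the replicon (contig) name, an underscore, and a gene number, so that the replicon name is everything before the last underscore. The function name group_by_replicon and the example identifiers are hypothetical; this is not MacSyFinder's parser.

```python
from typing import Dict, Iterable, List


def group_by_replicon(seq_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group Gembase-style identifiers by replicon (contig), keeping input order.

    Assumes each identifier is '<replicon name>_<gene number>', so the
    replicon name is everything before the last underscore.
    """
    groups: Dict[str, List[str]] = {}
    for seq_id in seq_ids:
        replicon, sep, _gene_number = seq_id.rpartition("_")
        if not sep or not replicon:
            raise ValueError(f"not a Gembase-style identifier: {seq_id!r}")
        # dicts preserve insertion order, so replicons appear in first-seen order
        groups.setdefault(replicon, []).append(seq_id)
    return groups
```

Splitting on the last underscore, rather than the first, keeps each contig of a draft genome in its own group even when contig names themselves contain underscores.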
“Cannot join current thread” error during the unit test phase
The test suite sometimes failed with the following error: “cannot join current thread”. https://github.com/gem-pasteur/macsyfinder/issues/58
Patch macsydata to fix CVE-2007-4559 https://github.com/gem-pasteur/macsyfinder/pull/57
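CVE-2007-4559 is a path traversal issue in Python's tarfile module: an archive member named, for example, ../evil.txt can be written outside the extraction directory. A common mitigation, sketched below and not necessarily the exact patch applied to macsydata, is to resolve each member's destination and refuse to extract anything that falls outside the target directory. The helper name safe_extract is hypothetical.

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract tar into dest, refusing members whose path would escape dest."""
    dest_root = os.path.realpath(dest)
    for member in tar.getmembers():
        # Resolve where this member would be written (handles '..' and absolute names).
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"refusing to extract {member.name!r} outside {dest_root}")
    # Every member name stays inside dest_root, so extraction can proceed.
    tar.extractall(dest_root)
```

This check only inspects member names; it does not validate where symbolic or hard links point. On Python versions that support extraction filters (3.12, with backports to some earlier maintenance releases), tar.extractall(dest, filter="data") is a stricter built-in alternative.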
Squash clusters of loners
If a cluster is made up only of loners, MSF treats its hits as loners rather than as a regular cluster.
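The squashing rule can be sketched as follows. The Hit class and the squash_loner_cluster function are hypothetical simplifications for illustration, not MacSyFinder's internal API; the function returns the regular clusters and the individual loner hits separately.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified hit: the model gene it matches and whether that gene is a loner."""
    gene_name: str
    loner: bool


def squash_loner_cluster(cluster: Sequence[Hit]) -> Tuple[List[List[Hit]], List[Hit]]:
    """Return (regular_clusters, loner_hits) for one candidate cluster.

    A cluster made up only of loners is dissolved: its hits are returned
    as individual loners instead of as a regular cluster.
    """
    if not cluster:
        return [], []
    if all(hit.loner for hit in cluster):
        return [], list(cluster)
    return [list(cluster)], []
```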
New option --timeout
In some cases, MSF can take a long time to find the best solution (in ‘gembase’ and ‘ordered_replicon’ modes). The timeout applies per replicon: if this step reaches the timeout, the replicon is skipped (in gembase mode, the analysis of the other replicons continues). The value has the form NUMBER[SUFFIX], where NUMBER must be an integer and SUFFIX may be ‘s’ for seconds (the default), ‘m’ for minutes, ‘h’ for hours or ‘d’ for days. Terms can be combined: for instance, 1h2m3s means 1 hour, 2 minutes and 3 seconds.
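The NUMBER[SUFFIX] syntax can be parsed with a small regular-expression sketch. The name parse_timeout and the argparse wiring are hypothetical and only illustrate the format described above; MacSyFinder's own parser may differ, for example in how it treats a trailing term without a suffix.

```python
import re

# Seconds per unit; a term without a suffix counts as seconds.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}
_VALID = re.compile(r"(?:\d+[smhd]?)+")
_TERM = re.compile(r"(\d+)([smhd]?)")


def parse_timeout(value: str) -> int:
    """Convert a NUMBER[SUFFIX] timeout such as '90', '2d' or '1h2m3s' to seconds."""
    value = value.strip()
    if not _VALID.fullmatch(value):
        raise ValueError(f"invalid timeout: {value!r}")
    # Each term is an integer followed by an optional unit letter.
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in _TERM.findall(value))


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    # Hypothetical wiring for illustration; this is not MacSyFinder's own CLI code.
    parser.add_argument("--timeout", type=parse_timeout)
    print(parser.parse_args(["--timeout", "1h2m3s"]).timeout)  # 3723
```

Using the parser as an argparse type turns malformed values such as 1.5h into a regular usage error, because argparse reports a ValueError raised by a type function as an invalid argument value.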
For version 2, MacSyFinder was ported to Python 3.
New features and search engine
MacSyFinder v2 is a major release. The search engine was changed for a more intuitive and comprehensive exploration of putative systems.
The search is now more thorough and avoids undesirable side-effects of the previous search engine. Being more thorough, it now also includes a scoring scheme to build candidate systems from sets of detected components (clusters), and can offer several optimal “solutions” (sets of detected systems) based on a combinatorial exploration of detected clusters. See here for more details.
Because the search engine has changed, you may want to check that models ported from v1 to v2 still behave as expected.
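The idea of several optimal solutions can be illustrated with a deliberately simplified, brute-force sketch: among candidate systems that do not share hits, find the highest total score and keep every combination that reaches it. The Candidate type, the scores and the exhaustive search are assumptions for illustration; this is not MacSyFinder's scoring scheme or search algorithm.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    """Hypothetical candidate system: a name, a score and the hits it uses."""
    name: str
    score: float
    hits: FrozenSet[str]


def compatible(systems: Sequence[Candidate]) -> bool:
    """True if no hit is used by more than one system in the combination."""
    seen: set = set()
    for system in systems:
        if seen & system.hits:
            return False
        seen |= system.hits
    return True


def best_solutions(candidates: Sequence[Candidate]) -> Tuple[float, List[Tuple[Candidate, ...]]]:
    """Exhaustively return the best total score and every combination reaching it."""
    best_score = -math.inf
    best: List[Tuple[Candidate, ...]] = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if not compatible(combo):
                continue
            score = sum(system.score for system in combo)
            if best and math.isclose(score, best_score):
                best.append(combo)  # a tie: keep it as another optimal solution
            elif score > best_score:
                best_score, best = score, [combo]
    return best_score, best
```

Because the number of combinations grows exponentially with the number of candidates, this brute force is only practical for a handful of candidates. It also ignores multi-model components, which may legitimately be shared between systems of different models.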
Several new features were added, including:
- a new type of gene component “neutral” was added in order to provide more possibilities for systems’ modelling in macsy-models. See here for more details.
- a new component feature was introduced: “multi-model”, which corresponds to components that are allowed to participate in occurrences of systems from different models. See here for more.
- more flexibility was introduced in the search for systems’ components using HMMER. It is now possible to use the cut_ga threshold when it is provided in the HMM profiles used for the components’ similarity search. This allows the search to be tailored to each HMM profile, and thus to each component. See here for more details.
- a new file structure was created to better organize MacSyFinder’s packages (i.e., packages that include systems’ models and the corresponding HMMER profiles). See here for details.
- a tool to easily install and distribute MacSyFinder’s packages was created. See here for more details on macsydata.
- the format for MacSyFinder’s models has changed slightly, in order to offer more possibilities and more readability. To see how to port models from v1 to v2, visit here.
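HMMER3 profile files can carry gathering thresholds on a header line such as GA 25.00 25.00; (per-sequence and per-domain bit scores), and hmmsearch applies them when run with --cut_ga. The sketch below checks whether a profile provides these thresholds, so that a caller could use --cut_ga for that profile and fall back to its usual filtering otherwise. The function name and the fallback logic are assumptions for illustration, not MacSyFinder's code.

```python
from typing import Iterable, Optional, Tuple


def read_ga_thresholds(lines: Iterable[str]) -> Optional[Tuple[float, float]]:
    """Return (sequence_GA, domain_GA) from an HMMER3 profile header, or None if absent."""
    for line in lines:
        if line.startswith("HMM "):
            # The main model section starts here; the header is over.
            break
        if line.startswith("GA "):
            # Expected shape: "GA    25.00 25.00;"
            fields = line.split()
            return float(fields[1]), float(fields[2].rstrip(";"))
    return None
```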
Also, the search modes corresponding to “unordered” and “unordered_replicon” were merged into the “unordered” search mode - as they basically correspond to the same behaviour.
In v2, output files were also re-defined. See here for more details.
MacSyFinder v2 no longer requires the formatdb or makeblastdb tools from NCBI. It does use new dependencies, but as they are Python libraries, their installation should be transparent for the user and should not require manual steps. See here for details.
Models are more formalized
Model data are now more formalized, with a well-defined structure. For instance, the definitions and profiles must be packed together in what we call a macsy-model package. If you intend to model new systems, please refer to the Modeller Guide.
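A minimal sketch of what “packed together” means in practice: a package directory holding its definitions and its profiles side by side. The sub-directory names below are an assumption about the layout, and the function is hypothetical; macsydata check is the actual tool for verifying a package.

```python
from pathlib import Path
from typing import List, Union

# Assumed layout: definitions and profiles live in these two sub-directories.
REQUIRED_DIRS = ("definitions", "profiles")


def missing_package_parts(package_dir: Union[str, Path]) -> List[str]:
    """Return the required sub-directories that are absent from a model package."""
    root = Path(package_dir)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
```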
We now provide a new tool to manage the models. See Models installation with macsydata.
Modellers can provide specific configuration values, released along with the model package. See Model configuration.
Modeller helper tool
To help modellers create new models, we provide a new helper tool, macsyprofile, which analyses the raw HMMER output files of a previous MacSyFinder run to report information on all hits, even those that were filtered out. See macsyprofile.
macsydata (see Models installation with macsydata) also provides some options to help the modeller, such as:
- macsydata init to initialize a new model package.
- macsydata check to check the integrity of a model package before using or publishing it.