MacSyFinder Quick Start
We recommend installing MacSyFinder with pip in a virtual environment (for further details see Installation).
python3 -m venv MacSyFinder
cd MacSyFinder
source bin/activate
pip install macsyfinder
hmmsearch from the HMMER package (http://hmmer.org/) must be installed.
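Before running a search, it can help to confirm that hmmsearch is reachable from your shell. The following is a minimal sketch, not part of MacSyFinder: the helper name require_tool is hypothetical, and it only checks that a program with the given name is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of MacSyFinder): succeed only if the
# named program can be found on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH" >&2
    return 1
}

# Report where hmmsearch lives if it is installed; a missing tool only
# prints the error message above and does not abort the shell.
if require_tool hmmsearch; then
    echo "hmmsearch found at: $(command -v hmmsearch)"
fi
```

If the check fails, install the HMMER package or add its bin directory to your PATH before continuing.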
Prepare your data. You need a file containing all protein sequences of your genome of interest in fasta format (for further details see Input dataset).
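As a quick sanity check on the input file, you can count its sequences. The sketch below assumes a standard FASTA layout in which every record starts with a header line beginning with ">"; the function name count_fasta_records is hypothetical, and mygenome.fasta is the example file name used on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the number of FASTA records in a file,
# i.e. the number of lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Only report when the example file is actually present.
if [ -f mygenome.fasta ]; then
    echo "mygenome.fasta contains $(count_fasta_records mygenome.fasta) sequences"
fi
```

A count of zero usually means that the file is empty or is not in FASTA format.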
You need to have models to search in your input data. Please refer to Macromolecular models to create your own package of models. We will soon provide a set of predefined models for you to test.
To see all available options, refer to the Command-line options section, which describes every command-line option. To run MacSyFinder on your favorite dataset as soon as it is installed, you can simply follow these steps:
On a “metagenomic” (unordered) dataset for example:
macsyfinder --db-type unordered --sequence-db metagenome.fasta --models model_family all
will detect all the models of model_family that are defined in .xml files placed in the “my-models” folder, without taking gene order into account.
On a completely assembled genome (where the gene order is known, and is relevant for systems’ detection):
macsyfinder --db-type ordered_replicon --sequence-db mygenome.fasta --models model_family ModelA ModelB
will detect the macromolecular systems described in the two models “ModelA” and “ModelB” in a complete genome from the “ModelA.xml” and “ModelB.xml” definition files placed in the folder “my-models/model_family/definitions”.
If you want to run the same analysis as above but with models not installed by macsydata:
macsyfinder --db-type ordered_replicon --sequence-db mygenome.fasta --models-dir my-models --models model_family ModelA ModelB
my-models is the directory containing the model packages. The models must follow the macsy-models package structure.
System names are case-sensitive and must be spelled exactly on the command line. The name of a system is the name of its definition file without the file suffix (.xml by default), for example “toto” for a model defined in “toto.xml”.
The “all” keyword detects all the models available in the definitions folder in a single run. See the Command-line options.
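To find the exact, case-sensitive names you can pass to --models, you can list the definition files and strip their suffix. This is a minimal sketch under the assumption that the definitions sit directly in a folder such as “my-models/model_family/definitions”, as in the example above; the function name list_model_names is hypothetical and is not a MacSyFinder command.

```shell
#!/bin/sh
# Hypothetical helper: print one model name per .xml definition file
# found directly inside the given folder.
list_model_names() {
    for f in "$1"/*.xml; do
        # Skip the unexpanded pattern when no .xml file matches.
        [ -e "$f" ] || continue
        # "toto.xml" becomes "toto".
        basename "$f" .xml
    done
}

# Example call with the folder used on this page; prints nothing if the
# folder does not exist or holds no .xml files.
list_model_names my-models/model_family/definitions
```

Each printed name can be used as is after the model family on the command line, for example ModelA and ModelB in the commands above.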