MacSyFinder Quick Start

We recommend installing MacSyFinder with pip in a virtual environment (for further details, see Installation).
```
python3 -m venv MacSyFinder
cd MacSyFinder
source bin/activate
pip install macsyfinder
```
Warning

hmmsearch from the HMMER package (http://hmmer.org/) must be installed.
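
A quick way to confirm that the tools named in this guide are reachable is to look them up on the PATH. The snippet below is a minimal sketch: it only checks for the presence of the executables (macsyfinder, msf_data, hmmsearch) and makes no assumption about their versions.

```shell
# Report which of the tools named in this guide are reachable on PATH.
missing_tools=""
for tool in macsyfinder msf_data hmmsearch; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "ok:      $tool -> $(command -v "$tool")"
    else
        echo "missing: $tool" >&2
        missing_tools="$missing_tools $tool"
    fi
done

if [ -z "$missing_tools" ]; then
    echo "All required tools were found."
else
    echo "Not found:$missing_tools (activate the virtual environment, or install HMMER)" >&2
fi
```
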
Prepare your data. You need a FASTA file containing all the protein sequences of your genome of interest (for further details see Input dataset). Ideally, the sequences are listed in the same order as the corresponding genes along the replicons.
You need to install, or make available to MacSyFinder, the models to search for in your input genome data. To create your own package of models, refer to Macromolecular models. Otherwise, macsy-models contributed by the community are available at https://github.com/macsy-models and can be retrieved and installed with the msf_data command, which is installed as part of the MacSyFinder suite.
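
Before running a search, it can help to sanity-check the input file. The sketch below writes a tiny, made-up protein FASTA file (the identifiers and sequences are invented for illustration) into a temporary directory, then checks that it starts with a header line and counts its records.

```shell
# Write a tiny, made-up protein FASTA file into a scratch directory.
workdir=$(mktemp -d)
fasta="$workdir/example_proteins.fasta"
cat > "$fasta" <<'EOF'
>prot_0001
MKTAYIAKQR
>prot_0002
MSLLTEVETY
EOF

# A FASTA file starts with a '>' header line.
first_char=$(head -c 1 "$fasta")
if [ "$first_char" != ">" ]; then
    echo "$fasta does not look like a FASTA file" >&2
fi

# Each record begins with a '>' header, so counting headers counts sequences.
n_seqs=$(grep -c '^>' "$fasta")
echo "sequences: $n_seqs"

rm -r "$workdir"
```

To check a real dataset, point `fasta` at your own file and drop the lines that create and remove the scratch directory.
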
Command lines:
- Type: macsyfinder -h
  
  to see all the available options. All command-line options are described in the Command-line options section. To run MacSyFinder on your dataset once the macsy-models of interest are installed, follow these steps:
- Install the macsy-models of interest from the Macsy Models repository:
  
  msf_data install some-public-models
- On an “unordered” genome dataset:
  
  macsyfinder --db-type unordered --sequence-db unordered_genome.fasta --models model_family all
  
  will search for systems corresponding to all the models of model_family, as defined in the .xml files shipped with the “some-public-models” macsy-model package, without taking gene order into account.
- On a completely assembled genome (where the gene order is known):
  
  macsyfinder --db-type ordered_replicon --sequence-db mygenome.fasta --models-dir my-models --models model_family ModelA ModelB
  
  will detect the macromolecular systems described in the two models “ModelA” and “ModelB” in a complete genome from the “ModelA.xml” and “ModelB.xml” definition files placed in the folder “my-models/model_family/definitions”.
- If you want to run the same analysis as above but with local macsy-models not installed by msf_data:
  
  macsyfinder --db-type ordered_replicon --sequence-db mygenome.fasta --models-dir my-models --models model_family ModelA ModelB
  
  my-models is the directory containing the macsy-model packages. NB: The models must follow the macsy-models package structure.
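
When using a local models directory, a quick check that the definition files sit where the commands above expect them can save a failed run. The sketch below assumes only the layout stated above (“my-models/model_family/definitions”); it creates empty placeholder files in a temporary directory purely for illustration, so the names and files are hypothetical.

```shell
# Build a scratch copy of the expected layout with empty placeholder files.
scratch=$(mktemp -d)
models_dir="$scratch/my-models"
family="model_family"
mkdir -p "$models_dir/$family/definitions"
touch "$models_dir/$family/definitions/ModelA.xml" \
      "$models_dir/$family/definitions/ModelB.xml"

# Verify that each requested model has a definition file.
missing=0
for model in ModelA ModelB; do
    def="$models_dir/$family/definitions/$model.xml"
    if [ -f "$def" ]; then
        echo "found:   $def"
    else
        echo "missing: $def" >&2
        missing=$((missing + 1))
    fi
done
echo "missing definitions: $missing"

rm -r "$scratch"
```

For a real package, set `models_dir` to your own directory and remove the lines that create and delete the scratch copy.
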

Note

System names are case-sensitive on the command line. The name of a system is the name of its definition file without the file suffix (.xml by default), for example “toto” for a model defined in “toto.xml”.

The “all” keyword runs the detection of all models available in the definitions folder in a single run. See the Command-line options.
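
To see which names can be passed on the command line, one option is to list the definition files and strip the suffix. This is a minimal sketch using placeholder files in a temporary directory; the file names are made up, and only the “.xml” default suffix is taken from the note above.

```shell
# Placeholder definition files, created only for this illustration.
defs_dir=$(mktemp -d)
touch "$defs_dir/toto.xml" "$defs_dir/ModelA.xml"

# The model name is the file name without its .xml suffix (case preserved).
model_names=$(for f in "$defs_dir"/*.xml; do basename "$f" .xml; done)
echo "$model_names"

rm -r "$defs_dir"
```
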

An example data set

We provide here an example dataset comprising a replicon and the output files expected from MacSyFinder release 2.0 when running the TXSScan macsy-models. The genomic dataset consists of the complete sequence of chromosome I from Vibrio cholerae O1 biovar El Tor str. N16961 (published here: https://pubmed.ncbi.nlm.nih.gov/10952301/).

The chromosome to annotate is provided as a multi-FASTA file of the proteins, ordered as the genes encoding them. The protein secretion systems and appendages of the genome were annotated using the TXSScan set of MacSyFinder models (“macsy-model”), V1.1.1 in the case of these examples. Two output files are provided: one obtained with the “ordered” mode of genome annotation, the other with the “unordered” mode. The following command lines were used to obtain the output files:

1. The genome is downloaded from here. It will serve as an input file in the next command-line examples.

2. The TXSScan models for annotation of secretion systems are installed. The command line is the following:

msf_data install TXSScan # Installs the latest version of TXSScan

3. MacSyFinder is run on the genome, here using 8 workers for the HMM search (“-w 8” option):

In “ordered” mode:

macsyfinder --sequence-db VICH001.B.00001.C001.fasta -o macsyfinder_TXSScan_VICH001_ordered --models TXSScan all --db-type ordered_replicon -w 8 # specified output folder: macsyfinder_TXSScan_VICH001_ordered

In “unordered” mode:

macsyfinder --sequence-db VICH001.B.00001.C001.fasta -o macsyfinder_TXSScan_VICH001_unordered --models TXSScan all --db-type unordered -w 8 # specified output folder: macsyfinder_TXSScan_VICH001_unordered
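
The two runs differ only in the --db-type value and the output folder, so they can be generated from a single loop. The sketch below is a dry run: it only prints the two command lines shown above and does not launch MacSyFinder. The variable names are illustrative.

```shell
genome="VICH001.B.00001.C001.fasta"
workers=8

# Build one command line per search mode (printed, not executed).
cmds=$(for mode in ordered unordered; do
    if [ "$mode" = "ordered" ]; then
        db_type="ordered_replicon"
    else
        db_type="unordered"
    fi
    out_dir="macsyfinder_TXSScan_VICH001_${mode}"
    echo "macsyfinder --sequence-db $genome -o $out_dir --models TXSScan all --db-type $db_type -w $workers"
done)

echo "$cmds"
```

Once the printed commands look right, they can be run by pasting them into the shell, provided the genome file and the TXSScan models are in place.
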

The documentation on the generated output files can be consulted here. See also our FAQ: What search mode to be used?

Note

A more comprehensive example of genome datasets with dedicated command lines and expected output files can be found here.