report

An “HMMReport” object represents the results of a Hmmer search run on a dataset with a protein profile hidden Markov model. It provides methods to extract and filter the raw Hmmer output (see generated output files) and then to build the hits relevant for system detection: a “Hit” object (macsypy.HMMReport.Hit) is built for each match selected with the filtering parameters.

report API reference

HMMReport

class macsypy.report.HMMReport(gene, hmmer_output, cfg)

Handle the results from the HMM search: extract a synthetic report from the raw Hmmer output, after applying hit filtering. This class is abstract. It has three concrete implementations, chosen according to the type of the input sequence dataset: GembaseHMMReport (“gembase” db_type), OrderedHMMReport (“ordered_replicon” db_type) and GeneralHMMReport (“unordered” db_type).
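The choice of concrete class is driven only by the db_type of the dataset. The sketch below shows that mapping as a plain lookup table; the helper report_class_name is hypothetical and is not how macsyfinder itself necessarily performs the dispatch.

```python
# Hypothetical helper: maps a db_type value to the name of the HMMReport
# subclass documented for it. macsyfinder's own dispatch code may differ.
REPORT_CLASS_BY_DB_TYPE = {
    "unordered": "GeneralHMMReport",
    "ordered_replicon": "OrderedHMMReport",
    "gembase": "GembaseHMMReport",
}


def report_class_name(db_type: str) -> str:
    """Return the name of the concrete report class for db_type."""
    try:
        return REPORT_CLASS_BY_DB_TYPE[db_type]
    except KeyError:
        raise ValueError(f"unsupported db_type: {db_type!r}") from None
```

The returned name could then be resolved with getattr(macsypy.report, name), since the three classes live in that module.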

__init__(gene, hmmer_output, cfg)
Parameters:
  • gene (macsypy.gene.CoreGene object) – the gene corresponding to the profile search reported here

  • hmmer_output (string) – The path to the raw Hmmer output file

  • cfg (macsypy.config.Config object) – the configuration object

__str__()
Returns:

string representation of this report

Return type:

str

_build_my_db(hmm_output)

Build the keys of a dictionary object to store sequence identifiers of hits.

Parameters:

hmm_output (string) – the path to the hmmsearch output to parse.

Returns:

a dictionary containing a key for each sequence id of the hits

Return type:

dict
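A minimal sketch of this step, assuming (as in the standard hmmsearch text output) that each matched sequence opens a section with a line of the form ">> seq_id description". The function build_my_db is hypothetical; it takes an iterable of lines so that an open file object can be passed directly.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_my_db(lines: Iterable[str]) -> Dict[str, Optional[Tuple[int, int]]]:
    """Return a dict with one key per sequence id found in hmmsearch output.

    Values are left as None here; they are filled in a later step.
    """
    db: Dict[str, Optional[Tuple[int, int]]] = {}
    for line in lines:
        if line.startswith(">>"):
            # header line: ">> seq_id  optional description"
            fields = line.split()
            if len(fields) > 1:
                db[fields[1]] = None
    return db
```

Usage with a file on disk: with open(hmm_output) as fh: db = build_my_db(fh).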

_fill_my_db(db)

Fill the dictionary with information on the matched sequences.

Parameters:

db (dict) – the database containing all the sequence ids of the hits.
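The exact information stored per sequence is not specified here. As an illustration only, the sketch below assumes each value becomes a (sequence length, rank in the dataset) pair read from a hypothetical index whose lines have the form "seq_id;length;rank"; both the helper fill_my_db and that record format are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple


def fill_my_db(db: Dict[str, Optional[Tuple[int, int]]],
               index_lines: Iterable[str]) -> None:
    """Fill db in place with (length, rank) for every sequence id it already holds.

    index_lines is assumed to contain one "seq_id;length;rank" record per line
    (a hypothetical format chosen for this sketch).
    """
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        seq_id, length, rank = line.split(";")
        if seq_id in db:  # only sequences that were hit are kept
            db[seq_id] = (int(length), int(rank))
```

Keeping only the ids already present in db means the index can describe the whole dataset while the dictionary stays limited to the hits.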

abstract _get_replicon_name(hit_id)

This method is used by the extract method and must be implemented by each concrete subclass.

Parameters:

hit_id (str) – the id of the current hit, extracted from the Hmmer output.

Returns:

The name of the replicon
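The pattern described above (a shared parsing routine that delegates one decision to an abstract method) can be sketched with Python's abc module. ReportSketch and SingleRepliconSketch are hypothetical, stripped-down stand-ins, not the real macsypy classes; the concrete subclass simply assumes that every hit belongs to one fixed replicon.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class ReportSketch(ABC):
    """Hypothetical stand-in for macsypy.report.HMMReport."""

    def replicon_names(self, hit_ids: Iterable[str]) -> List[str]:
        # Shared routine (playing the role of extract) that delegates
        # the per-hit replicon lookup to the subclass.
        return [self._get_replicon_name(hit_id) for hit_id in hit_ids]

    @abstractmethod
    def _get_replicon_name(self, hit_id: str) -> str:
        """Return the name of the replicon that hit_id belongs to."""


class SingleRepliconSketch(ReportSketch):
    """Hypothetical concrete subclass: all hits map to one replicon."""

    def __init__(self, replicon_name: str):
        self.replicon_name = replicon_name

    def _get_replicon_name(self, hit_id: str) -> str:
        return self.replicon_name
```

Because _get_replicon_name is abstract, ReportSketch itself cannot be instantiated; only subclasses that implement it can.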

_hit_start(line)
Parameters:

line (string) – the line to parse

Returns:

True if the line is the beginning of a new hit in the raw Hmmer output, False otherwise.

Return type:

bool
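In hmmsearch text output, the per-sequence domain sections start with ">>". A minimal sketch of such a test, assuming this marker is what identifies a new hit (hit_start is a hypothetical name):

```python
def hit_start(line: str) -> bool:
    """Return True if line opens a new per-sequence hit section."""
    # hmmsearch writes ">> <seq_id> <description>" at the start of each section
    return line.startswith(">>")
```

A predicate of this shape can also serve as the key function for itertools.groupby, which splits the output into alternating header and body groups.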

_parse_hmm_body(hit_id, gene_profile_lg, seq_lg, coverage_threshold, replicon_name, position_hit, i_evalue_sel, b_grp)

Parse the raw Hmmer output to extract the hits, and filter them with the selected threshold criteria (the “coverage_profile” and “i_evalue_select” command-line parameters).

Parameters:
  • hit_id (str) – the sequence identifier

  • gene_profile_lg (int) – the length of the profile matched

  • seq_lg (int) – the length of the sequence

  • coverage_threshold (float) – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection.

  • replicon_name (str) – the identifier of the replicon

  • position_hit (int) – the rank of the sequence matched in the input dataset file

  • i_evalue_sel (float) – the maximal i-evalue (independent evalue) for hit selection

  • b_grp (list of list of strings) – the Hmmer output lines to deal with (grouped by hit)

Returns:

a sequence of hits

Return type:

list of macsypy.report.CoreHit objects
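The two filters can be illustrated on the per-domain table that hmmsearch prints under each ">>" header, whose columns include the score, the i-Evalue, the hmm from/to and the alignment from/to coordinates. This is a sketch under stated assumptions: profile coverage is taken as (hmm_to - hmm_from + 1) / gene_profile_lg, the thresholds are applied inclusively, body lines are passed as a flat list rather than grouped, and HitSketch is a hypothetical, simplified stand-in for CoreHit.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HitSketch:
    """Hypothetical, simplified stand-in for macsypy.report.CoreHit."""
    hit_id: str
    replicon_name: str
    position: int
    seq_length: int
    score: float
    i_evalue: float
    profile_coverage: float
    sequence_coverage: float
    begin: int
    end: int


def parse_hmm_body(hit_id: str, gene_profile_lg: int, seq_lg: int,
                   coverage_threshold: float, replicon_name: str,
                   position_hit: int, i_evalue_sel: float,
                   body_lines: Iterable[str]) -> List[HitSketch]:
    """Keep the domains of one sequence that pass both thresholds."""
    hits = []
    for line in body_lines:
        fields = line.split()
        # Domain rows: "# !|? score bias c-Evalue i-Evalue hmmfrom hmmto .. alifrom alito ..."
        if len(fields) < 11 or not fields[0].isdigit() or fields[1] not in ("!", "?"):
            continue  # skip column headers, separators and alignment blocks
        score = float(fields[2])
        i_evalue = float(fields[5])
        hmm_from, hmm_to = int(fields[6]), int(fields[7])
        ali_from, ali_to = int(fields[9]), int(fields[10])
        profile_coverage = (hmm_to - hmm_from + 1) / gene_profile_lg
        sequence_coverage = (ali_to - ali_from + 1) / seq_lg
        if i_evalue <= i_evalue_sel and profile_coverage >= coverage_threshold:
            hits.append(HitSketch(hit_id, replicon_name, position_hit, seq_lg,
                                  score, i_evalue, profile_coverage,
                                  sequence_coverage, ali_from, ali_to))
    return hits
```

A domain must pass both tests to be kept: a very significant match that covers only a small part of the profile is still rejected.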

_parse_hmm_header(h_grp)
Parameters:

h_grp (sequence of strings, as an itertools._grouper object) – the sequence of strings returned by the itertools.groupby function, representing the header of a hit

Returns:

the sequence identifier from a set of lines that corresponds to a single hit

Return type:

string
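A sketch of how itertools.groupby can split hmmsearch output into header and body groups, and how the sequence identifier can then be read from a header group. It assumes headers are ">> seq_id description" lines; group_hits and parse_hmm_header are hypothetical names, not macsyfinder's code.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def is_hit_header(line: str) -> bool:
    # hmmsearch starts each per-sequence section with ">> <seq_id> ..."
    return line.startswith(">>")


def group_hits(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Pair each header group with the body lines that follow it."""
    pairs = []
    header = None
    for is_header, grp in groupby(lines, key=is_hit_header):
        if is_header:
            header = list(grp)
        elif header is not None:  # ignore the preamble before the first hit
            pairs.append((header, list(grp)))
    return pairs


def parse_hmm_header(h_grp: Iterable[str]) -> str:
    """Return the sequence id: the 2nd whitespace field of the first header line."""
    first_line = next(iter(h_grp))
    return first_line.split()[1]
```

Note that groupby groups are lazy and are invalidated once the iteration moves on, which is why the sketch copies each group into a list before storing it.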

best_hit()

Return the best hit among multiple hits.
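The selection criterion is not stated here. The sketch below assumes, for illustration only, that the best hit is the one with the highest score, with ties broken by the lower i-evalue; ScoredHit and best_hit are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class ScoredHit(NamedTuple):
    """Hypothetical minimal hit record."""
    hit_id: str
    score: float
    i_evalue: float


def best_hit(hits: Sequence[ScoredHit]) -> Optional[ScoredHit]:
    """Return the highest-scoring hit (assumed criterion), or None if empty."""
    if not hits:
        return None
    # highest score wins; on equal scores, the lower i-evalue wins
    return max(hits, key=lambda h: (h.score, -h.i_evalue))
```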

extract()

Parse the output file computed by Hmmer on the gene base and produce a new synthetic report.

save_extract()

Write the string representation of the extract report to a file. The name of this file is the concatenation of the gene name and of the “res_extract_suffix” value from the config object.
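A minimal sketch of the naming rule, with the output directory, gene name and suffix passed explicitly. The helpers extract_file_path and save_extract_sketch are hypothetical, and the suffix ".res_hmm_extract" used in the usage note is only an example value.

```python
import os


def extract_file_path(out_dir: str, gene_name: str, res_extract_suffix: str) -> str:
    """File name = gene name immediately followed by the configured suffix."""
    return os.path.join(out_dir, gene_name + res_extract_suffix)


def save_extract_sketch(report_text: str, out_dir: str, gene_name: str,
                        res_extract_suffix: str) -> str:
    """Write report_text to the extract file and return its path."""
    path = extract_file_path(out_dir, gene_name, res_extract_suffix)
    with open(path, "w") as fh:
        fh.write(report_text)
    return path
```

For example, extract_file_path("out", "gspD", ".res_hmm_extract") yields out/gspD.res_hmm_extract on POSIX systems; no separator is inserted between the gene name and the suffix.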

GeneralHMMReport

class macsypy.report.GeneralHMMReport(gene, hmmer_output, cfg)

Handle the HMM report: extract a synthetic report from the raw Hmmer output. Dedicated to any type of ‘unordered’ dataset.

_get_replicon_name(hit_id)

This method is used by the extract method and implements the abstract HMMReport._get_replicon_name.

Parameters:

hit_id (str) – the id of the current hit, extracted from the Hmmer output.

Returns:

The name of the replicon

OrderedHMMReport

class macsypy.report.OrderedHMMReport(gene, hmmer_output, cfg)

Handle the HMM report: extract a synthetic report from the raw Hmmer output. Dedicated to ‘ordered_replicon’ datasets.

_get_replicon_name(hit_id)

This method is used by the extract method and implements the abstract HMMReport._get_replicon_name.

Parameters:

hit_id (str) – the id of the current hit, extracted from the Hmmer output.

Returns:

The name of the replicon

GembaseHMMReport

class macsypy.report.GembaseHMMReport(gene, hmmer_output, cfg)

Handle the HMM report: extract a synthetic report from the raw Hmmer output. Dedicated to ‘gembase’ format datasets.

_get_replicon_name(hit_id)

This method is used by the extract method and implements the abstract HMMReport._get_replicon_name.

Parameters:

hit_id (str) – the id of the current hit, extracted from the Hmmer output.

Returns:

The name of the replicon
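In a Gembase-style dataset the replicon is encoded in each sequence identifier. A minimal sketch, assuming identifiers of the form "<replicon_name>_<gene_number>" so that the replicon name is everything before the last underscore; the helper gembase_replicon_name and the example identifier are hypothetical.

```python
def gembase_replicon_name(hit_id: str) -> str:
    """Return the replicon part of a Gembase-style id (assumed format)."""
    # Assumed layout: "<replicon_name>_<gene_number>"; the replicon name
    # may itself contain underscores, so split on the LAST one only.
    sep = hit_id.rfind("_")
    if sep <= 0:
        raise ValueError(f"not a Gembase-style identifier: {hit_id!r}")
    return hit_id[:sep]
```

Splitting on the last underscore rather than the first keeps replicon names that contain underscores intact.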