hit

A Hit is created when hmmsearch finds similarities between a profile and a protein of the input dataset.

../../_images/gene_obj_interaction.svg

A diagram showing the interactions between CoreGene, ModelGene, Model, Hit and ValidHit. The diagram above represents the models, genes and hits generated from the definitions below.

hit

class macsypy.hit.Hit(gene, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)[source]

Handle the hits filtered from the Hmmer search. The hits are instantiated by the HMMReport.extract() method.

__eq__(other)[source]

Return True if two hits are totally equivalent, False otherwise.

Parameters:other (macsypy.hit.Hit object) – the hit to compare to the current object
Returns:the result of the comparison
Return type:boolean
__gt__(other)[source]

Compare two Hits. If the sequence identifiers are the same, compare the scores; otherwise, compare the sequence identifiers alphabetically.

Parameters:other (macsypy.hit.Hit object) – the hit to compare to the current object
Returns:True if self is > other, False otherwise
__hash__()[source]

Make the Hit hashable, which is required to put it in a set or to use it as a dict key.
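As an illustration of why hashability matters, here is a minimal sketch using a hypothetical stand-in class, HashableHit, which is not part of macsypy. It assumes that equality and hash are derived from the same fields (here via a frozen dataclass); the fields the real Hit actually hashes are not stated in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashableHit:
    """Hypothetical stand-in for macsypy.hit.Hit (only three fields modelled)."""
    hit_id: str
    position: int
    score: float


a = HashableHit("prot_A", 12, 50.0)
b = HashableHit("prot_A", 12, 50.0)   # equal to a, field by field
c = HashableHit("prot_B", 13, 10.0)

unique_hits = {a, b, c}               # a and b collapse into a single entry
status_by_hit = {a: "mandatory"}      # a hit used as a dict key
```

Because a and b compare equal and share a hash, b can be used to look up the entry stored under a.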

__init__(gene, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)[source]
Parameters:
  • gene (macsypy.gene.CoreGene object) – the gene corresponding to this profile
  • hit_id (str) – the identifier of the hit
  • hit_seq_length (int) – the length of the hit sequence
  • replicon_name (str) – the name of the replicon
  • position_hit (int) – the rank of the sequence matched in the input dataset file
  • i_eval (float) – the best-domain evalue (i-evalue, “independent evalue”)
  • score (float) – the score of the hit
  • profile_coverage (float) – percentage of the profile that matches the hit sequence
  • sequence_coverage (float) – percentage of the hit sequence that matches the profile
  • begin_match (int) – where the hit with the profile starts in the sequence
  • end_match (int) – where the hit with the profile ends in the sequence
__lt__(other)[source]

Compare two Hits. If the sequence identifiers are the same, compare the scores; otherwise, compare the sequence identifiers alphabetically.

Parameters:other (macsypy.hit.Hit object) – the hit to compare to the current object
Returns:True if self is < other, False otherwise
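The ordering rule described for __gt__ and __lt__ can be sketched as follows. OrderedHit is a hypothetical stand-in written for this example, not the macsypy class, and it models only the sequence identifier and the score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderedHit:
    """Hypothetical stand-in for macsypy.hit.Hit (identifier and score only)."""
    hit_id: str
    score: float

    def __lt__(self, other):
        # Same sequence identifier: compare on the score.
        if self.hit_id == other.hit_id:
            return self.score < other.score
        # Different identifiers: compare them alphabetically.
        return self.hit_id < other.hit_id

    def __gt__(self, other):
        if self.hit_id == other.hit_id:
            return self.score > other.score
        return self.hit_id > other.hit_id


hits = [OrderedHit("prot_B", 10.0), OrderedHit("prot_A", 50.0), OrderedHit("prot_A", 20.0)]
ordered = sorted(hits)  # sorted() relies on __lt__
```

With this rule, hits are grouped by sequence identifier first, and hits on the same sequence are ranked by increasing score.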
__str__()[source]
Returns:Useful information on the Hit: regarding Hmmer statistics, and sequence information
Return type:str
__weakref__

list of weak references to the object (if defined)

get_position()[source]
Returns:the position of the hit (rank in the input dataset file)
Return type:integer
class macsypy.hit.HitWeight(itself: float = 1, exchangeable: float = 0.8, mandatory: float = 1, accessory: float = 0.5, neutral: float = 0, loner_multi_system: float = 0.7)[source]

The weights used to compute the cluster and system scores. See the user documentation (MacSyFinder functioning) for further details. The default values are:

  • itself = 1
  • exchangeable = 0.8
  • mandatory = 1
  • accessory = 0.5
  • neutral = 0
  • loner_multi_system = 0.7
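The defaults listed above can be pictured with a small stand-in that mirrors the documented signature. SketchHitWeight is hypothetical and written as a plain dataclass for illustration; how the real HitWeight is implemented is an assumption not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class SketchHitWeight:
    """Hypothetical stand-in mirroring the documented HitWeight signature."""
    itself: float = 1
    exchangeable: float = 0.8
    mandatory: float = 1
    accessory: float = 0.5
    neutral: float = 0
    loner_multi_system: float = 0.7


default_weights = SketchHitWeight()
# Override one weight and keep the documented defaults for the others.
custom_weights = SketchHitWeight(accessory=0.25)
```

Passing the weights by keyword keeps a call readable, since every parameter is a float and positional arguments would be easy to mix up.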
__weakref__

list of weak references to the object (if defined)

class macsypy.hit.ValidHit(hit, gene_ref, gene_status)[source]

Encapsulates a macsypy.hit.Hit. This class stores a Hit that has been attributed to a putative system. Thus, it also stores:

  • the system,
  • the status of the gene in this system (‘mandatory’, ‘accessory’, …),
  • the gene in the model for which it’s an occurrence
__eq__(other)[source]

Return self==value.

__gt__(other)[source]

Return self>value.

__init__(hit, gene_ref, gene_status)[source]
Parameters:
__lt__(other)[source]

Return self<value.

__weakref__

list of weak references to the object (if defined)

loner
Returns:True if the hit represents a loner macsypy.gene.ModelGene, False otherwise.
multi_system
Returns:True if the hit represents a multi_systems macsypy.gene.ModelGene, False otherwise.
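One way to picture the wrapper is a sketch in which loner and multi_system are read from the model gene the hit was attributed to. SketchModelGene and SketchValidHit are hypothetical stand-ins, and the delegation to gene_ref is an assumption made for illustration rather than a description of the macsypy source.

```python
from dataclasses import dataclass


@dataclass
class SketchModelGene:
    """Hypothetical stand-in for a model gene."""
    name: str
    loner: bool = False
    multi_system: bool = False


@dataclass
class SketchValidHit:
    """Hypothetical stand-in for ValidHit: a hit plus its role in a system."""
    hit: object
    gene_ref: SketchModelGene
    gene_status: str

    @property
    def loner(self):
        # Assumption: the flag is taken from the referenced model gene.
        return self.gene_ref.loner

    @property
    def multi_system(self):
        # Assumption: the flag is taken from the referenced model gene.
        return self.gene_ref.multi_system


gene = SketchModelGene("geneA", loner=True)
valid_hit = SketchValidHit(hit="raw hit placeholder", gene_ref=gene, gene_status="mandatory")
```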
macsypy.hit.get_best_hits(hits, key='score')[source]

If several hits match the same protein, keep only the best match based either on

  • score
  • i_evalue
  • profile_coverage
Parameters:
  • hits ([ macsypy.hit.Hit object, …]) – the hits to filter, all hits must match the same protein.
  • key (str) – the criterion used to select the best hit: ‘score’, ‘i_evalue’ or ‘profile_coverage’
Returns:the list of the best hits
Return type:[ macsypy.hit.Hit object, …]
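The selection criterion can be sketched for the simplest case, several hits matching one protein. SketchHit and pick_best_hit are hypothetical names introduced for this example, not macsypy functions. The sketch assumes that a higher score or profile coverage is better and that a lower i-evalue is better. Unlike get_best_hits, it returns a single hit rather than a list.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for macsypy.hit.Hit with only the fields used here.
SketchHit = namedtuple("SketchHit", "gene_name hit_id score i_eval profile_coverage")


def pick_best_hit(hits, key="score"):
    """Return the best of several hits that all match the same protein."""
    if key in ("score", "profile_coverage"):
        # Assumption: a higher score or coverage means a better match.
        return max(hits, key=attrgetter(key))
    if key == "i_evalue":
        # Assumption: a lower e-value means a better match.
        return min(hits, key=attrgetter("i_eval"))
    raise ValueError(f"unsupported key: {key!r}")


hits = [
    SketchHit("geneA", "prot_1", 30.0, 1e-5, 0.9),
    SketchHit("geneB", "prot_1", 80.0, 1e-20, 0.6),
]
best_by_score = pick_best_hit(hits)
best_by_evalue = pick_best_hit(hits, key="i_evalue")
best_by_coverage = pick_best_hit(hits, key="profile_coverage")
```

The three criteria need not agree: in this example geneB wins on score and i-evalue, while geneA wins on profile coverage.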