Hidden Markov model speed heuristic and iterative HMM search procedure


Abstract

Background

Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases.
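To make the difference between the two scoring algorithms concrete, here is a minimal sketch on a toy two-state HMM. It is not a profile-HMM (there are no match, insert, and delete columns) and it is not HMMER's implementation; the model `TOY_*`, the state names, and all probabilities are made up for illustration. Viterbi keeps the single best state path, while Forward sums over all paths, so the two differ only in how predecessor scores are combined. Both cost time proportional to the sequence length times the square of the number of states in this dense form, which is why scoring every database sequence is slow.

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed stably; returns -inf if every term is -inf."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_score(obs, states, log_start, log_trans, log_emit, combine):
    """Shared dynamic-programming recursion in log space.

    combine=max gives the Viterbi score (best single path);
    combine=logsumexp gives the Forward score (sum over all paths).
    """
    prev = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        prev = {
            s: log_emit[s][symbol]
            + combine([prev[r] + log_trans[r][s] for r in states])
            for s in states
        }
    return combine(list(prev.values()))

def viterbi_score(obs, states, log_start, log_trans, log_emit):
    return hmm_score(obs, states, log_start, log_trans, log_emit, max)

def forward_score(obs, states, log_start, log_trans, log_emit):
    return hmm_score(obs, states, log_start, log_trans, log_emit, logsumexp)

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: states "M" and "I" over the alphabet {A, C}.
TOY_STATES = ["M", "I"]
TOY_START = logs({"M": 0.6, "I": 0.4})
TOY_TRANS = logs({"M": {"M": 0.7, "I": 0.3}, "I": {"M": 0.4, "I": 0.6}})
TOY_EMIT = logs({"M": {"A": 0.9, "C": 0.1}, "I": {"A": 0.5, "C": 0.5}})

if __name__ == "__main__":
    args = (TOY_STATES, TOY_START, TOY_TRANS, TOY_EMIT)
    print("Viterbi:", viterbi_score("AC", *args))
    print("Forward:", forward_score("AC", *args))
```

The Forward score is never lower than the Viterbi score for the same sequence, because the best path is one of the terms in the Forward sum.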

Results

We have designed HMMERHEAD, a series of database filtering steps that are applied before the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward search time and a 6-fold decrease in Viterbi search time, with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Because of this heuristic, we could eliminate the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K.
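The two ideas in this paragraph, filtering before expensive scoring and iterating over the full database rather than a subdatabase, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the individual HMMERHEAD filters, so the shared-word filter, the k-mer "model", the coverage score, and the names `kmers`, `filtered_search`, and `iterative_search` are all hypothetical stand-ins, not the HMMER or JackHMMER code.

```python
def kmers(seq, k=3):
    """Set of all length-k words in seq (toy stand-in for a profile-HMM)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def filtered_search(database, cheap_filters, full_score, threshold):
    """Apply cheap filters first; only survivors reach the expensive scorer.

    Returns the hits (name -> score) and how many sequences were fully scored.
    """
    hits, n_scored = {}, 0
    for name, seq in database.items():
        if not all(passes(seq) for passes in cheap_filters):
            continue  # rejected cheaply, never scored
        n_scored += 1
        score = full_score(seq)
        if score >= threshold:
            hits[name] = score
    return hits, n_scored

def iterative_search(query, database, threshold=0.5, max_rounds=5):
    """Rebuild the model from accepted hits and re-search the whole database."""
    members = {}  # name -> sequence of accepted homologs
    model = kmers(query)
    for round_number in range(max_rounds):
        def shares_word(seq, model=model):
            # Hypothetical cheap filter: at least one word in common.
            return not model.isdisjoint(kmers(seq))

        def fraction_covered(seq, model=model):
            # Hypothetical "expensive" score: share of words found in the model.
            words = kmers(seq)
            return len(words & model) / len(words) if words else 0.0

        hits, n_scored = filtered_search(
            database, [shares_word], fraction_covered, threshold
        )
        new = {name: database[name] for name in hits if name not in members}
        if not new:
            break  # converged: no new homologs this round
        members.update(new)
        model = kmers(query).union(*(kmers(s) for s in members.values()))
    return sorted(members)

if __name__ == "__main__":
    db = {"near": "ACDEFH", "far": "DEFHIK", "none": "WWWWWW"}
    print(iterative_search("ACDEFG", db))
```

In this toy run the sequence named "far" scores below the threshold in the first round and is only accepted in the second, after "near" has been folded into the model, which is the behavior an iterative search relies on to reach more remote homologs.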

Conclusions

Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2010

Publication date (Electronic): 18 August 2010

Volume: 11

Page: 431

Affiliations

[1] Department of Immunology and Pathology, Washington University School of Medicine, St. Louis, Missouri, USA

[2] Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, USA

[3] School of Computer Science & Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel

Article

Publisher ID: 1471-2105-11-431

DOI: 10.1186/1471-2105-11-431

PMC ID: 2931519

PubMed ID: 20718988

SO-VID: d2c386d4-27c0-4a28-9629-fea0109ee894

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 28 April 2010

Date accepted: 18 August 2010
