
      PSI-Search: iterative HOE-reduced profile SSEARCH searching




          Summary: Iterative similarity searches with PSI-BLAST position-specific score matrices (PSSMs) find many more homologs than single searches, but PSSMs can be contaminated when homologous alignments are extended into unrelated protein domains, a problem known as homologous over-extension (HOE). PSI-Search combines optimal Smith–Waterman local alignment searching, using SSEARCH, with the PSI-BLAST profile construction strategy. An optional sequence boundary-masking procedure, which prevents alignments from being extended after a sequence is first included in the profile, can reduce HOE errors in the PSSM. Preventing HOE improves selectivity for both PSI-BLAST and PSI-Search, but PSI-Search has ~4-fold better selectivity than PSI-BLAST and similar sensitivity at 50% and 60% family coverage. PSI-Search also produces 2- to 4-fold fewer false positives than JackHMMER, but is ~5% less sensitive.
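          The boundary-masking idea can be illustrated with a minimal Python sketch. It is not PSI-Search's Perl code: it uses a single query string instead of a PSSM, toy scores (+2 match, −1 mismatch, linear gap penalty of −2) instead of SSEARCH's substitution matrices and affine gaps, and the helper names `smith_waterman`, `mask_outside`, `update_boundaries` and `masked_library` are hypothetical. It assumes that "masking" means replacing every residue outside the region aligned at a sequence's first inclusion with `X`, so that later iterations cannot extend the alignment into flanking, possibly unrelated, domains.

```python
from typing import Dict, Tuple

# Toy scoring scheme (assumption): SSEARCH/PSI-Search actually score with
# substitution matrices or a PSSM and affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman(query: str, subject: str) -> Tuple[int, int, int]:
    """Optimal local alignment score plus the subject region it covers.

    Returns (score, start, end) with 0-based start and exclusive end.
    """
    n, m = len(query), len(subject)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if query[i - 1] == subject[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a zero cell marks the local start.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = MATCH if query[i - 1] == subject[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return best, j, bj


def mask_outside(subject: str, start: int, end: int) -> str:
    """Replace residues outside [start, end) with 'X' (hypothetical masking rule)."""
    return "X" * start + subject[start:end] + "X" * (len(subject) - end)


def update_boundaries(first_bounds: Dict[str, Tuple[int, int]],
                      hits: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Record a sequence's aligned region only the first time it is included."""
    for sid, bounds in hits.items():
        first_bounds.setdefault(sid, bounds)
    return first_bounds


def masked_library(library: Dict[str, str],
                   first_bounds: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Mask previously included sequences; leave the others untouched."""
    return {sid: (mask_outside(seq, *first_bounds[sid]) if sid in first_bounds else seq)
            for sid, seq in library.items()}


if __name__ == "__main__":
    library = {"s1": "WWWWACDEFGHIKWWWW"}
    first: Dict[str, Tuple[int, int]] = {}
    # Iteration 1: a short query first includes s1 over residues 4..12.
    _, start, end = smith_waterman("ACDEFGHIK", library["s1"])
    update_boundaries(first, {"s1": (start, end)})
    masked = masked_library(library, first)
    # Iteration 2: a longer (drifted) query cannot over-extend into the masked flank.
    print(smith_waterman("ACDEFGHIKWWWW", masked["s1"]))   # (18, 4, 13)
    print(smith_waterman("ACDEFGHIKWWWW", library["s1"]))  # (26, 4, 17)
```

In the unmasked search the longer query gains 8 points by aligning into the tryptophan-rich C-terminal flank of `s1`; with masking, the alignment stops at the boundaries recorded when `s1` was first included.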

          Availability and implementation: PSI-Search is available from the authors as a standalone Perl implementation for Linux-compatible platforms. It is also available through a web interface and through SOAP and REST Web Services.

          Contact: pearson@ ; rodrigo.lopez@


                Author and article information

                Oxford University Press
                15 June 2012
                25 April 2012
                25 April 2012
                Volume: 28
                Issue: 12
                Pages: 1650-1651
                ¹EMBL – European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and ²Department of Biochemistry and Molecular Genetics, Charlottesville, VA 22908, USA
                Author notes
                * To whom correspondence should be addressed.

                Associate Editor: Alfonso Valencia

                © The Author(s) 2012. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Page count
                Pages: 2
                Applications Note
                Sequence Analysis

                Bioinformatics & Computational biology

