35
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      ALAE: Accelerating Local Alignment with Affine Gap Exactly in Biosequence Databases

      Preprint
      , ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We study the problem of local alignment, which is finding pairs of similar subsequences with gaps. The problem exists in biosequence databases. BLAST is a typical software for finding local alignment based on heuristic, but could miss results. Using the Smith-Waterman algorithm, we can find all local alignments in O(mn) time, where m and n are lengths of a query and a text, respectively. A recent exact approach BWT-SW improves the complexity of the Smith-Waterman algorithm under constraints, but still much slower than BLAST. This paper takes on the challenge of designing an accurate and efficient algorithm for evaluating local-alignment searches, especially for long queries. In this paper, we propose an efficient software called ALAE to speed up BWT-SW using a compressed suffix array. ALAE utilizes a family of filtering techniques to prune meaningless calculations and an algorithm for reusing score calculations. We also give a mathematical analysis and show that the upper bound of the total number of calculated entries using ALAE could vary from 4.50mn0.520 to 9.05mn0.896 for random DNA sequences and vary from 8.28mn0.364 to 7.49mn0.723 for random protein sequences. We demonstrate the significant performance improvement of ALAE on BWT-SW using a thorough experimental study on real biosequences. ALAE guarantees correctness and accelerates BLAST for most of parameters.

          Related collections

          Most cited references5

          • Record: found
          • Abstract: found
          • Article: not found

          Human-mouse alignments with BLASTZ.

          The Mouse Genome Analysis Consortium aligned the human and mouse genome sequences for a variety of purposes, using alignment programs that suited the various needs. For investigating issues regarding genome evolution, a particularly sensitive method was needed to permit alignment of a large proportion of the neutrally evolving regions. We selected a program called BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences. BLASTZ was subsequently modified, both to attain efficiency adequate for aligning entire mammalian genomes and to increase its sensitivity. This work describes BLASTZ, its modifications, the hardware environment on which we run it, and several empirical studies to validate its results.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Suffix Arrays: A New Method for On-Line String Searches

              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Indexing compressed text

                Bookmark

                Author and article information

                Journal
                01 August 2012
                Article
                1208.0274
                3ca63b01-174f-41d2-a027-dc6337636881

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1507-1518 (2012)
                VLDB2012
                cs.DB
                Ahmet Sacan

                Comments

                Comment on this article