42
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

      research-article
      BMC Bioinformatics
      BioMed Central
      Smith-Waterman, Needleman-Wunsch, Semi-global alignment, Sequence alignment, SIMD, Database search

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          Sequence alignment algorithms are a key component of many bioinformatics applications.

          Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations.

          Results

          A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. Rognes’s SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail’s prefix scan implementation is generally the fastest, faster even than Farrar’s ‘striped’ approach, however the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license.

          Conclusions

          Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-016-0930-z) contains supplementary material, which is available to authorized users.

          Related collections

          Most cited references14

          • Record: found
          • Abstract: found
          • Article: not found

          Amino acid substitution matrices from protein blocks.

          Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            An improved algorithm for matching biological sequences.

              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Striped Smith-Waterman speeds database searches six times over other SIMD implementations.

              The only algorithm guaranteed to find the optimal local alignment is the Smith-Waterman. It is also one of the slowest due to the number of computations required for the search. To speed up the algorithm, Single-Instruction Multiple-Data (SIMD) instructions have been used to parallelize the algorithm at the instruction level. A faster implementation of the Smith-Waterman algorithm is presented. This algorithm achieved 2-8 times performance improvement over other SIMD based Smith-Waterman implementations. On a 2.0 GHz Xeon Core 2 Duo processor, speeds of >3.0 billion cell updates/s were achieved. http://farrar.michael.googlepages.com/Smith-waterman
                Bookmark

                Author and article information

                Contributors
                jeff.daily@pnnl.gov
                Journal
                BMC Bioinformatics
                BMC Bioinformatics
                BMC Bioinformatics
                BioMed Central (London )
                1471-2105
                10 February 2016
                10 February 2016
                2016
                : 17
                : 81
                Affiliations
                Pacific Northwest National Laboratory, High Performance Computing Group, 902 Battelle Boulevard, P.O. Box 999, MSIN J4-30, Richland, 99352 WA USA
                Article
                930
                10.1186/s12859-016-0930-z
                4748600
                26864881
                e1e6709b-105f-435e-b871-6834f671500f
                © Daily. 2016

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 18 November 2015
                : 3 February 2016
                Categories
                Software
                Custom metadata
                © The Author(s) 2016

                Bioinformatics & Computational biology
                smith-waterman,needleman-wunsch,semi-global alignment,sequence alignment,simd,database search

                Comments

                Comment on this article