      Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

          Abstract

          Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are becoming a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality; others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and deliver accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to existing alignments and exploiting the information in them, making use of the vast amount of precomputed information in public databases like Pfam.
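
          As a rough illustration of the two usage modes the abstract describes (aligning sequences from scratch, and adding sequences to an existing alignment), the following Python sketch assembles an argument list for the `clustalo` command-line program. The helper name `build_clustalo_command` and all file names are hypothetical. The options used (`-i`, `-o`, `--outfmt`, `--threads`, `--force`, `--profile1`) are, to the best of our understanding, options of the Clustal Omega command line, but their exact behavior should be checked against `clustalo --help` for the installed version.

```python
from typing import List, Optional


def build_clustalo_command(
    infile: str,
    outfile: str,
    existing_alignment: Optional[str] = None,
    threads: int = 1,
) -> List[str]:
    """Build (but do not run) an argument list for the clustalo program.

    This helper is a hypothetical convenience wrapper, not part of
    Clustal Omega itself.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")

    cmd = [
        "clustalo",
        "-i", infile,            # unaligned input sequences
        "-o", outfile,           # where the alignment is written
        "--outfmt=fasta",        # aligned FASTA output
        f"--threads={threads}",  # number of worker threads
        "--force",               # overwrite an existing output file
    ]
    if existing_alignment is not None:
        # Assumption: passing a pre-aligned file as profile 1 asks clustalo
        # to align the input sequences against that existing alignment.
        cmd.append(f"--profile1={existing_alignment}")
    return cmd


if __name__ == "__main__":
    # File names below are placeholders.
    print(" ".join(build_clustalo_command("new_seqs.fa", "combined.fa",
                                          existing_alignment="family.aln.fa",
                                          threads=4)))
```

          The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `clustalo` is installed. Building the list separately keeps the sketch inspectable without requiring the program itself.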

          Most cited references 24

          MUSCLE: multiple sequence alignment with high accuracy and high throughput.

           Robert Edgar (2004)
          We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

            Clustal W and Clustal X version 2.0.

            The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems. The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2. The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/

              MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.

               K Katoh (2002)
              A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.
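
              The MUSCLE abstract above mentions fast distance estimation using k-mer counting. The general idea can be sketched in Python as follows: count the overlapping k-mers of each sequence, measure how many are shared, and turn that fraction into a distance without computing any alignment. This is a generic k-mer distance chosen for illustration and is not claimed to be the exact measure used by MUSCLE or any other program listed here. The function names and the default k = 3 are assumptions.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance in [0, 1] based on shared k-mers.

    0.0 means the shorter sequence shares all of its k-mers with the
    other sequence; 1.0 means no k-mer is shared.
    """
    n_kmers = min(len(a), len(b)) - k + 1
    if n_kmers <= 0:
        raise ValueError("both sequences must be at least k residues long")

    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / n_kmers


if __name__ == "__main__":
    # Toy peptide strings, not real data.
    print(kmer_distance("MKVLAAGIVG", "MKVLAAGLVG"))
```

              Because each sequence is scanned only once, a distance of this kind costs time roughly linear in sequence length per pair, which is why k-mer counting is attractive for building guide trees over large inputs.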

                Author and article information

                Journal
                Molecular Systems Biology (Mol Syst Biol)
                Publisher: Nature Publishing Group
                ISSN: 1744-4292
                Published: 11 October 2011
                Volume 7: 539
                Affiliations
                [1] School of Medicine and Medical Science, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
                [2] Computational and Systems Biology, Genome Institute of Singapore, Singapore
                [3] Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [4] Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
                [5] EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
                [6] Gene Center Munich, University of Munich (LMU), Muenchen, Germany
                [7] Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France
                Author notes
                [a] UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland. Tel.: +353 1 716 6833; Fax: +353 1 716 6713; des.higgins@ucd.ie
                [*] These authors contributed equally to this work

                Article
                Publisher ID: msb201175
                DOI: 10.1038/msb.2011.75
                PMCID: PMC3261699
                PMID: 21988835
                Copyright © 2011, EMBO and Macmillan Publishers Limited

                This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported License, which allows readers to alter, transform, or build upon the article and then distribute the resulting work under the same or similar license to this one. The work must be attributed back to the original author and commercial use is not permitted without specific permission.

                Categories
                Report