29
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features

      research-article
      1 , 2 , 1 , 2 , *
      Nucleic Acids Research
      Oxford University Press

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotation to be included into the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods in the case of protein alignments while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve the alignment accuracy. Kalign2 is freely available for download from the Kalign web site ( http://msa.sbc.su.se/).

          Related collections

          Most cited references26

          • Record: found
          • Abstract: found
          • Article: not found

          Improved tools for biological sequence comparison.

          We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Pfam: clans, web tools and services

            Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (), the USA (), France () and Sweden ().
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              An improved algorithm for matching biological sequences.

                Bookmark

                Author and article information

                Journal
                Nucleic Acids Res
                Nucleic Acids Res
                nar
                nar
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                February 2009
                22 December 2008
                22 December 2008
                : 37
                : 3
                : 858-865
                Affiliations
                1Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 and 2Stockholm Bioinformatics Centre, Albanova, Stockholm University, SE-10691 Stockholm, Sweden
                Author notes
                *To whom correspondence should be addressed. Tel: +46 8 55 37 85 67; Fax: +46 8 55 37 82 14; Email: Erik.Sonnhammer@ 123456sbc.su.se

                The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

                Article
                gkn1006
                10.1093/nar/gkn1006
                2647288
                19103665
                f3d770a4-57b5-44d4-90ba-cb399eaeebdd
                © 2008 The Author(s)

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 30 October 2008
                : 27 November 2008
                : 1 December 2008
                Categories
                Computational Biology

                Genetics
                Genetics

                Comments

                Comment on this article