59
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      A simple method to control over-alignment in the MAFFT multiple sequence alignment program

      research-article
      1 , * , 1 , 2
      Bioinformatics
      Oxford University Press

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment is recently becoming greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction.

          Results: The proposed method utilizes a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks, and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment.

          Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/

          Contact: katoh@ 123456ifrec.osaka-u.ac.jp

          Supplementary information: Supplementary data are available at Bioinformatics online.

          Related collections

          Most cited references31

          • Record: found
          • Abstract: found
          • Article: not found

          Amino acid substitution matrices from protein blocks.

          Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            An algorithm for progressive multiple alignment of sequences with insertions.

            Dynamic programming algorithms guarantee to find the optimal alignment between two sequences. For more than a few sequences, exact algorithms become computationally impractical, and progressive algorithms iterating pairwise alignments are widely used. These heuristic methods have a serious drawback because pairwise algorithms do not differentiate insertions from deletions and end up penalizing single insertion events multiple times. Such an unrealistically high penalty for insertions typically results in overmatching of sequences and an underestimation of the number of insertion events. We describe a modification of the traditional alignment algorithm that can distinguish insertion from deletion and avoid repeated penalization of insertions and illustrate this method with a pair hidden Markov model that uses an evolutionary scoring function. In comparison with a traditional progressive alignment method, our algorithm infers a greater number of insertion events and creates gaps that are phylogenetically consistent but spatially less concentrated. Our results suggest that some insertion/deletion "hot spots" may actually be artifacts of traditional alignment algorithms.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              A beginner's guide to eukaryotic genome annotation.

              The falling cost of genome sequencing is having a marked impact on the research community with respect to which genomes are sequenced and how and where they are annotated. Genome annotation projects have generally become small-scale affairs that are often carried out by an individual laboratory. Although annotating a eukaryotic genome assembly is now within the reach of non-experts, it remains a challenging task. Here we provide an overview of the genome annotation process and the available tools and describe some best-practice approaches.
                Bookmark

                Author and article information

                Journal
                Bioinformatics
                Bioinformatics
                bioinformatics
                bioinfo
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 July 2016
                26 February 2016
                26 February 2016
                : 32
                : 13
                : 1933-1942
                Affiliations
                1Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan
                2Institute for Virus Research, Kyoto University, Kyoto 606-8507, Japan
                Author notes
                *To whom correspondence should be addressed.

                Associate Editor: Janet Kelso

                Article
                btw108
                10.1093/bioinformatics/btw108
                4920119
                27153688
                f471591c-9f1c-4934-b530-a3e5955b1b62
                © The Author 2016. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 05 October 2015
                : 15 February 2016
                : 19 February 2016
                Page count
                Pages: 10
                Categories
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article