      DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment

      research-article

          Abstract

          Background

          DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach.
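
The weakness of a purely greedy assembly can be illustrated with a minimal sketch. It is not the DIALIGN-T implementation: it is simplified to two sequences, and the `Fragment` type, the `compatible` check and the weights are hypothetical choices made for illustration. Fragments (gap-free segment pairs) are taken in order of decreasing weight, and each one is accepted only if it is consistent with all fragments accepted so far.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """A gap-free segment pair between sequences A and B (hypothetical type)."""
    start_a: int   # start position in sequence A
    start_b: int   # start position in sequence B
    length: int    # number of aligned positions
    weight: float  # similarity score of the segment pair


def compatible(f: Fragment, g: Fragment) -> bool:
    """Two fragments fit into one alignment if they do not overlap
    and appear in the same order in both sequences."""
    first, second = (f, g) if f.start_a <= g.start_a else (g, f)
    return (first.start_a + first.length <= second.start_a
            and first.start_b + first.length <= second.start_b)


def greedy_assemble(fragments: List[Fragment]) -> List[Fragment]:
    """Accept fragments by decreasing weight, skipping inconsistent ones."""
    accepted: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if all(compatible(frag, other) for other in accepted):
            accepted.append(frag)
    return sorted(accepted, key=lambda f: f.start_a)


# Two true similarities and one spurious, high-scoring random match.
true_1 = Fragment(start_a=0, start_b=0, length=5, weight=10.0)
true_2 = Fragment(start_a=10, start_b=10, length=5, weight=8.0)
spurious = Fragment(start_a=3, start_b=20, length=4, weight=12.0)

result = greedy_assemble([true_1, true_2, spurious])
```

In this toy input the spurious fragment has the highest weight, so it is accepted first and excludes both true fragments, although together they would score higher. This is the kind of suboptimal result the abstract attributes to greedy assembly.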

          Results

          Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time and memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks, we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 to assess alignment quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences.
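
To make the vertex-cover idea concrete, here is a minimal sketch under stated assumptions. The abstract does not say which approximation DIALIGN-TX uses, so the sketch substitutes the textbook maximal-matching 2-approximation for unweighted vertex cover. The conflict graph is assumed to have one vertex per fragment and one edge per pair of fragments that cannot appear in the same alignment; the fragment names are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def approx_vertex_cover(edges: Iterable[Edge]) -> Set[Hashable]:
    """Maximal-matching 2-approximation of a minimum vertex cover.

    For every edge that is not yet covered, both endpoints are added.
    The result covers all edges and is at most twice the optimal size.
    """
    cover: Set[Hashable] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


# Hypothetical conflict graph: each edge joins two inconsistent fragments.
conflicts = [("f1", "f2"), ("f2", "f3"), ("f3", "f4")]
to_remove = approx_vertex_cover(conflicts)

# Every conflict has at least one endpoint marked for removal.
all_covered = all(u in to_remove or v in to_remove for u, v in conflicts)
```

Removing the fragments in the cover leaves a conflict-free set. A weighted variant, which prefers to remove low-scoring fragments, would be closer to the goal of discarding spurious similarities, but that refinement is outside this sketch.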

          Conclusion

          On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment, such as MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs; MAFFT E-INS-i is the only method that comes close to the performance of DIALIGN-TX.

                Author and article information

                Journal: Algorithms for Molecular Biology (Algorithms Mol Biol)
                Publisher: BioMed Central
                ISSN: 1748-7188
                Published: 27 May 2008
                Volume 3, article 6
                Affiliations
                [1 ]University of Tübingen, Wilhelm-Schickard-Institut für Informatik, Sand 13, 72076 Tübingen, Germany
                [2 ]University of Göttingen, Institute of Microbiology and Genetics, Goldschmidtstr. 1, 37077 Göttingen, Germany
                Article
                Publisher ID: 1748-7188-3-6
                DOI: 10.1186/1748-7188-3-6
                PMC ID: 2430965
                PubMed ID: 18505568
                Copyright © 2008 Subramanian et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 25 March 2008
                Accepted: 27 May 2008
                Categories
                Research

                Molecular biology