DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment


Abstract

Background

DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach.
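
To illustrate why a greedy assembly can be misled by a spurious similarity, the following is a minimal sketch restricted to two sequences. It is not the DIALIGN-T implementation: the `Fragment` type, the `consistent` test and the weights are hypothetical and chosen only for illustration. Fragments are accepted in order of decreasing weight as long as they are compatible with those already accepted.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical gap-free segment pair between sequences A and B."""
    start_a: int   # start position in sequence A
    start_b: int   # start position in sequence B
    length: int    # number of aligned positions
    weight: float  # similarity score of the segment pair


def consistent(f: Fragment, g: Fragment) -> bool:
    """Two fragments are compatible if one ends before the other starts in both sequences."""
    f_before_g = (f.start_a + f.length <= g.start_a
                  and f.start_b + f.length <= g.start_b)
    g_before_f = (g.start_a + g.length <= f.start_a
                  and g.start_b + g.length <= f.start_b)
    return f_before_g or g_before_f


def greedy_select(fragments: List[Fragment]) -> List[Fragment]:
    """Accept fragments by decreasing weight, skipping any that conflict."""
    accepted: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if all(consistent(frag, other) for other in accepted):
            accepted.append(frag)
    return sorted(accepted, key=lambda f: f.start_a)


# One high-weight spurious fragment blocks two compatible lower-weight ones.
spurious = Fragment(0, 20, 4, 10.0)
true_1 = Fragment(0, 0, 5, 6.0)
true_2 = Fragment(10, 10, 5, 6.0)
result = greedy_select([true_1, spurious, true_2])
```

In this toy input the greedy choice keeps only the spurious fragment (total weight 10.0), although the two fragments it excludes are compatible with each other and together have a higher total weight (12.0).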

Results

Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time or memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 for assessing the quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences.
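
The abstract does not say which vertex-cover approximation is used. As a hedged illustration of the general idea, the sketch below applies the classic matching-based 2-approximation to a small conflict graph whose vertices are fragments and whose edges join pairs of fragments that cannot both be part of one alignment. The function name `approx_vertex_cover` and the fragment labels are assumptions made for this example.

```python
from typing import Hashable, Iterable, Set, Tuple


def approx_vertex_cover(edges: Iterable[Tuple[Hashable, Hashable]]) -> Set[Hashable]:
    """Classic 2-approximation: take both endpoints of every edge not yet covered."""
    cover: Set[Hashable] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Hypothetical conflict graph: each edge joins two mutually inconsistent fragments.
conflicts = [("f1", "f2"), ("f1", "f3"), ("f4", "f5")]
all_fragments = {"f1", "f2", "f3", "f4", "f5", "f6"}

cover = approx_vertex_cover(conflicts)
kept = all_fragments - cover  # fragments outside the cover share no conflict edge
```

Removing the fragments in the cover leaves a conflict-free set, because every conflict edge has at least one endpoint in the cover. The cover is at most twice the size of a minimum one, so some removed fragments may have been removable-free; a weighted variant would be needed to prefer discarding low-scoring fragments.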

Conclusion

On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment, such as MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs; MAFFT E-INSi is the only method that comes close to the performance of DIALIGN-TX.

Author and article information

Journal

Journal ID (nlm-ta): Algorithms Mol Biol

Title: Algorithms for Molecular Biology : AMB

Publisher: BioMed Central

ISSN (Electronic): 1748-7188

Publication date (Collection): 2008

Publication date (Electronic): 27 May 2008

Volume: 3

Page: 6

Affiliations

[1] University of Tübingen, Wilhelm-Schickard-Institut für Informatik, Sand 13, 72076 Tübingen, Germany

[2] University of Göttingen, Institute of Microbiology and Genetics, Goldschmidtstr. 1, 37077 Göttingen, Germany

Article

Publisher ID: 1748-7188-3-6

DOI: 10.1186/1748-7188-3-6

PMC ID: 2430965

PubMed ID: 18505568

SO-VID: 07ddeaa6-385b-44b0-9ff5-c2afcaa1ab73

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
