SeqAn An efficient, generic C++ library for sequence analysis

Abstract

Background

The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.

Results

To remedy this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.
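To make the first example more concrete, the following is a minimal sketch of what comparing exact string matching algorithms can look like. It is written in plain standard C++ and does not use SeqAn's own interface; the function names `naiveFind` and `horspoolFind` are hypothetical, and the choice of the naive scan and Horspool's algorithm is an assumption made for illustration rather than the set of algorithms benchmarked in the article.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Naive exact matching: test every alignment of the pattern against the text.
// (Hypothetical helper for illustration; not part of SeqAn.)
std::vector<std::size_t> naiveFind(const std::string& text,
                                   const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || m > n) return hits;
    for (std::size_t i = 0; i + m <= n; ++i) {
        if (text.compare(i, m, pattern) == 0) hits.push_back(i);
    }
    return hits;
}

// Horspool's algorithm: after checking a window, shift it by a precomputed
// distance that depends only on the text character under the window's last
// position. (Hypothetical helper for illustration; not part of SeqAn.)
std::vector<std::size_t> horspoolFind(const std::string& text,
                                      const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || m > n) return hits;

    // Default shift is the full pattern length; characters occurring in
    // pattern[0 .. m-2] get the distance from their last occurrence to the
    // end of the pattern, so every shift is at least 1.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t j = 0; j + 1 < m; ++j) {
        shift[static_cast<unsigned char>(pattern[j])] = m - 1 - j;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        if (text.compare(pos, m, pattern) == 0) hits.push_back(pos);
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return hits;
}
```

Both functions return the start positions of all occurrences, so their results can be compared directly and each can be timed on the same input. A generic library would typically expose such algorithms behind one common interface so that they can be swapped without changing the calling code.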

Conclusion

We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components that are fundamental to the field of sequence analysis. This not only facilitates the implementation of new algorithms but also enables a sound analysis and comparison of existing algorithms.

Most cited references 37

A whole-genome assembly of Drosophila.

E W Myers, G Sutton, A Delcher … (2000)

We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly's sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community.

LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.

Michael Brudno, Chuong B. Do, Gregory M. Cooper … (2003)

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of greater than 12 Mb of high-quality sequence from 12 vertebrate species. All the sequence was derived from the genomic region orthologous to an approximately 1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments, while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.

An improved algorithm for matching biological sequences.

Osamu Gotoh (1982)

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2008

Publication date (Electronic): 9 January 2008

Volume: 9

Page: 11

Affiliations

[1] Algorithmische Bioinformatik, Institut für Informatik, Takustr. 9, 14195 Berlin, Germany

[2] International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr. 63 – 73, 14195 Berlin, Germany

Article

Publisher ID: 1471-2105-9-11

DOI: 10.1186/1471-2105-9-11

PMC ID: 2246154

PubMed ID: 18184432

SO-VID: 77a45eff-3f48-4d83-86b0-443f4ba9b4c6

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
