Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features

Abstract

In the growing field of genomics, multiple alignment programs are confronted with ever-increasing amounts of data. To address this, we have dramatically improved the running time and memory requirements of Kalign while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows external sequence annotation to be included in the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods for protein alignments, while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/).

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): February 2009

Publication date (Electronic): 22 December 2008

Publication date PMC-release: 22 December 2008

Volume: 37

Issue: 3

Pages: 858-865

Affiliations

¹Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 and ²Stockholm Bioinformatics Centre, Albanova, Stockholm University, SE-10691 Stockholm, Sweden

Author notes

*To whom correspondence should be addressed. Tel: +46 8 55 37 85 67; Fax: +46 8 55 37 82 14; Email: Erik.Sonnhammer@sbc.su.se

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Article

Publisher ID: gkn1006

DOI: 10.1093/nar/gkn1006

PMC ID: 2647288

PubMed ID: 19103665

SO-VID: f3d770a4-57b5-44d4-90ba-cb399eaeebdd

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 30 October 2008

Date revision received: 27 November 2008

Date accepted: 1 December 2008
