Aligning 415 519 proteins in less than two hours on PC

Abstract

The rapid development of modern sequencing platforms has enabled unprecedented growth of protein family databases. The abundance of sets composed of hundreds of thousands of sequences poses a great challenge for multiple sequence alignment algorithms. In this article we introduce FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the use of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimised and parallelised to make the most of modern computer platforms. As a result, the quality indicators, namely the sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. This quality does not come at the expense of time and memory requirements, which are an order of magnitude lower than those of existing solutions. For example, a family of 415 519 sequences was analysed in less than two hours and required only 8 GB of RAM. FAMSA is freely available at http://sun.aei.polsl.pl/REFRESH/famsa.
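To illustrate the longest common subsequence (LCS) measure mentioned in the abstract, the following is a minimal sketch of how an LCS length between two protein sequences can be computed with textbook dynamic programming. It is not FAMSA's optimised implementation. The function names `lcs_length` and `lcs_similarity`, and the normalisation by the shorter sequence length, are assumptions made here for illustration only; the abstract does not specify how the measure is normalised.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming that keeps one row in memory.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so the DP row is the shorter one
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching residues extend the LCS of both prefixes by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation (an assumption, not taken from the article):
    LCS length divided by the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0
```

A pairwise similarity of this kind can be evaluated for every pair of sequences in a family; how the resulting values feed into the rest of the progressive alignment is described in the article itself and is not reproduced here.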

Author and article information

Publication date (created): 2016-03-22

Article

arXiv ID: 1603.06958

SO-VID: 1ff0689c-a349-4b6e-befe-dc5d15448614

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories q-bio.GN cs.DS

ScienceOpen disciplines: Data structures & Algorithms, Genetics

