A simple method to control over-alignment in the MAFFT multiple sequence alignment program

Abstract

Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment has been growing as low-quality or noisy sequences accumulate in protein sequence databases, owing, for example, to sequencing errors and difficulties in gene prediction.

Results: The proposed method uses a variable scoring matrix for different pairs of sequences (or groups) within a single multiple sequence alignment, chosen according to the global similarity of each pair. This method significantly increases the number of correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks and mostly neutral in simulation-based benchmarks. The approach is based on natural biological reasoning and should be compatible with many dynamic-programming-based methods for multiple sequence alignment.
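The idea of choosing a scoring matrix per pair from the pair's global similarity can be illustrated with a small sketch. This is not MAFFT's implementation: the identity measure, the tier thresholds, the labels `matrix_A`, `matrix_B`, `matrix_C`, and all function names are hypothetical, and the abstract does not state which matrix goes with which similarity level, so that mapping is left to the caller.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where both rows have a residue.

    Assumption: a and b are equal-length rows of some preliminary alignment,
    with '-' marking a gap. This is only one possible similarity measure.
    """
    if len(a) != len(b):
        raise ValueError("rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def choose_matrix(identity, tiers, default):
    """Return the label of the first tier whose minimum identity is met.

    tiers is a list of (min_identity, label) pairs; thresholds and labels
    are hypothetical and supplied by the caller.
    """
    for min_identity, label in sorted(tiers, reverse=True):
        if identity >= min_identity:
            return label
    return default


def matrices_for_pairs(seqs, tiers, default):
    """Map every unordered pair of sequence names to a matrix label."""
    return {
        (p, q): choose_matrix(pairwise_identity(seqs[p], seqs[q]), tiers, default)
        for p, q in combinations(seqs, 2)
    }


# Toy input: three rows of a hypothetical preliminary alignment.
seqs = {"s1": "ACDEFG", "s2": "ACDEFG", "s3": "AC--YG"}
tiers = [(0.9, "matrix_A"), (0.5, "matrix_B")]
assignment = matrices_for_pairs(seqs, tiers, default="matrix_C")
```

In this toy input, `s1` and `s2` are identical and fall into the top tier, while each of them shares three of four comparable residues with `s3` and falls into the middle tier. A pairwise dynamic-programming step would then look up its scores in the matrix assigned to the pair being aligned.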

Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/

Contact: katoh@ifrec.osaka-u.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.

Author and article information

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Journal ID (hwp): bioinfo

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 01 July 2016

Publication date (Electronic): 26 February 2016

Publication date PMC-release: 26 February 2016

Volume: 32

Issue: 13

Pages: 1933-1942

Affiliations

¹Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan

²Institute for Virus Research, Kyoto University, Kyoto 606-8507, Japan

Author notes

*To whom correspondence should be addressed.

Associate Editor: Janet Kelso

Article

Publisher ID: btw108

DOI: 10.1093/bioinformatics/btw108

PMC ID: 4920119

PubMed ID: 27153688

SO-VID: f471591c-9f1c-4934-b530-a3e5955b1b62

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date received: 05 October 2015

Date revision received: 15 February 2016

Date accepted: 19 February 2016

Page count

Pages: 10
