EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers

Abstract

We present EvalMSA, a software tool for evaluating multiple sequence alignments (MSAs) and detecting outliers in them. The tool identifies divergent sequences in an MSA by scoring the contribution of each row to the overall quality of the alignment, using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about whether to include or retain a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R; therefore, the R-base package must be installed to obtain full functionality. Binary packages for Linux and Windows are freely available from http://sourceforge.net/projects/evalmsa/.
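To make the idea of a per-row contribution to a sum-of-pairs score concrete, the following Python sketch may help. It is an illustration only, not EvalMSA's implementation (which is written in Perl and R): the match, mismatch, and gap values, the function names, and the toy alignment are all assumptions chosen for brevity, and a real tool would typically use a substitution matrix instead.

```python
from itertools import combinations

# Hypothetical scoring scheme (an assumption, not EvalMSA's actual parameters).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of characters from the same column."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs carry no information and are ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def row_contributions(msa):
    """For each row, sum its pairwise scores against every other row."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    totals = [0] * len(msa)
    for i, j in combinations(range(len(msa)), 2):
        s = sum(pair_score(a, b) for a, b in zip(msa[i], msa[j]))
        totals[i] += s
        totals[j] += s
    return totals


def sum_of_pairs(msa):
    """Total sum-of-pairs score; every pair of rows is counted once."""
    # Each pair score appears in two row totals, so the sum is exactly twice the total.
    return sum(row_contributions(msa)) // 2


if __name__ == "__main__":
    # Toy alignment (made up for illustration): the last row is deliberately divergent.
    msa = ["ACGT-A", "ACGT-A", "ACGTTA", "TTGACC"]
    contributions = row_contributions(msa)
    worst = min(range(len(msa)), key=contributions.__getitem__)
    print("row contributions:", contributions)
    print("total sum-of-pairs score:", sum_of_pairs(msa))
    print("lowest-scoring row index:", worst)
```

In this sketch, a row whose contribution is far below that of the others is a candidate outlier. Deciding how far is "far" requires a statistical criterion, which this example does not attempt.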

Author and article information

Journal

Journal ID (nlm-ta): Evol Bioinform Online

Journal ID (iso-abbrev): Evol. Bioinform. Online

Journal ID (publisher-id): Evolutionary Bioinformatics

Title: Evolutionary Bioinformatics Online

Publisher: Libertas Academica

ISSN (Electronic): 1176-9343

Publication date Collection: 2016

Publication date (Electronic): 28 November 2016

Volume: 12

Pages: 277-284

Affiliations

[1] Joint Research Unit “Infection and Public Health” FISABIO, Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Paterna, Valencia, Spain.

[2] CIBER in Epidemiology and Public Health, Madrid, Spain.

Author notes

CORRESPONDENCE: fernando.gonzalez@uv.es

Article

Publisher ID: ebo-12-2016-277

DOI: 10.4137/EBO.S40583

PMC ID: 5127606

SO-VID: 9edc11c1-981b-455b-97ac-ea6576ba50ae

License:

This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.

History

Date received: 18 July 2016

Date revision received: 02 October 2016

Date accepted: 05 October 2016
