
      EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers

      research-article


          Abstract

          We present EvalMSA, a software tool for evaluating multiple sequence alignments (MSAs) and detecting outliers in them. The tool identifies divergent sequences in an MSA by scoring the contribution of each row to the quality of the alignment, using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about whether to include or retain a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R, so the R-base package must be installed to get full functionality. Binary packages for Linux and Windows are freely available from http://sourceforge.net/projects/evalmsa/.
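
          The idea of scoring each row's contribution with a sum-of-pairs method can be illustrated with a minimal Python sketch. This is not EvalMSA's code or scoring scheme: the match/mismatch/gap values, the function names, and the median-absolute-deviation outlier rule are all assumptions chosen for illustration only.

```python
from itertools import combinations
from statistics import median

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned character pair (toy values, assumed for this sketch)."""
    if a == GAP and b == GAP:
        return 0  # gap-gap columns carry no information
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def row_contributions(msa):
    """For each row, sum its pairwise scores against every other row."""
    totals = [0] * len(msa)
    for i, j in combinations(range(len(msa)), 2):
        s = sum(pair_score(a, b) for a, b in zip(msa[i], msa[j]))
        totals[i] += s
        totals[j] += s
    return totals


def low_outliers(scores, threshold=3.5):
    """Indices of rows scoring far below the median (robust z-score on the MAD).

    The MAD rule and the 3.5 cutoff are assumptions, not EvalMSA's criterion.
    """
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, s in enumerate(scores)
            if 0.6745 * (s - med) / mad < -threshold]


# Hypothetical toy alignment: four similar rows and one divergent, gappy row.
msa = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGA",
    "ACGAACGT",
    "TT--GGCA",
]
scores = row_contributions(msa)
print(scores)                # [10, 10, 8, 6, -38]
print(low_outliers(scores))  # [4]
```

          In this toy alignment the last row drags down every pairwise score it takes part in, so its total sits far below the others and it is the only row flagged.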

          Most cited references (15)

          The Bioperl toolkit: Perl modules for the life sciences.

          The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

            Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee

            Expresso is a multiple sequence alignment server that aligns sequences using structural information. The user only needs to provide sequences. The server runs BLAST to identify close homologues of the sequences within the PDB database. These PDB structures are used as templates to guide the alignment of the original sequences using structure-based sequence alignment methods like SAP or Fugue. The final result is a multiple sequence alignment of the original sequences based on the structural information of the templates. An advanced mode makes it possible to either upload private structures or specify which PDB templates should be used to model each sequence. Provided that suitable structural information is available, Expresso delivers sequence alignments with accuracy comparable to that of structure-based alignments. The server is available on .

              An alignment confidence score capturing robustness to guide tree uncertainty.

              Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment data BASE benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.

                Author and article information

                Journal
                Evolutionary Bioinformatics Online (Evol. Bioinform. Online)
                Publisher: Libertas Academica
                ISSN: 1176-9343
                Published: 28 November 2016
                Volume: 12
                Pages: 277-284
                Affiliations
                [1] Joint Research Unit “Infection and Public Health” FISABIO, Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Paterna, Valencia, Spain.
                [2] CIBER in Epidemiology and Public Health, Madrid, Spain.
                Article
                Publisher ID: ebo-12-2016-277
                DOI: 10.4137/EBO.S40583
                PMCID: PMC5127606
                9edc11c1-981b-455b-97ac-ea6576ba50ae
                © 2016 the author(s), publisher and licensee Libertas Academica Ltd.

                This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.

                History
                : 18 July 2016
                : 02 October 2016
                : 05 October 2016
                Categories
                Original Research

                Bioinformatics & Computational biology
                multiple sequence alignment, gappiness, outlier sequence
