
      Identifying Clusters of High Confidence Homologies in Multiple Sequence Alignments

      research-article


          Abstract

          Multiple sequence alignment (MSA) is ubiquitous in evolution and bioinformatics. MSAs are usually taken to be a known and fixed quantity on which to perform downstream analysis, despite extensive evidence that MSA accuracy and uncertainty affect results. Alignment errors are known to cause a wide range of problems for downstream evolutionary inference, ranging from false inference of positive selection to long branch attraction artifacts. The most popular approach to dealing with this problem is to remove (filter) specific columns in the MSA that are thought to be prone to error. Although popular, this approach has had mixed success, and several studies have even suggested that filtering might be detrimental to phylogenetic studies. We present a graph-based clustering method to address MSA uncertainty and error in the software Divvier (available at https://github.com/simonwhelan/Divvier), which uses a probabilistic model to identify clusters of characters that have strong statistical evidence of shared homology. These clusters can then be used either to filter characters from the MSA (partial filtering) or to represent each of the clusters in a new column (divvying). We validate Divvier through its performance on real and simulated benchmarks, finding that Divvier substantially outperforms existing filtering software by retaining more true pairwise homology calls and removing more false positive pairwise homologies. We also find that Divvier, in contrast to other filtering tools, can alleviate long branch attraction artifacts induced by MSA and reduces the variation in tree estimates caused by MSA uncertainty.
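
          The two uses of clusters named in the abstract can be illustrated with a minimal Python sketch. It is not Divvier's implementation: the cluster assignments are taken as given (Divvier derives them from a probabilistic model, which is not reproduced here), the function names `divvy_column` and `partial_filter_column` are hypothetical, and the rule that partial filtering keeps only the largest cluster is an assumption made for illustration.

```python
from typing import List, Sequence

GAP = "-"


def divvy_column(column: str, clusters: Sequence[Sequence[int]]) -> List[str]:
    """Split one alignment column into one new column per cluster.

    `column` holds one character per sequence (row); `clusters` lists the
    row indices judged to share homology. Rows outside a cluster are
    written as gaps in that cluster's new column.
    """
    new_columns = []
    for cluster in clusters:
        members = set(cluster)
        new_columns.append(
            "".join(ch if i in members else GAP for i, ch in enumerate(column))
        )
    return new_columns


def partial_filter_column(column: str, clusters: Sequence[Sequence[int]]) -> str:
    """Keep only the characters of the largest cluster (assumed rule).

    All other characters in the column are replaced by gaps, so the
    column count of the alignment is unchanged.
    """
    if not clusters:
        return GAP * len(column)
    largest = set(max(clusters, key=len))
    return "".join(ch if i in largest else GAP for i, ch in enumerate(column))


if __name__ == "__main__":
    # Five sequences; rows 0-1 form one cluster, rows 2-3 another,
    # and row 4 is already a gap.
    print(divvy_column("AAGG-", [[0, 1], [2, 3]]))
    print(partial_filter_column("AAGT-", [[0, 1, 2], [3]]))
```

          In this sketch, divvying adds columns and keeps every clustered character, whereas partial filtering keeps the column count fixed and discards characters outside the retained cluster.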


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Molecular Biology and Evolution (Mol. Biol. Evol.)
                Oxford University Press
                ISSN: 0737-4038, 1537-1719
                October 2019
                18 June 2019
                Volume: 36
                Issue: 10
                Pages: 2340-2351
                Affiliations
                [1 ] Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
                [2 ] Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan
                Author notes
                Corresponding author: E-mail: simon.whelan.evolution@gmail.com
                Author information
                http://orcid.org/0000-0003-3056-3173
                Article
                Publisher ID: msz142
                DOI: 10.1093/molbev/msz142
                PMC ID: 6933875
                PMID: 31209473
                3a6a5f17-ca0d-495b-bc3d-7cd6b0d1a9b5
                © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Page count
                Pages: 12
                Categories
                Methods

                Molecular biology
                multiple sequence alignment, filtering, homology, phylogenetic inference
