
      PHYRN: A Robust Method for Phylogenetic Analysis of Highly Divergent Sequences


          Abstract

          Both multiple sequence alignment and phylogenetic analysis are problematic in the “twilight zone” of sequence similarity (≤25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at “midnight zone” genetic distances (∼7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.
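
The identity thresholds quoted in the abstract can be made concrete with a small sketch. The function names below are hypothetical and are not part of PHYRN. The code assumes the two sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded from the denominator, which is only one of several conventions in use. The 25% cutoff is the "twilight zone" bound stated in the abstract.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are skipped, so the
    denominator is the number of gap-free columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # gapped column: not counted under this convention
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(a: str, b: str, threshold: float = 25.0) -> bool:
    """True when identity is at or below the threshold (25% in the abstract)."""
    return percent_identity(a, b) <= threshold
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give different values for the same pair, so reported identities are only comparable when the denominator is stated.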

          Most cited references (50)


          Profile hidden Markov models.

          S. Eddy (1998)
          The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement standard pairwise comparison methods for large-scale sequence analysis. Several software implementations and two large libraries of profile HMMs of common protein domains are available. HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise.

            PHYML Online—a web server for fast maximum likelihood-based phylogenetic inference

            PHYML Online is a web interface to PHYML, a software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. This tool provides the user with a number of options, e.g. nonparametric bootstrap and estimation of various evolutionary parameters, in order to perform comprehensive phylogenetic analyses on large datasets in reasonable computing time. The server and its documentation are available at .

              An algorithm for progressive multiple alignment of sequences with insertions.

              Dynamic programming algorithms guarantee to find the optimal alignment between two sequences. For more than a few sequences, exact algorithms become computationally impractical, and progressive algorithms iterating pairwise alignments are widely used. These heuristic methods have a serious drawback because pairwise algorithms do not differentiate insertions from deletions and end up penalizing single insertion events multiple times. Such an unrealistically high penalty for insertions typically results in overmatching of sequences and an underestimation of the number of insertion events. We describe a modification of the traditional alignment algorithm that can distinguish insertion from deletion and avoid repeated penalization of insertions and illustrate this method with a pair hidden Markov model that uses an evolutionary scoring function. In comparison with a traditional progressive alignment method, our algorithm infers a greater number of insertion events and creates gaps that are phylogenetically consistent but spatially less concentrated. Our results suggest that some insertion/deletion "hot spots" may actually be artifacts of traditional alignment algorithms.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS One
                Public Library of Science (San Francisco, USA )
                1932-6203
                2012
                13 April 2012
                Volume: 7
                Issue: 4
                Article: e34261
                Affiliations
                [1 ]Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
                [2 ]Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
                [3 ]Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America
                [4 ]Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
                [5 ]Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
                [6 ]Department of Biochemistry and Molecular Medicine, School of Medicine, University of California Davis, Davis, California, United States of America
                [7 ]Department of Physiology and Membrane Biology, School of Medicine, University of California Davis, Davis, California, United States of America
                [8 ]Center for Translational Bioscience and Computing, University of California Davis, Davis, California, United States of America
                Tel Aviv University, Israel
                Author notes

                Conceived and designed the experiments: GB ECH RLP DBVR. Performed the experiments: GB KDK YH SVC ZZ NLH LAK MG DNH MEP FD EJS. Analyzed the data: GB ECH RLP DBVR. Wrote the paper: GB ECH RLP DBVR.

                Article
                PONE-D-11-24752
                10.1371/journal.pone.0034261
                3325999
                22514627
                bc4da0ee-cbbb-46b6-995b-9a9a462594fb
                This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
                History
                Received: 30 November 2011
                Accepted: 24 February 2012
                Page count
                Pages: 13
                Categories
                Research Article
                Biology
                Computational Biology
                Biological Data Management
                Evolutionary Modeling
                Sequence Analysis
                Evolutionary Biology
                Evolutionary Systematics
                Phylogenetics
                Evolutionary Theory
                Genetics
                Molecular Genetics
                Proteomics
                Sequence Analysis
