      Discordance of Species Trees with Their Most Likely Gene Trees

      PLoS Genetics

      Public Library of Science


          Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.
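As a small worked illustration of the discordance described above (not taken from the article; the three-species setup and the symbols T and G are assumptions introduced here), consider a species tree ((A,B),C) whose internal branch has length T in coalescent units, with one lineage sampled per species. Under the standard coalescent, the gene tree topology G has probabilities:

```latex
\begin{align*}
P\bigl(G = ((A,B),C)\bigr) &= 1 - \tfrac{2}{3}\,e^{-T}, \\
P\bigl(G = ((A,C),B)\bigr) = P\bigl(G = ((B,C),A)\bigr) &= \tfrac{1}{3}\,e^{-T}.
\end{align*}
```

Since 1 - (2/3)e^{-T} >= (1/3)e^{-T} for every T >= 0, the matching topology is never less likely than a discordant one when there are only three species. With four species the picture can change: if both internal branches of an asymmetric species tree shrink toward zero, all four lineages reach the root uncoalesced, and each of the 18 possible orderings of coalescence events is equally likely. A symmetric topology such as ((A,B),(C,D)) arises from two of these orderings (probability 1/9), whereas the asymmetric topology matching the species tree arises from only one (probability 1/18).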


          Different genomic regions evolving along the branches of a tree of species relationships can have different evolutionary histories. Consequently, estimates of species trees from genetic data may be influenced by the particular choice of genomic regions used in an analysis. Recent work has focused on circumventing this problem by combining information from multiple regions to attempt to produce accurate species tree estimates.

          The authors show that the use of multiple genomic regions for species tree inference is subject to a surprising new difficulty, the problem of “anomalous gene trees.” Not only can individual genes or genomic regions have genealogical histories that differ in shape, or topology, from a species tree, but the gene tree topology most likely to evolve can itself differ from the species tree topology. As a result, the “democratic vote” procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can converge on the wrong species tree as more genes are added. As it becomes more feasible to simultaneously investigate many regions of a genome, species tree inference algorithms will need to begin taking the problem of anomalous gene trees into consideration.
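The "democratic vote" failure can be sketched with a short simulation. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a four-species asymmetric species tree (((A,B),C),D), one sampled lineage per species, and two internal branch lengths of 0.05 coalescent units chosen purely for illustration. All function names are hypothetical.

```python
import random
from collections import Counter


def merge(a, b):
    """Join two lineages into an unordered clade (nested frozensets)."""
    return frozenset((a, b))


def coalesce(lineages, duration, rng):
    """Run the coalescent on `lineages` for `duration` coalescent units.

    With duration=None, run until a single lineage remains (root population).
    """
    lineages = list(lineages)
    remaining = duration
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence with k lineages: rate k(k-1)/2.
        wait = rng.expovariate(k * (k - 1) / 2)
        if remaining is not None:
            if wait > remaining:
                break  # the branch ends before the next coalescence
            remaining -= wait
        i, j = rng.sample(range(k), 2)  # a uniformly random pair coalesces
        pair = merge(lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(pair)
    return lineages


def simulate_gene_tree(t_ab, t_abc, rng):
    """Draw one gene tree topology along the species tree (((A,B),C),D)."""
    ab = coalesce(["A", "B"], t_ab, rng)      # branch ancestral to A and B
    abc = coalesce(ab + ["C"], t_abc, rng)    # branch ancestral to A, B, C
    root = coalesce(abc + ["D"], None, rng)   # root population
    return root[0]


def topology_counts(t_ab, t_abc, n_loci, seed=0):
    """Tally gene tree topologies over independent simulated loci."""
    rng = random.Random(seed)
    return Counter(simulate_gene_tree(t_ab, t_abc, rng) for _ in range(n_loci))


SPECIES_TREE = merge(merge(merge("A", "B"), "C"), "D")
SYMMETRIC = merge(merge("A", "B"), merge("C", "D"))

if __name__ == "__main__":
    counts = topology_counts(0.05, 0.05, n_loci=20000)
    print("matches species tree:", counts[SPECIES_TREE])
    print("((A,B),(C,D)):       ", counts[SYMMETRIC])
```

With branches this short, the symmetric topology ((A,B),(C,D)) is expected to be tallied more often than the topology matching the species tree, so picking the most frequent gene tree would point to the wrong species tree no matter how many loci are added. Lengthening either internal branch in the call to `topology_counts` should restore the matching topology as the most frequent one.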

          Most cited references 39

          Evolutionary relationship of DNA sequences in finite populations.

           F Tajima (1983)
          With the aim of analyzing and interpreting data on DNA polymorphism obtained by DNA sequencing or restriction enzyme technique, a mathematical theory on the expected evolutionary relationship among DNA sequences (nucleons) sampled is developed under the assumption that the evolutionary change of nucleons is determined solely by mutation and random genetic drift. The statistical property of the number of nucleotide differences between randomly chosen nucleons and that of heterozygosity or nucleon diversity is investigated using this theory. These studies indicate that the estimates of the average number of nucleotide differences and nucleon diversity have a large variance, and a large part of this variance is due to stochastic factors. Therefore, increasing sample size does not help reduce the variance significantly. The distribution of sample allele (nucleomorph) frequencies is also studied, and it is shown that a small number of samples are sufficient in order to know the distribution pattern.

            Genome-scale approaches to resolving incongruence in molecular phylogenies.

            One of the most pervasive challenges in molecular phylogenetics is the incongruence between phylogenies obtained using different data sets, such as individual genes. To systematically investigate the degree of incongruence, and potential methods for resolving it, we screened the genome sequences of eight yeast species and selected 106 widely distributed orthologous genes for phylogenetic analyses, singly and by concatenation. Our results suggest that data sets consisting of single or a small number of concatenated genes have a significant probability of supporting conflicting topologies. By contrast, analyses of the entire data set of concatenated genes yielded a single, fully resolved species tree with maximum support. Comparable results were obtained with a concatenation of a minimum of 20 genes; substantially more genes than commonly used but a small fraction of any genome. These results have important implications for resolving branches of the tree of life.

              Inferring phylogeny despite incomplete lineage sorting.

              It is now well known that incomplete lineage sorting can cause serious difficulties for phylogenetic inference, but little attention has been paid to methods that attempt to overcome these difficulties by explicitly considering the processes that produce them. Here we explore approaches to phylogenetic inference designed to consider retention and sorting of ancestral polymorphism. We examine how the reconstructability of a species (or population) phylogeny is affected by (a) the number of loci used to estimate the phylogeny and (b) the number of individuals sampled per species. Even in difficult cases with considerable incomplete lineage sorting (times between divergences less than 1 N(e) generations), we found the reconstructed species trees matched the "true" species trees in at least three out of five partitions, as long as a reasonable number of individuals per species were sampled. We also studied the tradeoff between sampling more loci versus more individuals. Although increasing the number of loci gives more accurate trees for a given sampling effort with deeper species trees (e.g., total depth of 10 N(e) generations), sampling more individuals often gives better results than sampling more loci with shallower species trees (e.g., depth = 1 N(e)). Taken together, these results demonstrate that gene sequences retain enough signal to achieve an accurate estimate of phylogeny despite widespread incomplete lineage sorting. Continued improvement in our methods to reconstruct phylogeny near the species level will require a shift to a compound model that considers not only nucleotide or character state substitutions, but also the population genetics processes of lineage sorting. [Coalescence; divergence; population; speciation.].

                Author and article information

                Role: Editor
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, USA)
                May 2006
                26 May 2006
                Volume: 2
                Issue: 5
                [1] Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
                [2] Department of Human Genetics, Bioinformatics Program, and the Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America
                Harvard University, United States of America
                Author notes
                06-PLGE-RA-0033R3 plge-02-05-05
                Copyright: © 2006 Degnan and Rosenberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Page count
                Pages: 7
                Research Article
                Genetics/Population Genetics
                Custom metadata
                Degnan JH, Rosenberg NA (2006) Discordance of species trees with their most likely gene trees. PLoS Genet 2(5): e68. DOI: 10.1371/journal.pgen.0020068


