Discordance of Species Trees with Their Most Likely Gene Trees

Abstract

Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.

Synopsis

Different genomic regions evolving along the branches of a tree of species relationships can have different evolutionary histories. Consequently, estimates of species trees from genetic data may be influenced by the particular choice of genomic regions used in an analysis. Recent work has focused on circumventing this problem by combining information from multiple regions to attempt to produce accurate species tree estimates.
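As a small worked illustration of how often a single region can disagree with the species tree, consider the simplest case of three species. The expression below is the standard coalescent result for that case, not a formula taken from this page; the symbol $T$ (the length of the internal species tree branch, in coalescent units) is notation assumed here. For a species tree $((A,B),C)$:

\[
P\bigl(\text{gene tree} = ((A,B),C)\bigr) = 1 - \tfrac{2}{3}e^{-T},
\qquad
P\bigl(((A,C),B)\bigr) = P\bigl(((B,C),A)\bigr) = \tfrac{1}{3}e^{-T}.
\]

For $T = 0.5$, for example, the matching topology has probability of about $0.60$ and each discordant topology about $0.20$. Because $1 - \tfrac{2}{3}e^{-T} > \tfrac{1}{3}e^{-T}$ for every $T > 0$, with three species the matching gene tree topology is always the most probable one, however short the internal branch.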

The authors show that the use of multiple genomic regions for species tree inference is subject to a surprising new difficulty, the problem of “anomalous gene trees.” Not only can individual genes or genomic regions have genealogical histories that differ in shape, or topology, from a species tree, the gene tree topology most likely to evolve can differ from the species tree topology. As a result, the “democratic vote” procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can converge on the wrong species tree as more genes are added. As it becomes more feasible to simultaneously investigate many regions of a genome, species tree inference algorithms will need to begin taking the problem of anomalous gene trees into consideration.
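The "democratic vote" problem can be sketched with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: it uses a four-species asymmetric species tree (((A,B),C),D) as a compact illustration, one sampled lineage per species, and branch lengths in coalescent units. The function names and the parameters `t_ab` and `t_abc` (lengths of the two internal branches) are hypothetical names introduced only for this example.

```python
import random
from collections import Counter

# Topologies as nested frozensets, so that sibling order does not matter.
SPECIES_TREE_TOPOLOGY = frozenset({frozenset({frozenset({"A", "B"}), "C"}), "D"})
SYMMETRIC_TOPOLOGY = frozenset({frozenset({"A", "B"}), frozenset({"C", "D"})})


def coalesce_within(lineages, duration, rng):
    """Let lineages coalesce in one population for `duration` coalescent units.

    With k lineages present, the waiting time to the next coalescence is
    exponential with rate k(k-1)/2, and the pair that merges is chosen
    uniformly at random. Use duration=inf for the root population.
    """
    lineages = list(lineages)
    remaining = duration
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2)
        if wait > remaining:
            break  # no further coalescence before this branch ends
        remaining -= wait
        i, j = rng.sample(range(k), 2)
        merged = frozenset({lineages[i], lineages[j]})
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages


def simulate_gene_tree(t_ab, t_abc, rng):
    """Simulate one gene tree topology on the species tree (((A,B),C),D)."""
    lineages = coalesce_within(["A", "B"], t_ab, rng)            # branch above (A,B)
    lineages = coalesce_within(lineages + ["C"], t_abc, rng)     # branch above ((A,B),C)
    lineages = coalesce_within(lineages + ["D"], float("inf"), rng)  # root population
    return lineages[0]


def topology_frequencies(t_ab, t_abc, n_loci, seed=0):
    """Count gene tree topologies across n_loci independent simulated loci."""
    rng = random.Random(seed)
    return Counter(simulate_gene_tree(t_ab, t_abc, rng) for _ in range(n_loci))


if __name__ == "__main__":
    for t in (0.05, 2.0):
        counts = topology_frequencies(t, t, n_loci=20000, seed=1)
        print(f"internal branches = {t}:",
              "matching =", counts[SPECIES_TREE_TOPOLOGY],
              "symmetric =", counts[SYMMETRIC_TOPOLOGY])
```

With both internal branches short, most lineages reach the root population uncoalesced, where a symmetric topology such as ((A,B),(C,D)) is favored over any single asymmetric one, so the vote tends to settle on the symmetric tree. With long internal branches the matching topology dominates and the vote recovers the species tree.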

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): pgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date (Print): May 2006

Publication date (Electronic): 26 May 2006

Volume: 2

Issue: 5

Electronic Location Identifier: e68

Affiliations

[1] Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America

[2] Department of Human Genetics, Bioinformatics Program, and the Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, United States of America

Harvard University, United States of America

Author notes

E-mail: jdegnan@hsph.harvard.edu (JHD); rnoah@umich.edu (NAR)

Article

Publisher ID: 06-PLGE-RA-0033R3 Serial Item and Contribution ID: plge-02-05-05

DOI: 10.1371/journal.pgen.0020068

PMC ID: 1464820

PubMed ID: 16733550

SO-VID: 04552c63-f0cd-4754-800a-ddb263e8f835

Copyright: © 2006 Degnan and Rosenberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 31 January 2006

Date accepted: 23 March 2006

Page count

Pages: 7

Custom metadata

Citation: Degnan JH, Rosenberg NA (2006) Discordance of species trees with their most likely gene trees. PLoS Genet 2(5): e68. DOI: 10.1371/journal.pgen.0020068
