Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

Abstract

Background

Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method.

Results

This paper continues a line of research that seeks to fit, to each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura’s two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees.
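To make the idea of non-additivity concrete, the following minimal Python sketch computes the Jukes-Cantor (JC) distance that would be obtained, in the limit of infinitely long sequences, from data generated under Kimura's two-parameter (K2P) model. It is an illustration under stated assumptions, not the paper's own derivation: the parameter `kappa` (taken here as the ratio of the transition rate to the rate of each transversion type) and the function names are choices made for this example, and `additivity_gap` is only one simple way to expose non-additivity, which may differ from the paper's formal definition of deviation from additivity.

```python
import math


def k2p_expected_p(d, kappa):
    """Expected transition (P) and transversion (Q) difference proportions
    between two sequences separated by d expected substitutions per site
    under K2P. kappa = alpha / beta is an assumed parameterization."""
    beta = d / (kappa + 2.0)   # rate of each of the two transversion types
    alpha = kappa * beta       # transition rate
    P = 0.25 + 0.25 * math.exp(-4 * beta) - 0.5 * math.exp(-2 * (alpha + beta))
    Q = 0.5 - 0.5 * math.exp(-4 * beta)
    return P, Q


def jc_distance(p):
    """Jukes-Cantor distance from the total proportion p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def jc_on_k2p(d, kappa):
    """JC distance obtained (in expectation) when the true model is K2P."""
    P, Q = k2p_expected_p(d, kappa)
    return jc_distance(P + Q)


def additivity_gap(d1, d2, kappa):
    """How far the misspecified JC distance is from being additive along a
    path made of two consecutive segments of true lengths d1 and d2."""
    return jc_on_k2p(d1, kappa) + jc_on_k2p(d2, kappa) - jc_on_k2p(d1 + d2, kappa)
```

With `kappa = 1.0` the K2P model reduces to Jukes-Cantor and the gap vanishes up to rounding. With a transition bias such as `kappa = 5.0`, `additivity_gap(0.5, 0.5, 5.0)` is positive, meaning the JC distance underestimates long paths more than short ones.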

Conclusions

We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
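A quartet experiment of the kind described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' simulation setup: the tree shape ((A,B),(C,D)), the branch lengths, the parameter `kappa` (transition rate over the rate of each transversion type), and the use of a plain four-point rule as the reconstruction step are all assumptions made for this example.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k2p_probs(d, kappa):
    """Per-site transition/transversion probabilities along a K2P branch
    of length d (expected substitutions per site)."""
    beta = d / (kappa + 2.0)
    alpha = kappa * beta
    P = 0.25 + 0.25 * math.exp(-4 * beta) - 0.5 * math.exp(-2 * (alpha + beta))
    Q = 0.5 - 0.5 * math.exp(-4 * beta)
    return P, Q


def evolve(seq, d, kappa, rng):
    """Evolve a sequence along one branch under K2P."""
    P, Q = k2p_probs(d, kappa)
    out = []
    for x in seq:
        u = rng.random()
        if u < P:
            out.append(TRANSITION[x])
        elif u < P + Q:
            out.append(rng.choice(TRANSVERSIONS[x]))
        else:
            out.append(x)
    return out


def diff_props(s, t):
    """Observed proportions of transition and transversion differences."""
    ts = sum(1 for x, y in zip(s, t) if x != y and TRANSITION[x] == y)
    tv = sum(1 for x, y in zip(s, t) if x != y and TRANSITION[x] != y)
    return ts / len(s), tv / len(s)


def jc_dist(s, t):
    P, Q = diff_props(s, t)
    arg = 1 - 4 * (P + Q) / 3
    return -0.75 * math.log(arg) if arg > 0 else math.inf  # inf if saturated


def k2p_dist(s, t):
    P, Q = diff_props(s, t)
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        return math.inf  # saturated estimate
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def four_point(dist, A, B, C, D):
    """True if the split AB|CD has the strictly smallest pairwise sum."""
    ab_cd = dist(A, B) + dist(C, D)
    ac_bd = dist(A, C) + dist(B, D)
    ad_bc = dist(A, D) + dist(B, C)
    return ab_cd < min(ac_bd, ad_bc)


def accuracy(dist, n_sites, kappa, ext, internal, reps, seed=0):
    """Fraction of replicates in which the true split AB|CD is recovered."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = [rng.choice("ACGT") for _ in range(n_sites)]
        v = evolve(u, internal, kappa, rng)
        A, B = evolve(u, ext[0], kappa, rng), evolve(u, ext[1], kappa, rng)
        C, D = evolve(v, ext[2], kappa, rng), evolve(v, ext[3], kappa, rng)
        hits += four_point(dist, A, B, C, D)
    return hits / reps
```

Calling `accuracy(jc_dist, ...)` and `accuracy(k2p_dist, ...)` with the same seed evaluates both distance functions on identical simulated data, so the two accuracies can be compared directly across sequence lengths, branch lengths, and values of `kappa`. Which estimator performs better depends on those settings.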

Most cited references 26

Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.

H. Kishino, T. Yano, M. Hasegawa (1984)

A new statistical method for estimating divergence dates of species from DNA sequence data by a molecular clock approach is developed. This method takes into account effectively the information contained in a set of DNA sequence data. The molecular clock of mitochondrial DNA (mtDNA) was calibrated by setting the date of divergence between primates and ungulates at the Cretaceous-Tertiary boundary (65 million years ago), when the extinction of dinosaurs occurred. A generalized least-squares method was applied in fitting a model to mtDNA sequence data, and the clock gave dates of 92.3 +/- 11.7, 13.3 +/- 1.5, 10.9 +/- 1.2, 3.7 +/- 0.6, and 2.7 +/- 0.6 million years ago (where the second of each pair of numbers is the standard deviation) for the separation of mouse, gibbon, orangutan, gorilla, and chimpanzee, respectively, from the line leading to humans. Although there is some uncertainty in the clock, this dating may pose a problem for the widely believed hypothesis that the bipedal creature Australopithecus afarensis, which lived some 3.7 million years ago at Laetoli in Tanzania and at Hadar in Ethiopia, was ancestral to man and evolved after the human-ape splitting. Another likelier possibility is that mtDNA was transferred through hybridization between a proto-human and a proto-chimpanzee after the former had developed bipedalism.

Comparison of phylogenetic trees

D.F. Robinson, L.R. Foulds (1981)

BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data.

O. Gascuel (1997)

We propose an improved version of the neighbor-joining (NJ) algorithm of Saitou and Nei. This new algorithm, BIONJ, follows the same agglomerative scheme as NJ, which consists of iteratively picking a pair of taxa, creating a new node which represents the cluster of these taxa, and reducing the distance matrix by replacing both taxa by this node. Moreover, BIONJ uses a simple first-order model of the variances and covariances of evolutionary distance estimates. This model is well adapted when these estimates are obtained from aligned sequences. At each step it permits the selection, from the class of admissible reductions, of the reduction which minimizes the variance of the new distance matrix. In this way, we obtain better estimates to choose the pair of taxa to be agglomerated during the next steps. Moreover, in comparison with NJ's estimates, these estimates become better and better as the algorithm proceeds. BIONJ retains the good properties of NJ, especially its low run time. Computer simulations have been performed with 12-taxon model trees to determine BIONJ's efficiency. When the substitution rates are low (maximum pairwise divergence approximately 0.1 substitutions per site) or when they are constant among lineages, BIONJ is only slightly better than NJ. When the substitution rates are higher and vary among lineages, BIONJ clearly has better topological accuracy. In the latter case, for the model trees and the conditions of evolution tested, the topological error reduction is on the average around 20%. With highly-varying-rate trees and with high substitution rates (maximum pairwise divergence approximately 1.0 substitutions per site), the error reduction may even rise above 50%, while the probability of finding the correct tree may be augmented by as much as 15%.

Author and article information

Journal

Journal ID (nlm-ta): Algorithms Mol Biol

Journal ID (iso-abbrev): Algorithms Mol Biol

Title: Algorithms for Molecular Biology : AMB

Publisher: BioMed Central

ISSN (Electronic): 1748-7188

Publication date (Collection): 2012

Publication date (Electronic): 31 August 2012

Volume: 7

Page: 22

Affiliations

[1] Center for Biotechnology, Bielefeld University, Bielefeld, Germany

[2] Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA

[3] Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel

Article

Publisher ID: 1748-7188-7-22

DOI: 10.1186/1748-7188-7-22

PMC ID: 3538584

PubMed ID: 22938153

SO-VID: d1d7870b-cc33-4cbc-b8a4-63e79a12b51e

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
