32
views
0
recommends
+1 Recommend
0 collections
0
shares
• Record: found
• Abstract: found
• Article: found
Is Open Access

# Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci

1 , 2 , 1 , 2 , *

Systematic Biology

Oxford University Press

Bookmark
There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

### Abstract

Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: Sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci ( $L$ ) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/ $L$ , and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species. Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way for improving the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates.

### Most cited references33

• Record: found
• Abstract: found
• Article: found
Is Open Access

### MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space

(2012)
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d N /d S rations, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.
Bookmark
• Record: found
• Abstract: found
• Article: found
Is Open Access

### BEAST: Bayesian evolutionary analysis by sampling trees

(2007)
Background The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. Results BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at under the GNU LGPL license. Conclusion BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
Bookmark
• Record: found
• Abstract: found

### PAML 4: phylogenetic analysis by maximum likelihood.

(2007)
PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (d(N) and d(S)) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at -- (http://abacus.gene.ucl.ac.uk/software/paml.html).
Bookmark

### Author and article information

###### Journal
Syst Biol
Syst. Biol
sysbio
sysbio
Systematic Biology
Oxford University Press
1063-5157
1076-836X
March 2015
11 December 2014
11 December 2014
: 64
: 2
: 267-280
###### Affiliations
1Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China 2Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
###### Author notes
*Correspondence to be sent to: Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, England. E-mail: z.yang@ 123456ucl.ac.uk

Associate Editor: Peter Foster

###### Article
syu109
10.1093/sysbio/syu109
4380039
25503979
© The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

###### Page count
Pages: 14
###### Categories
Regular Articles

Animal science & Zoology