Our ability to correlate biological evolution with climate change, geological evolution,
and other historical patterns is essential to understanding the processes that shape
biodiversity. Combining data from the fossil record with molecular phylogenetics represents
an exciting synthetic approach to this challenge. The first molecular divergence dating
analysis (Zuckerkandl and Pauling 1962) was based on a measure of the amino acid differences
in the hemoglobin molecule, with replacement rates established (calibrated) using
paleontological age estimates from textbooks (e.g., Dodson 1960). Since that time,
the amount of molecular sequence data has increased dramatically, affording ever-greater
opportunities to apply molecular divergence approaches to fundamental problems in
evolutionary biology. To capitalize on these opportunities, increasingly sophisticated
divergence dating methods have been, and continue to be, developed. In contrast, comparatively
little attention has been devoted to critically assessing the paleontological and
associated geological data used in divergence dating analyses. The lack of rigorous
protocols for assigning calibrations based on fossils raises serious questions about
the credibility of divergence dating results (e.g., Shaul and Graur 2002; Brochu et
al. 2004; Graur and Martin 2004; Hedges and Kumar 2004; Reisz and Müller 2004a, 2004b;
Theodor 2004; van Tuinen and Hadly 2004a, 2004b; van Tuinen et al. 2004; Benton and
Donoghue 2007; Donoghue and Benton 2007; Parham and Irmis 2008; Ksepka 2009; Benton
et al. 2009; Heads 2011).
The assertion that incorrect calibrations will negatively influence divergence dating
studies is not controversial. Attempts to identify incorrect calibrations through
the use of a posteriori methods are available (e.g., Near and Sanderson 2004; Near
et al. 2005; Rutschmann et al. 2007; Marshall 2008; Pyron 2010; Dornburg et al. 2011).
We do not deny that a posteriori methods are a useful means of evaluating calibrations,
but there can be no substitute for a priori assessment of the veracity of paleontological
data.
Incorrect calibrations, those based upon fossils that are phylogenetically misplaced
or assigned incorrect ages, clearly introduce error into an analysis. Consequently,
thorough and explicit justification of both phylogenetic and chronologic age assessments
is necessary for all fossils used for calibration. Such explicit justifications will
help to ensure that divergence dating studies are based on the best available data.
Unfortunately, the majority of previously published calibrations lack explicit explanations
and justifications of the age and phylogenetic position of the key fossils. In the
absence of explicit justifications, it is difficult to distinguish between correct
and incorrect calibrations, and it becomes difficult to reevaluate previous claims
in light of new data. Paleontology is a dynamic science, with new data and perspectives
constantly emerging as a result of new discoveries (see Kimura 2010 for a recent case
where the age of the earliest known record of a clade was more than doubled). Calibrations
based upon the best available evidence at a given time can become inappropriate as
the discovery of new specimens, new phylogenetic analyses, and ongoing stratigraphic
and geochronologic revisions refine our understanding of the fossil record.
Our primary goal in this paper is to establish best practices for justifying
fossils used for the temporal calibration of molecular phylogenies. Our examples derive
mainly, but not exclusively, from the vertebrate fossil record. We hope that our recommendations
will lead to more credible calibrations and, as a result, more reliable divergence
dates throughout the tree of life. A secondary goal is to help the community (researchers,
editors, and reviewers) who might be unfamiliar with fossils to understand and overcome
the challenges associated with using paleontological data. In order to accomplish
these goals, we present a specimen-based protocol for selecting and documenting relevant
fossils and discuss future directions for evaluating and utilizing phylogenetic and
temporal data from the fossil record. We likewise encourage biologists relying on
nonfossil calibrations for molecular divergence estimates (e.g., ages of island or
mountain range formations, continental drift, and biomarkers) to develop their own
set of rigorous guidelines so that their calibrations may also be evaluated in a systematic
way.
A SPECIMEN-BASED APPROACH TO JUSTIFYING PALEONTOLOGICAL DATA
Most studies use a Bayesian framework for estimating divergence dates with probability
curves between a minimum and a maximum bound to represent calibrations (time priors)
(Thorne et al. 1998; Drummond et al. 2006; Yang 2006; Yang and Rannala 2006). An appropriately
constructed fossil calibration uses the oldest assigned fossil of a taxon as the basis
for its minimum age and then constructs these other parameters around it (Benton and
Donoghue 2007; Donoghue and Benton 2007). One key to improving the use of paleontological
data is recognizing that this first step can be tied explicitly to one or a small
set of museum specimens, creating a readily auditable chain of evidence. To minimize
error and maximize clarity, all calibration data should be derived explicitly from
specific fossil specimens. If links between calibration data and specimens cannot
be made, then there are serious questions about the validity of the proposed time
priors. In this respect, the fossil specimens used for calibrations represent a standard,
much in the same way that a holotype specimen (or type series) is a taxonomic standard.
In both cases, these specimens provide a necessary reference point for future inquiries.
The explicit reporting of specimen data is just as crucial to the scientific integrity
of a fossil calibration study as is making genetic sequences publicly available or
reporting analytical methods. Thus, it is worthwhile to compile, reiterate, and expand
on the caveats from previous studies that pertain to the construction and reporting
of fossil calibrations (e.g., Graur and Martin 2004; Hedges and Kumar 2004; van Tuinen
and Hadly 2004a, 2004b; Benton and Donoghue 2007; Donoghue and Benton 2007; Gandolfo
et al. 2008; Parham and Irmis 2008; Benton et al. 2009; Ksepka 2009; Sanders et al.
2010) while providing a simple and explicit protocol (in checklist form) to address
them.
The checklist can be divided into two parts: justifying phylogenetic position (Steps
1–3) and justifying age (Steps 4 and 5). The data needed to justify
calibrations are rarely found in a single publication and instead tend to be spread across
many. In addition to being derived from many sources, such information is rarely explicitly
flagged as potentially valuable for calibrations. Therefore, a rigorous and explicit
approach is needed for justifying the use of paleontological and geological data for
divergence dating. The following steps can be used to develop new calibrations and
as a checklist for vetting and justifying previously published calibrations based
on fossils. If all five steps are fulfilled, then a calibration can be considered
well justified.
(1) Museum numbers of specimen(s) that demonstrate all the relevant characters and
provenance data should be listed. Referrals of additional specimens to the focal taxon
should be justified.
(2) An apomorphy-based diagnosis of the specimen(s) or an explicit, up-to-date, phylogenetic
analysis that includes the specimen(s) should be referenced.
(3) Explicit statements on the reconciliation of morphological and molecular data
sets should be given.
(4) The locality and stratigraphic level (to the best of current knowledge) from which
the calibrating fossil(s) was/were collected should be specified.
(5) Reference to a published radioisotopic age and/or numeric timescale and details
of numeric age selection should be given.
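The checklist lends itself to a structured record. The following Python sketch is one possible encoding and is not part of the protocol itself; the class name, the field names, and the helper methods are illustrative assumptions. A calibration is treated as well justified only when every step has supporting information. The example record uses the Diacodexis ilicis specimen described in Figure 2 and deliberately leaves Steps 2 and 3 empty, because the caption notes that its usefulness depends on the phylogeny used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FossilCalibration:
    """One fossil calibration, one field per checklist step (hypothetical schema)."""
    specimen_numbers: List[str] = field(default_factory=list)  # Step 1
    phylogenetic_justification: str = ""                       # Step 2
    morph_molecular_reconciliation: str = ""                   # Step 3
    locality_and_stratigraphy: str = ""                        # Step 4
    numeric_age_reference: str = ""                            # Step 5

    def missing_steps(self) -> List[int]:
        """Return the checklist steps (1-5) that have no supporting information."""
        entries = [
            self.specimen_numbers,
            self.phylogenetic_justification,
            self.morph_molecular_reconciliation,
            self.locality_and_stratigraphy,
            self.numeric_age_reference,
        ]
        return [step for step, entry in enumerate(entries, start=1) if not entry]

    def is_well_justified(self) -> bool:
        """A calibration is well justified only if all five steps are fulfilled."""
        return not self.missing_steps()


# Example record built from the Figure 2 caption; Steps 2 and 3 are left open.
example = FossilCalibration(
    specimen_numbers=["UM 87854"],
    locality_and_stratigraphy=(
        "Locality UM SC-67, Willwood Formation, Clarks Fork Basin, Wyoming; biozone Wa-0"
    ),
    numeric_age_reference="Westerhold et al. 2009",
)
steps_still_open = example.missing_steps()  # Steps 2 and 3 remain to be justified
```

Storing the justification for each step as free text keeps the record close to how the information appears in the literature; a database built on this idea would probably replace the strings with structured citations.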
(1) Museum Numbers of Specimen(s) that Demonstrate all the Relevant Characters and
Provenance Data Should be Listed. Referrals of Additional Specimens to the Focal Taxon
Should be Justified
Ideally, a fossil used for calibration would be based on a single specimen that preserves
all the characters that allow it to be unambiguously assigned to a clade. Single-specimen
operational taxonomic units (OTUs) are preferable because, aside from rare mixed specimens,
they are almost guaranteed to be from a single species. However, divergence dating
studies that use paleontological data for calibrations usually rely on OTUs from phylogenetic
analyses that are based on sets of specimens referred to a single taxon by various
criteria. In some cases, the basis for a taxonomic referral can be as poor as documenting
that the specimen was recovered from the same region or horizon where other specimens
were previously reported. Consequently, “chimeric taxa” are a recurring problem in
paleontology (Meyer-Berthaud et al. 1992; Padian 2000; Parham 2005).
Because single-specimen fossil OTUs are not always possible, it is necessary to revisit
the association and referral of specimens. It may be possible to refer specimens from
different localities to a single taxon if there are overlapping diagnostic elements
or even through phylogenetic analysis (Gandolfo et al. 1997; Yates 2003; Pol 2004;
Boyd et al. 2009; Makovicky 2010). In cases where previously recognized OTUs cannot
be objectively assembled, it is necessary to restrict the calibration to a subset
of specimens (e.g., Danilov and Parham 2005) or eliminate the OTU from the calibration.
(2) An Apomorphy-Based Diagnosis of the Specimen(s) or an Explicit, Up-to-Date, Phylogenetic
Analysis that Includes the Specimen(s) Should be Referenced
Incorrect phylogenetic placement of fossil calibrations can introduce large errors
into divergence date estimates (Lee 1999; Brochu 2000; van Tuinen and Hedges 2004;
Phillips et al. 2010). Fossil-calibrated dating studies rely on the paleontological
literature for calibration placement but many of the putative oldest representatives
of a lineage have never been included in a formal phylogenetic analysis. Gandolfo
et al. (2008) identified several instances in which incorrect identifications and
taxonomic assignments led to inappropriate fossil calibrations. This is a particular
problem for clades that are understudied, represented by a sparse fossil record, and/or
routinely overidentified (i.e., placed in a lower level taxon than the data can demonstrate)
in the literature (e.g., Cenozoic amphibians and reptiles, Bever 2005; Bell et al.
2010; Sanders et al. 2010). The fact that different authorities may use the same taxon
names to refer to different biological entities confounds the problem and may be particularly
prevalent when addressing the fossil record of extant lineages. This is why we recommend
the use of an apomorphy-based approach to identifying and phylogenetically placing
specimens that are relevant for paleontological calibrations. These guidelines can
also be applied to trace fossils (e.g., tetrapod footprints) in the case that their
identifications are well supported and they show strong evidence for the antiquity
of a lineage based on explicit apomorphies (Carrano and Wilson 2001; Li et al. 2008;
Brusatte et al. 2011).
Because fossils are incompletely preserved, many extinct species have controversial
phylogenetic assignments. Given the analytical burden placed on paleontological data,
it is imperative that up-to-date evidence supporting the taxonomic assignment of relevant
OTUs be explicitly provided. A recurring pitfall is the understandable enthusiasm
of paleontologists to report the oldest geological record of a clade, frequently based
upon fragmentary evidence. This can be problematic on two counts. First, fragmentary
remains often provide insufficient anatomical evidence to discriminate whether shared
characters are products of convergence or common descent. Second, with fragmentary
specimens, it can be difficult to distinguish whether the critical fossil belongs
to the stem or the crown of the clade that it is being used to calibrate. By definition,
the earliest stem members will possess the smallest subset of the diagnostic characters
of the crown, and so assigning fragmentary fossils to either the crown or the stem
of a clade requires detailed knowledge of character evolution that is not always available.
Conversely, fossil specimens of crown clades may not be recognized as such because
they lack one or more of the diagnostic characters as a consequence of taphonomy or
secondary loss (Hennig 1981; Donoghue and Purnell 2009; Sansom et al. 2010). This
issue is especially acute for crown clades that are united on the basis of strong molecular
evidence but for which limited morphological support is known (e.g., Afrotheria or
Boreoeutheria among placental mammals; see Asher et al. 2009). This problem is also
likely to occur in poorly represented basal taxa of lineages that underwent substantial
morphological evolution long after their origin. In those cases, the taxa that might
be of greatest interest in constraining the time of divergence from the nearest living
relative may be difficult to identify.
These complexities underscore the need to carefully justify the phylogenetic placement
of any specimen used for calibrations. It is not enough to cite a paper that merely
mentions the taxon or specimen(s) because the strictness of criteria used in the reported
phylogenetic placement of fossils varies among authors (especially when it comes to
fragmentary, undescribed, and/or unanalyzed specimens). The phylogenetic position
of a fossil taxon can be unstable even when relatively complete specimens are available.
Therefore, a thorough knowledge of the paleontological literature is required to make
sure that the most recent and/or valid study is being cited. After all, claims about
the oldest member(s) of a lineage may change as new data and analyses are published.
A good example of this phenomenon is the case of the putative oldest placental mammals,
the zhelestids (Archibald 1996). Zhelestids are Cretaceous mammal fossils that were
initially hypothesized to be nested deeply within the crown clade of modern orders
of placental mammals (Eutheria), the rest of which do not appear until the Cenozoic.
In more recent analyses, zhelestids have been steadily moving down the tree (Archibald
et al. 2001) and now are hypothesized to be on the stem of Eutheria (Luo and Wible
2005) where they offer no evidence about a minimum date for crown Eutheria. This stemward
change in phylogenetic position arose from increasing clarity about the relationships
of mammalian orders rather than from correcting errors in earlier morphological study
or discovery of better specimens. All three phenomena—new specimens, new interpretations
of existing specimens, and phylogenetic revisions—can lead to major revisions in the
phylogenetic placement of fossils.
Existing databases such as the Paleobiology Database (www.pbdb.org) may contain detailed
taxonomic, geographic, geologic, and stratigraphic information associated with fossil
specimens, but relevant phylogenetic information justifying the taxonomic placement
of these individual specimens is usually lacking. Moreover, rates of polyphyly in
mammalian and molluscan morphotaxa were recently documented to be as high as 19% (Jablonski
and Finarelli 2009), illustrating the risks of uncritically accepting taxonomic allocations
represented in large scale databases (as well as the need to construct databases following
our specimen-based protocol). Whereas existing databases are extremely useful for
identifying the potential oldest specimens assignable to a given clade, explicit,
apomorphy-based information is still necessary to justify the phylogenetic position
of a specimen for calibration.
(3) Explicit Statements on the Reconciliation of Morphological and Molecular Data
Sets Should be Given
In the best cases, fossil specimens possess unambiguous apomorphies that allow them
to be assigned to a single extant lineage with confidence. In these instances, assigning
fossils to nodes is straightforward. Regardless of the tree topology, the fossil will
track the extant lineage and serve as a candidate calibration for all nodes in which
it is nested (Fig. 1, Example 1; see, e.g., Smith 2010). In other cases, the position
of a fossil is supported by ambiguous apomorphies (i.e., homoplastic characters) and
is therefore highly dependent on the topology of a specific analysis. In addition
to the changing position of a taxon given different morphological analyses (see 2
above), any discrepancy between topologies of morphological and molecular phylogenetic
analyses is a potential pitfall that has been underemphasized (Benton et al. 2009;
Lyson et al. 2010; Wiens et al. 2010). Different topologies from morphological and
molecular analyses can affect fossil calibrations in several ways. In some cases,
the placement of a fossil may become ambiguous (Fig. 1, Example 2) leading to uncertainty
about which node(s) it can be used to calibrate. If morphological data show high levels
of homoplasy, the polarization of morphological characters also may be sensitive to
shifting topologies (Fig. 1, Example 3). Different topologies imply different hypotheses
of character evolution, potentially impacting the placement of fossils in a tree (Asher
et al. 2005; Cadena et al. 2012). Unless morphological and molecular trees are in
agreement, the phylogenetic position of a fossil cannot be automatically transferred
to a molecular-based topology. Therefore, merely citing a morphological phylogeny
that places a fossil taxon (i.e., 2) is insufficient justification for a fossil calibration.
FIGURE 1.
Example 1: A fossil (†) with unambiguous synapomorphies can be assigned to a specific
lineage (D) with confidence. Regardless of the topology, the fossil will track the
extant lineage and serve as a candidate calibration for all nodes above which it is
nested. Example 2: Competing phylogenetic hypotheses from different data sets can
change the position of fossil calibrations. In the morphological analysis, a fossil
is found to be closely related to lineages C and D. Two arrows show the nodes that
the fossil could calibrate. A molecular study with a different topology separates
lineages C and D, making the placement of the fossil ambiguous. If the fossil is closely
related to C, then it could calibrate three nodes. If the fossil is closely related
to D, then it is a candidate calibration for just one node. Example 3: Changes to
outgroup topology can change the polarization of morphological characters and placement
of fossils. In the morphological analysis, a fossil (†) is placed in the C + D clade,
sister to D. A molecular analysis changes the relationships of the outgroups (A and
B). In a combined analysis, the morphological characters for the C + D clade are polarized
in a different way and so using the fossil to calibrate clade C + D would be inappropriate.
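The logic of Examples 1 and 2 can be expressed as a simple tree walk. The sketch below is an illustration under assumed inputs: the two four-tip topologies, the node labels, and the function name are hypothetical and do not reproduce the trees drawn in Figure 1. A fossil assigned to one extant lineage is a candidate minimum-age calibration for every node on the path from that lineage to the root, so the number of candidate nodes depends on where the lineage sits in the tree.

```python
from typing import Dict, List, Optional

# Hypothetical topologies, stored as child -> parent. Labels are illustrative.
MORPHOLOGICAL: Dict[str, Optional[str]] = {  # (A,(B,(C,D)))
    "A": "root", "B": "n1", "C": "n2", "D": "n2",
    "n2": "n1", "n1": "root", "root": None,
}
MOLECULAR: Dict[str, Optional[str]] = {      # (D,(A,(B,C)))
    "D": "root", "A": "n1", "B": "n2", "C": "n2",
    "n2": "n1", "n1": "root", "root": None,
}


def candidate_nodes(lineage: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """List the internal nodes that a fossil on `lineage` could calibrate,
    ordered from the shallowest node to the root."""
    nodes: List[str] = []
    node = parent[lineage]
    while node is not None:
        nodes.append(node)
        node = parent[node]
    return nodes


# A fossil allied with C versus D gives different answers on the second tree.
allied_with_c = candidate_nodes("C", MOLECULAR)
allied_with_d = candidate_nodes("D", MOLECULAR)
```

On the second hypothetical topology, a fossil allied with C is a candidate for three nodes, whereas one allied with D is a candidate for the root only. This is the kind of ambiguity that arises when a fossil's affinity to two lineages cannot be resolved and a molecular topology separates them.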
Some problems of incongruent morphological and molecular topologies can be mitigated
by either “total evidence” (sensu Kluge 1989) analyses (e.g., Brochu 1997; Hermsen
and Hendricks 2008; O'Leary and Gatesy 2008; Ksepka 2009) or through the use of a
“molecular scaffold” in resolving morphological character distribution and, therefore,
the phylogenetic position of species known only from fossils (e.g., Springer et al.
2001; Danilov and Parham 2006). Both those approaches incorporate, and therefore explicitly
attempt to reconcile, the morphological data from fossil specimens with the topologies
of molecular analyses though they make different assumptions about the accuracy of
molecular versus morphological data. These methods do not solve every problem, so
a conservative approach to calibrating analyses based on poorly supported or controversial
placements is warranted. In some cases, it may be conceivable that the morphological
and molecular data sets are so incongruent that neither a total evidence nor a molecular
scaffold approach is sufficient for reconciling the position of an extinct taxon.
For example, given current uncertainty concerning the phylogenetic position of turtles
among amniotes, any use of the oldest fossil turtle specimens to calibrate amniote
branching events has a two-thirds probability of introducing error into the analysis
(see Lyson et al. 2010, 2012). We recommend against using such controversial
OTUs to calibrate divergence dating analyses.
(4) The Locality and Stratigraphic Level (to the Best of Current Knowledge) from which
the Calibrating Fossil(s) Was/Were Collected Should be Specified
Unless they are subjected to direct radioisotopic analysis (which is rarely possible),
the provenance of specimens used for calibrations must be documented. The accuracy
with which a particular fossil can be located to a specific level in a stratigraphic
column varies but depends largely on how detailed the locality data are. It might
be constrained to a discrete bed in a measured stratigraphic section, or a geologic
formation or group, or a depositional basin. Many specimens, especially those collected
more than 50 years ago or those derived from the commercial trade, lack detailed stratigraphic
and geographic occurrence data and so have limited value for calibration purposes.
Almost any fossil found in situ can be assigned to its source rock unit and often
to a particular stratigraphic level within that unit. In the best cases, calibration
data will be based upon fossils with precise locality information and stratigraphic
context that can be assigned to a particular meter level in a chronostratigraphically
well-studied section (Fig. 2). The accuracy with which a fossil can be placed within
a stratigraphic framework will have a major impact on estimates of its relative (stratigraphic)
and numeric (absolute) age, particularly in light of improvements in correlation,
revisions of stratigraphy, and refinements in geochronology. Geologic units (e.g.,
groups, formations, and members) are the key lithostratigraphic units used by field
geologists to correlate and divide the sedimentary rock sequence in a geographic region;
they generally have formal names (e.g., Willwood Formation, Fig. 2) and explicitly
defined bases and tops.
FIGURE 2.
Every fossil taxon has geographic and geological contexts that provide a basis for
determining its age. The example given here is for Diacodexis ilicis. Depending on
the phylogeny used, D. ilicis can be a useful minimum calibration for artiodactyl
mammals. Six specimens of D. ilicis are known (Gingerich 1989) and the holotype, UM
(University of Michigan) 87854, is among the oldest well-dated specimens. UM 87854
is from the Clarks Fork depositional basin in northern Wyoming. Within the Clarks
Fork Basin, it is from the Willwood Formation. Within the Willwood Formation, it is
from Locality UM SC-67. Locality UM SC-67 is part of a well-studied stratigraphic
section for the Early Eocene. Within the Early Eocene, Locality UM SC-67 can be placed
in the Wasatchian Land-Mammal Age. Within the Wasatchian, Locality UM SC-67 can be
assigned to the biozone Wa-0 and occurs within a global negative carbon isotopic excursion.
Wa-0 spans the latter part of this carbon isotope excursion and is inferred to represent
∼95 ky in the stratigraphic section, where UM 87854 occurs (Abdul Aziz et al. 2008);
the entire global carbon isotope excursion is currently dated to 55.65–55.93 Ma on the
basis of radioisotopic ages and orbital tuning methods based on the earth's precessional
cycles (Westerhold et al. 2009), giving specimen UM 87854 a minimum age of 55.65 Ma.
Geologic units are never of uniform scale, whether in terms of thickness or geographic
extent, because they merely represent mappable units of distinctive rock types. Most
importantly, rock units do not represent equal units of time—some rock units may be
deposited geologically instantaneously, whereas others might represent millions of
years with different portions of the total time range represented at particular outcrops.
Nor do the boundaries between lithologic units necessarily coincide with geochronologic
divisions (i.e., units of geologic time). But the assignment of a fossil to a named
geologic rock unit provides a fixed standard of the relative age of the fossil that
can then be used to establish a numeric age as outlined below (5).
Stratigraphy is not a static field. Episodically, stratigraphic nomenclature is revised
or entirely redefined with the establishment of new “type sections,” and new lithostratigraphic
or biostratigraphic schemes are proposed. New descriptions and correlations can lead to
refined interpretations of the geologic unit present at a particular geographic locality
(e.g., Martz and Parker 2010). The dynamic nature of stratigraphy highlights the importance
of detailed geographic locality information for fossil specimens in order to determine
the impact of revised stratigraphic interpretations, correlations, and geochronologies
upon divergence dating calibrations and, ultimately, divergence time estimates.
(5) Reference to a Published Radioisotopic Age and/or Numeric Timescale and Details
of Numeric Age Selection Should be Given
Divergence dating analyses require numeric ages, but paleontologists do not routinely
use or report numeric ages. The numeric age of a fossil is generally outside the purview
of most paleontologists' research interests for two reasons. First, the geochronologic
data required for numeric dates can be difficult to establish for a particular rock
unit and geographic locality. Second, though geochronologies evolve, named rock units
change much less frequently and so provide a more stable albeit relative comparative
framework for reporting fossil occurrences. The translation of fossil occurrences
to numeric ages frequently involves a daisy chain of correlations through different
geographic localities on the basis of overlapping geological and paleontological evidence
(e.g., van Tuinen and Hadly 2004a; Benton et al. 2009; Smith 2011). However, for the
vast majority of calibrations, this translation is not explained, meaning the actual
numbers used in calculations are not adequately justified.
The numeric age of a fossil is not necessarily stable, particularly if it is established
through correlation rather than through direct dating at the section in which the
fossil was found. Any numeric age for a fossil specimen is merely the best current
estimate and can be refined through time. For example, radioisotopic dating methods
have improved dating precision by roughly an order of magnitude in the past 20 years
as a result of new methods, recalibration of standards, and cross-testing among existing
methods (e.g., Mundil et al. 2004; Erwin 2006; Renne et al. 2010). 40Ar/39Ar and U-Pb
ages differ systematically by ∼1%, something that requires correction prior to comparison
(e.g., Renne et al. 2010). Because of this ongoing refinement, it is important to
fully explain the basis upon which the numeric age is established. If the chain of
inference is explicit, the consequences of revisions will be easily identified. At
its most basic level, our recommendation for justifying the numeric age of a calibration
point is that the translation of relative intervals from paleontological studies should
reference geochronological literature or published timescales that include numeric
ages (e.g., Hess and Lippolt 1986; Menning et al. 2000; Gradstein et al. 2004; Ogg
2010; Walker and Geissman 2009). Of course, even compiled geologic timescales rely
on some interpolation, are themselves constantly undergoing revision, and can become
obsolete. Referencing these timescales makes it easier for later workers to revise
reported ages.
A second part of this step in the protocol involves the logistical interpretation
of the numeric age from the geological timescale. For a minimum age constraint, the
youngest age interpretation of the fossil should be used (i.e., the uppermost limit
of the relevant time interval) rather than the common practice of adopting a midpoint
in the possible range. Because a fossil necessarily postdates the origination of the
lineage to which it is assigned, choosing the youngest possible age from an interval
will necessarily bias the minimum further from the true age of origination. However,
it is important to recognize that the minimum age is only one end point of a constraint
and is meant to partially bracket, not approximate on its own, the age of origination.
Therefore, the minimum age should accommodate the youngest possible age of the fossil
including the error associated with the geochronologic age (van Tuinen et al. 2004;
Donoghue and Benton 2007; Benton and Donoghue 2007; Benton et al. 2009).
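The difference between the youngest possible age and a midpoint can be made concrete with a few lines of Python. This is a minimal sketch: the function names and the `error_ma` parameter are assumptions introduced for illustration, and the interval is the 55.65–55.93 Ma range quoted in the Figure 2 caption. Ages are in millions of years before present, so the youngest bound is the smaller number.

```python
def hard_minimum_age(oldest_ma: float, youngest_ma: float, error_ma: float = 0.0) -> float:
    """Youngest possible age (Ma) of a fossil dated to an interval.

    `error_ma` is the geochronologic uncertainty on the youngest bound;
    subtracting it makes the minimum accommodate that uncertainty.
    """
    if youngest_ma > oldest_ma:
        raise ValueError("youngest_ma must not exceed oldest_ma")
    if error_ma < 0:
        raise ValueError("error_ma must be non-negative")
    return youngest_ma - error_ma


def midpoint_age(oldest_ma: float, youngest_ma: float) -> float:
    """Midpoint of the interval, shown only for contrast with the minimum."""
    return (oldest_ma + youngest_ma) / 2.0


# Interval from the Figure 2 caption (Ma).
minimum = hard_minimum_age(oldest_ma=55.93, youngest_ma=55.65)
midpoint = midpoint_age(oldest_ma=55.93, youngest_ma=55.65)
```

Here the minimum is 55.65 Ma, matching the value given in Figure 2, whereas the midpoint is about 55.79 Ma. Used as a minimum bound, the midpoint would exclude ages that the specimen could actually have.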
This youngest possible age should be applied as a hard minimum. The logic behind assigning
hard minima based on the youngest possible age of the oldest-known fossil has been
discussed extensively (e.g., van Tuinen et al. 2004; Benton and Donoghue 2007; Donoghue
and Benton 2007). Some authors may still choose to use soft minima in cases of hypothesized
anagenesis or geologic uncertainty, but such instances require careful justification.
The arbitrary assignment of a minimum age that postdates the stated youngest estimates
for a fossil should be avoided. The justification for arbitrarily expanding the interval
might appeal to a conservative bias, but when paleontological data are properly established
and justified that practice serves only to introduce unnecessary error into the analysis.
In some cases, either because of poor correlations or poorly documented provenance,
the age of a fossil may not be well constrained beyond a very broad stratigraphic
interval. But in many cases, it is possible to determine much more precise and accurate
dates than are given by a stratigraphic interval. Those data may not be available
in the publications describing the fossil specimens used for calibrations, and so
it is usually necessary to compile evidence from multiple studies. Anatomically trained
fossil systematists may not be able to retrieve those data any more easily than molecular
systematists, but by listing the specimen numbers, rock units, and ages in a standardized
way, others may check the claim, thus facilitating the refinement of numeric dates
over time.
Useful Discussions
In addition to the five steps of the specimen-based protocol, we recommend that authors
include some discussion about the history of each node that addresses rejected or
obsolete calibrations. Such detailed discussions of calibrations already exist in
some papers (e.g., Benton and Donoghue 2007; Hurley et al. 2007; Benton et al. 2009).
These summary discussions make it easier for others to assess the justification by
highlighting the relevant literature and argumentation. We should expect that, through
discovery, description, critique, and phylogenetic/stratigraphic analysis, even
the best-justified calibrations will eventually be refined or even dramatically changed.
In order to facilitate the evolution of justifications, we recommend that explanatory
discussions (or citations of such discussions) should become a standard part of calibration
reporting.
Other Parameters
The justification of the phylogenetic position and age of a fossil is an important
first step to calibrating a node in a divergence dating analysis. In addition to determining
what nodes can even be assigned time priors (some may not have useable fossils), this
step provides the most tangible data from the fossil record: the hard minimum bound
of a calibration interval. The maximum bound and the distribution of probabilities
within the minimum–maximum interval are also ostensibly based on the fossil record,
but in a much more complex way, because they describe probability of origination before
the oldest known fossil. The idiosyncratic nature of these other parameters precludes
us from developing a standard protocol for them.
Ideally, the maximum constraint is established as older than all the oldest possible
records, extending back to encompass a time when the ecologic, biogeographic, geologic,
and taphonomic conditions for the existence of the lineage are met, but no records
are known. For the maximum bound, an intuitive approach that takes into account preservation
potential and phylogenetic bracketing has been proposed (e.g., Reisz and Müller 2004a;
Müller and Reisz 2005; Benton and Donoghue 2007; Donoghue and Benton 2007; Benton
et al. 2009). This approach is borrowed and developed from the fossil recovery potential
function established by Marshall (1997). Researchers who use this intuitive approach
should provide detailed arguments justifying their decisions so that others can evaluate
them. Following the arguments of Benton and Donoghue (2007) and Ho and Phillips
(2009), the maximum bounds should be soft and liberal.
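The difference between a hard minimum and a soft maximum can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation of any particular dating program: the density is assumed flat between the two bounds, and an exponentially decaying tail older than the maximum carries a small share of the probability. The function name `soft_max_density`, the 2.5% tail mass, and the tail shape are all hypothetical choices made for illustration.

```python
import math


def soft_max_density(age_ma, hard_min, soft_max, tail_mass=0.025):
    """Prior density for a node age (Ma) with a hard minimum and a soft maximum.

    Assumed shape (illustrative only): flat between the bounds, holding
    1 - tail_mass of the probability, plus an exponential tail older than
    soft_max holding tail_mass. Ages younger than hard_min get zero density.
    """
    if not (0.0 < tail_mass < 1.0) or soft_max <= hard_min:
        raise ValueError("need 0 < tail_mass < 1 and soft_max > hard_min")
    if age_ma < hard_min:
        return 0.0  # hard bound: no probability younger than the fossil
    flat = (1.0 - tail_mass) / (soft_max - hard_min)
    if age_ma <= soft_max:
        return flat
    # The tail starts at the flat height, so the density is continuous;
    # its rate is chosen so that the tail integrates to tail_mass.
    rate = flat / tail_mass
    return flat * math.exp(-rate * (age_ma - soft_max))
```

With a hard minimum of 247.2 Ma and a soft maximum of 256 Ma, ages younger than 247.2 Ma receive zero density, whereas ages older than 256 Ma remain possible but become rapidly less probable.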
Most studies use a Bayesian framework for estimating divergence dates with probability
curves between minimum and maximum bounds. In theory, such complex, parameter-rich
priors may be better models of the fossil record, but there is presently no practical
way to estimate curve parameters (Ho and Phillips 2009). Lee and Skinner (2011) note,
“current practice often consists of little more than educated guesswork.” A review
of recent studies shows that these parameters are usually not justified (Warnock et
al. 2012). The implications of these choices are only recently being explored (Inoue
et al. 2010; Clarke et al. 2011; Lee and Skinner 2011; Warnock et al. 2012). But the
fact that a widely applied methodology rests on such ambiguous assumptions, which
have a major impact on results (Clarke et al. 2011; Warnock et al. 2012), is a
major limitation of molecular divergence dating studies. The development of objective
methods for estimating maximum bounds and probability curves should be a priority
(see Future Directions section).
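The sensitivity of a time prior to the choice of curve can be illustrated with a small Python sketch. It assumes two offset distributions (exponential and lognormal), anchors both at the same hard minimum, and tunes each so that 97.5% of its probability is younger than the same soft maximum. The function names, the 97.5% convention, and the lognormal shape value `sigma=1.0` are assumptions made for illustration, not recommendations.

```python
import math
from statistics import NormalDist


def exponential_offset_quantile(p, hard_min, soft_max, mass_below_max=0.975):
    """Age (Ma) at cumulative probability p for an exponential offset prior."""
    # Mean offset chosen so that mass_below_max of the prior is younger
    # than soft_max.
    mean_offset = (soft_max - hard_min) / -math.log(1.0 - mass_below_max)
    return hard_min - mean_offset * math.log(1.0 - p)


def lognormal_offset_quantile(p, hard_min, soft_max, sigma=1.0,
                              mass_below_max=0.975):
    """Age (Ma) at cumulative probability p for a lognormal offset prior."""
    z = NormalDist()  # standard normal, used only for its quantiles
    # Log-scale location chosen so the same share of mass is younger
    # than soft_max.
    mu = math.log(soft_max - hard_min) - sigma * z.inv_cdf(mass_below_max)
    return hard_min + math.exp(mu + sigma * z.inv_cdf(p))


if __name__ == "__main__":
    for name, quantile in [("exponential", exponential_offset_quantile),
                           ("lognormal", lognormal_offset_quantile)]:
        print(name, "median age (Ma):", round(quantile(0.5, 247.2, 256.0), 2))
```

Both curves share the same hard minimum and the same soft maximum, yet they place the median of the prior at different ages. The shape parameters are exactly the kind of choice that is rarely justified in practice.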
EXAMPLES OF FOSSIL CALIBRATIONS
In order to demonstrate the application of our specimen-based protocol, we apply it
to two widely used calibrations in the vertebrate scion of the tree of life: the crocodile–bird
node (Archosauria, below) and the human–chimpanzee node (Hominini, online Appendices
1, 2 (available from http://www.sysbio.oxfordjournals.org/)). A survey of the literature
shows variation among the interpretations of the paleontological data for these nodes
(Fig. 3). The application of the specimen-based protocol to these “classic” nodes
results in new hard minima. We also provide examples of our recommended node calibration
discussions as well as maximum bounds, the latter following the approach of Benton
and Donoghue (2007).
FIGURE 3.
Examples of variation among constraint ages for two commonly calibrated nodes. Dark
circles show point calibrations, lines show constraint intervals. Top: Archosauria,
crocodile–bird node. Bottom: Hominini, human–chimpanzee node. The data for this figure
can be found in online Appendix 2.
Example from Archosaurian Reptiles (Crocodile–Bird Node: 247.2 Ma Hard Minimum, 256
Ma Soft Maximum)
(1) Museum numbers of specimen(s) that demonstrate all the relevant characters and
provenance data should be listed. Referrals of additional specimens to the focal taxon
should be justified.—
The Institute of Vertebrate Paleontology and Paleoanthropology (IVPP) V 6026, holotype
of Xilousuchus sapingensis (Wu 1981).
(2) An apomorphy-based diagnosis of the specimen(s) or an explicit, up-to-date, phylogenetic
analysis that includes the specimen(s) should be referenced.—
Nesbitt et al. (2011) conclusively demonstrate that IVPP V 6026 belongs along the stem
of Crocodylia (in the clade Pseudosuchia). Specifically, IVPP V 6026 preserves a number
of synapomorphies placing it in the basal clade Poposauroidea, which also includes
Arizonasaurus babbitti (Nesbitt et al. 2011), a previously proposed calibration point
for Archosauria. As discussed by Nesbitt et al. (2011), IVPP V 6026 can be placed in
Archosauria because it possesses the following synapomorphies: palatal processes of
the maxilla meet at the midline, external foramen for the abducens nerve is contained
wholly within the prootic, and the dorsolateral margin of the posterior process of
the maxilla preserves an antorbital fossa. It is placed within Poposauroidea because
it possesses: a posterodorsal process of the premaxilla that is restricted to the
ventral border of the external naris, the anterodorsal margin of the maxilla borders
the external naris, a concave anterodorsal margin is present at the base of the dorsal
process of the maxilla, foramina for entrance of cerebral branches of internal carotid
artery into the braincase are positioned on the ventral surface, and it lacks the
distal expansion of neural spines of the dorsal vertebrae (Nesbitt et al. 2011).
(3) Explicit statements on the reconciliation of morphological and molecular data
sets should be given.—
The morphological hypothesis of Nesbitt et al. (2011) is concordant with molecular
hypotheses of higher level tetrapod relationships that support a monophyletic Archosauria
(e.g., Zardoya and Meyer 1998; Hedges and Poling 1999; Kumazawa and Nishida 1999;
Rest et al. 2003; Iwabe et al. 2005; Hugall et al. 2007; Alfaro et al. 2009; Becker
et al. 2011; Shen et al. 2011; Lyson et al. 2012). Recently, several molecular data
sets have recovered support for a novel turtle–crocodilian clade (Hedges and Poling
1999; Mannen and Li 1999; Cao et al. 2000; Shedlock et al. 2007) or a novel turtle–bird
clade (Cotton and Page 2002). However, support for these topologies over an alternative
where turtles are the sister taxon to a monophyletic Archosauria is often weak (Cao
et al. 2000; Iwabe et al. 2005; Katsu et al. 2009). The majority of recent molecular
analyses support a monophyletic Archosauria (Iwabe et al. 2005; Hugall et al. 2007;
Alfaro et al. 2009; Katsu et al. 2009; Lyson et al. 2012), and placement of turtles
as archosauriforms or archosaurs does not affect the oldest potential calibration
point or the phylogenetic placement of IVPP V 6026.
(4) The locality and stratigraphic level (to the best of current knowledge) from which
the calibrating fossil(s) was/were collected should be specified.—
Heshanggou Formation, Hazhen Commune, Fugu County, northeast Shaanxi Province of China
(Wu 1981).
(5) Reference to a published radioisotopic age and/or numeric time scale and details
of numeric age selection should be given.—
Although the Heshanggou Formation preserves vertebrate fossils that are thought to
be correlative to other Early Triassic assemblages in Pangaea (Shu and Norris 1988;
Rubidge 2005; Nesbitt et al. 2011), what is most relevant is that the formation preserves
an extensive palynofloral (pollen and spore) assemblage that is unambiguously Early
Triassic in age (Shu and Norris 1988) and easily correlative to other Early Triassic
palynofloras worldwide (cf. Lindström and McLaughlin 2007; Kürschner and Herngreen
2010). Furthermore, the Heshanggou Formation also preserves the plant macrofossil
taxon Pleuromeia sternbergii (Shu and Norris 1988), a classic Early Triassic disaster
taxon found in abundance throughout Pangaea (Wang 1996; Looy et al. 2001; Grauvogel-Stamm
and Ash 2005; Krassilov and Karasev 2009). The palynofloral data specifically suggest
that these strata were deposited during the Olenekian Stage (late Early Triassic)
(Shu and Norris 1988). The entire Early Triassic is well sampled with high-precision
U-Pb radioisotopic ages, so the duration of the Olenekian is well constrained to 251.3–247.2
Ma (Mundil et al. 2010).
Discussion.—
The bird–crocodile split (crown groups Aves–Crocodylia and total groups Ornithodira–Pseudosuchia)
is one of the fundamental calibrations for tetrapod vertebrate studies (e.g., Hugall
et al. 2007; Alfaro et al. 2009) because it often serves as an external calibration
for both squamate and avian molecular analyses (e.g., Paton et al. 2002; Pereira and
Baker 2006; Kumazawa 2007; Okajima and Kumazawa 2010) and is relevant to well-used
model organisms in developmental biology (Benton and Donoghue 2007). Early analyses
used a secondary calibration for this divergence, which has been rightly criticized
(Graur and Martin 2004; Müller and Reisz 2005). Fossils on the lineage to birds (Ornithodira)
have not generally been used for calibration because, until recently, they postdated
by 10–15 myr the earliest putative fossils from the crocodylian total group (Pseudosuchia)
(Müller and Reisz 2005; Benton and Donoghue 2007).
Müller and Reisz (2005) proposed an age of 243–251 Ma for this divergence, based on
the presence of the pseudosuchian Ar. babbitti during the lower Anisian stage of the
Middle Triassic (Nesbitt 2003, 2005), which is 245–247 Ma using recent high-precision
U-Pb radioisotopic age data (Mundil et al. 2010). Ar. babbitti is phylogenetically
well constrained as a member of the Pseudosuchia (e.g., Nesbitt 2003; Nesbitt and
Norell 2006; Nesbitt 2007; Weinbaum and Hungerbühler 2007; Nesbitt et al. 2011). However,
its geologic age is less secure. Ar. babbitti is from the Holbrook Member of the Moenkopi
Formation in Arizona, U.S.A.; this has been dated to the Anisian using long-distance
vertebrate biostratigraphic correlations (Morales 1987; Lucas and Schoch 2002), and
this is consistent with very limited magnetostratigraphic (Steiner et al. 1993) and
radioisotopic age data (Dickinson and Gehrels 2009). Unfortunately, Triassic vertebrate
biostratigraphy is in constant flux (e.g., Rayfield et al. 2005, 2009; Irmis et al.
2010), so the age assignment of Ar. babbitti is not particularly robust. Nonetheless,
accepting an Anisian age for this taxon, it would give a minimum age of divergence
for the bird–crocodile split of 242–247 Ma (Mundil et al. 2010).
Some studies have followed this calibration (e.g., Hugall et al. 2007), whereas Benton
and Donoghue (2007) proposed an age of 235–250.4 Ma for this divergence, based on
the presence of the archosauriform Vjushkovisaurus berdjanensis from the ?Anisian
of Russia. V. berdjanensis is a poor choice for calibrating the bird–crocodile split
because it has never been included in a phylogenetic analysis, and there is morphological
character evidence suggesting that it is a basal archosauriform outside of crown Archosauria
(i.e., phylogenetically predates the bird–crocodile divergence) (Gower and Sennikov
2000).
Nesbitt et al. (2010) recently described Asilisaurus kongwe from the Anisian (Middle
Triassic) of Tanzania, a basal dinosauromorph that is the oldest representative of
the stem lineage of birds. Although phylogenetically well constrained, this fossil
is no older than Ar. babbitti. It also suffers similar problems in geologic dating;
the age of As. kongwe is only constrained by long-distance vertebrate biostratigraphy
(see Nesbitt et al. 2010 supplementary information). Brusatte et al. (2011) recently
described putative basal
dinosauromorph footprints from the Early Triassic, but these records face the same
identification problems as other ichnofossils (see above) and are no older than IVPP
V 6026.
IVPP V 6026, from the Early Triassic Heshanggou Formation of China, was first described
as a basal archosauriform, outside of crown Archosauria. Gower and Sennikov (1996a,
1996b, 1997) agreed with this basal phylogenetic position; however, reevaluation of
the specimen by Nesbitt et al. (2011) conclusively demonstrates that it belongs to
the Pseudosuchia. The absolute age of IVPP V 6026 therefore provides a minimum constraint
for the bird–crocodile split.
Soft maximum age constraint.—
Justification for the maximum age constraint for Archosauria is difficult because
recent fossil discoveries have steadily pushed back the age of divergence for this
node. Current fossil evidence suggests that Archosauria diverged during the earliest
Triassic based on the fact that a number of lineages must have diverged prior to the
Olenekian (Nesbitt et al. 2011), but this does not eliminate the possibility that
older representatives of Archosauria will be discovered. Non-archosaur archosauriforms
appear to have achieved a global distribution by the end of the Early Triassic, with
taxa present in Eastern Europe, Asia, South America, South Africa, and Antarctica
(Smith et al. 2011). Despite this widespread distribution, which includes relatively
well-sampled assemblages (e.g., the South African Lystrosaurus Assemblage Zone), X.
sapingensis remains the only Early Triassic archosauriform with well-established archosaur
affinities. We suggest that the age of the oldest basal archosauriform (part of the
archosaur stem lineage, but basal to the avian-crocodylian divergence) Archosaurus
rossicus provides a conservative estimate for a soft maximum because it is the oldest
archosauriform fossil known, and predates by several million years all putative crown
archosaur fossils. A. rossicus is clearly identifiable as a basal archosauriform (Gower
and Sennikov 2000) and is dated to the Changhsingian (latest Permian) by palynofloral
biostratigraphy (Krassilov and Karasev 2009). The Changhsingian is dated by high-precision
U-Pb ages to 256–252.3 Ma. Given that we do not know exactly what part of the Changhsingian
A. rossicus is from, we suggest that a conservative soft maximum for Archosauria would
be 256 Ma.
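The archosaur example above can also be captured as a structured record, which is one way the standardized reporting we advocate might be made machine-readable. The Python sketch below is only an assumption about how such a record could look: the class name `FossilCalibration` and its field names are hypothetical, and the values are copied from the text of this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FossilCalibration:
    """One node calibration, with one field per step of the checklist."""
    node: str
    specimens: List[str]              # step 1: museum numbers
    phylogenetic_justification: str   # step 2: diagnosis or analysis cited
    molecular_reconciliation: str     # step 3: morphology vs. molecules
    locality_and_stratigraphy: str    # step 4: where the fossil came from
    age_justification: str            # step 5: numeric age and its source
    hard_minimum_ma: float
    soft_maximum_ma: Optional[float] = None

    def __post_init__(self):
        # Ages are in Ma, so an older bound is a larger number.
        if (self.soft_maximum_ma is not None
                and self.soft_maximum_ma <= self.hard_minimum_ma):
            raise ValueError("soft maximum must be older than hard minimum")

    def interval_width_ma(self) -> Optional[float]:
        """Width of the minimum-to-maximum interval, if a maximum is set."""
        if self.soft_maximum_ma is None:
            return None
        return self.soft_maximum_ma - self.hard_minimum_ma


archosauria = FossilCalibration(
    node="Archosauria (crocodile-bird node)",
    specimens=["IVPP V 6026"],
    phylogenetic_justification="Nesbitt et al. 2011",
    molecular_reconciliation="Concordant with molecular support for a "
                             "monophyletic Archosauria",
    locality_and_stratigraphy="Heshanggou Formation, Fugu County, "
                              "Shaanxi Province, China (Wu 1981)",
    age_justification="Olenekian, 251.3-247.2 Ma (Mundil et al. 2010)",
    hard_minimum_ma=247.2,
    soft_maximum_ma=256.0,
)
```

Keeping each checklist step as its own field means that a later revision to one step (e.g., a new radioisotopic age for the formation) can be made without rewriting the rest of the justification.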
Using the Checklist to Test Calibrations
Previous studies identified inappropriate calibrations that have introduced error
into divergence dating analyses (e.g., Graur and Martin 2004; Gandolfo et al. 2008;
Ksepka 2009; Sanders et al. 2010). In order to show how the application of the specimen-based
protocol can identify inappropriate calibrations, we include two examples (from boid
snakes and charadriiform birds; online Appendix 3). In the boid snake example, the
published minimum age (55 Ma) cannot be substantiated with specimen-based evidence
so we recommend a much younger minimum age (16.0 Ma). Similarly, in the charadriiform
bird example, the published minimum age (28.4 Ma) cannot be substantiated with specimen
evidence. In fact, we cannot identify another specimen that will satisfy all the steps
of the protocol for that node and so recommend that future workers do not calibrate
it.
It is not certain that errors like these would be identified by cross-validation studies,
and even if they were, it would not be clear why the fossil data were incongruent.
Regardless, vetting calibrations beforehand is clearly preferable to including poorly
substantiated data in any analysis. The checklist is an important first step to
identifying other incorrect calibrations and establishing more reliable time priors.
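Vetting a calibration against the checklist can be sketched as a simple completeness test. The Python below is a minimal sketch under stated assumptions: the dictionary keys, the function name `unmet_steps`, and the example record are hypothetical placeholders and do not reproduce the boid or charadriiform cases in the appendix. The code only detects that a step has no stated evidence; judging whether the cited evidence is sound still requires expert evaluation.

```python
# The five steps of the specimen-based protocol, keyed by hypothetical
# field names chosen for this sketch.
CHECKLIST = (
    ("specimens", "1: museum numbers of specimens listed"),
    ("phylogenetic_justification",
     "2: apomorphy-based diagnosis or phylogenetic analysis referenced"),
    ("molecular_reconciliation",
     "3: morphological and molecular data sets reconciled"),
    ("locality_and_stratigraphy",
     "4: locality and stratigraphic level specified"),
    ("age_justification",
     "5: radioisotopic age or numeric time scale referenced"),
)


def _is_blank(value):
    """True if a field is missing, an empty list, or only whitespace."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    return len(value) == 0  # e.g., an empty list of specimen numbers


def unmet_steps(calibration):
    """Return the checklist steps for which the record gives no evidence."""
    return [label for key, label in CHECKLIST
            if _is_blank(calibration.get(key))]


# Hypothetical published calibration that cites no specimen and no analysis.
unsupported = {
    "node": "hypothetical node",
    "specimens": [],
    "phylogenetic_justification": "",
    "molecular_reconciliation": "hypothetical statement",
    "locality_and_stratigraphy": "hypothetical formation",
    "age_justification": "hypothetical time scale reference",
}

if __name__ == "__main__":
    for step in unmet_steps(unsupported):
        print("unmet step", step)
```

A calibration with any unmet step would be set aside before the dating analysis is run, rather than being diagnosed afterward from incongruent results.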
FUTURE DIRECTIONS
Better-justified calibrations are more likely to be free of error, which will make
molecular divergence dates more accurate and therefore provide greater rigor in testing
hypotheses. A specimen-based protocol will focus attention on discrepancies between
the fossil record and published calibration points, making it easier for later researchers
to identify and correct errors and refine calibrations as new data come to light.
Standardizing the reporting of data in publications (e.g., our checklist) is a crucial
first step. In addition to providing this tool, we identify some unresolved issues
such as the need for more objective methods for selecting some parameters of time
priors (the maximum constraint and probability curves) and the difficulty associated
with compiling information from traditionally separate disciplines. In both cases,
solutions can be found in bioinformatic approaches.
Objective Calibration Parameters
Following our checklist protocol will help workers identify the oldest certain fossil
of a lineage that can anchor a time prior with an objective hard minimum age. But
diagnostic fossils always postdate the origination time of the lineage they represent
(e.g., Marshall 1990; Norell 1992; Benton and Ayala 2003; Benton and Donoghue 2007).
The probability of origination before the oldest certain fossil is what the other
Bayesian calibration parameters attempt to estimate. Therefore, objectively establishing
these parameters requires quantitative estimates of factors that would contribute
to the nonpreservation of a lineage. Inoue et al. (2010, p. 74) highlight “the need
for probabilistic modeling of fossil depositions, preservations, and sampling to provide
statistical summaries of information in the fossil record.”
The amount of effort required to create rigorous paleontological databases suitable
for calculating priors is intimidating. To date, few studies have integrated databases
of fossil occurrences with measurements of rock record bias at small taxonomic or
geographic scales (Holland 2003; Benton et al. 2004; Uhen and Pyenson 2007; Marx 2009)
and never for the purpose of developing time priors. This approach was extended to
calculate Bayesian “confidence intervals” (sensu Strauss and Sadler 1989; Hedman 2010;
see, e.g., Friedman and Brazeau 2010) on origination dates that could be adapted as
Bayesian priors for divergence dating studies, but we do not know of any studies that
have done this yet. Another method permits estimation of origination time posteriors
based on the temporal distribution of crown and stem members, which were then used
as time priors for divergence dating (Wilkinson et al. 2011). The development and
comparison of these and other methods to objectively estimate time prior parameters
should be a priority for the divergence dating effort. Even then, the utility of any
method will be limited by the accessibility of relevant data. Obtaining genetic sequences
is not the limiting factor, but the synthesis will require substantially more input
from paleontologists in order to collate, vet, and organize the data from the fossil
record.
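To give a sense of what a quantitative estimate of this kind looks like, the Python sketch below computes the classical (non-Bayesian) range extension usually attributed to Strauss and Sadler (1989) and Marshall (1990). It is not one of the Bayesian methods cited above, and it assumes that fossil horizons are randomly distributed through the true range (constant recovery potential), which is precisely the assumption that rock record bias violates. The function name and arguments are hypothetical.

```python
def range_extension(observed_range, n_horizons, confidence=0.95):
    """Extension beyond the oldest fossil at a given confidence level.

    Classical formula: extension = R * ((1 - C) ** (-1 / (n - 1)) - 1),
    where R is the observed stratigraphic range, n is the number of
    fossil horizons, and C is the confidence level. Assumes fossil
    horizons are randomly distributed through the true range.
    The result has the same units as observed_range.
    """
    if n_horizons < 2:
        raise ValueError("at least two fossil horizons are required")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if observed_range < 0.0:
        raise ValueError("observed_range must be non-negative")
    factor = (1.0 - confidence) ** (-1.0 / (n_horizons - 1)) - 1.0
    return observed_range * factor


if __name__ == "__main__":
    # Hypothetical lineage: 10 myr observed range known from 10 horizons.
    print(round(range_extension(10.0, 10), 2))
```

Under these assumptions the extension shrinks as the number of fossil horizons grows, which shows why densely sampled lineages permit tighter estimates of origination time than lineages known from a handful of occurrences.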
Rapid Access to Information
A common thread among the problems we address is the difficulty associated with compiling
information from traditionally separate disciplines. For example, even the first step
of the specimen-based protocol, listing specimen numbers and justifying referrals,
is a daunting challenge for a molecular biologist wanting a vetted time-prior for
their study. Such challenges can be met through collaboration or by citing a study
that has already applied the protocol. Both solutions are effective, but they also
introduce logistical difficulties and delays. Ideally, these data could be collated
ahead of time or at least made more accessible. The next step is to create Internet
resources that facilitate the storage, retrieval, and explication of paleontological
calibration data similar to the way that molecular sequence data are archived on GenBank.
The Internet is an ideal platform for this endeavor as exemplified by the Date-a-Clade
Website (www.fossilrecord.net/dateaclade/index.html) based on Benton and Donoghue
(2007). We can imagine a broader, dynamic, community-contributed instar of that resource
(Ksepka et al. 2011) that is linked to other online clearinghouses of biological data
such as the Paleobiology Database (www.paleodb.org), Morphobank (www.morphobank.org),
TimeTree Database (timetree.org), and the Encyclopedia of Life (www.eol.org).
We encourage paleontologists and geochronologists to take a more proactive role by
providing data that match the steps outlined above, engaging in collaborative studies,
and contributing to efforts to provide these data to their neontologist colleagues.
Paleontologists have important incentives to contribute streamlined calibration data
for divergence dating. If paleontological data can be elevated to their proper position
in this synthesis, it will result in more citations and funding. Raising the bar to
expect morphological phylogenies to be explicitly reconciled with molecular analyses
will encourage the comparison of phylogenetic signals arising from different classes
of data and highlight potential areas for future research (e.g., differences in rates
of morphological and molecular evolution, relationships between diversification and
disparity). The recommendations to explicitly justify numeric ages will further integrate
stratigraphy and geochronology with anatomical systematics. The need for more objective
ways to characterize soft maximum dates should stimulate the development of bioinformatic
methods for better quantifying the fossil record. Beyond cleaning up suspect data,
refocusing on the fossils will catalyze synthesis among different fields. By bringing
all the neglected facets of paleontology to the fore, we can build a new community
of interdisciplinary scholars eager to develop a more holistic and rigorous approach
to the study of evolution and the history of life.
SUPPLEMENTARY MATERIAL
Supplementary appendices can be found online in the Dryad data repository (doi:10.5061/dryad.m7n455k0).
FUNDING
This work was supported by the John D. and Catherine T. MacArthur Foundation funding
of the Biodiversity Synthesis Group of the Encyclopedia of Life and a BioSynC Synthesis
Meeting (to J.F.P.); Natural Environment Research Council (to M.J.B. and P.C.J.D.);
The Biotechnology and Biological Science Research Council (to P.C.J.D., Z.Y., and
J.G.I.); Grants from the Leverhulme Trust (to M.J.B.); continuing work is supported
by the National Evolutionary Synthesis Center (NESCent) (National Science Foundation
no. EF-0905606). Funding to pay the Open Access publication charge was provided through
an award from the University of Utah J. Willard Marriott Library Open Access Publishing
Fund.