Our ability to correlate biological evolution with climate change, geological evolution, and other historical patterns is essential to understanding the processes that shape biodiversity. Combining data from the fossil record with molecular phylogenetics represents an exciting synthetic approach to this challenge. The first molecular divergence dating analysis (Zuckerkandl and Pauling 1962) was based on a measure of the amino acid differences in the hemoglobin molecule, with replacement rates established (calibrated) using paleontological age estimates from textbooks (e.g., Dodson 1960). Since that time, the amount of molecular sequence data has increased dramatically, affording ever-greater opportunities to apply molecular divergence approaches to fundamental problems in evolutionary biology. To capitalize on these opportunities, increasingly sophisticated divergence dating methods have been, and continue to be, developed. In contrast, comparatively little attention has been devoted to critically assessing the paleontological and associated geological data used in divergence dating analyses. The lack of rigorous protocols for assigning calibrations based on fossils raises serious questions about the credibility of divergence dating results (e.g., Shaul and Graur 2002; Brochu et al. 2004; Graur and Martin 2004; Hedges and Kumar 2004; Reisz and Müller 2004a, 2004b; Theodor 2004; van Tuinen and Hadly 2004a, 2004b; van Tuinen et al. 2004; Benton and Donoghue 2007; Donoghue and Benton 2007; Parham and Irmis 2008; Ksepka 2009; Benton et al. 2009; Heads 2011). The assertion that incorrect calibrations will negatively influence divergence dating studies is not controversial. A posteriori methods for identifying incorrect calibrations are available (e.g., Near and Sanderson 2004; Near et al. 2005; Rutschmann et al. 2007; Marshall 2008; Pyron 2010; Dornburg et al. 2011).
We do not deny that a posteriori methods are a useful means of evaluating calibrations, but there can be no substitute for a priori assessment of the veracity of paleontological data. Incorrect calibrations, those based upon fossils that are phylogenetically misplaced or assigned incorrect ages, clearly introduce error into an analysis. Consequently, thorough and explicit justification of both phylogenetic and chronologic age assessments is necessary for all fossils used for calibration. Such explicit justifications will help to ensure that divergence dating studies are based on the best available data. Unfortunately, the majority of previously published calibrations lack explicit explanations and justifications of the age and phylogenetic position of the key fossils. In the absence of explicit justifications, it is difficult to distinguish between correct and incorrect calibrations, and it becomes difficult to reevaluate previous claims in light of new data. Paleontology is a dynamic science, with new data and perspectives constantly emerging as a result of new discoveries (see Kimura 2010 for a recent case where the age of the earliest known record of a clade was more than doubled). Calibrations based upon the best available evidence at a given time can become inappropriate as the discovery of new specimens, new phylogenetic analyses, and ongoing stratigraphic and geochronologic revisions refine our understanding of the fossil record. Our primary goals in this paper are to establish the best practices for justifying fossils used for the temporal calibration of molecular phylogenies. Our examples derive mainly, but not exclusively, from the vertebrate fossil record. We hope that our recommendations will lead to more credible calibrations and, as a result, more reliable divergence dates throughout the tree of life. 
A secondary goal is to help the community (researchers, editors, and reviewers) who might be unfamiliar with fossils to understand and overcome the challenges associated with using paleontological data. In order to accomplish these goals, we present a specimen-based protocol for selecting and documenting relevant fossils and discuss future directions for evaluating and utilizing phylogenetic and temporal data from the fossil record. We likewise encourage biologists relying on nonfossil calibrations for molecular divergence estimates (e.g., ages of island or mountain range formations, continental drift, and biomarkers) to develop their own set of rigorous guidelines so that their calibrations may also be evaluated in a systematic way. A SPECIMEN-BASED APPROACH TO JUSTIFYING PALEONTOLOGICAL DATA Most studies use a Bayesian framework for estimating divergence dates with probability curves between a minimum and a maximum bound to represent calibrations (time priors) (Thorne et al. 1998; Drummond et al. 2006; Yang 2006; Yang and Rannala 2006). An appropriately constructed fossil calibration uses the oldest assigned fossil of a taxon as the basis for its minimum age and then constructs these other parameters around it (Benton and Donoghue 2007; Donoghue and Benton 2007). One key to improving the use of paleontological data is recognizing that this first step can be tied explicitly to one or a small set of museum specimens, creating a readily auditable chain of evidence. To minimize error and maximize clarity, all calibration data should be derived explicitly from specific fossil specimens. If links between calibration data and specimens cannot be made, then there are serious questions about the validity of the proposed time priors. In this respect, the fossil specimens used for calibrations represent a standard, much in the same way that a holotype specimen (or type series) is a taxonomic standard. 
In both cases, these specimens provide a necessary reference point for future inquiries. The explicit reporting of specimen data is just as crucial to the scientific integrity of a fossil calibration study as is making genetic sequences publicly available or reporting analytical methods. Thus, it is worthwhile to compile, reiterate, and expand on the caveats from previous studies that pertain to the construction and reporting of fossil calibrations (e.g., Graur and Martin 2004; Hedges and Kumar 2004; van Tuinen and Hadly 2004a, 2004b; Benton and Donoghue 2007; Donoghue and Benton 2007; Gandolfo et al. 2008; Parham and Irmis 2008; Benton et al. 2009; Ksepka 2009; Sanders et al. 2010) while providing a simple and explicit protocol (in checklist form) to address them. The checklist can be divided into two parts, justifying phylogenetic position (Steps 1–3) and justifying age (Steps 4 and 5). In most cases, the data needed to justify calibrations are rarely found in a single publication but tend to be spread across many. In addition to being derived from many sources, such information is rarely explicitly flagged as potentially valuable for calibrations. Therefore, a rigorous and explicit approach is needed for justifying the use of paleontological and geological data for divergence dating. The following steps can be used to develop new calibrations and as a checklist for vetting and justifying previously published calibrations based on fossils. If all five steps are fulfilled, then a calibration can be considered well justified. (1) Museum numbers of specimen(s) that demonstrate all the relevant characters and provenance data should be listed. Referrals of additional specimens to the focal taxon should be justified. (2) An apomorphy-based diagnosis of the specimen(s) or an explicit, up-to-date, phylogenetic analysis that includes the specimen(s) should be referenced. 
(3) Explicit statements on the reconciliation of morphological and molecular data sets should be given. (4) The locality and stratigraphic level (to the best of current knowledge) from which the calibrating fossil(s) was/were collected should be specified. (5) Reference to a published radioisotopic age and/or numeric timescale and details of numeric age selection should be given. (1) Museum Numbers of Specimen(s) that Demonstrate all the Relevant Characters and Provenance Data Should be Listed. Referrals of Additional Specimens to the Focal Taxon Should be Justified Ideally, a fossil used for calibration would be based on a single specimen that preserves all the characters that allow it to be unambiguously assigned to a clade. Single-specimen operational taxonomic units (OTUs) are preferable because, aside from rare mixed specimens, they are almost guaranteed to be from a single species. However, divergence dating studies that use paleontological data for calibrations usually rely on OTUs from phylogenetic analyses that are based on sets of specimens referred to a single taxon by various criteria. In some cases, the basis for a taxonomic referral can be as poor as documenting that the specimen was recovered from the same region or horizon where other specimens were previously reported. Consequently, “chimeric taxa” are a recurring problem in paleontology (Meyer-Berthaud et al. 1992; Padian 2000; Parham 2005). Because single-specimen fossil OTUs are not always possible, it is necessary to revisit the association and referral of specimens. It may be possible to refer specimens from different localities to a single taxon if there are overlapping diagnostic elements or even through phylogenetic analysis (Gandolfo et al. 1997; Yates 2003; Pol 2004; Boyd et al. 2009; Makovicky 2010). 
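The five checklist items can be kept together with the specimens they describe in a small structured record. The following Python sketch is illustrative only: the class name `FossilCalibration`, its field names, and the helper `missing_steps` are hypothetical and do not correspond to any published tool. The sample values are taken from the crocodile–bird example given later in this paper, and only Steps 1 and 2 are filled in here.

```python
from dataclasses import dataclass, field


@dataclass
class FossilCalibration:
    """Hypothetical record holding the five justification steps for one node."""

    node: str
    specimens: list[str] = field(default_factory=list)  # Step 1: museum numbers
    phylogenetic_justification: str = ""                # Step 2: diagnosis or analysis cited
    morph_molecular_reconciliation: str = ""            # Step 3: reconciliation statement
    locality_and_stratigraphy: str = ""                 # Step 4: locality and stratigraphic level
    numeric_age_reference: str = ""                     # Step 5: radioisotopic age or timescale cited

    def missing_steps(self) -> list[int]:
        """Return the step numbers (1-5) that have no entry yet."""
        filled = [
            bool(self.specimens),
            bool(self.phylogenetic_justification.strip()),
            bool(self.morph_molecular_reconciliation.strip()),
            bool(self.locality_and_stratigraphy.strip()),
            bool(self.numeric_age_reference.strip()),
        ]
        return [step for step, ok in enumerate(filled, start=1) if not ok]

    def is_well_justified(self) -> bool:
        """A calibration counts as well justified only if all five steps are filled."""
        return not self.missing_steps()


# Partial record: Steps 3-5 are deliberately left empty in this sketch.
record = FossilCalibration(
    node="Archosauria (crocodile-bird)",
    specimens=["IVPP V 6026"],
    phylogenetic_justification="Nesbitt et al. 2011",
)
print(record.missing_steps())
print(record.is_well_justified())
```

A record of this kind checks only that each step has been documented. Whether the documentation is adequate still requires evaluation by a reader familiar with the relevant literature.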
In cases where previously recognized OTUs cannot be objectively assembled, it is necessary to restrict the calibration to a subset of specimens (e.g., Danilov and Parham 2005) or eliminate the OTU from the calibration. (2) An Apomorphy-Based Diagnosis of the Specimen(s) or an Explicit, Up-to-Date, Phylogenetic Analysis that Includes the Specimen(s) Should be Referenced Incorrect phylogenetic placement of fossil calibrations can introduce large errors into divergence date estimates (Lee 1999; Brochu 2000; van Tuinen and Hedges 2004; Phillips et al. 2010). Fossil-calibrated dating studies rely on the paleontological literature for calibration placement but many of the putative oldest representatives of a lineage have never been included in a formal phylogenetic analysis. Gandolfo et al. (2008) identified several instances in which incorrect identifications and taxonomic assignments led to inappropriate fossil calibrations. This is a particular problem for clades that are understudied, represented by a sparse fossil record, and/or routinely overidentified (i.e., placed in a lower level taxon than the data can demonstrate) in the literature (e.g., Cenozoic amphibians and reptiles, Bever 2005; Bell et al. 2010; Sanders et al. 2010). The fact that different authorities may use the same taxon names to refer to different biological entities confounds the problem and may be particularly prevalent when addressing the fossil record of extant lineages. This is why we recommend the use of an apomorphy-based approach to identifying and phylogenetically placing specimens that are relevant for paleontological calibrations. These guidelines can also be applied to trace fossils (e.g., tetrapod footprints) in the case that their identifications are well supported and they show strong evidence for the antiquity of a lineage based on explicit apomorphies (Carrano and Wilson 2001; Li et al. 2008; Brusatte et al. 2011). 
Because fossils are incompletely preserved, many extinct species have controversial phylogenetic assignments. Given the analytical burden placed on paleontological data, it is imperative that up-to-date evidence supporting the taxonomic assignment of relevant OTUs be explicitly provided. A recurring pitfall is the understandable enthusiasm of paleontologists to report the oldest geological record of a clade, frequently based upon fragmentary evidence. This can be problematic on two counts. First, fragmentary remains often provide insufficient anatomical evidence to discriminate whether shared characters are products of convergence or common descent. Second, with fragmentary specimens, it can be difficult to distinguish whether the critical fossil belongs to the stem or the crown of the clade that it is being used to calibrate. By definition, the earliest stem members will possess the smallest subset of the diagnostic characters of the crown, and so assigning fragmentary fossils to either the crown or the stem of a clade requires detailed knowledge of character evolution that is not always available. Conversely, fossil specimens of crown clades may not be recognized as such because they lack one or more of the diagnostic characters as a consequence of taphonomy or secondary loss (Hennig 1981; Donoghue and Purnell 2009; Sansom et al. 2010). This issue is especially true for crown clades that are united on the basis of strong molecular evidence but for which limited morphological support is known (e.g., Afrotheria or Boreoeutheria among placental mammals; see Asher et al. 2009). This problem is also likely to occur in poorly represented basal taxa of lineages that underwent substantial morphological evolution long after their origin. In those cases, the taxa that might be of greatest interest in constraining the time of divergence from the nearest living relative may be difficult to identify. 
These complexities underscore the need to carefully justify the phylogenetic placement of any specimen used for calibrations. It is not enough to cite a paper that merely mentions the taxon or specimen(s) because the strictness of criteria used in the reported phylogenetic placement of fossils varies among authors (especially when it comes to fragmentary, undescribed, and/or unanalyzed specimens). The phylogenetic position of a fossil taxon can be unstable even when relatively complete specimens are available. Therefore, a thorough knowledge of the paleontological literature is required to make sure that the most recent and/or valid study is being cited. After all, claims about the oldest member(s) of a lineage may change as new data and analyses are published. A good example of this phenomenon is the case of the putative oldest placental mammals, the zhelestids (Archibald 1996). Zhelestids are Cretaceous mammal fossils that were initially hypothesized to be nested deeply within the crown clade of modern orders of placental mammals (Eutheria), the rest of which do not appear until the Cenozoic. In more recent analyses, zhelestids have been steadily moving down the tree (Archibald et al. 2001) and now are hypothesized to be on the stem of Eutheria (Luo and Wible 2005) where they offer no evidence about a minimum date for crown Eutheria. This stemward change in phylogenetic position arose from increasing clarity about the relationships of mammalian orders rather than from correcting errors in earlier morphological study or discovery of better specimens. All three phenomena—new specimens, new interpretations of existing specimens, and phylogenetic revisions—can lead to major revisions in the phylogenetic placement of fossils. 
Existing databases such as the Paleobiology Database (www.pbdb.org) may contain detailed taxonomic, geographic, geologic, and stratigraphic information associated with fossil specimens, but relevant phylogenetic information justifying the taxonomic placement of these individual specimens is usually lacking. Moreover, rates of polyphyly in mammalian and molluscan morphotaxa were recently documented to be as high as 19% (Jablonski and Finarelli 2009), illustrating the risks of uncritically accepting taxonomic allocations represented in large scale databases (as well as the need to construct databases following our specimen-based protocol). Whereas existing databases are extremely useful for identifying the potential oldest specimens assignable to a given clade, explicit, apomorphy-based information is still necessary to justify the phylogenetic position of a specimen for calibration. (3) Explicit Statements on the Reconciliation of Morphological and Molecular Data Sets Should be Given In the best cases, fossil specimens possess unambiguous apomorphies that allow them to be assigned to a single extant lineage with confidence. In these instances, assigning fossils to nodes is straightforward. Regardless of the tree topology, the fossil will track the extant lineage and serve as a candidate calibration for all nodes in which it is nested (Fig. 1, Example 1; see, e.g., Smith 2010). In other cases, the position of a fossil is supported by ambiguous apomorphies (i.e., homoplastic characters) and is therefore highly dependent on the topology of a specific analysis. In addition to the changing position of a taxon given different morphological analyses (see 2 above), any discrepancy between topologies of morphological and molecular phylogenetic analyses is a potential pitfall that has been underemphasized (Benton et al. 2009; Lyson et al. 2010; Wiens et al. 2010). Different topologies from morphological and molecular analyses can affect fossil calibrations in several ways. 
In some cases, the placement of a fossil may become ambiguous (Fig. 1, Example 2) leading to uncertainty about which node(s) it can be used to calibrate. If morphological data show high levels of homoplasy, the polarization of morphological characters also may be sensitive to shifting topologies (Fig. 1, Example 3). Different topologies imply different hypotheses of character evolution, potentially impacting the placement of fossils in a tree (Asher et al. 2005; Cadena et al. 2012). Unless morphological and molecular trees are in agreement, the phylogenetic position of a fossil cannot be automatically transferred to a molecular-based topology. Therefore, merely citing a morphological phylogeny that places a fossil taxon (i.e., 2) is insufficient justification for a fossil calibration. FIGURE 1. Example 1: A fossil (†) with unambiguous synapomorphies can be assigned to a specific lineage (D) with confidence. Regardless of the topology, the fossil will track the extant lineage and serve as a candidate calibration for all nodes above which it is nested. Example 2: Competing phylogenetic hypotheses from different data sets can change the position of fossil calibrations. In the morphological analysis, a fossil is found to be closely related to lineages C and D. Two arrows show the nodes that the fossil could calibrate. A molecular study with a different topology separates lineages C and D, making the placement of the fossil ambiguous. If the fossil is closely related to C, then it could calibrate three nodes. If the fossil is closely related to D, then it is a candidate calibration for just one node. Example 3: Changes to outgroup topology can change the polarization of morphological characters and placement of fossils. In the morphological analysis, a fossil (†) is placed in the C + D clade, sister to D. A molecular analysis changes the relationships of the outgroups (A and B). 
In a combined analysis, the morphological characters for the C + D clade are polarized in a different way and so using the fossil to calibrate clade C + D would be inappropriate. Some problems of incongruent morphological and molecular topologies can be mitigated by either “total evidence” (sensu Kluge 1989) analyses (e.g., Brochu 1997; Hermsen and Hendricks 2008; O'Leary and Gatesy 2008; Ksepka 2009) or through the use of a “molecular scaffold” in resolving morphological character distribution and, therefore, the phylogenetic position of species known only from fossils (e.g., Springer et al. 2001; Danilov and Parham 2006). Both of those approaches incorporate, and therefore explicitly attempt to reconcile, the morphological data from fossil specimens with the topologies of molecular analyses, though they make different assumptions about the accuracy of molecular versus morphological data. These methods do not solve every problem, so a conservative approach to calibrating analyses based on poorly supported or controversial placements is warranted. In some cases, the morphological and molecular data sets may be so incongruent that neither a total evidence nor a molecular scaffold approach is sufficient for reconciling the position of an extinct taxon. For example, given current uncertainty concerning the phylogenetic position of turtles among amniotes, any use of the oldest fossil turtle specimens to calibrate amniote branching events has a two-thirds probability of introducing error into the analysis (see Lyson et al. 2010, 2012). We recommend against using such controversial OTUs to calibrate divergence dating analyses. (4) The Locality and Stratigraphic Level (to the Best of Current Knowledge) from which the Calibrating Fossil(s) Was/Were Collected Should be Specified Unless they are subjected to direct radioisotopic analysis (which is rarely possible), the provenance of specimens used for calibrations must be documented.
The accuracy with which a particular fossil can be located to a specific level in a stratigraphic column varies but depends largely on how detailed the locality data are. It might be constrained to a discrete bed in a measured stratigraphic section, or a geologic formation or group, or a depositional basin. Many specimens, especially those collected more than 50 years ago or those derived from the commercial trade, lack detailed stratigraphic and geographic occurrence data and so have limited value for calibration purposes. Almost any fossil found in situ can be assigned to its source rock unit and often to a particular stratigraphic level within that unit. In the best cases, calibration data will be based upon fossils with precise locality information and stratigraphic context that can be assigned to a particular meter level in a chronostratigraphically well-studied section (Fig. 2). The accuracy with which a fossil can be placed within a stratigraphic framework will have a major impact on estimates of its relative (stratigraphic) and numeric (absolute) age, particularly in light of improvements in correlation, revisions of stratigraphy, and refinements in geochronology. Geologic units (e.g., groups, formations, and members) are the key lithostratigraphic units used by field geologists to correlate and divide the sedimentary rock sequence in a geographic region; they generally have formal names (e.g., Willwood Formation, Fig. 2) and explicitly defined bases and tops. FIGURE 2. Every fossil taxon has geographic and geological contexts that provide a basis for determining its age. The example given here is for Diacodexis ilicis. Depending on the phylogeny used, D. ilicis can be a useful minimum calibration for artiodactyl mammals. Six specimens of D. ilicis are known (Gingerich 1989) and the holotype, UM (University of Michigan) 87854, is among the oldest well-dated specimens. UM 87854 is from the Clarks Fork depositional basin in northern Wyoming. 
Within the Clarks Fork Basin, it is from the Willwood Formation. Within the Willwood Formation, it is from Locality UM SC-67. Locality UM SC-67 is part of a well-studied stratigraphic section for the Early Eocene. Within the Early Eocene, Locality UM SC-67 can be placed in the Wasatchian Land-Mammal Age. Within the Wasatchian, Locality UM SC-67 can be assigned to the biozone Wa-0 and occurs within a global negative carbon isotopic excursion. Wa-0 spans the latter part of this carbon isotope excursion and is inferred to represent ∼95 ky in the stratigraphic section where UM 87854 occurs (Abdul Aziz et al. 2008); the entire global carbon isotope excursion is currently dated to 55.65–55.93 Ma on the basis of radioisotopic ages and orbital tuning methods based on the earth's precessional cycles (Westerhold et al. 2009), giving specimen UM 87854 a minimum age of 55.65 Ma. Geologic units are never of uniform scale, whether in terms of thickness or geographic extent, because they merely represent mappable units of distinctive rock types. Most importantly, rock units do not represent equal units of time—some rock units may be deposited geologically instantaneously, whereas others might represent millions of years with different portions of the total time range represented at particular outcrops. Nor do the boundaries between lithologic units necessarily coincide with geochronologic divisions (i.e., units of geologic time). But the assignment of a fossil to a named geologic rock unit provides a fixed standard of the relative age of the fossil that can then be used to establish a numeric age as outlined below (5). Stratigraphy is not a static field. Episodically, stratigraphic nomenclature is revised or entirely redefined with the establishment of new “type sections,” and new lithostratigraphic or biostratigraphic schemes are proposed.
New descriptions and correlations can lead to refined interpretations of the geologic unit present at a particular geographic locality (e.g., Martz and Parker 2010). The dynamic nature of stratigraphy highlights the importance of detailed geographic locality information for fossil specimens in order to determine the impact of revised stratigraphic interpretations, correlations, and geochronologies upon divergence dating calibrations and, ultimately, divergence time estimates. (5) Reference to a Published Radioisotopic Age and/or Numeric Timescale and Details of Numeric Age Selection Should be Given Divergence dating analyses require numeric ages, but paleontologists do not routinely use or report numeric ages. The numeric age of a fossil is generally outside the purview of most paleontologists' research interests for two reasons. First, the geochronologic data required for numeric dates can be difficult to establish for a particular rock unit and geographic locality. Second, though geochronologies evolve, named rock units change much less frequently and so provide a more stable albeit relative comparative framework for reporting fossil occurrences. The translation of fossil occurrences to numeric ages frequently involves a daisy chain of correlations through different geographic localities on the basis of overlapping geological and paleontological evidence (e.g., van Tuinen and Hadly 2004a; Benton et al. 2009; Smith 2011). However, for the vast majority of calibrations, this translation is not explained, meaning the actual numbers used in calculations are not adequately justified. The numeric age of a fossil is not necessarily stable, particularly if it is established through correlation rather than through direct dating at the section in which the fossil was found. Any numeric age for a fossil specimen is merely the best current estimate and can be refined through time. 
For example, radioisotopic dating methods have improved dating precision by roughly an order of magnitude in the past 20 years as a result of new methods, recalibration of standards, and cross-testing among existing methods (e.g., Mundil et al. 2004; Erwin 2006; Renne et al. 2010). 40Ar/39Ar and U-Pb ages differ systematically by ∼1%, something that requires correction prior to comparison (e.g., Renne et al. 2010). Because of this ongoing refinement, it is important to fully explain the basis upon which the numeric age is established. If the chain of inference is explicit, the consequences of revisions will be easily identified. At its most basic level, our recommendation for justifying the numeric age of a calibration point is that the translation of relative intervals from paleontological studies should reference geochronological literature or published timescales that include numeric ages (e.g., Hess and Lippolt 1986; Menning et al. 2000; Gradstein et al. 2004; Ogg 2010; Walker and Geissman 2009). Of course, even compiled geologic timescales rely on some interpolation, are themselves constantly undergoing revision, and can become obsolete. Referencing these timescales makes it easier for later workers to revise reported ages. A second part of this step in the protocol involves the logistical interpretation of the numeric age from the geological timescale. For a minimum age constraint, the youngest age interpretation of the fossil should be used (i.e., the uppermost limit of the relevant time interval) rather than the common practice of adopting a midpoint in the possible range. Because a fossil necessarily postdates the origination of the lineage to which it is assigned, choosing the youngest possible age from an interval will necessarily bias the minimum further from the true age of origination. 
However, it is important to recognize that the minimum age is only one end point of a constraint and is meant to partially bracket, not approximate on its own, the age of origination. Therefore, the minimum age should accommodate the youngest possible age of the fossil including the error associated with the geochronologic age (van Tuinen et al. 2004; Donoghue and Benton 2007; Benton and Donoghue 2007; Benton et al. 2009). This youngest possible age should be applied as a hard minimum. The logic behind assigning hard minima based on the youngest possible age of the oldest-known fossil has been discussed extensively (e.g., van Tuinen et al. 2004; Benton and Donoghue 2007; Donoghue and Benton 2007). Some authors may still choose to use soft minima in cases of hypothesized anagenesis or geologic uncertainty, but such instances require careful justification. The arbitrary assignment of a minimum age that postdates the stated youngest estimates for a fossil should be avoided. The justification for arbitrarily expanding the interval might appeal to a conservative bias, but when paleontological data are properly established and justified, that practice serves only to introduce unnecessary error into the analysis. In some cases, either because of poor correlations or poorly documented provenance, the age of a fossil may not be well constrained beyond a very broad stratigraphic interval. But in many cases, it is possible to determine much more precise and accurate dates than are given by a stratigraphic interval. Those data may not be available in the publications describing the fossil specimens used for calibrations, and so it is usually necessary to compile evidence from multiple studies.
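The rule for selecting a minimum age can be written out as a short calculation. The Python sketch below is a minimal illustration, and the function names `hard_minimum_age` and `midpoint_age` are hypothetical. It uses the 55.65–55.93 Ma interval reported for the carbon isotope excursion in the Diacodexis ilicis example (Fig. 2). No separate error term is given for that interval in the text, so the uncertainty defaults to zero.

```python
def hard_minimum_age(oldest_ma: float, youngest_ma: float,
                     youngest_uncertainty_ma: float = 0.0) -> float:
    """Youngest possible age of a fossil, in Ma.

    Takes the youngest bound of the dated interval and subtracts any
    reported uncertainty on that bound, so the result is never older
    than the youngest age interpretation.
    """
    if oldest_ma < youngest_ma:
        raise ValueError("oldest_ma must be at least as old as youngest_ma")
    if youngest_uncertainty_ma < 0:
        raise ValueError("uncertainty must be non-negative")
    return youngest_ma - youngest_uncertainty_ma


def midpoint_age(oldest_ma: float, youngest_ma: float) -> float:
    """Midpoint of the interval, shown only for contrast with the hard minimum."""
    return (oldest_ma + youngest_ma) / 2.0


# Interval for the carbon isotope excursion containing UM 87854.
oldest, youngest = 55.93, 55.65

print(hard_minimum_age(oldest, youngest))  # the value recommended as the hard minimum
print(midpoint_age(oldest, youngest))      # older than the hard minimum
```

With these inputs the hard minimum is the youngest bound of the interval, 55.65 Ma, whereas the midpoint is older and would overstate what the fossil alone demonstrates.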
Anatomically trained fossil systematists may not be able to retrieve those data any more easily than molecular systematists, but by listing the specimen numbers, rock units, and ages in a standardized way, others may check the claim, thus facilitating the refinement of numeric dates over time. Useful Discussions In addition to the five steps of the specimen-based protocol, we recommend that authors include some discussion about the history of each node that addresses rejected or obsolete calibrations. Such detailed discussions of calibrations already exist in some papers (e.g., Benton and Donoghue 2007; Hurley et al. 2007; Benton et al. 2009). These summary discussions make it easier for others to assess the justification by highlighting the relevant literature and argumentation. We should expect that through discovery, description, critique, and phylogenetic/stratigraphic analysis that even the best-justified calibrations would eventually be refined or even dramatically changed. In order to facilitate the evolution of justifications, we recommend that explanatory discussions (or citations of such discussions) should become a standard part of calibration reporting. Other Parameters The justification of the phylogenetic position and age of a fossil is an important first step to calibrating a node in a divergence dating analysis. In addition to determining what nodes can even be assigned time priors (some may not have useable fossils), this step provides the most tangible data from the fossil record: the hard minimum bound of a calibration interval. The maximum bound and the distribution of probabilities within the minimum–maximum interval are also ostensibly based on the fossil record, but in a much more complex way, because they describe probability of origination before the oldest known fossil. The idiosyncratic nature of these other parameters precludes us from developing a standard protocol for them. 
Ideally, the maximum constraint is established as older than all the oldest possible records, extending back to encompass a time when the ecologic, biogeographic, geologic, and taphonomic conditions for the existence of the lineage are met, but no records are known. For the maximum bound, an intuitive approach that takes into account preservation potential and phylogenetic bracketing has been proposed (e.g., Reisz and Müller 2004a; Müller and Reisz 2005; Benton and Donoghue 2007; Donoghue and Benton 2007; Benton et al. 2009). This approach is borrowed and developed from the fossil recovery potential function established by Marshall (1997). Researchers who use this intuitive approach should provide detailed arguments justifying their decisions so that others can evaluate them and, following the arguments of Benton and Donoghue (2007) and Ho and Phillips (2009), the maximum bounds should be soft and liberal. Most studies use a Bayesian framework for estimating divergence dates with probability curves between minimum and maximum bounds. In theory, such complex, parameter-rich priors may be better models of the fossil record, but there is presently no practical way to estimate curve parameters (Ho and Phillips 2009). As Lee and Skinner (2011) note, “current practice often consists of little more than educated guesswork.” A review of recent studies shows that these parameters are usually not justified (Warnock et al. 2012). The implications of these choices have only recently begun to be explored (Inoue et al. 2010; Clarke et al. 2011; Lee and Skinner 2011; Warnock et al. 2012). But the fact that a widely applied methodology is subject to such ambiguous assumptions, which have a major impact on results (Clarke et al. 2011; Warnock et al. 2012), is a major limitation of molecular divergence dating studies. The development of objective methods for estimating maximum bounds and probability curves should be a priority (see Future Directions section).
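The distinction between a hard minimum and a soft maximum can be made concrete with a small numerical sketch. The Python code below is illustrative only: the function names, the flat density between the bounds, and the exponential tail holding 2.5% of the probability beyond the soft maximum are assumptions chosen for simplicity, not a parameterization recommended in this paper. The bounds used in the last lines (247.2 Ma and 256 Ma) are the Archosauria values from our worked example; the 100.0 ± 1.5 Ma date is hypothetical.

```python
import math


def hard_minimum(age_ma: float, error_ma: float) -> float:
    """Youngest possible age of a fossil: reported age minus its error (Ma)."""
    if error_ma < 0:
        raise ValueError("error_ma must be non-negative")
    return age_ma - error_ma


def calibration_density(t: float, t_min: float, t_max: float,
                        tail_prob: float = 0.025) -> float:
    """Prior density for a node age t (Ma), under the assumptions stated above.

    Hard minimum: zero density for ages younger than t_min.
    Soft maximum: a fraction tail_prob of the probability lies beyond t_max
    in an exponentially decaying tail that joins the flat part continuously.
    """
    if t_max <= t_min:
        raise ValueError("t_max must be older (larger) than t_min")
    if not 0.0 < tail_prob < 1.0:
        raise ValueError("tail_prob must lie strictly between 0 and 1")
    if t < t_min:
        return 0.0  # hard bound: younger ages are excluded outright
    flat = (1.0 - tail_prob) / (t_max - t_min)
    if t <= t_max:
        return flat
    rate = flat / tail_prob  # makes the tail integrate to tail_prob
    return flat * math.exp(-rate * (t - t_max))


# Hypothetical radioisotopic date of 100.0 +/- 1.5 Ma
print(hard_minimum(100.0, 1.5))

# Archosauria bounds from the worked example: 247.2 Ma hard minimum,
# 256 Ma soft maximum
for age in (240.0, 250.0, 256.0, 258.0):
    print(age, calibration_density(age, 247.2, 256.0))
```

Under these assumptions, any age younger than the hard minimum receives zero prior density, whereas ages older than the soft maximum remain possible but are increasingly penalized. The shape between and beyond the bounds is exactly the kind of choice that needs explicit justification.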
EXAMPLES OF FOSSIL CALIBRATIONS

In order to demonstrate the application of our specimen-based protocol, we apply it to two widely used calibrations in the vertebrate scion of the tree of life: the crocodile–bird node (Archosauria, below) and the human–chimpanzee node (Hominini, online Appendices 1, 2 (available from http://www.sysbio.oxfordjournals.org/)). A survey of the literature shows variation among the interpretations of the paleontological data for these nodes (Fig. 3). The application of the specimen-based protocol to these “classic” nodes results in new hard minima. We also provide examples of our recommended node calibration discussions as well as maximum bounds, the latter following the approach of Benton and Donoghue (2007).

FIGURE 3. Examples of variation among constraint ages for two commonly calibrated nodes. Dark circles show point calibrations, lines show constraint intervals. Top: Archosauria, crocodile–bird node. Bottom: Hominini, human–chimpanzee node. The data for this figure can be found in online Appendix 2.

Example from Archosaurian Reptiles (Crocodile–Bird Node: 247.2 Ma Hard Minimum, 256 Ma Soft Maximum)

(1) Museum numbers of specimen(s) that demonstrate all the relevant characters and provenance data should be listed. Referrals of additional specimens to the focal taxon should be justified.— The Institute of Vertebrate Paleontology and Paleoanthropology (IVPP) V 6026, holotype of Xilousuchus sapingensis (Wu 1981).

(2) An apomorphy-based diagnosis of the specimen(s) or an explicit, up-to-date, phylogenetic analysis that includes the specimen(s) should be referenced.— Nesbitt et al. (2011) conclusively demonstrate that IVPP V 6026 belongs along the stem of Crocodylia (in the clade Pseudosuchia). Specifically, IVPP V 6026 preserves a number of synapomorphies placing it in the basal clade Poposauroidea, which also includes Arizonasaurus babbitti (Nesbitt et al. 2011), a previously proposed calibration point for Archosauria.
As discussed by Nesbitt et al. (2011), IVPP V 6026 can be placed in Archosauria because it possesses the following synapomorphies: palatal processes of the maxilla meet at the midline, external foramen for the abducens nerve is contained wholly within the prootic, and the dorsolateral margin of the posterior process of the maxilla preserves an antorbital fossa. It is placed within Poposauroidea because it possesses: a posterodorsal process of the premaxilla that is restricted to the ventral border of the external naris, the anterodorsal margin of the maxilla borders the external naris, a concave anterodorsal margin is present at the base of the dorsal process of the maxilla, foramina for entrance of cerebral branches of internal carotid artery into the braincase are positioned on the ventral surface, and it lacks the distal expansion of neural spines of the dorsal vertebrae (Nesbitt et al. 2011).

(3) Explicit statements on the reconciliation of morphological and molecular data sets should be given.— The morphological hypothesis of Nesbitt et al. (2011) is concordant with molecular hypotheses of higher level tetrapod relationships that support a monophyletic Archosauria (e.g., Zardoya and Meyer 1998; Hedges and Poling 1999; Kumazawa and Nishida 1999; Rest et al. 2003; Iwabe et al. 2005; Hugall et al. 2007; Alfaro et al. 2009; Becker et al. 2011; Shen et al. 2011; Lyson et al. 2012). Recently, several molecular data sets have recovered support for a novel turtle–crocodilian clade (Hedges and Poling 1999; Mannen and Li 1999; Cao et al. 2000; Shedlock et al. 2007) or a novel turtle–bird clade (Cotton and Page 2002). However, support for these topologies over an alternative where turtles are the sister taxon to a monophyletic Archosauria is often weak (Cao et al. 2000; Iwabe et al. 2005; Katsu et al. 2009). The majority of recent molecular analyses support a monophyletic Archosauria (Iwabe et al. 2005; Hugall et al. 2007; Alfaro et al. 2009; Katsu et al.
2009; Lyson et al. 2012), and placement of turtles as archosauriforms or archosaurs does not affect the oldest potential calibration point or the phylogenetic placement of IVPP V 6026.

(4) The locality and stratigraphic level (to the best of current knowledge) from which the calibrating fossil(s) was/were collected should be specified.— Heshanggou Formation, Hazhen Commune, Fugu County, northeast Shaanxi Province of China (Wu 1981).

(5) Reference to a published radioisotopic age and/or numeric time scale and details of numeric age selection should be given.— Although the Heshanggou Formation preserves vertebrate fossils that are thought to be correlative to other Early Triassic assemblages in Pangaea (Shu and Norris 1988; Rubidge 2005; Nesbitt et al. 2011), what is most relevant is that the formation preserves an extensive palynofloral (pollen and spore) assemblage that is unambiguously Early Triassic in age (Shu and Norris 1988) and easily correlative to other Early Triassic palynofloras worldwide (cf. Lindström and McLaughlin 2007; Kürschner and Herngreen 2010). Furthermore, the Heshanggou Formation also preserves the plant macrofossil taxon Pleuromeia sternbergii (Shu and Norris 1988), a classic Early Triassic disaster taxon found in abundance throughout Pangaea (Wang 1996; Looy et al. 2001; Grauvogel-Stamm and Ash 2005; Krassilov and Karasev 2009). The palynofloral data specifically suggest that these strata were deposited during the Olenekian Stage (late Early Triassic) (Shu and Norris 1988). The entire Early Triassic is well sampled with high-precision U-Pb radioisotopic ages, so the duration of the Olenekian is well constrained to 251.3–247.2 Ma (Mundil et al. 2010).

Discussion.— The bird–crocodile split (crown groups Aves–Crocodylia and total groups Ornithodira–Pseudosuchia) is one of the fundamental calibrations for tetrapod vertebrate studies (e.g., Hugall et al. 2007; Alfaro et al.
2009) because it often serves as an external calibration for both squamate and avian molecular analyses (e.g., Paton et al. 2002; Pereira and Baker 2006; Kumazawa 2007; Okajima and Kumazawa 2010) and is relevant to well-used model organisms in developmental biology (Benton and Donoghue 2007). Early analyses used a secondary calibration for this divergence, which has been rightly criticized (Graur and Martin 2004; Müller and Reisz 2005). Fossils on the lineage to birds (Ornithodira) have not generally been used for calibration because, until recently, they postdated by 10–15 myr the earliest putative fossils from the crocodylian total group (Pseudosuchia) (Müller and Reisz 2005; Benton and Donoghue 2007). Müller and Reisz (2005) proposed an age of 243–251 Ma for this divergence, based on the presence of the pseudosuchian Ar. babbitti during the lower Anisian stage of the Middle Triassic (Nesbitt 2003, 2005), which is 245–247 Ma using recent high-precision U-Pb radioisotopic age data (Mundil et al. 2010). Ar. babbitti is phylogenetically well constrained as a member of the Pseudosuchia (e.g., Nesbitt 2003; Nesbitt and Norell 2006; Nesbitt 2007; Weinbaum and Hungerbühler 2007; Nesbitt et al. 2011). However, its geologic age is less secure. Ar. babbitti is from the Holbrook Member of the Moenkopi Formation in Arizona, U.S.A.; this has been dated to the Anisian using long-distance vertebrate biostratigraphic correlations (Morales 1987; Lucas and Schoch 2002), and this is consistent with very limited magnetostratigraphic (Steiner et al. 1993) and radioisotopic age data (Dickinson and Gehrels 2009). Unfortunately, Triassic vertebrate biostratigraphy is in constant flux (e.g., Rayfield et al. 2005, 2009; Irmis et al. 2010), so the age assignment of Ar. babbitti is not particularly robust. Nonetheless, accepting an Anisian age for this taxon, it would give a minimum age of divergence for the bird–crocodile split of 242–247 Ma (Mundil et al. 2010). 
Some studies have followed this calibration (e.g., Hugall et al. 2007), whereas Benton and Donoghue (2007) proposed an age of 235–250.4 Ma for this divergence, based on the presence of the archosauriform Vjushkovisaurus berdjanensis from the ?Anisian of Russia. V. berdjanensis is a poor choice for calibrating the bird–crocodile split because it has never been included in a phylogenetic analysis, and there is morphological character evidence suggesting that it is a basal archosauriform outside of crown Archosauria (i.e., phylogenetically predates the bird–crocodile divergence) (Gower and Sennikov 2000). Nesbitt et al. (2010) recently described Asilisaurus kongwe from the Anisian (Middle Triassic) of Tanzania, a basal dinosauromorph that is the oldest representative of the stem lineage of birds. Although phylogenetically well constrained, this fossil is no older than Ar. babbitti. It also suffers similar problems in geologic dating; the age of As. kongwe is only constrained by long-distance vertebrate biostratigraphy (see Nesbitt et al. 2010 supplementary information). Brusatte et al. (2011) recently described putative basal dinosauromorph footprints from the Early Triassic, but these records face the same identification problems as other ichnofossils (see above) and are no older than IVPP V 6026. IVPP V 6026, from the Early Triassic Heshanggou Formation of China, was first described as a basal archosauriform, outside of crown Archosauria. Gower and Sennikov (1996a, 1996b, 1997) agreed with this basal phylogenetic position; however, reevaluation of the specimen by Nesbitt et al. (2011) conclusively demonstrates that it belongs to the Pseudosuchia. The absolute age of IVPP V 6026 therefore provides a minimum constraint for the bird–crocodile split.

Soft maximum age constraint.— Justification for the maximum age constraint for Archosauria is difficult because recent fossil discoveries have steadily pushed back the age of divergence for this node.
Current fossil evidence suggests that Archosauria diverged during the earliest Triassic based on the fact that a number of lineages must have diverged prior to the Olenekian (Nesbitt et al. 2011), but this does not eliminate the possibility that older representatives of Archosauria will be discovered. Non-archosaur archosauriforms appear to have achieved a global distribution by the end of the Early Triassic, with taxa present in Eastern Europe, Asia, South America, South Africa, and Antarctica (Smith et al. 2011). Despite this widespread distribution, which includes relatively well-sampled assemblages (e.g., the South African Lystrosaurus Assemblage Zone), X. sapingensis remains the only Early Triassic archosauriform with well-established archosaur affinities. We suggest that the age of the oldest basal archosauriform (part of the archosaur stem lineage, but basal to the avian-crocodylian divergence) Archosaurus rossicus provides a conservative estimate for a soft maximum because it is the oldest archosauriform fossil known, and pre-dates by several million years all putative crown archosaur fossils. A. rossicus is clearly identifiable as a basal archosauriform (Gower and Sennikov 2000) and is dated to the Changhsingian (latest Permian) by palynofloral biostratigraphy (Krassilov and Karasev 2009). The Changhsingian is dated by high-precision U-Pb ages to 256–252.3 Ma. Given that we do not know exactly what part of the Changhsingian A. rossicus is from, we suggest that a conservative soft maximum for Archosauria would be 256 Ma.

Using the Checklist to Test Calibrations

Previous studies identified inappropriate calibrations that have introduced error into divergence dating analyses (e.g., Graur and Martin 2004; Gandolfo et al. 2008; Ksepka 2009; Sanders et al. 2010). In order to show how the application of the specimen-based protocol can identify inappropriate calibrations, we include two examples (from boid snakes and charadriiform birds; online Appendix 3).
In the boid snake example, the published minimum age (55 Ma) cannot be substantiated with specimen-based evidence, so we recommend a much younger minimum age (16.0 Ma). Similarly, in the charadriiform bird example, the published minimum age (28.4 Ma) cannot be substantiated with specimen evidence. In fact, we cannot identify another specimen that will satisfy all the steps of the protocol for that node and so recommend that future workers do not calibrate it. It is not certain that errors like these would be identified by cross-validation studies, and even if they were, it would not be clear why the fossil data were incongruent. Regardless, vetting calibrations beforehand is clearly preferable to including poorly substantiated data in any analysis. The checklist is an important first step to identifying other incorrect calibrations and establishing more reliable time priors.

FUTURE DIRECTIONS

Better-justified calibrations are more likely to be free of error, which will make molecular divergence dates more accurate and therefore provide greater rigor in testing hypotheses. A specimen-based protocol will focus attention on discrepancies between the fossil record and published calibration points, making it easier for later researchers to identify and correct errors and refine calibrations as new data come to light. Standardizing the reporting of data in publications (e.g., our checklist) is a crucial first step. In addition to providing this tool, we identify some unresolved issues such as the need for more objective methods for selecting some parameters of time priors (the maximum constraint and probability curves) and the difficulty associated with compiling information from traditionally separate disciplines. In both cases, solutions can be found in bioinformatic approaches.
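As one illustration of what standardized reporting could look like in machine-readable form, the five checklist steps can be stored as fields of a record and screened automatically for omissions. The sketch below is hypothetical: the class name, field names, and helper function are our own inventions for illustration and do not correspond to any existing database schema. The example entries are taken from the Archosauria calibration above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class CalibrationRecord:
    """One fossil calibration; each field mirrors one checklist step."""
    specimens: str = ""               # step 1: museum numbers, referrals
    phylogenetic_basis: str = ""      # step 2: apomorphies or analysis
    reconciliation: str = ""          # step 3: morphology vs. molecules
    locality_stratigraphy: str = ""   # step 4: locality and rock unit
    numeric_age_basis: str = ""       # step 5: radioisotopic age or time scale


def missing_steps(record: CalibrationRecord) -> List[str]:
    """Return the names of checklist fields that were left blank."""
    return [f.name for f in fields(record)
            if not getattr(record, f.name).strip()]


archosauria = CalibrationRecord(
    specimens="IVPP V 6026, holotype of Xilousuchus sapingensis (Wu 1981)",
    phylogenetic_basis="Nesbitt et al. 2011",
    reconciliation="Concordant with molecular support for a monophyletic "
                   "Archosauria",
    locality_stratigraphy="Heshanggou Formation, Fugu County, Shaanxi "
                          "Province, China (Wu 1981)",
    numeric_age_basis="Olenekian, 251.3-247.2 Ma (Mundil et al. 2010)",
)

# A calibration reported with a specimen number but nothing else
unsupported = CalibrationRecord(specimens="hypothetical specimen number")

print(missing_steps(archosauria))
print(missing_steps(unsupported))
```

A screen of this kind only detects missing entries. Judging whether a stated justification is adequate still requires the specimen-level assessment described above.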
Objective Calibration Parameters

Following our checklist protocol will help workers identify the oldest certain fossil of a lineage that can anchor a time prior with an objective hard minimum age. But diagnostic fossils always postdate the origination time of the lineage they represent (e.g., Marshall 1990; Norell 1992; Benton and Ayala 2003; Benton and Donoghue 2007). The probability of origination before the oldest certain fossil is what the other Bayesian calibration parameters attempt to estimate. Therefore, objectively establishing these parameters requires quantitative estimates of factors that would contribute to the nonpreservation of a lineage. Inoue et al. (2010, p. 74) highlight “the need for probabilistic modeling of fossil depositions, preservations, and sampling to provide statistical summaries of information in the fossil record.” The amount of effort required to create rigorous paleontological databases suitable for calculating priors is intimidating. To date, few studies have integrated databases of fossil occurrences with measurements of rock record bias at small taxonomic or geographic scales (Holland 2003; Benton et al. 2004; Uhen and Pyenson 2007; Marx 2009) and never for the purpose of developing time priors. This approach was extended to calculate Bayesian “confidence intervals” (sensu Strauss and Sadler 1989; Hedman 2010; see, e.g., Friedman and Brazeau 2010) on origination dates that could be adapted as Bayesian priors for divergence dating studies, but we do not know of any studies that have done this yet. Another method permits estimation of origination time posteriors based on the temporal distribution of crown and stem members, which were then used as time priors for divergence dating (Wilkinson et al. 2011). The development and comparison of these and other methods to objectively estimate time prior parameters should be a priority for the divergence dating effort.
Even then, the utility of any method will be limited by the accessibility of relevant data. Obtaining genetic sequences is not the limiting factor, but the synthesis will require substantially more input from paleontologists in order to collate, vet, and organize the data from the fossil record.

Rapid Access to Information

A common thread among the problems we address is the difficulty associated with compiling information from traditionally separate disciplines. For example, even the first step of the specimen-based protocol, listing specimen numbers and justifying referrals, is a daunting challenge for a molecular biologist wanting a vetted time prior for their study. Such challenges can be met through collaboration or by citing a study that has already applied the protocol. Both solutions are effective, but both also introduce logistical difficulties and delays. Ideally, these data could be collated ahead of time or at least made more accessible. The next step is to create Internet resources that facilitate the storage, retrieval, and explication of paleontological calibration data similar to the way that molecular sequence data are archived on GenBank. The Internet is an ideal platform for this endeavor as exemplified by the Date-a-Clade Website (www.fossilrecord.net/dateaclade/index.html) based on Benton and Donoghue (2007). We can imagine a broader, dynamic, community-contributed instar of that resource (Ksepka et al. 2011) that is linked to other online clearinghouses of biological data such as the Paleobiology Database (www.paleodb.org), Morphobank (www.morphobank.org), TimeTree Database (timetree.org), and the Encyclopedia of Life (www.eol.org). We encourage paleontologists and geochronologists to take a more proactive role by providing data that match the steps outlined above, engaging in collaborative studies, and contributing to efforts to provide these data to their neontologist colleagues.
Paleontologists have important incentives to contribute streamlined calibration data for divergence dating. If paleontological data can be elevated to their proper position in this synthesis, it will result in more citations and funding. Raising the bar to expect morphological phylogenies to be explicitly reconciled with molecular analyses will encourage the comparison of phylogenetic signals arising from different classes of data and highlight potential areas for future research (e.g., differences in rates of morphological and molecular evolution, relationships between diversification and disparity). The recommendations to explicitly justify numeric ages will further integrate stratigraphy and geochronology with anatomical systematics. The need for more objective ways to characterize soft maximum dates should stimulate the development of bioinformatic methods for better quantifying the fossil record. Beyond cleaning up suspect data, refocusing on the fossils will catalyze synthesis among different fields. By bringing all the neglected facets of paleontology to the fore, we can build a new community of interdisciplinary scholars eager to develop a more holistic and rigorous approach to the study of evolution and the history of life.

SUPPLEMENTARY MATERIAL

Supplementary appendices can be found online in the Dryad data repository (doi:10.5061/dryad.m7n455k0).

FUNDING

This work was supported by the John D. and Catherine T. MacArthur Foundation funding of the Biodiversity Synthesis Group of the Encyclopedia of Life and a BioSynC Synthesis Meeting (to J.F.P.); Natural Environment Research Council (to M.J.B. and P.C.J.D.); The Biotechnology and Biological Science Research Council (to P.C.J.D., Z.Y., and J.G.I.); Grants from the Leverhulme Trust (to M.J.B.); continuing work is supported by the National Evolutionary Synthesis Center (NESCent) (National Science Foundation no. EF-0905606).
Funding to pay the Open Access publication charge was provided through an award from the University of Utah J. Willard Marriott Library Open Access Publishing Fund.