
      DNA Barcoding: Promise and Pitfalls


      PLoS Biology

      Public Library of Science



          In this issue of PLoS Biology, Hebert et al. (2004) have set out to test the resolution and performance of “DNA barcoding,” using a single mtDNA gene, cytochrome c oxidase I (COI), for a sample of North American birds. Before turning to details of this study, it is useful as context to consider the following questions: What is DNA barcoding, and what does it promise? What is new about it? Why is it controversial? What are the potential pitfalls? Put simply, the intent of DNA barcoding is to use large-scale screening of one or a few reference genes in order to (i) assign unknown individuals to species, and (ii) enhance discovery of new species (Hebert et al. 2003; Stoeckle 2003). Proponents envisage development of a comprehensive database of sequences, preferably associated with voucher specimens representing described species, against which sequences from sampled individuals can be compared. Given the long history of use of molecular markers (e.g., allozymes, rDNA, and mtDNA) for these purposes (Avise 2004), there is nothing fundamentally new in the DNA barcoding concept, except increased scale and proposed standardization. The former is inevitable. Standardization, i.e., the selection of one or more reference genes, is of proven value in the microbial community and in stimulating large-scale phylogenetic analyses, but whether “one gene fits all” is open to debate. Why, then, all the fuss? Initial reactions to the DNA barcoding concept have ranged from unbridled enthusiasm, especially from ecologists (Janzen 2004), to outright condemnation, largely from taxonomists (e.g., see the February 2003 issue of Trends in Ecology and Evolution). The former view reflects a real need to connect different life history stages and to increase the precision and efficiency of field studies involving diverse and difficult-to-identify taxa. 
The criticisms are mainly in response to the view that single-gene sequences should be the primary identifier for species (“DNA taxonomy”; Tautz et al. 2002; see also Blaxter 2004). At least for the macrobiota, the DNA barcoding community has moved away from this to emphasize the importance of embedding any large-scale sequence database within the existing framework and practice of systematics, including the importance of voucher specimens and of integrating molecular with morphological characters. Another point of contention—that DNA barcodes have limited phylogenetic resolution—arises from confusion about the scope of inference. At best, single-gene assays can hope to identify an individual to species or reveal inconsistencies between molecular variation and current perceptions of species boundaries. DNA barcoding should not be confused with efforts to resolve the “tree of life.” It should connect with and benefit from such projects, but resolving phylogeny at scales from species to major eukaryotic clades requires a very different strategy for selecting genes. Indeed, the very characteristic that makes the COI gene a candidate for high-throughput DNA barcoding—highly constrained amino acid sequence and thus broad applicability of primers (Hebert et al. 2003)—also limits its information content at deeper phylogenetic levels (e.g., Russo et al. 1996; Zardoya and Meyer 1996; Naylor and Brown 1997). Finally, while superficially appealing, the very term DNA barcoding is unfortunate, as it implies that each species has a fixed and invariant characteristic—like a barcode on a supermarket product. As evolutionary biologists, we should question this analogy. In evaluating the promise and pitfalls of DNA barcoding, we need to separate the two areas of application: molecular diagnostics of individuals relative to described taxa, and DNA-led discovery of new species. 
Both are inherently phylogenetic and rely on a solid taxonomic foundation, including adequate sampling of variation within species and inclusion of all previously described extant species within a given genus. Accurate diagnosis depends on low intraspecific variation compared with that between species, such that a short DNA sequence will allow precise allocation of an individual to a described taxon. The extensive literature on mtDNA phylogeography (Avise 2000) indicates that this condition often holds, although there are exceptions. Furthermore, within many species there is sufficient structure that it will be possible to allocate an individual to a particular geographic population. Such identifications should be accompanied by a statement of confidence, e.g., node support in a phylogenetic analysis and caveats in relation to the breadth of sampling in the reference database (e.g., whale forensics; Palumbi and Cipriano 1998). DNA-led species discovery is more contentious, but again is not new. In animals, inclusion of mtDNA evidence in biogeographic and systematic analyses often reveals unexpected diversity or discordance with morphology, which then prompts re-evaluation of morphological and ecological characteristics and, if warranted, taxonomic revision. But, despite recent proposals (Wiens and Penkrot 2002; Hebert et al. 2004), it does not follow that mtDNA divergence should be a primary criterion for recognizing species boundaries (see also Sites and Marshall 2003). Potential limitations of using mtDNA to infer species boundaries include retention of ancestral polymorphism, male-biased gene flow, selection on any mtDNA nucleotide (as the whole genome is one linkage group), introgression following hybridization, and paralogy resulting from transfer of mtDNA gene copies to the nucleus. These are acknowledged by Hebert et al. (2004) and well documented in the literature (Bensasson et al. 
2001; Ballard and Whitlock 2004), including that on birds (Degnan 1993; Quinn and White 1987; Lovette and Bermingham 2001; Weckstein et al. 2001). More specifically, using some level of mtDNA divergence as a yardstick for species boundaries ignores the low precision with which coalescence of mtDNA predicts phylogenetic divergence at nuclear genes (Hudson and Turelli 2003). An additional problem with focusing on mtDNA (or any other molecular) divergence as a primary criterion for recognizing species is that it will lead us to overlook new or rapidly diverged species, such as might arise through divergent selection or polyploidy, and thus to conclude that speciation requires long-term isolation. For example, a recent mtDNA analysis of North American birds (Johnson and Cicero 2004) showed that numerous avian species have low divergences and that speciation can occur relatively rapidly under certain circumstances. We contend, therefore, that whereas divergent or discordant mtDNA sequences might stimulate taxonomic reassessment based on nuclear genes as well as morphology, ecology, or behavior, mtDNA divergence is neither necessary nor sufficient as a criterion for delineating species. This view accords with existing practice: taxonomic splits in North American birds typically are based on multiple lines of biological evidence, e.g., morphological and vocal differences as well as genetic data (American Ornithologists' Union 1998). We turn now to the core of Hebert et al.'s paper—COI sequencing of a substantial sample of North American birds (260 of 667 species) and its validity as a test of the barcoding concept. 
Their aim is to test “the correspondence between species boundaries signaled by COI barcodes and those established by prior taxonomic research.” North American birds are an interesting choice because their species-level taxonomy is relatively well resolved and there has been extensive previous analysis of levels of mtDNA sequence divergence within and among described species (Klicka and Zink 1997; Avise and Walker 1998; Johnson and Cicero 2004). Hebert et al. (2004) found differences in COI sequences “between closely related species” that were 19–24 times greater in magnitude than the differences within species (7.05%–7.93% versus 0.27%–0.43%, respectively). From these data, they conclude that most North American bird species can be discriminated via molecular diagnosis of individuals and propose a “standard sequence threshold” of ten times the mean intraspecific variation (yielding a 2.7% threshold in birds) to flag genetically divergent taxa as “provisional species.” Thus, their analysis seeks to address both potential applications of DNA barcoding. Although Hebert et al. sampled a large number of species, a true test of the precision of mtDNA barcodes to assign individuals to species would include comparisons with sister species, the most closely related extant relatives. This would require that all members of a genus be examined, rather than a random sample of imprecisely defined close relatives, and that taxa be included from more than one geographic region. Johnson and Cicero (2004) showed the importance of comparing sister species when examining genetic divergence values in North American birds, with results that contrast strongly with those of Hebert et al. as well as previous studies (e.g., Klicka and Zink 1997). For 39 pairs of avian sister species, mtDNA sequence divergences ranged from 0.0% to 8.2%, with an average of 1.9% (cf. 7% to 8% among closely related species in Hebert et al.). 
Of these, 29 pairs (74%) are at or below the 2.7% threshold proposed by Hebert et al. and thus would not be recognized as species despite biological differences. Moreover, although only a few of these 39 pairs (see Table 1 in Johnson and Cicero [2004]) had sufficient sampling to assess intraspecific variation in mtDNA sequences, these typically showed paraphyly in mtDNA haplotypes. Therefore, there are still too few cases with adequate sampling of intraspecific diversity for sister species pairs to know how common paraphyly is, although a recent meta-analysis found that 17% of bird species deviated from mtDNA monophyly (Funk and Omland 2003). Collectively, these observations cast doubt on the precision of DNA barcoding for allocating individuals to previously described avian species. Empidonax flycatchers, which are renowned for their morphological similarity and could thereby benefit from DNA-based identification tools, provide an example of the importance of a more detailed analysis. A complete molecular phylogeny for this group (Johnson and Cicero 2002) yielded distances between four pairs of sister species that ranged from 0.7% (E. difficilis versus E. occidentalis) to 4.6% (E. traillii versus E. alnorum); notably, the genetic distance between mainland and island populations of E. difficilis (E. d. difficilis and E. d. insulicola, 0.9%) was greater than that between sister species (Johnson and Cicero 2002). Hebert et al.'s analysis included only two species of Empidonax (E. traillii and E. virescens), which are not sisters but members of divergent clades. Because E. virescens is genetically distant from all other species of Empidonax (10.3% to 12.5% uncorrected distance; Johnson and Cicero 2002), its comparison with E. traillii therefore inflates estimates of interspecific distances within the genus. Another key point of Hebert et al.'s analysis was to estimate levels of intraspecific diversity. 
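The threshold arithmetic behind this criticism can be illustrated with a minimal sketch. It assumes the proposed rule is simply ten times the reported mean intraspecific divergence (0.27%), and it recomputes the reported share of sister pairs at or below that threshold (29 of 39). The function names are hypothetical, and the short list of pair divergences is an illustrative placeholder, not the Johnson and Cicero (2004) data.

```python
def provisional_species_threshold(mean_intraspecific_pct, multiplier=10.0):
    """Threshold (in percent) as a multiple of the mean intraspecific divergence."""
    return multiplier * mean_intraspecific_pct


def fraction_at_or_below(divergences_pct, threshold_pct):
    """Share of pairwise divergences that would NOT exceed the threshold."""
    if not divergences_pct:
        raise ValueError("need at least one divergence value")
    flagged = sum(1 for d in divergences_pct if d <= threshold_pct)
    return flagged / len(divergences_pct)


# Values reported in the text: 0.27% mean intraspecific variation, 29 of 39 pairs.
threshold = provisional_species_threshold(0.27)
reported_share = 29 / 39

# Hypothetical sister-pair divergences (percent), for illustration only.
illustrative_pairs = [0.0, 0.7, 0.9, 1.9, 3.1, 4.6, 8.2]
illustrative_share = fraction_at_or_below(illustrative_pairs, threshold)

print(f"threshold = {threshold:.1f}%")
print(f"reported share at or below threshold = {reported_share:.0%}")
print(f"illustrative share = {illustrative_share:.0%}")
```

Any pair whose divergence falls at or below the threshold would go unflagged under this rule, which is the behavior the 74% figure refers to.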
For 130 species of the 260 examined, two or more individuals were sequenced (n = 2 to 12 individuals per species, mean = 2.4), and pooled pairwise genetic distances were found to be uncorrelated with geographic distances, leading Hebert et al. to conclude that “high levels of intraspecific divergence in COI in North American birds appear uncommon.” However, this makes the assumption that there is a common underlying pattern of phylogeographic structure, which is unlikely for North American birds (Zink 1996; Zink et al. 2001). If there is significant variation, assessment of intraspecific diversity can be based on a small sample of individuals only if individuals are sampled across existing population subdivisions for which geography and phenotypic variation are reasonable initial surrogates. The analyses presented by Hebert et al. will certainly stimulate further debate (a reply by Hebert et al. to the present letter is posted at http://www.barcodinglife.com), but, for the reasons outlined here, they are not yet a definitive test of the utility of DNA barcoding for either diagnosis of individuals or discovery of species. We also question whether the results for North American birds can be extrapolated to the tropics, where DNA barcoding could have maximum value. In general, among-population sequence divergence increases with decreasing latitude, even excluding previously glaciated regions (Martin and MacKay 2004), and studies of intraspecific genetic diversity in Neotropical birds have revealed a higher level of phylogeographic subdivision compared to temperate species (Remsen 1997; Lovette and Bermingham 2001). Thus, the general utility of mtDNA barcoding across different biogeographic regions, and between resident versus migratory taxa, requires further scrutiny. 
There is little doubt that large-scale and standardized sequencing, when integrated with existing taxonomic practice, can contribute significantly to the challenges of identifying individuals and increasing the rate of discovering biological diversity. But to determine when and where this approach is applicable, we now need to discover the boundary conditions. The real challenge lies with tropical taxa and those with limited dispersal and thus substantial phylogeographic structure. Such analyses need to be taxonomically broad and need to extend beyond the focal geographic region to ensure that potential sister taxa are evaluated and can be discriminated. There is also the need to examine groups with frequent (possibly cryptic) hybridization, recent radiations, and high rates of gene transfer from mtDNA to the nucleus. Only then will the skeptics be satisfied.


          Most cited references 42


          Identification of Birds through DNA Barcodes

          Introduction The use of nucleotide sequence differences in a single gene to investigate evolutionary relationships was first widely applied by Carl Woese (Woese and Fox 1977). He recognized that sequence differences in a conserved gene, ribosomal RNA, could be used to infer phylogenetic relationships. Sequence comparisons of rRNA from many different organisms led initially to recognition of the Archaea, and subsequently to a redrawing of the tree of life. More recently, the polymerase chain reaction has allowed sequence diversity in any gene to be examined. Genes that evolve slowly, like rRNA, often do not differ among closely related organisms, but they are indispensable in recovering ancient relationships, providing insights as far back as the origin of cellular life (Woese 2000). On the other hand, genes that evolve rapidly may overwrite the traces of ancient affinities, but regularly reveal divergences between closely related species. Mitochondrial DNA (mtDNA) has been widely employed in phylogenetic studies of animals because it evolves much more rapidly than nuclear DNA, resulting in the accumulation of differences between closely related species (Brown et al. 1979; Moore 1995; Mindell et al. 1997). In fact, the rapid pace of sequence change in mtDNA results in differences between populations that have only been separated for brief periods of time. John Avise was the first to recognize that sequence divergences in mtDNA provide a record of evolutionary history within species, thereby linking population genetics and systematics and establishing the field of phylogeography (Avise et al. 1987). Avise and others also found that sister species usually show pronounced mtDNA divergences, and more generally that “biotic entities registered in mtDNA genealogies…and traditional taxonomic assignments tend to converge” (Avise and Walker 1999). 
Although many species show phylogeographic subdivisions, these usually coalesce into single lineages “at distances much shorter than the internodal branch lengths of the species tree” (Moore 1995). In other words, sequence divergences are much larger among species than within species, and thus mtDNA genealogies generally capture the biological discontinuities recognized by taxonomists as species. Taking advantage of this fact, taxonomic revisions at the species level now regularly include analysis of mtDNA divergences. For example, many newly recognized species of birds have been defined, in part, on the basis of divergences in their mtDNA (e.g., Avise and Zink 1988; Gill and Slikas 1992; Murray et al. 1994; AOU 1998; Banks et al. 2000, 2002, 2003). The general concordance of mtDNA trees with species trees implies that, rather than analyzing DNA from morphologically identified specimens, it could be used the other way around, namely to identify specimens by analyzing their DNA. Past applications of DNA-based species identification range from reconstructing food webs by identifying fragments in stomachs (Symondson 2002) to recognizing products prepared from protected species (Palumbi and Cipriano 1998) and resolving complexes of mosquitoes that transmit malaria and dengue fever (Phuc et al. 2003). Despite such demonstrations, the lack of a lingua franca has limited the use of DNA as a general tool for species identifications. If a short region of mtDNA that consistently differentiated species could be found and accepted as a standard, a library of sequences linked to vouchered specimens would make this sequence an identifier for species, a “DNA barcode” (Hebert et al. 2003a). Recent work suggests that a 648-bp region of the mitochondrial gene, cytochrome c oxidase I (COI), might serve as a DNA barcode for the identification of animal species. 
This gene region is easily recovered and it provides good resolution, as evidenced by the fact that deep sequence divergences were the rule between 13,000 closely related pairs of animal species (Hebert et al. 2003b). The present study extends these earlier investigations by testing the correspondence between species boundaries signaled by COI barcodes and those established by prior taxonomic work. Such tests require the analysis of groups that have been studied intensively enough to create a firm system of binomials; birds satisfy this requirement. Although GenBank holds many bird sequences, these derive from varied gene regions while a test of species identification requires comparisons of sequences from a standard gene region across species. Accordingly, the barcode region of COI was sequenced in 260 of the 667 bird species that breed in North America (AOU 1998). Results All 260 bird species had different COI sequences; none was shared between species. COI sequences in the 130 species represented by two or more individuals were either identical or most similar to other sequences of the same species. Furthermore, with a few interesting exceptions discussed below, COI sequence differences between closely related species were far higher than differences within species (18-fold higher; average Kimura-2-parameter [K2P] differences between and within species, 7.93% and 0.43%, respectively) (Figure 1). In most cases the neighbor-joining (NJ) tree showed shallow intraspecific and deep interspecific divergences (Figure 2). However, in four exceptional cases, there were deep divergences within a species (Tringa solitaria, Solitary Sandpiper; Sturnella magna, Eastern Meadowlark; Cistothorus palustris, Marsh Wren; and Vireo gilvus, Warbling Vireo). COI sequences in each of these polytypic species separated into pairs of divergent clusters in the NJ tree. 
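The K2P differences quoted above are reported without a description of how they were computed. As a minimal sketch, assuming the standard Kimura two-parameter formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences, a pairwise distance between two aligned sequences could be calculated as follows. The function name and the choice to skip sites with gaps or ambiguous bases are assumptions, not details taken from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: ignore sites where either sequence has a gap or ambiguity code.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy 100-site sequences (not real COI data): one transition, then one transversion.
reference = "ACGT" * 25
one_transition = "G" + reference[1:]    # A -> G at the first site
one_transversion = "C" + reference[1:]  # A -> C at the first site
print(round(100 * k2p_distance(reference, one_transition), 3), "%")
print(round(100 * k2p_distance(reference, one_transversion), 3), "%")
```

At the low divergences typical within species, the corrected distance stays close to the raw proportion of differing sites; the correction matters more as divergence grows.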
The intraspecific K2P distances in these exceptional species were 3.7%–7.2%, 9- to 17-fold higher than the average distance (Figures 2, 3, and S1). Setting aside these polytypic species, the average intraspecific distance was very low, 0.27%, and the maximum average intraspecific difference was only 1.24%. Most congeneric species pairs showed divergences well above this value, but 13 species in four genera had interspecific distances that were below 1.25%. They included Larus argentatus, L. canus, L. delawarensis, L. glaucoides, L. hyperboreus, L. marinus, and L. thayeri (Herring Gull, Mew Gull, Ring-billed Gull, Iceland Gull, Glaucous Gull, Great Black-Backed Gull, and Thayer's Gull); Haematopus bachmani and H. palliatus (Black Oystercatcher and American Oystercatcher); Corvus brachyrhynchos and C. caurinus (American Crow and Northwestern Crow); and Anas platyrhynchos and A. rubripes (Mallard and American Black Duck) (Figure S1). Although species were the focus of this study, we noted that the NJ tree of COI sequences generally matched avian classifications at higher levels, with most genera, families, and orders appearing as nested monophyletic lineages concordant with current taxonomy (Figures 3 and S1). Discussion The simplest test of species identification by DNA barcode is whether any sequences are found in two species; none was in this study. Although sequences were not shared by species, sequence variation did occur in some species. Thus the second test is whether the differences within species are much less than those among species. In this study we found that COI differences among most of the 260 North American bird species far exceeded those within species. In order to conservatively test the effectiveness of COI barcodes as an identification tool, our sample must not have underestimated variability within species or have overestimated it among species. 
Our measures of intraspecific variation could be underestimates if members of a species show sequence divergence across their distribution that our study failed to adequately register. The two to three representatives of the 130 species used to examine this issue were collected from sites that were, on average, approximately 1,080 km apart, suggesting adequate representation of genetic diversity across their ranges. However, to further investigate this issue, we compared sequence differences within species to geographic distances between the collection points for their specimens and found these were unrelated (Figure 4). Based on these results, high levels of intraspecific divergence in COI in North American birds appear uncommon, given that we analyzed 130 different species in a variety of orders. Our findings are supported by a review of 34 mostly North American birds which showed a similarly low average maximum intraspecific K2P divergence of mtDNA of 0.7% (Moore 1995). Similarly, Weibel and Moore (2002) reported an average intraspecific divergence of 0.24% in their study of COI variation in woodpeckers. We conclude that our investigation has not underestimated intraspecific variation in any systematic fashion. On the other hand, our discovery of four polytypic species within a sample of 130 makes it likely there are other North American birds with divergent populations that may represent hidden species. Recent studies have identified marked mtDNA divergences within North American populations of Common Ravens (Omland et al. 2000), Fox Sparrows (Zink and Weckstein 2003), and Curve-billed Thrashers (Zink and Blackwell-Rago 2000), leading to proposals to split each into two or more species. 
Species with Holarctic distributions are particularly good candidates for unrecognized species, and recent DNA and morphological investigations have led taxonomists to split several such species into two, including Wilson's and Common Snipes, American and Eurasian Three-toed Woodpeckers, and American and Water Pipits (Zink et al. 1995, 2002; Miller 1996; AOU 1998; Banks et al. 2000, 2002, 2003). Widespread application of COI barcodes across the global ranges of birds will undoubtedly lead to the recognition of further hidden species. Any critical test of the effectiveness of barcodes must also consider the possibility that our study has overestimated variability among species. We therefore looked at species individually, comparing their minimum distance to a congener with the maximum divergence within each species. This analysis included a number of well-recognized sibling species, including Calidris mauri and C. pusilla, Fratercula arctica and F. corniculata, and Empidonax traillii and E. virescens. There were sufficient data to perform this analysis on three of the four polytypic species and on 70 of the 126 remaining species (Figure 5). The average maximum K2P divergence within these 70 species was 0.29%, while the average minimum distance to a congener was 7.05% (24-fold higher), values comparable to those for the entire data set. Prior studies that looked exclusively at sister species of birds found an average K2P mtDNA distance of 5.1% in 35 pairs (Klicka and Zink 1997) and 3.5% in 47 pairs (Johns and Avise 1998). More generally, 98% of sister species pairs of vertebrates were observed to have K2P mtDNA divergences greater than 2% (Johns and Avise 1998). Thus it appears that a COI barcode will enable the separation of most sister species of birds. There is a possibility that the North American bird fauna is not representative of the global situation. 
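The per-species comparison described above (maximum divergence within a species versus minimum distance to a congener) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the function name, the specimen tuples, the placeholder taxon names, and the toy distances are all hypothetical, and the distance function is assumed to return a percent divergence for any pair of specimens.

```python
from itertools import combinations


def barcode_gap(specimens, distance):
    """Return {(genus, species): (max within-species distance, min distance to a congener)}.

    specimens: iterable of (specimen_id, genus, species) tuples.
    distance:  callable taking two specimen ids and returning a divergence in percent.
    Species lacking either a second individual or a sampled congener are omitted.
    """
    max_within = {}
    min_congener = {}
    for (id_a, genus_a, sp_a), (id_b, genus_b, sp_b) in combinations(specimens, 2):
        if genus_a != genus_b:
            continue  # only congeneric comparisons are of interest here
        d = distance(id_a, id_b)
        if sp_a == sp_b:
            key = (genus_a, sp_a)
            max_within[key] = max(max_within.get(key, 0.0), d)
        else:
            for key in ((genus_a, sp_a), (genus_b, sp_b)):
                min_congener[key] = min(min_congener.get(key, float("inf")), d)
    return {key: (max_within[key], min_congener[key])
            for key in max_within if key in min_congener}


# Hypothetical specimens and distances (percent), for illustration only.
specimens = [
    ("s1", "GenusX", "alpha"), ("s2", "GenusX", "alpha"),
    ("s3", "GenusX", "beta"), ("s4", "GenusX", "beta"),
]
toy_distances = {
    frozenset(("s1", "s2")): 0.3, frozenset(("s3", "s4")): 0.2,
    frozenset(("s1", "s3")): 7.1, frozenset(("s1", "s4")): 6.8,
    frozenset(("s2", "s3")): 7.4, frozenset(("s2", "s4")): 7.0,
}


def toy_distance(id_a, id_b):
    return toy_distances[frozenset((id_a, id_b))]


gaps = barcode_gap(specimens, toy_distance)
for taxon, (within, between) in sorted(gaps.items()):
    print(taxon, f"max within = {within}%, min to congener = {between}%")
```

The result is only as informative as the sampling: if the true sister species is not among the congeners supplied, the minimum distance reported for a species will be an overestimate.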
The recent and extensive glaciations in North America may have decreased within-species variability by inducing bottlenecks in population size or may have increased variation between species by pruning many sister taxa (Avise and Walker 1998; Mila et al. 2000). This issue can only be resolved by evaluating the efficacy of barcodes in tropical and southern temperate faunas to ascertain if our results are general. We note that recent mtDNA studies in these settings have found both multiple sibling species in what were thought to be single species (Ryan and Bloomer 1999) and geographically structured variation suggesting the presence of cryptic species (Hackett and Rosenberg 1990; Bates et al. 1999). The diagnosis of species is particularly difficult when they are young. Moreover, hybridization is often common when the ranges of recently arisen species overlap, further complicating identifications. Such newly emerged species are sometimes referred to as superspecies (Mayr and Short 1970), or species complexes, to indicate their close genetic similarity. For example, the white-headed gulls are thought to have diverged very recently, some less than 10,000 years ago (Crochet et al. 2002, 2003), and hybridization is common among many of them. It is thus not surprising that their COI barcodes and other gene loci are very similar. DNA barcodes can help to define the limits of such recently emerged species, but more gene loci need to be surveyed and more work is required to determine which analytical methods can best deduce species boundaries in such cases. The NJ method used here has the advantage of speed, and performs strongly when sequence divergences are low, so it is generally appropriate for recovering intra- and interspecies phylogeny. However, a library of COI barcodes linked to named specimens will provide the large data sets needed to test the efficacy of varied tree-building methods (for review, see Holder and Lewis 2003). 
Even between species that diverged long ago, hybridization will lead to shared or very similar sequences at COI and other gene loci. Because mitochondrial DNA is maternally inherited, a COI barcode will assign F1 hybrids to the species of their female parent. Hybridization leading to the transfer of mtDNA from one species to another can result in a mtDNA tree that is incongruent with the species tree, but it will not necessarily prevent species from being distinguished, unless the mitochondrial transfer is so recent that their sequences have not diverged (Moore 1995). However, recent hybridization will lead species to share COI barcodes, and we expect that more intensive study will reveal such shared sequences in species that are known to hybridize, such as the white-headed gulls (Crochet et al. 2003) and Mallard/Black Ducks (Ankney et al. 1986; Avise et al. 1990). In other cases, a lack of COI divergence may indicate that populations are part of a single species, helping to sort out misleading morphological classifications. For example, the blue and white morphs of Chen caerulescens, Snow Goose, were thought to be different species until recently (Cooke et al. 1995). The close COI similarity of American and Black Oystercatchers revealed in this study is consistent with suggestions that these are allopatrically distributed color morphs of a single species (Jehl 1985). Low COI divergences between American and Northwestern Crows similarly support earlier suggestions that these taxa are conspecific (Sibley and Monroe 1990; Madge and Burn 1994). Just as COI similarities among species already questioned by taxonomists may reinforce these queries, deep COI divergences within species may reinforce suspicions of hidden diversity. 
For example, three of the four polytypic species in this study (Eastern Meadowlark, Marsh Wren, and Warbling Vireo) are split into two by some taxonomists (Wells 1998), and the fourth, Solitary Sandpiper, contains two allopatric subspecies with morphological differences (Godfrey 1976). In these cases, suspicions in the minds of taxonomists are reinforced by large COI divergences. If these species had not been the subject of prior scrutiny, COI barcoding would have flagged them as deserving of such attention. The importance of sampling multiple individuals within each species is highlighted by a recent review which found evidence of species-level paraphyly or polyphyly in 23% of 2,319 animal species, including 16.7% of 331 bird species (Funk and Omland 2003). This review provides a clear discussion of possible causes (imperfect taxonomy, hybridization, incomplete lineage sorting) and indicates the need for the careful reexamination of current taxonomy and for the collection of genetic data across both geographic ranges and morphological variants. Barcoding, together with related developments in sequencing technology, is likely to provide an efficient approach to the assembly of such genetic data. We expect that the assembly of a comprehensive barcode library will help to initiate taxonomic investigations that will ultimately lead to the recognition of many new avian species. This process will begin with the discovery of novel COI barcodes. Some of these cases will simply represent the first barcode records for described but previously unanalyzed species, but taxonomic study will confirm that others derive from new species. We propose that specimens with barcodes diverging deeply from known taxa should be known by a “provisional species” designation that links them to the nearest established taxon. For example, the divergent clusters of Solitary Sandpiper specimens might be called T. solitaria PS-1 and T. 
solitaria PS-2, highlighting a need for further taxonomic study.

What threshold might be appropriate for flagging genetically divergent specimens as provisional species? This threshold should certainly be high enough to separate only specimens that very likely belong to different species. Because patterns of intraspecific and interspecific variation in COI appear similar in various animal groups (Grant and Bowen 1998 [sardines]; Hebert et al. 2003a [moths]; Hogg and Hebert 2004 [springtails]), we propose a standard sequence threshold: 10× the mean intraspecific variation for the group under study. If applied to the birds examined in this study (0.27% average intraspecific variation; 2.7% threshold), a 10× threshold would recognize over 90% of the 260 known species, as well as the four probable new species. As this result demonstrates, a threshold approach will overlook species with short evolutionary histories and those exposed to recent hybridization, but it will be a useful screening tool, especially for groups that have not received intensive taxonomic analysis.

For 260 of the 667 bird species breeding in North America, our evidence shows that COI barcodes separate individuals into the categories that taxonomists call species. This adds to the evidence already in hand for insects and other arthropods that barcodes can be an efficient tool for species identification. Should future studies broaden this evidence, a comprehensive library of barcodes will make it easier to probe varied areas of avian biology. A DNA barcode will help, for example, when morphological diagnoses are difficult, as when identifying remnants (including eggs, nestlings, and adults) in the stomachs of predators. A DNA barcode could similarly identify fragments of birds that strike aircraft (Dove 2000) and recognize carcasses of protected or regulated species (Guglich et al. 1994). DNA barcodes could also reveal the species of avian blood in mosquitoes carrying West Nile virus (Michael et al. 2001; Lee et al. 2002), help experts distinguish morphologically similar juveniles or nonbreeding adults in banding work, and allow expanded nonlethal study of endangered or threatened populations.

The two essential components for an effective DNA barcode system (and thus a new master key to the encyclopedia of life [Wilson 2003]) are standardization on a uniform barcode sequence, such as COI, and a library of sequences linked to named voucher specimens. The present study provides an initial set of COI barcodes for about 40% of North American birds. More detailed sampling of COI sequences is needed for these species, and barcodes need to be gathered for the remaining North American birds and for those in other geographic regions. This work could represent a first step toward a DNA barcode system for all animal and plant life, an initiative with potentially widespread scientific and practical benefits (Stoeckle 2003; Wilson 2003; Blaxter 2004; Janzen 2004).

Materials and Methods

Existing data can only yield limited new insights into the effectiveness of a DNA-based identification system for birds. Two mitochondrial genes, cyt b and COI, are rivals for the largest number of animal sequence records greater than 600 bp in GenBank (4,791 and 3,009 species, respectively). However, COI coverage for birds is modest; 173 species share COI sequences with 600-bp overlap. As these records derive from a global avifauna of 10,000 species, they provide a limited basis to evaluate the utility of a COI-based identification system for any continental fauna, impelling us to gather new sequences.

We employed a stratified sampling design to gain an overview of the patterns of COI sequence divergence among North American birds. The initial level of sampling examined a single individual from each of 260 species to ascertain COI divergences among species. These species were selected on the basis of accessibility without regard to known taxonomic issues.
The second level of sampling examined one to three additional individuals from 130 of these species to provide a general sense of intraspecific sequence divergences, as well as a preliminary indication of variation in each species. When possible, these individuals were obtained from widely separated localities in North America. The third level of our analysis involved sequencing four to eight more individuals for the few species where the second level detected more than 2% sequence divergence among individuals.

Our studies examined specimens collected over the last 20 years; 98% were obtained from the tissue bank at the Royal Ontario Museum, Toronto, Canada. Collection localities and other specimen information are available in the “Birds of North America” file in the Completed Projects section of the Barcode of Life website (http://www.barcodinglife.com). Taxonomic assignments follow the latest North American checklist (AOU 1998) and its recent supplements (Banks et al. 2000, 2002, 2003).

Mitochondrial pseudogenes can complicate PCR-based studies of mitochondrial gene diversity (Bensasson et al. 2001; Thalmann et al. 2004). We used protocols to reduce pseudogene impacts that included extracting DNA from tissues rich in mitochondria (Sorenson and Quinn 1998), employing primers with high universality (Sorenson and Quinn 1998), and amplifying a relatively long PCR product because most pseudogenes are short (Pereira and Baker 2004).

DNA extracts were prepared from small samples of muscle using the GeneElute DNA miniprep Kit (Sigma, St. Louis, Missouri, United States), following the manufacturer's protocols. DNA extracts were resuspended in 10 μl of H2O, and a 749-bp region near the 5′ terminus of the COI gene was amplified using primers (BirdF1-TTCTCCAACCACAAAGACATTGGCAC and BirdR1-ACGTGGGAGATAATTCCAAATCCTG).
In cases where this primer pair failed, an alternate reverse primer (BirdR2-ACTACATGTGAGATGATTCCGAATCCAG) was generally combined with BirdF1 to generate a 751-bp product, but a third reverse primer (BirdR3-AGGAGTTTGCTAGTACGATGCC) was used for two species of Falco. The 50-μl PCR reaction mixes included 40 μl of ultrapure water, 1.0 U of Taq polymerase, 2.5 μl of MgCl2, 4.5 μl of 10× PCR buffer, 0.5 μl of each primer (0.1 mM), 0.25 μl of each dNTP (0.05 mM), and 0.5–3.0 μl of DNA. The amplification regime consisted of 1 min at 94 °C followed by 5 cycles of 1 min at 94 °C, 1.5 min at 45 °C, and 1.5 min at 72 °C, followed in turn by 30 cycles of 1 min at 94 °C, 1.5 min at 51 °C, and 1.5 min at 72 °C, and a final 5 min at 72 °C. PCR products were visualized in a 1.2% agarose gel. All PCR reactions that generated a single, circa 750-bp, product were then cycle sequenced, while gel purification was used to recover the target gene product in cases where more than one band was present. Sequencing reactions, carried out using Big Dye v3.1 and the BirdF1 primer, were analyzed on an ABI 377 sequencer. The electropherogram and sequence for each specimen are in the “Birds of North America” file, and all sequences have also been deposited in GenBank (see Supporting Information).

COI sequences were recovered from all 260 bird species and did not contain insertions, deletions, nonsense, or stop codons, supporting the absence of nuclear pseudogene amplification (Pereira and Baker 2004). In addition to 429 newly collected sequences, eight GenBank sequences from five species were included (these were the only full-length COI sequences corresponding to species in this study). Sequence divergences were calculated using the K2P distance model (Kimura 1980). An NJ tree of K2P distances was created to provide a graphic representation of the patterning of divergences among species (Saitou and Nei 1987).
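The K2P distance used above, and the 10× screening threshold proposed in the discussion, can be illustrated with a short sketch. This is not the authors' software; the function names `k2p_distance` and `provisional_species_threshold` are hypothetical, and skipping gap or ambiguous sites is an assumption made here for simplicity. The distance follows the standard Kimura (1980) formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among compared sites.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch, not a detail taken from the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(inner)


def provisional_species_threshold(intraspecific_distances, multiplier=10.0):
    """Return multiplier x the mean intraspecific distance (10x by default)."""
    if not intraspecific_distances:
        raise ValueError("need at least one intraspecific distance")
    mean = sum(intraspecific_distances) / len(intraspecific_distances)
    return multiplier * mean
```

With the 0.27% mean intraspecific variation reported for these birds, `provisional_species_threshold([0.0027])` gives the 2.7% cutoff quoted in the text. A pairwise `k2p_distance` above that value would flag a specimen for closer taxonomic study rather than settle its status.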
Supporting Information

Figure S1. Birds Appendix
Complete NJ tree based on K2P distances at COI for 437 sequences from 260 species of North American birds. Entries marked with an asterisk represent COI sequences from GenBank. (100 KB PDF).

Accession Numbers

Sequences described in Materials and Methods have been deposited in GenBank under accession numbers AY666171 to AY666596.

            Phylogeography-The History and Formation of Species


              Mitochondrial pseudogenes: evolution's misplaced witnesses.

               D Bensasson (2001)
              Nuclear copies of mitochondrial DNA (mtDNA) have contaminated PCR-based mitochondrial studies of over 64 different animal species. Since the last review of these nuclear mitochondrial pseudogenes (Numts) in animals, Numts have been found in 53 of the species studied. The recent evidence suggests that Numts are not equally abundant in all species, for example they are more common in plants than in animals, and also more numerous in humans than in Drosophila. Methods for avoiding Numts have now been tested, and several recent studies demonstrate the potential utility of Numt DNA sequences in evolutionary studies. As relics of ancient mtDNA, these pseudogenes can be used to infer ancestral states or root mitochondrial phylogenies. Where they are numerous and selectively unconstrained, Numts are ideal for the study of spontaneous mutation in nuclear genomes.

                Author and article information

                PLoS Biol
                PLoS Biology
                Public Library of Science (San Francisco, USA)
                October 2004
                28 September 2004
                Volume: 2
                Issue: 10
                Copyright: © 2004 Craig Moritz and Carla Cicero. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited
                Correspondence and Other Communications

                Life sciences

