      Is Open Access

      From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories


      PLoS ONE

      Public Library of Science


          Abstract

          Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as the basis for large-scale analyses of macroecological and biogeographic patterns and for documenting environmental changes over time. However, taxonomic identifications are often unreliable, especially for non-vascular plants and fungi, including lichens, which may lack critical revisions of voucher specimens. Due to the scale of the problem, restudy of millions of collections is unrealistic and other strategies are needed. Here we propose to use verified, georeferenced occurrence data of a given species to apply predictive niche modeling that can then be used to evaluate unverified occurrences of that species. Selecting the charismatic lichen fungus, Usnea longissima, as a case study, we used georeferenced occurrence records based on sequenced specimens to model its predicted niche. Our results suggest that the target species is largely restricted to a narrow range of boreal and temperate forest in the Northern Hemisphere and that occurrence records in GBIF from tropical regions and the Southern Hemisphere do not represent this taxon, a prediction tested by comparison with taxonomic revisions of Usnea for these regions. As a novel approach, we employed Principal Component Analysis on the environmental grid data used for predictive modeling to visualize potential ecogeographical barriers for the target species; we found that tropical regions form a strong barrier, explaining why potential niches in the Southern Hemisphere were colonized not by Usnea longissima but by morphologically similar species. This approach is an example of how data from two of the most important biodiversity repositories, GenBank and GBIF, can be effectively combined to remotely address the problem of inaccurate taxonomic identifications in occurrence data repositories and to provide a filtering mechanism that can considerably reduce the number of voucher specimens that need critical revision, in this case from 4,672 to about 100.
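          The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the article applies predictive niche modeling, whereas the code below substitutes a much simpler rectangular climate envelope, fitted to verified records and then used to flag unverified records that fall outside it. All function names, the two climate variables, and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One (low, high) pair per environmental variable.
Envelope = List[Tuple[float, float]]


def fit_envelope(verified: Sequence[Sequence[float]]) -> Envelope:
    """Return the per-variable range spanned by the verified records.

    This is a simple min/max envelope, used here as a stand-in for a
    real niche model; it is an assumption, not the article's procedure.
    """
    if not verified:
        raise ValueError("need at least one verified record")
    columns = list(zip(*verified))  # transpose: one tuple per variable
    return [(min(col), max(col)) for col in columns]


def outside_envelope(record: Sequence[float], envelope: Envelope) -> bool:
    """True if the record lies outside the envelope on any variable."""
    return any(
        not (low <= value <= high)
        for value, (low, high) in zip(record, envelope)
    )


def flag_for_revision(
    unverified: Sequence[Sequence[float]], envelope: Envelope
) -> List[int]:
    """Indices of unverified records that fall outside the envelope."""
    return [
        i for i, record in enumerate(unverified)
        if outside_envelope(record, envelope)
    ]


# Hypothetical values: (mean annual temperature in deg C,
# annual precipitation in mm) at each occurrence locality.
verified = [(4.5, 1200.0), (6.0, 1800.0), (8.2, 2400.0), (3.1, 950.0)]
unverified = [(5.0, 1500.0), (25.3, 2800.0), (7.9, 900.0)]

envelope = fit_envelope(verified)
flagged = flag_for_revision(unverified, envelope)
print(flagged)  # -> [1, 2]
```

          In this toy example only the flagged records would be passed on for critical revision of the voucher specimens, which mirrors the reduction in workload the abstract reports. A real analysis would replace the envelope with a fitted niche model and use gridded climate layers rather than hand-entered values.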

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 11 March 2016
                Volume: 11
                Issue: 3
                Affiliations
                [1] Integrative Research Center & Gantz Family Collections Center, Science & Education, The Field Museum, 1400 South Lake Shore Drive, Chicago, Illinois, 60605–2496, United States of America
                [2] Science Action Center, Science & Education, The Field Museum, 1400 South Lake Shore Drive, Chicago, Illinois, 60605–2496, United States of America
                [3] Botanical Garden and Botanical Museum, Königin-Luise-Str. 6–8, 14195, Berlin, Germany
                Trier University, Germany
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: RL. Performed the experiments: BES MKJ RL. Analyzed the data: BES RL. Contributed reagents/materials/analysis tools: BES MKJ RL. Wrote the paper: BES MKJ RL.

                Article
                Manuscript ID: PONE-D-15-15859
                DOI: 10.1371/journal.pone.0151232
                PMC ID: 4788202
                PubMed ID: 26967999
                © 2016 Smith et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Figures: 3, Tables: 2, Pages: 15
                Funding
                This study was conceived as part of a large digitization project of North American lichen and bryophyte collections funded by the National Science Foundation: Digitization TCN Collaborative Research: North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change (NSF-EF 1115002 to The Field Museum; PI Robert Lücking, co-PI Matt von Konrat; coordinated by NSF-EF 1115116 to the University of Wisconsin-Madison; PI Corinna Gries, co-PI Thomas Nash) and Digitization HUB: A Collections Digitization Framework for the 21st Century (NSF-EF 1115210 to the University of Florida; PI Lawrence Page, co-PIs Bruce MacFadden, Jose Fortes, Pamela Soltis, Gregory Riccardi). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Ecology > Ecological Niches
                Ecology and Environmental Sciences > Ecology > Ecological Niches
                Biology and Life Sciences > Plant Science > Lichenology
                Research and Analysis Methods > Database and Informatics Methods > Biological Databases > Sequence Databases
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Databases
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Databases
                Physical Sciences > Chemistry > Chemical Reactions > Chemical Precipitation
                Biology and Life Sciences > Organisms > Fungi
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Forecasting
                Physical Sciences > Mathematics > Statistics (Mathematics) > Statistical Methods > Forecasting
                Biology and Life Sciences > Evolutionary Biology > Population Genetics > Haplotypes
                Biology and Life Sciences > Genetics > Population Genetics > Haplotypes
                Biology and Life Sciences > Population Biology > Population Genetics > Haplotypes
                Earth Sciences > Atmospheric Science > Meteorology
                Custom metadata
                All relevant data are within the paper and its Supporting Information files, or can be downloaded from GenBank (http://www.ncbi.nlm.nih.gov/genbank) using the accession numbers provided in Table 1, or from WorldClim (http://www.worldclim.org).
