      Is Open Access

      Integrating and visualizing primary data from prospective and legacy taxonomic literature

      research-article


          Abstract


          Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal, where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data and can extract structured primary biodiversity data that can be aggregated and jointly queried with data from other Darwin Core-compatible sources. We also show how visualization of these data can communicate key information contained in biodiversity literature.
We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.
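
The kind of querying described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element and attribute names (`treatment`, `materialsCitation`, `specimenCount`, and so on), the taxon names, and all values are made up for the example and are not the schema or data used in the article. The sketch extracts specimen records from XML-tagged treatments, filters them by field, and computes the two quantities mentioned above: the lag between first collection and description, and which taxa are known from a single specimen.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: names and values are illustrative, not the article's schema.
SAMPLE = """
<document journal="Zootaxa">
  <treatment taxon="Aus bus" describedYear="2010">
    <materialsCitation country="Guyana" collector="A. Smith"
                       specimenCount="1" collectedYear="1999"/>
  </treatment>
  <treatment taxon="Aus cus" describedYear="2012">
    <materialsCitation country="Guyana" collector="B. Jones"
                       specimenCount="3" collectedYear="2008"/>
    <materialsCitation country="Brazil" collector="A. Smith"
                       specimenCount="2" collectedYear="2005"/>
  </treatment>
</document>
"""

def extract_records(xml_text):
    """Flatten each materials citation into one dictionary per record."""
    root = ET.fromstring(xml_text)
    records = []
    for treatment in root.iter("treatment"):
        for citation in treatment.iter("materialsCitation"):
            records.append({
                "journal": root.get("journal"),
                "taxon": treatment.get("taxon"),
                "described_year": int(treatment.get("describedYear")),
                "country": citation.get("country"),
                "collector": citation.get("collector"),
                "specimen_count": int(citation.get("specimenCount")),
                "collected_year": int(citation.get("collectedYear")),
            })
    return records

def filter_records(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]

def specimens_per_taxon(records):
    """Total number of specimens cited for each taxon."""
    totals = Counter()
    for r in records:
        totals[r["taxon"]] += r["specimen_count"]
    return totals

def description_lag(records):
    """Years between the earliest collected specimen and the description."""
    lag = {}
    for r in records:
        years = r["described_year"] - r["collected_year"]
        lag[r["taxon"]] = max(lag.get(r["taxon"], years), years)
    return lag

records = extract_records(SAMPLE)
guyana = filter_records(records, country="Guyana")
totals = specimens_per_taxon(records)
singletons = [taxon for taxon, n in totals.items() if n == 1]
lags = description_lag(records)
```

Because every record is a flat dictionary, records drawn from different journals can be concatenated and queried together, which is the property the abstract attributes to Darwin Core-compatible sources.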


                Author and article information

                Contributors
                Journal
                Biodivers Data J
                Biodiversity Data Journal
                Pensoft Publishers
                1314-2836
                1314-2828
                2015
                12 May 2015
                Volume: 3
                Article: e5063
                Affiliations
                Naturalis Biodiversity Center, Leiden, Netherlands
                www.Plazi.org, Bern, Switzerland
                Pensoft, Sofia, Bulgaria
                KIT / Plazi, Karlsruhe, Germany
                Pensoft Publishers, Sofia, Bulgaria
                University of Sydney, Sydney, Australia
                The Open University, Milton Keynes, United Kingdom
                Author notes
                Corresponding author: Jeremy A. Miller (jeremy.miller@naturalis.nl).

                Academic editor: Ross Mounce

                Article
                Biodiversity Data Journal 3676
                DOI: 10.3897/BDJ.3.e5063
                4442254
                11cae93b-f0d8-42a5-849f-9ccc579cc8e4
                Jeremy A. Miller, Donat Agosti, Lyubomir Penev, Guido Sautter, Teodor Georgiev, Terry Catapano, David Patterson, David King, Serrano Pereira, Rutger Aldo Vos, Soraya Sierra

                This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 09 April 2015
                Accepted: 06 May 2015
                Page count
                Figures: 12, Tables: 1, References: 71
                Funding
                Funded by: pro-iBiosphere (2012-2014), European Union Seventh Framework Programme (2007-2013)
                Categories
                General Research Article
                Araneae
                Data Analysis & Modelling
                Bioinformatics
                Taxonomy
                World

                Araneae, biodiversity informatics, data mining, open access, spiders, taxonomy, XML markup
