      Text-mined fossil biodiversity dynamics using machine learning

      research-article

          Abstract

          Documented occurrences of fossil taxa are the empirical foundation for understanding large-scale biodiversity changes and evolutionary dynamics in deep time. The fossil record contains vast numbers of understudied taxa, yet compiling large volumes of occurrence data remains a labour-intensive impediment to a more complete understanding of Earth's biodiversity history. Even so, many occurrence records of species and genera in these taxa can be uncovered in the palaeontological literature. Here, we use machine-learning approaches to extract observations of fossils and their inferred ages from unstructured text in books and scientific articles. We use Bryozoa, a group of marine invertebrates with a rich fossil record, as a case study. Building on recent advances in computational linguistics, we develop a pipeline that recognizes taxonomic names and geologic time intervals in published literature, and we use supervised learning to machine-read whether the species in question occurred in a given age interval. Intermediate machine error rates appear comparable to human error rates in a simple trial, and the resulting genus richness curves capture the main features of published studies of bryozoan fossil diversity. We believe our automated pipeline, which greatly reduced the time required to compile our dataset, can help others compile similar data for other taxa.
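          The recognition and counting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genus gazetteer, the placeholder names `Alphapora` and `Betapora`, the short interval list and the example sentences are all hypothetical, and a regular-expression match stands in for proper named-entity recognition. The supervised step that decides whether a candidate (taxon, interval) pair is a true occurrence is deliberately left out, so every co-occurring pair is accepted. Richness is then counted with the range-through convention, one common choice in which a genus is counted in every interval between its first and last recorded appearance.

```python
import re

# Hypothetical gazetteer of genus names; a real pipeline would draw on a
# taxonomic name list. "Alphapora" and "Betapora" are made-up placeholders.
GENERA = ["Alphapora", "Betapora"]

# Simplified, hypothetical interval list ordered oldest -> youngest
# (mixes periods and epochs purely for illustration).
INTERVALS = ["Jurassic", "Cretaceous", "Paleocene", "Eocene",
             "Oligocene", "Miocene", "Pliocene"]

# A known genus followed by a lowercase epithet of 3+ letters.
BINOMIAL = re.compile(r"\b(" + "|".join(map(re.escape, GENERA)) + r") ([a-z]{3,})\b")
INTERVAL = re.compile(r"\b(" + "|".join(map(re.escape, INTERVALS)) + r")\b")


def candidate_pairs(sentence):
    """Return every (genus, species, interval) triple co-occurring in a sentence.

    These are only candidates; a trained classifier would normally decide
    which of them are genuine occurrences (not implemented here).
    """
    taxa = BINOMIAL.findall(sentence)     # list of (genus, epithet) tuples
    ages = INTERVAL.findall(sentence)     # list of interval names
    return [(genus, species, age) for genus, species in taxa for age in ages]


def genus_richness(occurrences):
    """Range-through genus richness per interval.

    A genus is counted in every interval from its first to its last
    recorded appearance, inclusive, including unrecorded intervals between.
    """
    order = {name: i for i, name in enumerate(INTERVALS)}
    ranges = {}  # genus -> (oldest index, youngest index)
    for genus, _species, age in occurrences:
        i = order[age]
        first, last = ranges.get(genus, (i, i))
        ranges[genus] = (min(first, i), max(last, i))
    richness = {name: 0 for name in INTERVALS}
    for first, last in ranges.values():
        for i in range(first, last + 1):
            richness[INTERVALS[i]] += 1
    return richness


if __name__ == "__main__":
    # Made-up example sentences, not data from the study.
    sentences = [
        "Alphapora minuta is recorded from the Cretaceous of Europe.",
        "Alphapora minuta also occurs in Eocene strata.",
        "Betapora lata was described from the Miocene.",
    ]
    occurrences = [p for s in sentences for p in candidate_pairs(s)]
    print(genus_richness(occurrences))
    # {'Jurassic': 0, 'Cretaceous': 1, 'Paleocene': 1, 'Eocene': 1,
    #  'Oligocene': 0, 'Miocene': 1, 'Pliocene': 0}
```

          Without a classifier, any sentence that merely mentions a name and an age together (for example, a comparison with an older species) would be counted as an occurrence; filtering out such false positives is the role of the supervised learning step described in the abstract.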

                Author and article information

                Journal: Proceedings of the Royal Society B: Biological Sciences (Proc. Biol. Sci.; RSPB; royprsb)
                Publisher: The Royal Society
                ISSN: 0962-8452, 1471-2954
                Published: 24 April 2019
                Volume 286, Issue 1901, Article 20190022
                Affiliations
                [1] Natural History Museum, University of Oslo, PO Box 1172, Blindern, 0318 Oslo, Norway
                [2] Integrative Research Center, Field Museum, 1400 South Lake Shore Drive, Chicago IL, 60605, USA
                [3] Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, 0316 Oslo, Norway
                Author notes

                Electronic supplementary material is available online at http://dx.doi.org/10.6084/m9.figshare.c.4464296.

                Author information
                http://orcid.org/0000-0002-7360-7087
                http://orcid.org/0000-0002-0446-4705
                http://orcid.org/0000-0002-3732-6069
                Article identifiers
                Publisher ID: rspb20190022
                DOI: 10.1098/rspb.2019.0022
                PMCID: 6501925
                PMID: 31014224
                Record ID: daebc680-2228-4f2c-84bc-53d53b59074b
                © 2019 The Authors.

                Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

                History
                Received: 4 January 2019
                Accepted: 2 April 2019
                Funding
                Funded by: H2020 European Research Council (http://dx.doi.org/10.13039/100010663)
                Award ID: 724324
                Categories
                Palaeobiology
                Research Article
                Life sciences
                Keywords: cheilostome bryozoans, fossil occurrences, palaeobiodiversity, natural language processing, information extraction, literature compilation
