
      A Workflow for Data Extraction from Digitized Herbarium Specimens

      Biodiversity Information Science and Standards
      Pensoft Publishers


          Abstract

          Based on our own work on species and trait recognition and on complementary studies from other working groups, we present a workflow for data extraction from digitized herbarium specimens using convolutional neural networks. Digitized herbarium sheets contain preserved plant material as well as additional objects: the label with information on the collection event; annotations such as revision labels or notes on material extraction; identifiers such as barcodes or numbers; envelopes for loose plant material; and often scale bars and color charts used in the digitization process. To treat these objects appropriately, segmentation techniques (Triki et al. 2018) will be applied to localize and identify the different kinds of objects for specific treatments. Detecting the presence of plant organs such as leaves, flowers or fruits is already a first step in data extraction and is potentially useful for phenological studies. Plant organs will be subject to quantitative (Gaikwad et al. 2018) and qualitative (Younis et al. 2018) trait recognition routines. Text-based objects can be treated as described by Kirchhoff et al. (2018), using OCR techniques and considering the many collection-specific terms and abbreviations described in Schröder (2019). Additionally, species recognition (Younis et al. 2018) will be applied to help identify incompletely identified collection items or to detect possible misidentifications. All steps described above need sufficient training data, including labels, which may be obtained from collection metadata and trait databases.
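The routing idea in this paragraph (segment the sheet, then send each kind of object to its own treatment) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Region` structure, the class names in `TREATMENTS`, and the placeholder functions are hypothetical and are not taken from the article or from any of the cited tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    """One object localized on a sheet image (hypothetical structure)."""
    kind: str                         # class name assigned by the segmentation step
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) in pixels


def recognize_traits(region: Region) -> str:
    # Placeholder for quantitative and qualitative trait recognition.
    return "trait_recognition"


def read_text(region: Region) -> str:
    # Placeholder for OCR plus handling of collection-specific abbreviations.
    return "ocr"


# Assumed class names; the abstract only names the object types informally.
TREATMENTS: Dict[str, Callable[[Region], str]] = {
    "plant_organ": recognize_traits,
    "label": read_text,
    "annotation": read_text,
}


def plan_treatments(regions: List[Region]) -> List[Tuple[str, str]]:
    """Pair each region kind with the treatment chosen for it."""
    plan = []
    for region in regions:
        treatment = TREATMENTS.get(region.kind)
        # Objects without a defined treatment (e.g. color charts) are passed over.
        plan.append((region.kind, treatment(region) if treatment else "no_treatment"))
    return plan
```

Keeping the mapping in a single table means a new object class only needs a new entry, which fits a workflow that is expected to grow as further recognition services are added.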
To deal with newly digitized collections, unseen data or new categories, we propose implementing a new Deep Learning approach, so-called Lifelong Learning. Past knowledge of the network is dynamically saved in latent space using an autoencoder and generatively replayed while the network is trained on new tasks. This enables the network to solve complex image processing tasks and to incrementally learn new classes and knowledge without forgetting what it learned before.
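To make the replay idea more concrete, here is a deliberately simplified sketch. It is not the authors' method: instead of a trained autoencoder and generator, it assumes latent vectors are already available and stands in for the generative part with a per-class Gaussian over those vectors. The class `LatentReplay`, the function `mixed_batch`, and the example labels are all hypothetical.

```python
import random
import statistics
from typing import Dict, List, Tuple

Latent = List[float]


class LatentReplay:
    """Toy stand-in for generative replay: one Gaussian per class in latent space."""

    def __init__(self, seed: int = 0) -> None:
        self._stats: Dict[str, List[Tuple[float, float]]] = {}
        self._rng = random.Random(seed)

    def remember(self, label: str, latents: List[Latent]) -> None:
        # Keep only the mean and standard deviation of each latent dimension.
        dims = list(zip(*latents))
        self._stats[label] = [(statistics.fmean(d), statistics.pstdev(d)) for d in dims]

    def known_labels(self) -> List[str]:
        return list(self._stats)

    def replay(self, label: str, n: int) -> List[Latent]:
        # Sample pseudo-latents for an old class instead of storing its images.
        return [[self._rng.gauss(m, s) for m, s in self._stats[label]] for _ in range(n)]


def mixed_batch(buffer: LatentReplay, new_label: str, new_latents: List[Latent],
                per_old_class: int) -> List[Tuple[Latent, str]]:
    """Combine latents of the new class with replayed latents of earlier classes."""
    batch = [(z, new_label) for z in new_latents]
    for old_label in buffer.known_labels():
        batch += [(z, old_label) for z in buffer.replay(old_label, per_old_class)]
    return batch
```

Training a classifier on such mixed batches is what lets it see earlier classes again while learning a new one. In a real system the Gaussian would be replaced by a learned generative model over the autoencoder's latent space.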


          Most cited references


          Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks


            Toward a service-based workflow for automated information extraction from herbarium specimens

            Abstract: Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor. To streamline the process of capturing herbarium specimen metadata, we specified a formal, extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) output from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from OCR. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded project "A standardised and optimised process for data acquisition from digital images of herbarium specimens" (StanDAP-Herb).

              Author and article information

              Journal
              Biodiversity Information Science and Standards
              BISS
              Pensoft Publishers
              2535-0897
              July 10 2019
              Volume: 3
              Article
              10.3897/biss.3.35190
              fa446b31-2994-42ba-a5c9-ba4e30d11829
              © 2019

              http://creativecommons.org/licenses/by/4.0/
