A Workflow for Data Extraction from Digitized Herbarium Specimens

Abstract

Building on our own work on species and trait recognition and on complementary studies from other working groups, we present a workflow for data extraction from digitized herbarium specimens using convolutional neural networks. Digitized herbarium sheets contain preserved plant material as well as additional objects: the label with information on the collection event; annotations such as revision labels or notes on material extraction; identifiers such as barcodes or numbers; envelopes for loose plant material; and often scale bars and color charts used in the digitization process. To treat these objects appropriately, segmentation techniques (Triki et al. 2018) will be applied to localize and identify the different kinds of objects for specific treatments. Detecting the presence of plant organs such as leaves, flowers or fruits is already a first step in data extraction and is potentially useful for phenological studies. Plant organs will be subject to routines for quantitative (Gaikwad et al. 2018) and qualitative (Younis et al. 2018) trait recognition. Text-based objects can be treated as described by Kirchhoff et al. (2018), using OCR techniques and taking into account the many collection-specific terms and abbreviations described in Schröder (2019). Additionally, species recognition (Younis et al. 2018) will be applied to support further identification of incompletely identified collection items and to detect possible misidentifications. All steps described above need sufficient training data, including labels, which may be obtained from collection metadata and trait databases.
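
The routing idea described above, where segmentation first assigns each object on the sheet to a kind and each kind then receives its own treatment, can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the `Region` type, the kind names and the three handler functions are hypothetical placeholders that return strings where a real pipeline would call trained networks or an OCR service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """One segmented object on a herbarium sheet (hypothetical structure)."""
    kind: str                        # e.g. "label", "plant", "barcode"
    box: Tuple[int, int, int, int]   # x, y, width, height in pixels


# Placeholder treatments; each stands in for a real model or service call.
def read_text(region: Region) -> str:
    return f"OCR on {region.box}"


def recognize_traits(region: Region) -> str:
    return f"trait and species recognition on {region.box}"


def decode_identifier(region: Region) -> str:
    return f"identifier decoding on {region.box}"


# Map each object kind to the treatment it should receive.
HANDLERS: Dict[str, Callable[[Region], str]] = {
    "label": read_text,
    "annotation": read_text,
    "plant": recognize_traits,
    "barcode": decode_identifier,
}


def route(regions: List[Region]) -> List[Tuple[str, str]]:
    """Send every region to its handler; kinds without a handler are skipped."""
    results = []
    for region in regions:
        handler = HANDLERS.get(region.kind)
        outcome = handler(region) if handler else "skipped"
        results.append((region.kind, outcome))
    return results


sheet = [
    Region("label", (10, 900, 300, 150)),
    Region("plant", (50, 50, 600, 800)),
    Region("color_chart", (700, 20, 80, 400)),
]
for kind, outcome in route(sheet):
    print(kind, "->", outcome)
```

Objects such as color charts and scale bars fall through to "skipped" here because the text assigns them no extraction step; a real workflow might instead use them for calibration.
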
To deal with newly digitized collections and with previously unseen data or categories, we propose implementing a recent deep learning approach known as lifelong learning. Past knowledge of the network is dynamically saved in latent space using an autoencoder and generatively replayed while the network is trained on new tasks. This enables the network to solve complex image processing tasks and to learn new classes incrementally without forgetting what it has already learned.
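
The replay idea can be illustrated with a toy sketch. It makes strong simplifying assumptions and is not the proposed method: instead of a trained autoencoder and a learned generator, it stores only the per-class mean and spread of latent codes that are assumed to have been produced elsewhere, and it replays by sampling from those Gaussian statistics. The class `LatentMemory`, the function `mixed_batch` and the example labels are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Latent = List[float]
Example = Tuple[Latent, str]


class LatentMemory:
    """Keeps per-class statistics of latent codes instead of raw images."""

    def __init__(self, seed: int = 0) -> None:
        self.stats: Dict[str, List[Tuple[float, float]]] = {}
        self.rng = random.Random(seed)

    def remember(self, label: str, codes: List[Latent]) -> None:
        """Summarize the latent codes of one class as (mean, std) per dimension."""
        dims = list(zip(*codes))  # transpose: one tuple per latent dimension
        self.stats[label] = [(mean(d), pstdev(d)) for d in dims]

    def replay(self, n: int) -> List[Example]:
        """Sample n pseudo-examples of previously learned classes."""
        labels = sorted(self.stats)
        if not labels:
            return []
        samples = []
        for _ in range(n):
            label = self.rng.choice(labels)
            code = [self.rng.gauss(m, s) for m, s in self.stats[label]]
            samples.append((code, label))
        return samples


def mixed_batch(memory: LatentMemory, new_items: List[Example], n_replay: int) -> List[Example]:
    """Combine examples of the new task with replayed examples of old classes."""
    return list(new_items) + memory.replay(n_replay)


memory = LatentMemory(seed=1)
memory.remember("Quercus", [[0.0, 1.0], [2.0, 3.0]])   # old class
batch = mixed_batch(memory, [([5.0, 5.0], "Fagus")], n_replay=3)  # new class
print(len(batch), [label for _, label in batch])
```

Training on such mixed batches is what lets a classifier keep seeing old classes while it learns new ones; in the approach the abstract proposes, the Gaussian summary would be replaced by a learned generative model over the autoencoder's latent space.
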

Related collections

Pensoft Biodiversity

Most cited references

Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks

Sohaib Younis, Claus Weiland, Robert Hoehndorf … (2018)

Toward a service-based workflow for automated information extraction from herbarium specimens

Agnes Kirchhoff, Ulrich Bugel, Eduard Santamaria … (2018)

Abstract: Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, the accompanying label information is still predominantly captured manually, which is becoming the principal cost factor. To streamline the capture of herbarium specimen metadata, we specified a formal, extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) output from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from OCR output. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded project "A standardised and optimised process for data acquisition from digital images of herbarium specimens" (StanDAP-Herb).

Author and article information

Journal

Title: Biodiversity Information Science and Standards

Abbreviated Title: BISS

Publisher: Pensoft Publishers

ISSN (Electronic): 2535-0897

Publication date Created: July 10 2019

Publication date (Electronic): July 10 2019

Volume: 3

Article

DOI: 10.3897/biss.3.35190

SO-VID: fa446b31-2994-42ba-a5c9-ba4e30d11829

License:

http://creativecommons.org/licenses/by/4.0/
