Applications of deep convolutional neural networks to digitized natural history collections

Abstract

Natural history collections contain data that are critical for many scientific endeavors. Recent efforts in mass digitization are generating large datasets from these collections that can provide unprecedented insight. Here, we present examples of how deep convolutional neural networks can be applied in analyses of imaged herbarium specimens. We first demonstrate that a convolutional neural network can detect mercury-stained specimens across a collection with 90% accuracy. We then show that such a network can correctly distinguish two morphologically similar plant families 96% of the time. Discarding the most challenging specimen images increases accuracy to 94% and 99%, respectively. These results highlight the importance of mass digitization and deep learning approaches and reveal how they can together deliver powerful new investigative tools.
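The effect of discarding the most challenging images can be illustrated with a minimal sketch. The abstract does not say how "most challenging" was determined, so this example assumes it means the predictions the classifier is least confident about. The function name, the toy confidence scores, and the correctness flags below are all hypothetical and are not taken from the article.

```python
def accuracy_after_rejection(confidences, correct, keep_fraction):
    """Accuracy over the most confident `keep_fraction` of predictions.

    confidences   -- one classifier confidence score per image (assumed input)
    correct       -- one bool per image, True if the prediction was right
    keep_fraction -- share of images to keep, in (0, 1]
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Rank predictions from most to least confident.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)

    # Keep the top share and discard the rest (the "challenging" images).
    n_keep = max(1, round(keep_fraction * len(ranked)))
    kept = ranked[:n_keep]

    return sum(1 for _, ok in kept if ok) / len(kept)


# Hypothetical toy data: ten predictions, two of them wrong.
confidences = [0.99, 0.97, 0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.55, 0.51]
correct = [True, True, True, True, True, True, False, True, True, False]

full = accuracy_after_rejection(confidences, correct, 1.0)
trimmed = accuracy_after_rejection(confidences, correct, 0.8)
print(f"all images: {full:.3f}, top 80% by confidence: {trimmed:.3f}")
```

Under this assumption, accuracy on the retained images rises whenever errors are concentrated among the low-confidence predictions, at the cost of leaving the discarded images unclassified.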

Related collections

Pensoft Biodiversity

Most cited references 7

The Value of Museum Collections for Research and Society

Andrew V. Suarez, Neil Tsutsui (2004)

Cited 217 times

Is Open Access

Going deeper in the automated identification of Herbarium specimens

Jose Carranza-Rojas, Herve Goeau, Pierre Bonnet … (2017)

Background: Hundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries. Recent initiatives started ambitious preservation plans to digitize this information and make it available to botanists and the general public through web portals. However, thousands of sheets are still unidentified at the species level while numerous sheets should be reviewed and updated following more recent taxonomic knowledge. These annotations and revisions require an unrealistic amount of work for botanists to carry out in a reasonable time. Computer vision and machine learning approaches applied to herbarium sheets are promising but are still not well studied compared to automated species identification from leaf scans or pictures of plants in the field.

Results: In this work, we propose to study and evaluate the accuracy with which herbarium images can be potentially exploited for species identification with deep learning technology. In addition, we propose to study if the combination of herbarium sheets with photos of plants in the field is relevant in terms of accuracy, and finally, we explore if herbarium images from one region that has one specific flora can be used to do transfer learning to another region with other species; for example, on a region under-represented in terms of collected data.

Conclusions: This is, to our knowledge, the first study that uses deep learning to analyze a big dataset with thousands of species from herbaria. Results show the potential of Deep Learning on herbarium species identification, particularly by training and testing across different datasets from different herbaria. This could potentially lead to the creation of a semi, or even fully automated system to help taxonomists and experts with their annotation, classification, and revision works.

Cited 78 times

Automated species identification: why not?

Kevin Gaston, Mark A O'Neill (2004)

Where possible, automation has been a common response of humankind to many activities that have to be repeated numerous times. The routine identification of specimens of previously described species has many of the characteristics of other activities that have been automated, and poses a major constraint on studies in many areas of both pure and applied biology. In this paper, we consider some of the reasons why automated species identification has not become widely employed, and whether it is a realistic option, addressing the notions that it is too difficult, too threatening, too different or too costly. Although recognizing that there are some very real technical obstacles yet to be overcome, we argue that progress in the development of automated species identification is extremely encouraging and that such an approach has the potential to make a valuable contribution to reducing the burden of routine identifications. Vision and enterprise are perhaps more limiting at present than practical constraints on what might possibly be achieved.

Cited 75 times

Author and article information

Contributors

Eric Schuettpelz

Journal

Journal ID (nlm-ta): Biodivers Data J

Journal ID (iso-abbrev): Biodivers Data J

Journal ID (pmc): Biodiversity Data Journal

Journal ID (publisher-id): Biodiversity Data Journal

Title: Biodiversity Data Journal

Publisher: Pensoft Publishers

ISSN (Electronic): 1314-2828

Publication date (Collection): 2017

Publication date (Electronic): 02 November 2017

Issue: 5

Electronic Location Identifier: e21139

Affiliations

[1] National Museum of Natural History, Smithsonian Institution, Washington, DC, United States of America

[2] Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, United States of America

[3] NVIDIA, Santa Clara, CA, United States of America

Author notes

Corresponding author: Eric Schuettpelz (schuettpelze@si.edu).

Academic editor: Vincent Smith

Article

Publisher ID: Biodiversity Data Journal

Other ID: 8292

DOI: 10.3897/BDJ.5.e21139

PMC ID: 5680669

SO-VID: 7859d4cc-f0ec-4b88-b9c8-a6ac4debc3ed

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 21 September 2017

Date accepted: 21 October 2017

Page count

Figures: 2, Tables: 2, References: 9

Cited by 29
