      Automated Herbarium Specimen Identification using Deep Learning

      Proceedings of TDWG
      Pensoft Publishers

          Abstract

          Hundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries (Page et al. 2015). Recent initiatives, such as iDigBio (https://www.idigbio.org), aggregate data from and images of vouchered herbarium sheets (and other biocollections) and make this information available to botanists and the general public worldwide through web portals. These ambitious efforts to digitize and preserve historical biodiversity data are supported by the United States National Science Foundation (NSF) Advancing Digitization of Biodiversity Collections (ADBC) program, and the digitization itself is carried out by the Thematic Collections Networks (TCNs) funded under ADBC. However, thousands of herbarium sheets are still unidentified at the species level, and numerous sheets need to be reviewed and updated in light of more recent taxonomic knowledge. These annotations and revisions would require an unrealistic amount of work for botanists to complete in a reasonable time (Bebber et al. 2010). Computer vision and machine learning approaches applied to herbarium sheets are promising (Wijesingha and Marikar 2012) but remain far less studied than automated species identification from leaf scans or from photos of plants taken in the field. In a recent study, we evaluated how accurately herbarium images can be exploited for species identification with deep learning (Carranza-Rojas et al. 2017), particularly Convolutional Neural Networks (CNNs) (Szegedy et al. 2015). Because these networks are differentiable, they can be trained end-to-end and automatically learn the most prominent visual patterns in the images, as opposed to earlier approaches that rely on custom, hand-crafted feature extractors. A first challenge is to use herbarium sheet images alone to automatically identify the species of the plants mounted on the sheets.
Second, we propose to study whether combining herbarium sheet images with photos of plants in the field (Joly et al. 2015, Carranza-Rojas and Mata-Montero 2016) is a viable way to train models that give accurate identifications. Finally, we explore whether herbarium images from one region with a specific flora can be used for transfer learning (a deep learning technique in which a model is first trained on one dataset and its learned weights then serve as the starting point for training a model on another) to a different region with other species, for example a region that is under-represented in terms of collected data. Our evaluation shows that species identification accuracy with deep learning, based on herbarium images, reaches 90.3% on a dataset of more than 1,200 European plant species. This could potentially lead to a semi- or even fully automated system to help taxonomists and other experts with their annotation, classification, and revision work. In this paper, we take a closer look at the accuracy achieved for the first two challenges, evaluating accuracy for each species in the dataset, which comprises 253,733 images of 1,204 species.
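
The transfer-learning idea described above (train on one dataset, then reuse the learned weights as the starting point for another) can be illustrated with a deliberately tiny, pure-Python sketch. It is not the authors' pipeline: a two-layer network stands in for a deep CNN, 2-D points are hypothetical stand-ins for image features from two regions, and all names (`init_layer`, `forward`, `train`, `transfer`) and hyperparameters are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy model standing in for a CNN: a "backbone" layer (W1, b1) that
# produces features, and a softmax "head" (W2, b2) with one output per species.

def init_layer(n_out, n_in, rng, scale=0.5):
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def forward(model, x):
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(model["W1"], model["b1"])]
    h = [max(0.0, a) for a in pre]  # ReLU features
    z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(model["W2"], model["b2"])]
    m = max(z)
    e = [math.exp(a - m) for a in z]  # numerically stable softmax
    s = sum(e)
    return h, [a / s for a in e]

def predict(model, x):
    _, p = forward(model, x)
    return max(range(len(p)), key=lambda k: p[k])

def train(model, data, epochs=100, lr=0.05, freeze_backbone=False):
    """Plain SGD on softmax cross-entropy; gradients reach the backbone unless it is frozen."""
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(model, x)
            g = [pk - (k == y) for k, pk in enumerate(p)]  # dLoss/dlogits
            # Backpropagate to the features using the current head weights (ReLU gate).
            gh = [sum(g[k] * model["W2"][k][j] for k in range(len(g))) if h[j] > 0 else 0.0
                  for j in range(len(h))]
            for k, gk in enumerate(g):
                for j, hj in enumerate(h):
                    model["W2"][k][j] -= lr * gk * hj
                model["b2"][k] -= lr * gk
            if not freeze_backbone:
                for j, gj in enumerate(gh):
                    for i, xi in enumerate(x):
                        model["W1"][j][i] -= lr * gj * xi
                    model["b1"][j] -= lr * gj
    return model

def transfer(source_model, n_target_classes, rng):
    """Copy the learned backbone weights; attach a freshly initialised head for the new species."""
    W2, b2 = init_layer(n_target_classes, len(source_model["W1"]), rng)
    return {"W1": [row[:] for row in source_model["W1"]], "b1": source_model["b1"][:],
            "W2": W2, "b2": b2}

# Hypothetical toy data: 2-D points stand in for image features from two "regions".
rng = random.Random(0)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
source = [(p, 2 * (p[0] > 0) + (p[1] > 0)) for p in points]  # 4 source classes (quadrants)
target = [(p, int(p[0] * p[1] > 0)) for p in points]          # 2 different target classes

source_model = {}
source_model["W1"], source_model["b1"] = init_layer(8, 2, rng)
source_model["W2"], source_model["b2"] = init_layer(4, 8, rng)
train(source_model, source)  # end-to-end training on the source region

target_model = transfer(source_model, n_target_classes=2, rng=rng)
train(target_model, target, freeze_backbone=True)  # retrain only the new head
acc = sum(predict(target_model, x) == y for x, y in target) / len(target)
print(f"target training accuracy: {acc:.2f}")
```

Freezing the backbone and retraining only the head is one common variant; calling `train(..., freeze_backbone=False)` instead fine-tunes all weights end-to-end, which is often preferred when the target region has enough labelled images.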

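The abstract reports an overall accuracy (90.3%) and also examines accuracy for each species. A minimal sketch of both computations follows, assuming the classifier returns one score per species for each test image; the function names, the optional top-k criterion, and the toy scores are hypothetical and not taken from the paper. Reporting the mean of the per-species accuracies next to the overall figure shows whether a high overall accuracy hides poorly recognized, rarely sampled species.

```python
from collections import Counter

def is_top_k_hit(scores, true_species, k):
    """True if true_species is among the k highest-scoring species."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return true_species in ranked[:k]

def accuracy_report(predictions, labels, k=1):
    """Return (overall accuracy, mean per-species accuracy, per-species accuracy dict)."""
    totals = Counter(labels)  # test images per species
    hits = Counter()          # correctly identified test images per species
    for scores, species in zip(predictions, labels):
        if is_top_k_hit(scores, species, k):
            hits[species] += 1
    overall = sum(hits.values()) / len(labels)
    per_species = {s: hits[s] / n for s, n in totals.items()}
    mean_per_species = sum(per_species.values()) / len(per_species)
    return overall, mean_per_species, per_species

# Hypothetical scores for four test images (illustrative only, not data from the paper).
labels = ["Quercus robur", "Quercus robur", "Quercus robur", "Acer campestre"]
predictions = [
    {"Quercus robur": 0.9, "Acer campestre": 0.1},
    {"Quercus robur": 0.8, "Acer campestre": 0.2},
    {"Quercus robur": 0.7, "Acer campestre": 0.3},
    {"Quercus robur": 0.6, "Acer campestre": 0.4},
]
overall, mean_per_species, per_species = accuracy_report(predictions, labels)
print(overall, mean_per_species, per_species)
# 0.75 0.5 {'Quercus robur': 1.0, 'Acer campestre': 0.0}
```

In this toy case the overall accuracy is 75%, yet one of the two species is never identified correctly, which is exactly the kind of gap a per-species evaluation exposes.
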
          Most cited references

          Bebber et al. 2010. Herbaria are a major frontier for species discovery.

          Despite the importance of species discovery, the processes involved, including collecting, recognizing, and describing new species, are poorly understood. Data are presented for flowering plants, quantitatively measuring the lag between the date a specimen of a new species was first collected and the date the species was described and published. The data from our sample of new species published between 1970 and 2010 show that only 16% were described within five years of being first collected. The description of the remaining 84% involved much older specimens, with nearly one-quarter of new species descriptions involving specimens more than 50 years old. Extrapolation of these results suggests that, of the estimated 70,000 species still to be described, more than half have already been collected and are stored in herbaria. Effort, funding, and research focus should, therefore, be directed as much to examining extant herbarium material as to collecting new material in the field.

            Carranza-Rojas et al. 2017. Going deeper in the automated identification of Herbarium specimens.

            Background: Hundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries. Recent initiatives started ambitious preservation plans to digitize this information and make it available to botanists and the general public through web portals. However, thousands of sheets are still unidentified at the species level, and numerous sheets should be reviewed and updated following more recent taxonomic knowledge. These annotations and revisions require an unrealistic amount of work for botanists to carry out in a reasonable time. Computer vision and machine learning approaches applied to herbarium sheets are promising but are still not well studied compared to automated species identification from leaf scans or pictures of plants in the field.
            Results: In this work, we propose to study and evaluate the accuracy with which herbarium images can be exploited for species identification with deep learning technology. In addition, we propose to study whether combining herbarium sheets with photos of plants in the field is relevant in terms of accuracy, and finally, we explore whether herbarium images from one region that has one specific flora can be used for transfer learning to another region with other species, for example a region under-represented in terms of collected data.
            Conclusions: This is, to our knowledge, the first study that uses deep learning to analyze a big dataset with thousands of species from herbaria. Results show the potential of deep learning for herbarium species identification, particularly when training and testing across different datasets from different herbaria. This could potentially lead to the creation of a semi- or even fully automated system to help taxonomists and experts with their annotation, classification, and revision work.

              Page et al. 2015. Digitization of Biodiversity Collections Reveals Biggest Data on Biodiversity.

                Author and article information

                Journal: Proceedings of TDWG (TDWGProc), Pensoft Publishers
                ISSN: 2535-0897
                Published: August 16, 2017
                Volume 1, article e20302
                Type: Article
                DOI: 10.3897/tdwgproceedings.1.20302
                © 2017. Licensed under CC BY 4.0: http://creativecommons.org/licenses/by/4.0/
