
      Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks

      research-article


          Abstract

          Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated systems for this task. Previous attempts have been based on laborious and complex handcrafted extraction of image features, but in recent years it has been shown that sophisticated convolutional neural networks (CNNs) can learn to extract relevant features automatically, without human intervention. Unfortunately, reaching expert-level accuracy in CNN identifications requires substantial computational power and huge training data sets, which are often not available for taxonomic tasks. This can be addressed using feature transfer: a CNN that has been pretrained on a generic image classification task is exposed to the taxonomic images of interest, and information about its perception of those images is used in training a simpler, dedicated identification system. Here, we develop an effective method of CNN feature transfer, which achieves expert-level accuracy in taxonomic identification of insects with training sets of 100 images or fewer per category, depending on the nature of the data set. Specifically, we extract rich representations of intermediate to high-level image features from the CNN architecture VGG16 pretrained on the ImageNet data set. This information is submitted to a linear support vector machine classifier, which is trained on the target problem. We tested the performance of our approach on two types of challenging taxonomic tasks: 1) identifying insects to higher groups when they are likely to belong to subgroups that have not been seen previously and 2) identifying visually similar species that are difficult to separate even for experts. For the first task, our approach reached >92% accuracy on one data set (884 face images of 11 families of Diptera, all specimens representing unique species), and >96% accuracy on another (2936 dorsal habitus images of 14 families of Coleoptera, over 90% of specimens belonging to unique species). For the second task, our approach outperformed a leading taxonomic expert on one data set (339 images of three species of the Coleoptera genus Oxythyrea; 97% accuracy), and both humans and traditional automated identification systems on another data set (3845 images of nine species of Plecoptera larvae; 98.6% accuracy). Reanalyzing several biological image identification tasks studied in the recent literature, we show that our approach is broadly applicable and provides significant improvements over previous methods, whether based on dedicated CNNs, CNN feature transfer, or more traditional techniques. Thus, our method, which is easy to apply, can be highly successful in developing automated taxonomic identification systems even when training data sets are small and computational budgets limited. We conclude by briefly discussing some promising CNN-based research directions in morphological systematics opened up by the success of these techniques in providing accurate diagnostic tools.
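
A rough sketch of the feature-transfer pipeline summarized above: pooled features are extracted from an ImageNet-pretrained VGG16 (here via Keras) and fed to a linear support vector machine trained with scikit-learn. The file lists, label arrays, and global-average pooling are placeholder assumptions; the published method draws on richer intermediate-layer representations, so this illustrates the general approach rather than the authors' exact implementation.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# VGG16 pretrained on ImageNet, used as a fixed feature extractor:
# global average pooling over the last convolutional block yields a
# 512-dimensional descriptor per image.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def extract_features(paths):
    """Load images, apply VGG16 preprocessing, and return pooled CNN features."""
    batch = np.stack([
        image.img_to_array(image.load_img(p, target_size=(224, 224)))
        for p in paths
    ])
    return extractor.predict(preprocess_input(batch), verbose=0)

def train_classifier(train_paths, train_labels):
    """Fit a linear SVM on the transferred features (with feature scaling)."""
    clf = make_pipeline(StandardScaler(), LinearSVC())
    clf.fit(extract_features(train_paths), train_labels)
    return clf

# Usage with hypothetical image paths and taxon labels:
#   clf = train_classifier(train_paths, train_labels)
#   print("accuracy:", clf.score(extract_features(test_paths), test_labels))
```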


          Most cited references (97)


          Deep Residual Learning for Image Recognition


            One-shot learning of object categories.

             Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
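
A toy numerical illustration of the Bayesian updating idea described above (not the probabilistic object models used in the cited paper): a Normal prior over a single category parameter, pooled from previously learned categories, is updated with one observed example. All numbers are hypothetical.

```python
# Prior over the new category's mean, summarizing previously learned categories.
prior_mean, prior_var = 0.0, 4.0

# Known observation noise and a single training example from the new category.
noise_var = 1.0
x = 2.5

# Conjugate Normal-Normal update: precisions add, and the posterior mean is the
# precision-weighted average of the prior mean and the observation.
post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
post_mean = post_var * (prior_mean / prior_var + x / noise_var)

print(f"posterior mean = {post_mean:.2f}, posterior variance = {post_var:.2f}")
# One example pulls the estimate strongly toward x because the prior is broad,
# while the prior keeps the posterior proper; this is the essence of one-shot
# learning with informative priors.
```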

              Scikit‐learn: Machine learning in Python


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Systematic Biology (Syst. Biol.)
                Oxford University Press
                ISSN: 1063-5157 (print); 1076-836X (electronic)
                Published online: 02 March 2019
                Issue date: November 2019
                Volume: 68
                Issue: 6
                Pages: 876-895
                Affiliations
                [1] Savantic AB, Rosenlundsgatan 52, 118 63 Stockholm, Sweden
                [2] Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Frescativägen 40, 114 18 Stockholm, Sweden
                [3] Department of Zoology, Stockholm University, Universitetsvägen 10, 114 18 Stockholm, Sweden
                [4] Disciplinary Domain of Science and Technology, Physics, Department of Physics and Astronomy, Nuclear Physics, Uppsala University, 751 20 Uppsala, Sweden
                [5] School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-10044 Stockholm, Sweden
                [6] Department of Zoology, Faculty of Science, Charles University in Prague, Viničná 7, CZ-128 43 Praha 2, Czech Republic
                [7] Department of Entomology, National Museum, Cirkusová 1740, CZ-193 00 Praha 9 - Horní Počernice, Czech Republic
                Author notes
                Correspondence to be sent to: Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Frescativägen 40, 114 18 Stockholm, Sweden; E-mail: miroslav.valan@nrm.se.
                Article
                Publisher ID: syz014
                DOI: 10.1093/sysbio/syz014
                PMC ID: PMC6802574
                PubMed ID: 30825372
                ScienceOpen record ID: 9b87722a-1cfd-4eb7-a5c6-9f08164e6963
                © The Author(s) 2019. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 21 February 2018
                Revision received: 13 February 2019
                Accepted: 20 February 2019
                Page count
                Pages: 20
                Funding
                Funded by: European Union’s Horizon 2020
                Award ID: 642241
                Award ID: 260 434/2018
                Funded by: Ministry of Culture of the Czech Republic
                Award ID: 2018/14
                Award ID: 00023272
                Categories
                Regular Articles

                Animal science & Zoology
