
      Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning

Research Article


          Significance

Motion-sensor cameras in natural habitats offer the opportunity to inexpensively and unobtrusively gather vast amounts of data on animals in the wild. A key obstacle to harnessing their potential is the great cost of having humans analyze each image. Here, we demonstrate that a cutting-edge type of artificial intelligence called deep neural networks can automatically extract such invaluable information. For example, we show that deep learning can automate animal identification for 99.3% of the 3.2 million-image Snapshot Serengeti dataset while performing at the same 96.6% accuracy as crowdsourced teams of human volunteers. Automatically, accurately, and inexpensively collecting such data could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into “big data” sciences.
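A minimal sketch of this confidence-gated pipeline, assuming softmax outputs from a trained classifier; the function name and the 0.95 threshold are illustrative choices, not the paper's tuned operating point, which is selected so that accepted labels match human volunteer accuracy:

```python
import numpy as np

def route_images(probs: np.ndarray, threshold: float = 0.95):
    """Split images into auto-labeled and human-review sets.

    probs: shape (n_images, n_species), softmax outputs from a
    trained network. Images whose top-1 probability reaches
    `threshold` are labeled automatically; the rest are forwarded
    to human volunteers.
    """
    confidence = probs.max(axis=1)            # top-1 probability per image
    is_confident = confidence >= threshold
    auto_labels = probs.argmax(axis=1)[is_confident]
    human_queue = np.where(~is_confident)[0]  # indices needing human review
    return auto_labels, human_queue
```

Raising the threshold trades automation coverage for accuracy: fewer images clear the gate, but those that do are labeled more reliably.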

          Abstract

          Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would improve our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into “big data” sciences. Motion-sensor “camera traps” enable collecting wildlife pictures inexpensively, unobtrusively, and frequently. However, extracting information from these pictures remains an expensive, time-consuming, manual task. We demonstrate that such information can be automatically extracted by deep learning, a cutting-edge type of artificial intelligence. We train deep convolutional neural networks to identify, count, and describe the behaviors of 48 species in the 3.2 million-image Snapshot Serengeti dataset. Our deep neural networks automatically identify animals with >93.8% accuracy, and we expect that number to improve rapidly in years to come. More importantly, if our system classifies only images it is confident about, our system can automate animal identification for 99.3% of the data while still performing at the same 96.6% accuracy as that of crowdsourced teams of human volunteers, saving >8.4 y (i.e., >17,000 h at 40 h/wk) of human labeling effort on this 3.2 million-image dataset. Those efficiency gains highlight the importance of using deep neural networks to automate data extraction from camera-trap images, reducing a roadblock for this widely used technology. Our results suggest that deep learning could enable the inexpensive, unobtrusive, high-volume, and even real-time collection of a wealth of information about vast numbers of animals in the wild.
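The classifiers behind these numbers are standard deep convolutional networks trained on the labeled Snapshot Serengeti images. Below is a minimal PyTorch sketch of one such 48-way species classifier; the ResNet-18 backbone, pretrained initialization, and hyperparameters are illustrative assumptions rather than the paper's exact configuration (the paper compares several architectures):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 48  # species classes in the Snapshot Serengeti task

# ImageNet-pretrained backbone with the final layer replaced by a
# 48-way species head. (Backbone and pretraining are assumptions here.)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One SGD step on a batch of camera-trap images (N, 3, 224, 224)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, the softmax over the 48 outputs supplies the per-image confidence used by the routing sketch above.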


                Author and article information

Journal
Proceedings of the National Academy of Sciences of the United States of America (PNAS)
Publisher: National Academy of Sciences
ISSN: 0027-8424 (print); 1091-6490 (electronic)
Published: 5 June 2018 (online); 19 June 2018 (issue)
Volume 115, Issue 25, Pages E5716-E5725
Affiliations
a Department of Computer Science, University of Wyoming, Laramie, WY 82071;
b Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849;
c Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138;
d Department of Physics, University of Oxford, Oxford OX1 3RH, United Kingdom;
e Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108;
f Uber AI Labs, San Francisco, CA 94103
Author notes
1 To whom correspondence should be addressed. Email: jeffclune@uwyo.edu.

                Edited by James A. Estes, University of California, Santa Cruz, CA, and approved April 30, 2018 (received for review November 7, 2017)

                Author contributions: M.S.N., A.N., and J.C. designed research; M.S.N. performed research; M.S.N. contributed analytic tools; M.S.N., A.N., M.K., A.S., M.S.P., C.P., and J.C. analyzed data; and M.S.N., A.N., M.K., A.S., M.S.P., C.P., and J.C. wrote the paper.

Article
Publisher ID: 201719367
DOI: 10.1073/pnas.1719367115
PMCID: PMC6016780
PMID: 29871948
                Copyright © 2018 the Author(s). Published by PNAS.

                This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

Pages: 10
Funding
Funded by: National Science Foundation (NSF)
Award ID: 1453549
Award Recipient: Jeff Clune
Categories
PNAS Plus
From the Cover
Biological Sciences: Ecology
Physical Sciences: Computer Sciences

Keywords: deep learning, deep neural networks, artificial intelligence, camera-trap images, wildlife ecology
