
      Deep Learning: The Good, the Bad, and the Ugly

      Annual Review of Vision Science
      Annual Reviews


          Abstract

          Artificial vision has often been described as one of the key remaining challenges to be solved before machines can act intelligently. Recent developments in a branch of machine learning known as deep learning have catalyzed impressive gains in machine vision—giving a sense that the problem of vision is getting closer to being solved. The goal of this review is to provide a comprehensive overview of recent deep learning developments and to critically assess actual progress toward achieving human-level visual intelligence. I discuss the implications of the successes and limitations of modern machine vision algorithms for biological vision and the prospect for neuroscience to inform the design of future artificial vision systems.


Most cited references (84)


          DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
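
The atrous (dilated) convolution idea described in this abstract can be illustrated with a minimal PyTorch sketch (not code from the DeepLab release): increasing the dilation rate enlarges a filter's field of view while leaving the parameter count and output resolution unchanged. The channel counts and input size below are arbitrary choices for illustration.

import torch
import torch.nn as nn

# A standard 3x3 convolution and an atrous (dilated) 3x3 convolution.
# With dilation=2 the filter samples the input on a sparser grid,
# widening its receptive field without adding parameters.
standard = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
atrous = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 65, 65)             # dummy feature map
print(standard(x).shape, atrous(x).shape)  # both keep the 65x65 spatial size
print(sum(p.numel() for p in standard.parameters()),
      sum(p.numel() for p in atrous.parameters()))  # identical parameter counts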

            DeepLabCut: markerless pose estimation of user-defined body parts with deep learning

            Quantifying behavior is crucial for many applications in neuroscience. Videography provides easy methods for the observation and recording of animal behavior in diverse settings, yet extracting particular aspects of a behavior for further analysis can be highly time consuming. In motor control studies, humans or other animals are often marked with reflective markers to assist with computer-based tracking, but markers are intrusive, and the number and location of the markers must be determined a priori. Here we present an efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data. We demonstrate the versatility of this framework by tracking various body parts in multiple species across a broad collection of behaviors. Remarkably, even when only a small number of frames are labeled (~200), the algorithm achieves excellent tracking performance on test frames that is comparable to human accuracy.
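
As a rough illustration of the transfer-learning idea described above (not the DeepLabCut implementation itself), the sketch below attaches a small trainable keypoint-heatmap head to an ImageNet-pretrained ResNet-50 backbone from torchvision; only the head needs to learn the user-defined body parts. The backbone choice, head architecture, and layer sizes are assumptions made for the example.

import torch
import torch.nn as nn
from torchvision import models

class PoseNet(nn.Module):
    """Hypothetical pose network: pretrained features + heatmap head."""
    def __init__(self, num_bodyparts: int):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        # Keep the convolutional feature extractor, drop avgpool and fc.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Small trainable head: upsample once, then predict one heatmap per body part.
        self.head = nn.Sequential(
            nn.ConvTranspose2d(2048, 256, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_bodyparts, kernel_size=1),
        )

    def forward(self, x):
        return self.head(self.features(x))

model = PoseNet(num_bodyparts=4)
heatmaps = model(torch.randn(1, 3, 256, 256))
print(heatmaps.shape)  # torch.Size([1, 4, 16, 16]); peak locations give keypoints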

              Aggregated Residual Transformations for Deep Neural Networks


                Author and article information

Journal
Annual Review of Vision Science (Annu. Rev. Vis. Sci.)
Annual Reviews
ISSN: 2374-4642 (print); 2374-4650 (electronic)
September 15, 2019
Volume 5, Issue 1, pages 399-426
Affiliations
[1] Department of Cognitive, Linguistic and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, Rhode Island 02818, USA
Article
DOI: 10.1146/annurev-vision-091718-014951
PMID: 31394043
ScienceOpen record ID: 1b42ff76-fa29-42b7-9a76-6a0027c044cc
© 2019
