      Is Open Access

      Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition




          The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
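The idea of measuring generalization accuracy as a function of representational complexity can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes kernel ridge regression with a Gaussian kernel, takes complexity to be the inverse of the regularization strength, and uses synthetic two-category features in place of neural or DNN features. All function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def heldout_accuracy_vs_complexity(X_tr, y_tr, X_te, y_te, lambdas, sigma):
    """Held-out accuracy of kernel ridge regression for each regularizer lam.

    Complexity is taken to be 1/lam (an assumption of this sketch).
    Labels are expected in {-1, +1}.
    """
    K_tr = gaussian_kernel(X_tr, X_tr, sigma)
    K_te = gaussian_kernel(X_te, X_tr, sigma)
    n = len(y_tr)
    curve = []
    for lam in lambdas:
        # Closed-form kernel ridge solution: (K + lam * I) alpha = y
        alpha = np.linalg.solve(K_tr + lam * np.eye(n), y_tr)
        pred = np.sign(K_te @ alpha)
        curve.append((1.0 / lam, float(np.mean(pred == y_te))))
    return curve

def make_toy_features(n, separation, rng):
    """Synthetic two-category features; NOT neural or DNN data."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = rng.normal(size=(n, 10))
    X[:, 0] += separation * y  # only the first dimension carries category signal
    return X, y

rng = np.random.default_rng(0)
X_tr, y_tr = make_toy_features(200, 2.0, rng)
X_te, y_te = make_toy_features(200, 2.0, rng)
lambdas = [100.0, 10.0, 1.0, 0.1, 0.01]  # strong to weak regularization
curve = heldout_accuracy_vs_complexity(X_tr, y_tr, X_te, y_te, lambdas, sigma=3.0)
for complexity, acc in curve:
    print(f"complexity 1/lambda = {complexity:8.2f}  held-out accuracy = {acc:.3f}")
```

Under these assumptions, two representations could be compared by plotting their curves on the same axes: a representation that reaches high held-out accuracy at low complexity makes the categories easier to read out.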

          Author Summary

          Primates are remarkable at determining the category of a visually presented object even in brief presentations, and under changes to object exemplar, position, pose, scale, and background. To date, this behavior has been unmatched by artificial computational systems. However, the field of machine learning has made great strides in producing artificial deep neural network systems that perform highly on object recognition benchmarks. In this study, we measured the responses of neural populations in inferior temporal (IT) cortex across thousands of images and compared the performance of neural features to features derived from the latest deep neural networks. Remarkably, we found that the latest artificial deep neural networks achieve performance equal to the performance of IT cortex. Both deep neural networks and IT cortex create representational spaces in which images with objects of the same category are close, and images with objects of different categories are far apart, even in the presence of large variations in object exemplar, position, pose, scale, and background. Furthermore, we show that the top-level features in these models exceed previous models in predicting the IT neural responses themselves. This result indicates that the latest deep neural networks may provide insight into understanding primate visual processing.
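The notion of a representational space in which same-category images are close and different-category images are far apart can be made concrete with a small sketch. It assumes Euclidean distance and synthetic features; the function name `category_distance_ratio` and the toy data are hypothetical and are not taken from the article.

```python
import numpy as np

def category_distance_ratio(features, labels):
    """Mean within-category distance divided by mean between-category distance.

    Values well below 1 mean same-category items sit closer together
    than items from different categories.
    """
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs**2).sum(-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude self-distances
    within = dist[same & off_diag].mean()
    between = dist[~same].mean()
    return within / between

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 25)            # 4 categories, 25 items each
centers = rng.normal(scale=5.0, size=(4, 8))    # one center per category
clustered = centers[labels] + rng.normal(size=(100, 8))
unstructured = rng.normal(size=(100, 8))        # no category structure

ratio_clustered = category_distance_ratio(clustered, labels)
ratio_unstructured = category_distance_ratio(unstructured, labels)
print("clustered:", ratio_clustered)
print("unstructured:", ratio_unstructured)
```

In this toy setting the clustered features give a ratio well below 1, while the unstructured features give a ratio near 1.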


                Author and article information

                PLoS Computational Biology
                Public Library of Science (San Francisco, USA)
                Published: 18 December 2014
                Volume: 10
                Issue: 12
                [1] Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
                [2] Harvard–MIT Division of Health Sciences and Technology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
                Editor affiliation: University of Tübingen and Max Planck Institute for Biological Cybernetics, Germany
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: CFC HH NP NJM JJD. Performed the experiments: CFC HH DLKY DA EAS NJM. Analyzed the data: CFC HH DLKY DA EAS. Contributed reagents/materials/analysis tools: CFC HH DLKY NP DA EAS. Wrote the paper: CFC HH JJD.


                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Pages: 18
                This work was supported by the U.S. National Eye Institute (NIH NEI: 5R01EY014970-09), the National Science Foundation (NSF: 0964269), and the Defense Advanced Research Projects Agency (DARPA: HR0011-10-C-0032). CFC was supported by the U.S. National Eye Institute (NIH: F32 EY022845-01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Research Article
                Biology and Life Sciences
                Computational Biology
                Computational Neuroscience
                Artificial Neural Networks
                Custom metadata
                The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are available from http://dicarlolab.mit.edu/.

                Quantitative & Systems biology

