
      Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model


          Abstract

          Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
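The three features enumerated in the abstract can be illustrated with a toy sketch. This is emphatically not the published betasort algorithm: the class name, the `relax` forgetting parameter, and the specific update rules below are simplified assumptions made for illustration only; consult the paper itself for the exact procedure.

```python
import random

class BetasortSketch:
    """Toy sketch of a betasort-style learner (NOT the published algorithm).

    Each stimulus i keeps two evidence counts (upper[i], lower[i]); its
    inferred position on the unit span is the mean of Beta(upper+1, lower+1).
    """

    def __init__(self, n_items, relax=0.05):
        self.upper = [0.0] * n_items
        self.lower = [0.0] * n_items
        self.relax = relax  # assumed forgetting rate, chosen arbitrarily

    def estimate(self, i):
        # Mean of Beta(upper+1, lower+1): the stimulus's inferred position.
        return (self.upper[i] + 1.0) / (self.upper[i] + self.lower[i] + 2.0)

    def choose(self, i, j):
        # Feature 1: positions are represented as beta distributions,
        # so a choice is made by sampling each stimulus's position.
        xi = random.betavariate(self.upper[i] + 1, self.lower[i] + 1)
        xj = random.betavariate(self.upper[j] + 1, self.lower[j] + 1)
        return i if xi > xj else j

    def update(self, chosen, other, correct):
        n = len(self.upper)
        for k in range(n):  # mild forgetting on every trial
            self.upper[k] *= 1.0 - self.relax
            self.lower[k] *= 1.0 - self.relax
        if correct:
            # Feature 2 (asymmetry): positive feedback merely consolidates
            # the presented stimuli's current positions.
            for k in (chosen, other):
                p = self.estimate(k)
                self.upper[k] += p
                self.lower[k] += 1.0 - p
        else:
            # Negative feedback actively repositions the presented pair...
            self.lower[chosen] += 1.0
            self.upper[other] += 1.0
            # ...and, Feature 3 (implicit updating), nudges every
            # non-presented stimulus consistent with its current rank.
            lo = min(self.estimate(chosen), self.estimate(other))
            hi = max(self.estimate(chosen), self.estimate(other))
            for k in range(n):
                if k in (chosen, other):
                    continue
                if self.estimate(k) > hi:
                    self.upper[k] += 1.0
                elif self.estimate(k) < lo:
                    self.lower[k] += 1.0
```

Training such a sketch on adjacent pairs (A > B, B > C, C > D, D > E) and then probing `choose(B, D)` is exactly the critical transitive-inference test described above: because errors reposition items along the span while correct trials only add confidence, interior items end up separated rather than averaged together.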

          Author Summary

          Although machine learning systems can solve a wide variety of problems, they remain limited in their ability to make logical inferences. We developed a new computational model, called betasort, which addresses these limitations for a certain class of problems: those in which the algorithm must infer the order of a set of items by trial and error. Unlike extant machine learning systems (but like children and many non-human animals), betasort is able to perform “transitive inferences” about the ordering of a set of images. The patterns of error made by betasort resemble those made by children and non-human animals, and this learning is achieved at low computational cost. Additionally, betasort is difficult to classify as either “model-free” or “model-based” according to the formal specifications of those classifications in the machine learning literature. One of the broader implications of these results is that achieving a more comprehensive understanding of how the brain learns will require analysts to entertain other candidate learning models.
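The contrast with reward-prediction-error learners can be made concrete. The sketch below is an illustrative per-stimulus value learner, not the Q-learning implementation evaluated in the paper; the learning rate, trial schedule, and payoff scheme are assumptions. After training on the adjacent pairs of the list A > B > C > D > E, the interior items B and D receive identical reward histories (each wins in one pair and loses in the neighboring one), so their learned values are identical and the critical B-vs-D test is at chance.

```python
# Illustrative sketch: per-stimulus values driven by reward prediction
# error (NOT the paper's exact Q-learning implementation).
ALPHA = 0.1          # assumed learning rate
ITEMS = "ABCDE"      # true order: A > B > C > D > E
PAIRS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

values = {s: 0.0 for s in ITEMS}

for block in range(500):
    for hi, lo in PAIRS:
        # Let the learner experience both choices in each pair:
        # choosing the higher item pays 1, the lower item pays 0.
        for choice, reward in ((hi, 1.0), (lo, 0.0)):
            rpe = reward - values[choice]       # reward prediction error
            values[choice] += ALPHA * rpe

# B and D see the same sequence of rewards (one loss, one win per block),
# so their values are exactly equal and cannot support the inference.
print(values["B"], values["D"])   # equal -> B-vs-D test is at chance
```

A betasort-style learner avoids this collapse because negative feedback repositions stimuli along the span (and implicitly updates non-presented stimuli) rather than averaging each item's reward rate.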


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Publication date: 25 September 2015 (September 2015 issue)
                Volume: 11
                Issue: 9
                eLocation ID: e1004523
                Affiliations
                [1] Department of Neuroscience, Columbia University, New York, New York, United States of America
                [2] Department of Psychology, Columbia University, New York, New York, United States of America
                [3] Department of Psychiatry, Columbia University, New York, New York, United States of America
                Oxford University, United Kingdom
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: GJ VPF HST. Performed the experiments: GJ FM YA. Analyzed the data: GJ FM. Contributed reagents/materials/analysis tools: GJ. Wrote the paper: GJ FM YA VPF HST.

                Article
                Manuscript: PCOMPBIOL-D-15-00744
                DOI: 10.1371/journal.pcbi.1004523
                PMC: 4583549
                PMID: 26407227
                381f1cd2-aa32-4731-b020-0c7fef524d16
                Copyright © 2015

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 6 May 2015
                Accepted: 24 August 2015
                Page count
                Figures: 7, Tables: 1, Pages: 27
                Funding
                This work was supported by the US National Institute of Mental Health (http://www.nimh.nih.gov/), grant number 5R01MH081153, awarded to VPF and HST. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Custom metadata
                All relevant data are within the paper and its Supporting Information files.

                Quantitative & Systems biology
