      Is Open Access

      Having multiple selves helps learning agents explore and adapt in complex changing worlds

      research-article


          Significance

          Adaptive agents must continually satisfy a range of distinct and possibly conflicting needs. In most models of learning, a monolithic agent tries to maximize one value that measures how well it balances its needs. However, this task is difficult when the world is changing and needs are many. Here, we considered an agent as a collection of modules, each dedicated to a particular need and competing for control of action. Compared to the standard monolithic approach, modular agents were much better at maintaining homeostasis of a set of internal variables in simulated environments, both static and changing. These results suggest that having “multiple selves” may represent an evolved solution to the universal problem of balancing multiple needs in changing environments.

          Abstract

          Satisfying a variety of conflicting needs in a changing environment is a fundamental challenge for any adaptive agent. Here, we show that designing an agent in a modular fashion as a collection of subagents, each dedicated to a separate need, powerfully enhanced the agent’s capacity to satisfy its overall needs. We used the formalism of deep reinforcement learning to investigate a biologically relevant multiobjective task: continually maintaining homeostasis of a set of physiologic variables. We then conducted simulations in a variety of environments and compared how modular agents performed relative to standard monolithic agents (i.e., agents that aimed to satisfy all needs in an integrated manner using a single aggregate measure of success). Simulations revealed that modular agents a) exhibited a form of exploration that was intrinsic and emergent rather than extrinsically imposed; b) were robust to changes in nonstationary environments; and c) scaled gracefully in their ability to maintain homeostasis as the number of conflicting objectives increased. Supporting analysis suggested that the robustness to changing environments and to increasing numbers of needs was due to the intrinsic exploration and efficiency of representation afforded by the modular architecture. These results suggest that the normative principles by which agents have adapted to complex changing environments may also explain why humans have long been described as consisting of “multiple selves.”
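
The contrast between monolithic and modular agents described in the abstract can be illustrated with a minimal sketch. The abstract does not state the reward function or how modules compete for control, so everything specific here is an assumption: the per-need reward is taken to be the negative squared deviation from a set point, the monolithic agent's aggregate measure is taken to be the sum of those rewards, and modules are arbitrated by summing their action values and choosing the highest-scoring action. The function names (`homeostatic_rewards`, `monolithic_reward`, `modular_action`) are hypothetical, and plain arrays stand in for the deep networks used in the study.

```python
import numpy as np


def homeostatic_rewards(stats, setpoints):
    """Per-need reward (assumed form): negative squared deviation from the set point."""
    stats = np.asarray(stats, dtype=float)
    setpoints = np.asarray(setpoints, dtype=float)
    return -(stats - setpoints) ** 2


def monolithic_reward(stats, setpoints):
    """Single aggregate measure of success (assumed to be the sum over all needs)."""
    return float(homeostatic_rewards(stats, setpoints).sum())


def modular_action(q_values):
    """Pick an action from per-module action values.

    q_values has shape (n_modules, n_actions): each row is one module's
    estimate of how good every action is for its own need. Summing the rows
    and taking the argmax is an assumed arbitration rule, not one stated
    in the abstract.
    """
    q_values = np.asarray(q_values, dtype=float)
    return int(np.argmax(q_values.sum(axis=0)))


# Two needs, both one unit away from their set points.
per_need = homeostatic_rewards([1.0, 3.0], [2.0, 2.0])   # one reward per module
aggregate = monolithic_reward([1.0, 3.0], [2.0, 2.0])    # one scalar for the whole agent

# Two modules scoring three actions; the modules disagree on the best action.
chosen = modular_action([[1.0, 0.0, 0.5],
                         [0.0, 0.2, 0.9]])
```

Under these assumptions, each module learns from its own entry of `per_need`, whereas the monolithic agent sees only `aggregate`. In the example, the first module prefers action 0 and the second prefers action 2, and the summed values select action 2.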


                Author and article information

                Contributors
                Journal
                Proc Natl Acad Sci U S A
                PNAS
                Proceedings of the National Academy of Sciences of the United States of America
                National Academy of Sciences
                0027-8424
                1091-6490
                3 July 2023
                11 July 2023
                3 January 2024
                Volume: 120
                Issue: 28
                Article: e2221180120
                Affiliations
                [1] Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
                [2] Department of Computer Science, Princeton University, Princeton, NJ 08544
                Author notes
                1 To whom correspondence may be addressed. Email: zdulberg@princeton.edu.

                Edited by Marcus Raichle, Washington University in St Louis School of Medicine, St. Louis, MO; received December 13, 2022; accepted May 9, 2023

                Author information
                https://orcid.org/0000-0001-8216-1999
                https://orcid.org/0000-0003-2316-0763
                Article
                202221180
                10.1073/pnas.2221180120
                10334746
                37399387
                2281e6d5-1d8f-442a-a490-485ad87e21f0
                Copyright © 2023 the Author(s). Published by PNAS.

                This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

                History
                Received: 13 December 2022
                Accepted: 09 May 2023
                Page count
                Pages: 12, Words: 9745
                Funding
                Funded by: John Templeton Foundation (JTF), FundRef 100000925;
                Award ID: 61454
                Award Recipients: Zack Dulberg, Jonathan D. Cohen
                Funded by: DOD | USN | Office of Naval Research (ONR), FundRef 100000006;
                Award ID: N00014-22-1-2002
                Award Recipients: Zack Dulberg, Jonathan D. Cohen
                Categories
                video, Video
                research-article, Research Article
                psych-bio, Psychological and Cognitive Sciences
                comp-sci, Computer Sciences
                431
                411
                Biological Sciences
                Psychological and Cognitive Sciences
                Physical Sciences
                Computer Sciences

                reinforcement learning, modularity, conflict, multiobjective decision-making, exploration
