      Cohort Profile: The NSPN 2400 Cohort: a developmental sample supporting the Wellcome Trust NeuroScience in Psychiatry Network

      research-article


          Abstract

Why was the cohort set up?

Mental and substance use disorders are the leading cause of years lived with disability worldwide. 1 Other than childhood developmental disorders and neurodegenerative dementias of the elderly, most mental health disorders first manifest in the second and third decades of life, during which the highest proportion of total disability-adjusted life years occurs because of their enormous impact on normal adolescent and young adult functioning; 1 non-syndromal abnormalities can be identified far earlier in life. The human brain undergoes a range of normative developmental processes during this extended post-pubertal epoch, but the events that account for the massive increases in risk for mental health disorders remain obscure, a problem compounded by the questionable validity of current psychiatric nosology. Thus, the development of preventative or disease-modifying approaches remains a distant goal. Recent advances in applied neuroscience highlight three pathways of exploration for reconstructing nosology 2 : 1) studying the extent of variation in cognition and behaviour throughout the general population rather than comparing categories of mentally well and mentally ill; 2) investigating brain systems underlying emotion, cognition and behaviour; if these emerge from integration of activity over large-scale brain networks, it should be possible to mechanistically link the variation in psychological phenotypes with differences in underlying brain systems; 3) adopting a developmental perspective to understand optimal/suboptimal trajectories of neurocognition as early as possible within the high-risk period.
We aimed to link normal and psychopathological variation at the behavioural, cognitive and emotion level to phenotypic variation at the level of brain systems, subverting the traditional division between adult and child/adolescent psychiatry by measuring specified dimensions in healthy volunteers and patients in the age range of 14–24 years. The NSPN 2400 Cohort was established in July 2012 as a collaboration between the University of Cambridge and University College London, supported primarily by a strategic award from the Wellcome Trust.

Who is in the cohort?

The NSPN 2400 Cohort is a general population sample aged 14-24 years, conceived to support an accelerated longitudinal design to measure developmental change. This design involves recruitment of multiple, age-adjacent cohorts followed longitudinally for a limited period of time, which permits estimation of trajectories across a wider range of ages more quickly than a single-cohort longitudinal follow-up. 3 In addition to its efficiency, bias from attrition can be less problematic, given that dropout from cohorts is related to study duration. 4 The NSPN 2400 Cohort aimed to recruit at least 2000 participants in an age- and sex-stratified sample, including equal numbers of males and females for the following five age groups: 14-15, 16-17, 18-19, 20-21, and 22-24.99 years. Participants received a Home Questionnaire Pack (HQP) and Sociodemographic Questionnaire that assessed their mood, behaviour and wellbeing along with demographic characteristics. These were accompanied by an Oragene saliva sampling kit for DNA collection, which was returned to the study team by post together with the completed questionnaires. Two samples with more intensive measures are embedded within the NSPN 2400 Cohort (Figure 1).
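The stratified, accelerated longitudinal design described above can be sketched in a few lines. This is only an illustration: the half-open age bands, the two-year follow-up window and all function names are assumptions made for the example, not part of the study protocol.

```python
from itertools import product

# Age bands (years) from the cohort description, written here as half-open
# intervals [lower, upper); the 22-24.99 band becomes [22, 25).
AGE_BANDS = [(14, 16), (16, 18), (18, 20), (20, 22), (22, 25)]
SEXES = ("female", "male")

# Assumption for illustration: each participant is followed for about 2 years.
FOLLOW_UP_YEARS = 2.0


def strata():
    """Return every age-band x sex combination (the sampling strata)."""
    return list(product(AGE_BANDS, SEXES))


def ages_observed(band, follow_up=FOLLOW_UP_YEARS):
    """Age range one band can contribute once follow-up is included."""
    lower, upper = band
    return (lower, upper + follow_up)


def adjacent_bands_overlap(bands=AGE_BANDS, follow_up=FOLLOW_UP_YEARS):
    """For each neighbouring pair, does the younger band's followed-up
    age range reach into the older band's starting ages?"""
    spans = [ages_observed(b, follow_up) for b in bands]
    return [spans[i][1] > spans[i + 1][0] for i in range(len(spans) - 1)]


def total_coverage(bands=AGE_BANDS, follow_up=FOLLOW_UP_YEARS):
    """Overall age range covered by all bands after follow-up."""
    spans = [ages_observed(b, follow_up) for b in bands]
    return (min(s[0] for s in spans), max(s[1] for s in spans))


if __name__ == "__main__":
    print(len(strata()), "age-sex strata")
    print("ages covered:", total_coverage())
    print("neighbouring bands overlap:", adjacent_bands_overlap())
```

Under these assumptions, a short follow-up of overlapping age bands spans a much wider age range than the follow-up period itself, which is the efficiency argument made for the accelerated design.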
First, the ‘MRI cohort’ (N = 318) took part in in-unit assessments of brain structure and function using magnetic resonance imaging (MRI), as well as detailed behavioural assessments of cognitive and social cognitive function using computer-based evaluations, clinical assessments and IQ measures. Participants from each age-sex stratum were invited in equal numbers, in the order in which they had been recruited to the 2400 cohort (assumed to be random), until at least 30 from each stratum had been through the assessment. An additional sub-sample (N = 467) participated in the same computational tests of cognitive function and clinical assessments but without the MRI component. Again, these were recruited from the ten age-sex strata as for the MRI cohort, aiming for a sample size of at least 450 additional subjects with detailed cognition measurement and, including the MRI cohort, a total of 750 or more people with the cognitive assessments. This combined sub-sample with cognition measures (the ‘cognition cohort’) comprises 785 people, of whom 318 (the MRI cohort) have both MRI and cognition measurements. When resources for taking blood allowed, participants in both cohorts were asked to provide a venous blood sample for future genetic, epigenetic and gene expression analyses. The MRI and cognition cohorts were followed up on one or two occasions. By virtue of this design, there are participants who completed all three waves of the HQP as well as three in-unit assessments.

Figure 1 Predicted cascade sampling of study cohorts within the NSPN.

Recruitment

The NIHR Primary Care Research Network (PCRN) engaged 50 GPs to recruit young people from their age-sex registers by sending out invitations, including an expression of interest (EoI) form, across Cambridgeshire and Greater London (the areas closest to the universities leading the study). Schools and Further Education colleges were also engaged to distribute the EoI forms to 14- to 18-year-old participants.
The NSPN recruitment team assisted GPs and schools by providing invitation letters, which were forwarded to potential participants’ home addresses; these addresses remained unknown to the NSPN investigators. Purposive advertisement was also used during recruitment; invitation letters with an EoI form were sent to those who responded to advertisements and met the age criteria. Individuals who wanted to participate informed the NSPN recruitment team by phone or sent in a completed EoI form. The STROBE diagram (Figure 2) shows that an estimated 30,923 EoI forms were distributed within GP practices and schools, of which 4170 (13.5%) were returned to the NSPN recruitment team. Of these 4170 respondents, 3726 were eligible for further participation; the other 444 were not taken forward because their age-sex strata were already sufficiently populated. The Home Questionnaire Pack was sent to all 3726 eligible participants and returned by 65% of them (N = 2402), marking the baseline assessment stage of the NSPN 2400 Cohort.

Figure 2 STROBE diagram showing the recruitment stages of the NSPN 2400 cohort. EoI = expression of interest; HQP = home questionnaire pack.

How often have they been followed up?

The NSPN 2400 Cohort is predicated on an accelerated longitudinal design (Figure 2). Thus, each participant has been invited to provide data on at least two occasions (at baseline and follow-up 1) through the completion of HQPs and, ideally, on two occasions thereafter; the median interval (inter-quartile range) between return of the baseline and first follow-up questionnaires was 12 months (11-14 months), and between the second and third assessments was 13 months (12-16 months). Figure 2 shows that follow-up 1 yielded a 70% response rate and follow-up 2 a 47% response rate when compared with HQP baseline. In total, HQP data at three time points were obtained from 1134 participants (as of December 2016).
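The recruitment cascade reported above can be checked with simple arithmetic. The sketch below uses only counts quoted in the text and in Table 1 (the follow-up 1 total of 1684); the variable and function names are illustrative.

```python
# Counts quoted in the recruitment description and in Table 1.
EOI_DISTRIBUTED = 30_923   # estimated EoI forms sent out
EOI_RETURNED = 4_170       # EoI forms returned
ELIGIBLE = 3_726           # respondents sent the HQP
BASELINE_HQP = 2_402       # baseline HQPs returned
FOLLOW_UP_1_HQP = 1_684    # follow-up 1 total from Table 1
FOLLOW_UP_2_HQP = 1_134    # participants with data at three time points


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


cascade = {
    "EoI returned / distributed": pct(EOI_RETURNED, EOI_DISTRIBUTED),
    "HQP baseline / eligible": pct(BASELINE_HQP, ELIGIBLE),
    "follow-up 1 / baseline": pct(FOLLOW_UP_1_HQP, BASELINE_HQP),
    "follow-up 2 / baseline": pct(FOLLOW_UP_2_HQP, BASELINE_HQP),
}

# Respondents not taken forward because their strata were already full.
not_taken_forward = EOI_RETURNED - ELIGIBLE

if __name__ == "__main__":
    for stage, value in cascade.items():
        print(f"{stage}: {value}%")
    print("not taken forward:", not_taken_forward)
```

The computed values agree with the quoted 13.5%, 70% and 47%; the baseline return rate comes out at 64.5%, close to the 65% given in the text.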
The reasons for non-response could not be determined, as non-response simply meant that participants did not return the HQP; a minority of participants told us they did not want to take part further. Each HQP follow-up was separated by an interval, defined as the difference in days between the return date of the baseline HQP and the return date of the subsequent follow-up questionnaire. The median interval for HQP follow-up 1 was exactly 1 year (365 days) and the inter-quartile range (IQR) was 85 days. For HQP follow-up 2 the median was 2.25 years (823 days) and the IQR was 120 days. The median interval between HQP follow-up 1 and 2 was 1.1 years (405 days) and the IQR was 114.5 days. Sociodemographic characteristics of those who dropped out at each follow-up are provided in Supplementary Materials section 1. In general, no obvious biases were observed with regard to ethnicity, place of birth, parental qualification or sex among participants who did not complete follow-up questionnaires. Table 2 presents the number of participants at follow-up 1 for the cognition and MRI cohorts and the time lag between assessments.

Table 1 Calculation of participants falling within each quantile (Q) based on the number of days it took them to return the HQP at each wave of assessment

Wave | Total | Q1 | Q2 | Q3 | Q4
HQP baseline | N = 2402 | 0-10 days, N = 601 | 11-15 days, N = 647 | 16-24 days, N = 565 | 25-352 days, N = 590
HQP follow-up 1 | N = 1684 | 0-13 days, N = 464 | 14-21 days, N = 398 | 22-36 days, N = 403 | 37-793 days, N = 419
HQP follow-up 2 | N = 1134 | 0-13 days, N = 293 | 14-23 days, N = 271 | 24-35 days, N = 283 | 36-315 days, N = 287

The median number of days from the date the questionnaire was sent to the date it was returned was calculated for each HQP wave. For HQP baseline the median was 15 days and the IQR was 14 days. For HQP follow-up 1 the median was 21 days and the IQR was 23 days.
Finally, for HQP follow-up 2 the median was 23 days and the IQR was 22 days. Table 1 presents the number of NSPN 2400 Cohort participants falling within each quantile, using the 0.25, 0.50 and 0.75 quantile cut-offs.

Table 2 Participant number and time-lag calculation between baseline and follow-up 1 for the cognition and MRI cohorts

Cohort | IUA baseline | IUA follow-up 1 | Median time lag | Range
Cognition cohort | N = 785 | N = 568 | 18.0 months | 11.8-31.4 months
MRI cohort | N = 318 | N = 234 | 15.4 months | 11.7-28.0 months

Cognition cohort retention was 72% and MRI cohort retention was 74%. IUA, in-unit assessment.

A Microsoft Access-based Cohort Management System (CMS) was devised to store identifiable data (held on secure, password-protected University of Cambridge servers in accordance with the Data Protection Act (1998)). Upon completion of relevant assessments, data for each participant were recorded in or transferred to a database using the Research Electronic Data Capture (REDCap) software. 5 Following successful transfer and quality checks, data were released in anonymised form for manipulation and analysis to any researcher approved by the Principal Investigators.

What has been measured?

Table 3 below lists the self-report instruments included in the Home Questionnaire Pack (HQP) to measure common mental health constructs, focusing on mood, behaviour and general well-being. The Sociodemographic Questionnaire (SQ) was built primarily to mirror questions asked in the 2011 public census, in order to define participants’ family characteristics such as ethnicity, highest maternal and/or paternal qualification, current postcode and employment status. If a participant was under the age of 18, parental consent was sought for them to participate in the study and complete the HQP. The SQ was completed by the parent if the participant was under age.
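As an aside on the follow-up statistics, the quartile binning of return times (Table 1) and the retention percentages (Table 2) can be reproduced roughly as follows. The return-delay values in the usage example are made up for illustration, and the "inclusive" quantile method is an assumption, since the text does not say how the cut-offs were computed.

```python
from collections import Counter
from statistics import median, quantiles


def quartile_summary(delays_in_days):
    """Bin return delays into quartiles and report median and IQR.

    Cut-offs are the 0.25, 0.50 and 0.75 quantiles of the delays
    (method="inclusive" is an assumption, not the study's stated method).
    """
    q1, q2, q3 = quantiles(delays_in_days, n=4, method="inclusive")

    def label(delay):
        if delay <= q1:
            return "Q1"
        if delay <= q2:
            return "Q2"
        if delay <= q3:
            return "Q3"
        return "Q4"

    counts = Counter(label(d) for d in delays_in_days)
    return {
        "cutoffs": (q1, q2, q3),
        "median": median(delays_in_days),
        "iqr": q3 - q1,
        "counts": dict(counts),
    }


def retention_pct(n_follow_up, n_baseline):
    """Retention as a whole-number percentage of the baseline sample."""
    return round(100 * n_follow_up / n_baseline)


if __name__ == "__main__":
    # Hypothetical return delays (days), for illustration only.
    example_delays = [1, 2, 3, 4, 5, 6, 7, 8]
    print(quartile_summary(example_delays))
    # Counts taken from Table 2.
    print("cognition cohort retention:", retention_pct(568, 785), "%")
    print("MRI cohort retention:", retention_pct(234, 318), "%")
```

With the Table 2 counts, the retention helper gives 72% for the cognition cohort and 74% for the MRI cohort, matching the figures quoted with the table.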
Table 3 List of measures available in each NSPN cohort

NSPN 2400 cohort

Measure | HQP baseline | HQP follow-up 1 | HQP follow-up 2
Moods and Feelings Questionnaire 1 | X | X | X
Revised Children’s Manifest Anxiety Scale+ | X | X | X
Leyton Obsessional Inventory 3 | X | X | X
The Antisocial Behaviours Checklist ^a | X | X | X
Rosenberg Self-Esteem Scale 4 | X | X | X
Life Events Questionnaire 5 | X | X | X
Kessler Psychological Distress Scale 6 | X | X | X
Antisocial Process Screening Device 7 | X | X | X
Child and Adolescent Disposition Scale 8 | X | X | X
Drugs Alcohol and Self Injury ^a | X | X | X
Schizotypal Personality Questionnaire 9 | X | X | X
Warwick Edinburgh Mental Well-being Scale 10 | X | X | X
Inventory of Callous-Unemotional Traits 11 | X | X | X
Barratt Impulsive Scale 12 | X | X | X
Family Assessment Device (General Family Functioning subscale) 13 | X | X | X
Friendship Questionnaire 14, ^a | X | X | X
Alabama Parenting Questionnaire 15 | X | X | –
Measure of Parenting Style 16 | X | X | –
Positive Parenting Questionnaire ^a | X | X | –
Affective Personalities Questionnaire 7, 18, ^a | – | – | X
Reflective Function Questionnaire 19 | – | – | X
Sociodemographic Questionnaire ^a | X | X ^b | X ^b
Padua Inventory – Washington State University Revision 20 | – | X | X

Cognition cohort

Measure | IUA baseline | IUA follow-up 1
Cognitive battery module | |
Orthogonalized Go-NoGo task 21 | X | X
Roulette task 22 | X | X
Human Approach-Avoidance task 23 | X | X
Information Gathering task 24 | X | X
Two-step task 25 | X | X
Delegated Intertemporal Discounting task 26 | X | X
Investor-Trustee task 27, 28 | X | X
Subjective Well-being task 29 | X | X
Clinical assessment module | |
Edinburgh Handedness Inventory 30 | X | –
Child Trauma Questionnaire 31 | X | X
Tanner Puberty Scale 32 | X | X
Hormone Question Sheet ^a | X | X
Wechsler Abbreviated Scale of Intelligence (WASI) 33 | X | X
Height, weight, waist circumference ^a | X | X
Self-report of youth behaviour 34 | – | X
Snaith Hamilton Pleasure Scale 35 | – | X
Obsessive Compulsive Inventory Revised 36 | – | X
SCID 1 (Depression, Suicidal, Mania, Substance Use) 37 | X | X
SCID 2 (PLIKS: Unusual experience, Hallucination) 37 | X | X
SCID 3 (PLIKS: Delusions) 37 | X | X
SCID 4 (Others) 37 | X | X

Measures for the MRI and cognition cohorts are split in Table 3 to reflect the modular approach to in-unit assessments. Detailed descriptions of both the cognitive task battery and the MRI acquisitions are provided in Supplementary Materials section 2. Figure 3 shows, as an example, the number of participants in each age bin who completed the Moods and Feelings Questionnaire (MFQ) as part of the HQP.

Figure 3 Illustration of number of participants who completed the Moods and Feelings Questionnaire (MFQ) within each age group at each stage of recruitment (as of October 2016). Italicised N values indicate the total number of participants for each age group. Abbreviations: MFQ = Moods and Feelings Questionnaire.

What has it found? Key findings and publications

The NSPN 2400 Cohort representativeness

To assess the representativeness of the NSPN 2400 Cohort relative to the England & Wales youth population, five sociodemographic characteristics were compared with data from the 2011 census extracted from the labour market tables produced by the Office for National Statistics (data queries were run on www.nomisweb.co.uk). Detailed explanation and figures can be found in Supplementary Materials section 3. In summary, the NSPN Cohort: 1) broadly matched the ethnic composition of the general population of England & Wales, with mixed and Asian groups slightly over-represented; 2) closely resembled the England & Wales population structure in the proportion of UK vs. non-UK births; 3) had parents who were more likely to have completed qualifications, which translates to an almost 10% difference in achieving Level 1 to 4 qualifications when compared with England & Wales.
The percentage of vocational qualifications achieved was very similar; 4) on average across the ages there were 5% more females and 5% fewer males compared to England & Wales; 5) an under-representation within the lowest 1st decile and an over-representation within the 9th highest decile was observed when compared to the distribution of Index of Multiple Deprivation 6 ranks in England. The remaining deciles are broadly comparable to England.

Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome

As an example of the kind of work linking the cohort with the biological measures in the sub-groups, we have studied developmental changes in the cerebral cortex. We found, consistently in two MRI cohorts, that human brain changes in adolescence were concentrated on the more densely connected hubs of the connectome. These particularly well connected regions were located in association cortex, parts of the brain that support higher order cognitive and social processing. At age 14, hub regions had lower magnetisation transfer (MT) than other cortical areas, indicating lower myelin content, but had greater increases in this measure during the 14 to 24 year period. This suggests that cortical hubs have more prolonged myelination than the rest of the cortex. This topologically focused process of cortical consolidation was associated with expression of genes enriched for normal synaptic and myelin-related processes and risk of schizophrenia. We conclude that consolidation of anatomical network hubs could be important for normal and potentially different for clinically disordered adolescent brain development. 7

Gene transcription profiles associated with inter-modular hubs and connection distance in human functional magnetic resonance imaging networks

Human functional magnetic resonance imaging (fMRI) brain networks have a complex topology comprising integrative components, e.g.
long-distance inter-modular edges that are theoretically associated with higher biological cost. We estimated intra-modular degree, inter-modular degree and connection distance for each of 285 cortical nodes in multi-echo fMRI data from 38 healthy adults and matched our neuroimaging data with openly available transcriptomic expression measures of more than 20,000 genes. We showed that nodes in superior and lateral cortex with high inter-modular degree and long connection distance had local transcriptional profiles enriched for oxidative metabolism and mitochondria, and for genes specific to supragranular layers of human cortex. In contrast, primary and secondary sensory cortical nodes in posterior cortex with high intra-modular degree and short connection distance had transcriptional profiles enriched for RNA translation and nuclear components. We conclude that topologically integrative hubs, mediating long-distance connections between modules, are more costly in terms of mitochondrial glucose metabolism. 8

Impulsivity and peer influence study

This was the first study analysing data from the cognition cohort. We found that inter-temporal discounting, 9 a standard measure of impulsivity in animal and human research, was subject to peer influence even if social or monetary rewards did not motivate participants. Participants shifted their level of impulsivity towards that of experimental ‘partners’ depending on two key characteristics: first, how relevant they felt their partner’s observed choices were; and second, how certain they were about their own tastes in the matter. 10

What are the main strengths and weaknesses?

Strengths

To our knowledge, the NSPN 2400 Cohort is the first to combine behavioural, cognitive and neuroimaging measures to study the normative development of well-being and mental health in an adolescent/young adult cohort representing the England and Wales general population.
Despite the NSPN 2400 being a volunteer sample, we demonstrated that it is broadly representative of England & Wales youth; therefore, it is reasonable to generalise research findings to a wider population. The accelerated longitudinal design will allow estimation of developmental growth curves describing how self-report, cognitive or MRI measures change as a function of chronological age and gender, and so sketch the developmental trajectory of mental health. To do this, mixed effects models will be used to analyse outcome data, using fixed and random effects for linear and quadratic terms for age, with stratification by gender, given that differences between boys and girls are accepted within the relevant literature. Another strength is a relatively good retention rate in the study, particularly at the first follow-up. The currently reported 47% retention rate for the second follow-up may increase as data collection continues.

Weaknesses

A paradoxical weakness is that participants were volunteers for an intensive study, albeit drawn from a randomly selected population, and volunteers are a unique population, especially psychologically. This sampling bias is perhaps evident in that participants were from families with higher parental educational attainment than the general population. This potentially means that younger participants in particular were encouraged to take part by parents particularly aware of the importance of research. That said, many participants are older and more autonomous. Unfortunately, we did not seek ethical committee approval to collect information on people who expressed interest in the study but did not subsequently consent to take part. Furthermore, we were not able to obtain accurate estimations of the population-based sampling frame (e.g.
numbers of people in age-sex GP registers) from the PCRN, and we attempted to follow at two years only those we had measured at the one-year follow-up, as is standard in an accelerated design. Another limitation is that we have no information on the important period of change before the age of 14 years; we intend this to be the focus of further work. Despite best efforts, 53% attrition also means that we do not have longitudinal information on every participant, which decreases our power to detect long-term effects and introduces bias. Finally, the cohort is, by design, yet to live through the main period of risk for incident mental illness. Thus, the current emphasis is on characterising developmental styles and variations in the quantitative behavioural, cognitive and neural domains included in the study. It will be some time before the participants are at an age when the full implications of these differences will be known in terms of risk of conventional diagnostic categories. However, the intention is to describe and model developmental processes that transcend these unsatisfactory concepts.

Can I get hold of the data? Where can I find out more?

The study is committed to open science, with the aim of making the anonymised dataset fully available to the research community. The participants have consented to their de-identified data being made available to other researchers. The first step has been to define a concise application process that establishes the bona fides of those making the request, accessible by email to openNSPN@medschl.cam.ac.uk. Requests are reviewed by the investigators. Second, data sets used for all publications involving NSPN are available at URLs to be included in the publication. Finally, the study aspires to making data publicly available. This publication is based on data at https://doi.org/10.17863/CAM.12547.
A process has begun involving participants themselves, ethicists, the funder, lawyers and experts in informatics and research governance in order to establish a framework in which to move as far as possible towards that aspiration.

Profile in a nutshell

The NSPN 2400 Cohort was established to link normal and psychopathological variation at the behavioural, cognitive and emotion level to phenotypic variation at the level of brain systems, subverting the unhelpful division between adult and child/adolescent psychiatry by measuring specified dimensions in healthy volunteers in the age range of 14–24 years. Participants were recruited in 2012 from Greater London and Cambridgeshire and are broadly representative of England & Wales. Self-reported behavioural data are available at three time-points, with a questionnaire return rate of 70% at one-year follow-up and 47% at two years when compared with the baseline participant number of N = 2402. The retention rate at follow-up 1 is 72% for cognitive battery data and 74% for MRI data, with baseline data points for 785 and 318 participants respectively. The NIHR Cambridge BioResource extracted and stores DNA from 2087 saliva samples. Part of each sample will be genotyped using the UK Biobank Axiom Array. This comprises 820,967 genetic markers designed for three domains: markers of specific interest, rare coding variants, and genome-wide coverage. The NSPN 2400 Cohort (measures of mental well-being, demographics and DNA), Cognitive cohort (cognitive task measures and clinical assessment) and MRI cohort (structural and functional imaging measures) data will be accessible for collaboration upon agreement with the principal investigators. Enquiries should be submitted to openNSPN@medschl.cam.ac.uk.

Funding

This study was supported by the Neuroscience in Psychiatry Network, a strategic award from the Wellcome Trust to the University of Cambridge and University College London (095844/Z/11/Z).
Additional support was provided by the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre, the NIHR Collaboration for Leadership in Applied Health Research & Care East of England, and the Medical Research Council (MRC)/Wellcome Trust Behavioural and Clinical Neuroscience Institute.

Supplementary Material

Supplementary Data

          Most cited references (23)


          The SPQ: a scale for the assessment of schizotypal personality based on DSM-III-R criteria.

          A Raine (1990)
          Existing self-report measures of schizotypal personality assess only one to three of the nine traits of schizotypal personality disorder. This study describes the development of the Schizotypal Personality Questionnaire (SPQ), a self-report scale modeled on DSM-III-R criteria for schizotypal personality disorder and containing subscales for all nine schizotypal traits. Two samples of normal subjects (n = 302 and n = 195) were used to test replicability of findings. The SPQ was found to have high sampling validity, high internal reliability (0.91), test-retest reliability (0.82), convergent validity (0.59 to 0.81), discriminant validity, and criterion validity (0.63, 0.68), findings which were replicated across samples. Fifty-five percent of subjects scoring in the top 10 percent of SPQ scores had a clinical diagnosis of schizotypal personality disorder. Thus, the SPQ may be useful in screening for schizotypal personality disorder in the general population and also in researching the correlates of individual schizotypal traits.

            The rupture and repair of cooperation in borderline personality disorder.

            To sustain or repair cooperation during a social exchange, adaptive creatures must understand social gestures and the consequences when shared expectations about fair exchange are violated by accident or intent. We recruited 55 individuals afflicted with borderline personality disorder (BPD) to play a multiround economic exchange game with healthy partners. Behaviorally, individuals with BPD showed a profound incapacity to maintain cooperation, and were impaired in their ability to repair broken cooperation on the basis of a quantitative measure of coaxing. Neurally, activity in the anterior insula, a region known to respond to norm violations across affective, interoceptive, economic, and social dimensions, strongly differentiated healthy participants from individuals with BPD. Healthy subjects showed a strong linear relation between anterior insula response and both magnitude of monetary offer received from their partner (input) and the amount of money repaid to their partner (output). In stark contrast, activity in the anterior insula of BPD participants was related only to the magnitude of repayment sent back to their partner (output), not to the magnitude of offers received (input). These neural and behavioral data suggest that norms used in perception of social gestures are pathologically perturbed or missing altogether among individuals with BPD. This game-theoretic approach to psychopathology may open doors to new ways of characterizing and studying a range of mental illnesses.

              Go and no-go learning in reward and punishment: Interactions between affect and effect

              Introduction Optimal decision-making requires choices that maximize reward and minimize punishment. Animals are endowed with two broad classes of mechanisms to achieve this optimization. Firstly, hard-wired, or Pavlovian, policies directly tie affectively important outcomes, together with learned predictions of those outcomes, to valence-dependent stereotyped behavioral responses. Secondly, a more flexible, instrumental, controller learns choices on the basis of contingent consequences (Dickinson and Balleine, 2002). These controllers generally favor the same choices, thereby rendering learning fast and efficient. However, their underlying workings are best revealed by striking sub-optimalities that ensue when they come into opposition (Boureau and Dayan, 2011; Breland and Breland, 1961; Dayan et al., 2006). One abundant source of sub-optimalities is the substantial interdependence of two logically independent axes of behavioral control (Boureau and Dayan, 2011; Cools et al., 2011; Gray and McNaughton, 2000; Niv et al., 2007): a valence axis running from reward to punishment, and an action axis running from vigor to inhibition. Pavlovian responses associated with predictions of reward usually entail vigorous active approach and engagement (Gray and McNaughton, 2000), irrespective of the instrumental validity of these actions. Equally, Pavlovian responses to (at least distal possible) punishments are generally associated with behavioral inhibition (Blanchard and Blanchard, 1988; Gray and McNaughton, 2000; Soubrie, 1986). The functional architecture of the basal ganglia, a region known to support instrumental control, reflects the same association between affect and effect. For example, the so-called “direct pathway” promotes go choices in light of provided rewards while the “indirect pathway” promotes no-go choices in light of foregone rewards (Frank and Fossella, 2011; Gerfen, 1992). 
Further, the same dual association may also be expressed within ascending monoaminergic systems (Cools et al., 2011). Thus, the dopaminergic system is involved in generating active motivated behavior (Berridge and Robinson, 1998; Niv et al., 2007; Salamone et al., 2007) and instrumental learning through reward prediction errors (Schultz et al., 1997). On the other hand, the serotonergic system seems to be more closely affiliated with behavioral inhibition in aversive contexts (Crockett et al., 2008; Dayan and Huys, 2009; Soubrie, 1986). Previous human studies on instrumental learning and decision making have generally exploited a conventional coupling between reward and go choices (e.g. Frank et al., 2004; O'Doherty et al., 2004). By contrast, various groups, including ourselves, have taken a different approach to decision-making, using tasks that fully orthogonalize action and valence in a balanced 2 (reward/punishment) × 2 (go/no-go) design (Crockett et al., 2009; Guitart-Masip et al., 2011). These latter tasks reveal that Pavlovian value expectations can disrupt instrumental performance, with anticipation of punishment impairing active go responses. However, the studies concerned considered steady-state behavior in a stable world, and did not examine learning. This is a critical omission, since the interaction between action and valence could boost, or indeed prevent learning altogether, and since the neural substrates of acquisition and maintenance could be quite different — as indeed has been claimed for action learning (Atallah et al., 2007; Everitt et al., 2008). Here, we designed a variant of our previous task (Guitart-Masip et al., 2011) to examine Pavlovian influences on instrumental learning of go and no-go choices to maximize gains and minimize losses. 
This question has generally been studied using Pavlovian to instrumental transfer paradigms involving separate Pavlovian and instrumental training phases prior to a transfer phase in which the effects of Pavlovian stimuli on instrumental performance are tested in extinction (Cardinal et al., 2002; Huys et al., 2011; Parkinson et al., 1999; Talmi et al., 2008). Our task, instead, involves the instrumental learning of active and passive choices (go or no-go) in contexts where either wins or losses are probabilistically realized. Therefore, the expected Pavlovian effects are incidental. The task structure allows a detailed computational analysis of Pavlovian and instrumental influences during learning while retaining the orthogonalization of reward/punishment and go/no-go of our original task. We hypothesized that learning of the optimal action choice (go or no-go) would be affected by the value of the choice outcomes. This would result from an interference arising out of state values or the expected value generated by the fractal images (Pavlovian controller) on the learned instrumental choice values for go and no-go options (Instrumental controller). In our task action and state values are indistinguishable from each other using fMRI because these values are highly correlated in some of the conditions. However, we envisaged that the neural correlates of action values for go and no-go choices would be affected by the states in which these actions are required. We expected that action values for go and no-go choices would be differentially expressed in the win and avoid losing conditions in brain areas implicated in the realization of a behavioral interaction between action and valence in our task. We surmised that this interaction should be evident in the striatum and amygdala, guided by previous studies implicating such regions in Pavlovian influences on instrumental choice (Cardinal et al., 2002; Parkinson et al., 1999; Talmi et al., 2008). 
Finally, as we also observed a value independent action bias in choices, we predicted that brain areas involved in inhibiting prepotent responses such as the inferior frontal gyrus (Aron and Poldrack, 2006; Robbins, 2007) would be involved in no-go performance. In accordance with previous accounts of the involvement of the striatum and SN/VTA in instrumental learning, we show that the magnitude of activity in striatum and SN/VTA parametrically tracked instrumental action values. Critically, we show that the sign of relationship between action value and striatal and SN/VTA activity depended on the vigor status of the behavioral choice, being positive for go actions and negative for no-go actions. For instance, a larger expected reward for a no-go action was coupled to less activity in both striatum and SN/VTA, whereas a larger expected reward for a go action was coupled to more activity in the same structures. Moreover, we exploited the fact that a significant subset of participants did not acquire accurate instrumental responses for all conditions to characterize the differential neural responses in those participants that showed successful instrumental performance in our task. Materials and methods Subjects 47 adults participated in the experiment (28 females and 19 males; age range 18–35 years; mean 23.1, SD = 4.1 years). 17 subjects performed the experiment outside, and 30 subjects, inside the scanner. All participants were healthy, right-handed and had normal or corrected-to-normal visual acuity. None of the participants reported a history of neurological, psychiatric or any other current medical problems. All subjects provided written informed consent for the experiment, which was approved by the local ethics board (University College London, UK). 
Experimental design and task
We used a modified version of an experimental design we previously employed to disentangle the effects of action and valence in anticipatory responses in the striatum and the SN/VTA post learning (Guitart-Masip et al., 2011). Here we address the learning of state-action contingencies. Each trial consisted of three events: a fractal cue, a target detection task and a probabilistic outcome. The trial timeline is displayed in Fig. 1. In each trial, subjects saw one of four abstract fractal cues for 1000 ms. The fractal cues indicated whether a participant would subsequently be required to perform a target detection task by emitting a button press (go) or not (no-go). The fractal also instructed subjects as to the possible valence of any outcome consequent on the subject's behavior (reward/no reward or punishment/no punishment). The meaning of the fractal images was randomized across participants. Following a variable interval (250–2000 ms) after offset of the fractal image, the target detection task commenced. The target was a circle displayed on one side of the screen for 1500 ms. Participants had 1000 ms in which to indicate, via a key press, the side on which the target was presented. If they chose to do so, and if they chose the correct side, the response was classified as “go”. 1000 ms after the offset of the circle, subjects were presented with the outcome. The outcome remained on screen for 1000 ms: a green arrow pointing upwards indicated a win of £1, a red arrow pointing downwards indicated a loss of £1, and a yellow horizontal bar indicated no win or loss. The outcome was probabilistic: in win trials, 80% of correct choices and 20% of incorrect choices were rewarded (the remaining 20% of correct and 80% of incorrect choices led to no outcome), while in lose trials 80% of correct choices and 20% of incorrect choices avoided punishment.
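The probabilistic outcome schedule described above can be written as a small sampling function. This is a minimal sketch rather than the authors' task code: the function names `sample_outcome` and `expected_outcome` and the string labels `"win"`/`"lose"` are illustrative assumptions.

```python
import random


def sample_outcome(valence, correct, rng=random):
    """Draw the monetary outcome (+1, 0 or -1 pounds) for one trial.

    valence: "win" or "lose" (label names are assumptions, not from the paper).
    correct: True if the choice (go or no-go) matched the required action.
    """
    p_good = 0.8 if correct else 0.2      # chance of the better of the two outcomes
    good = rng.random() < p_good
    if valence == "win":
        return 1 if good else 0           # reward vs. no outcome
    if valence == "lose":
        return 0 if good else -1          # punishment avoided vs. punishment
    raise ValueError("valence must be 'win' or 'lose'")


def expected_outcome(valence, correct):
    """Expected outcome (in pounds) for a given valence and choice correctness."""
    if valence not in ("win", "lose"):
        raise ValueError("valence must be 'win' or 'lose'")
    p_good = 0.8 if correct else 0.2
    return p_good if valence == "win" else -(1.0 - p_good)
```

Under this schedule a participant who always chooses correctly still expects a small loss (about £0.20 per trial) in the lose conditions, so the corresponding cues keep a negative expected value throughout learning.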
Thus, there were 4 trial types depending on the nature of the fractal cue presented at the beginning of the trial: press the correct button in the target detection task to gain a reward (go to win); press the correct button in the target detection task to avoid punishment (go to avoid losing); do not press a button in the target detection task to gain a reward (no-go to win); do not press a button in the target detection task to avoid punishment (no-go to avoid losing). Unlike Guitart-Masip et al. (2011), in the current experiment, subjects were not verbally instructed about the action contingencies for each fractal image and had to learn them by trial and error. Participants were instructed that the correct choice for each fractal image could be either go or no-go. They were also instructed about the probabilistic nature of the task. Those participants that performed the task inside the scanner learned the task contingencies as they were being scanned. Our task separated instrumental responses (go and no-go choices to the targets) from the fractal images that indicate action requirements and outcome valence in order to dissociate anticipatory brain responses from responses elicited by execution of an actual motor response. However, unlike our previous experiment (Guitart-Masip et al., 2011), in the current experiment all trials included both a target detection task and outcome delivery. This decreased power for detecting changes in BOLD responses uniquely associated with action anticipation, but ensured that the learning process was not confounded by any attempt to decorrelate these two factors. The anticipatory response of an action before actual execution of any motor component involves action invigoration and is likely to be associated with the deployment of cognitive resources (attention and sensory process) that allow a directing effect on the specific action being prepared. 
This dual association of motoric and cognitive components that interact to sculpt a motor response is a general mechanism that allows adaptive interactions with the environment. Assessing the extent to which invigoration of action, and the associated deployment of distinct cognitive resources, can be attributed specifically to the observed anticipatory responses in the midbrain/basal ganglia network goes beyond the immediate goals and scope of the present study. The task included 240 trials (60 trials per condition) and was divided into four 9 min sessions (15 trials per condition in each). Subjects were told that they would be paid their earnings of the task up to a total of £35. Before starting with the learning task, subjects did 20 trials of the target detection task in order to get familiarized with the speed requirements.
Behavioral data analysis
The behavioral data were analyzed using the statistics software SPSS, version 16.0. The number of correct choices in the target detection task (correct button press for go conditions and correct omission of responses in no-go trials) was collapsed across time bins of 10 trials per condition. These measures were analyzed with a three way repeated-measures ANOVA with time, action (go/no-go) and valence (win/lose) as factors. In an initial analysis we also included group (inside the scanner/outside the scanner) as a between-subject factor.
Reinforcement learning models
We built six parameterized reinforcement learning models to fit to the behavior of the subjects. All the models assigned each action $a_t$ on trial $t$ a probability. This was based on an action weight $W(a_t, s_t)$ that depended on the stimulus on that trial, and which was passed through a squashed softmax (Sutton and Barto, 1998):
$$p(a_t \mid s_t) = \left[\frac{\exp\big(W(a_t \mid s_t)\big)}{\sum_{a'} \exp\big(W(a' \mid s_t)\big)}\right](1-\xi) + \frac{\xi}{2} \tag{1}$$
where $\xi$ was the irreducible noise, which was kept at 0 for one of the models (RW) but was free to vary between 0 and 1 for all other models.
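As an illustration of the squashed softmax in Eq. (1), the sketch below maps a pair of action weights to choice probabilities. The function name `action_probabilities` and the dictionary-based interface are assumptions made for this example. The ξ/2 term only normalizes correctly for two actions, so the function checks for that.

```python
import math


def action_probabilities(weights, xi):
    """Squashed softmax of Eq. (1) for a two-action (go / no-go) choice.

    weights: dict mapping each action to its weight W(a, s).
    xi:      irreducible noise in [0, 1]; xi = 0 gives a plain softmax.
    """
    if len(weights) != 2:
        raise ValueError("the xi / 2 term assumes exactly two actions")
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    w_max = max(weights.values())                      # subtract max for numerical stability
    exps = {a: math.exp(w - w_max) for a, w in weights.items()}
    z = sum(exps.values())
    return {a: (e / z) * (1.0 - xi) + xi / 2.0 for a, e in exps.items()}
```

With ξ = 1 both actions are chosen with probability 0.5 regardless of the weights, which is the sense in which ξ captures noise that learning cannot remove.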
The models further differed in terms of how the action weight was constructed. For models RW and RW + noise, $W(a,s) = Q(a,s)$, which was updated with a simple Rescorla–Wagner-like rule:
$$Q_t(a_t, s_t) = Q_{t-1}(a_t, s_t) + \varepsilon\big(\rho\, r_t - Q_{t-1}(a_t, s_t)\big) \tag{2}$$
where $\varepsilon$ was the learning rate. Reinforcements entered the equation through $r_t \in \{-1, 0, 1\}$ and $\rho$ was a free parameter that determined the effective size of reinforcements for a subject. For model RW(rew/pun) + noise + bias, the parameter $\rho$ could take on different values for the reward and punishment trials, but for all other models there was only one value of $\rho$ per subject. This meant that those models assumed that loss of a reward was as aversive as obtaining a punishment. The other models differed in the construction of the action weight in the following way. For model RW + noise + Q0, the initial Q value for the go action was a free parameter, while for all other models this was set to zero. For models that contained a bias parameter, the action weight was modified to include a static bias parameter $b$:
$$W_t(a, s) = \begin{cases} Q_t(a, s) + b & \text{if } a = \text{go} \\ Q_t(a, s) & \text{else} \end{cases} \tag{3}$$
For the model including a Pavlovian factor (RW + noise + bias + Pav), the action weight consisted of three components:
$$W_t(a, s) = \begin{cases} Q_t(a, s) + b + \pi V_t(s) & \text{if } a = \text{go} \\ Q_t(a, s) & \text{else} \end{cases} \tag{4}$$
$$V_t(s_t) = V_{t-1}(s_t) + \varepsilon\big(\rho\, r_t - V_{t-1}(s_t)\big) \tag{5}$$
where $\pi \geq 0$ was again a free parameter. Thus, for conditions in which feedback was in terms of punishments, the Pavlovian parameter inhibited the go tendency in proportion to the negative value $V(s)$ of the stimulus, while it similarly promoted the tendency to go in conditions where feedback was in terms of rewards.
Model fitting procedure
These procedures are identical to those used by Huys et al. (2011), but we repeat them here for completeness.
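Before the fitting details, the update rules in Eqs. (2)–(5) can be made concrete with a short sketch of the RW + noise + bias + Pav model. This is an illustrative reading of the equations, not the authors' code. The class name `PavlovianRW`, the action labels `"go"`/`"nogo"` and the zero initial values are assumptions for the example.

```python
import math


class PavlovianRW:
    """Sketch of the RW + noise + bias + Pav model (Eqs. 1-5)."""

    def __init__(self, epsilon, rho, xi, bias, pav):
        self.epsilon = epsilon   # learning rate (epsilon in Eqs. 2 and 5)
        self.rho = rho           # effective reinforcement size
        self.xi = xi             # irreducible noise
        self.bias = bias         # static go bias b
        self.pav = pav           # Pavlovian weight pi >= 0
        self.Q = {}              # (stimulus, action) -> instrumental value
        self.V = {}              # stimulus -> Pavlovian state value

    def p_go(self, s):
        """Probability of a go choice for stimulus s (Eqs. 1 and 4)."""
        w_go = self.Q.get((s, "go"), 0.0) + self.bias + self.pav * self.V.get(s, 0.0)
        w_nogo = self.Q.get((s, "nogo"), 0.0)
        softmax_go = 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))  # two-action softmax
        return softmax_go * (1.0 - self.xi) + self.xi / 2.0

    def update(self, s, a, r):
        """Update Q for the chosen action (Eq. 2) and V for the stimulus (Eq. 5)."""
        target = self.rho * r
        q = self.Q.get((s, a), 0.0)
        self.Q[(s, a)] = q + self.epsilon * (target - q)
        v = self.V.get(s, 0.0)
        self.V[s] = v + self.epsilon * (target - v)
```

For example, after a single rewarded no-go choice an agent with π = 0 becomes less likely to go, whereas an agent with π > 0 is pulled back towards going by the positive state value. This is the interference the model is meant to capture.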
For each subject, each model specified a vector of parameters $h$. We found the maximum a posteriori estimate of each parameter for each subject:
$$h_i = \arg\max_{h}\; p(A_i \mid h)\, p(h \mid \theta) \tag{6}$$
where $A_i$ comprised all actions by the $i$th subject. We assumed that actions were independent (given the stimuli, which we omit for notational clarity), and thus $p(A_i \mid h_i)$ factorized over trials, being a product of the probabilities in Eq. (1). The prior distribution over the parameters $p(h_i \mid \theta)$ mainly served to regularize the inference and prevent parameters that were not well-constrained from taking on extreme values. We set the parameters of the (factorized) prior distribution $\theta$, which consist of a prior mean $m$ and variance $v^2$, to the maximum likelihood given all the data by all the $N$ subjects:
$$\hat{\theta}^{ML} = \arg\max_{\theta}\; p(A \mid \theta) \tag{7}$$
$$\phantom{\hat{\theta}^{ML}} = \arg\max_{\theta} \prod_{i}^{N} \int \mathrm{d}^{N} h_i \; p(A_i \mid h_i)\, p(h_i \mid \theta) \tag{8}$$
where $A = \{A_i\}_{i=1}^{N}$ comprised all the actions by all the $N$ subjects and $\theta = \{m, v^2\}$ were the prior mean and variance. This maximization was approximately achieved by Expectation–Maximization (MacKay, 2003). We used a Laplace approximation for the E-step at the $k$th iteration:
$$p(h \mid A_i) \approx \mathcal{N}\big(h_i^{(k)}, \Sigma_i^{(k)}\big) \tag{9}$$
$$h_i^{(k)} = \arg\max_{h}\; p(A_i \mid h)\, p\big(h \mid \theta^{(k-1)}\big) \tag{10}$$
where $\mathcal{N}(\cdot)$ denotes a normal distribution and $\Sigma_i^{(k)}$ is the second moment around $h_i^{(k)}$. This resulted in the following updates for the group-level parameters $\theta = \{m, v^2\}$:
$$m^{(k)} = \frac{1}{N} \sum_i h_i^{(k)} \tag{11}$$
$$\big(v^{(k)}\big)^2 = \frac{1}{N} \sum_i \Big[\big(h_i^{(k)}\big)^2 + \Sigma_i^{(k)}\Big] - \big(m^{(k)}\big)^2 \tag{12}$$
Before inference, all parameters were suitably transformed to enforce constraints (log and inverse sigmoid transforms). All model fitting procedures were verified on surrogate data generated from a known decision process.
Model comparison
Models would ideally be compared by computing the posterior log likelihood $\log p(M \mid A)$ of each model $M$ given all the data $A$.
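As a brief aside on the fitting procedure, the group-level updates in Eqs. (11)–(12) and the parameter transforms can be sketched as follows. This is a simplified illustration for a single scalar parameter, not the authors' implementation. The function names are hypothetical, and pairing the log transform with positive parameters and the inverse sigmoid with parameters bounded in (0, 1) is an assumption, since the text does not list which transform was applied to which parameter.

```python
import math


def update_group_prior(h_map, sigma):
    """M-step for one scalar parameter, following Eqs. (11)-(12).

    h_map: per-subject MAP estimates h_i^(k) (in transformed space).
    sigma: per-subject Laplace variances Sigma_i^(k).
    Returns the updated prior mean m and variance v^2.
    """
    n = len(h_map)
    m = sum(h_map) / n
    v2 = sum(h * h + s for h, s in zip(h_map, sigma)) / n - m * m
    return m, v2


def logit(p):
    """Inverse sigmoid: maps a value in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Maps an unconstrained value back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_unconstrained(value, kind):
    """Transform a parameter before inference (assumed pairing, see lead-in)."""
    if kind == "positive":        # e.g. a parameter constrained to be > 0
        return math.log(value)
    if kind == "unit_interval":   # e.g. a parameter constrained to (0, 1)
        return logit(value)
    raise ValueError("kind must be 'positive' or 'unit_interval'")
```

The variance update adds each subject's posterior uncertainty to the spread of the point estimates, so poorly constrained subjects widen the prior rather than being treated as exact.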
As we had no prior on the models themselves (testing only models we believed were equally likely a priori), we instead examined the model log likelihood $\log p(A \mid M)$ directly. This quantity could be approximated in two steps. First, the integral over the hyperparameters was approximated using the Bayesian Information Criterion at the group level (Kass and Raftery, 1995):
$$\log p(A \mid M) = \log \int \mathrm{d}\theta \; p(A \mid \theta)\, p(\theta \mid M) \tag{13}$$
$$\phantom{\log p(A \mid M)} \approx -\tfrac{1}{2}\, BIC_{int} = \log p\big(A \mid \hat{\theta}^{ML}\big) - \tfrac{1}{2}\, |M| \log |A| \tag{14}$$
Importantly, however, $\log p(A \mid \hat{\theta}^{ML})$ was not the sum of individual likelihoods, but the integral over the individual parameters. We approximated this integral by sampling from the fitted priors:
$$\log p\big(A \mid \hat{\theta}^{ML}\big) = \sum_i \log \int \mathrm{d}h \; p(A_i \mid h)\, p\big(h \mid \hat{\theta}^{ML}\big) \tag{15}$$
$$\phantom{\log p\big(A \mid \hat{\theta}^{ML}\big)} \approx \sum_i \log \frac{1}{K} \sum_{k=1}^{K} p\big(A_i \mid h^{k}\big) \tag{16}$$
where $K$ was set to 1000 and $h^{k}$ were parameters drawn independently from the priors over the parameters $p(h \mid \hat{\theta}^{ML})$. These model comparison procedures were also verified on surrogate data generated from a known decision process. Comparing integrated BIC values is akin to a likelihood ratio test, and in fact can be shown to reduce to classical statistical tests for certain simple linear models (Kass and Raftery, 1995).
fMRI data acquisition
fMRI was performed on a 3-Tesla Siemens Allegra magnetic resonance scanner (Siemens, Erlangen, Germany) with echo planar imaging (EPI). Functional data were acquired in four scanning sessions containing 135 volumes with 41 slices, covering a partial volume that included the striatum and the midbrain (matrix: 128 × 128; 40 oblique axial slices per volume angled at −30° in the antero-posterior axis; spatial resolution: 1.5 × 1.5 × 1.5 mm; TR = 4100 ms; TE = 30 ms). This partial volume included the whole striatum, the SN/VTA, the amygdala, and the ventromedial prefrontal cortex. However, it excluded the medial cingulate cortex, the supplementary motor areas, the superior frontal gyrus, and the middle frontal gyrus.
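Returning briefly to the model comparison procedure, the sampling approximation in Eqs. (14)–(16) can be sketched as below. This is an illustrative outline rather than the authors' code: `integrated_bic`, `subject_likelihoods` and `sample_prior` are hypothetical names. Reading |M| as the number of fitted group-level parameters and |A| as the total number of choices follows the usual BIC convention and is an assumption here.

```python
import math
import random


def integrated_bic(subject_likelihoods, sample_prior, n_hyper, n_choices,
                   K=1000, rng=None):
    """Approximate BIC_int by sampling from the fitted prior (Eqs. 14-16).

    subject_likelihoods: one function per subject, mapping a parameter
                         draw h to the likelihood p(A_i | h).
    sample_prior:        function taking an rng and returning one draw h
                         from the fitted prior p(h | theta_ML).
    n_hyper:             |M|, assumed to be the number of group-level parameters.
    n_choices:           |A|, assumed to be the total number of choices.
    """
    rng = rng or random.Random(0)
    log_evidence = 0.0
    for likelihood in subject_likelihoods:
        # Monte Carlo estimate of the integral over individual parameters (Eq. 16).
        # For long choice sequences a log-sum-exp over log-likelihoods is safer.
        draws = [likelihood(sample_prior(rng)) for _ in range(K)]
        log_evidence += math.log(sum(draws) / K)
    # Rearranging Eq. (14): BIC_int = -2 log p(A | theta_ML) + |M| log |A|.
    return -2.0 * log_evidence + n_hyper * math.log(n_choices)
```

Lower values of the resulting score indicate a more parsimonious account of the data, because the penalty term grows with the number of group-level parameters.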
The fMRI acquisition protocol was optimized to reduce susceptibility-induced BOLD sensitivity losses in inferior frontal and temporal lobe regions (Weiskopf et al., 2006). Six additional volumes at the beginning of each series were acquired to allow for steady state magnetization and were subsequently discarded. Anatomical images of each subject's brain were collected using multi-echo 3D FLASH for mapping proton density (PD), T1 and magnetization transfer (MT) at 1 mm³ resolution (Weiskopf and Helms, 2008) and by T1 weighted inversion recovery prepared EPI (IR-EPI) sequences (spatial resolution: 1 × 1 × 1 mm). Additionally, individual field maps were recorded using a double echo FLASH sequence (matrix size = 64 × 64; 64 slices; spatial resolution = 3 × 3 × 3 mm; gap = 1 mm; short TE = 10 ms; long TE = 12.46 ms; TR = 1020 ms) for distortion correction of the acquired EPI images (Weiskopf et al., 2006). Using the FieldMap toolbox (Hutton et al., 2002), field maps were estimated from the phase difference between the images acquired at the short and long TE.
fMRI data analysis
Data were analyzed using SPM8 (Wellcome Trust Centre for Neuroimaging, UCL, London). Pre-processing included realignment, unwarping using individual field maps, and spatial normalization to Montreal Neurological Institute (MNI) space with a spatial resolution after normalization of 1 × 1 × 1 mm. We used the unified segmentation algorithm available in SPM to perform normalization. This has been shown to achieve good intersubject co-registration for brain areas such as caudate, putamen and brain stem (Klein et al., 2009). Finally, data were smoothed with a 6 mm FWHM Gaussian kernel. The fMRI time series data were high-pass filtered (cutoff = 128 s) and whitened using an AR(1)-model. For each subject a statistical model was computed by applying a canonical hemodynamic response function (HRF) combined with time and dispersion derivatives (Friston et al., 1998).
Separate general linear models (GLMs) were fit to the data to address two distinct questions. First, we wanted to identify the neural underpinnings for the interaction between action and valence that we observed at the behavioral level. The computational model suggested this is related to an interaction between action and state values. Although the BOLD signal associated with these two values is indistinguishable in our paradigm, we hypothesized that an interference mediated by action and state values would be realized in an interaction between contextual valence (whether a trial had a positive or a negative state value) and action values for go and no-go choices. Therefore, our first GLM asked whether brain representations of instrumental values inferred from behavior, as per our best-fitting computational model, were dependent on the vigor status of the action (go versus no-go) and on the motivational setting (reward or punishment feedback). Second, we hypothesized that anticipatory responses to the fractal images would differ between those participants that successfully learned the experimental conditions and those that did not. Moreover, as we also observed a value independent action bias, we hypothesized that brain areas involved in inhibiting preponderant responses such as the inferior frontal gyrus (Aron and Poldrack, 2006; Robbins, 2007) would be involved in no-go performance. To address these questions, our second GLM was implemented to analyze the effects of action and valence anticipation (2 × 2 factorial design) during the anticipatory phase (fractal image), without using action values employed in the first GLM analysis. GLM 1: effects of expected valence on the representation of action values (model-based analysis) We built a general linear model that included 4 different conditions: 2 at the onset of the fractal images (anticipatory phase); and 2 at the onset of the outcome. 
At the onset of the fractal images, and at outcome onset, trials were divided into those with a positive expected value (go to win and no-go to win) and those with a negative expected value (go to avoid losing and no-go to avoid losing). The onset of fractal images was modeled using a boxcar that extended in time during the whole anticipatory phase until the target detection task was presented. Importantly, each of the onset regressors was parametrically modulated by two separate and independent regressors: one parametric regressor included the value of the go action (Q t (go)) and the other the value of the no-go action (Q t (no-go)). We modified the standard procedure implemented in SPM in order to prevent automatic orthogonalization of consecutive parametric regressors. These time-varying action values were updated according to Eq. (2) using the posterior learning rate for the winning model. This amounted to four parametric regressors in total for the anticipatory phase responses. During the outcome phase, each of the two conditions (positive and negative expected value conditions) was parametrically modulated by two independent regressors: one included the raw outcome value (0 or 1 for win trials; and 0 or − 1 in lose trials) and the other included the state value V t (s) as inferred by the model. Again, this resulted in a total of four parametric regressors for outcome phase responses. To capture residual movement-related artifacts, six covariates were included (the three rigid-body translations and three rotations resulting from realignment) as regressors of no interest. Two subjects had to be excluded from analysis because it was not possible to use their regressor for Q t (no-go) in the win trials as they did not make enough no-go choices to generate sufficient variance for the values to be used as a parametric modulator. 
Notice that these two participants show a selective poor performance for the no-go to win condition, as the performance in the other 3 conditions was higher than 80% in both cases. To test for the effects of valence on different representations of action values, regionally specific condition effects were assessed by employing linear contrasts for each subject and each parametric condition (first-level analysis). The resulting contrast images were entered into a second-level random-effects analysis and the hemodynamic effects of each parametric condition were assessed using a 2 × 2 analysis of variance (ANOVA) with the factors ‘action’ (Q go/Q no-go), and valence (win/lose). To test for the presence of reward prediction errors at the time of the outcome as well as effects of valence on outcome processing, regionally specific condition effects were tested by employing linear contrasts for each subject and each parametric condition (first-level analysis). The resulting contrast images were entered into a second-level random-effects analysis and the hemodynamic effects of each parametric condition were assessed using a one way analysis of variance (ANOVA) with four levels: raw outcome value in win trials, raw outcome value in lose trials, expected value in win trials, expected value in lose trials. GLM 2: neural correlates of successful instrumental control We built a second general linear model that included our 4 conditions of interest as separate regressors at the onset of the fractal images: go to win trials, go to avoid losing trials, no-go to win trials, and no-go to avoid losing trials. We also modeled the onset of the target detection task separately for trials in which subjects emitted (or did not emit) a button press. Note we intentionally included these two regressors in order to explain away variance associated with the performance of the motor response in the anticipatory phase responses. 
We also included, as a regressor, the onset of the outcome (which could again be win £1, lose £1, or no monetary consequence). To capture residual movement-related artifacts, six covariates were included (the three rigid-body translation and three rotations resulting from realignment) as regressors of no interest. A heterogeneity in the expression of instrumental learning across subjects is well established (Schonberg et al., 2007). This was also the case here, with some subjects performing well in all conditions and others contributing the majority of errors. In fact this heterogeneity has advantages in that it allowed us to explore brain responses associated with appropriately successful instrumental control. To define successful instrumental control we used an arbitrary threshold of 60% correct trials across the whole experiment, and 80% correct in the second half of the experiment in every condition, enabling us to segregate subjects into learners (19/30) and non-learners (11/30). These criteria ensured participants classified as learners showed satisfactory instrumental learning in all four conditions (see supplemental Fig. S1). When we applied the same criteria to those participants that performed the task outside the scanner, we found that the proportion of learners was 7/17. A chi square test did not detect any differences in the frequency of learners between the two groups (χ 2 = 2.16, ns).
We analyzed neural representations of valence (win/lose) and action (go/no-go) anticipation elicited by presentation of fractal images, independently from value representations. We focused on the time point at which the fractal stimuli were presented, prior to the presentation of the target that occasions a behavioral response. We first focused our analysis on the learners because they were likely to anticipate the correct action in all conditions, reflecting successful instrumental control. We then conducted a separate analysis comparing anticipatory responses between learners and non-learners to detect whether the pattern of activated areas found in the learners was specific to those subjects showing successful instrumental control. To test for the effects of action and valence anticipation in learners, we tested for regionally specific condition effects in linear contrasts for each subject and each condition (first-level analysis). The resulting contrast images were entered into a second-level random-effects analysis and the hemodynamic effects of each condition were assessed using a 2 × 2 analysis of variance (ANOVA) with the factors ‘action’ (go/no-go), and valence (win/lose).
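The chi-square comparison of learner frequencies inside and outside the scanner reported above can be checked from the counts given in the text (19 learners and 11 non-learners inside; 7 learners and 10 non-learners outside). The sketch below assumes a Pearson chi-square test without continuity correction, since the text does not state which variant was used. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table

        [[a, b],
         [c, d]]

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, chi2 is a squared standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: inside / outside the scanner; columns: learners / non-learners.
chi2, p_value = chi_square_2x2(19, 11, 7, 10)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the statistic rounds to the reported value of 2.16, with a p-value well above conventional significance thresholds.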
To test for differences in the effects of action and valence anticipation between learners and non-learners we computed, at the first level, the parameter estimate of the main effect of action contrast [(go to win + go to avoid losing) − (no-go to win + no-go to avoid losing)] and the main effect of valence contrast [(go to win + no-go to win) − (go to avoid losing + no-go to avoid losing)]. The resulting contrast images were entered into a second-level random-effects analysis and the differences between the two groups (learners and non-learners) were assessed using a two sample t-test.
Regions of interest
Predicted activations detected in our voxel-based analysis were corrected for multiple comparisons using small volume correction (SVC) within anatomically defined regions of interest: these comprised the striatum, the inferior frontal gyrus (IFG), and the substantia nigra/ventral tegmental area (SN/VTA) of the midbrain (main origin of dopaminergic projections). A priori, we also included the amygdala but as we did not observe any active voxel there, this ROI is not reported any further. The striatum and the IFG regions of interest (ROIs) were defined using the MNI templates available in Marsbar (Brett et al., 2002); the striatum ROI included the caudate and the putamen, whereas the IFG ROI included the pars triangularis and the pars opercularis of the inferior frontal gyrus. The SN/VTA ROI was manually defined, using the software MRIcro and the mean MT image for the group. On MT-images the SN/VTA can be distinguished from surrounding structures as a bright stripe (Bunzeck and Duzel, 2006). It should be noted that in primates, reward responsive dopaminergic neurons are distributed across the SN/VTA complex and it is therefore appropriate to consider the activation of the entire SN/VTA complex rather than, a priori, focusing on its subcompartments such as the VTA (Duzel et al., 2009).
For this purpose, a resolution of 1.5 mm3, as used in the present experiment, allowed a sampling over 200 voxels of the SN/VTA complex, which has a volume of 350 to 400 mm3. Results Reward and punishment differently affects go and no-go choices The optimal choice on both “go to win” and “go to avoid losing” trials is to go. Conversely, the optimal choice is not to emit an action in “no-go to win” and “no-go to avoid losing” trials. Figs. 2A–D show raw and average choice probabilities for all subjects. The group learning curves for each of the four conditions show that subjects did learn in all four conditions, but learning was far from equivalent across trial types. A three way ANOVA on the number of correct (optimal) choices with factors time (6 time bins of 10 trials each), action (go/no-go) and valence (win/lose) as repeated factors revealed a main effect of time (F(5,225) = 31.16, p   0.5). Consequently, we tested two alternative models to account for this effect: firstly, we included an initial shaping bonus (Ng et al., 1999) that could be naturally erased as the subjects learned (Model RW + noise + Q 0 in Fig. 2F); or, secondly, we included a bias that was constant across the experiment (Model RW + noise + bias in Fig. 2F). The BIC measure favored the latter. Indeed the model's simulated behavior matched the true behavior better, particularly in the early stages (green lines in Figs. 2A–D). However, the model RW + noise + bias still failed to capture the crucial action by valence interaction, as is clearly evident in the figures, for example in the no-go to win condition. Thus, we tested a further model that added a Pavlovian approach/withdrawal component to the other, instrumental, components. In this model, the probability of a go action was incremented proportionally to the overall (action-independent) state value of each stimulus. 
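To make the Pavlovian effect just described concrete, consider a worked example for the no-go to win condition. The parameter values are hypothetical and chosen only for illustration (ξ = 0, b = 0, ρ = 1); they are not the fitted values. Suppose a subject consistently withholds responses, so that in expectation the instrumental value of no-go and the state value of the stimulus both approach 0.8, while the value of go stays at 0. The equations of the Pavlovian model then give:

$$W(\text{go}, s) = Q(\text{go}, s) + b + \pi V(s) = 0.8\,\pi, \qquad W(\text{no-go}, s) = Q(\text{no-go}, s) = 0.8$$

$$p(\text{go} \mid s) = \frac{1}{1 + \exp\big(W(\text{no-go}, s) - W(\text{go}, s)\big)} = \frac{1}{1 + \exp\big(0.8\,(1 - \pi)\big)}$$

$$\pi = 0:\; p(\text{go} \mid s) = \frac{1}{1 + e^{0.8}} \approx 0.31, \qquad \pi = 1:\; p(\text{go} \mid s) = \frac{1}{1 + e^{0}} = 0.5$$

In this illustration the better the subject learns that the stimulus predicts reward, the more strongly the Pavlovian term pushes towards the inappropriate go response.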
This model assumed that increasing reward expectancy induced a parametric increase in go probability, and that increasing punishment expectancy induced a parametric increase in no-go probability. For example, consider the no-go to win condition: as subjects learned to withhold their responses during the task, the stimulus indicative of this condition came to be associated with more reward. This positive expectancy, in turn, promoted a (inappropriate) go action. Similarly, the stimulus indicating the go to avoid losing condition embodied a negative expectation (even when a subject was always right, due to probabilistic feedback). In the model, this negative expectancy promoted inhibition of the requisite go choice. In both cases, we hypothesized that this Pavlovian factor would account for the pattern of action/valence interactions we observed, since on the one hand it should produce the very interference with performance in those critical conditions where action and valence were not aligned, while on the other hand it should support behavior in those conditions where action and valence were aligned. Indeed, we found that this latter model (Model RW + noise + bias + Pav in Fig. 2F) provided the most parsimonious account of our data. Surrogate choices generated from the model showed that it accurately captured crucial differences in learning across conditions. This model predicted the choices of 43/47 subjects better than chance (binomial test, p   7; p   no-go) in a sole cluster that survived SVC within our anatomical SN/VTA ROI located in left lateral SN/VTA [MNI space coordinates − 10,−17,−13; peak Z score = 3.38; p = 0.041 FWE SVC]. Furthermore, on the same analysis, subjects classified as learners revealed a complementary main effect of inaction (no-go > go) (Figs. 
5A–B) in left IFG pars opercularis [MNI space coordinates −43, 8, 14; peak Z score = 5.68; p   no-go’ contrast between learners and non-learners also revealed a cluster of activation that survived SVC within our a priori SN/VTA ROI (Figs. 4C–D). This cluster was located at the same coordinates (left lateral SN/VTA [MNI space coordinates −10, −17, −13; peak Z score = 3.33; p = 0.054 FWE SVC]) and showed higher parameter estimates for the learners. Thus, remarkably, both analyses highlight the same peak voxel, suggesting that the left SN/VTA is specifically recruited in go trials for subjects who successfully learn and who (one assumes by learning) anticipate the appropriate choice (go or no-go) upon presentation of the relevant fractal images. Finally, in a separate voxel-based two-sample t-test comparing the magnitude of the ‘no-go > go’ contrast between learners and non-learners, we found three clusters of activation within an IFG anatomical ROI that survived SVC (Figs. 5C–D). These were located in close proximity to the foci of activation detected for the ‘no-go > go’ contrast in the learners: the left IFG pars opercularis [MNI space coordinates −53, 11, 13; peak Z score = 4.77; p = 0.008 FWE SVC], the right IFG pars triangularis [MNI space coordinates 46, 35, 5; peak Z score = 4.68; p = 0.011 FWE SVC] and the left IFG pars triangularis [MNI space coordinates −45, 32, 4; peak Z score = 4.26; p = 0.055 FWE SVC]. Note that the sign for the main effect of the ‘no-go > go’ contrast in the non-learners is negative, suggesting a qualitative rather than just a quantitative difference in the pattern of activity in the IFG between learners and non-learners.

Discussion

We report a striking asymmetry for instrumental learning, whereby participants were better at learning to emit a behavioral response in anticipation of reward, and better at withholding a response in anticipation of punishment. 
A computational analysis revealed that this corruption of instrumental action learning could be accounted for in terms of an influence of a Pavlovian learning system. The striatum and the SN/VTA tracked action values for both choices, but with opposite signs for go and no-go. This finding points to value representation being bound either to the regulation of vigor or, equivalently here, to the specification of the chosen behavior (go or no-go). Finally, selective recruitment of left SN/VTA and bilateral IFG was coupled to the emergence of successful instrumental control. The overall pattern of findings highlights a mandatory coupling between valence and action at the behavioral level that contrasts with a dominance of vigor control at the neurobiological level. The data we report help refine the conception of Pavlovian influences over instrumental control as well as the architecture of instrumental decision-making itself. We note that our participants performed an apparently trivial task that entailed learning a simple relationship between four fractal images and a highly restricted behavior repertoire (go or a no-go choice). As the probability of reaping a reward or avoidance of a punishment was much higher for correct (0.8) than incorrect choices (0.2) one might expect rapid and fluent learning equivalent across all conditions. The striking finding that subjects, as a group, were impaired in this simple form of learning is testament to the strength and potential perniciousness of biases and asymmetries built into the architecture of decision-making. Furthermore, these effects persisted throughout a relatively lengthy learning period, and defeated optimizing instrumental learning mechanisms in a non-trivial fraction of our subjects. Our computational modeling revealed that a key asymmetry in learning came from a coupling between valence and vigor. 
This coupling is central to classical Pavlovian to instrumental paradigms where the presentation of the Pavlovian stimulus modifies the vigor of instrumental responses in a valence dependent manner (Dickinson and Balleine, 2002; Huys et al., 2011; Talmi et al., 2008). That is, go was favored in conditions where there was a possibility of winning money and no-go when there was a possibility of losing, while the alternative mappings were difficult. This pattern of behavioral findings is consistent with a number of well-known results such as negative automaintenance (Dayan et al., 2006; Williams and Williams, 1969). Such deep embedding of strong biases within flexible instrumental mechanisms may serve to alleviate computational costs of learning. Conversely, such biases may also lie at the root of many anomalies of decision-making (Dayan et al., 2006; Guitart-Masip et al., 2010). Interestingly, the deleterious effects of punishment on go choices, but not the deleterious effects of reward on no-go choices, were also observed in our previous study (Guitart-Masip et al., 2011). In that study we used a similar paradigm, with the crucial difference that participants were both instructed about contingencies and over-trained to reach high levels of accuracy on the go/no-go choices. One possibility suggested by this is that the certainty as to the correct choice may affect Pavlovian influences on action selection elicited by reward differently from those elicited by punishment. Most previous human studies of learning have focused on two conditions that our subjects found straightforward: i.e., go to win and no-go to avoid losing (e.g. Cools et al., 2009; Frank et al., 2004; O'Doherty et al., 2004). 
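The feedback probabilities quoted earlier in the Discussion (0.8 for correct and 0.2 for incorrect choices) imply that a stimulus in a lose condition carries a negative expectation even under perfect performance. The short Python sketch below works through that arithmetic. The outcome coding (+1 for a reward, 0 for a neutral outcome, −1 for a punishment) and the function name `expected_outcome` are assumptions for illustration; the text does not specify the outcome magnitudes.

```python
def expected_outcome(valence, correct,
                     p_good_if_correct=0.8, p_good_if_incorrect=0.2):
    """Expected outcome of one trial under an assumed +1 / 0 / -1 coding.

    valence : "win"  (good outcome = reward, otherwise nothing) or
              "lose" (good outcome = nothing, otherwise punishment)
    correct : whether the optimal choice (go or no-go) was made
    """
    p_good = p_good_if_correct if correct else p_good_if_incorrect
    if valence == "win":
        # Reward (+1) with probability p_good, nothing (0) otherwise.
        return p_good * 1.0
    if valence == "lose":
        # Nothing (0) with probability p_good, punishment (-1) otherwise.
        return (1.0 - p_good) * -1.0
    raise ValueError("valence must be 'win' or 'lose'")


# A subject who is always right still expects a net loss in lose conditions,
# and a subject who is always wrong still expects a net gain in win conditions.
always_right_lose = expected_outcome("lose", correct=True)
always_wrong_win = expected_outcome("win", correct=False)
```

Under this coding the always-correct expectation is about −0.2 in the lose conditions and 0.8 in the win conditions, so the sign of the state value follows valence rather than performance.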
A prevalent view is that dopamine projections to target structures, including the striatum (McClure et al., 2003; O'Doherty et al., 2003; Pessiglione et al., 2006), express reward prediction error signals (Bayer and Glimcher, 2005; Schultz et al., 1997) in the form of phasic bursts for positive prediction errors and dips below baseline for negative prediction errors (Bayer et al., 2007). The striatum then uses increases in dopamine to reinforce the direct pathway and generate go choices, while dips in dopamine reinforce the indirect pathway and generate no-go choices (Frank et al., 2004; Wickens et al., 2007). This functional architecture provides a plausible mechanism for instrumental learning of active responses through positive reinforcement and passive responses through punishment. Here, by passive we mean that they do not involve the generation of any overt behavioral responses. Crucially, in these straightforward conditions, instrumental and Pavlovian controllers prescribe the same action and are thus indistinguishable. An instrumental system of this sort embodies an asymmetry since it provides no clear mechanism for learning to go in order to avoid losing or to no-go in order to win. One idea is that instrumental mechanisms treat conditions such as active avoidance by coding the removal of possible punishment as akin to a reward (Maia, 2010; Moutoussis et al., 2008; Mowrer, 1947). In support of this view, whereas dopamine deficits impair acquisition and maintenance of active avoidance behavior (Darvas et al., 2011; McCullough et al., 1993), learning about the prospect of punishment can occur even when dopamine is compromised (Beninger and Phillips, 1981). This implies that dopamine is required to learn a requirement for active responses to avoid punishment but another system learns about punishment itself. 
Serotonin has been suggested as being involved in coding for aspects of punishment or punishment prediction errors, although this is far from certain (Boureau and Dayan, 2011; Cools et al., 2011; Daw et al., 2002). If two stages are indeed involved in learning active avoidance, then this could also contribute to the observed behavioral asymmetry. In line with the above view, our fMRI results showed that the striatum and the SN/VTA tracked action values for both go and no-go choices but that the relationship between value and brain activity was positive for go and negative for no-go. These results extended our recent observation that during anticipation, activity in striatum and lateral aspects of the SN/VTA complex reflects action requirements rather than state values (Guitart-Masip et al., 2011). It may be that both structures are part of an integral instrumental system that learns the value of available behavioral options, but where coding is relative to the control of vigor and approach. However, caution should be exercised in interpreting the lack of a conventional value signal in the striatum and the SN/VTA, as our experimental design did not allow us to search for such a signal in the current experiment. Conversely, despite a clear effect of valence on action learning, we did not find any effect of valence on action value representations. This negative result does not imply that the observed behavioral asymmetry was not realized in the brain. It may arise as an example of the sort of malign valence-induced bias in learning that induces risk sensitivity (Denrell, 2007; March, 1996; Niv et al., 2002). That is, consider the no-go to win condition. If some participants happened to obtain reward for an early trial in which they performed a go response, they might continue performing go inflexibly, without sampling no-go. 
As both the Pavlovian bias and a value-independent action bias favor performance in the go to win condition, a reverse inflexible performance of an early rewarded no-go is unlikely to manifest. Future experiments should be designed to dissociate state and action values. This would require that these two values are not highly correlated in the way they were in the current experiment. A possible strategy for future examination of this would be to include forced trials without choices, but only outcomes. We also did not see any BOLD signals consistent with a prediction error at the time of the outcome. Since prediction errors are highly correlated with their reward term, ensuring that a region reports a reward prediction error in a given task requires separating the prediction error into its two components, that is, the reward and the value expectation (Behrens et al., 2008). In the current experiment, we followed this principle and only found a correlation between BOLD and the reward term at the time of the outcome. Similar results, in which prediction errors are not apparent, have previously been reported (Behrens et al., 2008; Li and Daw, 2011). Reconciling these results with those showing prediction errors (Daw and Doya, 2006) is an important task for the future. One possibility is that prediction errors in the striatum are only observed when prediction errors are of behavioral relevance for the instrumental task at hand (Klein-Flugge et al., 2011; Li and Daw, 2011). In the present task, the actual value of the stimuli was irrelevant for the instrumental task, as instrumental choices could be informed by the reward component itself. In other contexts, where participants may need to compare the relative value of different options, the full prediction error may be necessary for optimal instrumental performance, as previously reported (e.g. 
Glascher et al., 2010; O'Doherty et al., 2004; Pessiglione et al., 2006; Schonberg et al., 2007). The decrease in activity within the striatum and SN/VTA as no-go choice value increased does not fit the classical view of cortico-striatal circuits, in which reward promotes the direct (go) pathway and punishment promotes the indirect (no-go) pathway (Frank et al., 2004; Hikida et al., 2010). Instead, a supplementary mechanism seems to be required. Indeed, we observed that during anticipation, before subjects actually performed a behavioral response (go or no-go), only subjects who learned the no-go to win condition recruited bilateral IFG in trials requiring inhibition of a go choice. Given the functional anatomy of IFG, it is interesting to speculate that those who learned the task did so by overcoming dominant go response tendencies, as for example when presented with a reward predicting fractal image that mandated a no-go choice. The same would be true for the no-go to avoid losing condition if, as suggested by the model, participants must learn to overcome a value-independent bias toward go choices in this task. Recruitment of IFG is systematically associated with an ability to stop a prepotent motor response (Aron and Poldrack, 2006; Robbins, 2007), or with a need to slow down in a decision task involving response conflict (Fleming et al., 2010). Similarly, only participants who learned the appropriate choices in all conditions selectively recruited the left SN/VTA in trials requiring a go choice, suggesting that an inability to restrict such SN/VTA responses to go trials is related to a failure in learning task contingencies. We observed a similar pattern of activations in our previous study, in which the participants had such extensive training as to behave akin to learners during the second part of the experiment in the current task (Guitart-Masip et al., 2011). 
Within the limitations of fMRI studies of the SN/VTA (Duzel et al., 2009), this pattern is consistent with a suggestion that dopamine plays a role in action preparation and invigoration (Berridge and Robinson, 1998; Niv et al., 2007; Salamone et al., 2007), a role complementary to its established role in representing a reward prediction error. Non-learners failed to acquire appropriate behavior in conditions where the choices prescribed by a Pavlovian controller were inappropriate. This echoes recent evidence regarding individual differences in decision-making, and most particularly a prominent distinction between sign-tracking and goal-tracking in rodents (Flagel et al., 2010; 2011). Just as for our non-learners, Pavlovian influences are dominant for sign-trackers. Interestingly, rats with lesions of the subthalamic nucleus (STN) showed increased sign-tracking behavior (Uslaner et al., 2008), and we note that the effects of the IFG in stopping go responses are mediated by the STN (Aron and Poldrack, 2006). Furthermore, the STN is recruited by the IFG when a subject rejects a default choice (Fleming et al., 2010). This raises the possibility that the IFG, together with the STN, complements an instrumental system by allowing it to overcome the vagaries of Pavlovian influences. An immediate question for future research would be how this complementary system is triggered if, as suggested in the current experiment, the IFG does not appear to track action values. Our model captured a set of Pavlovian influences over behavior, with predictions of future reward being mandatorily associated with go, active approach, and vigor; and predictions of future loss with a wider range of responses including no-go, behavioral inhibition, and quiescence (Boureau and Dayan, 2011; Cools et al., 2011; Niv et al., 2007). 
Other possible substrates for these influences include the nucleus accumbens and the amygdala (Cardinal et al., 2002; Parkinson et al., 1999; Talmi et al., 2008) where dopamine plays a particularly important role in appetitive effects (Parkinson et al., 2002). On the other hand, serotonin is a prominent candidate for aversive effects (Dayan and Huys, 2009; Deakin and Graeff, 1991). Indeed, tryptophan depletion abolishes punishment induced inhibition, which is akin to the disadvantage we observed in the go to avoid losing condition (Crockett et al., 2009). Our key finding was that during a simple form of instrumental learning, healthy human volunteers showed a striking interdependence of action and valence which exerted a corrupting effect on the course and outcome of learning. We captured this within a computational architecture that invoked distinct, albeit interacting, behavioral control systems, an instrumental and a Pavlovian system. We showed that the striatum and the SN/VTA tracked instrumental values in opposite ways for go and no-go choices, suggesting that these value representations are bound to a regulation of vigor. Thus, our data point to intriguing functional dissociations within these regions that enrich their putative roles beyond that associated with the generation and report of prediction errors.

The following are the supplementary data related to this article. Fig. S1: Behavioral performance in learners and non-learners. Supplementary data related to this article can be found online at doi:10.1016/j.neuroimage.2012.04.024.

                Author and article information

                Journal
                Int J Epidemiol
                ije
                International Journal of Epidemiology
                Oxford University Press
                0300-5771
                1464-3685
                February 2018
                21 November 2017
                21 November 2017
                Volume: 47
                Issue: 1
                Pages: 18-19g
                Affiliations
                [1 ]Department of Psychiatry, University of Cambridge, United Kingdom
                [2 ]Wellcome Trust Centre for Neuroimaging, University College London, United Kingdom
                [3 ]Cambridgeshire and Peterborough National Health Service Foundation Trust, Cambridge, United Kingdom
                [4 ]Research Department of Clinical, Educational and Health Psychology, University College London, United Kingdom
                [5 ]Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom
                [6 ]Medical Research Council/Wellcome Trust Behavioural and Clinical Neuroscience Institute, University of Cambridge, United Kingdom
                [7 ]ImmunoPsychiatry, GlaxoSmithKline Research and Development, Stevenage, United Kingdom
                Author notes
                Corresponding author. Department of Psychiatry, University of Cambridge, Cambridge CB2 0SZ, UK. E-mail: pbj21@cam.ac.uk
                Article
                dyx117
                10.1093/ije/dyx117
                5837633
                29177462
                ad12e5ac-f927-4ba2-bcab-cde364b3a806
                © The Author 2017. Published by Oxford University Press on behalf of the International Epidemiological Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 2 June 2017
                : 3 July 2017
                Page count
                Pages: 9
                Funding
                Funded by: Wellcome Trust 10.13039/100004440
                Funded by: University of Cambridge 10.13039/501100000735
                Funded by: University College London 10.13039/501100000765
                Award ID: 095844/Z/11/Z
                Funded by: NIHR 10.13039/100006662
                Funded by: Medical Research Council 10.13039/501100000265
                Funded by: MRC 10.13039/501100000265
                Funded by: Wellcome Trust 10.13039/100004440
                Categories
                Cohort Profiles

                Public health
