A Learning Theory for Reward-Modulated Spike-Timing-Dependent
                    Plasticity with Application to Biofeedback

Legenstein, Robert; Pecevski, Dejan; Maass, Wolfgang

doi:10.1371/journal.pcbi.1000180

ScienceOpen: research and publishing network

For Publishers

For Researchers

Blog
About

Search
Advanced search

103

views

recommends

Record: found
Abstract: found
Article: not found

A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

research-article

Author(s): Robert Legenstein , Dejan Pecevski , Wolfgang Maass ^*

Editor(s): Lyle J. Graham

Publication date (Electronic): 10 October 2008

Journal: PLoS Computational Biology

Publisher: Public Library of Science

Read this article at

ScienceOpenPublisher PMC

Bookmark

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate for a learning rule that could explain how behaviorally relevant adaptive changes in complex networks of spiking neurons could be achieved in a self-organizing manner through local synaptic plasticity. However, the capabilities and limitations of this learning rule could so far only be tested through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allows us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. They also can learn to respond to specific presynaptic firing patterns with particular spike patterns. Finally, the resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP. This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker. In this experiment monkeys were rewarded for increasing the firing rate of a particular neuron in the cortex and were able to solve this extremely difficult credit assignment problem. Our model for this experiment relies on a combination of reward-modulated STDP with variable spontaneous firing activity. Hence it also provides a possible functional explanation for trial-to-trial variability, which is characteristic for cortical networks of neurons but has no analogue in currently existing artificial computing systems. In addition our model demonstrates that reward-modulated STDP can be applied to all synapses in a large recurrent neural network without endangering the stability of the network dynamics.

Author Summary

A major open problem in computational neuroscience is to explain how learning, i.e., behaviorally relevant modifications in the central nervous system, can be explained on the basis of experimental data on synaptic plasticity. Spike-timing-dependent plasticity (STDP) is a rule for changes in the strength of an individual synapse that is supported by experimental data from a variety of species. However, it is not clear how this synaptic plasticity rule can produce meaningful modifications in networks of neurons. Only if one takes into account that consolidation of synaptic plasticity requires a third signal, such as changes in the concentration of a neuromodulator (that might, for example, be related to rewards or expected rewards), then meaningful changes in the structure of networks of neurons may occur. We provide in this article an analytical foundation for such reward-modulated versions of STDP that predicts when this type of synaptic plasticity can produce functionally relevant changes in networks of neurons. In particular we show that seemingly inexplicable experimental data on biofeedback, where a monkey learnt to increase the firing rate of an arbitrarily chosen neuron in the motor cortex, can be explained on the basis of this new learning theory.

Related collections

Most cited references 49

Record: found
Abstract: found
Article: not found

Real-time computing without stable states: a new framework for neural computation based on perturbations.

Wolfgang Maass, Thomas Natschläger, Henry Markram (2002)

A key challenge for neural modeling is to explain how a continuous stream of multimodal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real time. We propose a new computational model for real-time computing on time-varying input that provides an alternative to paradigms based on Turing machines or attractor neural networks. It does not require a task-dependent construction of neural circuits. Instead, it is based on principles of high-dimensional dynamical systems in combination with statistical learning theory and can be implemented on generic evolved or found recurrent circuitry. It is shown that the inherent transient dynamics of the high-dimensional dynamical system formed by a sufficiently large and heterogeneous neural circuit may serve as universal analog fading memory. Readout neurons can learn to extract in real time from the current state of such recurrent neural circuit information about current and past inputs that may be needed for diverse tasks. Stable internal states are not required for giving a stable output, since transient internal states can be transformed by readout neurons into stable target outputs due to the high dimensionality of the dynamical system. Our approach is based on a rigorous computational model, the liquid state machine, that, unlike Turing machines, does not require sequential transitions between well-defined discrete internal states. It is supported, as the Turing machine is, by rigorous mathematical results that predict universal computational power under idealized conditions, but for the biologically more realistic scenario of real-time processing of time-varying inputs. Our approach provides new perspectives for the interpretation of neural coding, the design of experiments and data analysis in neurophysiology, and the solution of problems in robotics and neurotechnology.

0 comments Cited 800 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Competitive Hebbian learning through spike-timing-dependent synaptic plasticity.

Sen Song, Kenneth D Miller, L. Abbott (2000)

Hebbian models of development and learning require both activity-dependent synaptic plasticity and a mechanism that induces competition between different synapses. One form of experimentally observed long-term synaptic plasticity, which we call spike-timing-dependent plasticity (STDP), depends on the relative timing of pre- and postsynaptic action potentials. In modeling studies, we find that this form of synaptic modification can automatically balance synaptic strengths to make postsynaptic firing irregular but more sensitive to presynaptic spike timing. It has been argued that neurons in vivo operate in such a balanced regime. Synapses modifiable by STDP compete for control of the timing of postsynaptic action potentials. Inputs that fire the postsynaptic neuron with short latency or that act in correlated groups are able to compete most successfully and develop strong synapses, while synapses of longer-latency or less-effective inputs are weakened.

0 comments Cited 630 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type.

Q Bi, G Bi, M Poo (1998)

In cultures of dissociated rat hippocampal neurons, persistent potentiation and depression of glutamatergic synapses were induced by correlated spiking of presynaptic and postsynaptic neurons. The relative timing between the presynaptic and postsynaptic spiking determined the direction and the extent of synaptic changes. Repetitive postsynaptic spiking within a time window of 20 msec after presynaptic activation resulted in long-term potentiation (LTP), whereas postsynaptic spiking within a window of 20 msec before the repetitive presynaptic activation led to long-term depression (LTD). Significant LTP occurred only at synapses with relatively low initial strength, whereas the extent of LTD did not show obvious dependence on the initial synaptic strength. Both LTP and LTD depended on the activation of NMDA receptors and were absent in cases in which the postsynaptic neurons were GABAergic in nature. Blockade of L-type calcium channels with nimodipine abolished the induction of LTD and reduced the extent of LTP. These results underscore the importance of precise spike timing, synaptic strength, and postsynaptic cell type in the activity-induced modification of central synapses and suggest that Hebb's rule may need to incorporate a quantitative consideration of spike timing that reflects the narrow and asymmetric window for the induction of synaptic modification.

0 comments Cited 543 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (publisher-id): plos

Journal ID (pmc): ploscomp

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA )

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date Collection: October 2008

Publication date (Print): October 2008

Publication date (Electronic): 10 October 2008

Volume: 4

Issue: 10

Electronic Location Identifier: e1000180

Affiliations

[1]Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria

UFR Biomédicale de l'Université René Descartes, France

Author notes

* E-mail: maass@ 123456igi.tugraz.at

Conceived and designed the experiments: RL DP WM. Wrote the paper: RL DP WM.

Article

Publisher ID: 08-PLCB-RA-0147R2

DOI: 10.1371/journal.pcbi.1000180

PMC ID: 2543108

PubMed ID: 18846203

SO-VID: 4eeda9bb-3e8f-4aeb-9b37-399f7ac74630

Copyright © Legenstein et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received : 3 March 2008

Date accepted : 7 August 2008

Page count

Pages: 27

Comments

Comment on this article

scite_

Cited by 80

See all cited by

Most referenced authors 246

See all reference authors

A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

Read this article at

Abstract

Author Summary

Related collections

Journal of Systems Thinking Preprints

Most cited references 49

Real-time computing without stable states: a new framework for neural computation based on perturbations.

Competitive Hebbian learning through spike-timing-dependent synaptic plasticity.

Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type.

Author and article information

Contributors

Journal

Affiliations

Author notes

Article

History

Page count

Categories

Comments

Comment on this article

Similar content 2

Cited by 80

Most referenced authors 246