
The general inefficiency of batch training for gradient descent learning.

Neural Networks

Keywords: Algorithms, Generalization (Psychology), Humans, Learning, Neural Networks (Computer), Online Systems, Psychological Tests, Recognition (Psychology), Review Literature as Topic, Speech Perception, Stochastic Processes, Teaching, Time Factors


Abstract

Gradient descent training of neural networks can be done in either a batch or an on-line manner. A widely held myth in the neural network community is that batch training is as fast as or faster than, and/or more 'correct' than, on-line training because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training (often by orders of magnitude), especially on large training sets. The main reason is that on-line training can follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge in fewer iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy.
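The distinction the abstract draws is between updating the weights once per epoch from the gradient over the whole training set (batch) and updating them after every training instance (on-line, i.e. stochastic). The following is a minimal sketch of that difference on a toy least-squares problem; it is not code from the paper, and the data, learning rate, and epoch counts are illustrative assumptions only.

```python
import numpy as np

# Toy linear regression data (illustrative, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.01 * rng.normal(size=1000)

def batch_gd(X, y, lr=0.01, epochs=50):
    # One weight update per epoch, using the gradient averaged
    # over the entire training set.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def online_gd(X, y, lr=0.01, epochs=50):
    # One weight update per training instance, so the weights move
    # along the error surface many times within a single epoch.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * (xi @ w - yi) * xi
    return w

print("batch  error:", np.linalg.norm(batch_gd(X, y) - true_w))
print("online error:", np.linalg.norm(online_gd(X, y) - true_w))
```

With the same nominal learning rate, the on-line variant performs one update per instance rather than one per epoch, which is the mechanism the abstract credits for its faster convergence on large training sets.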
