Open Access

      A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

      Preprint


          Abstract

          Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit. Dropout is a widely used tool for regularisation in deep models, but a long strand of empirical research has claimed that it cannot be applied between the recurrent connections of an RNN. The argument is that noise hinders the network's ability to model sequences and therefore dropout should be applied to the RNN's inputs and outputs alone. But without regularisation in recurrent layers, existing techniques overfit quickly. In this paper we make use of a recently developed theoretical framework casting dropout as approximate variational inference. Based on the framework we derive mathematically grounded tools to apply dropout within the recurrent layers of RNNs, eliminating model overfitting. We apply our new variational inference based dropout technique in LSTM and GRU networks, evaluating the technique empirically. We show that the new approach outperforms existing techniques on sentiment analysis and language modelling tasks, extending our arsenal of variational tools in deep learning.
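The abstract's key mechanism is sampling one dropout mask per sequence and reusing it at every timestep of the recurrent connections, rather than resampling fresh noise at each step. A minimal NumPy sketch of that idea follows; it uses a plain tanh RNN cell rather than the paper's LSTM/GRU networks, and the function names and shapes are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def variational_dropout_mask(batch, hidden, p, rng):
    """Sample ONE dropout mask per sequence (inverted dropout scaling).

    In the variational scheme this mask is reused at every timestep,
    instead of drawing fresh noise per step. (Illustrative helper.)
    """
    keep = 1.0 - p
    return rng.binomial(1, keep, size=(batch, hidden)) / keep

def run_rnn(x, Wx, Wh, p=0.5, seed=0):
    """Toy tanh RNN with variational recurrent dropout.

    x:  (time, batch, input) input sequence
    Wx: (input, hidden) input-to-hidden weights
    Wh: (hidden, hidden) recurrent weights
    """
    rng = np.random.default_rng(seed)
    T, B, _ = x.shape
    H = Wh.shape[0]
    mask = variational_dropout_mask(B, H, p, rng)  # sampled once per sequence
    h = np.zeros((B, H))
    for t in range(T):
        # the SAME mask drops the same hidden units at every timestep
        h = np.tanh(x[t] @ Wx + (h * mask) @ Wh)
    return h
```

Because the mask is fixed across the loop, the same hidden units are dropped throughout the sequence, which is what distinguishes this scheme from naive per-timestep dropout on the recurrent connections.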


          Author and article information

          Journal
          Dates: 2015-12-16, 2016-02-11
          Article type: Article
          arXiv ID: 1512.05287
          Record ID: 2366b442-d4e3-4da7-b2a6-a7256518411d
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
          Subject: stat.ML (Machine learning)
