Open Access

      Benign interpolation of noise in deep learning

Research article


          Abstract

The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance tradeoff in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.

Categories:
• Computing methodologies ~ Machine learning
• Computing methodologies ~ Neural networks
• Theory of computation ~ Sample complexity and generalisation bounds
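
To make the experimental setup concrete, here is a minimal sketch of the kind of controlled-noise experiment the abstract describes. It is not the authors' code: the architecture, data sizes, optimiser settings and the 20% label-flip rate are illustrative assumptions. It trains an overparameterised multilayer perceptron on synthetic data in which a known fraction of the labels has been flipped, and tracks training accuracy separately on the clean and corrupted subsets — the measurement behind the observation that uncorrupted samples are fitted first.

    # Minimal sketch (PyTorch): overparameterised MLP with controlled label noise.
    # All sizes and hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    n, d, noise_frac = 1000, 20, 0.2          # samples, input dim, label-flip rate
    X = torch.randn(n, d)
    y_clean = (X[:, 0] > 0).long()            # simple ground-truth labelling rule
    corrupt = torch.rand(n) < noise_frac      # mask marking corrupted samples
    y = torch.where(corrupt, 1 - y_clean, y_clean)

    # Far more parameters (~270k) than training samples (1000).
    model = nn.Sequential(
        nn.Linear(d, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 2),
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(201):
        opt.zero_grad()
        logits = model(X)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
        if epoch % 20 == 0:
            pred = logits.argmax(dim=1)
            # Track training accuracy on clean vs. corrupted subsets separately.
            acc_clean = (pred[~corrupt] == y[~corrupt]).float().mean().item()
            acc_noisy = (pred[corrupt] == y[corrupt]).float().mean().item()
            print(f"epoch {epoch:3d}  clean {acc_clean:.2f}  corrupted {acc_noisy:.2f}")

If clean samples are indeed fitted first, the clean-subset accuracy should saturate many epochs before the flipped labels are memorised; training past the point at which the noise is interpolated is also the regime in which epoch-wise double descent is typically probed.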


Most cited references (5)


LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.


Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5), 988-999.

Statistical learning theory was introduced in the late 1960s. Until the 1990s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the mid-1990s, new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory, including both its theoretical and algorithmic aspects. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995); a detailed description of the theory (including proofs) can be found in Vapnik (1998). (A minimal code sketch of such an algorithm follows this reference list.)

Hochreiter, S. & Schmidhuber, J. (1997). Flat Minima. Neural Computation, 9(1), 1-42.

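As a companion to the Vapnik overview above, the following sketch shows the kind of support vector machine that the cited abstract credits to statistical learning theory. It is illustrative only: the synthetic dataset, the scikit-learn API and all parameters are assumptions, not material from the cited article.

    # Minimal support vector machine fit (scikit-learn); illustrative assumptions only.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic binary classification problem with a held-out split.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # The regularisation parameter C controls the capacity trade-off that the
    # theory's generalisation bounds formalise.
    clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))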

                Author and article information

Journal
South African Computer Journal (SACJ)
Publisher: South African Institute of Computer Scientists and Information Technologists (SAICSIT), Grahamstown, Eastern Cape, South Africa
ISSN: 1015-7999 (print); 2313-7835 (online)
December 2020, Volume 32, Issue 2, Pages 80-101
Affiliations
[01] Multilingual Speech Technologies, North-West University, South Africa
[02] Centre for Artificial Intelligence Research, South Africa
Article
SciELO ID: S2313-78352020000200006
DOI: 10.18489/sacj.v32i2.833

                This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

History
08 May 2020; 05 October 2020
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 5, Pages: 22
                Product

                SciELO South Africa

                Categories
                Research Papers (General)

Keywords: deep learning, machine learning, learning theory, generalisation
