Open Access

      Benign interpolation of noise in deep learning

Research article


          Abstract

The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance tradeoff in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.

Categories:
• Computing methodologies ~ Machine learning
• Computing methodologies ~ Neural networks
• Theory of computation ~ Sample complexity and generalisation bounds
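
To make the experimental setup concrete, here is a minimal sketch of the kind of controlled-noise experiment the abstract describes. It is not the authors' code: the architecture, data sizes, optimiser settings and the 20% label-flip rate are illustrative assumptions. It trains an overparameterised multilayer perceptron on synthetic data in which a known fraction of the labels has been flipped, and tracks training accuracy separately on the clean and corrupted subsets — the measurement behind the observation that uncorrupted samples are fitted first.

    # Minimal sketch (PyTorch): overparameterised MLP with controlled label noise.
    # All sizes and hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    n, d, noise_frac = 1000, 20, 0.2          # samples, input dim, label-flip rate
    X = torch.randn(n, d)
    y_clean = (X[:, 0] > 0).long()            # simple ground-truth labelling rule
    corrupt = torch.rand(n) < noise_frac      # mask marking corrupted samples
    y = torch.where(corrupt, 1 - y_clean, y_clean)

    # Far more parameters (~270k) than training samples (1000).
    model = nn.Sequential(
        nn.Linear(d, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 2),
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(201):
        opt.zero_grad()
        logits = model(X)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
        if epoch % 20 == 0:
            pred = logits.argmax(dim=1)
            # Track training accuracy on clean vs. corrupted subsets separately.
            acc_clean = (pred[~corrupt] == y[~corrupt]).float().mean().item()
            acc_noisy = (pred[corrupt] == y[corrupt]).float().mean().item()
            print(f"epoch {epoch:3d}  clean {acc_clean:.2f}  corrupted {acc_noisy:.2f}")

If clean samples are indeed fitted first, the clean-subset accuracy should saturate many epochs before the flipped labels are memorised; training past the point at which the noise is interpolated is also the regime in which epoch-wise double descent is typically probed.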


Most cited references (5)


LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.


Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5), 988-999.

Statistical learning theory was introduced in the late 1960s. Until the 1990s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the mid-1990s, new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory, including both its theoretical and algorithmic aspects. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995); a detailed description of the theory (including proofs) can be found in Vapnik (1998). (A minimal code sketch of such an algorithm follows this reference list.)

Hochreiter, S. & Schmidhuber, J. (1997). Flat Minima. Neural Computation, 9(1), 1-42.

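As a companion to the Vapnik overview above, the following sketch shows the kind of support vector machine that the cited abstract credits to statistical learning theory. It is illustrative only: the synthetic dataset, the scikit-learn API and all parameters are assumptions, not material from the cited article.

    # Minimal support vector machine fit (scikit-learn); illustrative assumptions only.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic binary classification problem with a held-out split.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # The regularisation parameter C controls the capacity trade-off that the
    # theory's generalisation bounds formalise.
    clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))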

                Author and article information

Journal
South African Computer Journal (SACJ)
Publisher: South African Institute of Computer Scientists and Information Technologists (SAICSIT), Grahamstown, Eastern Cape, South Africa
ISSN: 1015-7999 (print); 2313-7835 (online)
December 2020, Volume 32, Issue 2, Pages 80-101
Affiliations
[01] Multilingual Speech Technologies, North-West University, South Africa
[02] Centre for Artificial Intelligence Research, South Africa
Article
SciELO ID: S2313-78352020000200006
DOI: 10.18489/sacj.v32i2.833

                This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

History
08 May 2020; 05 October 2020
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 5, Pages: 22
                Product

                SciELO South Africa

                Categories
                Research Papers (General)

Keywords: deep learning, machine learning, learning theory, generalisation
