11
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels

      Preprint
      ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes with the expensive cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. Proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and FASHION-MNIST datasets and synthetically generated noisy labels.

          Related collections

          Most cited references10

          • Record: found
          • Abstract: found
          • Article: not found

          Classification in the presence of label noise: a survey.

          Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of techniques to deal with label noise. However, the field lacks a comprehensive survey on the different types of label noise, their consequences and the algorithms that consider label noise. This paper proposes to fill this gap. First, the definitions and sources of label noise are considered and a taxonomy of the types of label noise is proposed. Second, the potential consequences of label noise are discussed. Third, label noise-robust, label noise cleansing, and label noise-tolerant algorithms are reviewed. For each category of approaches, a short discussion is proposed to help the practitioner to choose the most suitable technique in its own particular field of application. Eventually, the design of experiments is also discussed, what may interest the researchers who would like to test their own algorithms. In this paper, label noise consists of mislabeled instances: no additional information is assumed to be available like e.g., confidences on labels.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            Learning Deconvolution Network for Semantic Segmentation

            We propose a novel semantic segmentation algorithm by learning a deconvolution network. We learn the network on top of the convolutional layers adopted from VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks by integrating deep deconvolution network and proposal-wise prediction; our segmentation method typically identifies detailed structures and handles objects in multiple scales naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained with no external data through ensemble with the fully convolutional network.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Deep Networks with Stochastic Depth

              , , (2016)
              Very deep convolutional networks with hundreds or more layers have lead to significant reductions in error on competitive benchmarks like the ImageNet or COCO tasks. Although the unmatched expressiveness of the many deep layers can be highly desirable at test time, training very deep networks comes with its own set of challenges. The gradients can vanish, the forward flow often diminishes and the training time can be painfully slow even on modern computers. In this paper we propose stochastic depth, a training procedure that enables the seemingly contradictory setup to train short networks and obtain deep networks. We start with very deep networks but during training, for each mini-batch, randomly drop a subset of layers and bypass them with the identity function. The resulting networks are short (in expectation) during training and deep during testing. Training Residual Networks with stochastic depth is compellingly simple to implement, yet effective. We show that this approach successfully addresses the training difficulties of deep networks and complements the recent success of Residual and Highway Networks. It reduces training time substantially and improves the test errors on almost all data sets significantly (CIFAR-10, CIFAR-100, SVHN). Intriguingly, with stochastic depth we can increase the depth of residual networks even beyond 1200 layers and still yield meaningful improvements in test error (4.91%) on CIFAR-10.
                Bookmark

                Author and article information

                Journal
                20 May 2018
                Article
                1805.07836
                308bae6e-0761-4d16-b89f-e05d6f165a2d

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.LG cs.CV stat.ML

                Computer vision & Pattern recognition,Machine learning,Artificial intelligence

                Comments

                Comment on this article