3
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      The Sound of Pixels

      Preprint

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel. Our approach capitalizes on the natural synchronization of the visual and audio modalities to learn models that jointly parse sounds and images, without requiring additional manual supervision. Experimental results on a newly collected MUSIC dataset show that our proposed Mix-and-Separate framework outperforms baseline approaches for grounding sounds into images. Several qualitative results suggest our model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of sound sources.

          Related collections

          Most cited references14

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          U-Net: Convolutional Networks for Biomedical Image Segmentation

          There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            A blind source separation technique using second-order statistics

              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Performance measurement in blind audio source separation

                Bookmark

                Author and article information

                Journal
                09 April 2018
                Article
                1804.03160
                1d5925c6-f540-4271-b916-0f71a648352e

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.CV cs.SD eess.AS

                Computer vision & Pattern recognition,Electrical engineering,Graphics & Multimedia design

                Comments

                Comment on this article