      Is Open Access

      End-to-end Networks for Supervised Single-channel Speech Separation

      Preprint

          Abstract

          The performance of single-channel source separation algorithms has improved greatly in recent years with the development and deployment of neural networks. However, many such networks still operate on the magnitude spectrogram of a mixture and produce estimates of the source magnitude spectrograms to perform separation. In this paper, we interpret these steps as additional neural network layers and propose an end-to-end source separation network that estimates the separated speech waveform by operating directly on the raw waveform of the mixture. We also propose masking-based end-to-end separation networks that jointly optimize the mask and the latent representations of the mixture waveforms. In our experiments, these networks show a significant improvement in separation performance over existing architectures. To train these end-to-end models, we investigate composite cost functions derived from objective evaluation metrics measured on waveforms. We present subjective listening test results that demonstrate the improvement attained by masking-based end-to-end networks and reveal insights into the performance of these cost functions for end-to-end source separation.
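
As a rough illustration of the idea in the abstract, the sketch below treats the analysis transform, the mask, and the synthesis transform as plain matrix layers applied to frames of the raw waveform, and scores the output with a waveform-domain cost. It is a minimal sketch under stated assumptions, not the architecture or the cost functions used in the paper: the function names, the one-layer sigmoid mask estimator (`mask_w`), and the choice of negative signal-to-distortion ratio as the cost are all hypothetical choices made for illustration.

```python
import numpy as np

def frame(x, win, hop):
    """Slice a 1-D signal into overlapping frames of length `win`."""
    n = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]  # shape (n_frames, win)

def overlap_add(frames, hop):
    """Sum overlapping frames back into a 1-D signal."""
    n, win = frames.shape
    y = np.zeros(hop * (n - 1) + win)
    for i in range(n):
        y[i * hop : i * hop + win] += frames[i]
    return y

def separate(mix, analysis, mask_w, synthesis, win, hop):
    """Masking-based separation applied directly to the raw waveform.

    analysis:  (win, k) basis standing in for the forward transform
    mask_w:    (k, k) weights of a one-layer mask estimator (assumption)
    synthesis: (k, win) basis standing in for the inverse transform
    """
    latent = frame(mix, win, hop) @ analysis                 # latent representation
    mask = 1.0 / (1.0 + np.exp(-(np.abs(latent) @ mask_w)))  # sigmoid mask in (0, 1)
    return overlap_add((mask * latent) @ synthesis, hop)

def neg_sdr(estimate, target, eps=1e-8):
    """Negative signal-to-distortion ratio in dB, measured on waveforms."""
    noise = target - estimate
    return -10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))
```

In a trainable version, `analysis`, `mask_w`, and `synthesis` would all be learned jointly by minimizing a cost such as `neg_sdr` between the estimated and reference source waveforms; fixing `analysis` and `synthesis` to a Fourier basis and its inverse would recover a conventional spectrogram-masking pipeline.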


                Author and article information

                Date: 05 October 2018
                Article: arXiv:1810.02568
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Categories: eess.AS cs.LG cs.SD eess.SP
                Subjects: Artificial intelligence, Graphics & Multimedia design, Electrical engineering
