      Is Open Access

      A Deep Generative Model of Speech Complex Spectrograms

      Preprint


          Abstract

          This paper proposes an approach to the joint modeling of the short-time Fourier transform magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency. Based on these assumptions, we explore and compare several combinations of loss functions for training our models. Built upon the variational autoencoder framework, our model consists of three convolutional neural networks acting as an encoder, a magnitude decoder, and a phase decoder. In addition to the latent variables, we propose to also condition the phase estimation on the estimated magnitude. When evaluated on a time-domain speech reconstruction task, our models generated speech with high perceptual quality and high intelligibility.
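          The abstract states the distributional assumptions but not the exact loss formulas. The following is a minimal sketch of one possible loss combination under those assumptions, not the authors' implementation. Assumptions (all hypothetical): a spectrogram is stored as a list of frames (time) by frequency bins; the Gaussian variance `var` and the von Mises concentration `kappa` are fixed scalars rather than decoder outputs; the instantaneous frequency is approximated by the wrapped phase difference along time and the group delay by the negated wrapped phase difference along frequency (both unscaled); and the loss terms are combined with hypothetical weights `w_phase`, `w_if`, `w_gd`. Function names such as `spectrogram_loss` are illustrative only. In the described model, the predicted magnitude and phase would come from the magnitude and phase decoders; here they are plain inputs.

```python
import math
from functools import partial
from typing import Callable, List

Grid = List[List[float]]  # hypothetical layout: grid[t][f], frames x frequency bins


def bessel_i0(x: float, terms: int = 30) -> float:
    """Modified Bessel function of the first kind, order 0, from its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)  # ratio of consecutive series terms
        total += term
    return total


def wrap(angle: float) -> float:
    """Principal value of an angle, in [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def gaussian_nll(x: float, mean: float, var: float) -> float:
    """Negative log-likelihood of a magnitude value x under N(mean, var)."""
    return 0.5 * math.log(2.0 * math.pi * var) + (x - mean) ** 2 / (2.0 * var)


def von_mises_nll(theta: float, mu: float, kappa: float) -> float:
    """Negative log-likelihood of an angle theta under VonMises(mu, kappa)."""
    return -kappa * math.cos(theta - mu) + math.log(2.0 * math.pi * bessel_i0(kappa))


def inst_freq(phase: Grid) -> Grid:
    """Instantaneous-frequency proxy: wrapped phase difference along time."""
    return [[wrap(nxt - cur) for cur, nxt in zip(phase[t], phase[t + 1])]
            for t in range(len(phase) - 1)]


def group_delay(phase: Grid) -> Grid:
    """Group-delay proxy: negated wrapped phase difference along frequency."""
    return [[-wrap(row[f + 1] - row[f]) for f in range(len(row) - 1)] for row in phase]


def mean_nll(observed: Grid, predicted: Grid,
             nll: Callable[[float, float], float]) -> float:
    """Average a per-bin NLL over a grid (the grid must be non-empty)."""
    vals = [nll(o, p) for orow, prow in zip(observed, predicted)
            for o, p in zip(orow, prow)]
    return sum(vals) / len(vals)


def spectrogram_loss(mag_true: Grid, mag_pred: Grid, ph_true: Grid, ph_pred: Grid,
                     var: float = 1.0, kappa: float = 1.0,
                     w_phase: float = 1.0, w_if: float = 1.0, w_gd: float = 1.0) -> float:
    """One hypothetical loss combination; needs at least 2 frames and 2 bins."""
    gauss = partial(gaussian_nll, var=var)
    vm = partial(von_mises_nll, kappa=kappa)
    loss = mean_nll(mag_true, mag_pred, gauss)                               # magnitude
    loss += w_phase * mean_nll(ph_true, ph_pred, vm)                         # phase
    loss += w_if * mean_nll(inst_freq(ph_true), inst_freq(ph_pred), vm)      # time derivative
    loss += w_gd * mean_nll(group_delay(ph_true), group_delay(ph_pred), vm)  # frequency derivative
    return loss


if __name__ == "__main__":
    mag = [[1.0, 0.5], [0.8, 0.3]]
    phase = [[0.0, 0.5], [0.2, 0.9]]
    shifted = [[p + 2.0 * math.pi for p in row] for row in phase]
    print(spectrogram_loss(mag, mag, phase, phase))
    print(spectrogram_loss(mag, mag, phase, shifted))  # equal up to rounding: 2*pi-periodic
```

          Because every phase term goes through a cosine or a wrapped difference, adding a multiple of 2π to the predicted phase leaves the loss unchanged, which is the property that motivates a circular distribution over a plain squared error. Note that a constant phase offset changes only the phase term, not the derivative terms, so the derivative losses reward locally consistent phase rather than absolute phase values.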



                Author and article information

                Date: 07 March 2019
                arXiv ID: 1903.03269
                Record ID: 36df0374-06c1-4caa-b662-1bd438e20b3d
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                arXiv categories: cs.SD cs.LG eess.AS stat.ML
                Subjects: Machine learning, Artificial intelligence, Electrical engineering, Graphics & Multimedia design
