
      High Fidelity Speech Synthesis with Adversarial Networks

      Preprint


          Abstract

          Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism and in terms of how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS, Mean Opinion Score) and novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to that of state-of-the-art models and that, unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at http://tiny.cc/gantts.
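The abstract says the discriminators operate on random windows of different sizes. The sketch below illustrates only that sampling step in plain Python; it is not the authors' implementation. The function name `sample_random_windows`, the window sizes, and the 24 kHz sample rate are assumptions chosen for illustration.

```python
import random


def sample_random_windows(audio, window_sizes, rng=None):
    """Return one randomly positioned window of each requested size.

    `audio` is a sequence of samples. The result maps each window size
    to a contiguous slice of `audio` with exactly that many samples.
    """
    rng = rng or random.Random()
    windows = {}
    for size in window_sizes:
        if size > len(audio):
            raise ValueError(
                f"window size {size} exceeds audio length {len(audio)}"
            )
        # randint is inclusive on both ends, so the last valid start
        # position is len(audio) - size.
        start = rng.randint(0, len(audio) - size)
        windows[size] = audio[start:start + size]
    return windows


# Hypothetical usage: one second of silence at an assumed 24 kHz rate.
audio = [0.0] * 24000
windows = sample_random_windows(
    audio, window_sizes=(240, 480, 960), rng=random.Random(0)
)
for size, window in windows.items():
    print(size, len(window))
```

In a setup like the one the abstract describes, each window would be passed to the discriminator assigned to that size, so each discriminator sees only a short excerpt rather than the full utterance.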

              Author and article information

              Date: 25 September 2019
              arXiv ID: 1909.11646
              License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
              arXiv categories: cs.SD cs.LG eess.AS
              Subjects: Artificial intelligence, Electrical engineering, Graphics & Multimedia design
