      Is Open Access

      Teacher-Student Training for Robust Tacotron-based TTS

      Preprint


          Abstract

          While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in many ways, the exposure bias problem in autoregressive models remains unresolved. Exposure bias arises from the mismatch between the training and inference processes, which results in unpredictable performance on out-of-domain test data at run-time. To overcome this, we propose a teacher-student training scheme for Tacotron-based TTS that introduces a distillation loss function in addition to the feature loss function. We first train a Tacotron2-based TTS model by always providing natural speech frames to the decoder; this model serves as the teacher. We then train another Tacotron2-based model as the student, whose decoder takes the predicted speech frames as input, similar to how the decoder works during run-time inference. With the distillation loss, the student model learns the output probabilities from the teacher model, a process known as knowledge distillation. Experiments show that the proposed training scheme consistently improves voice quality on out-of-domain test data in both Chinese and English systems.
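The training scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-layer `tanh` decoder, the mean-squared-error form of both losses, the weight `lam`, and all function and variable names are assumptions made for illustration only. The sketch shows the two decoder input modes (natural frames for the teacher, the model's own predictions for the student) and a student objective that adds a distillation term to the feature loss.

```python
import numpy as np

def decode(weights, first_frame, targets, teacher_forcing):
    """Run a toy one-layer autoregressive decoder for len(targets) steps.

    teacher_forcing=True:  each step is fed the natural previous frame.
    teacher_forcing=False: each step is fed the model's own previous output,
                           as happens at run-time inference.
    """
    prev = first_frame
    outputs = []
    for t in range(len(targets)):
        pred = np.tanh(prev @ weights)  # hypothetical stand-in for a Tacotron2 decoder step
        outputs.append(pred)
        prev = targets[t] if teacher_forcing else pred
    return np.stack(outputs)

def student_loss(student_out, teacher_out, targets, lam=1.0):
    """Feature loss plus a weighted distillation loss (MSE for both is an assumption)."""
    feature_loss = np.mean((student_out - targets) ** 2)      # student vs. natural frames
    distill_loss = np.mean((student_out - teacher_out) ** 2)  # student vs. teacher outputs
    return feature_loss + lam * distill_loss, feature_loss, distill_loss

rng = np.random.default_rng(0)
n_frames, n_mels = 6, 4
natural = rng.uniform(-1.0, 1.0, size=(n_frames, n_mels))  # stand-in for natural speech frames
go_frame = np.full(n_mels, 0.5)                            # arbitrary non-zero start frame

teacher_w = rng.normal(scale=0.5, size=(n_mels, n_mels))   # pretend this teacher is already trained
student_w = rng.normal(scale=0.5, size=(n_mels, n_mels))

# Teacher: always conditioned on natural frames.
teacher_out = decode(teacher_w, go_frame, natural, teacher_forcing=True)
# Student: conditioned on its own predictions, matching inference conditions.
student_out = decode(student_w, go_frame, natural, teacher_forcing=False)

total, feat, dist = student_loss(student_out, teacher_out, natural, lam=1.0)
print(f"feature={feat:.4f} distillation={dist:.4f} total={total:.4f}")
```

In a real system the student weights would be updated by gradient descent on `total` while the teacher stays fixed; the sketch only evaluates the objective once to make the data flow explicit.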


                Author and article information

                Journal
                07 November 2019
                Article
                1911.02839
                5a882271-0a26-4a58-8f86-1f7c65e703a1

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                Submitted to ICASSP2020, Barcelona, Spain
                cs.CL cs.SD eess.AS

                Theoretical computer science,Electrical engineering,Graphics & Multimedia design
