      Is Open Access

      EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

      Preprint


          Abstract

          We present EMPHASIS, an emotional phoneme-based acoustic model for speech synthesis systems. EMPHASIS comprises a phoneme duration prediction model and an acoustic parameter prediction model. It uses a CBHG-based regression network to model the dependencies between linguistic features and acoustic features. We modify the structures of the network's input and output layers to improve performance. For the linguistic features, we apply a feature grouping strategy to enhance emotional and prosodic features. The acoustic parameters are designed to suit both the regression task and waveform reconstruction. EMPHASIS can synthesize speech in real time and generate expressive interrogative and exclamatory speech with high audio quality. EMPHASIS is designed to be a multilingual model and currently synthesizes Mandarin-English speech. In the emotional speech synthesis experiment, it achieves better subjective results than other real-time speech synthesis systems.
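
          The abstract names two components, a phoneme duration prediction model and an acoustic parameter prediction model, but does not say how they are connected. The sketch below assumes the arrangement common in parametric speech synthesis: predicted durations are used to repeat each phoneme's linguistic feature vector once per frame, and the acoustic model then maps the frame-level sequence to acoustic parameters. All names (expand_to_frames, predict_acoustic_parameters, the toy models) are hypothetical stand-ins for illustration, not the authors' CBHG-based networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def expand_to_frames(phoneme_features: Sequence[Vector],
                     durations: Sequence[int]) -> List[Vector]:
    """Repeat each phoneme's feature vector once per predicted frame."""
    if len(phoneme_features) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[Vector] = []
    for vec, n_frames in zip(phoneme_features, durations):
        # Copy the vector so later edits to one frame do not affect others.
        frames.extend(list(vec) for _ in range(n_frames))
    return frames


def predict_acoustic_parameters(
    phoneme_features: Sequence[Vector],
    duration_model: Callable[[Sequence[Vector]], Sequence[int]],
    acoustic_model: Callable[[Sequence[Vector]], List[Vector]],
) -> List[Vector]:
    """Two-stage pipeline (assumed): durations first, then acoustic parameters."""
    durations = duration_model(phoneme_features)
    frame_features = expand_to_frames(phoneme_features, durations)
    return acoustic_model(frame_features)


# Toy stand-ins (assumptions): a real system would use trained networks here.
def toy_duration_model(features: Sequence[Vector]) -> List[int]:
    return [2 for _ in features]  # every phoneme lasts two frames


def toy_acoustic_model(frames: Sequence[Vector]) -> List[Vector]:
    return [[sum(frame)] for frame in frames]  # one parameter per frame


phonemes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
params = predict_acoustic_parameters(phonemes, toy_duration_model, toy_acoustic_model)
```

          Passing the two models as plain callables keeps the sketch independent of any network library; in this toy run, three phonemes with two frames each yield six frames of acoustic parameters.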

          Author and article information

          Journal
          24 June 2018
          Article
          1806.09276
          7e5e041c-28d4-4748-aea1-0c53bd91bfe0

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          eess.AS cs.SD

          Graphics & Multimedia design,Electrical engineering
