      Is Open Access

      High-Fidelity Neural Phonetic Posteriorgrams

      Preprint


          Abstract

          A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
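          To make the definition concrete, the following is a minimal sketch of a PPG as a sequence of per-frame categorical distributions over phonemes, with one plausible frame-wise pronunciation distance. The toy phoneme inventory, the logit values, and all function names are illustrative assumptions. The Jensen-Shannon divergence is used here only as a reasonable stand-in for a distance between distributions; the abstract does not specify the paper's model or its exact distance.

```python
import math

# Hypothetical toy phoneme inventory; a real PPG model uses a full phoneme set.
PHONEMES = ["aa", "iy", "s", "<silent>"]

def softmax(logits):
    """Turn one frame of raw scores into a categorical distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_ppg(frame_logits):
    """Build a PPG: one distribution over PHONEMES per time frame."""
    return [softmax(frame) for frame in frame_logits]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ppg_distance(ppg_a, ppg_b):
    """Mean frame-wise JS divergence between two time-aligned PPGs."""
    if len(ppg_a) != len(ppg_b):
        raise ValueError("PPGs must have the same number of frames")
    return sum(js_divergence(p, q) for p, q in zip(ppg_a, ppg_b)) / len(ppg_a)

def top_phonemes(ppg):
    """Most probable phoneme in each frame."""
    return [PHONEMES[max(range(len(f)), key=f.__getitem__)] for f in ppg]

# Two 2-frame utterances that differ only in the second frame.
reference = to_ppg([[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]])
variant = to_ppg([[4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]])

print(top_phonemes(reference))             # ['aa', 'iy']
print(top_phonemes(variant))               # ['aa', 's']
print(ppg_distance(reference, reference))  # 0.0
print(ppg_distance(reference, variant) > 0)  # True
```

          Because each frame is an explicit distribution over named phonemes, the representation stays interpretable: editing pronunciation amounts to changing which phonemes receive probability mass in a given frame. The sketch assumes the two PPGs are already time-aligned, which real utterances generally are not.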


          Author and article information

          Date: 27 February 2024
          arXiv ID: 2402.17735
          Record ID: 23ee92ef-407f-43e5-8a6b-c31e6719851b
          License: http://creativecommons.org/licenses/by/4.0/
          arXiv categories: eess.AS cs.SD

          Accepted to ICASSP 2024 Workshop on Explainable Machine Learning for Speech and Audio

          Graphics & Multimedia design, Electrical engineering
