Automatic prosodic boundary labeling based on fusing the silence duration with the lexical features

Abstract

Automatic prosodic boundary labeling is important in constructing speech corpora for speech synthesis. Manual labeling is time consuming and inconsistent, whereas automatic labeling gives more consistent results. This work imitates the manual labeling procedure: a recurrent neural network is used to train two sub-models, one on lexical features and one on acoustic features, and model fusion then combines the outputs of the two sub-models to obtain the optimal labeling results. Silence durations extracted for each word have a clearer physical meaning and correlate better with prosodic boundaries than the frame-by-frame acoustic features used in traditional methods. Tests show that the silence duration feature improves prosodic boundary labeling over traditional acoustic features, and that model fusion improves it further over previous feature fusion methods.

Abstract

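Prosodic boundary labeling plays a vital role in corpus construction and speech synthesis, and automatic prosodic labeling can overcome the time cost and inconsistency of manual labeling. Following the manual labeling workflow, this paper uses recurrent neural networks to train separate sub-models for the text and audio channels and applies model fusion to the sub-model outputs to obtain the optimal labels. Silence durations are extracted at the word level; compared with traditional frame-level acoustic features, they have a clearer physical meaning and a closer connection to prosodic boundaries. Experimental results show that, compared with traditional acoustic features, the silence duration feature improves automatic prosodic labeling performance, and that, compared with direct feature-level methods, decision fusion combines the acoustic and textual features more effectively and further improves labeling performance.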
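As a rough illustration of the two ideas in the abstract, the sketch below computes a word-level silence duration from time-aligned words and fuses the per-word posteriors of a lexical and an acoustic sub-model at the decision level. It is not the authors' implementation. The input format (word, start, end in seconds), the function names, the two-class labeling (0 = no boundary, 1 = boundary) and the weighted-average fusion rule are all assumptions made for the example; the abstract does not specify them.

```python
def silence_after_words(words):
    """Silence gap (seconds) following each word.

    `words` is a list of (token, start_sec, end_sec) tuples, e.g. from a
    forced alignment (assumed input format). The last word gets 0.0.
    """
    gaps = []
    for (_, _, end), (_, next_start, _) in zip(words, words[1:]):
        # Clamp at zero in case aligned words overlap slightly.
        gaps.append(max(0.0, next_start - end))
    gaps.append(0.0)
    return gaps


def fuse_posteriors(p_lexical, p_acoustic, weight=0.5):
    """Decision-level fusion by weighted averaging (assumed fusion rule).

    Each argument is a list of per-word class posteriors, one row per word.
    Returns the index of the highest fused posterior for each word.
    """
    labels = []
    for lex_row, ac_row in zip(p_lexical, p_acoustic):
        fused = [weight * l + (1.0 - weight) * a
                 for l, a in zip(lex_row, ac_row)]
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Hypothetical aligned words and sub-model outputs, for illustration only.
aligned = [("w1", 0.00, 0.42), ("w2", 0.42, 0.80), ("w3", 1.05, 1.50)]
gaps = silence_after_words(aligned)
labels = fuse_posteriors([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]],
                         [[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]])
```

In this sketch the silence durations would serve as input to the acoustic sub-model, while fusion operates only on the sub-model outputs; the weight is a free parameter that would need tuning on held-out data.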

Author and article information

Journal

Journal ID (publisher-id): J Tsinghua Univ (Sci & Technol)

Title: Journal of Tsinghua University (Science and Technology)

Publisher: Tsinghua University Press

ISSN (Electronic): 1000-0054

Publication date (Print): 15 January 2018

Publication date (Electronic): 19 January 2018

Volume: 58

Issue: 1

Pages: 61-66,74

Affiliations

[1] ¹National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

[2] ²School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China

[3] ³CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Author notes

*Corresponding author: TAO Jianhua, E-mail: jhtao@nlpr.ia.ac

Article

Publisher ID: j.cnki.qhdxxb.2018.21.003

DOI: 10.16511/j.cnki.qhdxxb.2018.21.003

SO-VID: 37d32840-4154-4269-8f99-7d01d7a1a30d

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc/4.0/.

History

Date received: 29 September 2017

ScienceOpen disciplines: Software engineering, Data structures & Algorithms, Applied computer science, Computer science, Artificial intelligence, Hardware architecture

Keywords: prosodic boundary labeling, ensemble strategy, speech synthesis, silence duration, corpus construction
