Expanding the length of short utterances for short-duration language recognition

Abstract

Language recognition (LR) accuracy often drops significantly when the test utterance is 10 s or shorter. This paper describes a method that extends the utterance length using time-scale modification (TSM), which changes the speech rate without changing the spectral information. The algorithm first converts an utterance into several time-stretched or time-compressed versions using TSM. These versions, each with a different speech rate, are then concatenated with the original utterance to form a long-duration signal, which is fed into the LR system. Tests demonstrate that this duration modification method significantly improves performance on short utterances.
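The expansion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which TSM algorithm is used, so plain overlap-add (OLA) stands in for it here, and the function names, the rate set `(0.8, 0.9, 1.1, 1.2)`, and the frame and hop sizes are all assumptions. Plain OLA preserves the spectral content less faithfully than more refined TSM methods, so it only shows the overall data flow.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(x, rate, frame=400, hop=100):
    """Change the duration of x by roughly 1/rate using plain overlap-add.

    rate > 1 compresses (faster speech); rate < 1 stretches (slower speech).
    Frames are read every hop * rate samples and written every hop samples.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    win = hann(frame)
    analysis_hop = hop * rate
    n_frames = max(1, int((len(x) - frame) / analysis_hop) + 1)
    out_len = (n_frames - 1) * hop + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        src = int(round(k * analysis_hop))
        dst = k * hop
        for i in range(frame):
            j = src + i
            sample = x[j] if j < len(x) else 0.0  # zero-pad past the end
            out[dst + i] += win[i] * sample
            norm[dst + i] += win[i]
    # Divide by the summed window so overlapping frames do not change the gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def expand_utterance(x, rates=(0.8, 0.9, 1.1, 1.2), frame=400, hop=100):
    """Concatenate the original utterance with its rate-modified versions."""
    expanded = list(x)
    for rate in rates:
        expanded.extend(ola_time_stretch(x, rate, frame=frame, hop=hop))
    return expanded


if __name__ == "__main__":
    # Hypothetical 1 s test tone at an assumed 8 kHz sampling rate.
    tone = [math.sin(2.0 * math.pi * 200.0 * t / 8000.0) for t in range(8000)]
    longer = expand_utterance(tone)
    print(len(tone), len(longer))
```

The expanded signal would then be passed to the LR system in place of the original short utterance.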

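Abstract (translated from the Chinese)

To address the sharp drop in language recognition performance when the test utterance is shorter than 10 s, this paper proposes applying time-scale modification (TSM) to change the length of the speech, and thereby its speech rate, while keeping the other frequency-domain information unchanged. First, TSM is applied to the test utterance to produce several time-compressed and time-stretched versions. Next, these versions with different speech rates are concatenated with the original utterance to produce a longer utterance. Finally, the result is fed into the language recognition system. Experimental results show that the proposed duration-extension algorithm significantly improves language recognition performance on short utterances.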

Author and article information

Journal

Journal ID (publisher-id): J Tsinghua Univ (Sci & Technol)

Title: Journal of Tsinghua University (Science and Technology)

Publisher: Tsinghua University Press

ISSN (Electronic): 1000-0054

Publication date (Print): 15 March 2018

Publication date (Electronic): 14 March 2018

Volume: 58

Issue: 3

Pages: 254-259

Affiliations

[1] ¹Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China

[2] ²University of Chinese Academy of Sciences, Beijing 100190, China

[3] ³Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Xinjiang 830011, China

Author notes

*Corresponding author: ZHOU Ruohua, E-mail: zhouruohua@hccl.ioa.ac.cn

Article

Publisher ID: j.cnki.qhdxxb.2018.25.015

DOI: 10.16511/j.cnki.qhdxxb.2018.25.015

SO-VID: 2beeb53d-9b3e-4702-a3e0-104da559850b

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 Unported License (CC BY-NC 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc/4.0/.

History

Date received: 26 September 2017

ScienceOpen disciplines: Software engineering, Data structures & Algorithms, Applied computer science, Computer science, Artificial intelligence, Hardware architecture

Keywords: time-scale modification, short-duration, language recognition, speech rate
