Language recognition (LR) accuracy often drops sharply when the test utterance is 10 s or shorter. This paper describes a method that extends the utterance length using time-scale modification (TSM), which changes the speech rate without changing the spectral information. The algorithm first uses TSM to convert an utterance into several time-stretched and time-compressed versions. These versions, each with a different speech rate, are concatenated with the original utterance to form a longer signal, which is then fed into the LR system. Experiments show that this duration-extension method significantly improves LR performance on short utterances.
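The extension pipeline (TSM into several rates, then concatenation with the original) can be sketched in a few lines. The abstract does not say which TSM algorithm or which speech rates are used, so this minimal Python sketch assumes plain overlap-add (OLA) TSM with a Hann window, 400-sample frames with a 200-sample synthesis hop (50 ms / 25 ms at an assumed 8 kHz sampling rate), and a hypothetical rate set of 0.8, 0.9, 1.1 and 1.2. The names `ola_tsm` and `extend_utterance` are illustrative, not the authors' code.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_tsm(x, rate, frame=400, hop_out=200):
    """Time-scale modification by plain overlap-add (OLA).

    rate > 1 compresses (faster speech), rate < 1 stretches (slower speech).
    Frames are read from x every hop_out * rate samples and written every
    hop_out samples, so each frame's short-time spectrum is kept while the
    total duration becomes about len(x) / rate.
    Frame and hop sizes are assumed values (50 ms / 25 ms at 8 kHz).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    win = hann(frame)
    out_len = int(round(len(x) / rate))
    y = [0.0] * (out_len + frame)
    norm = [0.0] * (out_len + frame)
    hop_in = hop_out * rate
    k = 0
    while k * hop_out < out_len:
        start_in = int(round(k * hop_in))   # analysis position in the input
        start_out = k * hop_out             # synthesis position in the output
        for i in range(frame):
            j = start_in + i
            if j >= len(x):
                break  # ran past the end of the input
            y[start_out + i] += win[i] * x[j]
            norm[start_out + i] += win[i]
        k += 1
    # Divide by the summed window weights to undo the overlap gain.
    return [y[n] / norm[n] if norm[n] > 1e-12 else 0.0 for n in range(out_len)]


def extend_utterance(x, rates=(0.8, 0.9, 1.1, 1.2)):
    """Concatenate the original utterance with several rate-modified copies.

    The rate set is a hypothetical choice; the abstract does not specify it.
    """
    out = list(x)
    for r in rates:
        out.extend(ola_tsm(x, r))
    return out


if __name__ == "__main__":
    fs = 8000
    x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]  # 1 s test tone
    long_x = extend_utterance(x)
    print(len(x), len(long_x))  # prints: 8000 40829
```

With these assumed rates, a 1 s input becomes roughly 5.1 s of audio before it reaches the LR back end. Plain OLA ignores phase alignment between overlapping frames and can produce audible artifacts on real speech; waveform-similarity OLA (WSOLA) or a phase vocoder are common higher-quality choices for the `ola_tsm` step.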