      Expanding the length of short utterances for short-duration language recognition

      research-article


          Abstract

          Language recognition (LR) accuracy often drops significantly when the test utterance is 10 s or shorter. This paper describes a method that extends the utterance length using time-scale modification (TSM), which changes the speech rate without changing the spectral information. The algorithm first converts an utterance into several time-stretched and time-compressed versions using TSM. These versions, each with a different speech rate, are then concatenated with the original utterance to form a long-duration signal, which is fed into the LR system. Tests show that this duration-extension method significantly improves LR performance on short utterances.

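          Abstract (translated from the Chinese)

          To address the sharp drop in language recognition performance when the test utterance is shorter than 10 s, this paper proposes applying time-scale modification (TSM) to change the length of the speech (and hence its speech rate) while keeping the other frequency-domain information unchanged. First, TSM converts the test utterance into several time-compressed and time-stretched versions. Next, these versions with different speech rates are concatenated with the original utterance to produce a longer signal. Finally, the result is fed into the language recognition system. Experimental results show that the proposed duration-extension algorithm significantly improves language recognition performance on short utterances.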
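
          The procedure described in the abstract (time-scale the utterance at several rates, then concatenate the results with the original) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say which TSM algorithm is used, so a plain overlap-add (OLA) scheme stands in for it here, and the function names, frame sizes, rate factors, and concatenation order are all hypothetical.

```python
import numpy as np

def ola_time_stretch(signal, rate, frame_len=400, synth_hop=100):
    """Plain overlap-add TSM (an assumed stand-in for the paper's TSM).

    rate > 1 compresses the signal (faster speech);
    rate < 1 stretches it (slower speech).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    # Frames are read every `analysis_hop` samples and written every
    # `synth_hop` samples, which changes duration but not frame content.
    analysis_hop = rate * synth_hop
    n_frames = int((len(signal) - frame_len) / analysis_hop) + 1
    window = np.hanning(frame_len)
    out = np.zeros((n_frames - 1) * synth_hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        a = int(round(i * analysis_hop))  # read position in the input
        s = i * synth_hop                 # write position in the output
        out[s:s + frame_len] += window * signal[a:a + frame_len]
        norm[s:s + frame_len] += window
    # Divide by the summed windows to undo the overlap gain.
    return out / np.maximum(norm, 1e-8)

def extend_utterance(signal, rates=(0.8, 0.9, 1.1, 1.2)):
    """Concatenate the original with its time-scaled versions.

    The rate factors and the ordering (original first) are assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    versions = [signal] + [ola_time_stretch(signal, r) for r in rates]
    return np.concatenate(versions)

if __name__ == "__main__":
    sr = 8000                              # assumed sampling rate in Hz
    t = np.arange(sr) / sr                 # 1 s synthetic test tone
    x = np.sin(2 * np.pi * 220.0 * t)
    extended = extend_utterance(x)
    print(len(x) / sr, "s ->", len(extended) / sr, "s")
```

          Plain OLA is the simplest TSM variant and can introduce audible phase artifacts on real speech; waveform-similarity variants such as WSOLA are commonly used to reduce them. The extended signal would then be passed to the LR front end in place of the short utterance.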

          Author and article information

          Journal
          J Tsinghua Univ (Sci & Technol)
          Journal of Tsinghua University (Science and Technology)
          Tsinghua University Press
          1000-0054
          15 March 2018
          14 March 2018
          Volume: 58
          Issue: 3
          Pages: 254-259
          Affiliations
          [1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
          [2] University of Chinese Academy of Sciences, Beijing 100190, China
          [3] Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Xinjiang 830011, China
          Author notes
          *Corresponding author: ZHOU Ruohua, E-mail: zhouruohua@hccl.ioa.ac.cn
          Article
          j.cnki.qhdxxb.2018.25.015
          10.16511/j.cnki.qhdxxb.2018.25.015
          2beeb53d-9b3e-4702-a3e0-104da559850b
          Copyright © Journal of Tsinghua University

          This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc/4.0/.

          History
          : 26 September 2017

          Software engineering,Data structures & Algorithms,Applied computer science,Computer science,Artificial intelligence,Hardware architecture
          time-scale modification,short-duration,language recognition,speech rate
