Speech feature fusion algorithm based on acoustic state likelihood and supervised state modelling

Abstract

A Gaussian mixture model-hidden Markov model (GMM-HMM) for speech recognition uses the most likely state sequence (MLSS) criterion to find the best state sequence for the observations. Because the MLSS search considers only the maximum-likelihood state of each speech frame, the effects of the other, suboptimal states are neglected and some important information is lost, which reduces the recognition rate. Acoustic state likelihood modelling and supervised state modelling are used here to make better use of the acoustic state likelihood information. The resulting state likelihood cluster feature and supervised state feature describe how the Mel frequency cepstrum coefficient (MFCC) acoustic features relate to the HMM states. Tests show that fusing either feature with MFCC improves speech recognition accuracy. Compared with a GMM-HMM using only MFCC, the state likelihood cluster feature and the supervised state feature reduce the total error rate by a relative 6.10% and 9.66%, respectively, for isolated word recognition and by 2.53% and 11.05%, respectively, for continuous speech recognition.
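The motivation in the abstract, that a single best state per frame discards the likelihoods of the other states, can be illustrated with a minimal sketch. This is not the authors' SLCF or SSF construction, which the abstract does not specify. It assumes one diagonal-covariance Gaussian per state instead of a full GMM, makes a per-frame decision instead of a sequence search, and uses hypothetical names (`toy_states`, `state_posteriors`, `fuse`) and toy values throughout.

```python
import math

def diag_gauss_loglik(frame, mean, var):
    """Log-likelihood of one feature frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def state_posteriors(frame, states):
    """Normalised likelihood of the frame under every state (softmax of log-likelihoods)."""
    logliks = [diag_gauss_loglik(frame, s["mean"], s["var"]) for s in states]
    top = max(logliks)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - top) for ll in logliks]
    total = sum(weights)
    return [w / total for w in weights]

def fuse(frame, states):
    """Append the full state-likelihood vector to the original acoustic frame."""
    return list(frame) + state_posteriors(frame, states)

# Hypothetical toy model: 3 states over 2-dimensional stand-ins for MFCC frames.
toy_states = [
    {"mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"mean": [1.0, 1.0], "var": [1.0, 1.0]},
    {"mean": [4.0, 4.0], "var": [1.0, 1.0]},
]
frame = [0.4, 0.3]

post = state_posteriors(frame, toy_states)
# Hard decision: keep only the single most likely state for this frame.
best_state = max(range(len(post)), key=post.__getitem__)
# Soft alternative: keep every state's likelihood alongside the frame.
fused = fuse(frame, toy_states)
```

In this toy case the hard decision selects state 0 even though state 1 still carries a large share of the normalised likelihood. The fused vector keeps that information next to the original frame.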

Abstract

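When a Gaussian mixture model-hidden Markov model (GMM-HMM) speech recognizer uses the most likely state sequence (MLSS) criterion to obtain the best state sequence for the observations, it considers only the state with the maximum likelihood for each speech frame and ignores the influence of the other, suboptimal states on the current frame. This loses information and lowers the recognition rate. To make better use of the likelihood information of the acoustic states, this paper proposes an acoustic state likelihood score model and a supervised state model and, based on these models, derives a state likelihood cluster feature (SLCF) and a supervised state feature (SSF). These two features reflect information about the Mel frequency cepstrum coefficient (MFCC) acoustic features with respect to the HMM states. Experiments show that fusing SLCF and SSF separately with MFCC yields new features that improve speech recognition. Compared with a GMM-HMM using only MFCC, fusing SLCF and SSF lowers the total error rate of the isolated word recognition system by a relative 6.10% and 9.66%, respectively, and that of the continuous speech recognition system by a relative 2.53% and 11.05%, respectively.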
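The relative reductions quoted above are presumably computed with the standard definition below. The symbols are assumptions introduced here: $E_{\text{MFCC}}$ is the total error rate of the GMM-HMM baseline using only MFCC, and $E_{\text{fused}}$ is the total error rate after fusing one of the new features. The abstract does not report the absolute error rates.

```latex
\[
  \Delta_{\text{rel}}
  = \frac{E_{\text{MFCC}} - E_{\text{fused}}}{E_{\text{MFCC}}} \times 100\%
\]
```

Under this definition, a relative reduction of 9.66% means that the fused system makes 9.66% fewer errors than the baseline, not that the error rate drops by 9.66 percentage points.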

Author and article information

Journal

Journal ID (publisher-id): J Tsinghua Univ (Sci & Technol)

Title: Journal of Tsinghua University (Science and Technology)

Publisher: Tsinghua University Press

ISSN (Electronic): 1000-0054

Publication date (Print): 15 June 2019

Publication date (Electronic): 01 June 2019

Volume: 59

Issue: 6

Pages: 476-481

Affiliations

[1] Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Article

Publisher ID: j.cnki.qhdxxb.2019.21.011

DOI: 10.16511/j.cnki.qhdxxb.2019.21.011

SO-VID: 2be87c13-370e-4789-ae19-f56c19e621b2

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc/4.0/.

History

Date received: 07 December 2018

ScienceOpen disciplines: Software engineering, Data structures & Algorithms, Applied computer science, Computer science, Artificial intelligence, Hardware architecture

Keywords: acoustic feature clustering, state likelihood cluster feature, supervised state feature
