CAAI Transactions on Intelligent Systems, Vol. 13, No. 3, June 2018
DOI: 10.11992/tis.201710029
Published online at: http://kns.cnki.net/kcms/detail/23.1538.TP.20180408.1725.028.html
CLC number: TP391. Document code: A. Article ID: 1673-4785(2018)03-0486-07
Citation: MA Zhiqiang, LI Tuya, YANG Shuangtao, et al. Mongolian acoustic modeling based on deep neural network[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3): 486-492.
Mongolian acoustic modeling based on deep neural network
MA Zhiqiang, LI Tuya, YANG Shuangtao, ZHANG Li
(School of Data Science & Application, Inner Mongolia University of Technology, Hohhot 010080, China)

Abstract: When used for acoustic modeling in Mongolian speech recognition, the Gaussian mixture model (GMM) cannot adequately describe the correlations among Mongolian acoustic features and is limited by its independence assumption. To address this, the study investigates Mongolian acoustic modeling based on a deep neural network (DNN). First, using the DNN, classification is tightly combined with learning the internal structure of the speech features to extract Mongolian acoustic features, and a DNN-HMM Mongolian acoustic model is constructed. Second, a training algorithm is designed that combines unsupervised pre-training with supervised fine-tuning. In addition, dropout is applied during training of the DNN-HMM Mongolian acoustic model to avoid over-fitting. Finally, the GMM-HMM and DNN-HMM Mongolian acoustic models are compared on a small-scale corpus using the Kaldi experimental platform. The experimental results show that, compared with the GMM-HMM model, the DNN-HMM Mongolian acoustic model reduces the word recognition error rate by 7.5% and the sentence recognition error rate by 13.63%, and that applying dropout during training effectively avoids over-fitting of the DNN-HMM Mongolian acoustic model.

Keywords: speech recognition; acoustic model; GMM-HMM; DNN-HMM; supervised learning; pre-training; over-fitting; dropout

Received: 2017-10-31. Published online: 2018-04-09.
Funding: National Natural Science Foundation of China (61762070, 61650205).
Corresponding author: LI Tuya. E-mail: 2297854548@qq.com.

A typical large vocabulary continuous speech recognition (LVCSR) system consists of feature extraction, an acoustic model, a language model, a decoder, and other components. The acoustic model is the core component of a speech recognition system, and the GMM-HMM acoustic model[1], built from the GMM and HMM models, was for a time the most widely used acoustic model in LVCSR systems. In the GMM-HMM model, the GMM model, with respect to speech fea…