Zhang Qingqing et al.: Continuous Speech Recognition Based on Convolutional Neural Networks


… features, the size and number of convolution filters, computational cost, and model size were compared in detailed experiments, and the results were compared against the widely used deep neural network (DNN). A convolutional neural network (CNN) observes local features through its convolutional layers and then integrates that information in fully connected layers to produce the output probabilities, which gives it a clearer physical interpretation than a DNN. Weight sharing in the CNN also greatly reduces model complexity. Experiments on several standard corpora show that, with less computation than the DNN, the CNN achieves better recognition performance and generalizes better.
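The claim that weight sharing lowers model complexity can be illustrated with a parameter count. The sketch below is a minimal illustration, not the configuration used in the paper: the layer sizes (40 frequency bands, 11 context frames, filter width 8, 128 filters) and the helper names are assumptions chosen for the example. It compares a 1-D convolutional layer along the frequency axis with a fully connected layer that produces the same number of output units.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1d_params(in_channels: int, kernel_size: int, n_filters: int) -> int:
    """Weights plus biases of a 1-D convolutional layer.

    Each filter is reused at every position, so the count does not
    depend on the input length.
    """
    return in_channels * kernel_size * n_filters + n_filters


def conv1d_output_positions(n_in: int, kernel_size: int, stride: int = 1) -> int:
    """Number of positions a filter visits when no padding is used."""
    return (n_in - kernel_size) // stride + 1


# Hypothetical sizes, chosen only for illustration (not from the paper).
N_BANDS = 40      # frequency bands per frame
N_FRAMES = 11     # context frames, treated as input channels
KERNEL = 8        # filter width along the frequency axis
N_FILTERS = 128   # number of convolution filters

positions = conv1d_output_positions(N_BANDS, KERNEL)
conv = conv1d_params(N_FRAMES, KERNEL, N_FILTERS)
# A fully connected layer producing the same number of output units.
dense = dense_params(N_BANDS * N_FRAMES, positions * N_FILTERS)

print(f"output units:     {positions * N_FILTERS}")
print(f"conv parameters:  {conv}")
print(f"dense parameters: {dense}")
```

Under these assumed sizes the convolutional layer needs far fewer parameters because every filter is reused at each frequency position. The sketch counts parameters only; it says nothing about the computational cost or recognition accuracy reported in the experiments.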
