to-end speech recognition // Conference of the International Speech Communication Association. Graz, 2019: 4395
[16] Bu H, Du J Y, Na X Y, et al. Aishell-1: an open-source mandarin speech corpus and a speech recognition baseline[J/OL]. arXiv preprint (2017-09-16)[2019-10-10]. http://arxiv.org/abs/1709.05522
[17] Battenberg E, Chen J T, Child R, et al. Exploring neural transducers for end-to-end speech recognition // 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Okinawa, 2017: 206
[18] Williams R J, Zipser D. Gradient-based learning algorithms for recurrent networks and their computational complexity // Backpropagation: Theory, Architectures and Applications. 1995: 433
[19] Huang G, Liu Z, Maaten L V D, et al. Densely connected convolutional networks // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017: 4700
[20] Cao Y, Huang Z L, Zhang W, et al. Urban sound event classification with the N-order dense convolutional network. J Xidian Univ Nat Sci, 2019, 46(6): 9
(曹毅, 黄子龙, 张威, 等. N-DenseNet的城市声音事件分类模型. 西安电子科技大学学报: 自然科学版, 2019, 46(6): 9)
[21] Zhang S, Gong Y H, Wang J J. The development of deep convolutional neural networks and its application in computer vision. Chin J Comput, 2019, 42(3): 453
(张顺, 龚怡宏, 王进军. 深度卷积神经网络的发展及其在计算机视觉领域的应用. 计算机学报, 2019, 42(3): 453)
[22] Zhou F Y, Jin L P, Dong J. Review of convolutional neural networks. Chin J Comput, 2017, 40(6): 1229
(周飞燕, 金林鹏, 董军. 卷积神经网络研究综述. 计算机学报, 2017, 40(6): 1229)
[23] Yi J Y, Tao J H, Liu B, et al. Transfer learning for acoustic modeling of noise robust speech recognition. J Tsinghua Univ Sci Technol, 2018, 58(1): 55
(易江燕, 陶建华, 刘斌, 等. 基于迁移学习的噪声鲁棒性语音识别声学建模. 清华大学学报: 自然科学版, 2018, 58(1): 55)
[24] Xue J B, Han J Q, Zheng T R, et al. A multi-task learning framework for overcoming the catastrophic forgetting in automatic speech recognition[J/OL]. arXiv preprint (2019-04-17)[2019-10-10]. https://arxiv.org/abs/1904.08039
[25] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality // Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. Canada, 2013: 3111
[26] Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit // IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. Big Island, 2011
[27] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch // 31st Conference on Neural Information Processing Systems. Long Beach, 2017
[28] Shan C, Weng C, Wang G, et al. Component fusion: learning replaceable language model component for end-to-end speech recognition system // IEEE International Conference on Acoustics, Speech and Signal Processing. Brighton, 2019: 5361

Zhang Wei et al.: Speech recognition research based on DL-T and transfer learning · 441 ·