About the authors:

余成宇, master's student. His main research interest is hardware acceleration of algorithms.

李志远, doctoral student. His main research interest is computer vision.
毛文宇, assistant researcher. His main research interests are intelligent computing systems, artificial intelligence algorithms, and signal processing. He has led one National Natural Science Foundation of China project and one Chinese Academy of Sciences innovation fund project, holds one granted patent, and has published more than 10 academic papers.