Vol. 15, No. 2, Mar. 2020    CAAI Transactions on Intelligent Systems (智能系统学报)

DOI: 10.11992/tis.201902007
Online access: http://kns.cnki.net/kcms/detail/23.1538.TP.20191205.1449.006.html

Design and implementation of an efficient accelerator for sparse convolutional neural network

YU Chengyu1,2, LI Zhiyuan1,2, MAO Wenyu1, LU Huaxiang1,2,3,4

(1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; 2. University of Chinese Academy of Sciences, Beijing 100089, China; 3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; 4. Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing 100083, China)

Abstract: Convolutional neural networks (CNNs) are difficult to implement efficiently in hardware. Most previous CNN accelerator designs have concentrated on relieving computation-performance and bandwidth bottlenecks while overlooking the importance of CNN sparsity to accelerator design, and the few recent designs able to exploit sparsity still struggle to deliver computational flexibility, parallel efficiency, and low resource overhead at the same time. This paper first compares how different parallel unrolling schemes affect the ability to exploit sparsity and analyzes the different methods of exploiting it; it then proposes a parallel unrolling scheme that uses activation sparsity to accelerate CNN computation while achieving higher parallel efficiency and lower additional resource overhead than comparable designs. Finally, an accelerator based on this scheme was designed and implemented on an FPGA. Running VGG-16 on the ImageNet dataset, the resulting sparse CNN accelerator improves convolution performance by 108.8% and overall performance by 164.6% over a dense-network design on the same device, a clear performance advantage.

Keywords: convolutional neural network; sparsity; embedded FPGA; ReLU; hardware acceleration; parallel computing; deep learning
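The speedup the abstract claims comes from skipping the multiply-accumulate (MAC) work attached to zero activations, which ReLU produces in abundance. The Python sketch below is only a conceptual illustration of that zero-skipping idea, not the authors' hardware architecture; the function name, the 1x1-convolution simplification, and the random test data are assumptions made for brevity.

    import numpy as np

    def relu(x):
        """Rectified linear unit; zeroes all negative values."""
        return np.maximum(x, 0.0)

    def sparse_conv1x1(activations, weights):
        """MAC over input channels, skipping zero activations.

        activations: (C_in,) post-ReLU inputs at one spatial position.
        weights:     (C_out, C_in) weight matrix of a 1x1 convolution.

        A dense engine issues C_out * C_in MACs regardless of the data;
        a sparsity-aware engine issues work only for nonzero activations,
        which is the effect a hardware accelerator gets by gating zeros.
        """
        nonzero = np.flatnonzero(activations)      # indices of nonzero inputs
        out = np.zeros(weights.shape[0])
        for c in nonzero:                          # zero channels are skipped
            out += activations[c] * weights[:, c]
        macs_issued = nonzero.size * weights.shape[0]
        macs_dense = activations.size * weights.shape[0]
        return out, macs_issued, macs_dense

    # ReLU zeroes roughly half of normally distributed activations,
    # so the sparse loop issues proportionally fewer MACs.
    rng = np.random.default_rng(0)
    acts = relu(rng.standard_normal(512))
    w = rng.standard_normal((64, 512))
    out, issued, dense = sparse_conv1x1(acts, w)
    print(f"MACs issued: {issued} / {dense} "
          f"({100 * (1 - issued / dense):.1f}% skipped)")

In software this indexed loop saves little, but in hardware the same bookkeeping lets processing elements stay busy on useful operands, which is where the reported convolution-performance gain originates.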
CLC number: TN4    Document code: A    Article ID: 1673-4785(2020)02-0323-11

Citation: YU Chengyu, LI Zhiyuan, MAO Wenyu, et al. Design and implementation of an efficient accelerator for sparse convolutional neural network[J]. CAAI transactions on intelligent systems, 2020, 15(2): 323-333.

In recent years, the availability of massive data in the big-data era and the marked improvement in computer performance have allowed deep learning algorithms, with convolutional neural networks as their representative, to demonstrate great advantages in many fields. In computer vision, deep learning methods have already ...

Received: 2019-02-14. Published online: 2019-12-06.
Foundation items: National Natural Science Foundation of China (61701473); STS Program of the Chinese Academy of Sciences (KFJ-STS-ZDTP-070); National Defense Science and Technology Innovation Fund of the Chinese Academy of Sciences (CXJJ-17-M152); Strategic Priority Research Program of the Chinese Academy of Sciences, Category A (XDA18040400); Beijing Municipal Science and Technology Program (Z181100001518006).
Corresponding author: MAO Wenyu. E-mail: maowenyu@semi.ac.cn.