

Vol. 15 No. 6, CAAI Transactions on Intelligent Systems, Nov. 2020
DOI: 10.11992/tis.202006046

Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization

HUANG Jianzhi1, DING Chengcheng1, TAO Wei2, TAO Qing1
(1. Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031, China; 2. Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China)

Abstract: Adam is a popular optimization framework for training deep neural networks. It employs adaptive step sizes and momentum at the same time, which overcomes some inherent drawbacks of SGD. However, even for convex optimization problems, Adam has so far only been shown to attain the same regret bound as gradient descent in the online learning setting, so the accelerating effect of momentum is not reflected in the analysis. This paper focuses on nonsmooth convex problems. By carefully selecting time-varying step-size and momentum parameters, it proves that an improved variant of Adam achieves the optimal individual convergence rate, which indicates that Adam has the advantages of both adaptation and acceleration. Experiments on the hinge loss problem constrained to an l1-norm ball verify the correctness of the theoretical analysis and show that the proposed algorithms perform well in preserving sparsity.

Keywords: machine learning; AdaGrad algorithm; RMSProp algorithm; momentum methods; Adam algorithm; AMSGrad algorithm; individual convergence rate; sparsity

CLC number: TP181. Document code: A. Article ID: 1673-4785(2020)06-1140-07

Chinese citation format: 黄鉴之, 丁成诚, 陶蔚, 等. 非光滑凸情形 Adam 型算法的最优个体收敛速率 [J]. 智能系统学报, 2020, 15(6): 1140–1146.

English citation format: HUANG Jianzhi, DING Chengcheng, TAO Wei, et al. Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization[J]. CAAI transactions on intelligent systems, 2020, 15(6): 1140–1146.

Adam is an optimization algorithm widely used in deep learning[1]. Unlike classical gradient descent, Adam uses two techniques at once: adaptive step sizes and momentum. The adaptive step size makes the algorithm insensitive to hyperparameters. Momentum can accelerate convergence on convex optimization problems and, on nonconvex problems, helps the algorithm escape saddle points and even local extrema. Compared with methods that use only one of these techniques, Adam ...

Received: 2020-06-28.
Foundation item: National Natural Science Foundation of China (61673394; 62076252).
Corresponding author: TAO Qing. E-mail: qing.tao@ia.ac.cn.
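To make the two ingredients named in the introduction concrete, the following is a minimal sketch of the textbook Adam update applied to an unconstrained average hinge loss. It is not the improved variant analyzed in the paper: the constant hyperparameters (`alpha`, `beta1`, `beta2`, `eps`), the helper names, and the omission of the l1-norm ball constraint are all assumptions made for illustration, whereas the paper relies on specially chosen time-varying step-size and momentum parameters that this page does not state.

```python
import numpy as np

def hinge_loss(w, X, y):
    """Average hinge loss mean(max(0, 1 - y * <x, w>))."""
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

def hinge_subgradient(w, X, y):
    """One subgradient of the average hinge loss at w."""
    active = y * (X @ w) < 1.0          # samples that violate the margin
    return -(X[active].T @ y[active]) / len(y)

def adam(X, y, steps=500, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam with constant hyperparameters (illustrative assumption)."""
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)   # first-moment estimate: the momentum term
    v = np.zeros_like(w)   # second-moment estimate: drives the adaptive step
    for t in range(1, steps + 1):
        g = hinge_subgradient(w, X, y)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)    # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w
```

Here `m` carries the momentum and `v` rescales each coordinate's step, which is the combination the abstract refers to. The sketch returns the last iterate, the quantity an individual convergence rate is about, rather than an average of the iterates.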
