Conclusion

In this paper, we have provided a theoretical proof of the convergence of two representative parallel SGD methods based on the lock-free strategy, Hogwild! and AsySVRG, for non-convex problems. Empirical results also show that both Hogwild! and AsySVRG converge on non-convex problems, which verifies our theoretical results. To the best of our knowledge, this is the first work to prove the convergence of lock-free parallel SGD methods for non-convex problems.
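For readers unfamiliar with the lock-free update pattern referred to above, the following is a minimal sketch of Hogwild!-style asynchronous SGD on a toy least-squares problem. It is not taken from the paper and is not its exact algorithm (nor AsySVRG); the problem, step size, thread count, and variable names are illustrative assumptions. Each thread applies SGD steps directly to a shared parameter vector without any locking.

import numpy as np
from threading import Thread

# Toy least-squares problem: minimize (1/n) * sum_i (a_i^T w - b_i)^2 (illustrative assumption).
rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # shared parameter vector, written by all threads without a lock
step_size, steps_per_thread, num_threads = 0.01, 2000, 4

def worker(w, seed):
    # Each thread samples one example at a time and applies its SGD step
    # in place on the shared vector w; updates from different threads may
    # interleave arbitrarily (the lock-free, Hogwild!-style pattern).
    local_rng = np.random.default_rng(seed)
    for _ in range(steps_per_thread):
        i = int(local_rng.integers(n))
        grad = 2.0 * (A[i] @ w - b[i]) * A[i]
        w -= step_size * grad  # in-place, lock-free update of shared state

# Note: CPython's GIL interleaves these threads rather than running them truly
# in parallel, but the lock-free update pattern being illustrated is the same.
threads = [Thread(target=worker, args=(w, s)) for s in range(num_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final training loss:", float(np.mean((A @ w - b) ** 2)))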
Acknowledgements

This work is partially supported by NSFC (No. 61472182) and a fund from Tencent.

References

Allen-Zhu, Z., and Hazan, E. 2016. Variance reduction for faster non-convex optimization. In Proceedings of the 33rd International Conference on Machine Learning.
Allen-Zhu, Z., and Yuan, Y. 2016. Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In Proceedings of the 33rd International Conference on Machine Learning.
Chaturapruek, S.; Duchi, J. C.; and Ré, C. 2015. Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care. In Proceedings of the Advances in Neural Information Processing Systems.
Ghadimi, S., and Lan, G. 2013. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23(4):2341–2368.
Huo, Z., and Huang, H. 2016. Asynchronous stochastic gradient descent with variance reduction for non-convex optimization. CoRR abs/1604.03584.
J. Reddi, S.; Hefny, A.; Sra, S.; Poczos, B.; and Smola, A. J. 2015. On variance reduction in stochastic gradient descent and its asynchronous variants. In Proceedings of the Advances in Neural Information Processing Systems.
Jaggi, M.; Smith, V.; Takac, M.; Terhorst, J.; Krishnan, S.; Hofmann, T.; and Jordan, M. I. 2014. Communication-efficient distributed dual coordinate ascent. In Proceedings of the Advances in Neural Information Processing Systems.
Johnson, R., and Zhang, T. 2013. Accelerating stochastic gradient descent using predictive variance reduction. In Proceedings of the Advances in Neural Information Processing Systems.
Koren, Y.; Bell, R. M.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. IEEE Computer 42(8):30–37.
Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems.
Li, M.; Andersen, D. G.; Park, J. W.; Smola, A. J.; Ahmed, A.; Josifovski, V.; Long, J.; Shekita, E. J.; and Su, B. 2014. Scaling distributed machine learning with the parameter server. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation.
Li, X.; Zhao, T.; Arora, R.; Liu, H.; and Haupt, J. 2016. Stochastic variance reduced optimization for nonconvex sparse learning. In Proceedings of the 33rd International Conference on Machine Learning.
Lian, X.; Huang, Y.; Li, Y.; and Liu, J. 2015. Asynchronous parallel stochastic gradient for nonconvex optimization. In Proceedings of the Advances in Neural Information Processing Systems.
Recht, B.; Ré, C.; Wright, S.; and Niu, F. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Proceedings of the Advances in Neural Information Processing Systems.
Reddi, S. J.; Hefny, A.; Sra, S.; Poczos, B.; and Smola, A. J. 2016. Stochastic variance reduction for nonconvex optimization. In Proceedings of the 33rd International Conference on Machine Learning.
Roux, N. L.; Schmidt, M. W.; and Bach, F. 2012. A stochastic gradient method with an exponential convergence rate for finite training sets. In Proceedings of the Advances in Neural Information Processing Systems.
Sa, C. D.; Ré, C.; and Olukotun, K. 2015. Global convergence of stochastic gradient descent for some non-convex matrix problems. In Proceedings of the 32nd International Conference on Machine Learning.
Shalev-Shwartz, S., and Zhang, T. 2013. Stochastic dual coordinate ascent methods for regularized loss. Journal of Machine Learning Research 14(1):567–599.
Shamir, O. 2015. A stochastic PCA and SVD algorithm with an exponential convergence rate. In Proceedings of the 32nd International Conference on Machine Learning.
Shamir, O. 2016a. Convergence of stochastic gradient descent for PCA. In Proceedings of the 33rd International Conference on Machine Learning.
Shamir, O. 2016b. Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity. In Proceedings of the 33rd International Conference on Machine Learning.
Xing, E. P.; Ho, Q.; Dai, W.; Kim, J. K.; Wei, J.; Lee, S.; Zheng, X.; Xie, P.; Kumar, A.; and Yu, Y. 2015. Petuum: A new platform for distributed machine learning on big data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Zhang, R.; Zheng, S.; and Kwok, J. T. 2016. Asynchronous distributed semi-stochastic gradient optimization. In Proceedings of the AAAI Conference on Artificial Intelligence.
Zhao, S.-Y., and Li, W.-J. 2016. Fast asynchronous parallel stochastic gradient descent: A lock-free approach with convergence guarantee. In Proceedings of the AAAI Conference on Artificial Intelligence.