Conclusion

In this paper, we have provided a theoretical proof of the convergence of two representative parallel SGD methods based on the lock-free strategy, Hogwild! and AsySVRG, for non-convex problems. Empirical results also show that both Hogwild! and AsySVRG converge on non-convex problems, which verifies our theoretical results. To the best of our knowledge, this is the first work to prove the convergence of lock-free parallel SGD methods for non-convex problems.
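For readers unfamiliar with the lock-free update pattern referred to above, the following is a minimal sketch of Hogwild!-style asynchronous SGD on a toy least-squares problem. It is not taken from the paper and is not its exact algorithm (nor AsySVRG); the problem, step size, thread count, and variable names are illustrative assumptions. Each thread applies SGD steps directly to a shared parameter vector without any locking.

import numpy as np
from threading import Thread

# Toy least-squares problem: minimize (1/n) * sum_i (a_i^T w - b_i)^2 (illustrative assumption).
rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # shared parameter vector, written by all threads without a lock
step_size, steps_per_thread, num_threads = 0.01, 2000, 4

def worker(w, seed):
    # Each thread samples one example at a time and applies its SGD step
    # in place on the shared vector w; updates from different threads may
    # interleave arbitrarily (the lock-free, Hogwild!-style pattern).
    local_rng = np.random.default_rng(seed)
    for _ in range(steps_per_thread):
        i = int(local_rng.integers(n))
        grad = 2.0 * (A[i] @ w - b[i]) * A[i]
        w -= step_size * grad  # in-place, lock-free update of shared state

# Note: CPython's GIL interleaves these threads rather than running them truly
# in parallel, but the lock-free update pattern being illustrated is the same.
threads = [Thread(target=worker, args=(w, s)) for s in range(num_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final training loss:", float(np.mean((A @ w - b) ** 2)))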
Acknowledgements

This work is partially supported by NSFC (No. 61472182) and a fund from Tencent.

References

Allen-Zhu, Z., and Hazan, E. 2016. Variance reduction for faster non-convex optimization. In Proceedings of the 33rd International Conference on Machine Learning.
Allen-Zhu, Z., and Yuan, Y. 2016. Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In Proceedings of the 33rd International Conference on Machine Learning.
Chaturapruek, S.; Duchi, J. C.; and Ré, C. 2015. Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care. In Proceedings of the Advances in Neural Information Processing Systems.
Ghadimi, S., and Lan, G. 2013. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23(4):2341–2368.
Huo, Z., and Huang, H. 2016. Asynchronous stochastic gradient descent with variance reduction for non-convex optimization. CoRR abs/1604.03584.
J. Reddi, S.; Hefny, A.; Sra, S.; Poczos, B.; and Smola, A. J. 2015. On variance reduction in stochastic gradient descent and its asynchronous variants. In Proceedings of the Advances in Neural Information Processing Systems.
Jaggi, M.; Smith, V.; Takac, M.; Terhorst, J.; Krishnan, S.; Hofmann, T.; and Jordan, M. I. 2014. Communication-efficient distributed dual coordinate ascent. In Proceedings of the Advances in Neural Information Processing Systems.
Johnson, R., and Zhang, T. 2013. Accelerating stochastic gradient descent using predictive variance reduction. In Proceedings of the Advances in Neural Information Processing Systems.
Koren, Y.; Bell, R. M.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. IEEE Computer 42(8):30–37.
Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems.
Li, M.; Andersen, D. G.; Park, J. W.; Smola, A. J.; Ahmed, A.; Josifovski, V.; Long, J.; Shekita, E. J.; and Su, B. 2014. Scaling distributed machine learning with the parameter server. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation.
Li, X.; Zhao, T.; Arora, R.; Liu, H.; and Haupt, J. 2016. Stochastic variance reduced optimization for nonconvex sparse learning. In Proceedings of the 33rd International Conference on Machine Learning.
Lian, X.; Huang, Y.; Li, Y.; and Liu, J. 2015. Asynchronous parallel stochastic gradient for nonconvex optimization. In Proceedings of the Advances in Neural Information Processing Systems.
Recht, B.; Ré, C.; Wright, S.; and Niu, F. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Proceedings of the Advances in Neural Information Processing Systems.
Reddi, S. J.; Hefny, A.; Sra, S.; Poczos, B.; and Smola, A. J. 2016. Stochastic variance reduction for nonconvex optimization. In Proceedings of the 33rd International Conference on Machine Learning.
Roux, N. L.; Schmidt, M. W.; and Bach, F. 2012. A stochastic gradient method with an exponential convergence rate for finite training sets. In Proceedings of the Advances in Neural Information Processing Systems.
Sa, C. D.; Ré, C.; and Olukotun, K. 2015. Global convergence of stochastic gradient descent for some non-convex matrix problems. In Proceedings of the 32nd International Conference on Machine Learning.
Shalev-Shwartz, S., and Zhang, T. 2013. Stochastic dual coordinate ascent methods for regularized loss. Journal of Machine Learning Research 14(1):567–599.
Shamir, O. 2015. A stochastic PCA and SVD algorithm with an exponential convergence rate. In Proceedings of the 32nd International Conference on Machine Learning.
Shamir, O. 2016a. Convergence of stochastic gradient descent for PCA. In Proceedings of the 33rd International Conference on Machine Learning.
Shamir, O. 2016b. Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity. In Proceedings of the 33rd International Conference on Machine Learning.
Xing, E. P.; Ho, Q.; Dai, W.; Kim, J. K.; Wei, J.; Lee, S.; Zheng, X.; Xie, P.; Kumar, A.; and Yu, Y. 2015. Petuum: A new platform for distributed machine learning on big data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Zhang, R.; Zheng, S.; and Kwok, J. T. 2016. Asynchronous distributed semi-stochastic gradient optimization. In Proceedings of the AAAI Conference on Artificial Intelligence.
Zhao, S.-Y., and Li, W.-J. 2016. Fast asynchronous parallel stochastic gradient descent: A lock-free approach with convergence guarantee. In Proceedings of the AAAI Conference on Artificial Intelligence.