Acknowledgements

This work is partially supported by the "DengFeng" project of Nanjing University.