Distributed Stochastic ADMM for Matrix Factorization

Zhi-Qin Yu
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
yzqfqyd@sjtu.edu.cn

Xing-Jian Shi
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, China
xshiab@connect.ust.hk

Ling Yan
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
yling0718@sjtu.edu.cn

Wu-Jun Li
National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, China
liwujun@nju.edu.cn

ABSTRACT

Matrix factorization (MF) has become the most popular technique for recommender systems due to its promising performance. Recently, distributed (parallel) MF models have received much attention from researchers in the big data community. In this paper, we propose a novel model, called distributed stochastic alternating direction method of multipliers (DS-ADMM), for large-scale MF problems. DS-ADMM is a distributed stochastic variant of ADMM. In particular, we first devise a new data split strategy to make the distributed MF problem fit the ADMM framework. Then, a stochastic ADMM scheme is designed for learning. Finally, we implement DS-ADMM based on the Message Passing Interface (MPI), which can run on clusters with multiple machines (nodes). Experiments on several data sets from recommendation applications show that our DS-ADMM model can outperform other state-of-the-art distributed MF models in terms of both efficiency and accuracy.

Keywords

Matrix Factorization, Recommender Systems, ADMM, Distributed Computing, Stochastic Learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM'14, November 3–7, 2014, Shanghai, China.
Copyright 2014 ACM 978-1-4503-2598-1/14/11 ...$15.00.
http://dx.doi.org/10.1145/2661829.2661996

1. INTRODUCTION

Recommender systems try to recommend products (items) to customers (users) by utilizing the customers' historical preferences. Matrix factorization (MF) [8] and its extensions [9, 22, 16, 14, 10, 18] have become the most popular models for recommender systems due to their promising performance [8]. In this big data era, more and more large-scale data sets have emerged in many real-world recommender systems. Hence, parallel or distributed¹ MF models with the potential of high scalability have recently captured much attention from researchers.

The basic idea of MF is to use the multiplication of two latent matrices, the user matrix and the item matrix, to approximate the original rating matrix. The least squares method is usually used to find a solution; a standard formulation is sketched below. In recent years, several parallel models have been proposed for MF. These existing models can be roughly divided into two main categories: alternating least squares (ALS) [23] based models and stochastic gradient descent (SGD) based models.
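To make the later complexity discussion concrete, the regularized least squares formulation commonly used for this kind of MF can be written as follows (the notation here, including the regularization weights λ₁ and λ₂, is introduced for illustration rather than taken verbatim from this paper):

\[
\min_{U,V}\; \sum_{(i,j)\in\Omega} \big(R_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j\big)^2 \;+\; \lambda_1 \lVert U \rVert_F^2 \;+\; \lambda_2 \lVert V \rVert_F^2,
\]

where R is the m×n rating matrix whose observed entries are indexed by Ω, the columns \mathbf{u}_i ∈ R^k and \mathbf{v}_j ∈ R^k of U and V are the latent feature vectors of user i and item j, and λ₁, λ₂ > 0 are regularization hyperparameters. The objective is non-convex in (U, V) jointly but convex in either factor with the other fixed, which is exactly what the alternating methods below exploit.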
ALS [23] adopts the alternating learning strategy to update one matrix with the other one fixed. With one of the matrices fixed, the optimization problem of MF reduces to a least squares problem on the other matrix, which can be further decomposed into several independent least squares problems on the latent feature vector of each user or item. Hence, it is easy to design parallel strategies for ALS, which has been implemented in [23]. However, the time complexity of each iteration of ALS is cubic in k, where k is the number of latent features for each user or item (see the per-user update sketched after this section's footnote).

The cyclic coordinate descent (CCD) method [13] adopts a coordinate descent strategy to improve on ALS by decreasing the time complexity of each iteration to be linear in k. CCD++ [21] further improves the efficiency of CCD by using a similar coordinate descent strategy but a different updating sequence for the variables. Because both CCD and CCD++ are based on ALS, they can also be easily parallelized [21] (see the per-coordinate update sketched below).

¹ In the existing literature, distributed models refer to those implemented on clusters with multiple machines (nodes), while parallel models refer to those implemented either on multi-core systems with a single node or on clusters. We will also follow this tradition in this paper. The specific meaning of "parallel" can be determined from the context.
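To see where the cubic dependence on k comes from, consider the standard per-user ALS update (a textbook derivation under the formulation sketched above, not an excerpt from this paper). Fixing V, the objective is quadratic in each \mathbf{u}_i, and setting its gradient to zero yields the closed-form solution

\[
\mathbf{u}_i \leftarrow \Big(\sum_{j\in\Omega_i} \mathbf{v}_j \mathbf{v}_j^{\top} + \lambda_1 I_k\Big)^{-1} \sum_{j\in\Omega_i} R_{ij}\,\mathbf{v}_j,
\]

where Ω_i is the set of items rated by user i. Each update solves a k×k linear system, which costs O(k³) (plus O(|Ω_i|k²) to assemble it); since the updates for different users are mutually independent, they can be computed in parallel, which is precisely why ALS parallelizes so naturally.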
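By contrast, a CCD-style update changes one coordinate at a time (again a standard coordinate descent step under the assumed notation, shown here only for illustration). Holding everything else fixed, the one-dimensional problem in the single coordinate u_{it} has the closed-form solution

\[
u_{it} \leftarrow \frac{\sum_{j\in\Omega_i}\big(e_{ij} + u_{it}v_{jt}\big)\,v_{jt}}{\lambda_1 + \sum_{j\in\Omega_i} v_{jt}^2},
\qquad e_{ij} = R_{ij} - \mathbf{u}_i^{\top}\mathbf{v}_j,
\]

where e_{ij} is the current residual of rating (i, j). If the residuals are maintained incrementally after each coordinate change, one such update costs O(|Ω_i|), so sweeping all k coordinates of \mathbf{u}_i costs O(|Ω_i|k), i.e., linear rather than cubic in k.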