Updating one of them needs the other's latest value, which makes the problem hardly separable.
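To make the coupling concrete, consider the contribution of a single observed rating, which has the same form as \hat{f}_{i,j} in (8) below: the gradient with respect to a user column involves the current item column, and vice versa. The following sketch is only an illustration in NumPy, with names of our own choosing rather than code from the paper.

```python
import numpy as np

# Illustration only (hypothetical names): the gradient step for a user column
# needs the latest item column, and vice versa, so U and V cannot be updated
# independently of each other.

def grad_user(r_ij, u_i, v_j, lam1):
    """Gradient of the per-rating loss with respect to the user column U[:, i]."""
    err = r_ij - u_i @ v_j
    return -err * v_j + lam1 * u_i        # depends on the current V[:, j]

def grad_item(r_ij, u_i, v_j, lam2):
    """Gradient of the per-rating loss with respect to the item column V[:, j]."""
    err = r_ij - u_i @ v_j
    return -err * u_i + lam2 * v_j        # depends on the current U[:, i]

rng = np.random.default_rng(0)
u, v = rng.normal(size=5), rng.normal(size=5)
print(grad_user(3.0, u, v, 0.05))         # uses v; grad_item(3.0, u, v, 0.05) uses u
```

Dropping either dependence breaks the update, which motivates the per-node copies of the item matrix introduced next.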
To decouple U and V, we keep a local item latent matrix for all items in each node, denoted as V^p. Please note that V^p is not a sub-block of V; it has the same size as V. We also keep a global item latent matrix, denoted as V. Because only the local V^p couples with U, we can update U and V^p independently on each node. This split strategy makes the MF problem fit the distributed ADMM framework, which will be introduced in the following subsection.

Our split strategy, called LocalMFSplit, is briefly summarized in Algorithm 1. Note that the size of V^p is k \times n, but that of U^p is k \times m_p, with m_p being the number of columns (about m/P) assigned to node p.

Algorithm 1 LocalMFSplit
1: Input: R, P
2: for i = 1 : m do
3:   Generate a random number p from \{1, 2, \ldots, P\}, and distribute row i of R to node p.
4: end for
5: for p = 1 : P parallel do
6:   Allocate memory for U^p, V^p and V
7: end for

3.2 Distributed ADMM

Based on our split strategy LocalMFSplit, the MF problem in (1) can be reformulated as follows:

  \min_{U, \mathcal{V}, V} \; \frac{1}{2} \sum_{p=1}^{P} \sum_{(i,j) \in \Omega^p} \Big[ (R_{i,j} - U_{*i}^T V_{*j}^p)^2 + \lambda_1 U_{*i}^T U_{*i} + \lambda_2 [V_{*j}^p]^T V_{*j}^p \Big]    (6)
  \text{s.t.}: \; V^p - V = 0, \quad \forall p \in \{1, 2, \ldots, P\},

where \mathcal{V} = \{V^p\}_{p=1}^{P} and \Omega^p denotes the (i, j) indices of the ratings located in node p. Note that here we omit the p in U^p for simplicity; it is not hard to tell from the context whether U refers to the whole latent matrix or to the sub-block U^p located in node p.

If we define

  f(U, \mathcal{V}) = \sum_{p=1}^{P} f^p(U, V^p),    (7)

where

  f^p(U, V^p) = \sum_{(i,j) \in \Omega^p} \hat{f}_{i,j}(U_{*i}, V_{*j}^p),    (8)
  \hat{f}_{i,j}(U_{*i}, V_{*j}^p) = \frac{1}{2} \Big[ (R_{i,j} - U_{*i}^T V_{*j}^p)^2 + \lambda_1 U_{*i}^T U_{*i} + \lambda_2 [V_{*j}^p]^T V_{*j}^p \Big],

we can transform the constrained problem in (6) into an unconstrained problem with the augmented Lagrangian method and get the following objective function:

  L(U, \mathcal{V}, \Theta, V) = f(U, \mathcal{V}) + l(\mathcal{V}, V, \Theta),    (9)

where

  l(\mathcal{V}, V, \Theta) = \sum_{p=1}^{P} l^p(V^p, V, \Theta^p),
  l^p(V^p, V, \Theta^p) = \frac{\rho}{2} \|V^p - V\|_F^2 + \mathrm{tr}\big([\Theta^p]^T (V^p - V)\big).

Here, \rho is a hyper-parameter and \Theta = \{\Theta^p\}_{p=1}^{P} denotes the Lagrangian multipliers.

If we define

  L^p(U, V^p, \Theta^p, V) = f^p(U, V^p) + l^p(V^p, V, \Theta^p)
    = \sum_{(i,j) \in \Omega^p} \hat{f}_{i,j}(U_{*i}, V_{*j}^p) + \frac{\rho}{2} \|V^p - V\|_F^2 + \mathrm{tr}\big([\Theta^p]^T (V^p - V)\big),

we can get

  L(U, \mathcal{V}, \Theta, V) = \sum_{p=1}^{P} L^p(U, V^p, \Theta^p, V).

ADMM solves this problem by repeating the following steps:

  U_{t+1}, V_{t+1}^p \leftarrow \operatorname*{argmin}_{U, V^p} L^p(U, V^p, \Theta_t^p, V_t), \quad \forall p \in \{1, 2, \ldots, P\},    (10a)
  V_{t+1} \leftarrow \operatorname*{argmin}_{V} L(U_{t+1}, \mathcal{V}_{t+1}, \Theta_t, V),    (10b)
  \Theta_{t+1}^p \leftarrow \Theta_t^p + \rho (V_{t+1}^p - V_{t+1}), \quad \forall p \in \{1, 2, \ldots, P\}.    (10c)

It is easy to see that U, V^p and \Theta^p can be updated locally on each node. Because the whole MF problem has been divided into P sub-problems which can be solved in parallel, our method is actually a distributed ADMM framework.

3.3 Stochastic Learning for Distributed ADMM

To learn the parameters in (6), we just need to find the solutions of (10a), (10b) and (10c). After getting the optimal U_{t+1} and \{V_{t+1}^p\}, it is easy to solve the problem in (10b). More specifically, if we set \Theta_0^p = 0, we can prove that \sum_{p=1}^{P} \Theta_t^p = 0. Hence, the update rule for V is:

  V_{t+1} = \frac{1}{P} \sum_{p=1}^{P} V_{t+1}^p.    (11)

Equation (10c) directly gives the update rule for \Theta^p, which can be computed locally and efficiently. Therefore, the key learning part lies in how to efficiently solve the problem in (10a). In the rest of this subsection, we first design a batch learning algorithm for the problem in (10a), and then a stochastic learning algorithm, inspired by the batch one, to further improve the efficiency.
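To summarize the pieces derived so far, here is a minimal sketch of the distributed ADMM skeleton: rows of R are assigned to nodes as in Algorithm 1, each node approximately solves its sub-problem (10a), the global item matrix is obtained by the averaging rule (11), and the multipliers are updated locally as in (10c). Everything in the sketch is an assumption made for illustration and not the paper's implementation: NaN marks unobserved ratings, a serial loop over p stands in for P nodes running in parallel, U is kept as one array instead of per-node k \times m_p blocks, and the local solve is only a few crude gradient steps.

```python
import numpy as np

# Illustrative sketch of the distributed ADMM iteration (10a)-(10c) with (11).
# All names and the gradient-descent local solver are placeholders/assumptions.

def local_mf_split(R, P, rng):
    """Algorithm 1 (LocalMFSplit): assign every row of R to a random node."""
    return rng.integers(0, P, size=R.shape[0])            # owner[i] = node holding row i

def local_step(R, rows, U, Vp, Theta_p, V_bar, lam1, lam2, rho, lr=0.01, iters=20):
    """Placeholder for (10a): gradient steps on L^p with Theta^p_t and V_t fixed."""
    mask = np.zeros(R.shape, dtype=bool)
    mask[rows, :] = ~np.isnan(R[rows, :])                  # observed ratings in Omega^p
    cnt_u = mask.sum(axis=1)[None, :]                      # ratings per owned user
    cnt_v = mask.sum(axis=0)[None, :]                      # ratings per item on node p
    for _ in range(iters):
        E = np.where(mask, np.nan_to_num(R) - U.T @ Vp, 0.0)   # residuals on Omega^p
        grad_U = -Vp @ E.T + lam1 * U * cnt_u              # zero for users of other nodes
        grad_V = -U @ E + lam2 * Vp * cnt_v + rho * (Vp - V_bar) + Theta_p
        U = U - lr * grad_U
        Vp = Vp - lr * grad_V
    return U, Vp

def admm_mf(R, P=4, k=8, lam1=0.05, lam2=0.05, rho=0.1, T=30, seed=0):
    rng = np.random.default_rng(seed)
    m, n = R.shape
    owner = local_mf_split(R, P, rng)
    U = rng.normal(scale=0.1, size=(k, m))                 # user factors U
    Vs = [rng.normal(scale=0.1, size=(k, n)) for _ in range(P)]   # local copies V^p
    Thetas = [np.zeros((k, n)) for _ in range(P)]          # Theta^p_0 = 0, as assumed for (11)
    V_bar = np.mean(Vs, axis=0)                            # global item matrix V
    for _ in range(T):
        for p in range(P):                                 # (10a): run on P nodes in parallel in practice
            rows = np.where(owner == p)[0]
            U, Vs[p] = local_step(R, rows, U, Vs[p], Thetas[p], V_bar, lam1, lam2, rho)
        V_bar = np.mean(Vs, axis=0)                        # (10b)/(11): average the local V^p
        for p in range(P):                                 # (10c): local dual update
            Thetas[p] += rho * (Vs[p] - V_bar)
    return U, V_bar
```

In an actual deployment each node would hold only its own k \times m_p block of U together with its private V^p and \Theta^p, so the per-iteration communication would reduce to the averaging in (11) and redistributing the new V; the crude gradient-descent local step above is precisely the part that the batch and stochastic algorithms developed next are meant to solve efficiently.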
3.3.1 Batch Learning

With \Theta_t^p and V_t fixed, (10a) is an MF problem. However, we cannot easily get the solution because U and V^p are coupled together and the objective function of the MF problem is non-convex. To get an efficient solution, we use a technique similar to that in [12] to construct a surrogate