Support Matrix Machines

Harchaoui et al., 2012; Kang et al., 2011). The cornerstone of these methods is to use the nuclear norm of a matrix as a convex alternative to the matrix rank. Since the nuclear norm is the best convex approximation of the matrix rank over the unit ball of matrices, it makes the resulting optimization problem more tractable. Moreover, nice properties such as consistency results for nuclear norm minimization have been studied by Bach (2008). There has been some work applying nuclear norm penalization with a least squares loss to matrix regression problems (Hunyadi et al., 2012; Signoretto et al., 2014). Recently, Zhou & Li (2014) applied nuclear norm penalization to matrix regression problems based on generalized linear models (GLMs).

In this paper we propose a new model to address the matrix classification problem. Our model includes two principal ingredients. First, we use the hinge loss because it is widely employed to induce sparseness and robustness. Second, we employ a spectral elastic net penalty on the regression matrix. The spectral elastic net is a spectral extension of the conventional elastic net to matrices. In parallel to the conventional elastic net (Zou & Hastie, 2005), which combines the ridge and lasso penalties, our spectral elastic net combines the squared Frobenius norm and the nuclear norm. We prove that the spectral elastic net enjoys a grouping effect analogous to that of the conventional elastic net, while keeping a low-rank representation. We show that the regression matrix in our model is in fact a combination of a set of support matrices, so we refer to our model as a support matrix machine (SMM).
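To make the penalty concrete, the following NumPy sketch evaluates a spectral elastic net penalty of the form $\frac{1}{2}\|W\|_F^2 + \tau\|W\|_*$; the specific weights ($\frac{1}{2}$ on the Frobenius term and a tuning parameter $\tau$ on the nuclear norm) are illustrative assumptions, since this excerpt does not fix them.

```python
import numpy as np

def spectral_elastic_net(W, tau):
    """Illustrative spectral elastic net penalty: 0.5 * ||W||_F^2 + tau * ||W||_*.

    The squared Frobenius norm plays the role of the ridge penalty, and the
    nuclear norm (sum of singular values) plays the role of the lasso penalty.
    """
    frob_sq = np.sum(W ** 2)                              # ||W||_F^2
    nuclear = np.linalg.svd(W, compute_uv=False).sum()    # ||W||_*
    return 0.5 * frob_sq + tau * nuclear
```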
The optimization problem for SMM is convex, but the hinge loss function is not smooth. Fortunately, we can resort to the alternating direction method of multipliers (ADMM) studied in (Goldstein et al., 2012). Specifically, we develop an iterative algorithm built mainly on ADMM and a singular value thresholding (SVT) operator (Candès & Recht, 2009; Cai et al., 2010). The algorithm converges quickly and is guaranteed to reach the globally optimal solution. It is worth pointing out that the algorithm requires repeatedly computing the singular value decomposition (SVD) of matrices of the same size as the matrix of input features. However, when represented as a matrix, an input sample is usually not too large.

Finally, we apply our SMM to EEG and image classification problems. We observe that classification methods working directly on data matrices outperform those working on vectors, such as the conventional SVM. When the data are contaminated with non-Gaussian noise or outliers, our SMM achieves significant improvements over the baselines. This implies that SMM is robust and has potential applications in noisy matrix classification problems. Moreover, the experiments show that our proposed training algorithm for SMM is efficient.

The remainder of the paper is organized as follows. In Section 2, we give the notation and preliminaries. In Section 3, we review the problem of interest and related work. In Section 4, we present our model and the learning algorithm. In Section 5, we conduct experimental analysis to validate our methods. Finally, we conclude our work in Section 6.

2. Notation and Preliminaries

In this section we give the notation and preliminaries used throughout the paper. We let $I_p$ denote the $p \times p$ identity matrix. For a vector $a \in \mathbb{R}^p$, the Euclidean norm is denoted by $\|a\| = \sqrt{\sum_{i=1}^{p} a_i^2}$. For a matrix $A \in \mathbb{R}^{p \times q}$ of rank $r$, where $r \le \min(p, q)$, we let the condensed singular value decomposition (SVD) of $A$ be $A = U_A \Sigma_A V_A^T$, where $U_A \in \mathbb{R}^{p \times r}$ and $V_A \in \mathbb{R}^{q \times r}$ satisfy $U_A^T U_A = I_r$ and $V_A^T V_A = I_r$, and $\Sigma_A = \operatorname{diag}(\sigma_1(A), \ldots, \sigma_r(A))$ with $\sigma_1(A) \ge \cdots \ge \sigma_r(A) > 0$. Obviously, the rank of $A$ equals the number of nonzero singular values of $A$. Additionally, we let $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\sum_{i=1}^{r} \sigma_i(A)^2}$ be the Frobenius norm, $\|A\|_* = \sum_{i=1}^{r} \sigma_i(A)$ be the nuclear norm, and $\|A\|_2 = \sigma_1(A)$ be the spectral norm. For any $\tau > 0$, we let $\mathcal{D}_\tau[A] = U_A \mathcal{S}_\tau[\Sigma_A] V_A^T$, where $\mathcal{S}_\tau[\Sigma_A] = \operatorname{diag}([\sigma_1(A) - \tau]_+, \ldots, [\sigma_r(A) - \tau]_+)$ and $[z]_+ = \max(z, 0)$. In the literature (Candès & Recht, 2009; Cai et al., 2010), $\mathcal{D}_\tau[A]$ is called the singular value thresholding (SVT) operator.
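For reference, here is a minimal NumPy sketch of the SVT operator $\mathcal{D}_\tau[A]$ defined above. It uses the thin SVD rather than the condensed SVD, which yields the same result because singular values at or below $\tau$ are shrunk to zero.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding operator D_tau[A].

    Computes the thin SVD A = U diag(sigma) V^T and soft-thresholds the
    singular values, returning U diag([sigma_i - tau]_+) V^T.
    """
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_shrunk = np.maximum(sigma - tau, 0.0)   # [sigma_i - tau]_+
    return (U * sigma_shrunk) @ Vt                # equivalent to U @ diag(sigma_shrunk) @ Vt
```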
It is well known that the nuclear norm $\|A\|_*$, as a function from $\mathbb{R}^{p \times q}$ to $\mathbb{R}$, is not differentiable. Instead, one considers the subdifferential of $\|A\|_*$, that is, the set of its subgradients, denoted by $\partial\|A\|_*$. It follows from the literature (Candès & Recht, 2009; Lewis, 2003; Watson, 1992) that for a $p \times q$ real matrix $A$ of rank $r$,

$$\partial\|A\|_* = \left\{ U_A V_A^T + Z : Z \in \mathbb{R}^{p \times q},\; U_A^T Z = 0,\; Z V_A = 0,\; \|Z\|_2 \le 1 \right\}. \tag{1}$$

3. Problem Formulation and Related Work

In this paper we study a regularized matrix classifier. We are given a set of training samples $\mathcal{T} = \{X_i, y_i\}_{i=1}^{n}$, where $X_i \in \mathbb{R}^{p \times q}$ is the $i$th input sample and $y_i \in \{-1, 1\}$ is its corresponding class label. As we have seen, $X_i$ is represented in matrix form. To fit a classifier, a commonly used approach is to stack $X_i$ into a vector. Let $x_i \triangleq \operatorname{vec}(X_i^T) = ([X_i]_{11}, \ldots, [X_i]_{1q}, [X_i]_{21}, \ldots, [X_i]_{pq})^T \in \mathbb{R}^{pq}$. The soft margin SVM is defined as

$$\min_{w, b} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \left[ 1 - y_i (w^T x_i + b) \right]_+, \tag{2}$$
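As a point of reference for this vectorized baseline (this is not the SMM method, which keeps the matrix structure of $X_i$), the following sketch fits problem (2) by plain subgradient descent on the flattened samples; the learning rate, iteration count, and flattening helper are assumptions made for illustration.

```python
import numpy as np

def fit_soft_margin_svm(X_list, y, C=1.0, lr=1e-3, n_iter=2000):
    """Illustrative subgradient descent for the vectorized soft-margin SVM (2).

    Each matrix sample X_i in R^{p x q} is flattened row by row, which is
    exactly x_i = vec(X_i^T), and we minimize
        0.5 * w^T w + C * sum_i [1 - y_i * (w^T x_i + b)]_+ .
    """
    X = np.stack([Xi.reshape(-1) for Xi in X_list])  # n x (pq); row-major flatten = vec(X_i^T)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1.0                        # samples with a nonzero hinge loss
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

In practice one would use an off-the-shelf SVM solver; the sketch only spells out objective (2) on vectorized inputs for comparison with the matrix-based formulation.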