Algorithm 1 ADMM for SMM

Initialize $S^{(-1)} = \hat{S}^{(0)} \in \mathbb{R}^{p \times q}$, $\Lambda^{(-1)} = \hat{\Lambda}^{(0)} \in \mathbb{R}^{p \times q}$, $\rho > 0$, $t^{(1)} = 1$, $\eta \in (0, 1)$.
for $k = 1, 2, 3, \ldots$ do
  $(W^{(k)}, b^{(k)}) = \operatorname{argmin}_{(W, b)} \; H(W, b) - \operatorname{tr}(\hat{\Lambda}^{(k)T} W) + \frac{\rho}{2}\|W - \hat{S}^{(k)}\|_F^2$
  $S^{(k)} = \operatorname{argmin}_{S} \; G(S) + \operatorname{tr}(\hat{\Lambda}^{(k)T} S) + \frac{\rho}{2}\|W^{(k)} - S\|_F^2$
  $\Lambda^{(k)} = \hat{\Lambda}^{(k)} - \rho (W^{(k)} - S^{(k)})$
  $c^{(k)} = \rho^{-1}\|\Lambda^{(k)} - \hat{\Lambda}^{(k)}\|_F^2 + \rho\|S^{(k)} - \hat{S}^{(k)}\|_F^2$
  if $c^{(k)} < \eta\, c^{(k-1)}$ then
    $t^{(k+1)} = \frac{1 + \sqrt{1 + 4 t^{(k)2}}}{2}$
    $\hat{S}^{(k+1)} = S^{(k)} + \frac{t^{(k)} - 1}{t^{(k+1)}} (S^{(k)} - S^{(k-1)})$
    $\hat{\Lambda}^{(k+1)} = \Lambda^{(k)} + \frac{t^{(k)} - 1}{t^{(k+1)}} (\Lambda^{(k)} - \Lambda^{(k-1)})$
  else
    $t^{(k+1)} = 1$
    $\hat{S}^{(k+1)} = S^{(k-1)}$
    $\hat{\Lambda}^{(k+1)} = \Lambda^{(k-1)}$
    $c^{(k)} = \eta^{-1} c^{(k-1)}$
  end if
end for

Here $K = [K_{ij}] \in \mathbb{R}^{n \times n}$ and $q \in \mathbb{R}^n$ are independent of $\alpha$; specifically,

$$K_{ij} = \frac{y_i y_j \operatorname{tr}(X_i^T X_j)}{\rho + 1}, \qquad q_i = 1 - \frac{y_i \operatorname{tr}[(\Lambda + \rho S)^T X_i]}{\rho + 1}.$$

By Theorem 4, updating $W^{(k)}$ and $b^{(k)}$ can be done by solving Problem (11). Several methods can be used, such as the sequential minimal optimization (SMO) algorithm (Platt et al., 1998; Keerthi & Gilbert, 2002).
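Problem (11) itself is not reproduced on this page; given the $K$ and $q$ above, it presumably takes the standard SVM dual form $\max_{\alpha} \; q^T \alpha - \frac{1}{2}\alpha^T K \alpha$ subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$, with the primal solution $W^{(k)} = (\hat{\Lambda}^{(k)} + \rho \hat{S}^{(k)} + \sum_i \alpha_i y_i X_i)/(\rho + 1)$ recovered from stationarity. Under these assumptions, a minimal Matlab sketch of the $(W, b)$-update is as follows; the function name update_W, the use of quadprog, and the recovery of $b$ from a margin support vector are our own illustrative choices, not the paper's implementation.

% Hedged sketch of the (W, b)-update of Algorithm 1 via the assumed
% dual QP of Problem (11): max_a q'*a - 0.5*a'*K*a, 0 <= a <= C, y'*a = 0.
% X is p x q x n (sample matrices), y is n x 1 with entries in {-1, +1};
% Lhat and Shat are the current Lambda-hat and S-hat; C, rho as in the text.
function [W, b] = update_W(X, y, Lhat, Shat, C, rho)
    n = size(X, 3);
    M = Lhat + rho * Shat;                % shorthand for Lambda-hat + rho*S-hat
    K = zeros(n); q = zeros(n, 1);
    for i = 1:n
        q(i) = 1 - y(i) * trace(M' * X(:,:,i)) / (rho + 1);
        for j = 1:n
            K(i,j) = y(i) * y(j) * trace(X(:,:,i)' * X(:,:,j)) / (rho + 1);
        end
    end
    % quadprog minimizes 0.5*a'*K*a + f'*a, so pass f = -q to maximize the dual
    opts = optimoptions('quadprog', 'Display', 'off');
    a = quadprog(K, -q, [], [], y', 0, zeros(n, 1), C * ones(n, 1), [], opts);
    W = M;
    for i = 1:n
        W = W + a(i) * y(i) * X(:,:,i);   % stationarity: (rho+1)*W = M + sum a_i*y_i*X_i
    end
    W = W / (rho + 1);
    sv = find(a > 1e-6 & a < C - 1e-6);   % a margin support vector determines b exactly
    if isempty(sv), [~, sv] = max(a); end
    b = y(sv(1)) - trace(W' * X(:,:,sv(1)));
end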
Theorem 5. Suppose the optimal solution of Problem (7) is $(\tilde{W}, \tilde{b}, \tilde{S})$. Then

$$\tilde{W} = \tilde{S} = \tilde{\Lambda} + \sum_{\tilde{\alpha}_i > 0} \tilde{\alpha}_i y_i X_i. \qquad (12)$$

Theorem 5 can be obtained directly from Algorithm 1 through Eqn. (10). If $\tilde{\alpha}_i > 0$, we call the corresponding $X_i$ a support matrix. Theorem 5 shows that the solution $\tilde{W}$ of our SMM can be written as a linear combination of support matrices plus an offset. This is the reason that we call our model the support matrix machine.

Since the hinge loss and the nuclear norm are weakly convex, the convergence property of Algorithm 1 follows immediately from the results in (Goldstein et al., 2012; He & Yuan, 2012). That is, we have

Theorem 6. For any $\rho > 0$ and $\eta \in (0, 1)$, the iteration sequence given by Algorithm 1 converges to the optimal solution of Problem (7).
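For concreteness, here is a minimal Matlab sketch of Algorithm 1 as a whole. It assumes that $G(S)$ is the nuclear-norm term $\tau\|S\|_*$ of the SMM objective, so the $S$-step has the closed-form singular value thresholding solution $S^{(k)} = \mathcal{D}_{\tau/\rho}(W^{(k)} - \hat{\Lambda}^{(k)}/\rho)$; update_W is the dual-QP sketch above, and the initialization $c^{(0)} = \infty$ is our own choice.

% Hedged sketch of Algorithm 1, assuming G(S) = tau * ||S||_* so that the
% S-step is singular value thresholding. update_W is the sketch above.
function [W, b, S] = smm_admm(X, y, C, tau, rho, eta, maxit)
    [p, q, ~] = size(X);
    S = zeros(p, q); Sprev = S; Shat = S;
    L = zeros(p, q); Lprev = L; Lhat = L;   % L stands for Lambda
    t = 1; c = inf;                         % c(0) = inf is our own choice
    for k = 1:maxit
        [W, b] = update_W(X, y, Lhat, Shat, C, rho);
        [U, Sig, V] = svd(W - Lhat / rho, 'econ');
        S = U * max(Sig - tau / rho, 0) * V';   % D_{tau/rho}(W - Lhat/rho)
        L = Lhat - rho * (W - S);
        cnew = norm(L - Lhat, 'fro')^2 / rho + rho * norm(S - Shat, 'fro')^2;
        if cnew < eta * c                   % sufficient decrease: accelerated step
            tnew = (1 + sqrt(1 + 4 * t^2)) / 2;
            Shat = S + (t - 1) / tnew * (S - Sprev);
            Lhat = L + (t - 1) / tnew * (L - Lprev);
            t = tnew; c = cnew;
        else                                % restart from the previous iterates
            t = 1; Shat = Sprev; Lhat = Lprev;
            c = c / eta;                    % c(k) = eta^{-1} c(k-1)
        end
        Sprev = S; Lprev = L;
    end
end

Note how the restart branch mirrors Algorithm 1: whenever the combined residual $c^{(k)}$ fails to shrink by the factor $\eta$, the momentum weight is reset to $t^{(k+1)} = 1$ and the previous iterates are restored, which is what keeps the accelerated scheme stable.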
5. Experiments

In this section we conduct an experimental analysis of our proposed SMM¹. We first analyze the grouping effect property of SMM. Then we study the classification performance on synthetic and real-world data sets. All experiments are implemented in Matlab R2011b on a workstation with an Intel Xeon CPU X5675 3.06GHz (2 × 12 cores), 24GB RAM, and a 64-bit Windows Server 2008 system.

¹The code is available at http://bcmi.sjtu.edu.cn/~luoluo/code/smm.zip.

5.1. Group Effect Property on Synthetic Data

To intuitively demonstrate the grouping effect property described in Theorem 2, we design a simulated experiment to visualize it. We generate a synthetic data set of $n$ samples as follows. First, we generate $V$ orthogonal $n$-dimensional basis vectors $\nu_1, \nu_2, \ldots, \nu_V$, each with unit Euclidean length. Second, we construct $pq$ feature vectors of length $n$ by the following process:

$$\tilde{f}_{kl} = \nu_{\lceil l/(0.2q) \rceil} + \epsilon_{kl}, \qquad \epsilon_{kl} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \delta^2 I_n),$$

for $k = 1, \ldots, p$ and $l = 1, \ldots, q$. Here $\lceil l/(0.2q) \rceil$ denotes the smallest integer no smaller than $l/(0.2q)$. In other words, the elements in each sample matrix $X_i$ ($i = 1, \ldots, n$) can be roughly partitioned into four groups (vertical blocks). The features within the same group have high correlation, while the features between different groups have low correlation. Then we generate a $p \times q$ matrix $W$ of rank $0.2q$, and the label for each sample is generated by $y_i = \operatorname{sign}[\operatorname{tr}(W^T X_i)]$. We set $n = 1000$, $V = 4$, $p = 80$, $q = 100$ and $\delta = 10^{-3}$ in this simulation.

The values of the regression matrix obtained from the bilinear SVM (B-SVM) (Pirsiavash et al., 2009), the regularized GLM (R-GLM) (Zhou & Li, 2014) and SMM are shown in Figure 1. It is easy to see that the regression matrix of SMM is clearly partitioned into four pure color blocks, while the blocks of B-SVM are noisier. R-GLM fails to recover the group structure at all. The simulation results show that SMM has a better grouping effect property than the other baselines, which also implies that SMM is able to capture the structure information in the feature matrices.

5.2. Classification Accuracy on Synthetic Data

We now evaluate the classification performance of SMM on synthetic data sets. We use the same data generation process as in the previous subsection, but set $V = 10$ and $\delta = 10^{-2}$ to obtain more complicated data. We use 1000 samples for training and another 500 samples for testing. All the hyperparameters involved are selected via cross validation.
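As a concrete illustration, the following is a minimal Matlab sketch of the synthetic data generation described in Section 5.1. The group index generalizes the text's $\lceil l/(0.2q) \rceil$ to $V$ equal vertical blocks so that the $V = 10$ setting of Section 5.2 is covered as well; the function name gen_synthetic and the rank-$\lceil 0.2q \rceil$ factorization of $W$ are illustrative assumptions.

% Hedged sketch of the Section 5.1 data generation. The group index
% generalizes ceil(l/(0.2q)) to V equal vertical blocks (an assumption).
function [X, y] = gen_synthetic(n, V, p, q, delta)
    B = orth(randn(n, V));                  % V orthonormal n-dim basis vectors
    F = zeros(n, p, q);                     % feature vectors f_kl of length n
    for k = 1:p
        for l = 1:q
            g = ceil(V * l / q);            % vertical-block index of feature (k, l)
            F(:, k, l) = B(:, g) + delta * randn(n, 1);   % nu_g + N(0, delta^2 I_n)
        end
    end
    r = ceil(0.2 * q);                      % rank-0.2q regression matrix W
    W = randn(p, r) * randn(r, q);
    X = permute(F, [2, 3, 1]);              % X(:,:,i) is the p x q sample matrix
    y = zeros(n, 1);
    for i = 1:n
        y(i) = sign(trace(W' * X(:,:,i)));  % y_i = sign[tr(W' * X_i)]
    end
end

With this sketch, [X, y] = gen_synthetic(1000, 4, 80, 100, 1e-3) reproduces the Section 5.1 setting, and gen_synthetic(1500, 10, 80, 100, 1e-2) matches the harder setting of Section 5.2 (1000 training plus 500 test samples).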