Support Matrix Machines

Table 2. The classification accuracy on the four data sets (in %)

    Data sets        L-SVM            B-SVM            R-GLM            SMM
    EEG alcoholism   71.11 (±8.30)    71.67 (±7.83)    71.39 (±6.55)    73.33 (±5.89)
    EEG emotion      88.76 (±1.16)    87.73 (±1.18)    82.26 (±1.65)    90.01 (±0.98)
    students face    91.67 (±1.57)    95.42 (±1.72)    94.25 (±2.76)    96.83 (±1.66)
    INRIA person     84.88 (±1.98)    85.09 (±1.46)    84.65 (±1.38)    85.95 (±0.77)

Table 3. The training time on the four data sets (in seconds)

    Data sets        B-SVM             R-GLM              SMM
    EEG alcoholism   86.30 (±163.73)   407.59 (±100.93)   1.36 (±0.09)
    EEG emotion      292.89 (±248.47)  33.32 (±3.38)      6.57 (±6.73)
    students face    23.88 (±10.53)    121.14 (±87.40)    7.20 (±0.22)
    INRIA person     19.36 (±9.23)     580.06 (±229.14)   6.61 (±2.44)
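A quick way to see why the QP-based methods in Table 3 scale with the number of training samples rather than with the sample dimension: the dual problem touches the data only through an n × n Gram matrix of Frobenius inner products, whatever the size of each input matrix. A minimal NumPy sketch (the sizes and random data here are illustrative placeholders, not the paper's data sets):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 20, 64, 64                    # 20 samples of 64x64 matrices (hypothetical sizes)
X = rng.standard_normal((n, p, q))

# Gram matrix of Frobenius inner products <X_i, X_j> = trace(X_i^T X_j).
# Its size is n x n regardless of p and q, so the QP step's cost is
# governed mainly by the number of training samples.
K = np.einsum('ipq,jpq->ij', X, X)

assert K.shape == (n, n)
assert np.allclose(K, K.T)              # a Gram matrix is symmetric
```

Doubling p and q quadruples the cost of forming K, but the QP itself still operates on a 20 × 20 matrix; this is the separation of costs the timing comparison above relies on.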
We are also interested in the computational efficiency of the three matrix classification models: B-SVM, R-GLM and SMM. We report the training time on the four data sets in Table 3. Recall that R-GLM is solved by the Nesterov method (Zhou & Li, 2014). R-GLM is the slowest method on EEG alcoholism, students face and INRIA person. This is because the main step of the Nesterov method is affected by the dimension of the input sample (Zhou & Li, 2014). In contrast, the main step of both B-SVM and SMM is a quadratic programming problem whose time complexity is determined mainly by the number of training samples. B-SVM and SMM are therefore more efficient than R-GLM on data sets with high-dimensional samples. Furthermore, the running time of B-SVM is unstable across data sets, usually with higher variance than that of SMM. The reason might be that B-SVM is a non-convex problem, whose training procedure relies heavily on the initial value of the parameters.

6. Conclusion

In this paper we have proposed a novel matrix classification method called the support matrix machine (SMM). SMM can leverage the structure of the data matrices and has the grouping effect property. We have derived an iterative algorithm based on ADMM for learning, and applied our method to EEG and image classification with better performance than baselines such as B-SVM and R-GLM. Specifically, our method is more robust than B-SVM and R-GLM in modeling noisy data. Furthermore, our method is more efficient than B-SVM and R-GLM, and more numerically stable than B-SVM.
7. Acknowledgement

Luo Luo and Zhihua Zhang are supported by the Natural Science Foundation of Shanghai City of China (No. 15ZR1424200). Wu-Jun Li is supported by the NSFC (No. 61472182) and the Fundamental Research Funds for the Central Universities (No. 20620140510).

Appendix A: The Proof of Lemma 1

Proof. Let $\tilde{\beta}_0 = [\tilde{\beta}_1 y_1, \ldots, \tilde{\beta}_n y_n]^T$. Then we have
\[
\begin{aligned}
([\Omega]_{k_1 l_1} - [\Omega]_{k_2 l_2})^2
&= \Big(\sum_{i=1}^n \tilde{\beta}_i y_i [X_i]_{k_1 l_1} - \sum_{i=1}^n \tilde{\beta}_i y_i [X_i]_{k_2 l_2}\Big)^2 \\
&= \big[\tilde{\beta}_0^T (f_{k_1 l_1} - f_{k_2 l_2})\big]^2 \\
&\le \|\tilde{\beta}_0\|^2 \, \|f_{k_1 l_1} - f_{k_2 l_2}\|^2 \\
&\le 2nC^2 \big(1 - f_{k_1 l_1}^T f_{k_2 l_2}\big),
\end{aligned}
\]
where the first inequality is Cauchy–Schwarz. $\square$

Appendix B: The Proof of Theorem 2

Proof. Suppose $\Omega$ has condensed SVD $\Omega = U \Sigma V^T$, where $U \in \mathbb{R}^{p \times r}$, $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$ and $V \in \mathbb{R}^{q \times r}$ satisfy $U^T U = I_r$ and $V^T V = I_r$. Denote $U = [u_1, \ldots, u_r]$, where $u_k$ is the $k$th column of $U$ for $k = 1, \ldots, r$, and recall that $\tilde{W} = \sum_{k=1}^r (\sigma_k - \tau)_+ u_k v_k^T$, so that $[\tilde{W}]_{:,l} = \sum_{k=1}^r (\sigma_k - \tau)_+ [V]_{lk} u_k$. Since the columns of $U$ are orthonormal, we have
\[
\begin{aligned}
\frac{\big\|[\tilde{W}]_{:,l_1} - [\tilde{W}]_{:,l_2}\big\|^2}{\big\|[\Omega]_{:,l_1} - [\Omega]_{:,l_2}\big\|^2}
&= \frac{\big\|\sum_{k=1}^r (\sigma_k - \tau)_+ ([V]_{l_1 k} - [V]_{l_2 k}) u_k\big\|^2}{\big\|\sum_{k=1}^r \sigma_k ([V]_{l_1 k} - [V]_{l_2 k}) u_k\big\|^2} \\
&= \frac{\sum_{k=1}^r [(\sigma_k - \tau)_+]^2 ([V]_{l_1 k} - [V]_{l_2 k})^2 \|u_k\|^2}{\sum_{k=1}^r \sigma_k^2 ([V]_{l_1 k} - [V]_{l_2 k})^2 \|u_k\|^2} \\
&= \frac{\sum_{k=1}^r [(\sigma_k - \tau)_+]^2 ([V]_{l_1 k} - [V]_{l_2 k})^2}{\sum_{k=1}^r \sigma_k^2 ([V]_{l_1 k} - [V]_{l_2 k})^2} \\
&\le 1. \qquad \square
\end{aligned}
\]
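The operator at the heart of the proof of Theorem 2 is singular value thresholding, $\tilde{W} = U(\Sigma - \tau I)_+ V^T$, and the bound says thresholding can only shrink differences between columns of $\Omega$, which is the grouping effect. A small numerical sanity check of that bound (NumPy; the matrix size and the value of $\tau$ are arbitrary illustrative choices):

```python
import numpy as np

def svt(Omega, tau):
    """Singular value thresholding: U (Sigma - tau)_+ V^T."""
    U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(1)
Omega = rng.standard_normal((8, 6))
W = svt(Omega, tau=0.5)

# Theorem 2's bound: for every pair of columns (l1, l2),
# ||W[:, l1] - W[:, l2]||^2 <= ||Omega[:, l1] - Omega[:, l2]||^2,
# because [(sigma_k - tau)_+]^2 <= sigma_k^2 term by term.
for l1 in range(Omega.shape[1]):
    for l2 in range(Omega.shape[1]):
        lhs = np.linalg.norm(W[:, l1] - W[:, l2]) ** 2
        rhs = np.linalg.norm(Omega[:, l1] - Omega[:, l2]) ** 2
        assert lhs <= rhs + 1e-10
```

The check passes for any matrix and any $\tau \ge 0$, since each singular value only moves toward zero under the thresholding.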