Support Matrix Machines

Then we can obtain the following bound based on Lemma 1:
\[
\big\|[\widetilde{W}]_{:,l_1} - [\widetilde{W}]_{:,l_2}\big\|^2
\le \big\|[\Omega]_{:,l_1} - [\Omega]_{:,l_2}\big\|^2
= \sum_{k=1}^{p} \big([\Omega]_{k l_1} - [\Omega]_{k l_2}\big)^2
\le 2nC^2 \Big(p - \sum_{k=1}^{p} f_{k l_1}^T f_{k l_2}\Big).
\]

Appendix C: The Proof of Theorem 3

Proof. Let $Z_0$ be $\frac{1}{\tau} U_1 \Sigma_1 V_1^T$. Recall that $U_0$, $U_1$, $V_0$ and $V_1$ are column orthogonal, so we have $U_0^T Z_0 = 0$ and $Z_0 V_0 = 0$. By the SVD form of $\widehat{S}$, formulation (9), and Eqn. (1), we have
\[
\partial G_1(S)\big|_{S=S^*} = \Lambda - \rho W + \mathcal{D}_{\tau}(\rho W - \Lambda) + \tau\, \partial \|S\|_* \big|_{S=S^*}.
\]
Thus, we have $0 \in \partial G_1(S^*)$. $\square$

Appendix D: The Proof of Theorem 4

Proof. We denote $H_1(W, b) = H(W, b) - \mathrm{tr}(\Lambda^T W) + \frac{\rho}{2}\|W - S\|_F^2$. Finding the minimizer of $H_1(W, b)$ is equivalent to solving the following problem:
\[
\min_{W, b, \xi} \ \frac{1}{2}\mathrm{tr}(W^T W) + C \sum_{i=1}^{n} \xi_i - \mathrm{tr}(\Lambda^T W) + \frac{\rho}{2}\|W - S\|_F^2 \tag{13}
\]
\[
\text{s.t.} \quad y_i[\mathrm{tr}(W^T X_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0.
\]
To solve problem (13), we construct the following Lagrangian function:
\[
L(W, b, \xi, \alpha, \gamma) = \frac{1}{2}\mathrm{tr}(W^T W) + C \sum_{i=1}^{n} \xi_i - \mathrm{tr}(\Lambda^T W) + \frac{\rho}{2}\|W - S\|_F^2 - \sum_{i=1}^{n} \alpha_i \big\{ y_i[\mathrm{tr}(W^T X_i) + b] - 1 + \xi_i \big\} - \sum_{i=1}^{n} \gamma_i \xi_i. \tag{14}
\]
Setting the derivative of $L$ with respect to $\xi$ to be 0, we have
\[
\gamma_i = C - \alpha_i \ge 0, \quad i = 1, \ldots, n. \tag{15}
\]
Setting the derivative of $L$ with respect to $b$ to be 0, we have
\[
\sum_{i=1}^{n} \alpha_i y_i = 0. \tag{16}
\]
Substituting (15) and (16) into (14) to eliminate $\gamma_i$ and $\xi_i$, we obtain
\[
L(W, b, \xi, \alpha, \gamma) = \frac{1}{2}\mathrm{tr}(W^T W) - \mathrm{tr}(\Lambda^T W) + \frac{\rho}{2}\|W - S\|_F^2 - \sum_{i=1}^{n} \alpha_i \big\{ y_i[\mathrm{tr}(W^T X_i) + b] - 1 \big\}. \tag{17}
\]
Setting the derivative of $L$ with respect to $W$ to be 0, we have the optimal value
\[
W^* = \frac{1}{\rho + 1}\Big(\Lambda + \rho S + \sum_{i=1}^{n} \alpha_i y_i X_i\Big). \tag{18}
\]
Substituting (18) into (17), we obtain
\[
L(W, b, \xi, \alpha, \gamma) = \sum_{i=1}^{n} \Big(1 - \frac{y_i\, \mathrm{tr}[(\Lambda + \rho S)^T X_i]}{\rho + 1}\Big)\alpha_i - \frac{1}{2(\rho + 1)} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j\, \mathrm{tr}(X_i^T X_j) + D,
\]
where $D = \frac{\rho}{2}\mathrm{tr}(S^T S) - \frac{1}{2(\rho + 1)}\|\Lambda + \rho S\|_F^2$.
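The closed-form update (18) can be checked numerically: for any fixed multipliers, the gradient of the Lagrangian (17) with respect to W vanishes at W*. Below is a minimal NumPy sketch of that check; the dimensions, variable names, and random data are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 5, 3, 4        # hypothetical sizes: n samples of p x q matrices
rho = 2.0
Lam = rng.standard_normal((p, q))      # multiplier matrix Lambda
S = rng.standard_normal((p, q))
X = rng.standard_normal((n, p, q))     # stacked input matrices X_i
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.uniform(0.0, 1.0, size=n)  # arbitrary dual variables

# Closed-form minimizer (18): W* = (Lambda + rho*S + sum_i alpha_i y_i X_i) / (rho + 1)
wsum = np.einsum('i,i,ijk->jk', alpha, y, X)
W_star = (Lam + rho * S + wsum) / (rho + 1.0)

# Gradient of (17) w.r.t. W: W - Lambda + rho*(W - S) - sum_i alpha_i y_i X_i
grad = W_star - Lam + rho * (W_star - S) - wsum
assert np.allclose(grad, 0.0)  # stationarity holds at W*
```

The `einsum` call sums $\alpha_i y_i X_i$ over the sample axis, mirroring the sum in (18).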
Thus, finding the minimizer of $H_1(W, b)$ is equivalent to solving Problem (11) by the KKT conditions. Letting the optimal solution of (11) be $\alpha^*$, we can obtain (10) from (18) directly. The KKT conditions also provide
\[
\alpha_i^* \big\{ y_i \big\{ \mathrm{tr}[(W^*)^T X_i] + b^* \big\} - 1 + \xi_i^* \big\} = 0, \qquad \gamma_i^* \xi_i^* = 0,
\]
which means that for any $0 < \alpha_i^* < C$, the corresponding $\gamma_i^* > 0$, $\xi_i^* = 0$, and $y_i\{\mathrm{tr}[(W^*)^T X_i] + b^*\} - 1 = 0$. Then the optimal $b^*$ can be calculated by
\[
b^* = y_i - \mathrm{tr}[(W^*)^T X_i].
\]
In practice, we choose the optimal $b^*$ by averaging these solutions over the index set $S^*$ of support vectors with $0 < \alpha_i^* < C$:
\[
b^* = \frac{1}{|S^*|} \sum_{i \in S^*} \big\{ y_i - \mathrm{tr}[(W^*)^T X_i] \big\}. \qquad \square
\]
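The averaged-intercept rule above can be sketched in NumPy. The function name `intercept`, the tolerance `tol`, and the toy data are illustrative assumptions, not from the paper.

```python
import numpy as np

def intercept(alpha, y, X, W, C, tol=1e-8):
    """Average b* = y_i - tr(W^T X_i) over support vectors with 0 < alpha_i < C."""
    sv = (alpha > tol) & (alpha < C - tol)   # indices of support vectors
    # tr(W^T X_i) is the elementwise sum of W * X_i for each selected sample
    return np.mean(y[sv] - np.einsum('jk,ijk->i', W, X[sv]))

# Toy check: data built so that tr(W^T X_i) = y_i - 0.5 for both samples,
# hence each per-sample solution y_i - tr(W^T X_i) equals 0.5.
W = np.ones((2, 2))
X = np.array([[[0.5, 0.0], [0.0, 0.0]],
              [[-1.5, 0.0], [0.0, 0.0]]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])               # both strictly inside (0, C)
b = intercept(alpha, y, X, W, C=1.0)       # 0.5
```

Averaging over all margin support vectors, rather than picking a single index, reduces the effect of numerical error in any one dual variable.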