logit loss. Thus, it is applicable to the construction of multiclass large margin classification methods. This motivates us to devise multiclass large margin classification methods based on $C_T(g(x), c)$.

5. Applications of the multiclass C-loss in classification problems

In this section, we develop a multiclass large margin classifier and a multiclass boosting algorithm. Recall that we let $C_T(g(x), c)$ denote $C_{T,1}(g(x), c)$; that is, we always set $u = 1$ here and later.

5.1. The multiclass C learning

Using the multiclass C-loss $C_T(g(x), c)$, we now construct margin-based classifiers that we refer to as multiclass C learning (MCL). We first consider the linear case and then turn to the kernelized case. In the linear case, where $g_j(x) = a_j + x^T b_j$, we pose the following optimization problem:

$$
\min_{a,b}\ \frac{1}{2}\sum_{j=1}^m \|b_j\|^2 + \frac{\gamma}{n}\sum_{i=1}^n C_T\big(g(x_i), c_i\big)
\quad \text{s.t.}\ \sum_{j=1}^m a_j \mathbf{1}_n + X \sum_{j=1}^m b_j = 0, \tag{17}
$$

where $\gamma > 0$ is the regularization parameter, $X = [x_1, \dots, x_n]^T$ is the $n \times p$ input data matrix, and $\mathbf{1}_n$ represents the $n \times 1$ vector of ones. Note that here we use the result from Liu and Shen [15] that the infinite set of constraints $\sum_{j=1}^m g_j(x) = 0$ for all $x \in \mathcal{X}$ can be reduced to $\sum_{j=1}^m a_j \mathbf{1}_n + X \sum_{j=1}^m b_j = 0$, which is a function solely of the training data.

Given a reproducing kernel $K(\cdot,\cdot)$ from $\mathcal{X} \times \mathcal{X} \to \mathbb{R}$, we attempt to find a margin vector $(g_1(x), \dots, g_m(x)) = (a_1 + h_1(x), \dots, a_m + h_m(x)) \in \prod_{j=1}^m (\{1\} + \mathcal{H}_K)$, where $\mathcal{H}_K$ is a reproducing kernel Hilbert space. The solution of the following problem

$$
\min_{g(x)}\ \frac{1}{2}\sum_{j=1}^m \|h_j(x)\|_{\mathcal{H}_K}^2 + \frac{\gamma}{n}\sum_{i=1}^n C_T\big(g(x_i), c_i\big) \tag{18}
$$

under the constraints $\sum_{j=1}^m g_j(x) = 0$ for all $x \in \mathcal{X}$ is

$$
g_j(x) = a_j + \sum_{i=1}^n \beta_{ji} K(x_i, x), \quad j = 1, \dots, m,
$$

with constraints $\sum_{j=1}^m g_j(x_i) = 0$ for $i = 1, \dots, n$. This result follows readily from that of Lee et al. [13]. We see that kernel-based MCL solves the following optimization problem:

$$
\min_{a,\beta}\ \frac{1}{2}\sum_{j=1}^m \beta_j^T K \beta_j + \frac{\gamma}{n}\sum_{i=1}^n C_T\big(g(x_i), c_i\big)
\quad \text{s.t.}\ \sum_{j=1}^m a_j \mathbf{1}_n + K \sum_{j=1}^m \beta_j = 0, \tag{19}
$$

where $\beta_j = (\beta_{j1}, \dots, \beta_{jn})^T$ and $K = [K(x_i, x_j)]$ is the $n \times n$ kernel matrix.

The minimization problem in (19) or (17) is a convex minimization problem with a differentiable objective function; thus, the problem is readily solved. In particular, we make use of Newton-type methods to solve this problem, alternately updating the $a_j$'s and the $\beta_j$'s. The details are given in Appendix D.
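To make the alternating scheme concrete, here is a minimal sketch of kernel MCL in Python. It is not the Newton-type procedure of Appendix D: since the definition of $C_T$ appears earlier in the paper and not on this page, the sketch substitutes an assumed log-sum-exp smoothing of the Crammer–Singer hinge, $\tilde C_T(g(x), c) = T \log \sum_{j=1}^m \exp\big((\mathbb{1}[j \neq c] + g_j(x) - g_c(x))/T\big)$, as a differentiable stand-in; the constraint in (19) is enforced through the stronger sufficient condition $\sum_j a_j = 0$ and $\sum_j \beta_j = 0$; and plain projected gradient steps replace the Newton updates. All function names, hyperparameters, and the synthetic data are illustrative.

```python
import numpy as np

def smoothed_cs_loss_grad(G, y, T):
    """Assumed stand-in for the C-loss: a temperature-T log-sum-exp smoothing of
    the Crammer-Singer hinge.  G is the n x m matrix of margins g_j(x_i), y holds
    labels in {0, ..., m-1}.  Returns (per-example losses, gradient w.r.t. G)."""
    n, m = G.shape
    u = np.ones((n, m))
    u[np.arange(n), y] = 0.0                       # the "1 if j != c_i" term
    Z = (u + G - G[np.arange(n), y][:, None]) / T
    zmax = Z.max(axis=1, keepdims=True)            # numerically stable log-sum-exp
    E = np.exp(Z - zmax)
    losses = T * (zmax[:, 0] + np.log(E.sum(axis=1)))
    P = E / E.sum(axis=1, keepdims=True)           # softmax weights
    Y = np.zeros((n, m)); Y[np.arange(n), y] = 1.0
    return losses, P - Y                           # d(loss_i)/d(g_j(x_i)) = p_ij - 1[j = c_i]

def fit_kernel_mcl(K, y, m, gamma=1.0, T=0.1, lr=0.01, n_iter=1000):
    """Hypothetical sketch of kernel MCL, problem (19): alternating projected
    gradient steps on the intercepts a and coefficients B = [beta_1, ..., beta_m],
    used here as a simple stand-in for the paper's Newton-type updates."""
    n = K.shape[0]
    a = np.zeros(m)
    B = np.zeros((n, m))
    for _ in range(n_iter):
        G = a[None, :] + K @ B                     # g_j(x_i) = a_j + sum_l beta_jl K(x_l, x_i)
        _, dG = smoothed_cs_loss_grad(G, y, T)
        a -= lr * (gamma / n) * dG.sum(axis=0)     # update intercepts with B fixed
        a -= a.mean()                              # keep sum_j a_j = 0 (sufficient for the constraint)
        G = a[None, :] + K @ B
        _, dG = smoothed_cs_loss_grad(G, y, T)
        B -= lr * (K @ B + (gamma / n) * (K @ dG)) # update B with a fixed: grad of 0.5*sum_j beta_j'K beta_j + loss term
        B -= B.mean(axis=1, keepdims=True)         # keep sum_j beta_j = 0
    return a, B

# Tiny usage example on synthetic data with a Gaussian (RBF) kernel.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 0.3, size=(20, 2)) for mu in ([0, 0], [2, 0], [0, 2])])
y = np.repeat([0, 1, 2], 20)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)                                    # RBF kernel matrix
a, B = fit_kernel_mcl(K, y, m=3)
pred = (a[None, :] + K @ B).argmax(axis=1)         # classify by the largest margin component
print("training accuracy:", (pred == y).mean())
```

The alternating structure mirrors the description above: the intercepts $a_j$ are updated with the coefficients fixed, then the $\beta_j$'s with the intercepts fixed, and each block is re-centred so that the sum-to-zero condition, and hence the constraint in (19), continues to hold.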
To end this subsection, we establish a connection between multiclass C learning and the multiclass SVM of Crammer and Singer [5], which is defined by

$$
\min_{g(x)}\ \frac{1}{2}\sum_{j=1}^m \|h_j(x)\|_{\mathcal{H}_K}^2 + \frac{\gamma}{n}\sum_{i=1}^n \xi_{c_i}\big(g(x_i)\big) \tag{20}
$$

under the constraints $\sum_{j=1}^m g_j(x) = 0$ for all $x \in \mathcal{X}$. From Proposition 12, MCL reduces to the multiclass SVM of Crammer and Singer [5] as $T \to 0$. In fact, we have the following theorem (the proof is given in Appendix E).

Theorem 15. Assume that the $\gamma$ in Problems (20) and (18) is the same. Then the minimizer of (18) approaches the minimizer of (20) as $T \to 0$.
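As a small numerical illustration of this limit, the snippet below reuses the assumed log-sum-exp surrogate from the sketch above (not the paper's exact $C_T$) and compares it with the Crammer–Singer hinge, taken here in its usual form $\xi_c(g(x)) = \max_j\big(\mathbb{1}[j \neq c] + g_j(x)\big) - g_c(x)$; the margin vector used is arbitrary.

```python
import numpy as np

def smoothed_loss(g, c, T):
    """Assumed log-sum-exp surrogate (stand-in for C_T) for a single example."""
    u = np.ones_like(g); u[c] = 0.0
    z = (u + g - g[c]) / T
    zmax = z.max()                              # numerically stable log-sum-exp
    return T * (zmax + np.log(np.exp(z - zmax).sum()))

def cs_hinge(g, c):
    """Crammer-Singer hinge: max_j(1[j != c] + g_j) - g_c."""
    u = np.ones_like(g); u[c] = 0.0
    return np.max(u + g) - g[c]

g = np.array([0.3, -0.1, 1.2])                  # illustrative margin vector
c = 2                                           # true class index
print("hinge value:", cs_hinge(g, c))
for T in [1.0, 0.1, 0.01, 0.001]:
    print(f"T = {T:g}: surrogate = {smoothed_loss(g, c, T):.6f}")
```

As $T$ decreases, the surrogate values approach the hinge value, which is the behaviour that Proposition 12 captures at the level of the loss and that Theorem 15 lifts to the minimizers of (18) and (20).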