(1) If $P_l(x) > 1/2$, then $\hat{g}_j(x) = I_{\{j=l\}} - \frac{1}{m}$ for $j = 1, \ldots, m$;
(2) If $P_l(x) = 1/2$, then $0 \leq \hat{g}_l(x) - \hat{g}_j(x) \leq 1$ and $\hat{g}_j(x) = \hat{g}_c(x)$ for $c, j \neq l$;
(3) If $P_l(x) < 1/2$, then $\hat{g}_c(x) = 0$ for $c = 1, \ldots, m$.

This theorem shows that the majorization function $\max_j \{g_j(x) + 1 - I_{\{j=c\}}\} - g_c(x)$ is Fisher-consistent when $P_l(x) > 1/2$. Otherwise, the solution of (5) degenerates to the trivial point. As we have seen from Theorems 7 and 8, $P_l(x) > 1/2$ is a sufficient condition for both $\max_j \{g_j(x) + 1 - I_{\{j=c\}}\} - g_c(x)$ and $\sum_{j \neq c} (1 - g_c(x) + g_j(x))_+$ to be Fisher-consistent. Moreover, they satisfy the condition $\hat{g}_l(x) = 1 + \hat{g}_k(x)$, where $k = \operatorname{argmax}_{j \neq l} P_j(x)$. However, as shown in Theorem 7, $\sum_{j \neq c} (1 - g_c(x) + g_j(x))_+$ still yields the Fisher-consistent property when $P_k(x) < \frac{1}{m}$. Thus, the consistency condition for the pairwise comparison hinge loss is weaker than that for the maximum pairwise comparison hinge loss.
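As a rough numerical illustration of this consistency discussion (our own sketch, not part of the original analysis), one can minimize the expected pairwise comparison hinge loss over sum-to-zero score vectors for a fixed class-probability vector and check whether the minimizer ranks the most probable class first. The probability vector, the sum-to-zero parameterization, and the use of SciPy's Nelder-Mead solver are all our own choices here.

```python
# Rough numerical check (our own example) of Fisher consistency for the
# pairwise comparison hinge loss sum_{j != c} (1 - g_c + g_j)_+ .
import numpy as np
from scipy.optimize import minimize

P = np.array([0.55, 0.25, 0.12, 0.08])   # example P(x); P_l(x) = 0.55 > 1/2
m = len(P)

def expected_hinge(free):
    """Expected loss sum_c P_c sum_{j != c} (1 - g_c + g_j)_+ over sum-to-zero g."""
    g = np.append(free, -np.sum(free))    # last coordinate enforces sum_j g_j = 0
    total = 0.0
    for c in range(m):
        margins = 1.0 - g[c] + np.delete(g, c)
        total += P[c] * np.sum(np.maximum(margins, 0.0))
    return total

best = None
for seed in range(10):                    # a few restarts; the objective is only piecewise linear
    x0 = np.random.default_rng(seed).normal(size=m - 1)
    res = minimize(expected_hinge, x0, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 20000})
    if best is None or res.fun < best.fun:
        best = res

g_hat = np.append(best.x, -np.sum(best.x))
print("g_hat        :", np.round(g_hat, 3))
print("argmax g_hat :", int(np.argmax(g_hat)), "  argmax P :", int(np.argmax(P)))
```

For this choice of $P(x)$, which satisfies $P_l(x) > 1/2$, the printed argmax of $\hat{g}$ should coincide with $\operatorname{argmax}_c P_c(x)$, in line with the discussion above.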
3.4. Multicategory coherence losses

To construct a smooth majorization function of $I_{\{\phi(x) \neq c\}}$, we appeal to the coherence function, which was proposed by Zhang et al. [30]. The coherence function is defined as
\[
\eta_T(z) \triangleq T \log\Big(1 + \exp\Big(\frac{1 - z}{T}\Big)\Big), \quad T > 0, \tag{6}
\]
where $T$ is called the temperature parameter. Clearly, $\eta_T(z) \geq (1 - z)_+ \geq I_{\{z \leq 0\}}$. Moreover, $\lim_{T \to 0} \eta_T(z) = (1 - z)_+$. Thus, we directly have two majorizations of $I_{\{\phi(x) \neq c\}}$ based on the constrained comparison method and the pairwise comparison method.

Using the constrained comparison, we give a smooth approximation to $\sum_{j \neq c} (1 + g_j(x))_+$ for the MSVM of Lee et al. [13]. That is,
\[
L_T\big(g(x), c\big) \triangleq T \sum_{j \neq c} \log\Big(1 + \exp\Big(\frac{1 + g_j(x)}{T}\Big)\Big).
\]
It is immediate that $L_T(g(x), c) \geq \sum_{j \neq c} (1 + g_j(x))_+$ and $\lim_{T \to 0} L_T(g(x), c) = \sum_{j \neq c} (1 + g_j(x))_+$. Furthermore, we have the following theorem (the proof is given in Appendix B.3).

Theorem 9. Assume that $P_c(x) > 0$ for $c = 1, \ldots, m$. Consider the optimization problem
\[
\min_{g(x) \in \mathcal{G}} \sum_{c=1}^m L_T\big(g(x), c\big) P_c(x) \tag{7}
\]
for a fixed $T > 0$, and let $\hat{g}(x) = (\hat{g}_1(x), \ldots, \hat{g}_m(x))^T$ be its solution. Then $\hat{g}(x)$ is unique. Moreover, if $P_l(x) < P_j(x)$, we have $\hat{g}_l(x) < \hat{g}_j(x)$. Furthermore, we have
\[
\lim_{T \to 0} \hat{g}_c(x) =
\begin{cases}
m - 1 & \text{if } c = \operatorname{argmax}_j P_j(x), \\
-1 & \text{otherwise}.
\end{cases}
\]
Additionally, having obtained $\hat{g}(x)$, $P_c(x)$ is given by
\[
P_c(x) = 1 - \frac{(m - 1)\big(1 + \exp\big(-\frac{1 + \hat{g}_c(x)}{T}\big)\big)}{m + \sum_{j=1}^m \exp\big(-\frac{1 + \hat{g}_j(x)}{T}\big)}. \tag{8}
\]

Although there is no explicit expression for $\hat{g}(x)$ in Problem (7), Theorem 9 shows that its limit at $T = 0$ is equal to the minimizer of $\sum_{c=1}^m \sum_{j \neq c} (1 + g_j(x))_+ P_c(x)$, which was studied by Lee et al. [13].

Based on the pairwise comparison, we have a smooth alternative to the multiclass hinge loss $\sum_{j \neq c} (1 - g_c(x) + g_j(x))_+$, which is
\[
G_T\big(g(x), c\big) \triangleq T \sum_{j \neq c} \log\Big(1 + \exp\Big(\frac{1 + g_j(x) - g_c(x)}{T}\Big)\Big). \tag{9}
\]
It is also immediate that $G_T(g(x), c) \geq \sum_{j \neq c} (1 - g_c(x) + g_j(x))_+$ and $\lim_{T \to 0} G_T(g(x), c) = \sum_{j \neq c} (1 - g_c(x) + g_j(x))_+$.

Theorem 10. Assume that $P_c(x) > 0$ for $c = 1, \ldots, m$. Let $P_l(x) = \max_j P_j(x)$ and $P_k(x) = \max_{j \neq l} P_j(x)$. Consider the optimization problem
\[
\min_{g(x) \in \mathcal{G}} \sum_{c=1}^m G_T\big(g(x), c\big) P_c(x)
\]
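The coherence-type losses above are straightforward to evaluate numerically. The following sketch (our own illustration; the function names are ours) implements $\eta_T$, $L_T$ and $G_T$ with a numerically stable log-sum-exp and checks that $L_T$ upper-bounds the constrained-comparison hinge sum of Lee et al. and approaches it as $T \to 0$.

```python
import numpy as np

def eta(z, T):
    """Coherence function eta_T(z) = T * log(1 + exp((1 - z) / T)), Eq. (6)."""
    return T * np.logaddexp(0.0, (1.0 - z) / T)   # logaddexp(0, x) = log(1 + e^x), computed stably

def L(g, c, T):
    """Smoothed constrained-comparison loss L_T(g, c) = T * sum_{j != c} log(1 + exp((1 + g_j) / T))."""
    return T * np.sum(np.logaddexp(0.0, (1.0 + np.delete(g, c)) / T))

def G(g, c, T):
    """Smoothed pairwise-comparison loss G_T(g, c) of Eq. (9)."""
    return T * np.sum(np.logaddexp(0.0, (1.0 + np.delete(g, c) - g[c]) / T))

def msvm_hinge(g, c):
    """Constrained-comparison hinge of Lee et al.: sum_{j != c} (1 + g_j)_+ ."""
    return np.sum(np.maximum(1.0 + np.delete(g, c), 0.0))

g = np.array([1.2, -0.5, -0.7])   # an arbitrary sum-to-zero score vector
c = 0
for T in (1.0, 0.1, 0.01):
    print(f"T={T:5.2f}   L_T={L(g, c, T):8.4f} >= hinge={msvm_hinge(g, c):6.4f}   "
          f"G_T={G(g, c, T):8.4f}")
# As T decreases, L_T approaches the hinge sum from above, illustrating both
# L_T(g, c) >= sum_{j != c} (1 + g_j)_+ and the limit as T -> 0; G_T behaves
# analogously for the pairwise comparison hinge loss.
```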
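To connect Theorem 9 with Eq. (8), the following sketch (again our own illustration, assuming $\mathcal{G}$ is the sum-to-zero constraint set used throughout) solves problem (7) numerically for a fixed probability vector and then recovers the class probabilities from the fitted $\hat{g}(x)$ via Eq. (8); up to solver accuracy, the recovered values should reproduce $P_c(x)$.

```python
import numpy as np
from scipy.optimize import minimize

def L(g, c, T):
    """L_T(g, c) = T * sum_{j != c} log(1 + exp((1 + g_j) / T))."""
    return T * np.sum(np.logaddexp(0.0, (1.0 + np.delete(g, c)) / T))

def expected_L(g, P, T):
    """Objective of problem (7): sum_c L_T(g, c) * P_c(x)."""
    return sum(P[c] * L(g, c, T) for c in range(len(P)))

def probs_from_g(g, T):
    """Recover P_c(x) from the fitted g via Eq. (8)."""
    e = np.exp(-(1.0 + g) / T)
    m = len(g)
    return 1.0 - (m - 1) * (1.0 + e) / (m + e.sum())

P = np.array([0.5, 0.3, 0.2])      # example conditional class probabilities
T = 0.5
cons = {"type": "eq", "fun": lambda g: np.sum(g)}   # g in G: sum-to-zero constraint
res = minimize(expected_L, np.zeros(len(P)), args=(P, T),
               method="SLSQP", constraints=[cons])

g_hat = res.x
print("g_hat          :", np.round(g_hat, 4))      # ordered like P, as in Theorem 9
print("P from Eq. (8) :", np.round(probs_from_g(g_hat, T), 4))
print("true P         :", P)
```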