60 Z. Zhang et al. / Artificial Intel_中国高校课件下载中心

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Multicategory large margin classification methods - hinge losses vs. coherence functions.

正在加载图片...

% Z Zhang et aL Artificial Intelligence 215(2014)55-78 vc(g(x))=>n(gc(x)-gn(x)). (2) l≠c while the second one is vc(g(x))=n(-gi(x)). (3) l≠c This leads us to two approaches for constructing majorization ve(g(x))of I Zhang [28]referred to them as pairwise comparison and constrained comparison.A theoretical analysis of these two classes of majorization functions has also been presented by Zhang [28].His analysis mainly focused on consistency of empirical risk minimization and the ISC property of surrogate losses.Our results in Section 3.1 show a direct and intuitive connection of these two approaches with the original 0-1 loss. 3.3.Multicategory hinge losses Using distinct n(gj(x))())in the two approaches,we can construct different multicategory losses for large margin classifiers.For example,let n(gj(x))=(1-gj(x))+which upper bounds Ig;(x)0).Then j(1-ge(x)+gj(x))+ and (1+gj(x))+are candidate majorizations for)which yield two multiclass SVM methods. In the multicategory SVM(MSVM),Lee et al.[13]employed (1+gj(x))+as a multicategory hinge loss.Moreover, Lee et al.proved that this multicategory hinge loss is Fisher-consistent.In particular.the minimizer of(1+ gj(x))+Pc(x)w.r.t.geg is g(x)=m-1 if I=argmaxj (Pj(x))and g(x)=-1 otherwise. The pairwise comparison j(1-ge(x)+gj(x))+was used by Vapnik [25].Weston and Watkins [26],Bredensteiner and Bennett [3].Guermeur [12].Unfortunately,Lee et al.[13].Zhang [28],Liu [14]showed that solutions of the corresponding optimization problem do not always implement the Bayes decision rule.However,we find that it is still Fisher-consistent under certain conditions.In particular,we have the following theorem (the proof is given in Appendix B.1). Theorem 7.Let Pj(x)>0 for j=1....,m,Pi(x)=maxj Pj(x)and Pk(x)=maxj Pj(x),and let m gx)=argmin∑Pc(x)∑(1-gc(x+gj(x)+ g(x)EC c=1 j≠c If P(x)>1/2 or Pk(x)<1/m,then (x)=1+g(x)1+j(x)for jl.k. This theorem implies that g(x)>gj(x).so the majorization function (1-ge(x)+gj(x))+is Fisher-consistent when P(x)>1/2 or Pk(x)<1/m.In the case that m=3.Liu [14]showed that this majorization function yields the Fisher consistency when Pk<while the consistency is not always satisfied when 1/2>P>Pk=1/3.Theorem 7 shows that for any m3 the consistency is also satisfied whenever Pk<. As we have seen.can be also used as a starting point to construct a majorization ofSince Imwe call this construction approach the maximum pairwise comparison.In fact,this approach was employed by Crammer and Singer [5].Liu and Shen [15]and Wu and Liu [27].Especially,Crammer and Singer 5 used the surrogate: Ec(g(x))=max gj(x)+1-Ij=c)-gc(x). (4) It is easily seen that Ukw-8e图>0≤max{8jX)+1-U=c}-8c(X≤1+8jx-gc(x)+ j≠c which implies that e(g(x))is a tighter upper bound of)than (1-ge(x)+gj(x))+.Note that Crammer and Singer [5]did not assume geg,but Liu and Shen [15]argued that this assumption is also necessary.Zhang [28]showed that sc(g(x))is Fisher-consistent only when P(x)>1/2.However,the author did not give an explicit expression of the minimizer of the expected error in question in the literature.Here we present the constructive solution of the corresponding minimization problem in the following theorem (the proof is given in Appendix B.2). Theorem 8.Consider the following optimization problem of m gx)=argmin∑ max(gj(x)+1-Itj=c))-gc(x)Pc(x). (5) g(X)EG C=1 Assume that P(x)=maxj Pi(x).60 Z. Zhang et al. / Artificial Intelligence 215 (2014) 55–78 ψc g(x) = l=c η gc (x) − gl(x) , (2) while the second one is ψc g(x) = l=c η −gl(x) . (3) This leads us to two approaches for constructing majorization ψc (g(x)) of I{φ(x)=c}. Zhang [28] referred to them as pairwise comparison and constrained comparison. A theoretical analysis of these two classes of majorization functions has also been presented by Zhang [28]. His analysis mainly focused on consistency of empirical risk minimization and the ISC property of surrogate losses. Our results in Section 3.1 show a direct and intuitive connection of these two approaches with the original 0–1 loss. 3.3. Multicategory hinge losses Using distinct η(g j(x)) (≥ I{g j(x)≤0}) in the two approaches, we can construct different multicategory losses for large margin classifiers. For example, let η(g j(x)) = (1 − g j(x))+ which upper bounds I{g j(x)≤0}. Then j=c (1 − gc (x) + g j(x))+ and j=c (1 + g j(x))+ are candidate majorizations for I{φ(x)=c}, which yield two multiclass SVM methods. In the multicategory SVM (MSVM), Lee et al. [13] employed j=c (1 + g j(x))+ as a multicategory hinge loss. Moreover, Lee et al. [13] proved that this multicategory hinge loss is Fisher-consistent. In particular, the minimizer of m c=1 j=c (1 + g j(x))+ Pc (x) w.r.t. g ∈ G is gˆl(x) = m − 1 if l = argmaxj(P j(x)) and gˆl(x) = −1 otherwise. The pairwise comparison j=c (1−gc (x)+g j(x))+ was used by Vapnik [25], Weston and Watkins [26], Bredensteiner and Bennett [3], Guermeur [12]. Unfortunately, Lee et al. [13], Zhang [28], Liu [14] showed that solutions of the corresponding optimization problem do not always implement the Bayes decision rule. However, we find that it is still Fisher-consistent under certain conditions. In particular, we have the following theorem (the proof is given in Appendix B.1). Theorem 7. Let P j(x) > 0 for j = 1,...,m, Pl(x) = maxj P j(x) and Pk(x) = maxj=l P j(x), and let gˆ(x) = argmin g(x)∈G m c=1 Pc (x) j=c 1 − gc (x) + g j(x) +. If Pl(x) > 1/2 or Pk(x) < 1/m, then gˆl(x) = 1 + gˆk(x) ≥ 1 + gˆ j(x) for j = l,k. This theorem implies that gˆl(x) > gˆ j(x), so the majorization function j=c (1 − gc (x) + g j(x))+ is Fisher-consistent when Pl(x) > 1/2 or Pk(x) < 1/m. In the case that m = 3, Liu [14] showed that this majorization function yields the Fisher consistency when Pk < 1 3 , while the consistency is not always satisfied when 1/2 > Pl > Pk ≥ 1/3. Theorem 7 shows that for any m ≥ 3 the consistency is also satisfied whenever Pk < 1 m . As we have seen, I{ j=c g j(x)−gc (x)>0} can be also used as a starting point to construct a majorization of I{φ(x)=c}. Since I{ j=c g j(x)−gc (x)>0} = I{max j=c g j(x)−gc (x)>0}, we call this construction approach the maximum pairwise comparison. In fact, this approach was employed by Crammer and Singer [5], Liu and Shen [15] and Wu and Liu [27]. Especially, Crammer and Singer [5] used the surrogate: ξc g(x) = max g j(x) + 1 − I{j=c} − gc (x). (4) It is easily seen that I{ j=c g j(x)−gc (x)>0} ≤ max j g j(x) + 1 − I{j=c} − gc (x) ≤ j=c 1 + g j(x) − gc (x) +, which implies that ξc (g(x)) is a tighter upper bound of I{φ(x)=c} than j=c (1 − gc (x) + g j(x))+. Note that Crammer and Singer [5] did not assume g ∈ G, but Liu and Shen [15] argued that this assumption is also necessary. Zhang [28] showed that ξc (g(x)) is Fisher-consistent only when Pl(x) > 1/2. However, the author did not give an explicit expression of the minimizer of the expected error in question in the literature. Here we present the constructive solution of the corresponding minimization problem in the following theorem (the proof is given in Appendix B.2). Theorem 8. Consider the following optimization problem of gˆ(x) = argmin g(x)∈G m c=1 max j g j(x) + 1 − I{j=c} − gc (x) Pc (x). (5) Assume that Pl(x) = maxj P j(x).

<<向上翻页向下翻页>>

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Multicategory large margin classification methods - hinge losses vs. coherence functions.

©2008-现在 cucdc.com 高等教育资讯网版权所有