Coherence Functions for Multicategory Margin-based Classification Methods

functions for multicategory margin-based classification methods. Section 3 proposes the coherence function and discusses its statistical properties. Section 4 devises a multicategory margin-based boosting algorithm using the coherence function. An experimental analysis is presented in Section 5 and concluding remarks are given in Section 6. Some proofs of the theoretical results are left to the appendices.

2 Problem Formulation

We are given an m-class (m ≥ 2) classification problem with a set of training data {(x_i, c_i)}_{i=1}^n, where x_i ∈ X ⊂ R^d is an input vector and c_i ∈ {1, 2, ..., m} is its corresponding class label. We assume that each x belongs to one and only one class. Our goal is to find a classifier φ(x): x → c ∈ {1, ..., m}.

Let P_c(x) = P(C = c | X = x), c = 1, ..., m, be the class probabilities given x. The expected error at x is then defined by \sum_{c=1}^{m} I_{[\phi(x) \neq c]} P_c(x), where I_{[\#]} = 1 if # is true and 0 otherwise. The empirical error on the training data is given by

    \epsilon = \frac{1}{n} \sum_{i=1}^{n} I_{[\phi(x_i) \neq c_i]}.

Since ε is equal to its minimum value of zero when all the training data points are correctly classified, we wish to use ε as a basis for devising multicategory classification algorithms.

2.1 Multicategory Margins

Suppose the classifier is modeled using an m-vector g(x) = (g_1(x), ..., g_m(x))^T, where the induced classifier is obtained via maximization in a manner akin to discriminant analysis: φ(x) = argmax_j {g_j(x)}. For simplicity of analysis, we assume that the maximizing argument of max_j g_j(x) is unique.
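As a small illustration (our own sketch, not from the paper; the function names and toy data are hypothetical), the induced classifier φ(x) = argmax_j g_j(x) and the empirical error ε from this subsection can be written directly:

```python
def classify(g):
    """Induced classifier: phi(x) = argmax_j g_j(x), with 1-based class labels."""
    return max(range(len(g)), key=lambda j: g[j]) + 1

def empirical_error(margins, labels):
    """epsilon = (1/n) * sum_i I[phi(x_i) != c_i]."""
    n = len(labels)
    return sum(classify(g) != c for g, c in zip(margins, labels)) / n

# hypothetical margin vectors for n = 3 points, m = 3 classes
margins = [(2.0, -1.0, -1.0), (-0.5, 1.0, -0.5), (0.3, 0.2, -0.5)]
labels = [1, 2, 3]  # third point is misclassified: its argmax is class 1
print(empirical_error(margins, labels))  # -> 0.3333333333333333
```

Minimizing this ε directly is exactly the intractable problem the surrogate losses below are designed to avoid.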
Of course this does not imply that the maximum value is unique; indeed, adding a constant to each component g_j(x) does not change the maximizing argument. To remove this redundancy, it is convenient to impose a sum-to-zero constraint. Thus we define

    \mathcal{G} = \Big\{ (g_1(x), \ldots, g_m(x))^T : \sum_{j=1}^{m} g_j(x) = 0 \Big\}

and assume g(x) ∈ G. Zou et al. (2008) referred to the vectors in G as multicategory margin vectors.

Since a margin vector g(x) induces a classifier, we explore the minimization of ε with respect to (w.r.t.) g(x). However, this minimization problem is intractable because I_{[φ(x)≠c]} is the 0-1 function. Various tractable surrogate loss functions ζ(g(x), c) are thus used to approximate I_{[φ(x)≠c]}. The corresponding population and empirical risk functions are given by

    R(P, g) = E_X\Big[ \sum_{c=1}^{m} \zeta(g(x), c) P_c(x) \Big],
    \hat{R}(g) = \frac{1}{n} \sum_{i=1}^{n} \zeta(g(x_i), c_i),

where E_X(·) is the expectation taken w.r.t. the distribution of X.

If α is a positive constant that does not depend on (x, c), argmin_{g(x)∈G} (1/α) R̂(g) is equivalent to argmin_{g(x)∈G} R̂(g). We thus present the following definition.

Definition 1 A surrogate loss ζ(g(x), c) is said to be a majorization of I_{[φ(x)≠c]} w.r.t. (x, c) if ζ(g(x), c) ≥ α I_{[φ(x)≠c]}, where α is a positive constant that does not depend on (x, c).

Given a majorization function ζ(g(x), c), the classifier resulting from the minimization of R̂(g) w.r.t. the margin vector g(x) is called a margin-based classifier or a margin-based classification method. Therefore, a margin-based classifier corresponds to a so-called majorization-minimization procedure. In the binary classification setting, a wide variety of classifiers can be understood as minimizers of a majorization loss function of the 0-1 loss. If such functions satisfy other technical conditions, the resulting classifiers can be shown to be Bayes consistent (Bartlett et al., 2006).
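The redundancy-removal step above amounts to centering g(x): subtracting the componentwise mean puts any vector into G without changing the maximizing argument. A minimal sketch (the function name `to_margin_vector` is ours):

```python
def to_margin_vector(g):
    """Project g onto G by subtracting the mean, so that sum_j g_j = 0.

    Subtracting a common constant from every component leaves
    argmax_j g_j(x) unchanged, which is exactly the redundancy
    the sum-to-zero constraint removes.
    """
    mean = sum(g) / len(g)
    return [gj - mean for gj in g]

g = [3.0, 1.0, 2.0]
g0 = to_margin_vector(g)
print(g0)                    # -> [1.0, -1.0, 0.0]
print(abs(sum(g0)) < 1e-12)  # the sum-to-zero constraint holds
# the induced classifier is unchanged by the projection:
print(max(range(3), key=lambda j: g[j]) == max(range(3), key=lambda j: g0[j]))
```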
It seems reasonable to pursue a similar development in the case of multicategory classification, and indeed such a proposal has been made by Zou et al. (2008) (see also Tewari and Bartlett (2007); Zhang (2004)). The following definition refines the definition of Zou et al. (2008). (Specifically, we do not require that the function ζ(g(x), c) depend only on g_c(x).)

Definition 2 A surrogate function ζ(g(x), c) is said to be Fisher consistent w.r.t. a margin vector g(x) = (g_1(x), ..., g_m(x))^T at x if (i) the following risk minimization problem

    \hat{g}(x) = \operatorname*{argmin}_{g(x) \in \mathcal{G}} \sum_{c=1}^{m} \zeta(g(x), c) P_c(x)    (1)

has a unique solution ĝ(x) = (ĝ_1(x), ..., ĝ_m(x))^T; and (ii)

    \operatorname*{argmax}_c \hat{g}_c(x) = \operatorname*{argmax}_c P_c(x).

2.2 Multicategory Losses

Zou et al. (2008) derived multicategory boosting algorithms by using ζ(g(x), c) = exp(−g_c(x)). In their
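For the exponential loss ζ(g(x), c) = exp(−g_c(x)) just mentioned, Definition 2 can be checked by a standard Lagrange-multiplier calculation (our own sketch, not taken from the paper): stationarity of problem (1) under the sum-to-zero constraint forces P_c(x) e^{−g_c(x)} to be constant in c, giving the closed-form minimizer ĝ_c(x) = log P_c(x) − (1/m) Σ_j log P_j(x), whose argmax coincides with that of P_c(x), so condition (ii) holds. The code below verifies the constraint, the argmax agreement, and the stationarity condition numerically:

```python
import math

def exp_loss_minimizer(P):
    """Closed-form minimizer of sum_c exp(-g_c) P_c subject to sum_c g_c = 0.

    Lagrange stationarity gives P_c * exp(-g_c) = lambda for all c, hence
    g_c = log P_c - (1/m) * sum_j log P_j.
    """
    m = len(P)
    logP = [math.log(p) for p in P]
    mean = sum(logP) / m
    return [lp - mean for lp in logP]

P = [0.5, 0.3, 0.2]           # hypothetical class probabilities at a fixed x
g_hat = exp_loss_minimizer(P)
print(abs(sum(g_hat)) < 1e-12)                    # g_hat lies in G
print(max(range(3), key=lambda c: g_hat[c]) ==
      max(range(3), key=lambda c: P[c]))          # condition (ii): argmaxes agree
vals = [P[c] * math.exp(-g_hat[c]) for c in range(3)]
print(max(vals) - min(vals) < 1e-12)              # stationarity: P_c e^{-g_c} constant
```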