Z. Zhang et al. / Artificial Intelligence 215 (2014) 55–78, p. 57

other technical conditions, the resulting classifiers can be shown to be Bayes consistent [1]. It seems reasonable to pursue a similar development in the case of multicategory classification, and indeed such a proposal has been made by Zou et al. [33] (also see [24,23]).

Definition 2. A surrogate function $\psi_c(g(x))$ is said to be Fisher-consistent w.r.t. a margin vector $g(x) = (g_1(x), \ldots, g_m(x))^T$ at $(x, c)$ if (i) the following risk minimization problem

$$\hat{g}(x) = \mathop{\mathrm{argmin}}_{g(x) \in \mathcal{G}} \sum_{c=1}^{m} \psi_c\big(g(x)\big) P_c(x) \tag{1}$$

has a unique solution $\hat{g}(x) = (\hat{g}_1(x), \ldots, \hat{g}_m(x))^T$; and (ii)

$$\mathop{\mathrm{argmax}}_c \hat{g}_c(x) = \mathop{\mathrm{argmax}}_c P_c(x).$$

Zou et al. [33] assumed $\psi_c(g(x))$ in an independent and identical setting; that is, $\psi_c(g(x)) \triangleq \eta(g_c(x))$, where $\eta$ is some loss function. As we see, Definition 2 does not require that the function $\psi_c(g(x))$ depend only on $g_c(x)$. Thus, this definition refines the definition of Zou et al. [33]. The definition is related to the notion of infinite-sample consistency (ISC) of Zhang [28]. ISC says that an exact solution of Problem (1) leads to a Bayes rule; however, it does not require that the solution of Problem (1) be unique. Additionally, Zhang [28] especially discussed two other settings: pairwise comparison, $\psi_c(g(x)) \triangleq \sum_{j \neq c} \eta(g_c(x) - g_j(x))$, and constrained comparison, $\psi_c(g(x)) \triangleq \sum_{j \neq c} \eta(-g_j(x))$.

In this paper, we are concerned with multicategory classification methods in which binary and multicategory problems are solved following the same principle. One of the principled approaches is due to Lee et al. [13]. The authors proposed a multicategory SVM (MSVM) which treats the m-class problem simultaneously. Moreover, Lee et al. [13] proved that their MSVM satisfies a Fisher consistency condition. Unfortunately, this desirable property does not hold for many other multiclass SVMs (see, e.g., [25,3,26,12]). The multiclass SVM of [5] possesses this property only if there is a dominating class (that is, $\max_j P_j(x) > 1/2$).
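Condition (ii) of Definition 2 can be checked numerically for a concrete surrogate. As an illustrative sketch (using the multinomial logit loss, which is not one of the hinge losses analyzed in this paper), take $\psi_c(g) = \log\sum_j e^{g_j} - g_c$ in the independent and identical setting. The gradient of the expected risk in Problem (1) is then $\mathrm{softmax}(g) - P$, so gradient descent drives the softmax of $g$ toward the class probabilities, and the argmax of the minimizer recovers the Bayes rule. All function names below are illustrative:

```python
import numpy as np

def softmax(g):
    e = np.exp(g - g.max())
    return e / e.sum()

def minimize_expected_risk(P, lr=0.5, steps=2000):
    """Minimize R(g) = sum_c P_c * (logsumexp(g) - g_c) by gradient descent.

    Since sum_c P_c = 1, R(g) = logsumexp(g) - P^T g, whose gradient is
    softmax(g) - P.  The minimizer g_hat therefore satisfies
    softmax(g_hat) = P, and hence argmax g_hat = argmax P.
    """
    g = np.zeros_like(P)
    for _ in range(steps):
        g -= lr * (softmax(g) - P)
    return g

P = np.array([0.2, 0.5, 0.3])            # conditional class probabilities P_c(x)
g_hat = minimize_expected_risk(P)
assert np.argmax(g_hat) == np.argmax(P)  # condition (ii) of Definition 2
assert np.allclose(softmax(g_hat), P, atol=1e-6)
```

This also illustrates the remark about probability estimation later in the section: for this smooth loss the minimizer itself encodes $P_c(x)$, which a hinge minimizer does not.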
Recently, Liu and Shen [15] proposed a so-called multicategory ψ-learning algorithm by using a multicategory ψ loss, and Wu and Liu [27] devised robust truncated-hinge-loss SVMs. These two algorithms are parallel to the multiclass SVM of Crammer and Singer [5] and enjoy a generalized pairwise comparison setting.

Additionally, Zhu et al. [32] and Saberian and Vasconcelos [21] devised several multiclass boosting algorithms, which solve binary and multicategory problems under the same principle. Mukherjee and Schapire [17] created a general framework for studying multiclass boosting, which formalizes the interaction between the boosting algorithm and the weak learner. We note that Gao and Koller [11] applied the multiclass hinge loss of Crammer and Singer [5] to devise a multiclass boosting algorithm. However, this algorithm is cast under an output coding framework.

1.2. Contributions and outline

In this paper, we study the Fisher consistency properties of multicategory surrogate losses. First, assuming that losses are twice differentiable, we present a Fisher consistency property under a more general setting, including the independent and identical, constrained comparison and generalized pairwise comparison settings. We next propose a framework for constructing a majorization function of the 0–1 loss. This framework provides us with a natural and intuitive perspective for the construction of three extant multicategory hinge losses. Under this framework, we conduct an in-depth analysis of the Fisher consistency properties of these three extant multicategory hinge losses. In particular, we give a sufficient condition under which the multiclass hinge loss used by Vapnik [25], Bredensteiner and Bennett [3], Weston and Watkins [26], and Guermeur [12] satisfies Fisher consistency. Moreover, we constructively derive the minimizers of the expected errors of the multiclass hinge losses of Crammer and Singer [5].
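The three extant multicategory hinge losses referred to above map onto the three settings of Definition 2. As a sketch, the following uses commonly cited textbook forms of these losses (constants and sign conventions vary slightly across the cited papers, so these are illustrative rather than the exact definitions in [25,5,13]):

```python
import numpy as np

def pos(z):
    """Positive part (z)_+ = max(0, z)."""
    return np.maximum(0.0, z)

def ww_loss(g, c):
    """Pairwise-comparison hinge of Vapnik / Weston-Watkins type:
    sum_{j != c} (1 - (g_c - g_j))_+ ."""
    return sum(pos(1.0 - (g[c] - g[j])) for j in range(len(g)) if j != c)

def cs_loss(g, c):
    """Crammer-Singer hinge: (1 + max_{j != c} g_j - g_c)_+ ."""
    others = np.delete(g, c)
    return pos(1.0 + others.max() - g[c])

def lee_loss(g, c):
    """Constrained-comparison hinge of Lee et al. type:
    sum_{j != c} (1 + g_j)_+ , used under the sum-to-zero constraint."""
    return sum(pos(1.0 + g[j]) for j in range(len(g)) if j != c)

g = np.array([2.0, 0.5, -2.5])   # sum-to-zero margin vector; class 0 dominates
print(ww_loss(g, 0), cs_loss(g, 0), lee_loss(g, 0))   # -> 0.0 0.0 1.5
```

Note that the pairwise and Crammer-Singer losses vanish once the true class wins every pairwise comparison by margin 1, while the Lee et al. loss also penalizes any competing component above $-1$.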
The framework also inspires us to propose a class of multicategory majorization functions which are based on the coherence function [30]. The coherence function is a smooth and convex majorization of the hinge function; in particular, its limit as the temperature approaches zero gives the hinge loss. Moreover, its relationship with the logit loss is also shown. Zhang et al. [30] originally exploited the coherence function in binary classification problems. We investigate its application in the development of multicategory margin classification methods. Based on the coherence function, we in particular present three multicategory coherence losses which correspond to the three extant multicategory hinge losses. These multicategory coherence losses are infinitely smooth and convex, and they satisfy the Fisher consistency condition. The coherence losses have the advantage over the hinge losses that they provide an estimate of the conditional class probability, and over the multicategory logit loss that their limiting versions at zero temperature are just the corresponding multicategory hinge loss functions. Thus they are very appropriate for use in the development of multicategory large-margin classification methods, especially boosting algorithms. We propose in this paper a multiclass C-learning algorithm and a multiclass GentleBoost algorithm, both based on our multicategory coherence loss functions.

The remainder of this paper is organized as follows. Section 2 gives a general result on Fisher consistency. In Section 3, we discuss the methodology for the construction of multicategory majorization losses and present two majorization losses