Coherence Functions for Multicategory Margin-based Classification Methods

Zhihua Zhang
College of Comp. Sci. and Tech.
Zhejiang University
Zhejiang 310027, China
zhzhang@cs.zju.edu.cn

Michael I. Jordan
Depts. of EECS and Statistics
University of California, Berkeley
Berkeley, CA 94720, USA
jordan@cs.berkeley.edu

Wu-Jun Li and Dit-Yan Yeung
Dept. of Comp. Sci. and Eng.
Hong Kong Univ. of Sci. and Tech.
Hong Kong, China
{liwujun, dyyeung}@cse.ust.hk

Abstract

Margin-based classification methods are typically devised based on a majorization-minimization procedure, which approximately solves an otherwise intractable minimization problem defined with the 0-1 loss. The extension of such methods from the binary classification setting to the more general multicategory setting turns out to be non-trivial. In this paper, our focus is to devise margin-based classification methods that can be seamlessly applied to both settings, with the binary setting simply as a special case.
In particular, we propose a new majorization loss function that we call the coherence function, and then devise a new multicategory margin-based boosting algorithm based on the coherence function. Analogous to deterministic annealing, the coherence function is characterized by a temperature factor. It is closely related to the multinomial log-likelihood function and its limit at zero temperature corresponds to a multicategory hinge loss function.

1 Introduction

Margin-based classification methods have become increasingly popular since the advent of the support vector machine (SVM) (Cortes and Vapnik, 1995) and boosting (Freund, 1995; Freund and Schapire, 1997). These algorithms were originally designed for binary classification problems. Unfortunately, extension of them to the multicategory setting has been found to be non-trivial.

Appearing in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, Florida, USA. Volume 5 of JMLR: W&CP 5. Copyright 2009 by the authors.

A variety of ad hoc extensions of the binary SVM and boosting to multicategory classification problems have been studied. These include one-versus-rest, one-versus-one, error-correcting codes, and pairwise coupling (Allwein et al., 2000). Among these methods, one-versus-rest has been the dominant approach. The basic idea is to train m binary classifiers for an m-class (m ≥ 2) problem so that each classifier learns to discriminate one class from the rest. However, optimality achieved for each of the m independent binary problems does not readily guarantee optimality for the original m-class problem.

The goal of this paper is to solve multicategory classification problems using the same margin principle as that for binary problems. Of crucial concern are the statistical properties (Bartlett et al., 2006; Tewari and Bartlett, 2007; Zhang, 2004) of a majorization function for the original 0-1 loss function.
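The one-versus-rest reduction discussed above can be sketched as follows. This is an illustrative toy implementation, not the paper's method: a simple perceptron stands in for an arbitrary margin-based binary learner (e.g., an SVM or a boosted classifier), and all names and data are hypothetical.

```python
# One-versus-rest sketch: split an m-class problem into m binary problems,
# train one binary classifier per class, and predict by the largest score.
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Train a binary perceptron; labels y must lie in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
    return w, b

def train_one_vs_rest(X, y, num_classes):
    """One binary classifier per class: class c versus all the rest."""
    models = []
    for c in range(num_classes):
        y_bin = np.where(y == c, 1.0, -1.0)
        models.append(train_perceptron(X, y_bin))
    return models

def predict(models, X):
    """Assign each point to the class whose binary score is largest."""
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return np.argmax(scores, axis=1)

# Toy 3-class problem with well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.1, size=(20, 2))
               for loc in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])
y = np.repeat([0, 1, 2], 20)

models = train_one_vs_rest(X, y, num_classes=3)
acc = np.mean(predict(models, X) == y)
print(acc)
```

Note that even when each of the three binary problems is solved well in isolation, the argmax combination carries no joint optimality guarantee, which is exactly the shortcoming the paper points out.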
In particular, we analyze the Fisher-consistency properties (Zou et al., 2008) of extant majorization functions, which are built on the exponential, logit and hinge functions. This analysis inspires us to propose a new majorization function, which we call the coherence function.

The coherence function is attractive because it is a Fisher-consistent majorization of the 0-1 loss. Also, one limiting version of it is just the multicategory hinge loss function of Crammer and Singer (2001), and its relationship with the multinomial log-likelihood function is very clear. Moreover, this function is differentiable and convex. Thus it is very appropriate for use in the development of multicategory margin-based classification methods, especially boosting algorithms. Friedman et al. (2000) and Zou et al. (2008) proposed the multicategory LogitBoost and GentleBoost algorithms based on the multinomial log-likelihood function and the exponential loss function, respectively. We propose in this paper a new multicategory GentleBoost algorithm based on our coherence function.

The rest of this paper is organized as follows. Section 2 presents theoretical discussions of extant loss
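The zero-temperature connection between the coherence function and the Crammer-Singer hinge loss can be illustrated by the standard temperature-scaled log-sum-exp identity (a generic smoothing fact, shown here with generic arguments $a_c$ rather than the paper's exact definition, which appears in later sections):

\[
\lim_{T \to 0^{+}} \; T \log \sum_{c=1}^{m} \exp\!\left(\frac{a_c}{T}\right) \;=\; \max_{1 \le c \le m} a_c .
\]

With the $a_c$ taken as class-wise margin violations, the $T \to 0$ limit recovers a max-type hinge loss in the style of Crammer and Singer, while for $T > 0$ the log-sum-exp form is smooth and convex and shares the softmax structure of the multinomial log-likelihood; this is the deterministic-annealing analogy invoked in the abstract.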