$\mathbf{u}$ with $\sum_{j=1}^m u_j = 0$. This shows that the optimization problem has a strict local minimum point $\hat{\mathbf{g}}$. Again, we note that the Hessian matrix is positive semi-definite, so $\sum_{j=1}^m P_j \log\big[1 + \sum_{l\neq j} \exp(1 + g_l - g_j)\big]$ is convex. Thus, $\hat{\mathbf{g}}$ is also the global minimum point.

Now we prove that if $P_c > P_k$, then $\hat{g}_c > \hat{g}_k$. Since $\hat{\mathbf{g}}$ is the solution of the equations $\frac{\partial L}{\partial g_c} = 0$, it immediately follows that $\lambda = 0$ by using $\sum_{c=1}^m \frac{\partial L}{\partial g_c} = 0$. Hence,
$$
P_c \, \frac{\sum_{l=1}^m \exp(1 + \hat{g}_l - \hat{g}_c)}{1 + \sum_{l\neq c} \exp(1 + \hat{g}_l - \hat{g}_c)}
= \sum_{j=1}^m P_j \, \frac{\exp(1 + \hat{g}_c - \hat{g}_j)}{1 + \sum_{l\neq j} \exp(1 + \hat{g}_l - \hat{g}_j)},
$$
from which we get
$$
\frac{P_c}{P_k}
= \frac{\exp(\hat{g}_c)}{\exp(\hat{g}_k)} \cdot \frac{\exp(\hat{g}_c) + \sum_{l\neq c} \exp(1 + \hat{g}_l)}{\exp(\hat{g}_k) + \sum_{l\neq k} \exp(1 + \hat{g}_l)} \qquad (9)
$$
$$
= \frac{\exp(2\hat{g}_c) - \exp(1 + 2\hat{g}_c) + \exp(\hat{g}_c) \sum_{l=1}^m \exp(1 + \hat{g}_l)}{\exp(2\hat{g}_k) - \exp(1 + 2\hat{g}_k) + \exp(\hat{g}_k) \sum_{l=1}^m \exp(1 + \hat{g}_l)} > 1.
$$
The denominator of the last ratio equals $\exp(\hat{g}_k)\big[\exp(\hat{g}_k) + \sum_{l\neq k} \exp(1 + \hat{g}_l)\big] > 0$, so the numerator exceeds the denominator. Consequently,
$$
0 < \big[\exp(2\hat{g}_c) - \exp(2\hat{g}_k)\big]\big[1 - \exp(1)\big] + \big[\exp(\hat{g}_c) - \exp(\hat{g}_k)\big] \sum_{l=1}^m \exp(1 + \hat{g}_l)
$$
$$
= \big(\exp(\hat{g}_c) - \exp(\hat{g}_k)\big) \Big[\exp(\hat{g}_c) + \exp(\hat{g}_k) + \sum_{l\neq c,k} \exp(1 + \hat{g}_l)\Big],
$$
where the bracketed factor is positive. Thus we obtain $\hat{g}_c > \hat{g}_k$. From (9), we get (8).
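As an illustrative sanity check (not from the paper), the order-preserving property just proved can be verified numerically: fix a probability vector, minimize the conditional risk $\sum_j P_j \log\big[1 + \sum_{l\neq j} \exp(1 + g_l - g_j)\big]$ under the sum-to-zero constraint, and inspect the ordering of the minimizer. The probability values and the use of scipy.optimize.minimize are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def coherence_risk(g, P):
    """Conditional coherence risk at T = 1:
    sum_j P_j * log(1 + sum_{l != j} exp(1 + g_l - g_j))."""
    m = len(P)
    return sum(P[j] * np.log(1.0 + sum(np.exp(1.0 + g[l] - g[j])
                                       for l in range(m) if l != j))
               for j in range(m))

P = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical class probabilities, strictly decreasing
res = minimize(coherence_risk, x0=np.zeros(4), args=(P,),
               constraints={"type": "eq", "fun": lambda g: g.sum()})  # sum_j g_j = 0
assert res.success
g_hat = res.x
print(np.round(g_hat, 4))  # expected: strictly decreasing components
assert all(g_hat[i] > g_hat[i + 1] for i in range(3)), "P_c > P_k should imply g_c > g_k"
```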
B Proof of Theorem 2

First, consider that
$$
T \log m + S(\mathbf{g}(\mathbf{x}), c) - C(\mathbf{g}(\mathbf{x}), c)
= T \log \frac{m \exp\Big(\frac{1}{m} \sum_{j\neq c} \frac{1 + g_j(\mathbf{x}) - g_c(\mathbf{x})}{T}\Big)}{1 + \sum_{j\neq c} \exp\Big(\frac{1 + g_j(\mathbf{x}) - g_c(\mathbf{x})}{T}\Big)}
\leq T \log \frac{1 + \sum_{j\neq c} \exp\Big(\frac{1 + g_j(\mathbf{x}) - g_c(\mathbf{x})}{T}\Big)}{1 + \sum_{j\neq c} \exp\Big(\frac{1 + g_j(\mathbf{x}) - g_c(\mathbf{x})}{T}\Big)} = 0.
$$
Here we use the fact that $\exp(\cdot)$ is convex: applying Jensen's inequality to the $m$ exponents $0$ and $\{(1 + g_j(\mathbf{x}) - g_c(\mathbf{x}))/T\}_{j\neq c}$ gives $m \exp\big(\frac{1}{m} \sum_{j\neq c} a_j\big) \leq 1 + \sum_{j\neq c} \exp(a_j)$. Second, assume that $l = \mathrm{argmax}_j \{g_j(\mathbf{x}) + 1 - I_{[j=c]}\}$. Then
$$
T \log m + H(\mathbf{g}(\mathbf{x}), c) - C(\mathbf{g}(\mathbf{x}), c)
= T \log \frac{m \exp\Big(\frac{1 + g_l(\mathbf{x}) - g_c(\mathbf{x}) - I_{[l=c]}}{T}\Big)}{1 + \sum_{j\neq c} \exp\Big(\frac{1 + g_j(\mathbf{x}) - g_c(\mathbf{x})}{T}\Big)} \geq 0,
$$
since each of the $m$ terms in the denominator is at most $\exp\big((1 + g_l(\mathbf{x}) - g_c(\mathbf{x}) - I_{[l=c]})/T\big)$.
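The two inequalities together give the sandwich bound $T \log m + S(\mathbf{g}(\mathbf{x}), c) \leq C(\mathbf{g}(\mathbf{x}), c) \leq T \log m + H(\mathbf{g}(\mathbf{x}), c)$, which can be spot-checked numerically. In the following sketch (illustrative, not from the paper), C, S, and H are transcribed from the displayed equations above; the random-margin test protocol is an assumption.

```python
import numpy as np

T_VALUES, M = (0.1, 1.0, 10.0), 5

def C(g, c, T):
    """Coherence function: T * log(1 + sum_{j != c} exp((1 + g_j - g_c) / T))."""
    return T * np.log(1.0 + sum(np.exp((1.0 + g[j] - g[c]) / T)
                                for j in range(M) if j != c))

def S(g, c):
    """Averaged margin term from the first bound: (1/m) sum_{j != c} (1 + g_j - g_c)."""
    return sum(1.0 + g[j] - g[c] for j in range(M) if j != c) / M

def H(g, c):
    """Multicategory hinge: max_j {g_j + 1 - I[j = c]} - g_c."""
    return max(g[j] + 1.0 - (j == c) for j in range(M)) - g[c]

rng = np.random.default_rng(0)
for _ in range(1000):                     # random margin vectors and labels
    g, c = rng.normal(size=M), int(rng.integers(M))
    for T in T_VALUES:
        lower, mid, upper = T * np.log(M) + S(g, c), C(g, c, T), T * np.log(M) + H(g, c)
        assert lower <= mid + 1e-9 and mid <= upper + 1e-9
```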
Acknowledgements

Zhihua Zhang is supported in part by the Program for Changjiang Scholars and Innovative Research Team in University (IRT0652, PCSIRT), China. Zhang, Li, and Yeung are supported by General Research Fund 621407 from the Research Grants Council of the Hong Kong Special Administrative Region, China. Michael Jordan acknowledges support from NSF Grant 0509559 and grants from Google and Microsoft Research.