In the case of the L2-norm soft margin classification, the primal optimization problem takes the form:

\[
\begin{aligned}
\text{minimize}\quad & t(w, \xi) = \tfrac{1}{2}\lVert w\rVert^{2} + \tfrac{C}{m}\sum_{i=1}^{m}\xi_i \\
\text{subject to}\quad & y_i\bigl(\langle\Phi(x_i), w\rangle + b\bigr) \ge 1 - \xi_i \qquad (i = 1, \ldots, m) \\
& \xi_i \ge 0 \qquad (i = 1, \ldots, m)
\end{aligned}
\qquad (4)
\]

where m is the number of training patterns and y_i = ±1. As in most kernel methods, the SVM solution w can be shown to have an expansion

\[
w = \sum_{i=1}^{m} \alpha_i y_i \Phi(x_i) \qquad (5)
\]

where non-zero coefficients (support vectors) occur when a point (x_i, y_i) meets the constraint. The coefficients α_i are found by solving the following (dual) quadratic programming problem:

\[
\begin{aligned}
\text{maximize}\quad & W(\alpha) = \sum_{i=1}^{m}\alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{m}\alpha_i\alpha_j y_i y_j k(x_i, x_j) \\
\text{subject to}\quad & 0 \le \alpha_i \le \tfrac{C}{m} \qquad (i = 1, \ldots, m) \\
& \sum_{i=1}^{m}\alpha_i y_i = 0 .
\end{aligned}
\qquad (6)
\]

This is a typical quadratic problem of the form:

\[
\begin{aligned}
\text{minimize}\quad & c^{\top}x + \tfrac{1}{2}x^{\top}Hx \\
\text{subject to}\quad & b \le Ax \le b + r \\
& l \le x \le u
\end{aligned}
\qquad (7)
\]

where H ∈ R^{m×m} with entries H_{ij} = y_i y_j k(x_i, x_j), c = (1, ..., 1) ∈ R^m, u = (C, ..., C) ∈ R^m, l = (0, ..., 0) ∈ R^m, A = (y_1, ..., y_m) ∈ R^m, b = 0, r = 0. The problem can easily be handed to a standard QP solver such as quadprog() in package quadprog (Weingessel 2004) or ipop() in package kernlab (Karatzoglou, Smola, Hornik, and Zeileis 2005), both available in R (R Development Core Team 2005). However, techniques that take advantage of the special structure of the SVM QP problem, such as SMO and chunking (Osuna et al. 1997), offer much better performance in terms of speed, scalability, and memory usage.
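As an illustration, the dual problem of Equation 6 can be passed directly to ipop(), which expects exactly the form of Equation 7. The following sketch is not taken from the paper: the toy data, the kernel parameter sigma, the cost C, and the tolerance used to detect support vectors are arbitrary illustrative choices, and the linear term is negated in the call because ipop() minimizes while Equation 6 is stated as a maximization.

library(kernlab)

set.seed(1)
m <- 40
x <- rbind(matrix(rnorm(m, mean =  1), ncol = 2),   # 20 points of class +1
           matrix(rnorm(m, mean = -1), ncol = 2))   # 20 points of class -1
y <- rep(c(1, -1), each = m / 2)

C <- 1                                      # cost parameter of Equation 4
K <- kernelMatrix(rbfdot(sigma = 0.5), x)   # kernel matrix k(x_i, x_j)
H <- outer(y, y) * K                        # H_ij = y_i y_j k(x_i, x_j)

## ipop() minimizes c'x + 1/2 x'Hx subject to b <= Ax <= b + r and l <= x <= u
## (Equation 7); the single row A = (y_1, ..., y_m) with b = r = 0 encodes the
## equality constraint sum_i alpha_i y_i = 0, and the box is 0 <= alpha_i <= C/m.
sol   <- ipop(c = rep(-1, m), H = H,
              A = t(y), b = 0, l = rep(0, m), u = rep(C / m, m), r = 0)
alpha <- primal(sol)                        # the alpha_i of Equation 6
sv    <- which(alpha > 1e-6)                # support vectors: non-zero alpha_i
cat("number of support vectors:", length(sv), "\n")

The offset b of Equation 4 can then be recovered from the Karush-Kuhn-Tucker conditions at any support vector whose coefficient lies strictly inside the box 0 < α_i < C/m; a dedicated decomposition method would of course avoid forming the full m × m matrix H.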
The cost parameter C of the SVM formulation in Equation 7 controls the penalty paid by the SVM for misclassifying a training point and thus the complexity of the prediction function. A high cost value C forces the SVM to create a prediction function complex enough to misclassify as few training points as possible, while a lower cost parameter leads to a simpler prediction function. Therefore, this type of SVM is usually called C-SVM. Another formulation of the classification problem, with a more intuitive hyperparameter than C, is the ν-SVM (Schölkopf, Smola, Williamson, and Bartlett 2000). The ν parameter has the interesting property of being an upper bound on the training error and a lower bound on the fraction of support vectors.
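Both formulations are exposed through kernlab's ksvm() interface. The short sketch below reuses the toy data x and y from the previous chunk; the particular values of C, nu, and sigma are arbitrary illustrations rather than recommendations.

## C-SVM: the cost C directly penalizes margin violations
csvc  <- ksvm(x, factor(y), type = "C-svc", kernel = "rbfdot",
              kpar = list(sigma = 0.5), C = 1)

## nu-SVM: nu in (0, 1] replaces C as the tuning parameter
nusvc <- ksvm(x, factor(y), type = "nu-svc", kernel = "rbfdot",
              kpar = list(sigma = 0.5), nu = 0.2)

## for nu-svc, the training error and the fraction of support vectors
## bracket nu (Schoelkopf et al. 2000)
error(nusvc)            # training error, at most nu
nSV(nusvc) / nrow(x)    # fraction of support vectors, at least nu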