To construct this Optimal Hyperplane (cf. figure 1.1), one solves the following optimization problem:

$\text{minimize} \quad \tau(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2$    (1.9)

$\text{subject to} \quad y_i \cdot ((\mathbf{w} \cdot \mathbf{x}_i) + b) \geq 1, \quad i = 1, \ldots, \ell.$    (1.10)

This constrained optimization problem is dealt with by introducing Lagrange multipliers $\alpha_i \geq 0$ and a Lagrangian

$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{\ell} \alpha_i \left( y_i \cdot ((\mathbf{x}_i \cdot \mathbf{w}) + b) - 1 \right).$    (1.11)

The Lagrangian $L$ has to be minimized with respect to the primal variables $\mathbf{w}$ and $b$ and maximized with respect to the dual variables $\alpha_i$ (i.e. a saddle point has to be found). Let us try to get some intuition for this. If a constraint (1.10) is violated, then $y_i \cdot ((\mathbf{w} \cdot \mathbf{x}_i) + b) - 1 < 0$, in which case $L$ can be increased by increasing the corresponding $\alpha_i$. At the same time, $\mathbf{w}$ and $b$ will have to change such that $L$ decreases. To prevent $-\alpha_i \left( y_i \cdot ((\mathbf{w} \cdot \mathbf{x}_i) + b) - 1 \right)$ from becoming arbitrarily large, the change in $\mathbf{w}$ and $b$ will ensure that, provided the problem is separable, the constraint will eventually be satisfied. Similarly, one can understand that for all constraints which are not precisely met as equalities, i.e. for which $y_i \cdot ((\mathbf{w} \cdot \mathbf{x}_i) + b) - 1 > 0$, the corresponding $\alpha_i$ must be 0: this is the value of $\alpha_i$ that maximizes $L$. The latter is the statement of the Karush-Kuhn-Tucker complementarity conditions of optimization theory (Karush, 1939; Kuhn and Tucker, 1951; Bertsekas, 1995).

The condition that at the saddle point, the derivatives of $L$ with respect to the primal variables must vanish,

$\frac{\partial}{\partial b} L(\mathbf{w}, b, \boldsymbol{\alpha}) = 0, \qquad \frac{\partial}{\partial \mathbf{w}} L(\mathbf{w}, b, \boldsymbol{\alpha}) = 0,$    (1.12)

leads to

$\sum_{i=1}^{\ell} \alpha_i y_i = 0$    (1.13)

and

$\mathbf{w} = \sum_{i=1}^{\ell} \alpha_i y_i \mathbf{x}_i$    (1.14)

(these derivatives are carried out explicitly below). The solution vector thus has an expansion in terms of a subset of the training patterns, namely those patterns whose $\alpha_i$ is non-zero, called Support Vectors. By the Karush-Kuhn-Tucker complementarity conditions

$\alpha_i \cdot \left( y_i \cdot ((\mathbf{x}_i \cdot \mathbf{w}) + b) - 1 \right) = 0, \quad i = 1, \ldots, \ell,$    (1.15)

the Support Vectors lie on the margin (cf. figure 1.1). All remaining examples of the training set are irrelevant: their constraint (1.10) does not play a role in the optimization, and they do not appear in the expansion (1.14). This nicely
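To make the step from (1.12) to (1.13) and (1.14) explicit, the derivatives of the Lagrangian (1.11) can be carried out term by term; this is a standard computation, spelled out here for completeness:

$\frac{\partial}{\partial b} L(\mathbf{w}, b, \boldsymbol{\alpha}) = -\sum_{i=1}^{\ell} \alpha_i y_i = 0 \quad\Longrightarrow\quad \sum_{i=1}^{\ell} \alpha_i y_i = 0,$

$\frac{\partial}{\partial \mathbf{w}} L(\mathbf{w}, b, \boldsymbol{\alpha}) = \mathbf{w} - \sum_{i=1}^{\ell} \alpha_i y_i \mathbf{x}_i = 0 \quad\Longrightarrow\quad \mathbf{w} = \sum_{i=1}^{\ell} \alpha_i y_i \mathbf{x}_i.$

Substituting these two conditions back into (1.11) eliminates $\mathbf{w}$ and $b$ and leaves a problem in the dual variables alone: maximize $W(\boldsymbol{\alpha}) = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$ subject to $\alpha_i \geq 0$ and (1.13).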
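As a concrete illustration, here is a minimal numerical sketch of this dual problem. It assumes a hypothetical toy two-dimensional separable dataset, and uses `scipy.optimize.minimize` with the generic SLSQP solver as a stand-in for a dedicated QP solver; the data values, the `1e-6` threshold for identifying non-zero $\alpha_i$, and all variable names are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy separable data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Matrix of the dual objective: Q_ij = y_i y_j (x_i . x_j).
Q = (y[:, None] * X) @ (y[:, None] * X).T

# Dual: maximize sum(alpha) - 1/2 alpha^T Q alpha,
# i.e. minimize the negated objective below.
def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(n),
    bounds=[(0, None)] * n,                              # alpha_i >= 0
    constraints={"type": "eq", "fun": lambda a: a @ y},  # eq. (1.13)
    method="SLSQP",
)
alpha = res.x

# Expansion of the solution vector, eq. (1.14): w = sum_i alpha_i y_i x_i.
w = (alpha * y) @ X

# Support Vectors are the patterns with non-zero alpha_i; each satisfies
# y_i ((w . x_i) + b) = 1 by (1.15), which fixes b (averaged for stability).
sv = alpha > 1e-6
b = np.mean(y[sv] - X[sv] @ w)

print("alpha  =", np.round(alpha, 4))
print("w      =", w, " b =", round(b, 4))
print("margins y_i((w.x_i)+b) =", np.round(y * (X @ w + b), 4))
```

On such data the printed margins should come out $\geq 1$, with equality (up to solver tolerance) exactly on the patterns with non-zero $\alpha_i$: the Support Vectors lie on the margin, as (1.15) states, and the remaining patterns drop out of the expansion (1.14).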