[Figure: two classes of training points (◆, ❍) in input space, and their images under $\Phi$ in feature space, separated by a maximum-margin hyperplane.]

Figure. The idea of SV machines: map the training data nonlinearly into a higher-dimensional feature space via $\Phi$, and construct a separating hyperplane with maximum margin there. This yields a nonlinear decision boundary in input space. By the use of a kernel function (1.20), it is possible to compute the separating hyperplane without explicitly carrying out the map into the feature space.

with $\sigma > 0$. Note that when using Gaussian kernels, for instance, the feature space $H_k$ thus contains all superpositions of Gaussians on $C$ (plus limit points), whereas by definition of $\Phi$ (1.27), only single bumps $k(x, \cdot)$ do have pre-images under $\Phi$.

1.4 Support Vector Machines

To construct SV machines, one computes an optimal hyperplane in feature space. To this end, we substitute $\Phi(x_i)$ for each training example $x_i$. The weight vector (cf. (1.14)) then becomes an expansion in feature space, and will thus typically no longer correspond to the image of a single vector from input space (cf. Schölkopf et al. (1998c) for a formula for computing the pre-image, if it exists). Since all patterns occur only in dot products, one can substitute Mercer kernels $k$ for the dot products (Boser et al., 1992; Guyon et al., 1993), leading to decision functions of the more general form (cf. (1.18))

Decision Function

$$
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{\ell} y_i \alpha_i \,(\Phi(x) \cdot \Phi(x_i)) + b \right)
     = \operatorname{sgn}\!\left( \sum_{i=1}^{\ell} y_i \alpha_i \, k(x, x_i) + b \right) \tag{1.32}
$$

and the following quadratic program (cf. (1.16)):

$$
\text{maximize} \quad W(\alpha) = \sum_{i=1}^{\ell} \alpha_i \;-\; \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j) \tag{1.33}
$$

$$
\text{subject to} \quad \alpha_i \ge 0, \; i = 1, \ldots, \ell, \quad \text{and} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0. \tag{1.34}
$$
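To make the quadratic program (1.33)–(1.34) concrete, here is a minimal sketch in Python (not from this chapter): it builds the Gram matrix for a Gaussian kernel on a hypothetical toy data set and hands the negated dual to a generic SLSQP solver from SciPy. The toy data, the kernel width sigma, and all function names are illustrative assumptions; in practice one would use a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: x_i in R^2, labels y_i in {-1, +1}
X = np.array([[0.0, 0.0], [0.3, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
ell = len(y)
sigma = 1.0  # Gaussian kernel width, sigma > 0

def k(a, b):
    # Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

# Gram matrix K[i, j] = k(x_i, x_j)
K = np.array([[k(X[i], X[j]) for j in range(ell)] for i in range(ell)])

def neg_W(alpha):
    # Negated dual objective (1.33); minimizing -W maximizes W
    u = alpha * y
    return -alpha.sum() + 0.5 * u @ K @ u

res = minimize(
    neg_W,
    x0=np.zeros(ell),                                    # feasible starting point
    bounds=[(0.0, None)] * ell,                          # alpha_i >= 0
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
    method="SLSQP",
)
alpha = res.x

# Offset b from the KKT conditions, using a support vector x_j (alpha_j > 0):
# y_j (sum_i alpha_i y_i k(x_j, x_i) + b) = 1  =>  b = y_j - sum_i alpha_i y_i K[j, i]
j = int(np.argmax(alpha))
b = y[j] - (alpha * y) @ K[j]
```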
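With multipliers alpha and offset b obtained as in the sketch above, the decision function (1.32) reduces to a kernel expansion over the training points. Again a hedged sketch under the same assumptions (Gaussian kernel, illustrative names X, y, alpha, b; sigma must match the value used in training):

```python
import numpy as np

def decision_function(x, X, y, alpha, b, sigma=1.0):
    """Equation (1.32): f(x) = sgn( sum_i y_i alpha_i k(x, x_i) + b )."""
    kx = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    return np.sign((alpha * y) @ kx + b)
```

For the toy data above, a query point near the negative class, e.g. `decision_function(np.array([0.1, 0.0]), X, y, alpha, b)`, should come out as -1. Note that only training points with $\alpha_i > 0$ (the support vectors) contribute to the sum, which is what makes the expansion sparse.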