map

    Φ : R^N → F    (1.19)

and perform the above linear algorithm in F. For instance, suppose we are given patterns x ∈ R^N where most information is contained in the d-th order products (monomials) of entries x_j of x, i.e. x_{j1} · x_{j2} ··· x_{jd}, where j_1, …, j_d ∈ {1, …, N}. In that case, we might prefer to extract these monomial features first, and work in the feature space F of all products of d entries. This approach, however, fails for realistically sized problems: for N-dimensional input patterns, there exist (N + d - 1)! / (d! (N - 1)!) different monomials. Already 16 × 16 pixel input images (e.g. in character recognition) and a monomial degree d = 5 yield a dimensionality of 10^10.

This problem can be overcome by noticing that both the construction of the optimal hyperplane in F (cf. (1.16)) and the evaluation of the corresponding decision function (1.18) only require the evaluation of dot products (Φ(x) · Φ(y)), and never the mapped patterns Φ(x) in explicit form. This is crucial, since in some cases, the dot products can be evaluated by a simple kernel (a Mercer kernel)

    k(x, y) = (Φ(x) · Φ(y)).    (1.20)

For instance, the polynomial kernel

    k(x, y) = (x · y)^d    (1.21)

can be shown to correspond to a map Φ into the space spanned by all products of exactly d dimensions of R^N (Poggio, 1975; Boser et al., 1992; Burges, 1998; for a proof, see chapter 20). For d = 2 and x, y ∈ R^2, e.g., we have (Vapnik, 1995)

    (x · y)^2 = (x_1^2, x_2^2, √2 x_1 x_2)(y_1^2, y_2^2, √2 y_1 y_2)^T = (Φ(x) · Φ(y)),    (1.22)

defining Φ(x) = (x_1^2, x_2^2, √2 x_1 x_2). By using k(x, y) = (x · y + c)^d with c > 0, we can take into account all products of order up to d (i.e. including those of order smaller than d).
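To make this contrast concrete, the following minimal sketch (Python with NumPy; the helper name phi and the sample vectors are illustrative choices, not from the original text) counts the explicit monomial features for the 16 × 16 pixel example and checks the d = 2 identity (1.22) numerically. The kernel evaluation needs only an N-dimensional dot product, whereas the explicit map would require roughly 10^10 coordinates in the image example.

```python
import math
import numpy as np

# Size of the explicit monomial feature space: (N + d - 1)! / (d! (N - 1)!)
# For 16 x 16 pixel images (N = 256) and degree d = 5, as in the text:
N, d = 256, 5
print(math.comb(N + d - 1, d))   # 9525431552, i.e. on the order of 10^10

# The d = 2 identity (1.22): (x . y)^2 = (Phi(x) . Phi(y)) for x, y in R^2.
def phi(x):
    # Explicit feature map Phi(x) = (x_1^2, x_2^2, sqrt(2) x_1 x_2)
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(np.dot(x, y) ** 2)          # 16.0 -- kernel evaluation, O(N) work
print(np.dot(phi(x), phi(y)))     # 16.0 -- same dot product, taken in F
```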
More generally, the following theorem of functional analysis shows that kernels k of positive integral operators give rise to maps Φ such that (1.20) holds (Mercer, 1909; Aizerman et al., 1964; Boser et al., 1992):

Theorem 1.1 (Mercer)
If k is a continuous symmetric kernel of a positive integral operator T, i.e.

    (Tf)(y) = ∫_C k(x, y) f(x) dx    (1.23)

with

    ∫_{C×C} k(x, y) f(x) f(y) dx dy ≥ 0    (1.24)

for all f ∈ L_2(C) (C being a compact subset of R^N), it can be expanded in a uniformly convergent series (on C × C) in terms of T's eigenfunctions ψ_j and positive eigenvalues λ_j.
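The positivity condition (1.24) has a well-known finite-sample counterpart that is easy to check in practice: for any points x_1, …, x_m, the Gram matrix K with K_ij = k(x_i, x_j) of a Mercer kernel is positive semidefinite. The sketch below (NumPy again; the sample size, input dimension, and the kernel parameters d and c are arbitrary choices for illustration) verifies this for the inhomogeneous polynomial kernel (x · y + c)^d discussed above.

```python
import numpy as np

def poly_kernel(x, y, d=2, c=1.0):
    # Inhomogeneous polynomial kernel (x . y + c)^d with c > 0
    return (np.dot(x, y) + c) ** d

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # 20 arbitrary points in R^5

# Gram matrix K_ij = k(x_i, x_j); for a Mercer kernel it is
# positive semidefinite for every choice of points, mirroring (1.24).
K = np.array([[poly_kernel(xi, xj) for xj in X] for xi in X])

eigvals = np.linalg.eigvalsh(K)    # eigvalsh: K is symmetric
print(eigvals.min() >= -1e-8)      # True (up to round-off): no negative eigenvalues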