captures our intuition of the problem: as the hyperplane (cf. figure 1.1) is completely determined by the patterns closest to it, the solution should not depend on the other examples.

By substituting (1.13) and (1.14) into L, one eliminates the primal variables and arrives at the Wolfe dual of the optimization problem (e.g. Bertsekas, 1995): find multipliers $\alpha_i$ which

maximize $W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$  (1.16)

subject to $\alpha_i \geq 0,\ i = 1, \ldots, \ell,$ and $\sum_{i=1}^{\ell} \alpha_i y_i = 0$.  (1.17)

The hyperplane decision function can thus be written as

$f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{\ell} y_i \alpha_i \, (\mathbf{x} \cdot \mathbf{x}_i) + b \right)$  (1.18)

where b is computed using (1.15).

The structure of the optimization problem closely resembles those that typically arise in Lagrange's formulation of mechanics (e.g. Goldstein, 1986). Also there, often only a subset of the constraints become active. For instance, if we keep a ball in a box, then it will typically roll into one of the corners. The constraints corresponding to the walls which are not touched by the ball are irrelevant; the walls could just as well be removed.

Seen in this light, it is not too surprising that it is possible to give a mechanical interpretation of optimal margin hyperplanes (Burges and Schölkopf, 1997): if we assume that each support vector $\mathbf{x}_i$ exerts a perpendicular force of size $\alpha_i$ and sign $y_i$ on a solid plane sheet lying along the hyperplane, then the solution satisfies the requirements of mechanical stability. The constraint (1.13) states that the forces on the sheet sum to zero; and (1.14) implies that the torques also sum to zero, via $\sum_i \mathbf{x}_i \times y_i \alpha_i \, \mathbf{w}/\|\mathbf{w}\| = \mathbf{w} \times \mathbf{w}/\|\mathbf{w}\| = 0$.

There are several theoretical arguments supporting the good generalization performance of the optimal hyperplane (Vapnik and Chervonenkis, 1974; Vapnik, 1979; cf. chapters 3 and 4). In addition, it is computationally attractive, since it can be constructed by solving a quadratic programming problem. But how can this be generalized to the case of decision functions which, unlike (1.7), are nonlinear in the data?
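As a minimal illustration, the dual problem (1.16)-(1.17) can be solved numerically on a toy data set and the decision function (1.18) assembled from the resulting multipliers. The sketch below uses SciPy's general-purpose constrained optimizer; the data set, the support-vector threshold, and the use of SLSQP are illustrative assumptions, not part of the text.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data set (an assumption for illustration).
X = np.array([[2.0, 2.0], [2.0, 0.0], [-2.0, 0.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
ell = len(y)

# Q[i, j] = y_i y_j (x_i . x_j), so W(alpha) = sum(alpha) - 0.5 * alpha^T Q alpha.
Q = np.outer(y, y) * (X @ X.T)

res = minimize(
    lambda a: -(a.sum() - 0.5 * a @ Q @ a),     # maximize W <=> minimize -W
    np.zeros(ell),
    bounds=[(0.0, None)] * ell,                 # alpha_i >= 0         (1.17)
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x

w = (alpha * y) @ X                # w = sum_i alpha_i y_i x_i, from (1.14)
sv = alpha > 1e-6                  # support vectors carry nonzero multipliers
b = np.mean(y[sv] - X[sv] @ w)     # from y_i ((w . x_i) + b) = 1 on the margin

def f(x):
    """Decision function (1.18): sgn(sum_i y_i alpha_i (x . x_i) + b)."""
    return np.sign((alpha * y) @ (X @ x) + b)
```

For this toy set the separating hyperplane lies along the second coordinate axis, so points with positive first coordinate are classified as +1. Note that only the dot products $\mathbf{x} \cdot \mathbf{x}_i$ enter f, which is what makes the generalization in section 1.3 possible.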
1.3 Feature Spaces and Kernels

To construct SV machines, the optimal hyperplane algorithm had to be augmented by a method for computing dot products in feature spaces nonlinearly related to input space (Aizerman et al., 1964; Boser et al., 1992). The basic idea is to map the data into some other dot product space (called the feature space) F via a nonlinear map.
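This idea can be made concrete with a small sketch: a quadratic map Phi from R^2 into a three-dimensional feature space F whose dot products can also be evaluated directly in input space. This particular map is an illustrative assumption, not taken from the passage above.

```python
import numpy as np

# Hypothetical nonlinear map Phi: R^2 -> F = R^3.
# For this map, Phi(x) . Phi(z) = (x . z)^2 for all x, z in R^2.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = phi(x) @ phi(z)    # dot product computed in the feature space F
rhs = (x @ z) ** 2       # the same number, computed in input space
```

Since the optimal hyperplane algorithm accesses the data only through dot products, running it on Phi(x) instead of x yields a decision function that is quadratic in the original inputs, while the optimization problem itself stays linear in F.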