Figure 5. Linear separating hyperplanes for the separable case. The support vectors are circled.

x_i · w + b ≥ +1 for y_i = +1    (10)
x_i · w + b ≤ −1 for y_i = −1    (11)

These can be combined into one set of inequalities:

y_i (x_i · w + b) − 1 ≥ 0  ∀i    (12)

Now consider the points for which the equality in Eq. (10) holds (requiring that there exists such a point is equivalent to choosing a scale for w and b). These points lie on the hyperplane H1: x_i · w + b = 1 with normal w and perpendicular distance from the origin |1 − b|/‖w‖. Similarly, the points for which the equality in Eq. (11) holds lie on the hyperplane H2: x_i · w + b = −1, with normal again w, and perpendicular distance from the origin |−1 − b|/‖w‖. Hence d+ = d− = 1/‖w‖ (where d+ and d− denote the distances from the separating hyperplane to the closest positive and negative examples, respectively), and the margin is simply 2/‖w‖. Note that H1 and H2 are parallel (they have the same normal) and that no training points fall between them. Thus we can find the pair of hyperplanes which gives the maximum margin by minimizing ‖w‖², subject to constraints (12).

Thus we expect the solution for a typical two dimensional case to have the form shown in Figure 5. Those training points for which the equality in Eq. (12) holds (i.e. those which wind up lying on one of the hyperplanes H1, H2), and whose removal would change the solution found, are called support vectors; they are indicated in Figure 5 by the extra circles.
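To make the optimization concrete, here is a minimal sketch assuming scikit-learn and NumPy (neither appears in the paper itself): it approximates the hard-margin solution on separable 2-D toy data by fitting a linear SVC with a very large penalty C, then reads off the margin 2/‖w‖ and checks that the support vectors satisfy the equality in Eq. (12).

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs in 2-D (hypothetical toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2)),
               rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

# A linear SVC with a very large C approximates the hard-margin SVM
# (as C grows, margin violations become prohibitively expensive).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # the normal vector w of the separating hyperplane
b = clf.intercept_[0]     # the bias b
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))

# Support vectors lie on H1 or H2, so y_i (x_i . w + b) should be ~1.
sv = clf.support_
print("functional margins of support vectors:", y[sv] * (X[sv] @ w + b))
```

The printed functional margins should all come out very close to 1, the equality case of Eq. (12); removing any of those points (and only those) can change the fitted hyperplane.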
We will now switch to a Lagrangian formulation of the problem. There are two reasons for doing this. The first is that the constraints (12) will be replaced by constraints on the Lagrange multipliers themselves, which will be much easier to handle. The second is that in this reformulation of the problem, the training data will only appear (in the actual training and test algorithms) in the form of dot products between vectors. This is a crucial property which will allow us to generalize the procedure to the nonlinear case (Section 4). Thus, we introduce positive Lagrange multipliers α_i, i = 1, …, l, one for each of the inequality constraints (12). Recall that the rule is that for constraints of the form c_i ≥ 0, the constraint equations are multiplied by positive Lagrange multipliers and subtracted from the objective function, to form the Lagrangian.
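Applying that rule here, with objective ½‖w‖² and the constraints (12), gives the primal Lagrangian (a standard construction, written out here for orientation):

```latex
L_P \;=\; \tfrac{1}{2}\,\|\mathbf{w}\|^{2}
      \;-\; \sum_{i=1}^{l} \alpha_i\, y_i\,(\mathbf{x}_i \cdot \mathbf{w} + b)
      \;+\; \sum_{i=1}^{l} \alpha_i,
\qquad \alpha_i \ge 0.
```

Setting the derivatives of L_P with respect to w and b to zero yields w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0, which is exactly where the training data begin to enter only through dot products.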