where $x_i$ is the $i$th training example, and $y_i$ is the correct output of the SVM for the $i$th training example. The value $y_i$ is $+1$ for the positive examples in a class and $-1$ for the negative examples. Using a Lagrangian, this optimization problem can be converted into a dual form which is a QP problem where the objective function $\Psi$ is solely dependent on a set of Lagrange multipliers $\alpha_i$,

$$
\min_{\vec{\alpha}} \Psi(\vec{\alpha}) = \min_{\vec{\alpha}} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \left( \vec{x}_i \cdot \vec{x}_j \right) \alpha_i \alpha_j - \sum_{i=1}^{N} \alpha_i \qquad (4)
$$

(where $N$ is the number of training examples), subject to the inequality constraints,

$$
\alpha_i \ge 0, \quad \forall i, \qquad (5)
$$

and one linear equality constraint,

$$
\sum_{i=1}^{N} y_i \alpha_i = 0. \qquad (6)
$$

There is a one-to-one relationship between each Lagrange multiplier and each training example. Once the Lagrange multipliers are determined, the normal vector $\vec{w}$ and the threshold $b$ can be derived from the Lagrange multipliers:

$$
\vec{w} = \sum_{i=1}^{N} y_i \alpha_i \vec{x}_i, \qquad b = \vec{w} \cdot \vec{x}_k - y_k \ \text{ for some } \alpha_k > 0. \qquad (7)
$$

Because $\vec{w}$ can be computed via equation (7) from the training data before use, the amount of computation required to evaluate a linear SVM is constant in the number of non-zero support vectors.

Of course, not all data sets are linearly separable. There may be no hyperplane that splits the positive examples from the negative examples. In the formulation above, the non-separable case would correspond to an infinite solution. However, in 1995, Cortes & Vapnik [7] suggested a modification to the original optimization statement (3) which allows, but penalizes, the failure of an example to reach the correct margin. That modification is:

$$
\min_{\vec{w}, b, \vec{\xi}} \frac{1}{2} \|\vec{w}\|^2 + C \sum_{i=1}^{N} \xi_i, \quad \text{subject to } y_i \left( \vec{w} \cdot \vec{x}_i - b \right) \ge 1 - \xi_i, \ \forall i, \qquad (8)
$$

where the $\xi_i$ are slack variables that permit margin failure and $C$ is a parameter which trades off a wide margin against a small number of margin failures. When this new optimization problem is transformed into the dual form, it simply changes the constraint (5) into a box constraint:

$$
0 \le \alpha_i \le C, \quad \forall i. \qquad (9)
$$

The variables $\xi_i$ do not appear in the dual formulation at all.

SVMs can be even further generalized to non-linear classifiers [2]. The output of a non-linear SVM is explicitly computed from the Lagrange multipliers:
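The recovery of $\vec{w}$ and $b$ from the dual solution via equation (7) can be illustrated on a tiny problem solved by hand. The sketch below is an illustration, not part of the original text: the two-point data set and the multiplier value $\alpha_i = 1/2$ are assumptions (the equality constraint (6) forces $\alpha_1 = \alpha_2$, and minimizing the dual objective (4) then gives $\alpha_i = 1/2$).

```python
import numpy as np

# Toy linearly separable set: one positive, one negative example (assumed).
X = np.array([[1.0], [-1.0]])   # x_1, x_2
y = np.array([1.0, -1.0])       # y_1 = +1, y_2 = -1

# Dual multipliers worked out by hand for this data set (an assumption):
# constraint (6) forces alpha_1 = alpha_2, and the dual objective (4)
# is minimized at alpha_i = 1/2.
alpha = np.array([0.5, 0.5])

# Check the linear equality constraint (6): sum_i y_i alpha_i = 0.
assert abs(np.dot(y, alpha)) < 1e-12

# Normal vector and threshold from equation (7).
w = (y * alpha) @ X                # w = sum_i y_i alpha_i x_i
k = int(np.argmax(alpha > 0))      # any example k with alpha_k > 0
b = w @ X[k] - y[k]

# The linear SVM output u = w . x - b classifies both points correctly.
u = X @ w - b
print(w, b, np.sign(u))
```

For this data set the sketch yields $\vec{w} = (1)$ and $b = 0$, and $\mathrm{sign}(u)$ reproduces the labels, consistent with the claim that evaluating a linear SVM needs only the precomputed $\vec{w}$ and $b$, not the support vectors themselves.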