u = \sum_{j=1}^{N} y_j \alpha_j K(\vec{x}_j, \vec{x}) - b,    (10)

where K is a kernel function that measures the similarity or distance between the input vector \vec{x} and the stored training vector \vec{x}_j. Examples of K include Gaussians, polynomials, and neural network non-linearities [4]. If K is linear, then the equation for the linear SVM (1) is recovered.

The Lagrange multipliers \alpha_i are still computed via a quadratic program. The non-linearities alter the quadratic form, but the dual objective function \Psi is still quadratic in \alpha:

\min_{\vec{\alpha}} \Psi(\vec{\alpha}) = \min_{\vec{\alpha}} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j K(\vec{x}_i, \vec{x}_j) \, \alpha_i \alpha_j - \sum_{i=1}^{N} \alpha_i,
0 \le \alpha_i \le C, \quad \forall i,    (11)
\sum_{i=1}^{N} y_i \alpha_i = 0.

The QP problem in equation (11), above, is the QP problem that the SMO algorithm will solve. In order to make the QP problem above be positive definite, the kernel function K must obey Mercer's conditions [4].
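As a concrete illustration of equations (10) and (11), the following sketch evaluates the SVM output and the dual objective with NumPy. It is not part of the paper; the kernel choice and the names (`rbf_kernel`, `svm_output`, `dual_objective`) are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """Gaussian (RBF) kernel matrix between two sets of row vectors (an assumed choice of K)."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq)

def svm_output(x, X_train, y, alphas, b, kernel=rbf_kernel):
    """Equation (10): u = sum_j y_j * alpha_j * K(x_j, x) - b."""
    k = kernel(X_train, x[None, :]).ravel()      # K(x_j, x) for every training example j
    return np.dot(y * alphas, k) - b

def dual_objective(X_train, y, alphas, kernel=rbf_kernel):
    """Equation (11): 0.5 * sum_ij y_i y_j K(x_i, x_j) alpha_i alpha_j - sum_i alpha_i."""
    gram = kernel(X_train, X_train)              # N x N kernel matrix
    v = y * alphas
    return 0.5 * v @ gram @ v - np.sum(alphas)
```

Note that `dual_objective` materializes the full N x N kernel matrix, which is exactly the memory problem described in section 1.2 below.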
The Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient conditions for an optimal point of a positive definite QP problem. The KKT conditions for the QP problem (11) are particularly simple. The QP problem is solved when, for all i:

\alpha_i = 0 \;\Leftrightarrow\; y_i u_i \ge 1,
0 < \alpha_i < C \;\Leftrightarrow\; y_i u_i = 1,    (12)
\alpha_i = C \;\Leftrightarrow\; y_i u_i \le 1,

where u_i is the output of the SVM for the ith training example. Notice that the KKT conditions can be evaluated on one example at a time, which will be useful in the construction of the SMO algorithm.

1.2 Previous Methods for Training Support Vector Machines

Due to its immense size, the QP problem (11) that arises from SVMs cannot be easily solved via standard QP techniques. The quadratic form in (11) involves a matrix that has a number of elements equal to the square of the number of training examples. This matrix cannot fit into 128 Megabytes if there are more than 4000 training examples.

Vapnik [19] describes a method to solve the SVM QP, which has since been known as "chunking." The chunking algorithm uses the fact that the value of the quadratic form is the same if you remove the rows and columns of the matrix that correspond to zero Lagrange multipliers. Therefore, the large QP problem can be broken down into a series of smaller QP problems, whose ultimate goal is to identify all of the non-zero Lagrange multipliers and discard all of the zero Lagrange multipliers. At every step, chunking solves a QP problem that consists of the following examples: every non-zero Lagrange multiplier from the last step, and the M worst examples that violate the KKT conditions (12) [4], for some value of M (see figure 2). If there are fewer than M examples that violate the KKT conditions at a step, all of the violating examples are added in. Each QP sub-problem is initialized with the results of the previous sub-problem. At the last step, the entire set of non-zero Lagrange multipliers has been identified, hence the last step solves the large QP problem.
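The per-example nature of the KKT conditions (12) and the chunking working-set selection described above can be sketched as follows. This is an illustrative sketch only: the numerical tolerance `tol` and the helper names (`kkt_violation`, `chunking_working_set`) are assumptions, not part of the paper.

```python
import numpy as np

def kkt_violation(alpha_i, y_i, u_i, C, tol=1e-3):
    """How badly example i violates conditions (12); 0.0 means the conditions hold."""
    r = y_i * u_i - 1.0
    if alpha_i < tol:              # alpha_i == 0  requires y_i * u_i >= 1
        return max(0.0, -r)
    elif alpha_i > C - tol:        # alpha_i == C  requires y_i * u_i <= 1
        return max(0.0, r)
    else:                          # 0 < alpha_i < C requires y_i * u_i == 1
        return abs(r)

def chunking_working_set(alphas, y, u, C, M, tol=1e-3):
    """Next chunk: all non-zero multipliers plus (up to) the M worst KKT violators."""
    nonzero = set(np.flatnonzero(alphas > tol))
    violations = np.array([kkt_violation(a, yi, ui, C, tol)
                           for a, yi, ui in zip(alphas, y, u)])
    worst_first = [i for i in np.argsort(-violations) if violations[i] > tol]
    return sorted(nonzero | set(worst_first[:M]))
```

Because each call to `kkt_violation` needs only (alpha_i, y_i, u_i), the check can be run one example at a time, which is the property SMO exploits.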