plane is a hyperplane. For most typical SVM functional forms, the matrix Q has special properties, so that the objective function is either bowl-shaped (positive definite) or has flat-bottomed troughs (positive semidefinite), but is never saddle-shaped (indefinite). Thus, there is either a unique minimum or a connected set of equivalent minima. An SVM QP has a definite termination (or optimality) condition that describes these minima. We call these optimality conditions the Karush-Kuhn-Tucker (KKT) conditions, and they simply describe the set of αi that are constrained minima.3
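For reference, the dual QP under discussion and its KKT conditions can be written out explicitly. This is the standard textbook formulation rather than notation taken verbatim from the article; f denotes the SVM output and K the kernel.

```latex
% Soft-margin SVM dual QP, with Q_{ij} = y_i y_j K(x_i, x_j):
\[
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^{\top} Q\,\alpha \;-\; \sum_{i=1}^{n} \alpha_i \\
\text{subject to}\quad & 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} y_i \alpha_i = 0.
\end{aligned}
\]
% KKT (optimality) conditions at a solution, where f is the SVM output:
\[
\alpha_i = 0 \Rightarrow y_i f(x_i) \ge 1, \qquad
0 < \alpha_i < C \Rightarrow y_i f(x_i) = 1, \qquad
\alpha_i = C \Rightarrow y_i f(x_i) \le 1.
\]
```

Because any kernel matrix K is positive semidefinite, so is Q, which is exactly why the objective is bowl-shaped or trough-shaped but never saddle-shaped.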
The values of αi also have an intuitive explanation. There is one αi for each training example. Each αi determines how much each training example influences the SVM function. Most of the training examples do not affect the SVM function, so most of the αi are 0.

Because of its simple form, you might expect the solution to the SVM QP problem to be quite simple. Unfortunately, for real-world problems, the matrix Q can be enormous: it has a dimension equal to the number of training examples. A training set of 60,000 examples will yield a Q matrix with 3.6 billion elements, which cannot easily fit into the memory of a standard computer.

We have at least two different ways of solving such gigantic QP problems. First, there are QP methods that use sophisticated data structures.4 These QP methods do not require the storage of the entire Q matrix, because they do not need to access the rows or columns of Q that correspond to those αi that are at 0 or at C. Deep in the inner loop, these methods only perform dot products between rows or columns of Q and a vector, rather than performing an entire matrix-vector multiplication.

Decomposing the QP problem

The other method for attacking the large-scale SVM QP problem is to decompose the large QP problem into a series of smaller QP problems. Thus, the selection of submatrices of Q happens outside of the QP package, rather than inside. Consequently, the decomposition method is compatible with standard QP packages.

Vapnik first suggested the decomposition approach in a method that has since been known as chunking.1 The chunking algorithm exploits the fact that the value of the objective function is the same if you remove the rows and columns of the matrix Q that correspond to zero αi. Therefore, the large QP problem can break down into a series of smaller QP problems, whose ultimate goal is to identify all of the nonzero αi and discard all of the zero αi. At every step, chunking solves a QP problem that consists of the following αi: every nonzero αi from the last step, and the αi that correspond to the M worst violations of the KKT conditions, for some value of M (see Figure 12a). The size of the QP subproblem tends to grow with time. At the last step, the chunking approach has identified the entire set of nonzero αi; hence, the last step solves the overall QP problem.

Chunking reduces the Q matrix's dimension from the number of training examples to approximately the number of nonzero αi. However, chunking still might not handle large-scale training problems, because even this reduced matrix might not fit into memory. Of course, we can combine chunking with the sophisticated QP methods described above, which do not require full storage of a matrix.
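The chunking loop is easy to sketch. The following Python fragment is a minimal illustration under our own assumptions, not code from the article: it hands each subproblem to SciPy's general-purpose SLSQP solver and ignores the bias term, where a real implementation would use a dedicated numerical QP package and track the threshold.

```python
# Sketch of Vapnik-style chunking for the SVM dual (illustrative only).
import numpy as np
from scipy.optimize import minimize

def kkt_violation(alpha, y, f, C, tol=1e-3):
    """Per-example KKT violation for the soft-margin dual (bias omitted)."""
    m = y * f                                   # margins y_i * f(x_i)
    v = np.zeros_like(alpha)
    lo, hi = alpha < tol, alpha > C - tol
    mid = ~lo & ~hi
    v[lo] = np.maximum(0.0, 1.0 - m[lo])        # alpha_i = 0 wants m_i >= 1
    v[mid] = np.abs(1.0 - m[mid])               # 0 < alpha_i < C wants m_i == 1
    v[hi] = np.maximum(0.0, m[hi] - 1.0)        # alpha_i = C wants m_i <= 1
    return v

def chunking(K, y, C, M=10, max_iter=50, tol=1e-3):
    n = len(y)
    alpha = np.zeros(n)
    Q = (y[:, None] * y[None, :]) * K
    for _ in range(max_iter):
        f = (alpha * y) @ K                     # current decision values
        viol = kkt_violation(alpha, y, f, C, tol)
        if viol.max() < tol:
            break                               # all KKT conditions hold: optimal
        # Working set: every nonzero alpha plus the M worst KKT violators.
        work = np.union1d(np.flatnonzero(alpha > tol), np.argsort(viol)[-M:])
        Qw, yw = Q[np.ix_(work, work)], y[work]
        res = minimize(lambda a: 0.5 * a @ Qw @ a - a.sum(), alpha[work],
                       method='SLSQP', bounds=[(0.0, C)] * len(work),
                       constraints=[{'type': 'eq', 'fun': lambda a: yw @ a}])
        alpha[work] = res.x
    return alpha

# Toy usage: a linearly separable 2-D problem with a linear kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
alpha = chunking(X @ X.T, y, C=1.0)
print("nonzero alphas (support vectors):", int(np.sum(alpha > 1e-3)))
```

Note that restricting the equality constraint to the working set is valid here because every αi outside the working set is zero, so the global constraint Σ yi αi = 0 reduces to the working-set sum.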
In 1997, Edgar Osuna and his colleagues suggested a new strategy for solving the SVM QP problem.5 Osuna showed that the large QP problem can be broken down into a series of smaller QP subproblems. As long as at least one αi that violates the KKT conditions is added to the previous subproblem, each step reduces the objective function and maintains all of the constraints. Therefore, a sequence of QP subproblems that always adds at least one KKT violator will asymptotically converge.

Osuna suggests keeping a constant-size matrix for every QP subproblem, which implies adding and deleting the same number of examples at every step5 (see Figure 12b). Using a constant-size matrix allows the training of arbitrarily sized datasets. The algorithm in Osuna's paper suggests adding one example and deleting one example at every step. Such an algorithm converges, although it might not be the fastest possible algorithm. In practice, researchers add and delete multiple examples according to various unpublished heuristics. Typically, these heuristics add KKT violators at each step and delete those αi that are either 0 or C. Joachims has published an algorithm for adding and deleting examples from the QP steps, which rapidly decreases the objective function.6

All of these decomposition methods require a numerical QP package. Such packages might be expensive for commercial users (see the "Where to get the programs" section). Writing your own efficient QP package is difficult without a numerical-analysis background.

Sequential minimal optimization

Sequential minimal optimization is an alternative method that can decompose the SVM QP problem without any extra matrix storage and without using numerical QP optimization steps.3,7 SMO decomposes the overall QP problem into QP subproblems, identically to Osuna's method. Unlike the previous decomposition heuristics, SMO chooses to solve the smallest possible optimization problem at every step. For the standard SVM QP problem, the smallest possible optimization problem involves two elements of αi, because the αi must obey one linear equality constraint. At every step, SMO chooses two αi to jointly optimize, finds the optimal values for these αi, and updates the SVM to reflect the new optimal values (see Figure 12c).

Figure 12. Three alternative methods for training SVMs: (a) Chunking, (b) Osuna's algorithm, and (c) SMO. There are three steps for each method. The horizontal thin line at every step represents the training set, while the thick boxes represent the αi being optimized at that step. A given group of three lines corresponds to three training iterations, with the first iteration at the top.
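The heart of SMO is that this two-variable subproblem has a closed-form solution, so no numerical QP solver is needed. The sketch below is a simplified rendering of the analytic update described in Platt's SMO papers, with our own variable names; the pair-selection heuristics, error cache, and bias update of the full algorithm are omitted.

```python
import numpy as np

def smo_step(i, j, alpha, y, K, C, errors):
    """Jointly optimize alpha[i] and alpha[j] in closed form (in place).

    Keeping sum_k y_k alpha_k fixed confines the pair to a line segment,
    so the objective restricted to that segment is a 1-D quadratic whose
    minimizer is computed analytically and clipped to the feasible box.
    errors[k] holds f(x_k) - y[k] for the current alpha.
    """
    if i == j:
        return False
    if y[i] != y[j]:                             # segment endpoints for alpha[j]
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L >= H:
        return False
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]      # curvature along the segment
    if eta <= 0.0:
        return False      # degenerate direction; full SMO checks the endpoints
    aj = np.clip(alpha[j] + y[j] * (errors[i] - errors[j]) / eta, L, H)
    if abs(aj - alpha[j]) < 1e-12:
        return False
    alpha[i] += y[i] * y[j] * (alpha[j] - aj)    # preserves the equality constraint
    alpha[j] = aj
    return True
```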
<<向上翻页向下翻页>>
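Continuing the sketch above, a crude driver that repeatedly sweeps the training set and pairs each example with the one whose error differs most will train a toy SVM. Real SMO chooses pairs with more careful heuristics and maintains the error cache and threshold incrementally rather than recomputing them, so treat this only as a demonstration of the loop structure.

```python
def train_smo(K, y, C, max_passes=50):
    """Toy SMO driver (bias omitted); pairs i with the largest-|E_i - E_j| example."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(max_passes):
        errors = (alpha * y) @ K - y             # E_k = f(x_k) - y_k
        changed = 0
        for i in range(n):
            j = int(np.argmax(np.abs(errors - errors[i])))
            if smo_step(i, j, alpha, y, K, C, errors):
                errors = (alpha * y) @ K - y     # refresh after each update
                changed += 1
        if changed == 0:                         # no pair improved: done
            break
    return alpha

# Same toy data as in the chunking example: alpha = train_smo(X @ X.T, y, C=1.0)
```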