defined by a kernel function, i.e., a function returning the inner product $\langle \Phi(x), \Phi(x') \rangle$ between the images of two data points $x$, $x'$ in the feature space. The learning then takes place in the feature space, and the data points only appear inside dot products with other points. This is often referred to as the "kernel trick" (Schölkopf and Smola 2002). More precisely, if a projection $\Phi : X \to H$ is used, the dot product $\langle \Phi(x), \Phi(x') \rangle$ can be represented by a kernel function $k$,

\[
  k(x, x') = \langle \Phi(x), \Phi(x') \rangle, \tag{1}
\]

which is computationally simpler than explicitly projecting $x$ and $x'$ into the feature space $H$.

One interesting property of support vector machines and other kernel-based systems is that, once a valid kernel function has been selected, one can practically work in spaces of any dimension without any significant additional computational cost, since the feature mapping is never effectively performed. In fact, one does not even need to know which features are being used.

Another advantage of SVMs and kernel methods is that one can design and use a kernel tailored to a particular problem that can be applied directly to the data, without the need for a feature extraction process. This is particularly important in problems where much of the structure of the data is lost by feature extraction (e.g., text processing).
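As a concrete illustration of Equation (1), consider the homogeneous polynomial kernel of degree 2 on $\mathbb{R}^2$. The short R sketch below is purely illustrative: the helper names Phi and k are not taken from any package, and the final lines assume the kernlab package is installed, using its polydot kernel and kernelMatrix() function.

Phi <- function(x) c(x[1]^2, x[2]^2, sqrt(2) * x[1] * x[2])  # explicit feature map into R^3
k   <- function(x, y) sum(x * y)^2                           # k(x, x') = <x, x'>^2

x  <- c(1, 2)
xp <- c(3, -1)

sum(Phi(x) * Phi(xp))   # inner product computed in the feature space H
k(x, xp)                # same value, without ever forming Phi(x) or Phi(x')

## The same kernel as a kernlab object; kernelMatrix() evaluates it on all
## pairs of rows of its input matrix (sketch assumes kernlab is available).
library(kernlab)
K <- kernelMatrix(polydot(degree = 2, scale = 1, offset = 0), rbind(x, xp))
K[1, 2]                 # again equals k(x, xp)

Both computations return the same inner product, but the kernel evaluation never constructs the feature vectors, which is precisely the point of the kernel trick.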
Training an SVM for classification, regression or novelty detection involves solving a quadratic optimization problem. Using a standard quadratic problem solver for training an SVM would involve solving a big QP problem even for a moderately sized data set, including the computation of an $m \times m$ matrix in memory ($m$ being the number of training points). This would seriously limit the size of problems an SVM could be applied to. To handle this issue, methods like SMO (Platt 1998), chunking (Osuna, Freund, and Girosi 1997) and Simple SVM (Vishwanathan, Smola, and Murty 2003) exist that iteratively compute the solution of the SVM, scale as $O(N^k)$ with $k$ between 1 and 2.5, and have linear space complexity.

2.1. Classification

In classification, support vector machines separate the different classes of data by a hyperplane

\[
  \langle w, \Phi(x) \rangle + b = 0 \tag{2}
\]

corresponding to the decision function

\[
  f(x) = \mathrm{sign}\bigl( \langle w, \Phi(x) \rangle + b \bigr). \tag{3}
\]

It can be shown that the optimal hyperplane, in terms of classification performance, is the one with the maximal margin of separation between the two classes (Vapnik 1998). It can be constructed by solving a constrained quadratic optimization problem whose solution $w$ has an expansion $w = \sum_i \alpha_i \Phi(x_i)$ in terms of a subset of training patterns that lie on the margin. These training patterns, called support vectors, carry all the relevant information about the classification problem. Omitting the details of the calculation, there is just one crucial property of the algorithm that we need to emphasize: both the quadratic programming problem and the final decision function depend only on dot products between patterns. This allows the use of the "kernel trick" and the generalization of this linear algorithm to the nonlinear case.
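To see the decision function (3) and the support vectors in practice, the sketch below fits a soft-margin SVM with a Gaussian (RBF) kernel using ksvm() from the kernlab package on a two-class subset of the iris data. The hyperparameter choices (sigma = 0.1, C = 1) are arbitrary illustrative values, and the sketch assumes kernlab is installed; nSV(), alphaindex() and predict(..., type = "decision") are kernlab accessors for the fitted model.

library(kernlab)

## Two-class problem: versicolor vs. virginica.
iris2 <- droplevels(subset(iris, Species != "setosa"))

## Soft-margin SVM with a Gaussian kernel; sigma and C are illustrative values.
model <- ksvm(Species ~ ., data = iris2, kernel = "rbfdot",
              kpar = list(sigma = 0.1), C = 1)

nSV(model)                                       # number of support vectors
unlist(alphaindex(model))                        # indices of the support vectors x_i
head(predict(model, iris2, type = "decision"))   # values of <w, Phi(x)> + b
head(predict(model, iris2))                      # sign(...) mapped to class labels

The decision values correspond to $\langle w, \Phi(x) \rangle + b$ in Equation (3), and only the training points indexed by alphaindex() enter the expansion $w = \sum_i \alpha_i \Phi(x_i)$.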