Data Mining and Knowledge Discovery, 2, 121–167 (1998)
© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

A Tutorial on Support Vector Machines for Pattern Recognition

CHRISTOPHER J. C. BURGES                                      burges@lucent.com
Bell Laboratories, Lucent Technologies

Editor: Usama Fayyad

Abstract. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector Machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.

Keywords: support vector machines, statistical learning theory, VC dimension, pattern recognition

1. Introduction

The purpose of this paper is to provide an introductory yet extensive tutorial on the basic ideas behind Support Vector Machines (SVMs). The books (Vapnik, 1995; Vapnik, 1998) contain excellent descriptions of SVMs, but they leave room for an account whose purpose from the start is to teach. Although the subject can be said to have started in the late seventies (Vapnik, 1979), it is only now receiving increasing attention, and so the time appears suitable for an introductory review. The tutorial dwells entirely on the pattern recognition problem. Many of the ideas there carry directly over to the cases of regression estimation and linear operator inversion, but space constraints precluded the exploration of these topics here.

The tutorial contains some new material. All of the proofs are my own versions, where I have placed a strong emphasis on their being both clear and self-contained, to make the material as accessible as possible. This was done at the expense of some elegance and generality; however, generality is usually easily added once the basic ideas are clear. The longer proofs are collected in the Appendix.
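As a concrete preview of the machinery developed later in the tutorial, the following minimal sketch (an editorial illustration, not part of the original paper) trains a hard-margin linear SVM on a toy separable problem and checks that the support vectors lie exactly on the margin, i.e. satisfy y_i (w . x_i + b) = 1. It uses the modern scikit-learn library, which postdates this tutorial; the data and parameter choices are illustrative only.

    # Editorial sketch (not from the paper): a hard-margin linear SVM on a
    # toy separable problem, using the scikit-learn library for illustration.
    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable point clouds in the plane.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(20, 2) - 2.0, rng.randn(20, 2) + 2.0])
    y = np.hstack([-np.ones(20), np.ones(20)])

    # A very large C approximates the hard-margin (separable) formulation.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]

    # The decision rule is f(x) = sign(w . x + b); the support vectors are
    # the training points lying on the margin, where y_i (w . x_i + b) = 1.
    margins = y[clf.support_] * (X[clf.support_] @ w + b)
    print("w =", w, "b =", b)
    print("support-vector margins (expect ~1):", margins)

    # The kernel mapping mentioned in the abstract replaces the inner product;
    # e.g. a Gaussian RBF kernel gives a classifier nonlinear in the input data.
    clf_rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

In the limit C → ∞ the soft-margin objective reduces to the separable (hard-margin) formulation, which is why a large C is used here to mimic the separable case.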
By way of motivation, and to alert the reader to some of the literature, we summarize some recent applications and extensions of support vector machines. For the pattern recognition case, SVMs have been used for isolated handwritten digit recognition (Cortes and Vapnik, 1995; Schölkopf, Burges and Vapnik, 1995; Schölkopf, Burges and Vapnik, 1996; Burges and Schölkopf, 1997), object recognition (Blanz et al., 1996), speaker identification (Schmidt, 1996), charmed quark detection, face detection in images (Osuna, Freund and