Girosi, 1997a), and text categorization (Joachims, 1997). For the regression estimation case, SVMs have been compared on benchmark time series prediction tests (Müller et al., 1997; Mukherjee, Osuna and Girosi, 1997), the Boston housing problem (Drucker et al., 1997), and (on artificial data) on the PET operator inversion problem (Vapnik, Golowich and Smola, 1996). In most of these cases, SVM generalization performance (i.e. error rates on test sets) either matches or is significantly better than that of competing methods. The use of SVMs for density estimation (Weston et al., 1997) and ANOVA decomposition (Stitson et al., 1997) has also been studied. Regarding extensions, the basic SVMs contain no prior knowledge of the problem (for example, a large class of SVMs for the image recognition problem would give the same results if the pixels were first permuted randomly (with each image suffering the same permutation), an act of vandalism that would leave the best performing neural networks severely handicapped), and much work has been done on incorporating prior knowledge into SVMs (Schölkopf, Burges and Vapnik, 1996; Schölkopf et al., 1998a; Burges, 1998). Although SVMs have good generalization performance, they can be abysmally slow in test phase, a problem addressed in (Burges, 1996; Osuna and Girosi, 1998). Recent work has generalized the basic ideas (Smola, Schölkopf and Müller, 1998a; Smola and Schölkopf, 1998), shown connections to regularization theory (Smola, Schölkopf and Müller, 1998b; Girosi, 1998; Wahba, 1998), and shown how SVM ideas can be incorporated in a wide range of other algorithms (Schölkopf, Smola and Müller, 1998b; Schölkopf et al., 1998c). The reader may also find the thesis of Schölkopf (1997) helpful.

The problem which drove the initial development of SVMs occurs in several guises, among them the bias-variance tradeoff (Geman and Bienenstock, 1992), capacity control (Guyon et al., 1992), and overfitting (Montgomery and Peck, 1992), but the basic idea is the same. Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set and the "capacity" of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before; a machine with too little capacity is like the botanist's lazy brother, who declares that if it's green, it's a tree. Neither can generalize well. The exploration and formalization of these concepts has resulted in one of the shining peaks of the theory of statistical learning (Vapnik, 1979).
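To make the tradeoff concrete, here is a minimal numerical sketch (not part of the tutorial; the toy data, the choice of polynomials as stand-ins for machines of varying capacity, and all names below are illustrative assumptions): polynomials of increasing degree are fit to a small noisy training set, and their error on a large held-out set stands in for generalization performance.

```python
# Toy illustration (not from the tutorial) of the capacity tradeoff described above.
# Polynomials of increasing degree play the role of learning machines of increasing
# capacity; the data and model class are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def noisy_samples(n):
    """Noisy observations of a smooth target function on [0, 1]."""
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2.0 * np.pi * x) + 0.25 * rng.normal(size=n)
    return x, y

x_train, y_train = noisy_samples(12)     # a small, finite training set
x_test, y_test = noisy_samples(2000)     # large held-out set: proxy for test error

for degree in (1, 3, 9):
    # Least-squares polynomial fit; the degree controls the "capacity".
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")

# Typically the training error only decreases as the degree grows, while the test
# error first falls and then rises again: too much capacity memorizes the noise in
# the training set (the botanist with the photographic memory), too little cannot
# represent the target at all (the lazy brother).
```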
In the following, bold typeface will indicate vector or matrix quantities; normal typeface will be used for vector and matrix components and for scalars. We will label components of vectors and matrices with Greek indices, and label vectors and matrices themselves with Roman indices. Familiarity with the use of Lagrange multipliers to solve problems with equality or inequality constraints is assumed².

2. A Bound on the Generalization Performance of a Pattern Recognition Learning Machine

There is a remarkable family of bounds governing the relation between the capacity of a learning machine and its performance³. The theory grew out of considerations of under what circumstances, and how quickly, the mean of some empirical quantity converges uniformly,
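For orientation, one well-known member of this family of bounds is the VC bound due to Vapnik; the sketch below gives its usual form as found in standard references, and the symbols h (the VC dimension, a measure of capacity), l (the number of training points), η (a confidence level), R (actual risk) and R_emp (empirical risk on the training set) are introduced here for illustration and may differ from the notation adopted later in the paper. With probability at least 1 − η,

\[
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) \;+\; \sqrt{\frac{h\left(\log(2l/h) + 1\right) - \log(\eta/4)}{l}},
\]

so the gap between training performance and expected test performance is controlled by the capacity h relative to the amount of data l.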