456 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 30, NO. 4, NOVEMBER 2000

There are many different ways of combining individual classifiers [84], [192]. The most popular approach to combining multiple classifiers is a simple average of the outputs of the individual classifiers. Combining can also be done with weighted averaging, which treats the contribution or accuracy of the component classifiers differently [68], [67], [84]. Nonlinear combining methods such as Dempster–Shafer belief-based methods [141], [192], rank-based information [1], voting schemes [17], and order statistics [173] have also been proposed. Wolpert [189] proposes using two (or more) levels of stacked networks to improve the generalization performance of neural network classifiers. The first-level networks include a variety of neural models trained with leave-one-out cross-validation samples; the outputs from these networks are then used as inputs to the second-level networks, which provide a smoothed transformation into the predicted output.
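As an illustration of output combining, the following sketch averages the class-probability outputs of several trained classifiers, either uniformly or with weights proportional to each member's accuracy. The probability arrays and weight values are hypothetical placeholders, and only NumPy is assumed.

```python
import numpy as np

def combine_by_averaging(member_probs, weights=None):
    """Combine classifier outputs by simple or weighted averaging.

    member_probs : list of arrays, each of shape (n_samples, n_classes),
                   holding the class-probability outputs of one classifier.
    weights      : optional sequence of non-negative weights, one per
                   classifier (e.g., validation accuracies); if None,
                   a simple unweighted average is used.
    Returns the class labels predicted by the combined classifier.
    """
    probs = np.stack(member_probs)                 # (n_classifiers, n_samples, n_classes)
    if weights is None:
        combined = probs.mean(axis=0)              # simple average of outputs
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                            # normalize the weights
        combined = np.tensordot(w, probs, axes=1)  # weighted average of outputs
    return combined.argmax(axis=1)

# Hypothetical outputs of three classifiers on four samples with two classes.
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
p2 = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
p3 = np.array([[0.7, 0.3], [0.5, 0.5], [0.6, 0.4], [0.3, 0.7]])

print(combine_by_averaging([p1, p2, p3]))                            # simple average
print(combine_by_averaging([p1, p2, p3], weights=[0.9, 0.8, 0.6]))   # weighted by accuracy
```

A stacked-generalization scheme in the sense of Wolpert [189] would instead learn the combination, training second-level networks on these first-level outputs rather than fixing the weights by hand.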
The error reduction of ensemble methods is mainly due to the reduction of model variance rather than model bias. Since an ensemble works better if the different classifiers in it disagree with each other strongly [95], [111], [129], [141], some of the models in the ensemble could be highly biased. However, the averaging effect may offset the bias and, more importantly, decrease the sensitivity of the classifier to new data. It has been observed [59] that it is generally more desirable to have an error rate estimator with small variance than an unbiased one with large variance. Empirically, a number of studies [41], [93] find that the prediction error reduction of ensemble methods is mostly accounted for by the reduction in variance.

Although classifier combination can, in general, improve generalization performance, correlation among individual classifiers can be harmful to a neural network ensemble [69], [129], [172]. Sharkey and Sharkey [154] discuss the need for, and benefits of, diversity among the members of an ensemble for generalization. Rogova [141] finds that the better performance of a combined classifier is not necessarily achieved by combining classifiers with better individual performance. Instead, it is more important to have independent classifiers in the ensemble. This conclusion is in line with that of Perrone and Cooper [129] and Krogh and Vedelsby [95] that ensemble classifiers can perform better if the individual classifiers disagree considerably with each other.
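Because the benefit of combining hinges on how much the members disagree, it can be useful to quantify diversity directly. The helper below is a hypothetical sketch, not taken from the cited works: it computes the pairwise disagreement rate, the fraction of samples on which two members predict different labels, using only NumPy.

```python
import numpy as np

def pairwise_disagreement(predictions):
    """Pairwise disagreement rates among ensemble members.

    predictions : array of shape (n_classifiers, n_samples) holding the
                  class labels predicted by each member on a common data set.
    Returns an (n_classifiers, n_classifiers) matrix whose (i, j) entry is
    the fraction of samples on which members i and j disagree.
    """
    preds = np.asarray(predictions)
    m = preds.shape[0]
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            rate = np.mean(preds[i] != preds[j])
            d[i, j] = d[j, i] = rate
    return d

# Hypothetical predictions of three members on six samples.
preds = np.array([[0, 1, 1, 0, 1, 0],
                  [0, 1, 0, 0, 1, 1],
                  [1, 1, 1, 0, 0, 0]])
print(pairwise_disagreement(preds))
```

Low off-diagonal values indicate strongly correlated members, which, according to the studies cited above, contribute little to the ensemble.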
One way to reduce the correlation among component classifiers is to build the ensemble model using different feature variables. In general, classifiers based on different feature variables are more independent than those based on different architectures with the same feature variables [73], [192]. Another effective method is training with different data sets. Statistical resampling techniques such as bootstrapping [45] are often used to generate multiple samples from the original training data. Two recently developed ensemble methods based on bootstrap samples are the "bagging" [26] and "arcing" [27] classifiers. Bagging (for bootstrap aggregating) and arcing (for adaptive resampling and combining) are similar in that both combine, by voting, multiple classifiers constructed from bootstrap samples. The bagging classifier generates simple bootstrap samples and combines them by simple majority voting, while arcing uses an adaptive bootstrapping scheme that selects bootstrap samples based on the previously constructed classifiers' performance, with more weight given to those cases most likely to be misclassified. Breiman [27] shows that both bagging and arcing can reduce bias, but the reduction in variance with these approaches is much larger.
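A minimal bagging sketch along the lines described above: bootstrap samples are drawn from the training data, one classifier is trained per sample, and the predictions are combined by majority vote. The use of scikit-learn's MLPClassifier as the base learner, the synthetic data, and all parameter values are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train one neural classifier per bootstrap sample of the training data.
n_members = 15
members = []
for _ in range(n_members):
    idx = rng.integers(0, len(X_train), size=len(X_train))   # sample with replacement
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
    clf.fit(X_train[idx], y_train[idx])
    members.append(clf)

# Combine the member predictions by simple majority voting.
votes = np.stack([clf.predict(X_test) for clf in members]).astype(int)  # (n_members, n_test)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

print("bagged accuracy:", np.mean(majority == y_test))
```

Arcing would differ mainly in how the bootstrap indices are drawn: instead of uniform resampling, the sampling probabilities would be increased for the cases misclassified by the classifiers built so far.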
Although much effort has been devoted to combining methods, several issues remain open or have not been completely solved. These include the choice of the individual classifiers included in the ensemble, the size of the ensemble, and the generally optimal way to combine individual classifiers. The question of under what conditions combining is most effective, and which methods should be included, is also not completely answered. Combining neural classifiers with traditional methods can be a fruitful research area. Since ensemble methods are very effective when the individual classifiers are negatively related [85] or uncorrelated [129], there is a need to develop efficient classifier selection schemes that make the best use of the advantages of combining.

IV. FEATURE VARIABLE SELECTION

Selection of a set of appropriate input feature variables is an important issue in building neural as well as other classifiers. The purpose of feature variable selection is to find the smallest set of features that can result in satisfactory predictive performance. Because of the curse of dimensionality [38], it is often necessary and beneficial to limit the number of input features in a classifier in order to obtain a model with good predictive performance and lower computational cost. Out-of-sample performance can be improved by using only a small subset of the entire set of available variables. The issue is also closely related to the principle of parsimony in model building, as well as to the model learning and generalization discussed in Section III.

Numerous statistical feature selection criteria and search algorithms have been developed in the pattern recognition literature [38], [52]. Some of these statistical feature selection approaches cannot be directly applied to neural classifiers because of the nonparametric nature of neural networks. Recently, there has been increasing interest in developing feature variable selection or dimension reduction approaches for neural network classifiers. Most of these methods are heuristic in nature; some are based on ideas similar to their statistical counterparts. It has been found that, under certain circumstances, the performance of a neural classifier can be improved by using statistically independent features [49].

One of the most popular methods in feature selection is principal component analysis (PCA). Principal component analysis is a statistical technique that reduces dimensionality with little loss of the intrinsic information contained in the original data. As such, it is often used as a preprocessing step in neural network training. One problem with PCA is that it is an unsupervised learning procedure and does not consider the correlation between the target outputs and the input features. In addition, PCA is a linear dimension reduction technique, so it is not appropriate for complex problems with nonlinear correlation structures.
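The sketch below shows PCA used in the way described above, as an unsupervised preprocessing step ahead of a neural classifier. scikit-learn's PCA and MLPClassifier are used purely for illustration; the synthetic data, the number of retained components, and the other settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=30, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA is fitted on the inputs only (it never sees y), which is exactly the
# limitation noted above: the retained components need not be the ones most
# relevant to the classification target.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=8),   # number of retained principal components (assumed)
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy with PCA preprocessing:", model.score(X_test, y_test))
```

Because the components are chosen without reference to the target and only linear structure is captured, a nonlinear reduction such as the neural network approaches discussed next may be preferable for problems with nonlinear correlation structures.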
The linear limitation of PCA can be overcome by using neural networks directly to perform dimension reduction. It has been shown that neural networks are able to perform certain nonlinear PCA [70], [125], [147]. Karhunen and Joutsensalo