vantages of these methods are clarified. In the following we explain these data sets.

Table 1.1 lists the data sets for two-class classification problems [21–23]. For each problem the table lists the numbers of inputs, training data, test data, and data sets. Each problem has 100 or 20 training data sets and their corresponding test data sets and is used to compare statistical differences among some classifiers.

Table 1.1 Benchmark data sets for two-class problems

Data            Inputs   Training data   Test data   Sets
Banana               2             400       4,900    100
Breast cancer        9             200          77    100
Diabetes             8             468         300    100
Flare-solar          9             666         400    100
German              20             700         300    100
Heart               13             170         100    100
Image               18           1,300       1,010     20
Ringnorm            20             400       7,000    100
Splice              60           1,000       2,175     20
Thyroid              5             140          75    100
Titanic              3             150       2,051    100
Twonorm             20             400       7,000    100
Waveform            21             400       4,600    100

Pattern classification technology has been applied to DNA microarray data, which provide expression levels of thousands of genes, to classify cancerous/non-cancerous patients. Microarray data are characterized by a large number of input variables but a small number of training/test data. Thus the classification problems are linearly separable and overfitting occurs quite easily. Therefore, usually, feature selection or extraction is performed to improve generalization ability. Table 1.2 lists the data sets [24] used in this book. For each problem there is one training data set and one test data set.

Table 1.3 shows the data sets for multiclass problems. Each problem has one training data set and the associated test data set.

The Fisher iris data [32, 33] are widely used for evaluating classification performance of classifiers. They consist of 150 data with four features and three classes; there are 50 data per class. We used the first 25 data of each class as the training data and the remaining 25 data of each class as the test data.

The numeral data [34] were collected to identify Japanese license plates of running cars. They include numerals, hiragana, and kanji characters. The original image taken from a TV camera was preprocessed and each numeral was transformed into 12 features, such as the number of holes and the curvature of a numeral at some point
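
The iris split described above (the first 25 data of each class for training, the remaining 25 for testing) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_first_k_per_class` is hypothetical, and the stand-in label list assumes the 150 data are stored class by class, which the text does not state.

```python
from collections import defaultdict


def split_first_k_per_class(labels, k=25):
    """Return (train_idx, test_idx).

    The first k samples encountered for each class go to the training
    set; every later sample of that class goes to the test set.
    """
    seen = defaultdict(int)  # number of samples seen so far per class
    train_idx, test_idx = [], []
    for i, y in enumerate(labels):
        if seen[y] < k:
            train_idx.append(i)
        else:
            test_idx.append(i)
        seen[y] += 1
    return train_idx, test_idx


# Stand-in labels (an assumption, not the real data file):
# 150 samples, three classes, 50 samples per class, stored class by class.
labels = [c for c in range(3) for _ in range(50)]

train_idx, test_idx = split_first_k_per_class(labels, k=25)
print(len(train_idx), len(test_idx))  # 75 75
```

Because the function counts samples per class rather than slicing fixed index ranges, it would give the same per-class split even if the classes were interleaved in the file.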