第2期 叶志飞,等:不平衡分类问题研究综述 ·155· 1995:558-565 SVM's a case study [C]//Intemational Conference on [6 WEISS GM,H RSH H.A quantitative study of sall dis- Machine Leaming Washington DC,2003:65-71. juncts[C]//Proceedings of the 17th National Conference on [20]ESTABROOKS A,JAPKOW CZ N.A m ixture-of-experts A rtific ial Intelligence Texas:AAA I Press,2000:665- framework for leaming from unbalanced data sets[C]// 670 Proceedings of the 4th Intelligent Data Analysis Confer [7 WEISS GM.Mining with rarity:a unifying framework[J]. ence L isbon,Portugal,2001:34-43 Sigkdd Expbrations,2004,6(1):7-19 [21]AN R,LU Y,JN R,et al On predicting rare classes [8 ]JAPKOW CCZ N,STEPHEN S The class ibalance prob- with SVM ensembles in scene classification [C]//IEEE In- lem:a systematic study[J].Intelligent Data Analysis Jour temational Conference on Acoustics,Speech and Signal nal2002,6(5):429-450 Processing Hong Kong,2003:21-24. [9]ARUNASALAM B,CHAWLA S CCCS:a top down associ- [22 ]LU BL,IIO M.Task decomposition and module combina- ative classifier for m balanced class distribution [C]//Inter tion based on class relations a modular neural netork for national Conference on Knowledge Discovery and Data M in- pattem classification[J].IEEE Transaction on NeuralNet- ing New York:ACM Press,2006:517-522 w0ks,1999,10(5):1244-1256 10]DRUMMOND C,HOLTE R Explicitly representing ex- [23 ]LU BL,WANG KA,UTIYAMA M,et al A part-versus- pected cost an altemative o ROC representation [C]// part method ormassively parallel training of support vecor Proceedings of the 6th ACM SIGKDD Intemational Confer machines C ]//Proceedings of 17th Intemational Joint ence on Knowledge Discovery and Data Mining New Conference on NeuralNetorks Budapest Hungary,2004: York:ACM Press,2000:187-207. 735-740 [11]PROVOST F,FAWCETT T Robust classification for m- [24]YE Z F,LU B L Leaming mbalanced data sets with a precise envirorments[J]Machine Leaming.2001,42 m inmax modular support vector machine [C ]//Proceed- (3):203-231 ings of the 20th Intemational Joint Conference on Neural [12 ]DRUMMOND C.HOLTE R C C4 5,class ibalance, Netorks Orlando,USA,2007:1673-1678 and oost sensitivity:why under-sampling beats over-sam- [25 KOTSANTIS S B,PNTELAS P E Mixture of expert a- pling[C]//Intemational Conference on Machine Leaming gents for handling ibalanced data sets[J ]Annals of Washington DC,2003:152-154. Mathematics,Computing Teleinomatics,2003,1(1): [13 ]L NG C,LIC Data mining or direct marketing problems 46-55 and solutions C ]//Proceedings of the 4th Intemational [26]ESTABROOK A,TAEHO J,JAPKOW ICZ N.A multip le Conference on Knowledge Discovery and Data M ing New resampling method or leaming from mbalanced data sets York:AAA I Press,1998:73-79. [J].Computatonal Intelligence,2004,20(1):18-36 [14]CHAWLA N V.BOW YER K W.HALL L O,et al [27]CHEN C,L AW A,BREMAN L.Using random forest SMOTE:synthetic m inority over-sampling technique [J]. leam mbalanced data [R No 666,Statistics Depart- Joumal ofA rtificial Intelligence Research,2002,16:321- ment,University of Calimia at Berkeley,2004 357 [28]CHAWLA N V,LAZAREV C A,HALL L O,et al [15 ]LEE SS Noisy replication in skewed binary classification SMOTEBoost:mp oving prediction of the m inority class in [J].Computational Statistics and Data Analysis,2000, boosting[C]//Proceedings of 7th European Conference on 34(2):165-191 Princples and Practice of Knowledge Discovery in Data- [16 ]KUBATM,HOTLE R,MA TW N S Leaming when nega- bases CavtatDubrovnik,Croatia,2003:107-119 tive examp les abound [C]//Proceedings of the 9th Europe- [29]L U X Y,WU J X,ZHOU Z H A cascade-based classifi- an Conference on Machine Leaming London:Springer- cation method for class-mbalanced data [J ]Joumal of Verlag.1997:146-153 NanJ ing University:Natural Science,2006,42 (2):148- [17]KUBAT M,MATW N S Addressing the curse of mbal- 155 anced training sets one-sided selection[C]//Poceedings [30 ]ZHOU Z H,LU X Y Training cost-sensitive neural net of the 14th Intemational Conference on Machine Leaming works with methods addressing the class mbalance poblem San Francisco:Morgan Kaufann,1997:179-186. [J].EEE Transaction on Knowledge and Data Engineer [18 ]CHEN X W,GERLACH B,CASASENT D.Pruning sup- mg2006,18(1):63-77 port vectors for mbalanced data classification [C ]//Pro- [31]PAZZAN IM,MERZ C,MURPHY P,et al Reducing ceedings of 18th Intemational Joint Conference on Neural m isc lassificaton costs[C]//Proceedings of the 11 th Inter Netorks Montreal,Quebec,Canada,2005:1883-1887. national Conference on Machine Leaming San Francisco, [19]RASKUTTIB,KOW ALCZYK A Extreme re-balancing for CA,US4,1994:217-225 1994-2009 China Academic Journal Electronic Publishing House.All rights reserved.htp://www.cnki.net© 1994-2009 China Academic Journal Electronic Publishing House. All rights reserved. http://www.cnki.net 1995: 5582565. [ 6 ]W EISS GM, H IRSH H. A quantitative study of small dis2 juncts[C ] / /Proceedings of the 17 th National Conference on A rtificial Intelligence. Texas: AAA I Press, 2000: 6652 670. [ 7 ]W EISS GM. M ining with rarity: a unifying framework[J ]. Sigkdd Exp lorations, 2004, 6 (1) : 7219. [ 8 ]JAPKOW ICZ N, STEPHEN S. The class imbalance p rob2 lem: a systematic study[J ]. IntelligentData Analysis Jour2 nal, 2002, 6 (5) : 4292450. [ 9 ]ARUNASALAM B, CHAWLA S. CCCS: a top down associ2 ative classifier for imbalanced class distribution [C ] / / Inter2 national Conference on Knowledge D iscovery and Data M in2 ing. New York: ACM Press, 2006: 5172522. [ 10 ]DRUMMOND C, HOLTE R. Exp licitly rep resenting ex2 pected cost: an alternative to ROC rep resentation [ C ] / / Proceedings of the 6 th ACM SIGKDD International Confer2 ence on Knowledge D iscovery and Data M ining. New York: ACM Press, 2000: 1872207. [ 11 ] PROVOST F, FAWCETT T. Robust classification for im2 p recise environments [ J ]. Machine Learning, 2001, 42 (3) : 2032231. [ 12 ]DRUMMOND C, HOLTE R C. C4. 5, class imbalance, and cost sensitivity: why under2samp ling beats over2sam2 p ling[C ] / / International Conference onMachine Learning. W ashington DC, 2003: 1522154. [ 13 ]L ING C,L I C. Data m ining for direct marketing p roblem s and solutions [ C ] / /Proceedings of the 4 th International Conference on Knowledge D iscovery and Data M ing. New York: AAA I Press, 1998: 73279. [ 14 ] CHAWLA N V, BOW YER K W , HALL L O, et al. SMOTE: synthetic m inority over2samp ling technique [J ]. Journal ofA rtificial Intelligence Research, 2002, 16: 3212 357. [ 15 ]LEE S S. Noisy rep lication in skewed binary classification [J ]. Computational Statistics and Data Analysis, 2000, 34 (2) : 1652191. [ 16 ]KUBATM, HOTLE R,MATW IN S. Learning when nega2 tive examp les abound[C ] / /Proceedings of the 9 th Europe2 an Conference on Machine Learning. London: Sp ringer2 Verlag, 1997: 1462153. [ 17 ] KUBAT M, MATW IN S. Addressing the curse of imbal2 anced training sets: one2sided selection [ C ] / /Proceedings of the 14 th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 1792186. [ 18 ]CHEN X W , GERLACH B, CASASENT D. Pruning sup2 port vectors for imbalanced data classification [ C ] / /Pro2 ceedings of 18 th International Joint Conference on Neural Networks. Montreal,Quebec, Canada, 2005: 188321887. [ 19 ]RASKUTTIB, KOWALCZYK A. Extreme re2balancing for SVM’s: a case study [ C ] / / International Conference on Machine Learning. W ashington DC, 2003: 65271. [ 20 ] ESTABROOKS A, JAPKOW ICZ N. A mixture2of2experts framework for learning from unbalanced data sets[ C ] / / Proceedings of the 4 th Intelligent Data Analysis Confer2 ence. L isbon, Portugal, 2001: 34243. [ 21 ]AN R, L IU Y, J IN R, et al. On p redicting rare classes with SVM ensembles in scene classification[C ] / / IEEE In2 ternational Conference on Acoustics, Speech and Signal Processing. Hong Kong, 2003: 21224. [ 22 ]LU B L, ITO M. Task decomposition and module combina2 tion based on class relations: a modular neural network for pattern classification[J ]. IEEE Transaction on NeuralNet2 works, 1999, 10 (5) : 124421256. [ 23 ]LU B L, WANG KA, UTIYAMA M, et al. A part2versus2 part method formassively parallel training of support vector machines [ C ] / /Proceedings of 17 th International Joint Conference on NeuralNetworks. Budapest, Hungary, 2004: 7352740. [ 24 ] YE Z F , LU B L. Learning imbalanced data sets with a min2max modular support vector machine [ C ] / /Proceed2 ings of the 20 th International Joint Conference on Neural Networks. O rlando, USA, 2007: 167321678. [ 25 ] KOTSIANTIS S B, P INTELAS P E. M ixture of expert a2 gents for handling imbalanced data sets [ J ]. Annals of Mathematics, Computing & Teleinformatics, 2003, 1 (1) : 46255. [ 26 ] ESTABROOK A, TAEHO J, JAPKOW ICZ N. A multip le resamp ling method for learning from imbalanced data sets [J ]. Computational Intelligence, 2004, 20 (1) : 18236. [ 27 ]CHEN C, L IAW A, BREIMAN L. U sing random forest to learn imbalanced data [ R ]. No. 666, Statistics Depart2 ment, University of California atBerkeley, 2004. [ 28 ] CHAWLA N V, LAZAREV IC A, HALL L O, et al. SMOTEBoost: imp roving p rediction of the m inority class in boosting[C ] / /Proceedings of 7 th European Conference on Princip les and Practice of Knowledge D iscovery in Data2 bases. Cavtat2Dubrovnik, Croatia, 2003: 1072119. [ 29 ]L IU X Y, WU J X, ZHOU Z H. A cascade2based classifi2 cation method for class2imbalanced data [ J ]. Journal of NanJing University: Natural Science, 2006 , 42 ( 2) : 1482 155. [ 30 ] ZHOU Z H, L IU X Y. Training cost2sensitive neural net2 works with methods addressing the class imbalance p roblem [J ]. IEEE Transaction on Knowledge and Data Engineer2 ing, 2006, 18 (1) : 63277. [ 31 ] PAZZAN I M, MERZ C, MURPHY P, et al. Reducing m isclassification costs[C ] / /Proceedings of the 11 th Inter2 national Conference on Machine Learning. San Francisco, CA, USA, 1994: 2172225. 第 2期 叶志飞 ,等 :不平衡分类问题研究综述 ·155·