As Figure 1 shows, both the positive-class classification performance and the overall classification accuracy rise as the proportion of the dataset used for training increases, but the gain becomes limited as the proportion grows further. In addition, at data proportions of 20% and 40%, the F-measure and G-mean values of the three algorithms increase almost linearly. This indicates that sampling too small a proportion of the data loses too much information about the original data distribution and severely degrades classification performance.

4 Conclusion

To address the classification of class-imbalanced data, this paper proposes an ensemble classification method that combines hybrid data sampling with Boosting. The method uses undersampling and oversampling together: while keeping the size of the training set unchanged, it flexibly adjusts the proportion of samples in each class and largely preserves the original data distribution. It then applies Boosting over multiple learning iterations to obtain a stronger classifier. Experimental results show that the method effectively improves classification performance on positive-class samples.

Because datasets are themselves diverse and complex, factors such as overlapping class distributions and noisy samples affect classification performance on imbalanced data. Targeted data preprocessing would make the data distribution produced by dynamic balanced sampling more reasonable and would further improve classification performance on the positive class. Applying the method to multi-class imbalanced data classification is another direction for future research.

CAAI Transactions on Intelligent Systems, Vol. 11, p. 262
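The hybrid sampling step summarized in the conclusion (undersampling one class and oversampling the other while the training set size stays fixed) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `hybrid_resample`, the parameter `pos_fraction`, the label convention (1 for the positive class, 0 for the negative class), and the use of plain random duplication for oversampling are all assumptions made for illustration; the paper's actual oversampling scheme and ratio schedule are not given in this section.

```python
import random


def hybrid_resample(samples, labels, pos_fraction, rng=None):
    """Resample to the original size with about `pos_fraction` positives.

    Hypothetical helper: positives (label 1) are oversampled by random
    duplication, negatives (label 0) are undersampled without replacement.
    """
    rng = rng or random.Random()
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    total = len(pos) + len(neg)
    # Keep every original positive and at least one negative.
    n_pos = min(max(round(total * pos_fraction), len(pos)), total - 1)
    n_neg = total - n_pos  # total size is unchanged by construction

    # Oversampling: original positives plus random duplicates.
    new_pos = pos + [rng.choice(pos) for _ in range(n_pos - len(pos))]
    # Undersampling: a random subset of negatives, no repeats.
    new_neg = rng.sample(neg, n_neg)

    resampled = [(x, 1) for x in new_pos] + [(x, 0) for x in new_neg]
    rng.shuffle(resampled)
    new_samples = [x for x, _ in resampled]
    new_labels = [y for _, y in resampled]
    return new_samples, new_labels


# Usage: 10 positives and 90 negatives, rebalanced to 40% positives.
X = list(range(100))
y = [1] * 10 + [0] * 90
new_X, new_y = hybrid_resample(X, y, pos_fraction=0.4, rng=random.Random(0))
print(len(new_X), sum(new_y))  # prints "100 40"
```

In a Boosting loop, one plausible design (again an assumption) is to call such a function once per round with a different `pos_fraction`, so that each weak learner sees a training set of the same size but a different class ratio.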
