No. 5    FENG Ji, et al.: A parameter-free outlier detection algorithm based on the natural neighbor neighborhood graph    ·1005·

Summarizing the experimental results on the two synthetic datasets above: the WNaNG algorithm requires no neighborhood parameter for outlier detection, so the problem of neighborhood selection affecting the algorithm's efficiency does not arise; the algorithm performs stably on both datasets and achieves good results on each. The INS algorithm requires a neighborhood parameter; although it tolerates a fairly wide range of parameter values, this experiment still shows a large gap between the detection results for the worst and the best parameter settings. The remaining three algorithms fluctuate considerably across datasets and parameter values, and the optimal parameter values for different datasets follow no discernible pattern, so they have to be tuned separately for each problem.

4 Conclusion

To address the problems of neighborhood parameters, the parameter for the total number of outliers, and local outliers in outlier detection, this paper draws on the natural neighbor idea to propose an adaptive outlier detection algorithm, WNaNG. The algorithm needs no manually set neighborhood parameter when run on different datasets, and it obtains satisfactory detection results from the distribution characteristics of the dataset itself. In addition, WNaNG identifies local outliers and global outliers more accurately and distinguishes between them, which provides strong support for subsequent data mining steps such as outlier interpretation and the construction of the explanation space.
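The claim that no neighborhood parameter is needed rests on the natural neighbor idea, which the following minimal sketch tries to illustrate. It is not the WNaNG algorithm itself, which this page does not spell out; it only shows, under stated assumptions, how a neighborhood size can emerge from the data instead of being set by hand. The function name `natural_neighbor_search`, the stopping rule (stop once every point has been chosen as a neighbor, or once the number of never-chosen points stops changing) and the toy data are all assumptions made for illustration.

```python
import math


def natural_neighbor_search(points):
    """Grow the neighborhood round by round until it stabilises.

    Returns (r, reverse_counts): the round at which the search stopped and,
    for each point, how many times other points chose it as a neighbor.
    Hypothetical helper; the stopping rule is an assumption, not the paper's.
    """
    n = len(points)
    # For every point, all other point indices sorted by Euclidean distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    reverse_counts = [0] * n
    prev_orphans = None
    for r in range(1, n):
        # Round r: every point adds its r-th nearest neighbor.
        for i in range(n):
            reverse_counts[order[i][r - 1]] += 1
        # Orphans are points that nobody has chosen as a neighbor so far.
        orphans = sum(1 for c in reverse_counts if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            return r, reverse_counts
        prev_orphans = orphans
    return n - 1, reverse_counts


if __name__ == "__main__":
    # Toy data (made up): four clustered points and one isolated point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    r, counts = natural_neighbor_search(pts)
    print("stopped at round", r)
    print("reverse-neighbor counts", counts)
```

In this toy run the isolated point is never chosen as a neighbor by any other point, which is the kind of signal a natural-neighbor-based detector can exploit without a user-supplied k. How WNaNG weights the resulting graph and separates local from global outliers is not covered by this sketch.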