CAAI Transactions on Intelligent Systems, Vol. 14 No. 5, Sept. 2019

DOI: 10.11992/tis.201812014
Online publication address: http://kns.cnki.net/kcms/detail/23.1538.TP.20190527.0921.002.html

Imbalanced data classification of network topology characteristics

PU Shiye, LIU Sanyang, BAI Yiguang
(School of Mathematics and Statistics, Xidian University, Xi'an 710126, China)

Abstract: Real-world datasets are commonly imbalanced. To address the imbalanced classification problem, a network structure is built over the dataset to fully mine the topological features that lie beyond the positional information of the sample points; the connection characteristics of the network nodes are analyzed, and the nodes are assigned different efficiencies. The similarity measure between the node to be tested and each sub-network is calculated, and the integrity measure between that node and each sub-network is then derived from a new probability model. On this basis, a classification method for imbalanced data based on network topology features is constructed. An imbalance factor c is introduced into the algorithm to reduce the influence of the difference between the numbers of positive and negative samples. The experimental results show that the algorithm effectively improves classification accuracy, especially on datasets with pronounced topological features, and that its classification performance and adaptability improve on those of traditional classification methods.

Keywords: imbalanced data; similarity; network structure; accuracy rate; topology; physical feature

CLC number: TP391.9  Document code: A  Article ID: 1673-4785(2019)05-0889-08

Citation format: PU Shiye, LIU Sanyang, BAI Yiguang. Imbalanced data classification of network topology characteristics[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 889-896.

Received: 2018-12-12. Published online: 2019-05-27.
Funding: National Natural Science Foundation of China (61877046); Natural Science Foundation of Shaanxi Province (2017JM1001).
Corresponding author: PU Shiye. E-mail: psy2361@126.com.

In data classification research, imbalanced class distribution [1] is a common problem: the samples of one class far outnumber those of another (called the majority class and the minority class, respectively), and a dataset with this property is regarded as imbalanced. When traditional classification algorithms such as the support vector machine (SVM) handle imbalanced data, the separating hyperplane tends to shift toward the minority class, which lowers the recognition rate for the minority class, while random forest (RF) [2] is prone to poor classification and increased generalization error. To address the noise and outliers that arise when a support vector machine is trained on sample points, many researchers have proposed improved algorithms. For example, Lin et al. [3] of Taiwan proposed the fuzzy support vector machine (fuzzy sup-