

CAAI Transactions on Intelligent Systems Vol.14 No.5 Sept. 2019
DOI: 10.11992/tis.201812014 Online publication URL: http://kns.cnki.net/kcms/detail/23.1538.TP.20190527.0921.002.html

Imbalanced data classification of network topology characteristics
PU Shiye, LIU Sanyang, BAI Yiguang
(School of Mathematics and Statistics, Xidian University, Xi'an 710126, China)

CLC number: TP391.9; Document code: A; Article ID: 1673-4785(2019)05-0889-08
Citation: PU Shiye, LIU Sanyang, BAI Yiguang. Imbalanced data classification of network topology characteristics[J]. CAAI transactions on intelligent systems, 2019, 14(5): 889–896.

Abstract: Real-world datasets are commonly imbalanced, and this paper addresses the resulting imbalanced classification problem. A network structure is built over the dataset to mine the topological features hidden beyond the positional information of the sample points, the connection characteristics of the network nodes are analyzed, and the nodes are assigned different efficiencies.
The similarity measure between the node under test and each sub-network is then computed, and a new probability model is used to derive the node's integrity measure with respect to each sub-network. On this basis, a classification method for imbalanced data based on network topology features is constructed, in which an imbalance factor c is introduced to reduce the influence of the difference between the numbers of positive and negative samples. Experimental results show that the algorithm effectively improves classification accuracy, especially on datasets with pronounced topological features, and that it improves on traditional classification methods in both classification performance and adaptability.
Keywords: imbalanced data; similarity; network structure; accuracy rate; topology; physical feature

Received: 2018-12-12. Published online: 2019-05-27.
Foundation items: National Natural Science Foundation of China (61877046); Natural Science Foundation of Shaanxi Province (2017JM1001).
Corresponding author: PU Shiye. E-mail: psy2361@126.com.

In data classification research, imbalanced class distributions [1] are widespread: the samples of one class far outnumber those of another (the majority class and the minority class, respectively), and a dataset with this property is regarded as imbalanced. Traditional classification algorithms struggle with such data. When a support vector machine (SVM) handles imbalanced data, its separating hyperplane tends to shift toward the minority class, which lowers the recognition rate for minority samples, while random forest (RF) [2] is prone to poor classification and larger generalization error. To deal with the noise and outliers that arise among the training samples of an SVM, many researchers have proposed improved algorithms. For example, Lin et al. [3] in Taiwan proposed the fuzzy support vector machine (fuzzy sup-
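The introduction cites Lin et al.'s fuzzy support vector machine (FSVM) as a response to noise and outliers in SVM training. Its key idea is to attach a membership s_i in (0, 1] to every training sample and penalize that sample's slack as C·s_i·ξ_i in the soft-margin objective, so unreliable points pull less on the hyperplane. The excerpt breaks off before any membership function is given, so the Python sketch below assumes one common choice from the FSVM literature, a linear decay with distance from the class mean: s_i = 1 − ‖x_i − x̄‖ / (r + δ), where r is the largest such distance in the class and δ > 0 keeps every membership positive. The name fsvm_memberships and the default delta are our own.

```python
import math


def fsvm_memberships(points, delta=1e-3):
    """Fuzzy memberships for the samples of ONE class (assumed distance-based rule).

    s_i = 1 - ||x_i - mean|| / (r + delta), with r the largest distance to the mean,
    so samples far from the class centre (likely noise or outliers) get small weights.
    """
    dim = len(points[0])
    # Class centre: coordinate-wise mean of all samples in the class.
    centre = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    dists = [math.dist(p, centre) for p in points]
    r = max(dists)  # class "radius": distance of the farthest sample
    # delta > 0 keeps the farthest sample's membership strictly positive.
    return [1.0 - d / (r + delta) for d in dists]


if __name__ == "__main__":
    # Hypothetical toy class: four clustered points and one far-away point.
    cls = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
    for p, s in zip(cls, fsvm_memberships(cls)):
        print(p, round(s, 3))
```

In the toy class the far-away point receives a membership close to zero. The class mean is itself pulled toward that point, so this simple rule works best when outliers are few.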

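The abstract describes the method only at a high level: build a network over the samples, treat each class as a sub-network, weight nodes by an "efficiency", score a test node against every sub-network, and temper the effect of class size with an imbalance factor c. The excerpt does not give the paper's formulas, so the Python sketch below is an illustration under our own assumptions, not the authors' method. Each class sub-network is a k-nearest-neighbour (kNN) graph; a node's efficiency is (hypothetically) its degree divided by the largest degree in its sub-network; similarity to a class is an efficiency-weighted Gaussian kernel sum of width sigma; and the imbalance correction divides each class score by n_class ** c. The functions knn_network, node_efficiency, fit, class_scores and classify, and the parameters k and sigma, are hypothetical names.

```python
import math
from collections import defaultdict


def knn_network(points, k):
    """Undirected kNN network: each node is linked to its k nearest neighbours."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Other nodes ordered by Euclidean distance to node i (stable on ties).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def node_efficiency(adj):
    """Hypothetical node weight: degree divided by the largest degree in the network."""
    max_deg = max((len(a) for a in adj), default=0)
    if max_deg == 0:  # a single-node class has no edges; weight it uniformly
        return [1.0] * len(adj)
    return [len(a) / max_deg for a in adj]


def fit(samples, labels, k=3):
    """Split the data into one sub-network per class and cache its node weights."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return {y: (pts, node_efficiency(knn_network(pts, k)))
            for y, pts in by_class.items()}


def class_scores(model, x, sigma=1.0, c=1.0):
    """Efficiency-weighted Gaussian kernel sum per class, divided by n_class ** c.

    c = 0 keeps the raw sum, so larger classes accumulate more mass;
    c = 1 turns it into an average over the class's samples.
    """
    scores = {}
    for y, (pts, eff) in model.items():
        total = sum(e * math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))
                    for p, e in zip(pts, eff))
        scores[y] = total / len(pts) ** c
    return scores


def classify(model, x, sigma=1.0, c=1.0):
    """Return the class label with the largest corrected score."""
    scores = class_scores(model, x, sigma, c)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data: a large class near the origin, a small class near (2, 0).
    X = [(0.0, 0.0), (0.3, 0.1), (-0.2, 0.2), (0.1, -0.3), (-0.1, -0.1),
         (0.2, 0.3), (2.0, 0.0), (2.2, 0.1)]
    y = ["maj"] * 6 + ["min"] * 2
    model = fit(X, y, k=3)
    for c in (0.0, 0.5, 1.0):
        print(c, classify(model, (1.1, 0.0), c=c))
```

Under this assumed scoring rule, c = 0 lets a large class win wherever its many samples accumulate kernel mass, while c = 1 compares per-sample averages, so a test point lying between a large and a small cluster tends to move toward the small class as c grows. The paper's integrity measure and probability model are not represented in this sketch.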
