CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 15 No. 6, Nov. 2020
DOI: 10.11992/tis.202006050
CLC number: TP391  Document code: A  Article ID: 1673-4785(2020)06-1113-08
Chinese citation format: 杜航原, 张晶, 王文剑. 一种深度自监督聚类集成算法 [J]. 智能系统学报, 2020, 15(6): 1113-1120.
English citation format: DU Hangyuan, ZHANG Jing, WANG Wenjian. A deep self-supervised clustering ensemble algorithm[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1113-1120.

A deep self-supervised clustering ensemble algorithm

DU Hangyuan1, ZHANG Jing2, WANG Wenjian1,2
(1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)

Abstract: This study proposes a deep self-supervised clustering ensemble algorithm to address the design of the consensus function in clustering ensembles. The algorithm first applies a weighted connected-triple algorithm to the base clustering partitions to compute a similarity matrix over the samples, and uses this matrix to define adjacency relations. The base clusterings are thereby transformed from a data representation in the feature space into a graph data representation, and the consensus integration of the base clusterings becomes a graph clustering problem on that graph. A graph neural network is then used to construct a self-supervised clustering ensemble model. On the one hand, the model uses a graph autoencoder to learn a low-dimensional embedding of the graph and estimates the target distribution of the clustering ensemble from the likelihood distribution of that embedding. On the other hand, the clustering ensemble objective guides the learning of the low-dimensional embedding, so that the embedding and the clustering ensemble result obtained by the model are consistent and jointly optimal. Simulation experiments were conducted on a large number of data sets. The results show that the proposed algorithm yields more accurate clustering ensemble results than algorithms such as HGPA, CSPA, and MCLA.

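The first step described in the abstract, turning a set of base clusterings into a sample-similarity graph, can be illustrated with a minimal sketch. It deliberately swaps in the plain co-association matrix (the fraction of base clusterings in which two samples share a cluster) for the weighted connected-triple similarity used in the paper, whose weighting details are not given on this page. The function names `co_association` and `adjacency`, the 0.5 threshold, and the toy labelings are all hypothetical and chosen only for illustration.

```python
def co_association(base_labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place samples i and j in the same cluster.

    Assumption: this is a simplified stand-in for the weighted
    connected-triple similarity named in the abstract.
    """
    m = len(base_labelings)
    n = len(base_labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        if len(labels) != n:
            raise ValueError("every base clustering must label the same samples")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def adjacency(sim, threshold=0.5):
    """Connect two distinct samples when their similarity exceeds the
    (hypothetical) threshold; the result is a 0/1 adjacency matrix."""
    n = len(sim)
    return [
        [1 if i != j and sim[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Three made-up base clusterings of five samples; label ids are arbitrary
# and need not agree across clusterings.
base = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
S = co_association(base)
A = adjacency(S, threshold=0.5)
for row in A:
    print(row)
```

The resulting adjacency matrix `A` is the kind of graph input on which a graph autoencoder could then learn node embeddings. That later stage is not sketched here because this page gives no detail about its architecture or loss.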
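Keywords: feature space; clustering algorithm; consistency function; graph representation; similarity measure; self-supervised learning; graphical data; neural network model

Received: 2020-06-29.
Funding: National Natural Science Foundation of China (61902227, 61673249, 61773247, U1805263); Shanxi Province Key R&D Program for International Cooperation (201903D421050); Shanxi Province Basic Research Program (201901D211192); Shanxi Province Applied Basic Research Program (201701D121053); Shanxi Province 1331 Project.
Corresponding author: WANG Wenjian. E-mail: wjwang@sxu.edu.cn.

Cluster analysis is widely used in image processing, machine learning, Web search, and many other fields, and is an active and highly challenging research direction in machine learning. Its main idea is to compute the similarity between samples and use it to partition a data set into several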