CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 15, No. 4, Jul. 2020
DOI: 10.11992/tis.201911039

An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation

ZHANG Min1, ZHOU Zhiping1,2
(1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China; 2. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China)

Abstract: Most existing spectral clustering algorithms suffer from low clustering accuracy and a high storage cost for the similarity matrix when applied to large-scale datasets. To address these problems, this paper proposes an autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation. First, instead of random sampling, the concept of relative mass is introduced to evaluate nodes, and the most representative points are selected as landmark points. The graph similarity matrix is then approximated by sparse representation, which reduces the storage cost. Meanwhile, to take into account both the geometric and the topological distribution of the nearest-neighbor samples, the Euclidean distance and the Kendall Tau distance are fused to measure the similarity between the landmark points and the other samples, which improves clustering accuracy. A stacked autoencoder then replaces the eigendecomposition of the Laplacian matrix: the obtained similarity matrix is taken as the autoencoder's input, and clustering accuracy is further improved by jointly learning the embedded representation and the clustering. Experiments on five large-scale datasets validate the effectiveness of the proposed algorithm.

Keywords: large-scale datasets; metric fusion; landmark representation; relative mass; sparse representation; stacked autoencoder; joint learning; embedded representation

CLC number: TP18  Document code: A  Article ID: 1673-4785(2020)04-0687-10

Citation: ZHANG Min, ZHOU Zhiping. An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 687–696.

Clustering aims to partition data points into clusters according to their similarity, so that intra-cluster similarity is maximized and inter-cluster similarity is minimized [1]. Traditional clustering methods, such as the K-means algorithm and fuzzy clustering algorithms, lack the ability to handle complex data structures and easily fall into local optima when the sample space is non-convex. In recent years, spectral clustering has shown good performance on non-convex datasets, because it can cluster in sample spaces of arbitrary shape and converge to the global optimum, and it has found wide application in fields such as face recognition, community detection, and image segmentation.

Received: 2019-12-02.
Corresponding author: ZHANG Min. E-mail: 15061882373_1@163.com.
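The metric fusion summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so the weighted sum below, the weight `alpha`, the normalizer `scale`, and all function names are assumptions made for illustration only, not the authors' definition. The Kendall Tau distance is also applied directly to feature vectors here, which is a simplifying assumption.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length vectors (geometric term)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kendall_tau_distance(x, y):
    """Normalized Kendall Tau distance: the fraction of index pairs (i, j)
    that x and y order in opposite directions. Tied pairs are not counted
    as discordant."""
    n = len(x)
    if n < 2:
        return 0.0
    discordant = sum(
        1
        for i, j in combinations(range(n), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)


def fused_distance(x, y, alpha=0.5, scale=1.0):
    """Hypothetical fusion (an assumption, not the paper's formula):
    a convex combination of a scaled Euclidean term and the Kendall Tau term."""
    geometric = euclidean_distance(x, y) / scale
    topological = kendall_tau_distance(x, y)
    return alpha * geometric + (1.0 - alpha) * topological


# Example: distance between one sample and one landmark point.
sample = [1.0, 2.0, 3.0]
landmark = [3.0, 2.0, 1.0]
d = fused_distance(sample, landmark, alpha=0.5, scale=1.0)
```

In this sketch the Euclidean term captures how far apart two points are, while the Kendall Tau term captures how differently their coordinates are ordered. How the two are actually weighted and converted into a similarity is defined in the body of the paper.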