

Vol. 15 No. 4, CAAI Transactions on Intelligent Systems, Jul. 2020
DOI: 10.11992/tis.201911039

An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation

ZHANG Min1, ZHOU Zhiping1,2
(1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China; 2. Engineering Research Center of Internet of Things Technology Applications Ministry of Education, Jiangnan University, Wuxi 214122, China)

Abstract: When processing large-scale datasets, most existing spectral clustering algorithms suffer from low clustering accuracy and a high storage cost for the large similarity matrix. To address these problems, this paper proposes an autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation. First, instead of random sampling, the concept of relative mass is introduced to evaluate nodes. On this basis, the most representative nodes are selected as landmark points, and the graph similarity matrix is obtained approximately by sparse representation, which reduces the storage cost. Meanwhile, to take into account both the geometric and the topological distribution of the nearest-neighbor samples, the Euclidean distance and the Kendall Tau distance are fused to measure the similarity between the landmarks and the other samples, which improves the clustering accuracy. A stacked autoencoder then replaces the eigendecomposition of the Laplacian matrix: the obtained similarity matrix is taken as the autoencoder's input, and the clustering accuracy is further improved by jointly learning the embedded representation and the clustering. Experiments on five large-scale datasets validate the effectiveness of the proposed algorithm.

Keywords: large-scale datasets; metric fusion; landmark representation; relative mass; sparse representation; stacked autoencoder; joint learning; embedded representation

CLC number: TP18    Document code: A    Article ID: 1673-4785(2020)04-0687-10

Citation: ZHANG Min, ZHOU Zhiping. An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation[J]. CAAI transactions on intelligent systems, 2020, 15(4): 687-696.

Clustering aims to partition data points into different clusters according to the similarity between them, so that intra-cluster similarity is maximized and inter-cluster similarity is minimized [1]. Traditional clustering methods, such as the K-means algorithm and fuzzy clustering algorithms, lack the ability to handle complex data structures and easily fall into local optima when the sample space is non-convex. In recent years, spectral clustering algorithms, which can cluster in sample spaces of arbitrary shape and converge to the global optimum, have shown good clustering performance on non-convex datasets, and in fields such as face recognition, community detection, and image segmentation they are widely

Received: 2019-12-02.
Corresponding author: ZHANG Min. E-mail: 15061882373_1@163.com.
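The abstract describes fusing the Euclidean and Kendall Tau distances to measure the similarity between landmarks and the other samples, and then building a sparse landmark representation. The sketch below illustrates that general idea only. The fusion rule (a convex combination with weight `alpha`), the Gaussian kernel with width `sigma`, the choice of the `r` nearest landmarks, and all function names are assumptions made for illustration, since this page does not give the paper's formulas. The landmarks are simply passed in; the relative-mass selection step is not reproduced here.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Ordinary Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kendall_tau_distance(x, y):
    """Normalized Kendall Tau distance: the fraction of coordinate pairs
    that x and y order in opposite ways (0 = same order, 1 = reversed).
    Tied pairs are not counted as discordant. Needs at least 2 coordinates."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


def fused_distance(x, y, scale, alpha=0.5):
    """Assumed fusion rule: convex combination of the two distances.
    The Euclidean term is divided by `scale` so both terms lie in [0, 1]."""
    return alpha * euclidean(x, y) / scale + (1 - alpha) * kendall_tau_distance(x, y)


def landmark_representation(points, landmarks, r=2, sigma=1.0, alpha=0.5):
    """Return an n x p matrix Z (list of rows). Each row keeps Gaussian
    weights for the r landmarks closest under the fused distance and is
    normalized to sum to 1; every other entry is 0, so Z is sparse."""
    # Largest point-to-landmark Euclidean distance, used for rescaling.
    scale = max(euclidean(p, l) for p in points for l in landmarks) or 1.0
    Z = []
    for p in points:
        d = [fused_distance(p, l, scale, alpha) for l in landmarks]
        nearest = sorted(range(len(landmarks)), key=d.__getitem__)[:r]
        w = {j: math.exp(-d[j] ** 2 / (2 * sigma ** 2)) for j in nearest}
        total = sum(w.values())
        Z.append([w.get(j, 0.0) / total for j in range(len(landmarks))])
    return Z


points = [[0, 1, 2], [0.1, 1.1, 2.1], [5, 4, 3], [5.2, 4.1, 3.0]]
landmarks = [[0, 1, 2], [5, 4, 3], [2, 2, 2]]
Z = landmark_representation(points, landmarks, r=2)
for row in Z:
    print([round(v, 3) for v in row])
```

Storing the n x p matrix Z with only r nonzero entries per row, instead of a full n x n similarity matrix, is what makes a landmark representation cheaper in memory when p is much smaller than n.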

