CAAI Transactions on Intelligent Systems, Vol. 15 No. 2, Mar. 2020

DOI: 10.11992/tis.201811020  Online publication: http://kns.cnki.net/kcms/detail/23.1538.TP.20190829.1004.004.html

CLC number: TP181  Document code: A  Article ID: 1673-4785(2020)02-0344-08

Citation: ZHAO Yanwei, ZHU Fen, GUI Fangzhi, et al. Improved k-means algorithm based on extension distance[J]. CAAI transactions on intelligent systems, 2020, 15(2): 344–351.

Improved k-means algorithm based on extension distance

ZHAO Yanwei¹, ZHU Fen¹, GUI Fangzhi¹, REN Shedong², XIE Zhiwei¹, XU Chen¹

(1. Key Lab of Special Purpose Equipment and Advanced Manufacturing Technology, Ministry of Education & Zhejiang Province, Zhejiang University of Technology, Hangzhou 310014, China; 2.
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, China)

Abstract: An improved k-means algorithm was proposed that optimizes the initial cluster centers on the basis of extension distance. It targets problems that make existing algorithms cluster unevenly, such as poor-quality initial cluster centers and the tendency of the first initial cluster center to fall into a sparse region at the data boundary. First, the classical distance of each sample was mapped onto an extension interval, and the extension left-side and right-side distances were obtained with the extension side-distance calculation method. Then, the concept of average extension side distance was introduced. The average extension left-side distance and the average extension right-side distance were taken as quantitative indicators of sample density and cluster-center alienation, respectively. On this basis, selection criteria for the initial cluster centers were given. Compared with the traditional k-means algorithm, the improved algorithm selected more uniformly distributed initial cluster centers and clustered better. It achieved higher clustering accuracy and better balance, particularly on high-dimensional data.

Keywords: extension distance; k-means clustering algorithm; scaling factor; initial cluster center; density; alienation

Received: 2018-11-26. Published online: 2019-08-29.
Foundation items: National Natural Science Foundation of China (51875524); Zhejiang Provincial Public Welfare Technology Application Research Program (2017C31072).
Corresponding author: ZHAO Yanwei (1959-). E-mail: ywz@zjut.edu.cn.

Clustering is an important tool for data analysis. It partitions a dataset into several clusters so that objects within a cluster are compact and highly similar, while different clusters are clearly distinct and have minimal similarity to one another. It is widely used in data mining, image processing, and other fields [1-4]. The k-means algorithm is a commonly used dynamic clustering algorithm that clusters quickly, is simple to operate, and is efficient. However, it is also sensitive to the initial cluster centers and has weak global search ability, which makes
