CAAI Transactions on Intelligent Systems, Vol. 15, No. 2, Mar. 2020
DOI: 10.11992/tis.201811020
Published online at: http://kns.cnki.net/kcms/detail/23.1538.TP.20190829.1004.004.html

Improved k-means algorithm based on extension distance

ZHAO Yanwei1, ZHU Fen1, GUI Fangzhi1, REN Shedong2, XIE Zhiwei1, XU Chen1
(1. Key Lab of Special Purpose Equipment and Advanced Manufacturing Technology, Ministry of Education & Zhejiang Province, Zhejiang University of Technology, Hangzhou 310014, China; 2. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, China)

Abstract: Existing clustering algorithms that optimize the initial cluster centers share a shortcoming: the first initial cluster center tends to fall in a non-dense region at the data boundary, which leads to unbalanced clustering results. To address this, an improved k-means algorithm that selects the initial cluster centers based on extension distance is proposed. First, the classical distances of the samples are mapped onto an extension interval, and the extension left-side distance and extension right-side distance are obtained using the extension side-distance calculation method. Then, the concept of average extension side distance is introduced, and the average extension left-side distance and the average extension right-side distance are taken as quantitative indicators of sample density and of the remoteness of cluster centers, respectively. On this basis, selection criteria for the initial cluster centers are given. Comparison with the traditional k-means algorithm shows that the initial cluster centers selected by the improved algorithm are more evenly distributed and that the clustering results are better; in particular, the improved algorithm achieves higher clustering accuracy and better balance on high-dimensional data.

Keywords: extension distance; k-means clustering algorithm; scaling factor; initial cluster center; intensity; alienation

CLC number: TP181 Document code: A Article ID: 1673-4785(2020)02-0344-08

Citation: ZHAO Yanwei, ZHU Fen, GUI Fangzhi, et al. Improved k-means algorithm based on extension distance[J]. CAAI transactions on intelligent systems, 2020, 15(2): 344–351.

Clustering is an important tool in data analysis. It partitions a data set into several classes so that samples within a cluster are compact and highly similar, while different clusters are clearly distinct and share minimal similarity. It is widely used in data mining, image processing, and other fields[1-4]. The k-means clustering algorithm is a commonly used dynamic clustering algorithm that is fast, simple to operate, and efficient, but it is also sensitive to the initial cluster centers and has weak global search capability, so that

Received: 2018-11-26. Published online: 2019-08-29.
Foundation items: National Natural Science Foundation of China (51875524); Zhejiang Provincial Public Welfare Technology Application Research Program (2017C31072).
Corresponding author: ZHAO Yanwei (1959-). E-mail: ywz@zjut.edu.cn.
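The sensitivity of k-means to its initial cluster centers, noted in the introduction, can be illustrated with a minimal sketch. The code below is plain Lloyd-style k-means in Python, not the extension-distance selection method proposed in this paper. The toy data set, the helper names sq_dist and kmeans, and the two hand-picked initializations are all assumptions made purely for illustration.

```python
# Illustrative sketch only: standard k-means (Lloyd iterations), NOT the
# paper's extension-distance method. Data and initial centers are made up.

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Run k-means from the given initial centers.

    Returns the final centers and the within-cluster sum of squared errors.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical toy data: two small groups close together, one group far away.
points = [(0, 0), (1, 0), (4, 0), (5, 0),
          (20, 0), (21, 0), (22, 0), (23, 0)]

# One initial center per group.
good_centers, good_sse = kmeans(points, [(0, 0), (4, 0), (20, 0)])

# Two initial centers inside the far group, one shared by the two near groups.
bad_centers, bad_sse = kmeans(points, [(0, 0), (20, 0), (23, 0)])

print(good_sse, bad_sse)  # prints "6.0 18.0"
```

Both runs converge, but the second stops at a local optimum whose sum of squared errors is three times larger. This is the kind of outcome that a criterion for choosing initial centers is intended to avoid.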