CAAI Transactions on Intelligent Systems, Vol. 12 No. 6, Dec. 2017
DOI: 10.11992/tis.201706029
Online publication URL: http://kns.cnki.net/kcms/detail/23.1538.TP.20171109.1250.006.html

New criteria for evaluating the validity of clustering

XIE Juanying, ZHOU Ying, WANG Mingzhao, JIANG Weiliang
(School of Computer Science, Shaanxi Normal University, Xi'an 710062, China)

CLC number: TP108   Document code: A   Article ID: 1673-4785(2017)06-0873-10
Chinese citation format: 谢娟英, 周颖, 王明钊, 等. 聚类有效性评价新指标[J]. 智能系统学报, 2017, 12(6): 873–882.
English citation format: XIE Juanying, ZHOU Ying, WANG Mingzhao, et al. New criteria for evaluating the validity of clustering[J]. CAAI transactions on intelligent systems, 2017, 12(6): 873–882.

Abstract: Criteria for evaluating the validity of clustering fall into two categories: external and internal. Existing external evaluation indices do not account for skewed (class-imbalanced) clustering results, and existing internal evaluation indices have difficulty identifying the optimal number of clusters. To address these defects, we propose two external evaluation indices that consider both positive and negative class information, one based on the contingency table and the other on sample pairs, for evaluating the clustering results of datasets with arbitrary distributions. We also propose an internal evaluation index that uses variance to measure intra-cluster compactness and inter-cluster separation, and takes the ratio of inter-cluster separation to intra-cluster compactness as its measure. Experiments on datasets from the UCI
(University of California, Irvine) machine learning repository and on artificially simulated datasets show that the proposed internal index can effectively find the true number of clusters in a dataset, and that the proposed external indices based on the contingency table and on sample pairs can effectively evaluate clustering results on data containing skewed clusters and noise.
Keywords: clustering; validity of clustering; evaluation index; external criteria; internal criteria; F-measure; Adjusted Rand Index; STDI; S2; PS2

Received: 2017-06-08. Published online: 2017-11-09.
Funding: National Natural Science Foundation of China (61673251); Key Science and Technology Program of Shaanxi Province (2013K12-03-24); Graduate Innovation Fund of Shaanxi Normal University (2015CXS028, 2016CSY009); Key Project of the Fundamental Research Funds for the Central Universities (GK201701006).
Corresponding author: XIE Juanying. E-mail: xiejuany@snnu.edu.cn.

With the rapid development of artificial intelligence technology, machine learning has received unparalleled attention and application across industries and has achieved unprecedented success [1-5]. As an unsupervised learning method, cluster analysis is one of the main tools for data analysis in every industry. It aims to discover the data