CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 16, No. 1, January 2021
DOI: 10.11992/tis.202003007

Semi-supervised class preserving locally linear embedding

DENG Tingquan, WANG Qiang
(College of Mathematical Sciences, Harbin Engineering University, Harbin 150001, China)

Abstract: Locally linear embedding (LLE) is an unsupervised nonlinear feature extraction method for high-dimensional data. To make the features it extracts better suited to classification and clustering, we propose a nonlinear feature extraction method called semi-supervised class preserving locally linear embedding (SSCLLE), which integrates semi-supervised information into LLE. First, pseudo-labels are assigned to the neighbors of the labeled samples to increase the number of labeled samples. Second, the distances between labeled samples are adjusted locally, reducing the distances between same-class samples and enlarging the distances between different-class samples. At the same time, constraint terms on the global same-class and different-class sample distances are added to the LLE optimization objective, so that in the extracted low-dimensional features same-class points lie close together and different-class points are separated from each other. In a series of experiments, the clustering accuracy and visualization quality of the proposed method are clearly higher than those of unsupervised LLE and existing semi-supervised manifold feature extraction methods, indicating that the extracted features preserve class structure well.

Keywords: nonlinear feature extraction; manifold learning; semi-supervised; labeled information; clustering; visualization

CLC number: TP181  Document code: A  Article ID: 1673-4785(2021)01-0098-10

Chinese citation format: 邓廷权, 王强. 半监督类保持局部线性嵌入方法 [J]. 智能系统学报, 2021, 16(1): 98–107.
English citation format: DENG Tingquan, WANG Qiang. Semi-supervised class preserving locally linear embedding[J]. CAAI transactions on intelligent systems, 2021, 16(1): 98–107.

With the rapid development of information technology, explosive growth in data volume has become one of the defining features of the big data era. In this setting, data are typically high-dimensional and sparse, which poses unprecedented challenges for data mining. Feature extraction is an effective means of handling high-dimensional data: by extracting low-dimensional characteristics of the data, it maps the high-dimensional feature space to a low-dimensional feature space in which the data can be analyzed and processed. It is usually divided into two types, linear feature extraction and nonlinear feature extraction. Nonlinear feature extraction does not depend on a linearity assumption, and for processing data with nonlinear structure

Received: 2020-03-04.
Funding: National Natural Science Foundation of China (11471001, 61872104).
Corresponding author: WANG Qiang. E-mail: 1005834631@qq.com.
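As a rough illustration of the first two steps summarized in the abstract, the following Python sketch assigns pseudo-labels to the nearest neighbors of labeled samples and then rescales the distances among labeled samples. It is not the authors' implementation: the neighbor count k, the scaling factors alpha and beta, the use of -1 to mark unlabeled samples, the first-come rule for conflicting pseudo-labels, and all function names are assumptions made for this example. The paper's actual adjustment rule and the constraint terms added to the LLE objective are not reproduced here.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (n_samples x n_features)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=2)

def pseudo_label_neighbors(X, labels, k=1):
    """Copy each labeled sample's label to its k nearest unlabeled neighbors.

    labels is an integer array; -1 marks an unlabeled sample (assumption).
    An unlabeled sample claimed by several labeled samples keeps the first
    label it receives (assumption).
    """
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbor
    out = labels.copy()
    for i in np.flatnonzero(labels >= 0):  # originally labeled samples only
        for j in np.argsort(D[i])[:k]:
            if out[j] == -1:
                out[j] = labels[i]
    return out

def adjust_distances(D, labels, alpha=0.5, beta=2.0):
    """Shrink same-class and enlarge different-class distances.

    Only pairs in which both samples carry a label are changed.
    alpha < 1 and beta > 1 are hypothetical scaling factors.
    """
    known = labels >= 0
    both = known[:, None] & known[None, :]
    same = labels[:, None] == labels[None, :]
    D_adj = D.copy()
    D_adj[both & same] *= alpha
    D_adj[both & ~same] *= beta
    return D_adj

# Toy data: two well-separated pairs, one labeled sample in each pair.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, -1, 1, -1])

new_labels = pseudo_label_neighbors(X, labels, k=1)
D_adj = adjust_distances(pairwise_distances(X), new_labels)
print(new_labels)  # [0 0 1 1]
```

The adjusted distance matrix could then be used for neighbor selection in a standard LLE routine, which is one plausible way to read the abstract's description of locally adjusting distances before embedding.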