Vol. 12, No. 5, Oct. 2017
CAAI Transactions on Intelligent Systems (智能系统学报)
DOI: 10.11992/tis.201609020
Online publication: http://kns.cnki.net/kcms/detail/23.1538.TP.20170317.1937.006.html

Feature selection method based on high-dimensional k-nearest neighbors mutual information

ZHOU Hongbiao 1,2,3, QIAO Junfei 1,2
(1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; 2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China; 3. Faculty of Automation, Huaiyin Institute of Technology, Huai'an 223003, China)

Abstract: Feature selection plays an important role in the modeling and forecasting of multivariate series. This paper proposes a feature selection method based on data-driven high-dimensional k-nearest-neighbor mutual information. First, the data-driven k-nearest-neighbor method is extended to estimate the mutual information between high-dimensional feature variables. Next, an optimal ranking of all features is produced with a forward accumulation strategy, and irrelevant features are removed according to a preset number. Then, redundant features are identified and eliminated with a backward cross strategy, which finally yields an optimal subset of strongly relevant features. Using the Friedman data set, the Housing data set, and actual effluent total-phosphorus data from a wastewater treatment plant as examples, simulation experiments with a multilayer perceptron neural network forecasting model verify the effectiveness of the proposed method.

Keywords: feature selection; mutual information; k-nearest neighbor; high-dimensional mutual information; multilayer perceptron

CLC number: TP183    Document code: A    Article ID: 1673-4785(2017)05-0595-06

Chinese citation: 周红标, 乔俊飞. 基于高维k-近邻互信息的特征选择方法[J]. 智能系统学报, 2017, 12(5): 595-600.
English citation: ZHOU Hongbiao, QIAO Junfei. Feature selection method based on high-dimensional k-nearest neighbors mutual information[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 595-600.

Received: 2016-09-21. Published online: 2017-03-17.
Foundation item: Key Program of the National Natural Science Foundation of China (61533002).
Corresponding author: QIAO Junfei. E-mail: hyitzhb@163.com.
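As a rough, non-authoritative illustration of the pipeline summarized in the abstract, the Python sketch below pairs a Kraskov-style k-nearest-neighbor estimate of joint mutual information with a forward accumulation ranking and a simplified backward redundancy check. The helper names (knn_mi, select_features), the parameters n_irrelevant and tol, and the particular redundancy test are assumptions introduced here for illustration; the paper's data-driven estimator and backward cross strategy may differ in detail.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def knn_mi(X, y, k=3):
    # Kraskov-style k-nearest-neighbor estimate of the joint mutual
    # information I(X; y); X may contain one or several feature columns.
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n = len(y)
    joint = np.hstack([X, y])

    # Distance to the k-th neighbor in the joint space under the max-norm;
    # k + 1 because the query point itself is returned at distance zero.
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]

    tree_x, tree_y = cKDTree(X), cKDTree(y)
    # Count marginal-space neighbors strictly closer than eps (excluding self).
    nx = np.array([len(tree_x.query_ball_point(X[i], np.nextafter(eps[i], 0), p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], np.nextafter(eps[i], 0), p=np.inf)) - 1
                   for i in range(n)])

    # Clip small negative estimates, which can occur for nearly independent data.
    return max(0.0, digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1)))


def select_features(X, y, k=3, n_irrelevant=2, tol=1e-3):
    # Forward accumulation: repeatedly append the feature whose addition
    # maximizes the joint MI with the target, yielding a full ranking.
    remaining = list(range(X.shape[1]))
    ranked = []
    while remaining:
        scores = [knn_mi(X[:, ranked + [j]], y, k) for j in remaining]
        ranked.append(remaining.pop(int(np.argmax(scores))))

    # Discard the preset number of lowest-ranked (irrelevant) features.
    kept = ranked[:len(ranked) - n_irrelevant]

    # Simplified backward check: drop a feature if removing it changes the
    # joint MI of the kept subset with the target by less than tol.
    for j in list(kept):
        if len(kept) > 1:
            reduced = [i for i in kept if i != j]
            if knn_mi(X[:, kept], y, k) - knn_mi(X[:, reduced], y, k) < tol:
                kept = reduced
    return kept

For example, select_features(X, y, k=6, n_irrelevant=3) would rank all columns of X, drop the three lowest-ranked ones, and then prune redundant columns; the values of k, n_irrelevant, and tol are problem-dependent choices, not taken from the paper.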
Dimensionality reduction through feature selection makes it possible to build a compact identification model and effectively avoids the "curse of dimensionality" caused by an excessive number of input features, as well as the "overfitting" this brings to the learning model [1-3]. At present, the main feature selection methods include partial least squares regression (PLSR) [4], grey relational analysis (GRA) [5], and mutual information (MI) [6-7]. MI imposes no particular requirement on the sample distribution and can effectively capture nonlinear relationships between features, which makes it especially suitable for feature selection in multivariate series.
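As background for the role MI plays here (written in generic notation rather than the paper's own equation numbering), the mutual information between a feature block X and a target Y, and the Kraskov-type k-nearest-neighbor estimate that methods of this kind commonly build on, can be written as

I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y = H(X) + H(Y) - H(X,Y),

\hat{I}(X;Y) = \psi(k) + \psi(N) - \frac{1}{N}\sum_{i=1}^{N}\big[\psi\!\left(n_x(i)+1\right) + \psi\!\left(n_y(i)+1\right)\big],

where \psi is the digamma function, N is the number of samples, and n_x(i), n_y(i) count the samples whose distance to sample i in the respective marginal space is smaller than the distance to its k-th nearest neighbor in the joint space. The paper's data-driven estimator may differ from this standard form in its details.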