Vol. 9 No. 3    CAAI Transactions on Intelligent Systems    Jun. 2014

DOI: 10.3969/j.issn.1673-4785.201403067
Online publication URL: http://www.cnki.net/kcms/doi/10.3969/j.issn.16734785.201403067.html

An online multi-kernel learning algorithm for big data

ZHANG Gang, XIE Xiaoshan, HUANG Ying, WANG Chunru
(School of Automation, Guangdong University of Technology, Guangzhou 510006, China)

Abstract: In machine learning, the choice of kernel function strongly affects the performance of kernel-based learners, and an effective kernel function can often be obtained through kernel learning. We present a semi-supervised online multiple kernel learning algorithm for big data streams. The algorithm updates the current kernel function online, using only the segment of the stream that is currently being read. It adjusts the kernel parameters in a supervised manner using the labels available in the stream and, at the same time, modifies them in an unsupervised manner through manifold learning, so that the contour surfaces of the kernel follow a low-dimensional manifold of the data as closely as possible. The novelty of the algorithm is that it performs supervised and unsupervised kernel learning simultaneously and never rescans historical data, which reduces its time complexity and makes it suitable for kernel learning on big data and high-speed data streams. Its support for unsupervised learning handles the problem of partially missing labels in big data streams. Evaluation on synthetic datasets generated by MOA and on benchmark big data analysis datasets from the UCI data repository shows that the algorithm is effective.

Keywords: big data stream; online multi-kernel learning; manifold learning; data-dependent kernel; semi-supervised learning

CLC number: TP18    Document code: A    Article ID: 1673-4785(2014)03-0355-09

Chinese citation format: ZHANG Gang, XIE Xiaoshan, HUANG Ying, et al. A semi-supervised online multiple kernel learning algorithm for big data streams [J]. CAAI Transactions on Intelligent Systems, 2014, 9(3): 355-363.
English citation format: ZHANG Gang, XIE Xiaoshan, HUANG Ying, et al. An online multi-kernel learning algorithm for big data [J]. CAAI Transactions on Intelligent Systems, 2014, 9(3): 355-363.

Received: 2014-03-25. Published online: 2014-06-14.
Funding: National Natural Science Foundation of China (81373883).
Corresponding author: ZHANG Gang. E-mail: ipx@gdut.edu.cn.

With the rapid development and large-scale application of information technology, the rate at which data is generated and the scale at which it is stored are also growing rapidly. In particular, the spread and use of Web pages, social networks, and the Internet of Things mean that what people need to