CAAI Transactions on Intelligent Systems, Vol. 8 No. 5, Oct. 2013
DOI: 10.3969/j.issn.1673-4785.201305033
Published online at: http://www.cnki.net/kcms/detail/23.1538.TP.20130929.1105.006.html
CLC number: TP181. Document code: A. Article ID: 1673-4785(2013)05-439-07.
Citation: LIU Yanglei, LIANG Jiye, GAO Jiawei, et al. Semi-supervised multi-label learning algorithm based on Tri-training [J]. CAAI Transactions on Intelligent Systems, 2013, 8(5): 439-445.
Semi-supervised multi-label learning algorithm based on Tri-training

LIU Yanglei(1,2), LIANG Jiye(1,2), GAO Jiawei(1,2), YANG Jing(1,2)
(1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)

Abstract: Traditional multi-label learning is supervised learning and requires training samples with complete category labels. However, when the data set is large and the number of categories is large, it is very difficult to obtain a training set with complete labels. Therefore, within the framework of semi-supervised co-training, a semi-supervised multi-label learning algorithm based on Tri-training (SMLT) is proposed. In the learning stage, SMLT introduces a virtual label and then, for each pair of labels, uses the Tri-training co-training algorithm to train the corresponding classifier. In the prediction stage, a new sample is passed to the classifiers obtained above. The number of votes each label receives turns the multi-label learning problem into a label ranking problem, and the number of votes received by the virtual label is used as the threshold that splits the ranked labels. Comparative experiments on four commonly used multi-label datasets from UCI show that SMLT mostly outperforms the other compared algorithms on four evaluation metrics, which verifies the effectiveness of the algorithm.

Keywords: multi-label learning; semi-supervised learning; Tri-training

Received: 2013-05-09. Published online: 2013-09-29.
Funding: Preliminary Research Special Project of the National "973" Program (2011CB311805); Shanxi Province Science and Technology Research Program (20110321027-01); Shanxi Province Science and Technology Infrastructure Platform Construction Project (2012091002-0101).
Corresponding author: LIANG Jiye. E-mail: ljy@sxu.edu.cn.
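Multi-label learning [1] is one of the important research directions in machine learning. In a multi-label learning problem, a training sample may correspond to one or more distinct concept labels at the same time, which together express its semantics, and the learning task is to predict the set of concept labels for an unseen sample. Multi-label learning problems are common in the real world. For example, in image scene classification, a single image may carry several concept labels because it contains semantic concepts such as "trees", "sky", "lake", and "mountain".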
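The prediction stage described in the abstract (pairwise voting, then thresholding at the virtual label) can be illustrated with a small Python sketch. This is not the authors' implementation. The names `VIRTUAL`, `make_toy_pairwise`, and `predict_label_set` are hypothetical, and the pairwise classifiers are toy stand-ins that compare hand-set scores instead of classifiers trained with Tri-training. Treating a label as relevant only when it receives strictly more votes than the virtual label is also an assumption.

```python
from collections import Counter
from itertools import combinations

VIRTUAL = "virtual"  # hypothetical name for the virtual label


def make_toy_pairwise(labels, virtual_score=0.5):
    """Build one toy classifier per label pair (stand-in for Tri-training output).

    Each classifier picks the label with the higher hand-set score in x;
    the virtual label always scores virtual_score (an assumed value).
    """
    def score(x, label):
        return virtual_score if label == VIRTUAL else x[label]

    def make(a, b):
        return lambda x: a if score(x, a) >= score(x, b) else b

    pairs = combinations(list(labels) + [VIRTUAL], 2)
    return {(a, b): make(a, b) for a, b in pairs}


def predict_label_set(x, labels, pairwise):
    """Rank labels by pairwise votes and split the ranking at the virtual label."""
    all_labels = list(labels) + [VIRTUAL]
    votes = Counter({label: 0 for label in all_labels})
    for pair in combinations(all_labels, 2):
        votes[pairwise[pair](x)] += 1  # each pairwise classifier casts one vote
    ranking = sorted(labels, key=lambda label: -votes[label])
    # Assumption: a label is relevant only if it strictly beats the virtual label.
    relevant = [label for label in ranking if votes[label] > votes[VIRTUAL]]
    return ranking, relevant, dict(votes)


labels = ["tree", "sky", "lake", "mountain"]
x = {"tree": 0.9, "sky": 0.7, "lake": 0.2, "mountain": 0.6}  # toy scores
ranking, relevant, votes = predict_label_set(x, labels, make_toy_pairwise(labels))
print(ranking)   # ['tree', 'sky', 'mountain', 'lake']
print(relevant)  # ['tree', 'sky', 'mountain']
```

With four real labels plus the virtual label there are ten pairwise classifiers. In this toy input the virtual label wins one vote, so the three labels that win more than one vote form the predicted label set.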