No. 1    CHEN Xiaofeng, et al.: Gene function analysis with semi-supervised multi-label learning    ·89·

SML_SVM outperforms both MLSVM and Self-training MLSVM, achieving the best result on all five metrics. The conclusions regarding the parameters K and MaxIter are similar to those of the yeast gene function analysis experiment in Section 3.2, and the maximum number of iterations, MaxIter, has a relatively large effect on SML_SVM.
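The five metrics reported in Tables 6 to 8 can be illustrated with a minimal pure-Python sketch. It assumes the definitions commonly used in the multi-label literature (rank 1 is the highest-scoring label, coverage subtracts 1, and every sample has at least one relevant and one irrelevant label); the definitions used in this paper are given outside this section and may differ in detail. All function names are hypothetical.

```python
# Sketch of five common multi-label metrics (assumed standard definitions).
# scores[i][j] is the real-valued output for label j on sample i;
# true_sets[i] and pred_sets[i] are sets of label indices.

def ranks(s):
    """Rank of each label, 1 = highest score; ties broken by label index."""
    order = sorted(range(len(s)), key=lambda j: -s[j])
    r = [0] * len(s)
    for pos, j in enumerate(order, start=1):
        r[j] = pos
    return r

def hamming_loss(pred_sets, true_sets, n_labels):
    """Fraction of (sample, label) pairs that are misclassified."""
    wrong = sum(len(p ^ t) for p, t in zip(pred_sets, true_sets))
    return wrong / (len(true_sets) * n_labels)

def one_error(scores, true_sets):
    """Fraction of samples whose top-ranked label is not relevant."""
    miss = 0
    for s, t in zip(scores, true_sets):
        top = max(range(len(s)), key=s.__getitem__)
        miss += top not in t
    return miss / len(true_sets)

def coverage(scores, true_sets):
    """Average depth in the ranking needed to cover all relevant labels."""
    total = 0
    for s, t in zip(scores, true_sets):
        r = ranks(s)
        total += max(r[l] for l in t) - 1
    return total / len(true_sets)

def ranking_loss(scores, true_sets):
    """Average fraction of (relevant, irrelevant) pairs ordered wrongly."""
    total = 0.0
    for s, t in zip(scores, true_sets):
        neg = [j for j in range(len(s)) if j not in t]
        bad = sum(1 for p in t for q in neg if s[p] <= s[q])
        total += bad / (len(t) * len(neg))
    return total / len(true_sets)

def average_precision(scores, true_sets):
    """Average fraction of relevant labels ranked above each relevant label."""
    total = 0.0
    for s, t in zip(scores, true_sets):
        r = ranks(s)
        per_label = (sum(1 for k in t if r[k] <= r[l]) / r[l] for l in t)
        total += sum(per_label) / len(t)
    return total / len(true_sets)

if __name__ == "__main__":
    # Toy data, made up for illustration only.
    scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
    true_sets = [{0, 2}, {2}]
    pred_sets = [{j for j, v in enumerate(s) if v >= 0.5} for s in scores]
    print(hamming_loss(pred_sets, true_sets, 3))
    print(one_error(scores, true_sets), coverage(scores, true_sets))
    print(ranking_loss(scores, true_sets), average_precision(scores, true_sets))
```

Smaller values are better for the first four metrics and larger values are better for average precision, which matches the ↓ and ↑ markers in the tables.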
Table 6 Experimental results on the genbase dataset

| Metric | SML_SVM | MLSVM | Self-training MLSVM |
|---|---|---|---|
| Hamming Loss (×10⁻³) ↓ | 5.4545 ± 2.4 | 6.4554 ± 2.6 | 46.86 ± 25.9 |
| Ranking Loss (×10⁻³) ↓ | 4.454 ± 3.6 | 8.4249 ± 3.9 | 40.6812 ± 25.4 |
| One-error (×10⁻²) ↓ | 3.1818 ± 2.26 | 3.7771 ± 2.28 | 20.503 ± 8.44 |
| Coverage ↓ | 0.54091 ± 0.0854 | 0.59187 ± 0.0944 | 1.07159 ± 0.34 |
| Average Precision ↑ | 0.95989 ± 0.0379 | 0.928996 ± 0.452 | 0.59565 ± 0.043 |

Table 7 Experimental results for different values of K

| K | 2 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|
| Hamming Loss (×10⁻³) ↓ | 6.1818 | 6.0455 | 6.1818 | 5.4545 | 6.0445 | 6.6364 |
| Ranking Loss (×10⁻³) ↓ | 6.8832 | 6.3241 | 6.8832 | 4.454 | 6.3241 | 7.1352 |
| One-error (×10⁻²) ↓ | 4.4318 | 4.43045 | 4.4318 | 3.1818 | 2.7045 | 2.8409 |
| Coverage ↓ | 0.59773 | 0.57841 | 0.59773 | 0.58091 | 0.57841 | 0.60341 |
| Average Precision ↑ | 0.93665 | 0.94361 | 0.93665 | 0.93989 | 0.94361 | 0.93365 |

Table 8 Experimental results for different numbers of iterations when K = 3

| Iteration | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|
| Hamming Loss (×10⁻³) ↓ | 5.6364 | 5.6182 | 5.5501 | 5.4876 | 5.4545 |
| Ranking Loss (×10⁻³) ↓ | 6.8354 | 6.0274 | 5.2113 | 4.4421 | 4.454 |
| One-error (×10⁻²) ↓ | 5.012 | 4.7126 | 4.6654 | 4.5211 | 4.4315 |
| Coverage ↓ | 0.59091 | 0.58341 | 0.5673 | 0.5488 | 0.54091 |
| Average Precision ↑ | 0.93078 | 0.93869 | 0.9425 | 0.9434 | 0.95989 |

4 Conclusion

This paper poses the problem of semi-supervised multi-label learning for gene expression data and implements a semi-supervised multi-label support vector algorithm, SML_SVM. SML_SVM first uses the PT4 strategy to transform the semi-supervised multi-label learning problem into a set of semi-supervised single-label problems. It then classifies the unlabeled samples according to the maximum posterior probability principle and solves each semi-supervised single-label problem iteratively. Experiments on the yeast gene function analysis data and the genbase protein data show that SML_SVM can exploit the information in unlabeled samples and performs better than both self-training MLSVM and MLSVM. A drawback of SML_SVM is that, because it decomposes the multi-label problem into several unrelated single-label problems, the information shared among labels is not fully exploited. Future work will study how inter-label information affects semi-supervised multi-label learning.
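The procedure summarized in the conclusion (decompose by label, then iteratively label the unlabeled samples by maximum posterior probability and retrain) can be sketched as follows. This is not the authors' implementation. It assumes that PT4 denotes the one-binary-problem-per-label transformation, as the phrase "several unrelated single-label problems" suggests, and it swaps the SVM for a toy nearest-centroid classifier with a logistic posterior so that the example runs without external libraries. The names `sml_fit`, `sml_predict`, and `max_iter` (standing in for MaxIter) are hypothetical, and the parameter K is not modelled because its role is not described in this section.

```python
import math

def fit_centroids(X, y):
    """Toy stand-in for the SVM: one centroid per class (0 and 1)."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        if rows:
            cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def posterior(cents, x):
    """Assumed posterior P(y=1 | x): logistic of the centroid distance gap."""
    if 1 not in cents:
        return 0.0
    if 0 not in cents:
        return 1.0
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    z = max(-50.0, min(50.0, d1 - d0))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))

def solve_single_label(X_lab, y_lab, X_unl, max_iter=10):
    """Iteratively label X_unl by maximum posterior and retrain."""
    model = fit_centroids(X_lab, y_lab)
    y_unl = None
    for _ in range(max_iter):
        new_y = [1 if posterior(model, x) >= 0.5 else 0 for x in X_unl]
        if new_y == y_unl:  # labels stopped changing
            break
        y_unl = new_y
        model = fit_centroids(X_lab + X_unl, y_lab + y_unl)
    return model

def sml_fit(X_lab, Y_lab, X_unl, n_labels, max_iter=10):
    """One semi-supervised binary problem per label (assumed PT4)."""
    models = []
    for j in range(n_labels):
        y_j = [1 if j in labels else 0 for labels in Y_lab]
        models.append(solve_single_label(X_lab, y_j, X_unl, max_iter))
    return models

def sml_predict(models, x):
    """Return the set of labels whose posterior is at least 0.5."""
    return {j for j, m in enumerate(models) if posterior(m, x) >= 0.5}

if __name__ == "__main__":
    # Toy data, made up for illustration only.
    X_lab = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    Y_lab = [{0}, {0}, {1}, {1}]
    X_unl = [[0.2, 0.5], [4.8, 5.5], [1.0, 0.0], [6.0, 5.0]]
    models = sml_fit(X_lab, Y_lab, X_unl, n_labels=2, max_iter=5)
    print(sml_predict(models, [0.1, 0.1]), sml_predict(models, [5.0, 5.0]))
```

Because each label is solved independently, nothing learned for one label informs another, which is the limitation the conclusion identifies.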