·234· 智能系统学报 (CAAI Transactions on Intelligent Systems), Vol. 15

The experiments above show that the choice of the expert set has some influence on the proposed method. On imbalanced datasets, a randomly selected expert set may be strongly biased in its sample composition and thereby affect the final result. Manually selecting the expert set so that it contains roughly equal numbers of positive and negative samples mitigates this.

4 Conclusions

1) The framework achieves good results even when the number of workers is small or the labeling quality is low, and these results differ little from those obtained by increasing the number of labelers or improving the labeling quality. In other words, higher-quality results can be obtained while reasonably reducing cost;

2) Compared with existing identification-and-verification frameworks, the introduction of expert labels allows the framework to achieve good results even when labeling quality is low, and to further improve accuracy when labeling quality is high;

3) The framework provides an opportunity to correct noisy instances and add them back to the dataset.

The method applies only to binary labels. Extending it to the multi-class case becomes considerably more complicated and increases the deviation of the results. In addition, the method performs slightly worse on datasets with extremely imbalanced class distributions, and how to handle such data requires further study.

About the authors:

LI Yinan (李易南), master's student. Main research interests: artificial intelligence and pattern recognition.

WANG Shitong (王士同), professor and doctoral supervisor. Main research interests: artificial intelligence and pattern recognition. Has published nearly one hundred academic papers.