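CAAI Transactions on Intelligent Systems, Vol. 11 No. 4, Aug. 2016
DOI: 10.11992/tis.201606009
Online publication URL: http://www.cnki.net/kcms/detail/23.1538.TP.20160808.0830.010.html

A hybrid method to recognize Vietnamese complex named entity incorporating entity properties

LIU Yanchao (1), GUO Jianyi (1,2), YU Zhengtao (1,2), ZHOU Lanjiang (1,2), YAN Xin (1,2), CHEN Xiuqin (3)

(1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 2. Key Laboratory of Pattern Recognition and Intelligent Computing of Yunnan College, Kunming 650500, China; 3. School of International Education, Kunming University of Science and Technology, Kunming 650093, China)

Abstract: Named entity recognition (NER) is a fundamental task in natural language processing. To address the difficulty of recognizing complex Vietnamese named entities and the resulting low F-scores, this paper proposes a hybrid Vietnamese NER method that incorporates an entity library. First, effective local and global features are selected according to the characteristics of the Vietnamese language and of its named entities, and a maximum entropy model is applied to recognize Vietnamese named entities. Second, Vietnamese named entities are recognized with a set of named entity rules formulated in this paper. Third, the two sets of results are combined on the principle that the rule-based results take priority and the statistical results serve as a supplement. Finally, after manual proofreading, the correctly labeled entities are added to the entity library, expanding it dynamically and providing richer corpus data and a firmer basis for formulating rules and selecting features. Experiments show that the method effectively combines the strengths of the rule-based and statistical approaches, lets each compensate for the other's weaknesses, and markedly improves precision, recall, and F-score.

Keywords: Vietnamese; entity library construction; entity recognition; maximum entropy; rules; entity characteristics; global features; local features

CLC number: TP391    Document code: A    Article ID: 1673-4785(2016)04-0503-10

Chinese citation format: 刘艳超, 郭剑毅, 余正涛, 等. 融合实体特性识别越南语复杂命名实体的混合方法[J]. 智能系统学报, 2016, 11(4): 503-512.
English citation format: LIU Yanchao, GUO Jianyi, YU Zhengtao, et al. A hybrid method to recognize complex Vietnamese named entity incorporating entity properties[J]. CAAI Transactions on Intelligent Systems, 2016, 11(4): 503-512.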
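The abstract states only that rule-based results take priority, that statistical results act as a supplement, and that proofread entities are fed back into the entity library. The Python sketch below shows one possible reading of those two steps. It rests on assumptions that do not come from the paper: entities are token-offset spans over space-separated Vietnamese syllables, the rule component is reduced to a longest-match lookup in the entity library, a statistical entity is kept only when it overlaps no rule-based entity, and the maximum entropy model's output is simply passed in as a list. The names `Entity`, `rule_match`, `merge`, `expand_library`, and `prf`, and the labels `PER`/`LOC`, are all hypothetical.

```python
"""Sketch of a rules-first hybrid NER merge with entity-library feedback.

Assumptions (not from the paper): entities are token-offset spans, the rule
component is reduced to longest-match lookup in the entity library, and the
maximum entropy model's predictions are supplied as a ready-made list.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token
    end: int    # index one past the last token
    label: str  # hypothetical tag set, e.g. "PER", "LOC", "ORG"


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the two spans share at least one token."""
    return a.start < b.end and b.start < a.end


def rule_match(tokens, library):
    """Hypothetical rule: greedy longest match of token sequences in the library."""
    found = []
    max_len = max((len(key) for key in library), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so a multi-syllable entry wins over a shorter one.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in library:
                found.append(Entity(i, i + n, library[key]))
                i += n
                break
        else:  # no library entry starts at token i
            i += 1
    return found


def merge(rule_entities, stat_entities):
    """Rules take priority; a statistical entity survives only if it overlaps no rule entity."""
    merged = list(rule_entities)
    for ent in stat_entities:
        if not any(overlaps(ent, r) for r in rule_entities):
            merged.append(ent)
    return sorted(merged, key=lambda e: (e.start, e.end))


def expand_library(library, tokens, confirmed):
    """Add manually confirmed entities to the library so later rule passes can use them."""
    for ent in confirmed:
        library[tuple(tokens[ent.start:ent.end])] = ent.label


def prf(predicted, gold):
    """Precision, recall and F1 under exact span-and-label matching."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = ["Nguyễn", "Văn", "An", "làm", "việc", "tại", "Hà", "Nội"]
    library = {("Hà", "Nội"): "LOC"}
    rules = rule_match(tokens, library)
    # Stand-in for maximum entropy output; the second span conflicts with a rule match.
    stat = [Entity(0, 3, "PER"), Entity(7, 8, "LOC")]
    merged = merge(rules, stat)
    print(merged)
    # Pretend proofreading confirmed every merged entity, then feed them back.
    expand_library(library, tokens, merged)
    print(rule_match(tokens, library))
```

Discarding any statistical span that touches a rule span is the bluntest reading of "rules first, statistics as a supplement"; a finer policy might compare labels or keep a statistical span that strictly contains a rule match. After the feedback step in the demo, the rule pass alone recovers the person name that previously only the statistical model found, which is the effect the dynamic library expansion is meant to have.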
The task of named entity recognition is to identify seven types of named entities in the text being processed: person names, place names, organization names, numbers, times, currencies, and percentages. Of these, person, place, and organization names are the hardest to recognize and also the three most important entity types; although numbers, times, currencies, and percentages are relatively simple, for the above

Received: 2016-06-02. Published online: 2016-08-08.
Foundation items: National Natural Science Foundation of China (61262041, 61472168, 61562052); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030).
Corresponding author: GUO Jianyi. E-mail: gjade86@hotmail.com.