CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 14, No. 4, Jul. 2019
DOI: 10.11992/tis.201804013
Published online at: http://kns.cnki.net/kcms/detail/23.1538.tp.20190323.2251.002.html

Domain-named entity recognition based on feedback K-nearest semantic transfer learning

ZHU Yanhui(1,2), LI Fei(1,2), JI Xiangbing(1,2), ZENG Zhigao(1,2), XU Xiao(1,2)
(1. School of Computer, Hu'nan University of Technology, Zhuzhou 412008, China; 2. Hu'nan Key Laboratory of Intelligent Information Perception and Processing Technology, Zhuzhou 412008, China)

CLC number: TP391. Document code: A. Article ID: 1673-4785(2019)04-0820-11

Citation: ZHU Yanhui, LI Fei, JI Xiangbing, et al. Domain-named entity recognition based on feedback K-nearest semantic transfer learning[J]. CAAI transactions on intelligent systems, 2019, 14(4): 820-830.

Abstract: Domain-named entity recognition is an important foundation for constructing domain knowledge graphs. To address the scarcity of corpora in specialized domains, this paper constructs a deep-learning-based BiLSTM-CNN-CRFs network model and proposes a domain-named entity recognition method based on feedback K-nearest-neighbor semantic transfer learning. First, document vectors are trained separately on the specialized-domain corpus and the general-domain corpus, and the semantic similarity between the two corpora is computed using the Mahalanobis distance. For each specialized-domain sample, the K most semantically similar general-domain samples are selected for semantic transfer learning, and multiple transfer corpus sets are constructed. Then, the BiLSTM-CNN-CRFs network model performs domain-named entity recognition on the transfer corpus sets, and the recognition results are evaluated and fed back. Based on the feedback results, an appropriate K value is selected as the best threshold for semantic transfer learning. Experiments in the packaging and medical fields show that the proposed method achieves good recognition performance and can effectively address the scarcity of specialized-domain corpora.
Keywords: domain-named entity recognition; feedback K-nearest neighbor; semantic transfer learning; deep learning; CNN; Doc2Vec; Mahalanobis distance; packaging field; medical field

Named entity recognition (NER), a subtask of information extraction, refers to extracting entities with specific meaning from unstructured text, and it plays a vital role in structuring text. Because, in natural language processing, it ...

Received: 2018-04-10. Published online: 2019-03-25.
Funding: National Natural Science Foundation of China (61402165); Key Project of the Hunan Provincial Department of Education (15A049); Key Project of Hu'nan University of Technology (17ZBLWT001KT006); Hunan Provincial Postgraduate Research and Innovation Project (CX2017B688).
Corresponding author: LI Fei. E-mail: flytoskye@163.com.
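The selection procedure summarized in the abstract (Mahalanobis-distance K-nearest selection followed by feedback-based choice of K) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the document vectors have already been trained and are available as NumPy arrays, and that the covariance matrix is estimated from the pooled domain and general vectors, which the source does not specify. The names `mahalanobis_knn`, `build_transfer_corpus`, `select_best_k`, and the `evaluate` callback (a stand-in for training the BiLSTM-CNN-CRFs model on a transfer corpus set and scoring it) are all hypothetical.

```python
import numpy as np


def mahalanobis_knn(domain_vecs, general_vecs, k):
    """For each domain vector, return the indices of the k general vectors
    with the smallest Mahalanobis distance (nearest first)."""
    # Assumption: covariance is estimated from the pooled vectors.
    pooled = np.vstack([domain_vecs, general_vecs])
    cov = np.cov(pooled, rowvar=False)
    # Pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    # diff[i, j] = domain_vecs[i] - general_vecs[j], shape (n_domain, n_general, dim)
    diff = domain_vecs[:, None, :] - general_vecs[None, :, :]
    # Squared Mahalanobis distance: diff^T * cov_inv * diff for every pair.
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.argsort(d2, axis=1)[:, :k]


def build_transfer_corpus(domain_vecs, general_vecs, k):
    """Union of the general-sample indices selected for all domain samples."""
    return np.unique(mahalanobis_knn(domain_vecs, general_vecs, k))


def select_best_k(candidate_ks, evaluate):
    """Feedback step: score each candidate K and keep the best one.

    `evaluate` is a hypothetical callback that maps K to a quality score,
    for example the F1 of a tagger trained on the K-th transfer corpus set.
    """
    scores = {k: evaluate(k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    general = rng.normal(size=(200, 8))  # toy general-domain document vectors
    domain = rng.normal(size=(10, 8))    # toy specialized-domain document vectors

    # Toy score: number of selected general samples, standing in for a real
    # train-and-evaluate run of the recognition model.
    def toy_evaluate(k):
        return len(build_transfer_corpus(domain, general, k))

    best_k, scores = select_best_k([1, 3, 5], toy_evaluate)
    print(best_k, scores)
```

In a real pipeline, `evaluate` would be the expensive part, since each candidate K requires training and evaluating the recognition model on its own transfer corpus set. The set of candidate K values is therefore a design choice that trades search coverage against training cost.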