CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 13 No. 3, Jun. 2018
DOI: 10.11992/tis.201706092
Published online at: http://kns.cnki.net/kcms/detail/23.1538.TP.20180408.1131.012.html

Affix-based key technology for Uyghur proverb recognition

Munire Muhetaer 1,2,3, LI Xiao 1,2, YANG Yating 1,2, AZRAGUL 4, ZHOU Xi 1,2

(1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. Xinjiang Key Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China; 4. School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China)

Abstract: In natural language processing tasks such as natural language understanding, machine translation, and public opinion analysis, Uyghur proverb recognition is an important part of text entity recognition. To meet the need for Uyghur proverb informatization, this paper builds a relatively complete corpus of Uyghur proverbs. The grammatical and semantic structures of Uyghur proverbs are analyzed from the perspective of traditional linguistics, and a knowledge base of rules specific to Uyghur proverbs is constructed from their functional categories (affixes). This knowledge base is then combined with natural language processing techniques to implement a software system that can recognize Uyghur proverbs in text and translate them between Uyghur and Chinese. The system also lays a new foundation for computer understanding and processing of Uyghur text.

Keywords: Uyghur proverbs; proverb affix; proverb rules; affix coverage rate; proverb rule base; proverb corpus; recognition system

CLC number: TP391.1 Document code: A Article ID: 1673-4785(2018)03-0452-06

Citation format: Munire Muhetaer, LI Xiao, YANG Yating, et al. Affix-based key technology for Uyghur proverb recognition[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3): 452-457.

Received: 2017-06-30. Published online: 2018-04-08.
Funding: Open Project of the Key Laboratory of Xinjiang Uygur Autonomous Region (2015KL031); Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (2016A03007-3); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2015211B034); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030400); Social Science Foundation of Xinjiang Uygur Autonomous Region (2016CYY067).
Corresponding author: LI Xiao. E-mail: xiaoli@ms.xjb.ac.cn.

Uyghur is one of the more widely spoken languages in the Xinjiang Uygur Autonomous Region. Uyghur language informatization is an important part of the informatization of China's minority languages and scripts, and it has long received close attention from the Party and the state [1]. Uyghur proverbs in the Uyghur language