·412· CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 4

... brings the share of correctly recognized results close to 90%, 4 percentage points higher than the backward result, and this holds even more strongly for PP. This confirms that the history-based classifier model captures the complementarity of maximal-length phrase recognition between the forward and backward directions of a Chinese sentence, and that the fusion algorithm based on "divergence points" exploits part of this property. It also shows that fusion accuracy still has considerable room for improvement.

With the same SVM-based labeling system, the ideal fusion accuracy for NP differs little from the single-direction accuracy, which leaves very little room for improving the fusion algorithm. For the CRF system, the bidirectional labeling results for maximal-length phrases show essentially no complementarity.

5 Conclusion

This paper applies sequence labeling techniques based on complex machine learning methods, which are widely used for base phrase recognition, to the recognition of Chinese maximal-length noun phrases (MNP) and prepositional phrases (PP). It examines how well the algorithms fit the task, starting from the linguistic particularities of the task and the characteristics of the sequence labeling algorithms. Theoretical analysis and experiments show that classifier-based deterministic labeling is effective for recognizing maximal-length phrases, and that its forward and backward results are complementary to some degree. The fusion algorithm based on "divergence points", proposed on this basis, exploits that complementarity and reaches a relatively high fusion accuracy. The phrase recognition strategy proposed here also applies to other phrases or languages with similar properties, so it has some generality. The experiments indicate that fusion methods for bidirectional labeling of Chinese MNP and PP still leave much room for exploration, which motivates us to keep looking for more effective fusion strategies to further improve recognition accuracy.
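The notions of "divergence points" and ideal fusion accuracy used in the discussion above can be illustrated with a small sketch. It is not the paper's fusion algorithm. It assumes token-level B/I/O tags, treats each maximal run of positions where the forward and backward taggers disagree as one divergence span, and uses the gold tags as an oracle to pick a direction, which yields only an upper bound on what any real fusion rule could reach. All function names (`divergence_spans`, `oracle_fuse`, `accuracy`) and the toy tag sequences are hypothetical.

```python
from typing import List, Tuple


def divergence_spans(fwd: List[str], bwd: List[str]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) spans where the two tag sequences disagree."""
    if len(fwd) != len(bwd):
        raise ValueError("tag sequences must have equal length")
    spans: List[Tuple[int, int]] = []
    start = None
    for i, (f, b) in enumerate(zip(fwd, bwd)):
        if f != b and start is None:
            start = i                      # a disagreement run begins here
        elif f == b and start is not None:
            spans.append((start, i))       # the run ended just before i
            start = None
    if start is not None:                  # run reaches the end of the sentence
        spans.append((start, len(fwd)))
    return spans


def oracle_fuse(fwd: List[str], bwd: List[str], gold: List[str]) -> List[str]:
    """Oracle fusion: inside each divergence span, take the backward tags
    only when they match the gold tags; otherwise keep the forward tags.
    Assumption: this uses gold labels, so it measures an upper bound only."""
    fused = list(fwd)
    for s, e in divergence_spans(fwd, bwd):
        if bwd[s:e] == gold[s:e]:
            fused[s:e] = bwd[s:e]
    return fused


def accuracy(pred: List[str], gold: List[str]) -> float:
    """Token-level tag accuracy (the paper's own metric may be phrase-level)."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Toy example with made-up tags.
gold = ["B", "I", "I", "O", "B", "I", "O"]
fwd = ["B", "I", "O", "O", "B", "I", "O"]   # forward tagger: one error
bwd = ["B", "I", "I", "O", "O", "B", "O"]   # backward tagger: two errors

fused = oracle_fuse(fwd, bwd, gold)
print(divergence_spans(fwd, bwd))
print(accuracy(fwd, gold), accuracy(bwd, gold), accuracy(fused, gold))
```

When the oracle accuracy is much higher than either single direction, the two directions are complementary and a better fusion rule has something to gain. When it is about the same, as reported above for NP with SVMs and for the CRF system, there is little left for fusion to recover.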