CAAI Transactions on Intelligent Systems, Vol. 14 No. 2, Mar. 2019
DOI: 10.11992/tis.201709014
Published online at: http://kns.cnki.net/kcms/detail/23.1538.tp.20180420.1018.002.html

Complex Uyghur document image matching and retrieval based on modified SURF feature

ALIYA Batur 1, NURBIYA Yadikar 1, HORNISA Mamat 1, ALIMJAN Aysa 2, KURBAN Ubul 1
(1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; 2. Network and Information Center, Xinjiang University, Urumqi 830046, China)

Abstract: Retrieval based on the bag-of-words (BOW) model of local image features suffers from uncertainty in the cluster centers and from high computational complexity. To address these problems, this paper proposes two retrieval methods: one measures similarity using different types of distance, and the other ranks results by the number of matched points. First, the SURF feature of the document image is modified to effectively reduce the complexity of feature extraction. Second, FLANN (fast library for approximate nearest neighbors) bidirectional matching and KD-Tree+BBF matching are implemented for the FAST+SURF features, and the robustness of the features is verified under different transformation conditions. Finally, both retrieval methods are applied to a collected and organized database of various types of Uyghur document images. The experimental results indicate that the distance-based similarity measure ranks below match-count-based retrieval in complexity, and that both retrieval strategies meet the requirements of fast and accurate searching.

Keywords: complex document image; Uyghur document image; document image segmentation; feature extraction; SURF feature; FLANN bidirectional matching; KD-Tree+BBF matching; image retrieval

CLC number: TP391.1  Document code: A  Article ID: 1673-4785(2019)02-0296-10

Citation: ALIYA Batur, NURBIYA Yadikar, HORNISA Mamat, et al. Complex Uyghur document image matching and retrieval based on modified SURF feature[J]. CAAI Transactions on Intelligent Systems, 2019, 14(2): 296-305.

Received: 2017-09-17. Published online: 2018-04-24.
Funding: National Natural Science Foundation of China (61563052, 61163028, 61363064); Xinjiang University Doctoral Research Start-up Fund (BS150262); Xinjiang Uygur Autonomous Region University Scientific Research Program, Innovation Team Project (XJEDU2017T002).
Corresponding author: KURBAN Ubul. E-mail: urbanu@xju.edu.cn.

With the rapid development of information technology, advances in multimedia technology have made document images increasingly common in information exchange. Growing demand keeps enlarging the number of document images, which requires that, for their users, document image storage systems be able to