GRAW+: A Two-View Graph Propagation Method With Word Coupling for Readability Assessment

Zhiwei Jiang
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: jiangzhiwei@outlook.com

Qing Gu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: guq@nju.edu.cn

Yafeng Yin
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: yafeng@nju.edu.cn

Jianxiang Wang
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: wjxnju@outlook.com

Daoxu Chen
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: cdx@nju.edu.cn

Existing methods for readability assessment usually construct inductive classification models to assess the readability of singular text documents based on extracted features, which have been demonstrated to be effective. However, they rarely make use of the interrelationship among documents on readability, which can help increase the accuracy of readability assessment. In this article, we adopt a graph-based classification method to model and utilize the relationship among documents using the coupled bag-of-words model. We propose a word coupling method to build the coupled bag-of-words model by estimating the correlation between words on reading difficulty. In addition, we propose a two-view graph propagation method to make use of both the coupled bag-of-words model and the linguistic features. Our method employs a graph merging operation to combine graphs built according to different views, and improves the label propagation by incorporating the ordinal relation among reading levels. Experiments were conducted on both English and Chinese data sets, and the results demonstrate both effectiveness and potential of the method.

Introduction

Readability assessment evaluates the reading difficulties of text documents, which are normally represented as discrete reading levels. Automatic readability assessment is a challenging task, which has attracted researchers' attention from the beginning of the last century (Collins-Thompson, 2014). Traditionally, it can be used by educationists to choose appropriate reading materials for students of different education or grade levels. In modern times, it can be used by web search engines to do personalized searches based on web users' educational backgrounds.

Existing methods for readability assessment usually concentrate on feature engineering and then applying inductive classification models to utilize the features. In the early stages, researchers proposed readability formulas to measure the readability of texts (Zakaluk & Samuels, 1988). These formulas are usually attained by linear regression on several easy-to-compute text features relevant to reading difficulty. Recently, by employing machine-learning techniques, classification-based methods have been proposed and demonstrated to be more effective than readability formulas (Benjamin, 2012; Collins-Thompson, 2014).
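To make the notion of a readability formula concrete, the following is a minimal sketch of how such a formula could be fitted by linear regression on two easy-to-compute surface features (average sentence length and average word length). It illustrates the traditional baseline described above, not the method proposed in this article. The function names, the choice of features, and the toy texts and reading levels are all assumptions made for illustration only.

```python
import re
import numpy as np

def surface_features(text):
    """Return [average words per sentence, average characters per word]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return [len(words) / len(sentences), sum(map(len, words)) / len(words)]

def fit_formula(texts, levels):
    """Least-squares fit of: level ~ w0 + w1 * sentence_length + w2 * word_length."""
    X = np.array([[1.0, *surface_features(t)] for t in texts])
    y = np.array(levels, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_level(w, text):
    """Apply the fitted formula to a new text."""
    return float(np.dot(w, [1.0, *surface_features(text)]))

# Hypothetical toy corpus; the reading levels are made up for illustration.
TOY_TEXTS = [
    "Go. Run. Sit.",
    "The cat sat. The dog ran.",
    "We like to go to the park when it is hot.",
    "Readers evaluate complicated documents carefully.",
]
TOY_LEVELS = [1, 1, 2, 4]

weights = fit_formula(TOY_TEXTS, TOY_LEVELS)
print("fitted weights (bias, sentence length, word length):", weights)
print("predicted level:", predict_level(weights, "The sun is hot. We play."))
```

A formula of this kind scores each document in isolation from a handful of surface statistics, which is the limitation that feature-rich classification models, and the document-relationship view taken in this article, aim to address.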
These methods

Received March 26, 2017; revised June 20, 2018; accepted July 14, 2018
© 2019 ASIS&T • Published online February 18, 2019 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.24123
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 70(5):433–447, 2019