where G′ is a weighted graph whose edge weights are modified from G_{d,v} according to the difference in the current reading levels of their end nodes. For each unlabeled document d_i, its final reading level is y if q_{d_i}(y) is the greatest value in the final label distribution q_{d_i}.

Experiments

In this section, we conduct experiments on data sets of both English and Chinese to investigate the following four research questions:

RQ1: Does the proposed method (that is, GRAW+) outperform the state-of-the-art methods for readability assessment?
RQ2: What are the effects of the word coupling matrix on the performance of the coupled bag-of-words model?
RQ3: How effective is the two-view graph propagation method, including the graph merging strategies and the reinforced label propagation algorithm?
RQ4: Can introducing an external text corpus improve the quality of the word coupling matrix?

Corpus and Performance Measures

To evaluate our proposed method, we used two data sets. The first is CPT (Chinese primary textbook) (Jiang et al., 2014), which contains Chinese documents of six reading levels corresponding to six grades. The second is ENCT (English New Concept textbook) (Jiang et al., 2015), which contains English documents of four reading levels. Both data sets (shown in Table 2) are built from well-known textbooks in which documents are organized into grades by credible educationists. For the documents in CPT, we use the ICTCLAS tool (Zhang, 2013; Zhang, Yu, Xiong, & Liu, 2003) to perform word segmentation.

TABLE 2. Statistics of the English and Chinese data sets.
Data set  Language  #Grade  #Doc  #Sent   #Word
CPT       Chinese   6       637   16,145  234,372
ENCT      English   4       279   4,671   62,921

We conducted experiments on both CPT and ENCT using hold-out validation, which randomly divides a data set into labeled (training) and unlabeled (test) sets by stratified sampling. The labeling proportion is varied to investigate the performance of a method under different circumstances. To reduce variability, 100 rounds of hold-out validation are performed in each case, and the validation results are averaged over all rounds. To tune the hyperparameters, we randomly choose one partition from the training set as the development set. We chose precision (P), recall (R), and F1-measure (F1) as the performance measures.
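For concreteness, the evaluation protocol above can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the generic `assess` callable, the macro-averaging of P/R/F1, and the use of scikit-learn's StratifiedShuffleSplit are assumptions made for the sketch.

```python
# Minimal sketch of repeated stratified hold-out validation, assuming a generic
# readability assessor assess(train_docs, train_levels, test_docs) that returns
# predicted reading levels (the name and signature are hypothetical).
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import precision_recall_fscore_support

def holdout_evaluate(docs, levels, assess, label_proportion=0.5, rounds=100, seed=0):
    splitter = StratifiedShuffleSplit(n_splits=rounds,
                                      train_size=label_proportion,
                                      random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(docs, levels):
        pred = assess([docs[i] for i in train_idx],
                      [levels[i] for i in train_idx],
                      [docs[i] for i in test_idx])
        p, r, f1, _ = precision_recall_fscore_support(
            [levels[i] for i in test_idx], pred,
            average="macro", zero_division=0)
        scores.append((p, r, f1))
    # Average P, R, and F1 over all rounds.
    return tuple(np.mean(scores, axis=0))
```

The labeling proportion passed to `holdout_evaluate` corresponds to the varied proportion of labeled (training) documents described above.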
Comparisons to the State-of-the-Art Methods

To address RQ1, we implement the following readability assessment methods and compare our method GRAW+ with them:

• SMOG (McLaughlin, 1969) and FK (Kincaid et al., 1975) are two widely used readability formulas (their standard published forms are sketched after this list). We retain their features and refit the coefficients on both data sets to match the reading (grade) levels.
• SUM (Collins-Thompson & Callan, 2004) is a word-based method, which trains one unigram model for each reading level and applies model smoothing among the reading levels.
• V&M (Vajjala & Meurers, 2012) is one of the current best readability assessment methods for English, which adopts three groups of features for classification. As the majority of the features are designed specifically for English, we run V&M on ENCT only.
• Jiang (Jiang et al., 2014) is a readability assessment method for Chinese. It adopts five groups of features and designs an ordinal multiclass classification scheme with voting. We run Jiang on CPT only.
• SG-NN is a word embedding-based readability assessment method proposed by Tseng et al. (2016). In SG-NN, the representation of a document is generated by summing the word embeddings of all words in the document. The word embedding model used is Skip-Gram. The classification model is a regularized neural network with one hidden layer.
• SG-KM-SVM is a word embedding-based readability assessment method proposed by Cha et al. (2017). In SG-KM-SVM, the representation of a document is generated by applying average pooling to the word embeddings and cluster memberships of all words in the document (a sketch of both embedding-based document representations appears after this list). The word embedding model used is Skip-Gram. The cluster memberships are generated by K-means. An SVM (Support Vector Machine) is used to predict the reading level of a document.
• SVM (Support Vector Machine) and LR (Logistic Regression) are two classification models that have been widely used for readability assessment in previous studies (Feng et al., 2010; Jiang et al., 2014).
• TSVM (Transductive SVM) (Joachims, 1998) is a classical transductive method, which has not been applied to readability assessment. Since GRAW+ is also a transductive method, we run TSVM here as a baseline.
• OLR (Ordinal LR) (McCullagh, 1980) is a variant of LR that can predict on an ordinal scale. As the reinforced label propagation in GRAW+ also exploits the ordinal relation among reading levels, we run OLR here as another baseline.
• Bi-LP is a graph propagation method that applies label propagation on a complex graph (Gao et al., 2015; Jiang, 2011). Bi-LP builds two separate subgraphs from the cBoW view and the linguistic view, and connects them using a bipartite subgraph. The label propagation algorithm is performed on the integrated graph and leads to two distributions for each document.
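For reference, the two readability formulas can be written out as below. This is a sketch only: the coefficients shown are the original published ones, whereas the experiments above refit the coefficients on CPT and ENCT.

```python
# Standard forms of the SMOG and FK readability formulas; the coefficients are
# the original published ones (McLaughlin, 1969; Kincaid et al., 1975) and serve
# only as an illustration, since the paper refits them per data set.
import math

def fk_grade(total_words, total_sentences, total_syllables):
    # Flesch-Kincaid grade level.
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words) - 15.59)

def smog_grade(polysyllable_count, total_sentences):
    # SMOG grade; polysyllables are words with three or more syllables.
    return 1.0430 * math.sqrt(polysyllable_count * (30.0 / total_sentences)) + 3.1291
```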
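The two embedding-based baselines differ only in how a document vector is formed before classification. The sketch below illustrates one plausible reading of the descriptions above; the gensim/scikit-learn objects, the one-hot treatment of cluster membership, and the concatenation of the two pooled parts in SG-KM-SVM are assumptions, not the authors' code.

```python
# Sketch of the SG-NN and SG-KM-SVM document representations, assuming a trained
# gensim Skip-Gram model w2v (gensim.models.Word2Vec with sg=1) and a fitted
# sklearn.cluster.KMeans km over the same word vectors; all names are illustrative.
import numpy as np

def sg_nn_doc_vector(tokens, w2v):
    # SG-NN: sum the Skip-Gram embeddings of all in-vocabulary words.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.sum(vecs, axis=0)

def sg_km_svm_doc_vector(tokens, w2v, km):
    # SG-KM-SVM: average-pool the word embeddings and the one-hot K-means
    # cluster memberships of all in-vocabulary words, then concatenate them.
    vecs = np.asarray([w2v.wv[t] for t in tokens if t in w2v.wv])
    pooled_embedding = vecs.mean(axis=0)
    membership = np.zeros(km.n_clusters)
    for cluster_id in km.predict(vecs):
        membership[cluster_id] += 1.0
    pooled_membership = membership / len(vecs)
    return np.concatenate([pooled_embedding, pooled_membership])
```

The resulting document vectors would then be fed to the regularized neural network (SG-NN) or the SVM (SG-KM-SVM) described above.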