[FIG. 7 panels: (a) the average F1-measures of different graphs on both datasets with the labeling proportion varied from 0.1 to 0.9; (b) the average F1-measures of the two graph merging strategies on both datasets; (c) the average F1-measure of the inter-view merged graph on both datasets with the weight parameter β varied from 0 to 1; (d) the average F1-measures, in box plots, of the two label propagation algorithms on both datasets.]

which stands for the linguistic view. The sharp rises at 0.1 on both data sets indicate the necessity of the cBoW view.
To verify that graph merging is superior to matrix concatenation, we present in Table 4 the averaged F1-measures obtained by applying the general label propagation algorithm to graphs built by matrix concatenation. “sur•lex•syn” refers to the graph built by concatenating the three coupled TF-IDF matrices, and “sur•lex•syn•lin” refers to the graph built by concatenating the three coupled TF-IDF matrices and the linguistic matrix. The former is compared to the intra-view graph merging strategy, while the latter is compared to the inter-view graph merging strategy. As shown in Table 4, both graph merging strategies outperform matrix concatenation on both data sets. Moreover, “sur•lex•syn•lin” performs worse than “sur•lex•syn,” which implies that matrix concatenation is not a good choice for integrating heterogeneous vector space models.

Effectiveness of reinforced label propagation. To study the effectiveness of the reinforced label propagation, we compared the general label propagation algorithm to the reinforced label propagation algorithm. Figure 7d depicts box plots of the results of applying the two label propagation algorithms to the three single graphs and the two merged graphs. It shows that the reinforced label propagation algorithm outperforms the general label propagation algorithm on both data sets regardless of which of the five graphs is used, which means that our enhancement to the general label propagation algorithm is effective and that the ordinal relation among reading levels should be utilized. Since preclassification is required to obtain the a priori labels, the reinforced label propagation provides a way to combine two weak classifiers into a stronger one.

External Corpus for Constructing Word Coupling Matrix

For RQ4, we investigated the effects of using an external corpus to construct the word coupling matrix. We collected two external corpora, one for each language: the Chinese Wikipedia (denoted Cwiki) and the English Wikipedia (denoted Ewiki), as shown in Table 5.
For the documents in Cwiki, we used the ICTCLAS tool (Zhang, 2013; Zhang et al., 2003) to perform word segmentation. Unlike the previous experiments, which construct the word coupling matrix from the target data set (CPT or ENCT) itself, these experiments verify whether an external text corpus can be used to construct universal word coupling matrices. Figure 8 depicts the performance

FIG. 7. The performance comparison among the graph merging strategies and the reinforced label propagation algorithm. [Color figure can be viewed at wileyonlinelibrary.com]

TABLE 4. Comparison between the graph merging strategies and the matrix concatenation strategies (averaged F1-measures).

Strategy                              CPT     ENCT
Matrix concatenation
  sur•lex•syn                         47.45   88.43
  sur•lex•syn•lin                     45.82   86.44
The intra-view merged graph Gc        49.67   88.67
The inter-view merged graph Gcl       51.54   91.16

444 JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, May 2019 DOI: 10.1002/asi
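The two strategies contrasted in Table 4 differ in where the views are combined: matrix concatenation joins the feature matrices first and builds a single graph, whereas graph merging builds one graph per view and then combines the graphs. The following is a minimal Python sketch of that difference, not the authors' implementation. Everything in it is an assumption made for illustration: the cosine kNN graph construction, the weighted-sum merge, the plain clamped label propagation loop, the toy data, and all function names (`knn_graph`, `merge_graphs`, `label_propagation`).

```python
import numpy as np

def knn_graph(X, k=2):
    """Symmetric cosine-similarity kNN graph over the rows of X (assumed construction)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # a node is never its own neighbor
    W = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[-k:]          # indices of the k most similar nodes
        W[i, nbrs] = np.maximum(S[i, nbrs], 0.0)
    return np.maximum(W, W.T)                 # symmetrize

def merge_graphs(graphs, weights):
    """Merge per-view graphs by a normalized weighted sum (an assumed merge rule)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * G for w, G in zip(weights, graphs))

def label_propagation(W, labels, n_classes, n_iter=200):
    """Plain label propagation; labels use -1 for unlabeled nodes."""
    row_sums = W.sum(axis=1, keepdims=True)
    P = W / np.where(row_sums == 0, 1.0, row_sums)   # row-stochastic transitions
    Y = np.zeros((W.shape[0], n_classes))
    known = labels >= 0
    Y[known, labels[known]] = 1.0
    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F                 # spread label mass along edges
        F[known] = Y[known]       # clamp the labeled nodes
    return F.argmax(axis=1)

# Toy data: six documents, two hypothetical views with different dimensionality.
view_a = np.array([[1.0, 0.1], [1.0, 0.2], [1.0, 0.0],
                   [0.1, 1.0], [0.2, 1.0], [0.0, 1.0]])
view_b = np.array([[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [1.0, 0.0, 0.1],
                   [0.0, 1.0, 0.0], [0.0, 1.0, 0.1], [0.1, 1.0, 0.0]])
labels = np.array([0, -1, -1, 1, -1, -1])     # one labeled document per level

# Strategy 1: concatenate the feature matrices, then build a single graph.
W_concat = knn_graph(np.hstack([view_a, view_b]))
pred_concat = label_propagation(W_concat, labels, n_classes=2)

# Strategy 2: build one graph per view, then merge the graphs.
W_a, W_b = knn_graph(view_a), knn_graph(view_b)
W_merged = merge_graphs([W_a, W_b], weights=[0.5, 0.5])
pred_merged = label_propagation(W_merged, labels, n_classes=2)

print("concatenation:", pred_concat)
print("graph merging:", pred_merged)
```

On this toy data both variants recover the same two groups, so the sketch shows only the mechanics of the two pipelines and does not reproduce the performance gap reported in Table 4. One plausible reading of that gap is that concatenation lets views with different scales and dimensionalities compete inside a single similarity computation, while merging normalizes each view into its own graph first.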