λ = 2,000 on ENCT) will result in an effective word coupling matrix.

The effect of word filtering. To investigate the effect of the word filtering strategy on the performance of the coupled BoW model, we vary the ratio α of filtered words and compute the average F1-measures resulting from the three coupled BoW matrices (that is, M^sur, M^lex, and M^syn). Random filtering, which removes words from the vocabulary at random, is depicted for comparison. From Figure 6c, we find that random filtering performs worse than our word filtering strategy on both data sets. By employing our word filtering strategy, stable performance can be attained for all three coupled BoW matrices on both data sets when no more than 40% of the words are filtered out.

The effect of the size of S. To investigate whether the size of S (that is, the sentence set) affects the performance of GRAW+, we vary the size of S by randomly removing sentences from it. Figure 6d depicts the average F1-measures resulting from the three coupled BoW matrices. From Figure 6d, on the Chinese data set CPT, the performance of GRAW+ suffers little from removing sentences, even if only 20% of the sentences are left for building the word coupling matrices. However, on the English data set ENCT, the mean performance drops evidently and the deviation increases evidently when too many sentences are removed. This suggests that accumulating a sufficiently large text corpus is required for building a suitable word coupling matrix for the coupled BoW model, and that factors other than the number of sentences may influence the corpus quality, which will be discussed later.

Effectiveness of Two-View Graph Propagation

For RQ3, we conducted experiments to validate the effectiveness of the graph merging strategies and the reinforced label propagation algorithm.

Effectiveness of graph merging.
We compared the graphs built on a single coupled BoW matrix (that is, G^sur, G^lex, and G^syn) to the intra-view merged graph (that is, G^c) and the inter-view merged graph (that is, G^cl). Figure 7a depicts the average F1-measures resulting after applying the general label propagation on these graphs. From Figure 7a, the merged graph G^c outperforms the three basic graphs on both data sets in most cases. Among the three singular matrices, G^syn performs best, especially on the English data set ENCT, where it can slightly outperform G^c when the labeling proportion is small (0.2–0.4). By combining the graph from the linguistic view, G^cl performs evidently better than G^c on both data sets, while G^l always performs the poorest. Figure 7b further validates the effectiveness of the graph merging strategies. By merging the graph from the linguistic view, all the cBoW-based graphs (green bars) show consistently improved performance (yellow bars). The intra-view merging strategy provides a stable improvement for all the graphs built from the cBoW view.

To study the effect of β (in Equation 10) on the performance of the merged graph G^cl, we present the results of applying the general label propagation on G^cl with varied β in Figure 7c. From Figure 7c, G^cl performs well when β is in the range [0.4, 0.8] on CPT and [0.2, 0.4] on ENCT. This means that the cBoW view requires more weight on CPT than on ENCT. Note that the graph with β = 0 equals G^l.

FIG. 6. The performance comparison among the word coupling matrices constructed from different perspectives: (a) comparison of the average F1-measure between the coupled and basic TF-IDF matrices; (b) the effects of η and λ on the performance of the word coupling matrix; (c) the effects of the word filtering rate α on the performance of the word coupling matrix; (d) the effects of the size of S on the performance of the word coupling matrix.
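Equation 10 is not reproduced in this excerpt; however, since the text states that β = 0 recovers G^l, the inter-view merge is consistent with a convex combination of the two views' affinity matrices, W^cl = β·W^c + (1 − β)·W^l. The sketch below illustrates that assumed merge together with a standard clamped label-propagation pass (the classic form of general label propagation; the paper's reinforced variant is not sketched here). All matrix values and names are illustrative, not taken from the paper.

```python
import numpy as np

def merge_graphs(W_c, W_l, beta=0.5):
    """Inter-view merge as a convex combination of the cBoW-view and
    linguistic-view affinity matrices (an assumed form of Equation 10;
    beta = 0 recovers the linguistic graph G^l, beta = 1 the cBoW graph G^c)."""
    return beta * W_c + (1.0 - beta) * W_l

def label_propagation(W, Y, labeled_mask, n_iter=100):
    """General label propagation in its classic clamped form: repeatedly
    average neighbor label scores on the row-normalized graph, resetting
    the labeled documents to their known labels after each step."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = P @ F
        F[labeled_mask] = Y[labeled_mask]  # clamp known readability labels
    return F.argmax(axis=1)

# Toy example: 4 documents, 2 readability levels, two views of similarity.
W_c = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
W_l = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]], float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], float)  # docs 0 and 3 labeled
labeled = np.array([True, False, False, True])

W_cl = merge_graphs(W_c, W_l, beta=0.6)
pred = label_propagation(W_cl, Y, labeled)
# The unlabeled documents inherit the level of their stronger-connected
# labeled neighbor: doc 1 follows doc 0, doc 2 follows doc 3.
```

Sweeping `beta` over [0, 1] in this setup mirrors the experiment in Figure 7c: larger values weight the cBoW view more heavily, and the best range is expected to differ between corpora.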
[Color figure can be viewed at wileyonlinelibrary.com]

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY—May 2019
DOI: 10.1002/asi