Graph Merging. By constructing graphs from the two views, we get four graphs: three correspond to the three coupled TF-IDF matrices (denoted G^sur, G^lex, and G^syn, respectively), and one corresponds to the linguistic features (denoted G^l). The former three graphs are homogeneous, since they share the same view, while the last graph is different. For the three homogeneous graphs, we provide an intra-view homogeneous graph merging strategy to merge them into one (denoted G^c). To combine the graphs from both views, we provide an inter-view heterogeneous graph merging strategy to merge graph G^c and graph G^l into the final graph G^cl.

Intra-view homogeneous graph merging. Given the three graphs (that is, G^sur, G^lex, and G^syn), where each node has k neighbors, we merge them into the graph G^c, where each node still has k neighbors.
The basic idea is to keep the common edges while removing edges that carry redundant information, as shown in Figure 4. Given a node v, we first reserve its neighbors that are common to all three graphs. Second, from the candidate nodes, which are neighbors of v in at least one graph, we select one by one the node that possesses the least number of common neighbors (that is, the nodes already selected into K^c(v)). The objective is to keep the number of triangles in G^c to a minimum. The edge weights of G^c are averaged over the corresponding edges appearing in the three graphs.

Inter-view heterogeneous graph merging. Considering that the edges of either G^c or G^l describe the relationship of documents on readability from a certain perspective, we reserve all the edges (that is, some nodes will have more than k neighbors) and use a factor β to balance their weights on the final graph G^cl. The following equation defines the strategy:

G^{cl}_{i,j} = \beta\, G^{c}_{i,j} + (1 - \beta)\, G^{l}_{i,j}    (10)

The parameter β ∈ [0, 1] specifies the relative importance of the two graphs. Clearly, the case of β = 0 or 1 reduces the graph to a single view.

Label propagation. Given a graph G constructed in the previous sections, its nodes are divided into two sets: the labeled set D_l = {d_1, d_2, …, d_x} and the unlabeled set D_u = {d_{x+1}, d_{x+2}, …, d_n}. The goal of label propagation is to propagate class labels from the labeled nodes (that is, documents with known reading levels) to the entire graph. We use a simplified label propagation algorithm presented in Subramanya et al. (2010), which has been proved effective (Kim et al., 2013). The algorithm iteratively updates the label distribution on a document node using the following equation:

q^{(i)}_d(y) = \frac{1}{\kappa_d} \Big( q^0_d(y)\,\delta(d \in D_l) + \sum_{v \in K(d)} G_{d,v}\, q^{(i-1)}_v(y) \Big)    (11)

On the left side of Equation 11, q^{(i)}_d(y) is the updated probability of y (that is, the reading level y) on a node d at the i-th iteration.
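As a rough illustration, the update of Equation 11 can be written in matrix form as follows. This is a minimal NumPy sketch under our own assumptions (a dense symmetric weight matrix, one-hot seed distributions, and function and variable names of our choosing), not the authors' implementation:

```python
import numpy as np

def label_propagation(G, seed_labels, n_levels, max_iter=30, tol=1e-3):
    """Simplified iterative label propagation (after Subramanya et al., 2010).

    G           : (n, n) symmetric edge-weight matrix (0 where no edge)
    seed_labels : dict mapping labeled node index -> reading level index
    n_levels    : number of reading levels
    """
    n = G.shape[0]
    # q0: one-hot initial distributions on labeled nodes; zero rows elsewhere,
    # which realizes the delta(d in D_l) term of Equation 11.
    q0 = np.zeros((n, n_levels))
    for d, y in seed_labels.items():
        q0[d, y] = 1.0
    q = q0.copy()
    for _ in range(max_iter):
        # Neighbor term: sum_v G[d, v] * q_v^(i-1)(y), computed for all d at once.
        new_q = q0 + G @ q
        # kappa_d: per-node normalizer so each row sums to 1.
        kappa = new_q.sum(axis=1, keepdims=True)
        kappa[kappa == 0.0] = 1.0  # guard isolated, unlabeled nodes
        new_q = new_q / kappa
        if np.abs(new_q - q).max() < tol:  # stop when changes are small enough
            q = new_q
            break
        q = new_q
    # Predicted level of each document = argmax of its final distribution.
    return q.argmax(axis=1)
```

Each iteration mixes the fixed seed distribution of a node with the weighted average of its neighbors' distributions from the previous iteration, then renormalizes.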
On the right side, κ_d is the normalizing constant that makes the level probabilities sum to 1, and q^0_d(y) is the initial probability of y on d if d is initially labeled (1 if d is labeled as level y and 0 otherwise). δ(x) is the indicator function. K(d) denotes the set of neighbors of d. The iteration stops when the changes in q^{(i)}_d(y) for all nodes and reading levels are small enough (for example, less than e^{-3}), or i exceeds a predefined number (for example, 30). For each unlabeled document d_i, its predicted reading level is y if the element q_{d_i}(y) is greatest in the latest label distribution q_{d_i}.

Reinforced label propagation. The above label propagation algorithm (Subramanya et al., 2010) treats the class labels as being on a nominal scale and ignores the ordinal relation among reading levels. We develop a reinforced label propagation algorithm to utilize the ordinal relationship. A preclassification using the linguistic features is required, which provides extra information to amplify the edge weights.

Let documents have m reading levels on an ordinal scale. Given the x labeled documents with reading levels {y_1, …, y_x}, a preclassifier can be trained and prelabels can then be predicted for the unlabeled documents, denoted {ŷ_{x+1}, …, ŷ_n}. Thus, we have the a priori labels of all the documents {z_1, …, z_x, z_{x+1}, …, z_n}, where z_i = y_i (i ∈ {1, …, x}) for labeled documents and z_j = ŷ_j (j ∈ {x+1, …, n}) for unlabeled documents.

Based on the a priori labels, we amplify the weights of the edges that connect two nodes with similar a priori labels. The reinforced label propagation iteratively updates the label distribution on a document node using the following enhanced equation:

q^{(i)}_d(y) = \frac{1}{\kappa_d} \Big( q^0_d(y)\,\delta(d \in D_l) + \sum_{v \in K(d)} G'_{d,v}\, q^{(i-1)}_v(y) \Big),  where  G'_{d,v} = (m - |z_d - z_v|)\, G_{d,v}    (12)

FIG. 4. Illustration of the intra-view homogeneous graph merging strategy.
[Color figure can be viewed at wileyonlinelibrary.com]

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY—May 2019, 439. DOI: 10.1002/asi
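The reinforced update of Equation 12 differs from Equation 11 only in the edge weights: each weight is amplified by how closely the two endpoints' a priori labels agree. The reweighting step can be sketched as follows (a NumPy sketch; the function name and array-based interface are our own illustration):

```python
import numpy as np

def reinforce_weights(G, z, m):
    """Amplify edges joining nodes with similar a priori labels (Equation 12).

    G : (n, n) edge-weight matrix
    z : length-n sequence of a priori reading levels (gold labels for labeled
        documents, preclassifier outputs for unlabeled ones)
    m : number of reading levels on the ordinal scale
    """
    z = np.asarray(z)
    # (m - |z_d - z_v|) is largest (= m) when the two a priori labels agree
    # and shrinks as they move apart on the ordinal scale.
    amplify = m - np.abs(z[:, None] - z[None, :])
    return amplify * G
```

The resulting matrix G' can then be fed to the same propagation loop in place of G, so edges between documents of similar predicted difficulty dominate the neighbor sum.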