正在加载图片...
IEEE TRANSACTIONS ON INFORMATION THEORY 10 be efficiently computed as Algorithm 4:3-Spectral Clustering H1(Gk)=-b-(2m-)f(c)-c(f(2m)-2mH1(Gk)-2): Input:The graph G=(V,E)of order n wherey=∑ewd,z=∑efd).c=,and Output:A cluster membership vector cl f0,1,2]n 1LLaplacian matrix of the graph G; b=∑ieMf( (+4m+公m d+△d 2v2←-eigenvector corresponding toλ2ofL: The pseudocode of our fast algorithm IncreSim for com- 3 v3 eigenvector corresponding to A3 of L; puting the structural information distance in a graph stream is 4 cl k-means clustering of [v2,v3]with k =3; shown in Algorithm 2.It starts by computing the structural 5 return cl information of the base graph Gi (line 1-3),which takes (n)time.In each following iteration,it first computes the value of a,b,c,y,z(line 5-7),then calculates the structural where a ranges over all bijections a:[→[and△ information distance between two adjacent graphs (line 8-9). represents the symmetric difference. finally updates the edge count m and the degree sequence d (line 10-11).The time cost of each iteration is e(V), 3)Empirical Results:The results are shown in Fig.2 and consequently the total time complexity is(n). Fig.3,from which we have the following observations: Observation I (Dynamics of graph entropy):Both the VI.CONNECTIONS WITH COMMUNITY STRUCTURE von Neumann graph entropy and structural information In this section.we discuss the connections between the are stationary with small fluctuations and linearly corre- graph entropy and the community structure in graphs. lated as cout varies. Observation 2 (Dynamics of eigenvalues):In the stochas- A.Empirical Analysis of Stochastic Block Model tic block model with two clusters of equal size,the second smallest eigenvalue A2 linearly increases and 1)Preparations:To study the connections between graph entropy and community structures in a specific ensemble of finally reaches a steady state as cout increases,while the eigenvalues A3 and A4 above it are stationary all the time. graphs,suppose that a graph G is generated by the stochastic block model.There are g groups of nodes,and each node In the stochastic block model with three clusters of equal v has a group label go{1,...,q}.Edges are generated size,both the second smallest eigenvalue A2 and the third smallest eigenvalue Ag linearly increase and finally reach independently according to a probability matrix P E[0,1]9x4, with P(Au=1)=P[gu,go].In the sparse case,we have a steady state as cout increases,while the eigenvaluesA and s above them are stationary all the time. P[a,b]=Cla,b]/n,where the affinity matrix C stays constant .Observation 3 (Phase transition):In the stochastic block in the limit noo.For simplicity we make a common assumption that the affinity matrix C has two distinct entries model with two clusters of equal size,both the detection cin and cout where Cla,b]cin if a b and cout if error of the spectral algorithm and the spectral gap A3-A2 ab.For any graph generated from the stochastic block undergoes a same phase transition as cout varies.The spectral gap A4-A3 is stationary and close to 0 all the model with two(three)groups,we use the spectral algorithm in Algorithm 3(Algorithm 4)to detect the community structure. time.For example,in Fig.2(b)when cout <7 the spectral algorithm can discover the true clusters correctly and A3- A2 is significantly larger than A-A3.When cout >11, Algorithm 3:2-Spectral Clustering the spectral algorithm works like a random guess and Input:The graph G=(V,E)of order n λ3-2 is mixed with入4-入g. Output:A cluster membership vector cl {0,1)" In the stochastic block model with three clusters of equal 1cl←0: size,both the detection error of the spectral algorithm and 2L Laplacian matrix of the graph G; the spectral gap A4-A3 undergo a same phase transition 3v2←-eigenvector corresponding to入2ofL; as cout varies.The spectral gap As-A is stationary and 4 for i=1,...,n do close to 0 all the time. 5ifv2[<0 then cl同=l; 6 return cl Empirically,we conclude that 1)Graph entropy and community structure:Both the von 2)Evaluation Metrics:For each synthetic graph,we com- Neumann graph entropy and structural information reveal pute the structural information,von Neumann graph entropy, nothing about the community structure and the assorta- Laplacian eigenvalues A2,A3,A4,...with small magnitude, tivity/disassortativity of graphs. spectral gaps +1-Ak,and the detection error. 2)Spectral gaps and community structure:If a graph has Specifically,let P={P1,...,P}and ={Q1,...,Qk} significant community structure with k clusters,then the be two k-partitions of V.We view P as the ground-truth com- spectral gap +1-should be significantly larger munity structure and Q as the detected community structure, than k+2-k+1 and Ak-Ak-1.Conversely,if there then the detection error is is a significant peak in the sequence of spectral gaps of a graph,the graph should have signif- min>lP△Qalg icant community structure that could be easily detected by some algorithms. i=1IEEE TRANSACTIONS ON INFORMATION THEORY 10 be efficiently computed as H1(Gk) = −b−(2m−y)f(c)−c(f(2m)−2mH1(Gk)−z), where y = P i∈Vk di , z = P i∈Vk f(di), c = 2m+∆m 4m(m+∆m) , and b = P i∈Vk f  di 4m + di+∆di 4(m+∆m)  . The pseudocode of our fast algorithm IncreSim for com￾puting the structural information distance in a graph stream is shown in Algorithm 2. It starts by computing the structural information of the base graph G1 (line 1-3), which takes Θ(n) time. In each following iteration, it first computes the value of a, b, c, y, z (line 5-7), then calculates the structural information distance between two adjacent graphs (line 8-9), finally updates the edge count m and the degree sequence d (line 10-11). The time cost of each iteration is Θ(|Vk|), consequently the total time complexity is Θ(n+ PK−1 k=1 |Vk|). VI. CONNECTIONS WITH COMMUNITY STRUCTURE In this section, we discuss the connections between the graph entropy and the community structure in graphs. A. Empirical Analysis of Stochastic Block Model 1) Preparations: To study the connections between graph entropy and community structures in a specific ensemble of graphs, suppose that a graph G is generated by the stochastic block model. There are q groups of nodes, and each node v has a group label gv ∈ {1, . . . , q}. Edges are generated independently according to a probability matrix P ∈ [0, 1]q×q , with P(Auv = 1) = P[gu, gv]. In the sparse case, we have P[a, b] = C[a, b]/n, where the affinity matrix C stays constant in the limit n → ∞. For simplicity we make a common assumption that the affinity matrix C has two distinct entries cin and cout where C[a, b] = cin if a = b and cout if a 6= b. For any graph generated from the stochastic block model with two(three) groups, we use the spectral algorithm in Algorithm 3(Algorithm 4) to detect the community structure. Algorithm 3: 2-Spectral Clustering Input: The graph G = (V, E) of order n Output: A cluster membership vector cl ∈ {0, 1} n 1 cl ← 0; 2 L ← Laplacian matrix of the graph G; 3 v2 ← eigenvector corresponding to λ2 of L; 4 for i = 1, . . . , n do 5 if v2[i] < 0 then cl[i] = 1; 6 return cl 2) Evaluation Metrics: For each synthetic graph, we com￾pute the structural information, von Neumann graph entropy, Laplacian eigenvalues λ2, λ3, λ4, . . . with small magnitude, spectral gaps λk+1 − λk, and the detection error. Specifically, let P = {P1, . . . , Pk} and Q = {Q1, . . . , Qk} be two k-partitions of V . We view P as the ground-truth com￾munity structure and Q as the detected community structure, then the detection error is min σ X k i=1 |Pi4Qσ(i) |, Algorithm 4: 3-Spectral Clustering Input: The graph G = (V, E) of order n Output: A cluster membership vector cl ∈ {0, 1, 2} n 1 L ← Laplacian matrix of the graph G; 2 v2 ← eigenvector corresponding to λ2 of L; 3 v3 ← eigenvector corresponding to λ3 of L; 4 cl ← k-means clustering of [v2, v3] with k = 3; 5 return cl where σ ranges over all bijections σ : [k] → [k] and 4 represents the symmetric difference. 3) Empirical Results: The results are shown in Fig. 2 and Fig. 3, from which we have the following observations: • Observation 1 (Dynamics of graph entropy): Both the von Neumann graph entropy and structural information are stationary with small fluctuations and linearly corre￾lated as cout varies. • Observation 2 (Dynamics of eigenvalues): In the stochas￾tic block model with two clusters of equal size, the second smallest eigenvalue λ2 linearly increases and finally reaches a steady state as cout increases, while the eigenvalues λ3 and λ4 above it are stationary all the time. In the stochastic block model with three clusters of equal size, both the second smallest eigenvalue λ2 and the third smallest eigenvalue λ3 linearly increase and finally reach a steady state as cout increases, while the eigenvalues λ4 and λ5 above them are stationary all the time. • Observation 3 (Phase transition): In the stochastic block model with two clusters of equal size, both the detection error of the spectral algorithm and the spectral gap λ3−λ2 undergoes a same phase transition as cout varies. The spectral gap λ4 − λ3 is stationary and close to 0 all the time. For example, in Fig. 2(b) when cout < 7 the spectral algorithm can discover the true clusters correctly and λ3− λ2 is significantly larger than λ4 − λ3. When cout > 11, the spectral algorithm works like a random guess and λ3 − λ2 is mixed with λ4 − λ3. In the stochastic block model with three clusters of equal size, both the detection error of the spectral algorithm and the spectral gap λ4 −λ3 undergo a same phase transition as cout varies. The spectral gap λ5 −λ4 is stationary and close to 0 all the time. Empirically, we conclude that 1) Graph entropy and community structure: Both the von Neumann graph entropy and structural information reveal nothing about the community structure and the assorta￾tivity/disassortativity of graphs. 2) Spectral gaps and community structure: If a graph has significant community structure with k clusters, then the spectral gap λk+1 − λk should be significantly larger than λk+2 − λk+1 and λk − λk−1. Conversely, if there is a significant peak in the sequence of spectral gaps {λi+1 −λi} n−1 i=1 of a graph, the graph should have signif￾icant community structure that could be easily detected by some algorithms
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有