正在加载图片...
IEEE TRANSACTIONS ON INFORMATION THEORY two weighted,undirected graphs G=(V,E1,A1)and Algorithm 2:IncreSim G2 =(V,E2,A2)on the same node set V is defined as aput:G1and{(△CT DQJs(G1,G2)=Hvn(G)-(Hvn(G1)+Hvn(G2))/2, Output:(DsI(Gk,G where G=(V,E1UE2,A)is an weighted graph with A= 1 d the degree sequence of the graph G1; A1/2vol(G1)+A2/2vol(G2). 2m←-1d4/2: Generally,the node sets of two graphs to be compared 3t(G)←log2(2m)-品∑-1fd4): 4 fork =1,...,K-1 do are not necessarily aligned.For simplicity,we restrict our 5 △d←the degree sequence of the signed graph discussion to the aligned case and recommend [54]for a more △Gk: detailed treatment of the unaligned case. Based on the quantum Jensen-Shannon divergence between △m←∑iew△d/2: 7 Compute a,b,y,z in Lemma 3 via iterating over graphs,we consider the following problem that has found applications in anomaly detection and multiplex network com- Vie: Compute H(Gk+1)and H1(Gk)based on pression. Lemma 3; Problem 2:Compute the square root of the quantum Jensen- 9 Shannon divergence between adjacent graphs in a stream of DsI(Gk,Gk+1)← graphs {Gk=(V,Ek,t)where tk is the timestamp of VH(Gk)-(H(G)+H1(Gk+1)/2: the graph Gk and t<t+1. 10 m←m+△m: Since VDoJs(G1,G2)is computationally expensive,we 11 foreach i∈Vdod,←d:+△d: propose a new distance measure based on structural infor- 2 return (DsI(G,G+) mation in Definition 8 and analyze its metric properties in Theorem 5. Definition 8 (Structural information distance between two graphs):The structural information distance between two Then we compute the structural information of Gk whose weighted,undirected graphs G1=(V,E1,A1)and G2 adjacent matrix AkAk/2vol(G&)+Ak+1/2vol(Gk+1)for (V,E2,A2)on the same node set V is defined as each k E [K-1].Since the degree of node i in Gk is ak=zd&n十2vol()and∑-,aik=l,the structural DsI(G1,G2)=VH(G)-(H(G1)+H(G2)/2, information of G isH(G)=-f(d)which takes where G=(V,E1 UE2,A)is an weighted graph with A= e(n)time for each k.Therefore,the total computational cost isΘ(2K-1)n). A1/2vol(G1)+A2/2vol(G2). Theorem 5(Properties of the distance measure Ds):The In practice,the graph stream is fully dynamic such that it would be more efficient to represent the graph stream as distance measure DsI(G1,G2)is a pseudometric on the space a stream of operations over time,rather than a sequence of undirected graphs: of graphs.Suppose that the graph stream is represented as DsI is symmetric,i.e.,DsI(G1,G2)=DsI(G2,G1); an initial base graph G1=(V,E1,t1)and a sequence of ·DsI is non-negative,i.e,Dsr(G,G2)≥0 where the ”1 equality holds if and only if operations [AGk =(Vk,E+,E-.k,t)on the graph ole8eaee% d4.2 for where t is the timestamp of the set E+k of edge insertions and the set E-k of edge deletions,and V is the subset of Gj nodes covered by EUE_k.We can view the operation Dsr obeys the triangle inequality,i.e., AGk as a signed network where the edge in Ek has positive DsI(G1,G2)+DsI(G2,G3)>DsI(G1,G3); weight +1 and the edge in Ek has negative weight-1. The degree of node i V in the operation AGk refers ·Ds is upper bounded by 1,ie,Ds(G1,G2)≤1 where to∑jew{6,)∈E+,k}-I{(i,i)∈E-,k}.Using the the equality holds if and only if min{di.1,di.2)=0 for information about previous graph Gk and current operation every node iV where di.j is the degree of node i in AGk,we can compute the entropy statistics of the current Gi graph G+1 incrementally and efficiently via the following Besides the metric properties,we further establish a connection lemma,whose proof can be found in the appendix between DsI and VDoJs by studying their extreme values, Lemma 3:Using the degree sequence d of the graph Gk. the results of which are summarized in Theorem 6. the structural information H(G),and the degree sequence Theorem 6(Connection between Doss and Dsu):Both Ad of the signed graph AGk,the structural information of the VDoJs(G1,G2)and DsI(G1,G2)attain the same maximum graph G+1 can be efficiently computed as value of 1 under the identical condition that min{di.1,di.2}= 0 for every node iV where di.j is the degree of node i in h(Gk+H)=f2m+△m》-a-f2m)+2mh1(G) G1. 2(m+△m) In order to compute the structural information distance between adjacent graphs in the graph stream {G where where m=∑-1d/2,△m=∑ieM△d4i/2,anda= G=(V,Ek,t),we first compute the structural informa- iev f(di+Adi)-f(di).Moreover,the structural infor- tion H(G)for each k EK,which takes e(Kn)time. mation of the averaged graph Gk between G&and Gk+1 canIEEE TRANSACTIONS ON INFORMATION THEORY 9 two weighted, undirected graphs G1 = (V, E1, A1) and G2 = (V, E2, A2) on the same node set V is defined as DQJS(G1, G2) = Hvn(G) − (Hvn(G1) + Hvn(G2))/2, where G = (V, E1 ∪ E2, A) is an weighted graph with A = A1/2vol(G1) + A2/2vol(G2). Generally, the node sets of two graphs to be compared are not necessarily aligned. For simplicity, we restrict our discussion to the aligned case and recommend [54] for a more detailed treatment of the unaligned case. Based on the quantum Jensen-Shannon divergence between graphs, we consider the following problem that has found applications in anomaly detection and multiplex network com￾pression. Problem 2: Compute the square root of the quantum Jensen￾Shannon divergence between adjacent graphs in a stream of graphs {Gk = (V, Ek, tk)} K k=1 where tk is the timestamp of the graph Gk and tk < tk+1. Since p DQJS(G1, G2) is computationally expensive, we propose a new distance measure based on structural infor￾mation in Definition 8 and analyze its metric properties in Theorem 5. Definition 8 (Structural information distance between two graphs): The structural information distance between two weighted, undirected graphs G1 = (V, E1, A1) and G2 = (V, E2, A2) on the same node set V is defined as DSI(G1, G2) = q H1(G) − (H1(G1) + H1(G2)) /2, where G = (V, E1 ∪ E2, A) is an weighted graph with A = A1/2vol(G1) + A2/2vol(G2). Theorem 5 (Properties of the distance measure DSI): The distance measure DSI(G1, G2) is a pseudometric on the space of undirected graphs: • DSI is symmetric, i.e., DSI(G1, G2) = DSI(G2, G1); • DSI is non-negative, i.e., DSI(G1, G2) ≥ 0 where the equality holds if and only if P di,1 n k=1 dk,1 = P di,2 n k=1 dk,2 for every node i ∈ V where di,j is the degree of node i in Gj ; • DSI obeys the triangle inequality, i.e., DSI(G1, G2) + DSI(G2, G3) ≥ DSI(G1, G3); • DSI is upper bounded by 1, i.e., DSI(G1, G2) ≤ 1 where the equality holds if and only if min{di,1, di,2} = 0 for every node i ∈ V where di,j is the degree of node i in Gj . Besides the metric properties, we further establish a connection between DSI and p DQJS by studying their extreme values, the results of which are summarized in Theorem 6. Theorem 6 (Connection between p p DQJS and DSI): Both DQJS(G1, G2) and DSI(G1, G2) attain the same maximum value of 1 under the identical condition that min{di,1, di,2} = 0 for every node i ∈ V where di,j is the degree of node i in Gj . In order to compute the structural information distance between adjacent graphs in the graph stream {Gk} K k=1 where Gk = (V, Ek, tk), we first compute the structural informa￾tion H1(Gk) for each k ∈ [K], which takes Θ(Kn) time. Algorithm 2: IncreSim Input: G1 and {∆Gk} K−1 k=1 Output: {DSI(Gk, Gk+1)} K−1 k=1 1 d ← the degree sequence of the graph G1; 2 m ← Pn i=1 di/2; 3 H1(G1) ← log2 (2m) − 1 2m Pn i=1 f(di); 4 for k = 1, . . . , K − 1 do 5 ∆d ← the degree sequence of the signed graph ∆Gk; 6 ∆m ← P i∈Vk ∆di/2; 7 Compute a, b, y, z in Lemma 3 via iterating over Vk; 8 Compute H1(Gk+1) and H1(Gk) based on Lemma 3; 9 D q SI(Gk, Gk+1) ← H1(Gk) − (H1(Gk) + H1(Gk+1))/2; 10 m ← m + ∆m; 11 foreach i ∈ Vk do di ← di + ∆di ; 12 return {DSI(Gk, Gk+1)} K−1 k=1 Then we compute the structural information of Gk whose adjacent matrix Ak = Ak/2vol(Gk) + Ak+1/2vol(Gk+1) for each k ∈ [K − 1]. Since the degree of node i in Gk is di,k = di,k 2vol(Gk) + di,k+1 2vol(Gk+1) and Pn i=1 di,k = 1, the structural information of Gk is H1(Gk) = − Pn i=1 f(di,k) which takes Θ(n) time for each k. Therefore, the total computational cost is Θ((2K − 1)n). In practice, the graph stream is fully dynamic such that it would be more efficient to represent the graph stream as a stream of operations over time, rather than a sequence of graphs. Suppose that the graph stream is represented as an initial base graph G1 = (V, E1, t1) and a sequence of operations {∆Gk = (Vk, E+,k, E−,k, tk)} K−1 k=1 on the graph where tk is the timestamp of the set E+,k of edge insertions and the set E−,k of edge deletions, and Vk is the subset of nodes covered by E+,k ∪ E−,k. We can view the operation ∆Gk as a signed network where the edge in E+,k has positive weight +1 and the edge in E−,k has negative weight −1. The degree of node i ∈ Vk in the operation ∆Gk refers to P j∈Vk I{(i, j) ∈ E+,k} − I{(i, j) ∈ E−,k}. Using the information about previous graph Gk and current operation ∆Gk, we can compute the entropy statistics of the current graph Gk+1 incrementally and efficiently via the following lemma, whose proof can be found in the appendix. Lemma 3: Using the degree sequence d of the graph Gk, the structural information H1(Gk), and the degree sequence ∆d of the signed graph ∆Gk, the structural information of the graph Gk+1 can be efficiently computed as H1(Gk+1) = f(2(m + ∆m)) − a − f(2m) + 2mH1(Gk) 2(m + ∆m) , where m = Pn i=1 di/2, ∆m = P P i∈Vk ∆di/2, and a = i∈Vk f(di + ∆di) − f(di). Moreover, the structural infor￾mation of the averaged graph Gk between Gk and Gk+1 can
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有