IEEE TRANSACTIONS ON INFORMATION THEORY 1 On the Similarity between von Neumann Graph Entropy and Structural Information:Interpretation, Computation,and Applications Xuecheng Liu,Luoyi Fu,Xinbing Wang,and Chenghu Zhou Abstract-The von Neumann graph entropy is a measure downstream tasks such as entropy-driven network design,graph of graph complexity based on the Laplacian spectrum.It has comparison,and community obfuscation. recently found applications in various learning tasks driven by the networked data.However,it is computationally demanding Index Terms-Spectral graph theory,Laplacian spectrum, and hard to interpret using simple structural patterns.Due to spectral polarization,community obfuscation. the close relation between the Laplacian spectrum and the degree sequence,we conjecture that the structural information,defined as the Shannon entropy of the normalized degree sequence,might I.INTRODUCTION be a good approximation of the von Neumann graph entropy that VIDENCE has rapidly grown in the past few years that is both scalable and interpretable. In this work,we thereby study the difference between the graphs are ubiquitous in our daily life;online social structural information and the von Neumann graph entropy networks,metabolic networks,transportation networks,and named as entropy gap.Based on the knowledge that the degree collaboration networks are just a few examples that could be sequence is majorized by the Laplacian spectrum,we for the represented precisely by graphs.One important issue in graph first time prove that the entropy gap is between 0 and log,e in analysis is to measure the complexity of these graphs [2], any undirected unweighted graphs.Consequently we certify that [3]which refers to the level of organization of the structural the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy,scala- features such as the scaling behavior of degree distribution, bility,and interpretability simultaneously.This approximation is community structure,core-periphery structure,etc.In order further applied to two entropy-related tasks:network design and to capture the inherent structural complexity of graphs,many graph similarity measure,where a novel graph similarity measure entropy based graph measures [3]-[8]are proposed,each of and the corresponding fast algorithms are proposed.Meanwhile, which is a specific form of the Shannon entropy for different we show empirically and theoretically that maximizing the von Neumann graph entropy can effectively hide the community types of distributions extracted from the graphs. structure,and then propose an alternative metric called spectral As one of the aforementioned entropy based graph com- polarization to guide the community obfuscation. plexity measures,the von Neumann graph entropy,defined Our experimental results on graphs of various scales and types as the Shannon entropy of the spectrum of the trace rescaled show that the very small entropy gap readily applies to a wide Laplacian matrix of a graph (see Definition 1),is of special range of simple/weighted graphs.As an approximation of the von Neumann graph entropy,the structural information is the only interests to scholars and practitioners [1],[9]-[15].This spec- one that achieves both high efficiency and high accuracy among trum based entropy measure distinguishes between different the prominent methods.It is at least two orders of magnitude graph structures.For instance,it is maximal for complete faster than SLaQ [1]with comparable accuracy.Our structural graphs,minimal for graphs with only a single edge,and information based methods also exhibit superior performance in takes on intermediate values for ring graphs.Historically,the entropy measure originates from quantum information theory Manuscript received February 18,2021:revised November 9,2021:ac- and is used to describe the mixedness of a quantum system cepted January 2.2022.Date of publication TBD:date of current ver. which is represented as a density matrix.It is Braunstein et sion TBD.This work was supported by National Key R&D Program of China (No.2018YFB2100302).NSF China (No.42050105,62020106005 al.that first use the von Neumann entropy to measure the 62061146002.61960206002.61822206.61832013.61829201),2021 Tencent complexity of graphs by viewing the scaled Laplacian matrix AI Lab RhinoBird Focused Research Program (No.JR202132),and the as a density matrix [8]. Program of Shanghai Academic/Technology Research Leader under Grant No.18XD1401800.(Corresponding author:Xinbing Wang.) Built upon the Laplacian spectrum,the von Neumann graph Xuecheng Liu and Xinbing Wang are with the Department of Elec- entropy is a natural choice to capture the graph complexity tronic Engineering,Shanghai Jiao Tong University,Shanghai,200240 China since the Laplacian spectrum is well-known to contain rich in- (email:liuxuecheng@sjtu.edu.cn,xwang8@sjtu.edu.cn). Luoyi Fu is with the Department of Computer Science and En- formation about the multi-scale structure of graphs [16],[17]. gineering,Shanghai Jiao Tong University,Shanghai,200240 China As a result,it has recently found applications in downstream (email:yiluofu@sjtu.edu.cn). tasks of complex network analysis and pattern recognition. Chenghu Zhou is with the Institute of Geographical Science and Natural Resources Research,Chinese Academy of Sciences,Beijing.100101 China For example,the von Neumann graph entropy facilitates the (email:zhouch@lreis.ac.cn). measure of graph similarity via Jensen-Shannon divergence Communicated by R.Talmon,Associate Editor for Signal Processing. which could be used to compress multilayer networks [13]and Copyright (c)2017 IEEE.Personal use of this material is permitted. However,permission to use this material for any other purposes must be detect anomalies in graph streams [9].As another example, obtained from the IEEE by sending a request to pubs-permissions@ieee.org. the von Neumann graph entropy could be used to design edge
IEEE TRANSACTIONS ON INFORMATION THEORY 1 On the Similarity between von Neumann Graph Entropy and Structural Information: Interpretation, Computation, and Applications Xuecheng Liu , Luoyi Fu , Xinbing Wang , and Chenghu Zhou Abstract—The von Neumann graph entropy is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by the networked data. However, it is computationally demanding and hard to interpret using simple structural patterns. Due to the close relation between the Laplacian spectrum and the degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might be a good approximation of the von Neumann graph entropy that is both scalable and interpretable. In this work, we thereby study the difference between the structural information and the von Neumann graph entropy named as entropy gap. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we for the first time prove that the entropy gap is between 0 and log2 e in any undirected unweighted graphs. Consequently we certify that the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. This approximation is further applied to two entropy-related tasks: network design and graph similarity measure, where a novel graph similarity measure and the corresponding fast algorithms are proposed. Meanwhile, we show empirically and theoretically that maximizing the von Neumann graph entropy can effectively hide the community structure, and then propose an alternative metric called spectral polarization to guide the community obfuscation. Our experimental results on graphs of various scales and types show that the very small entropy gap readily applies to a wide range of simple/weighted graphs. As an approximation of the von Neumann graph entropy, the structural information is the only one that achieves both high efficiency and high accuracy among the prominent methods. It is at least two orders of magnitude faster than SLaQ [1] with comparable accuracy. Our structural information based methods also exhibit superior performance in Manuscript received February 18, 2021; revised November 9, 2021; accepted January 2, 2022. Date of publication TBD; date of current version TBD. This work was supported by National Key R&D Program of China (No. 2018YFB2100302), NSF China (No. 42050105, 62020106005, 62061146002, 61960206002, 61822206, 61832013, 61829201), 2021 Tencent AI Lab RhinoBird Focused Research Program (No. JR202132), and the Program of Shanghai Academic/Technology Research Leader under Grant No. 18XD1401800. (Corresponding author: Xinbing Wang.) Xuecheng Liu and Xinbing Wang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240 China (email:liuxuecheng@sjtu.edu.cn, xwang8@sjtu.edu.cn). Luoyi Fu is with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240 China (email:yiluofu@sjtu.edu.cn). Chenghu Zhou is with the Institute of Geographical Science and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101 China (email:zhouch@lreis.ac.cn). Communicated by R. Talmon, Associate Editor for Signal Processing. Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. downstream tasks such as entropy-driven network design, graph comparison, and community obfuscation. Index Terms—Spectral graph theory, Laplacian spectrum, spectral polarization, community obfuscation. I. INTRODUCTION E VIDENCE has rapidly grown in the past few years that graphs are ubiquitous in our daily life; online social networks, metabolic networks, transportation networks, and collaboration networks are just a few examples that could be represented precisely by graphs. One important issue in graph analysis is to measure the complexity of these graphs [2], [3] which refers to the level of organization of the structural features such as the scaling behavior of degree distribution, community structure, core-periphery structure, etc. In order to capture the inherent structural complexity of graphs, many entropy based graph measures [3]–[8] are proposed, each of which is a specific form of the Shannon entropy for different types of distributions extracted from the graphs. As one of the aforementioned entropy based graph complexity measures, the von Neumann graph entropy, defined as the Shannon entropy of the spectrum of the trace rescaled Laplacian matrix of a graph (see Definition 1), is of special interests to scholars and practitioners [1], [9]–[15]. This spectrum based entropy measure distinguishes between different graph structures. For instance, it is maximal for complete graphs, minimal for graphs with only a single edge, and takes on intermediate values for ring graphs. Historically, the entropy measure originates from quantum information theory and is used to describe the mixedness of a quantum system which is represented as a density matrix. It is Braunstein et al. that first use the von Neumann entropy to measure the complexity of graphs by viewing the scaled Laplacian matrix as a density matrix [8]. Built upon the Laplacian spectrum, the von Neumann graph entropy is a natural choice to capture the graph complexity since the Laplacian spectrum is well-known to contain rich information about the multi-scale structure of graphs [16], [17]. As a result, it has recently found applications in downstream tasks of complex network analysis and pattern recognition. For example, the von Neumann graph entropy facilitates the measure of graph similarity via Jensen-Shannon divergence, which could be used to compress multilayer networks [13] and detect anomalies in graph streams [9]. As another example, the von Neumann graph entropy could be used to design edge
IEEE TRANSACTIONS ON INFORMATION THEORY 2 centrality measure [15],vertex centrality measure [18].and To address the second question,we conduct to our knowl- entropy-driven networks [19]. edge a first study of the difference between the structural information and the von Neumann graph entropy,which we A.Motivations name as entropy gap. However,despite the popularity received in applications,the main obstacle encountered in practice is the computational B.Contributions inefficiency of the exact von Neumann graph entropy.Indeed, as the spectrum based entropy measure,the von Neumann To study the entropy gap,we are based on a fundamental graph entropy suffers from computational inefficiency since relationship between the Laplacian spectrum A and the degree the computational complexity of the graph spectrum is cubic in sequence d in undirected graphs:d is majorized by A.In the number of nodes.Meanwhile,the existing approximation other words,there is a doubly stochastic matrix P such that approaches [1].[9],[10]such as the quadratic approximation, PA=d.Leveraging the majorization and the classic Jensen's fail to capture the presence of non-trivial structural patterns inequality,we prove that the entropy gap is strictly larger than that seem to be encapsulated in the spectrum based entropy 0 in arbitrary undirected graphs.By exploiting the Jensen's measure.Therefore,there is a strong desire to find a good gap [21]which is an inverse version of the classic Jensen's approximation that achieves accuracy,scalability,and inter- inequality,we further prove that the entropy gap is no more pretability simultaneously. than for any undirected graphwhere A is the Instead of starting from scratch,we are inspired by the well- weight matrix,6 is the minimum degree,and vol(G)is the known knowledge that there is a close relationship between volume of the graph.The upper bound on the entropy gap the combinatorial characteristics of a graph and the algebraic turns out to be log2e for any unweighted graph.And both properties of its associated matrices [20].To illustrate this,we the constant lower and upper bounds on the entropy gap can plot the Laplacian spectrum and the degree sequence together be further sharpened using more advanced knowledge about in a same figure for four representative real-world graphs the Lapalcian spectrum and the degree sequence,such as the and four synthetic graphs.As shown in Fig.1,the sorted Grone-Merris majorization [22]. spectrum sequence and the sorted degree sequence almost In a nutshell,our paper makes the following contributions: coincide with each other.The similar phenomenon can also Theory and interpretability:Inspired by the close relation be observed in larger scale free graphs,which indicates that it between the Laplacian spectrum and the degree sequence, is possible to reduce the approximation of the von Neumann we for the first time bridge the gap between the von graph entropy to the time-efficient computation of simple node Neumann graph entropy and the structural information by degree statistics.Therefore,we ask without hesitation the first proving that the entropy gap is between 0 and log2e in research question, any unweighted graph.To the best of our knowledge,the RQ1:Does there exist some non-polynomial function constant bounds on the approximation error in unweighted such that∑-1p(d,/∑-1d)is close to the von Neumann graphs are sharper than that of any existing approaches graph entropy?Here di is the degree of the node i in a graph with provable accuracy,such as FINGER [9].Therefore, of order n. the answers to both RQ1 and RQ2 are YES!As shown We emphasize the non-polynomial property of the func- in Table I,the relative approximation error is around 1% tion since most of previous works that are based on the for small graphs,which is practically good.Besides,the polynomial approximations fail to fulfill the interpretabil- structural information provides a simple geometric interpre- ity.The challenges from both the scalability and the in- tation of the von Neumann graph entropy as a measure of terpretability are translated directly into two requirements degree heterogeneity.Thus,the structural information is a on the function to be determined.First,the explicit ex- good approximation of the von Neumann graph entropy that pression of must exist and be kept simple to ensure the achieves provable accuracy,scalability,and interpretability simultaneously. interpretability of the sum over degree statistics.Second, the function o should be graph-agnostic to meet the scal- Applications and efficient algorithms:Using the structural ability requirement,that is,o should be independent from information as a proxy of the von Neumann graph entropy the graph to be analyzed.One natural choice yielded by with bounded error(the entropy gap).we develop fast algo- the entropic nature of the graph complexity measure for the rithms for two entropy based applications:network design non-polynomial function o is ()=-zlog2z.The sum and graph similarity measure.Since the network design -∑-1(d,/∑=1d)log2(d,/∑-1d)has been named problem aims to maximize the von Neumann graph entropy, we combine the greedy method and a pruning strategy to as one-dimensional structural information by Li et al.[3]in accelerate the searching process.For the graph similarity a connected graph since it has an entropic form and captures measure,we propose a new distance measure based on the the information of a classic random walker in a graph.We structural information and the Jensen-Shannon divergence. extend this notion to arbitrary undirected graphs.Following We further show that the proposed distance measure is a the question RQ1,we raise the second research question, pseudometric and devise a fast incremental algorithm to RQ2:Is the structural information an accurate proxy of the compute the similarity between adjacent graphs in a graph von Neumann graph entropy? stream
IEEE TRANSACTIONS ON INFORMATION THEORY 2 centrality measure [15], vertex centrality measure [18], and entropy-driven networks [19]. A. Motivations However, despite the popularity received in applications, the main obstacle encountered in practice is the computational inefficiency of the exact von Neumann graph entropy. Indeed, as the spectrum based entropy measure, the von Neumann graph entropy suffers from computational inefficiency since the computational complexity of the graph spectrum is cubic in the number of nodes. Meanwhile, the existing approximation approaches [1], [9], [10] such as the quadratic approximation, fail to capture the presence of non-trivial structural patterns that seem to be encapsulated in the spectrum based entropy measure. Therefore, there is a strong desire to find a good approximation that achieves accuracy, scalability, and interpretability simultaneously. Instead of starting from scratch, we are inspired by the wellknown knowledge that there is a close relationship between the combinatorial characteristics of a graph and the algebraic properties of its associated matrices [20]. To illustrate this, we plot the Laplacian spectrum and the degree sequence together in a same figure for four representative real-world graphs and four synthetic graphs. As shown in Fig. 1, the sorted spectrum sequence and the sorted degree sequence almost coincide with each other. The similar phenomenon can also be observed in larger scale free graphs, which indicates that it is possible to reduce the approximation of the von Neumann graph entropy to the time-efficient computation of simple node degree statistics. Therefore, we ask without hesitation the first research question, RQ1: Does there exist some non-polynomial function φ such that Pn i=1 φ di/ Pn j=1 dj is close to the von Neumann graph entropy? Here di is the degree of the node i in a graph of order n. We emphasize the non-polynomial property of the function φ since most of previous works that are based on the polynomial approximations fail to fulfill the interpretability. The challenges from both the scalability and the interpretability are translated directly into two requirements on the function φ to be determined. First, the explicit expression of φ must exist and be kept simple to ensure the interpretability of the sum over degree statistics. Second, the function φ should be graph-agnostic to meet the scalability requirement, that is, φ should be independent from the graph to be analyzed. One natural choice yielded by the entropic nature of the graph complexity measure for the non-polynomial function φ is φ(x) = −x log2 x. The sum − Pn i=1 di/ Pn j=1 dj log2 di/ Pn j=1 dj has been named as one-dimensional structural information by Li et al. [3] in a connected graph since it has an entropic form and captures the information of a classic random walker in a graph. We extend this notion to arbitrary undirected graphs. Following the question RQ1, we raise the second research question, RQ2: Is the structural information an accurate proxy of the von Neumann graph entropy? To address the second question, we conduct to our knowledge a first study of the difference between the structural information and the von Neumann graph entropy, which we name as entropy gap. B. Contributions To study the entropy gap, we are based on a fundamental relationship between the Laplacian spectrum λ and the degree sequence d in undirected graphs: d is majorized by λ. In other words, there is a doubly stochastic matrix P such that Pλ = d. Leveraging the majorization and the classic Jensen’s inequality, we prove that the entropy gap is strictly larger than 0 in arbitrary undirected graphs. By exploiting the Jensen’s gap [21] which is an inverse version of the classic Jensen’s inequality, we further prove that the entropy gap is no more than log2 e·tr(A 2 ) δ·vol(G) for any undirected graph G, where A is the weight matrix, δ is the minimum degree, and vol(G) is the volume of the graph. The upper bound on the entropy gap turns out to be log2 e for any unweighted graph. And both the constant lower and upper bounds on the entropy gap can be further sharpened using more advanced knowledge about the Lapalcian spectrum and the degree sequence, such as the Grone-Merris majorization [22]. In a nutshell, our paper makes the following contributions: • Theory and interpretability: Inspired by the close relation between the Laplacian spectrum and the degree sequence, we for the first time bridge the gap between the von Neumann graph entropy and the structural information by proving that the entropy gap is between 0 and log2 e in any unweighted graph. To the best of our knowledge, the constant bounds on the approximation error in unweighted graphs are sharper than that of any existing approaches with provable accuracy, such as FINGER [9]. Therefore, the answers to both RQ1 and RQ2 are YES! As shown in Table I, the relative approximation error is around 1% for small graphs, which is practically good. Besides, the structural information provides a simple geometric interpretation of the von Neumann graph entropy as a measure of degree heterogeneity. Thus, the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. • Applications and efficient algorithms: Using the structural information as a proxy of the von Neumann graph entropy with bounded error (the entropy gap), we develop fast algorithms for two entropy based applications: network design and graph similarity measure. Since the network design problem aims to maximize the von Neumann graph entropy, we combine the greedy method and a pruning strategy to accelerate the searching process. For the graph similarity measure, we propose a new distance measure based on the structural information and the Jensen-Shannon divergence. We further show that the proposed distance measure is a pseudometric and devise a fast incremental algorithm to compute the similarity between adjacent graphs in a graph stream
IEEE TRANSACTIONS ON INFORMATION THEORY 6 spectrum spectrum A spectrum 20 A spectrum o degree 10 o degree 10 bb 50 o degree o degree 10 20 30 20 40 500 200 400 index index index index (a)Zachary's karate club (b)Dolphins (c)Email (d)Celegans A spectrum 500 A spectrum o degree 2 o degree △spectrum △spectrum o degree o degree 200 400 200 400 200 400 200 400 index index index index (e)ER graph of order 500 (f)BA graph of order 500 (g)Complete graph of order 500 (h)Ring graph of order 500 Fig.1:The close relation between Laplacian spectra and degree sequence in four representative real-world graphs(a-d)and four common synthetic graphs (e-h).Both the Laplacian spectra and degree sequence are sorted in non-increasing order.The x-axis represents the index of the sorted sequences,and the y-axis represents the value of Laplacian spectrum and degree. TABLE I:Structural information and von Neumann graph entropy of the graphs in Fig.1. Measurements Zachary Dolphins Email Celegans ER BA Complete Ring structural information 1 4.7044 5.7005 9.5665 7.9257 8.9497 8.5739 8.9659 8.9658 von Neumann graph entropy 4.5504 5.5489 9.5029 7.8631 8.9302 8.4935 8.9629 8.5231 entropy gap△H 0.1540 0.1516 0.0636 0.0626 0.0195 0.0804 0.0030 0.4427 relative error △H 3.38% 2.73% 0.67% 0.80% 0.22% 0.95% 0.03% 5.19% Connection with community structure:While the two- between the Laplacian spectrum and the degree sequence dimensional structural information [3]encodes the com- (Fig.1): munity structure,we find empirically that both the von More straightforward results to illustrate the tightness of Neumann graph entropy and the one-dimensional structural the approximation (Table D); information are uninformative of the community structure. A fine-grained analysis on the lower bound of the entropy However,they are effective in adversarial attacks on com- gap to show that the entropy gap is actually strictly larger munity detection,since maximizing the von Neumann graph than 0(Theorem 1); entropy will make the Laplacian spectrum uninformative A theoretical analysis on the entropy gap of several of the community structure.Using the similar idea,we classes of graphs,including complete graph,complete propose an alternative metric called spectral polarization bipartite graph,path graph,and ring graph(Table III and which is both effective and efficient in hiding the community Appendix B); structure. An analysis and discussion over the connection between Extensive experiments and evaluations:We use 3 random graph entropy and community structure (Section VI and graph models,9 real-world static graphs,and 2 real-world Appendix A-D). temporal graphs to evaluate the properties of the entropy Roadmap:The remainder of this paper is organized as gap and the proposed algorithms.The results show that the follows.We review three related issues in Section II.In entropy gap is small in a wide range of simple/weighted Section III we introduce the definitions of the von Neumann graphs.And it is insensitive to the change of graph size. graph entropy,structural information,and the notion of entropy Compared with the prominent methods for approximating gap.Section IV shows the close relationship between von the von Neumann graph entropy,the structural information Neumann graph entropy and structural information by bound- is superior in both accuracy and computational speed.It is ing the entropy gap.Section V presents efficient algorithms at least 2 orders of magnitude faster than the accurate SLaQ for two graph entropy based applications.In Section VI we [1]algorithm with comparable accuracy.Our proposed discuss the connection between von Neumann graph entropy algorithms based on structural information also exhibit and community structure.Section VII provides experimental superb performance in downstream tasks such as entropy- results.Section VIII offers some conclusions and directions driven network design,graph comparison,and community for future research. obfuscation. An earlier version of this work appeared in our WWW 2021 II.RELATED WORK conference paper [23].In addition to revising the conference We review three issues related to the von Neumann graph version,this TIT submission includes the following new entropy:computation,interpretation,and connection with the materials: community structure.The first two issues arise from the broad More experimental results to illustrate the relationship applications [13]-[15],[19],[24]-[27]of the von Neumann
IEEE TRANSACTIONS ON INFORMATION THEORY 3 0 10 20 30 index 0 10 spectrum degree (a) Zachary’s karate club 0 20 40 60 index 0 10 spectrum degree (b) Dolphins 0 500 1000 index 0 50 spectrum degree (c) Email 0 200 400 index 0 200 spectrum degree (d) Celegans 0 200 400 index 0 50 spectrum degree (e) ER graph of order 500 0 200 400 index 0 50 spectrum degree (f) BA graph of order 500 0 200 400 index 0 500 spectrum degree (g) Complete graph of order 500 0 200 400 index 0.0 2.5 spectrum degree (h) Ring graph of order 500 Fig. 1: The close relation between Laplacian spectra and degree sequence in four representative real-world graphs (a-d) and four common synthetic graphs (e-h). Both the Laplacian spectra and degree sequence are sorted in non-increasing order. The x-axis represents the index of the sorted sequences, and the y-axis represents the value of Laplacian spectrum and degree. TABLE I: Structural information and von Neumann graph entropy of the graphs in Fig. 1. Measurements Zachary Dolphins Email Celegans ER BA Complete Ring structural information H1 4.7044 5.7005 9.5665 7.9257 8.9497 8.5739 8.9659 8.9658 von Neumann graph entropy Hvn 4.5504 5.5489 9.5029 7.8631 8.9302 8.4935 8.9629 8.5231 entropy gap ∆H 0.1540 0.1516 0.0636 0.0626 0.0195 0.0804 0.0030 0.4427 relative error ∆H Hvn 3.38% 2.73% 0.67% 0.80% 0.22% 0.95% 0.03% 5.19% • Connection with community structure: While the twodimensional structural information [3] encodes the community structure, we find empirically that both the von Neumann graph entropy and the one-dimensional structural information are uninformative of the community structure. However, they are effective in adversarial attacks on community detection, since maximizing the von Neumann graph entropy will make the Laplacian spectrum uninformative of the community structure. Using the similar idea, we propose an alternative metric called spectral polarization which is both effective and efficient in hiding the community structure. • Extensive experiments and evaluations: We use 3 random graph models, 9 real-world static graphs, and 2 real-world temporal graphs to evaluate the properties of the entropy gap and the proposed algorithms. The results show that the entropy gap is small in a wide range of simple/weighted graphs. And it is insensitive to the change of graph size. Compared with the prominent methods for approximating the von Neumann graph entropy, the structural information is superior in both accuracy and computational speed. It is at least 2 orders of magnitude faster than the accurate SLaQ [1] algorithm with comparable accuracy. Our proposed algorithms based on structural information also exhibit superb performance in downstream tasks such as entropydriven network design, graph comparison, and community obfuscation. An earlier version of this work appeared in our WWW 2021 conference paper [23]. In addition to revising the conference version, this TIT submission includes the following new materials: • More experimental results to illustrate the relationship between the Laplacian spectrum and the degree sequence (Fig. 1); • More straightforward results to illustrate the tightness of the approximation (Table I); • A fine-grained analysis on the lower bound of the entropy gap to show that the entropy gap is actually strictly larger than 0 (Theorem 1); • A theoretical analysis on the entropy gap of several classes of graphs, including complete graph, complete bipartite graph, path graph, and ring graph (Table III and Appendix B); • An analysis and discussion over the connection between graph entropy and community structure (Section VI and Appendix A-D). Roadmap: The remainder of this paper is organized as follows. We review three related issues in Section II. In Section III we introduce the definitions of the von Neumann graph entropy, structural information, and the notion of entropy gap. Section IV shows the close relationship between von Neumann graph entropy and structural information by bounding the entropy gap. Section V presents efficient algorithms for two graph entropy based applications. In Section VI we discuss the connection between von Neumann graph entropy and community structure. Section VII provides experimental results. Section VIII offers some conclusions and directions for future research. II. RELATED WORK We review three issues related to the von Neumann graph entropy: computation, interpretation, and connection with the community structure. The first two issues arise from the broad applications [13]–[15], [19], [24]–[27] of the von Neumann
IEEE TRANSACTIONS ON INFORMATION THEORY TABLE II:Comparison of methods for approximating the von B.Spectral Descriptor of Graphs and Its Structural Counter- Neumann graph entropy in terms of fulfilled()and missing part (x)properties. Researchers in spectral graph theory have always been inter- 9] 1 [10]Structural Information (Ours) ested in establishing a connection between the combinatorial Provable accuracy characteristics of a graph and the algebraic properties of its Scalability associated matrices.For example,the algebraic connectivity Interpretability (also known as Fiedler eigenvalue),defined as the second smallest eigenvalue of a graph Laplacian matrix,has been used to measure the robustness [16]and synchronizability [32]of graph entropy,whereas the last issue comes from spectral graphs.The magnitude of the algebraic connectivity has also clustering [28]and two-dimensional structural information been found to reflect how well connected the overall graph based clustering [3]. is [17].As another example,the Fiedler vector,defined as the eigenvector corresponding to the Fiedler eigenvalue of a graph Laplacian matrix,has been found to be a good sign of A.Approximate Computation of the von Neumann Graph the bi-partition structure of a graph [33].However,there are Entropy some other spectral descriptors that have found applications In an effort to overcome the computational inefficiency of in graph analytics,but require more structural interpretations, the von Neumann graph entropy,past works have resorted such as the heat kernel trace [34],[35]and von Neumann graph entropy. to various numerical approximations.Chen et al.[9]first Simmons et al.[36]suggest to interpret the von Neumann compute a quadratic approximation of the entropy via Taylor graph entropy as the centralization of graphs,which is very expansion,then derive two finer approximations with accuracy guarantee by spectrum-based and degree-based rescaling.re- similar to our interpretation using the structural information. spectively.Before Chen's work,the Taylor expansion is widely They derive both upper and lower bounds on the von Neumann adopted to give computationally efficient approximations [29], graph entropy in terms of graph centralization under some but there is no theoretical guarantee on the approximation hard assumptions on the range of the von Neumann graph accuracy.Following Chen's work,Choi et al.[10]propose entropy.Therefore,their results cannot be directly converted several more complicated quadratic approximations based on to accuracy guaranteed approximations of the von Neumann advanced polynomial approximation methods the superiority graph entropy for arbitrary simple graphs.By constrast,our work shows that the structural information is an accurate, of which is verified through experiments. scalable,and interpretable proxy of the von Neumann graph Besides,there is a trend to approximate spectral sums using stochastic trace estimation based approximations [30], entropy for arbitrary simple graphs.Besides,the techniques used in our proof are also quite different from [36]. the merit of which is the provable error-bounded estimation of the spectral sums.For example,Kontopoulou et al.[11 propose three randomized algorithms based on Taylor series, C.Spectrum,Detectability,and Significance of Community Chebyshev polynomials,and random projection matrices to Structure approximate the von Neumann entropy of density matrices.As Community structure is one of the most recognized char- another example,based on the stochastic Lanczos quadrature acteristics of large scale real-world graphs in which similar technique [31].Tsitsulin et al.[1]propose an efficient and nodes tend to cluster together.Thus it has found applications effective approximation technique called SLaQ to estimate in classification,recommendation.and link prediction.etc. the von Neumann entropy and other spectral descriptors for Started from the Fiedler vector,spectral algorithms have been web-scale graphs.However,the approximation error bound of widely studied for detecting the community structure in a SLaQ for the von Neumann graph entropy is not provided. graph [37].[38]because they are simple and theoretically The disadvantages of such stochastic approximations are also warranted.Cheeger's inequality hc v2u2 bounds obvious;their computational efficiency depends on the number the conductance hc of a graph G using the second smallest of random vectors used in stochastic trace estimation,and eigenvalue of the normalized Laplacian matrix.This is later they are not suitable for applications like anomaly detection generalized to multiway spectral partitioning[39]yielding the in graph streams and entropy-driven network design. higher-order Cheeger inequalities学≤pc(k)≤O(k2), The comparison of methods for approximating the von for each k,where uk is the k-th smallest eigenvalue of Neumann graph entropy is presented in Table II.One of the the normalized Lapalcian matrix and pc(k)is the k-way common drawbacks of the aforementioned methods is the expansion constant.Since both hc and Pc(k)measure the lack of interpretability,that is,none of these methods provide significance of community structure,the graph spectrum is enough evidence to interpret this spectrum based entropy closely related to the community structure. measure in terms of structural patterns.By contrast,as a The coupling between graph spectrum and community good proxy of the von Neumann graph entropy,the structural structure has been empirically validated in [37]where New- information offers us the intuition that the spectrum based man found that if the second smallest eigenvalue A2 of the entropy measure is closely related to the degree heterogeneity Laplacian matrix is well separated from the eigenvalues above of graphs. it,the spectral clustering based on Lapalcian matrix often
IEEE TRANSACTIONS ON INFORMATION THEORY 4 TABLE II: Comparison of methods for approximating the von Neumann graph entropy in terms of fulfilled (✓) and missing (✗) properties. [9] [1] [10] Structural Information (Ours) Provable accuracy ✓ ✗ ✗ ✓ Scalability ✓ ✓ ✗ ✓ Interpretability ✗ ✗ ✗ ✓ graph entropy, whereas the last issue comes from spectral clustering [28] and two-dimensional structural information based clustering [3]. A. Approximate Computation of the von Neumann Graph Entropy In an effort to overcome the computational inefficiency of the von Neumann graph entropy, past works have resorted to various numerical approximations. Chen et al. [9] first compute a quadratic approximation of the entropy via Taylor expansion, then derive two finer approximations with accuracy guarantee by spectrum-based and degree-based rescaling, respectively. Before Chen’s work, the Taylor expansion is widely adopted to give computationally efficient approximations [29], but there is no theoretical guarantee on the approximation accuracy. Following Chen’s work, Choi et al. [10] propose several more complicated quadratic approximations based on advanced polynomial approximation methods the superiority of which is verified through experiments. Besides, there is a trend to approximate spectral sums using stochastic trace estimation based approximations [30], the merit of which is the provable error-bounded estimation of the spectral sums. For example, Kontopoulou et al. [11] propose three randomized algorithms based on Taylor series, Chebyshev polynomials, and random projection matrices to approximate the von Neumann entropy of density matrices. As another example, based on the stochastic Lanczos quadrature technique [31], Tsitsulin et al. [1] propose an efficient and effective approximation technique called SLaQ to estimate the von Neumann entropy and other spectral descriptors for web-scale graphs. However, the approximation error bound of SLaQ for the von Neumann graph entropy is not provided. The disadvantages of such stochastic approximations are also obvious; their computational efficiency depends on the number of random vectors used in stochastic trace estimation, and they are not suitable for applications like anomaly detection in graph streams and entropy-driven network design. The comparison of methods for approximating the von Neumann graph entropy is presented in Table II. One of the common drawbacks of the aforementioned methods is the lack of interpretability, that is, none of these methods provide enough evidence to interpret this spectrum based entropy measure in terms of structural patterns. By contrast, as a good proxy of the von Neumann graph entropy, the structural information offers us the intuition that the spectrum based entropy measure is closely related to the degree heterogeneity of graphs. B. Spectral Descriptor of Graphs and Its Structural Counterpart Researchers in spectral graph theory have always been interested in establishing a connection between the combinatorial characteristics of a graph and the algebraic properties of its associated matrices. For example, the algebraic connectivity (also known as Fiedler eigenvalue), defined as the second smallest eigenvalue of a graph Laplacian matrix, has been used to measure the robustness [16] and synchronizability [32] of graphs. The magnitude of the algebraic connectivity has also been found to reflect how well connected the overall graph is [17]. As another example, the Fiedler vector, defined as the eigenvector corresponding to the Fiedler eigenvalue of a graph Laplacian matrix, has been found to be a good sign of the bi-partition structure of a graph [33]. However, there are some other spectral descriptors that have found applications in graph analytics, but require more structural interpretations, such as the heat kernel trace [34], [35] and von Neumann graph entropy. Simmons et al. [36] suggest to interpret the von Neumann graph entropy as the centralization of graphs, which is very similar to our interpretation using the structural information. They derive both upper and lower bounds on the von Neumann graph entropy in terms of graph centralization under some hard assumptions on the range of the von Neumann graph entropy. Therefore, their results cannot be directly converted to accuracy guaranteed approximations of the von Neumann graph entropy for arbitrary simple graphs. By constrast, our work shows that the structural information is an accurate, scalable, and interpretable proxy of the von Neumann graph entropy for arbitrary simple graphs. Besides, the techniques used in our proof are also quite different from [36]. C. Spectrum, Detectability, and Significance of Community Structure Community structure is one of the most recognized characteristics of large scale real-world graphs in which similar nodes tend to cluster together. Thus it has found applications in classification, recommendation, and link prediction, etc. Started from the Fiedler vector, spectral algorithms have been widely studied for detecting the community structure in a graph [37], [38] because they are simple and theoretically warranted. Cheeger’s inequality µ2 2 ≤ hG ≤ √ 2µ2 bounds the conductance hG of a graph G using the second smallest eigenvalue of the normalized Laplacian matrix. This is later generalized to multiway spectral partitioning [39] yielding the higher-order Cheeger inequalities µk 2 ≤ ρG(k) ≤ O(k 2 ) √µk for each k, where µk is the k-th smallest eigenvalue of the normalized Lapalcian matrix and ρG(k) is the k-way expansion constant. Since both hG and ρG(k) measure the significance of community structure, the graph spectrum is closely related to the community structure. The coupling between graph spectrum and community structure has been empirically validated in [37] where Newman found that if the second smallest eigenvalue λ2 of the Laplacian matrix is well separated from the eigenvalues above it, the spectral clustering based on Lapalcian matrix often
IEEE TRANSACTIONS ON INFORMATION THEORY 5 does very well.However,community detection by spectral Definition 3(Entropy gap):The entropy gap of an undi- algorithms in sparse graphs often fails,because the spectrum rected graph G=(V,E,A)is defined as AH(G)=H1(G)- contains no clear evidence of community structure.This is 孔vm(G) exemplified under the sparse stochastic block model with two The von Neumann graph entropy and the structural informa- clusters of equal size [38],[40],where the second smallest tion are well-defined for all the undirected graphs except for eigenvalue of the adjacency matrix gets lost in the bulk of the graphs with empty edge set,in which vol(G)=0.When uninformative eigenvalues.Our experiments complement the E=,we take it for granted that H1(G)=Hvn(G)=0. correlation between graph spectrum and community structure by showing that the spikes in a sequence of spectral gaps are IV.APPROXIMATION ERROR ANALYSIS good indicators of the community structure. In this section we bound the entropy gap in the undirected The significance and detectability of community structure graphs of order n.Since the nodes with degree 0 have no has found its application in an emerging area called com- contribution to both the structural information and the von munity obfuscation [41]-[47],where the graph structure is Neumann graph entropy,without loss of generality we assume minimally perturbed to protect its community structure from that di>0 for any node iV. being detected.None of these practical algorithms exploit the correlation between graph spectrum and community structure except for the structural entropy proposed by Liu et al.[45]. A.Bounds on the Approximation Error Our work bridges the one-dimensional structural entropy in We first provide bounds on the additive approximation error [45]with the spectral entropy,elaborates both empirically and in Theorem 1,Corollary 1,and Corollary 2,then analyze the theoretically that maximizing the spectral entropy is effective multiplicative approximation error in Theorem 2 in community obfuscation,and thus provides a theoretical Theorem I (Bounds on the absolute approximation error): foundation for the success of the structural entropy [45]. For any undirected graph G=(V,E,A),the inequality III.PRELIMINARIES 00}is the minimum positive with positive edge weights,where V=[n]{1,...,n} degree. is the node set.E is the edge set and A is the Before proving Theorem 1,we introduce two techniques: symmetric weight matrix with positive entry Aij denoting the the majorization and the Jensen's gap.The former one is weight of an edge (i,j)EE.If the node pair (i,j)E, a preorder of the vector of reals,while the latter is an then Aij=0.If the graph G is unweighted,the weight inverse version of the Jensen's inequality,whose definitions matrix A E(0,1]nxn is called the adjacency matrix of G. are presented as follows. The degree of the node i V in the graph G is defined Definition 4 (Majorization [48]):For a vector x E Rd,we as di= A.The Laplacian matrix of the graph G denote by xERd the vector with the same components,but is defined as L D-A where D=diag(d1,...,dn)is sorted in descending order.Given x,y Rd,we say that x the degree matrix.Let (A:)1 be the sorted eigenvalues of majorizesy(written as x>y)if and only if∑-1≥ L such that0=d≤2≤…≤入n,which is called Laplacian specmm.w demnevoas the ∑=1 for any∈[网and xT1=yr1. Lemma I (Jensen's gap [211):Let X be a one-dimensional volume of graph G,then vol(G)=tr(L))=∑=1入iwhere random variable with the mean u and the support Let ( tr()is the trace operator.For the convenience of delineation, be a twice differentiable function on and define the function we define a special function f(z)xlog2x on the support h()P-,then Ev(X)】-中(EX])≤ [0,oo)where f(0)limzLo f(x)=0 by convention. suprth()}.var(X).Additionally,if (r)is convex,then In the following,we present the formal definitions of the h(x)is monotonically increasing in x,and if (is concave, von Neumann graph entropy,the structural information,and then h(x)is monotonically decreasing in z. the entropy gap.Slightly different from the one-dimensional Lemma 2:The function f()=xlog2x is convex,its first structural information proposed by Li et al.[3],our definition order derivative f()=log2x+log2e is concave. of structural information does not require the graph G to be Proof:The second order derivative f"(z)=(log2 e)/x> connected. 0,thus f(x)=log2x is convex. Definition I(von Neumann graph entropy):The von Neu- We can see that the majorization characterizes the degree of mann graph entropy of an undirected graph G=(V,E,A) concentration between the two vectors x and y.Specifically. is defined asH(G=-∑=f,/ol(G)》,where0= x>y means that the entries of y are more concentrated on its 入1≤2≤·≤入are the eigenvalues of the Laplacian mean yT1/1T1 than the entires of x.An equivalent definition matrix L=D-A of the graph G,and vol(G)=>=1Ai is of the majorization [48]using linear algebra says that x the volume of G. y if and only if there exists a doubly stochastic matrix P Definition 2 (Structural information):The structural infor- such that Px =y.As a famous example of the majorization, mation of an undirected graph G=(V,E,A)is defined as the Schur-Horn theorem [48]says that the diagonal elements H(G)=->1f(di/vol(G)),where di is the degree of of a positive semidefinite Hermitian matrix are majorized by node i in G and vol(G)=d is the volume of G. its eigenvalues.Since xTLx=(i)Aj()20
IEEE TRANSACTIONS ON INFORMATION THEORY 5 does very well. However, community detection by spectral algorithms in sparse graphs often fails, because the spectrum contains no clear evidence of community structure. This is exemplified under the sparse stochastic block model with two clusters of equal size [38], [40], where the second smallest eigenvalue of the adjacency matrix gets lost in the bulk of uninformative eigenvalues. Our experiments complement the correlation between graph spectrum and community structure by showing that the spikes in a sequence of spectral gaps are good indicators of the community structure. The significance and detectability of community structure has found its application in an emerging area called community obfuscation [41]–[47], where the graph structure is minimally perturbed to protect its community structure from being detected. None of these practical algorithms exploit the correlation between graph spectrum and community structure except for the structural entropy proposed by Liu et al. [45]. Our work bridges the one-dimensional structural entropy in [45] with the spectral entropy, elaborates both empirically and theoretically that maximizing the spectral entropy is effective in community obfuscation, and thus provides a theoretical foundation for the success of the structural entropy [45]. III. PRELIMINARIES In this paper, we study the undirected graph G = (V, E, A) with positive edge weights, where V = [n] , {1, . . . , n} is the node set, E is the edge set, and A ∈ R n×n + is the symmetric weight matrix with positive entry Aij denoting the weight of an edge (i, j) ∈ E. If the node pair (i, j) ∈/ E, then Aij = 0. If the graph G is unweighted, the weight matrix A ∈ {0, 1} n×n is called the adjacency matrix of G. The degree of the node i ∈ V in the graph G is defined as di = Pn j=1 Aij . The Laplacian matrix of the graph G is defined as L , D − A where D = diag(d1, . . . , dn) is the degree matrix. Let {λi} n i=1 be the sorted eigenvalues of L such that 0 = λ1 ≤ λ2 ≤ · · · ≤ λn, which is called Laplacian spectrum. We define vol(G) = Pn i=1 di as the volume of graph G, then vol(G) = tr(L) = Pn i=1 λi where tr(·) is the trace operator. For the convenience of delineation, we define a special function f(x) , x log2 x on the support [0, ∞) where f(0) , limx↓0 f(x) = 0 by convention. In the following, we present the formal definitions of the von Neumann graph entropy, the structural information, and the entropy gap. Slightly different from the one-dimensional structural information proposed by Li et al. [3], our definition of structural information does not require the graph G to be connected. Definition 1 (von Neumann graph entropy): The von Neumann graph entropy of an undirected graph G = (V, E, A) is defined as Hvn(G) = − Pn i=1 f(λi/vol(G)), where 0 = λ1 ≤ λ2 ≤ · · · ≤ λn are the eigenvalues of the Laplacian matrix L = D − A of the graph G, and vol(G) = Pn i=1 λi is the volume of G. Definition 2 (Structural information): The structural information of an undirected graph G = (V, E, A) is defined as H1(G) = − Pn i=1 f(di/vol(G)), where di is the degree of node i in G and vol(G) = Pn i=1 di is the volume of G. Definition 3 (Entropy gap): The entropy gap of an undirected graph G = (V, E, A) is defined as ∆H(G) = H1(G)− Hvn(G). The von Neumann graph entropy and the structural information are well-defined for all the undirected graphs except for the graphs with empty edge set, in which vol(G) = 0. When E = ∅, we take it for granted that H1(G) = Hvn(G) = 0. IV. APPROXIMATION ERROR ANALYSIS In this section we bound the entropy gap in the undirected graphs of order n. Since the nodes with degree 0 have no contribution to both the structural information and the von Neumann graph entropy, without loss of generality we assume that di > 0 for any node i ∈ V . A. Bounds on the Approximation Error We first provide bounds on the additive approximation error in Theorem 1, Corollary 1, and Corollary 2, then analyze the multiplicative approximation error in Theorem 2. Theorem 1 (Bounds on the absolute approximation error): For any undirected graph G = (V, E, A), the inequality 0 0} is the minimum positive degree. Before proving Theorem 1, we introduce two techniques: the majorization and the Jensen’s gap. The former one is a preorder of the vector of reals, while the latter is an inverse version of the Jensen’s inequality, whose definitions are presented as follows. Definition 4 (Majorization [48]): For a vector x ∈ R d , we denote by x ↓ ∈ R d the vector with the same components, but sorted in descending order. Given x, y ∈ R d , we say that x majorizes y (written as x y) if and only if Pk i=1 x ↓ i ≥ Pk i=1 y ↓ i for any k ∈ [d] and x |1 = y |1. Lemma 1 (Jensen’s gap [21]): Let X be a one-dimensional random variable with the mean µ and the support Ω. Let ψ(x) be a twice differentiable function on Ω and define the function h(x) , ψ(x)−ψ(µ) (x−µ) 2 − ψ 0 (µ) x−µ , then E[ψ(X)] − ψ(E[X]) ≤ supx∈Ω{h(x)}· var(X). Additionally, if ψ 0 (x) is convex, then h(x) is monotonically increasing in x, and if ψ 0 (x) is concave, then h(x) is monotonically decreasing in x. Lemma 2: The function f(x) = x log2 x is convex, its first order derivative f 0 (x) = log2 x + log2 e is concave. Proof: The second order derivative f 00(x) = (log2 e)/x > 0, thus f(x) = x log2 x is convex. We can see that the majorization characterizes the degree of concentration between the two vectors x and y. Specifically, x y means that the entries of y are more concentrated on its mean y |1/1 |1 than the entires of x. An equivalent definition of the majorization [48] using linear algebra says that x y if and only if there exists a doubly stochastic matrix P such that Px = y. As a famous example of the majorization, the Schur-Horn theorem [48] says that the diagonal elements of a positive semidefinite Hermitian matrix are majorized by its eigenvalues. Since x TLx = P (i,j)∈E Aij (xi − xj ) 2 ≥ 0
IEEE TRANSACTIONS ON INFORMATION THEORY 6 for any vector x E R",the Laplacian matrix L is a positive where semidefinite symmetric matrix whose diagonal elements form the degree sequence d and eigenvalues form the spectrum A. h回=f四)-fEX-f'EK Therefore,the majorization Ad implies that there exists (x-EX])2 x-E[Xi] some doubly stochastic matrix p=(p)∈[0,l]nxn such Since f'(x)is concave,hi(z)is monotonically decreasing in that Pλ=d. Therefore,supo.(h()=h().Since Using the relation PA d and the convexity of f(x)in Lemma 2,we can now proceed to prove Theorem 1. h:(0)= ()-f(di)f(d)logaesloge Proof of Theorem 1:For each i E V,we define a discrete d di 6 random variable X;whose probability mass function is given the inequality in (5)can be simplified as by().where (is the Kronecker delta func- tion.Then the expectationE=d and the variance var(X)=∑=1p%(-d=∑1p号-d. ()-rd)s (6 First,we express the entropy gap in terms of the Lapalcian spectrum and the degree sequence.Since By summing both sides of the inequality (6)over i,we get di di an upper bound UB on∑=1f(a)-∑-1f(d)as i=1 vol(G) f(d)-∑dlog2(vol(G) ue-学((区w-) i=1 log2 (vol(G))- ∑,fa log2e vol(G) 区-) and similarly Hv(G)=loga(vol(G)-( log2e.(tr(L2)-tr(D2)) vol(G) (3) g2e.(tr(A2)-tr(AD)-tr(DA)) =1 we have 6 △H(G)=,(G-Hn(G=∑f0)-∑1fd =1og2e.tr(42). vol(G) (4) As a result,△H(G)=ZRf-fa≤loe48 Second,we use Jensen's inequality to prove AH(G)>0. vol(G) 6 vol(G) ■ Since f(r)is convex,f(di)=f(E[Xi])0 for any undirected graphs.Actu- Corollary I(Constant bounds on the entropy gap):For any ally,AH(G)cannot be 0.To see this,suppose that the unweighted,undirected graph G.00, =vol(G), we have P(Xi=di)=1.However,it contradicts the fact that P(Xi =0)>P(Xi =A1)=pi,1 =Therefore the amdi≥1,therefore00 for any undirected log2 e. ■ graphs. Corollary 2 (Entropy gap of regular graphs):For any Finally,we use the Jensen's gap to prove the upper bound unweighted,undirected,regular graph Ga of degree d,the on AH(G)in(1).Apply the Jensen's gap to Xi and f(z), inequality0<△H(Ga)≤holds.. Ef(X】-f(EX)≤sup{h:(}var(X,(S)d Proof sketch:In any unweighted,regular graph Gd.6= x∈0,vol(G)】 ■
IEEE TRANSACTIONS ON INFORMATION THEORY 6 for any vector x ∈ R n, the Laplacian matrix L is a positive semidefinite symmetric matrix whose diagonal elements form the degree sequence d and eigenvalues form the spectrum λ. Therefore, the majorization λ d implies that there exists some doubly stochastic matrix P = (pij ) ∈ [0, 1]n×n such that Pλ = d. Using the relation Pλ = d and the convexity of f(x) in Lemma 2, we can now proceed to prove Theorem 1. Proof of Theorem 1: For each i ∈ V , we define a discrete random variable Xi whose probability mass function is given by Pn j=1 pij δλj (x), where δa(x) is the Kronecker delta function. Then the expectation E[Xi ] = Pn j=1 pijλj = di and the variance var(Xi) = Pn j=1 pij (λj − di) 2 = Pn j=1 pijλ 2 j − d 2 i . First, we express the entropy gap in terms of the Lapalcian spectrum and the degree sequence. Since H1(G) = − Xn i=1 di vol(G) log2 di vol(G) = − 1 vol(G) Xn i=1 f(di) − Xn i=1 di log2 (vol(G))! = log2 (vol(G)) − Pn i=1 f(di) vol(G) , (2) and similarly Hvn(G) = log2 (vol(G)) − Pn i=1 f(λi) vol(G) , (3) we have ∆H(G) = H1(G) − Hvn(G) = Pn i=1 f(λi) − Pn i=1 f(di) vol(G) . (4) Second, we use Jensen’s inequality to prove ∆H(G) > 0. Since f(x) is convex, f(di) = f(E[Xi ]) ≤ E[f(Xi)] for any i ∈ {1, . . . , n}. By summing over i, we have Xn i=1 f(di) ≤ Xn i=1 E[f(Xi)] = Xn i=1 Xn j=1 pijf(λj ) = Xn j=1 f(λj ). Therefore, ∆H(G) ≥ 0 for any undirected graphs. Actually, ∆H(G) cannot be 0. To see this, suppose that the Laplacian matrix can be decomposed as L = UΛU | where Λ = diag(λ1, . . . , λn) and U = (u·,1| · · · |u·,n) is a unitary matrix. Note that λ1 = 0 and u·,1 = 1n/ √ n. The i-th diagonal element of the matrix UΛU | = Pn j=1 λju·,ju | ·,j P is given by n j=1 λj |uij | 2 . Since the i-th diagonal element of L is di and L = UΛU | , we have di = Pn j=1 λj |uij | 2 for each i ∈ [n]. Recall that Pλ = d where P is a doubly-stochastic matrix, therefore pij = |uij | 2 . Now suppose for contradiction that ∆H(G) = 0, which means that f(E[Xi ]) = E[f(Xi)] for each i ∈ [n]. Due to the strict convexity of f(·), the discrete random variable Xi has to be deterministic. Since E[Xi ] = di > 0, we have P(Xi = di) = 1. However, it contradicts the fact that P(Xi = 0) ≥ P(Xi = λ1) = pi,1 = 1 n . Therefore the contradiction implies that ∆H(G) > 0 for any undirected graphs. Finally, we use the Jensen’s gap to prove the upper bound on ∆H(G) in (1). Apply the Jensen’s gap to Xi and f(x), E[f(Xi)] − f(E[Xi ]) ≤ sup x∈[0,vol(G)] {hi(x)} · var(Xi), (5) where hi(x) = f(x) − f(E[Xi ]) (x − E[Xi ])2 − f 0 (E[Xi ]) x − E[Xi ] . Since f 0 (x) is concave, hi(x) is monotonically decreasing in x. Therefore, supx∈[0,vol(G)]{hi(x)} = hi(0). Since hi(0) = f(0) − f(di) d 2 i + f 0 (di) di = log2 e di ≤ log2 e δ , the inequality in (5) can be simplified as Xn j=1 pijf(λj ) − f(di) ≤ log2 e δ · Xn j=1 pijλ 2 j − d 2 i . (6) By summing both sides of the inequality (6) over i, we get an upper bound UB on Pn j=1 f(λj ) − Pn i=1 f(di) as UB = log2 e δ · Xn i=1 Xn j=1 pijλ 2 j − d 2 i = log2 e δ · Xn j=1 λ 2 j − Xn i=1 d 2 i = log2 e δ · tr(L 2 ) − tr(D2 ) = log2 e δ · tr(A 2 ) − tr(AD) − tr(DA) = log2 e δ · tr(A 2 ). As a result, ∆H(G) = Pn i=1 f(λi)− Pn i=1 f(di) vol(G) ≤ log2 e δ tr(A 2 ) vol(G) . To illustrate the tightness of the bounds in Theorem 1, we further derive bounds on the entropy gap for unweighted graphs, especially the regular graphs. Via multiplicative error analysis, we show that the structural information converges to the von Neumann graph entropy as the graph size grows. Corollary 1 (Constant bounds on the entropy gap): For any unweighted, undirected graph G, 0 < ∆H(G) ≤ log2 e holds. Proof: In unweighted graph G, tr(A 2 ) = Xn i=1 Xn j=1 AijAji = Xn i=1 Xn j=1 Aij = Xn i=1 di = vol(G), and δ ≥ 1, therefore 0 < ∆H(G) ≤ log2 e δ tr(A 2 ) vol(G) = log2 e δ ≤ log2 e. Corollary 2 (Entropy gap of regular graphs): For any unweighted, undirected, regular graph Gd of degree d, the inequality 0 < ∆H(Gd) ≤ log2 e d holds. Proof sketch: In any unweighted, regular graph Gd, δ = d
IEEE TRANSACTIONS ON INFORMATION THEORY 7 Theorem 2 (Convergence of the multiplicative approxima- V.APPLICATIONS AND ALGORITHMS tion error):For almost all unweighted graphs G of order n, 10and decays to at the rate of o(1/loga(n). As a measure of the structural complexity of a graph the von Neumann entropy has been applied in a variety of Proof:Dairyko et al.[49]proved that for almost all applications.For example,the von Neumann graph entropy is unweighted graphs G of order n,Hvn(G)>Hvn(K1.n-1) exploited to measure the edge centrality [15]and the vertex where Ki.n-1 stands for the star graph.Since Hvn(K1.n-1)= centrality [18]in complex networks.As another example,the 1os影2m-2)-zlog2n=1+号1og2n+o1).高-1= von Neumann graph entropy can also be used to measure the 会8≤m爱=o(g ◆ distance between graphs for graph classification and anomaly detection [9].[14].In addition,the von Neumann graph entropy is used in the context of graph representation learning B. Sharpened Bounds on the Entropy Gap [27]to learn low-dimensional feature representations of nodes. We observe that,in these applications,the von Neumann graph Though the constant bounds on the entropy gap are tight entropy is used to address the following primitive tasks: enough for applications,we can still sharpen the bounds on Entropy-based network design:Design a new graph by the entropy gap in unweighted graphs using more advanced minimally perturbing the existing graph to meet the entropic majorizations. requirement.For example,Minello et al.[19]use the von Theorem 3 (Sharpened lower bound on entropy gap): Neumann entropy to explore the potential network growth For any unweighted,undirected graph G,AH(G)is lower model via experiments. bounded by (f(dmax+1)-f(dmax)+f(6-1)-f(6))/vol(G) Graph similarity measure:Compute the similarity score where diax is the maximum degree and 6 is the minimum between two graphs,which is represented by a real positive positive degree. number.For example,Domenico et al.[13]use the von Proof:The proof is based on the advanced majorization Neumann graph entropy to compute the Jensen-Shannon [50]λ>(d1+1,d2,.,dn-1)where d1≥d2≥·≥dnis distance between graphs for the purpose of compressing the sorted degree sequence of the unweighted undirected graph multilayer networks. G.Similar to the proof of Theorem 1,we havef()> Both of the primitive tasks require exact computation of f(d+1)+f(dn-1)+2 f(di).Then the sharpened the von Neumann graph entropy.To reduce the computational upper bound follows from the equation (4)since di=dmax complexity and preserve the interpretability,we can use the and dn=0. accurate proxy,structural information,to approximately solve Theorem 4 (Sharpened upper bound on entropy gap):For these tasks. any unweighted,undirected graph G=(V,E),AH(G)is up- per bounded by minflog2 e,61,b2}where b =(i) vol(G) A.Entropy-based network design 2 and-1og21+∑”1golG)-a vol(G) Network design aims to minimally perturb the network to Here (di,...,d)is the conjugate degree sequence of G fulfill some goals.Consider such a goal to maximize the where d=l{ild:≥k von Neumann entropy of a graph,it helps to understand Proof:We first prove△H(G)≤b1 using the Grone- how different structural patterns influence the entropy value. Merris majorization [22]:(df,...,d)A.Similar to the The entropy-based network design problem is formulated as proof of Theorem 1.we have∑glf(d)≥∑1fA), follows. thus b≥ o∑f)-fa=△H(G).Ve then prove Problem I(MaxEntropy):Given an unweighted,undirected vol(G) △H(G)≤b2.Since graph G=(V,E)of order n and an integer budget k,find a set F of non-existing edges of G whose addition to G creates ∑1f(A) the largest increase of the von Neumann graph entropy and vol(G) 1 0g2≤log2 IF≤k. =1 Due to the spectral nature of the von Neumann graph entropy,it is not easy to find an effective strategy to perturb the and graph,especially in the scenario where there are exponential 2=1+∑ number of combinations for the subset F.If we use the ∑=1xvol(G) vol(G) structural information as a proxy of the von Neumann entropy, Problem 1 can be reduced to maximizing H1(G')where we have△H(G)=Zif-fd≤b2 G'=(V,EU F)such that F<k.To further alleviate vol(G) the computational pressure rooted in the exponential size of the search space for F,we adopt the greedy method in which C.Entropy Gap of Various Types of Graphs the new edges are added one by one until either the structural information attains its maximum value log2 n or k new edges As summarized in Table III,we analyze the entropy gap have already been added.We denote the graph with l new of various types of graphs including the complete graph,the edges as Gi=(V,E),then Go =G.Now suppose that complete bipartite graph,the path graph,and the ring graph, we have G whose structural information is less than log2n, the proofs of which can be found in Appendix B. then we want to find a new edge e+1=(u,v)such that
IEEE TRANSACTIONS ON INFORMATION THEORY 7 Theorem 2 (Convergence of the multiplicative approximation error): For almost all unweighted graphs G of order n, H1(G) Hvn(G) − 1 > 0 and decays to 0 at the rate of o(1/ log2 (n)). Proof: Dairyko et al. [49] proved that for almost all unweighted graphs G of order n, Hvn(G) ≥ Hvn(K1,n−1) where K1,n−1 stands for the star graph. Since Hvn(K1,n−1) = log2 (2n−2)− n 2n−2 log2 n = 1+ 1 2 log2 n+o(1), H1(G) Hvn(G) −1 = ∆H(G) Hvn(G) ≤ log2 e Hvn(K1,n−1) = o( 1 log2 n ). B. Sharpened Bounds on the Entropy Gap Though the constant bounds on the entropy gap are tight enough for applications, we can still sharpen the bounds on the entropy gap in unweighted graphs using more advanced majorizations. Theorem 3 (Sharpened lower bound on entropy gap): For any unweighted, undirected graph G, ∆H(G) is lower bounded by (f(dmax+1)−f(dmax)+f(δ−1)−f(δ))/vol(G) where dmax is the maximum degree and δ is the minimum positive degree. Proof: The proof is based on the advanced majorization [50] λ (d1+1, d2, . . . , dn−1) where d1 ≥ d2 ≥ · · · ≥ dn is the sorted degree sequence of the unweighted undirected graph G. Similar to the proof of Theorem 1, we have Pn i=1 f(λi) ≥ f(d1 + 1) + f(dn − 1) + Pn−1 i=2 f(di). Then the sharpened upper bound follows from the equation (4) since d1 = dmax and dn = δ. Theorem 4 (Sharpened upper bound on entropy gap): For any unweighted, undirected graph G = (V, E), ∆H(G) is upper bounded by min{log2 e, b1, b2} where b1 = Pn i=1 f(d ∗ i ) vol(G) − Pn i=1 f(di) vol(G) and b2 = log2 (1 + Pn i=1 d 2 i /vol(G)) − Pn i=1 f(di) vol(G) . Here (d ∗ 1 , . . . , d∗ n ) is the conjugate degree sequence of G where d ∗ k = |{i|di ≥ k}|. Proof: We first prove ∆H(G) ≤ b1 using the GroneMerris majorization [22]: (d ∗ 1 , . . . , d∗ n ) λ. Similar to the proof of Theorem 1, we have Pn i=1 f(d ∗ i ) ≥ Pn i=1 f(λi), thus b1 ≥ Pn i=1 f(λi)− Pn i=1 f(di) vol(G) = ∆H(G). We then prove ∆H(G) ≤ b2. Since Pn i=1 f(λi) vol(G) = Xn i=1 P λi n j=1 λj ! log2 λi ≤ log2 Pn i=1 λ 2 P i n j=1 λj ! and Pn i=1 λ 2 P i n i=1 λi = tr(L 2 ) vol(G) = 1 + Pn i=1 d 2 i vol(G) , we have ∆H(G) = Pn i=1 f(λi)−f(di) vol(G) ≤ b2. C. Entropy Gap of Various Types of Graphs As summarized in Table III, we analyze the entropy gap of various types of graphs including the complete graph, the complete bipartite graph, the path graph, and the ring graph, the proofs of which can be found in Appendix B. V. APPLICATIONS AND ALGORITHMS As a measure of the structural complexity of a graph, the von Neumann entropy has been applied in a variety of applications. For example, the von Neumann graph entropy is exploited to measure the edge centrality [15] and the vertex centrality [18] in complex networks. As another example, the von Neumann graph entropy can also be used to measure the distance between graphs for graph classification and anomaly detection [9], [14]. In addition, the von Neumann graph entropy is used in the context of graph representation learning [27] to learn low-dimensional feature representations of nodes. We observe that, in these applications, the von Neumann graph entropy is used to address the following primitive tasks: • Entropy-based network design: Design a new graph by minimally perturbing the existing graph to meet the entropic requirement. For example, Minello et al. [19] use the von Neumann entropy to explore the potential network growth model via experiments. • Graph similarity measure: Compute the similarity score between two graphs, which is represented by a real positive number. For example, Domenico et al. [13] use the von Neumann graph entropy to compute the Jensen-Shannon distance between graphs for the purpose of compressing multilayer networks. Both of the primitive tasks require exact computation of the von Neumann graph entropy. To reduce the computational complexity and preserve the interpretability, we can use the accurate proxy, structural information, to approximately solve these tasks. A. Entropy-based network design Network design aims to minimally perturb the network to fulfill some goals. Consider such a goal to maximize the von Neumann entropy of a graph, it helps to understand how different structural patterns influence the entropy value. The entropy-based network design problem is formulated as follows, Problem 1 (MaxEntropy): Given an unweighted, undirected graph G = (V, E) of order n and an integer budget k, find a set F of non-existing edges of G whose addition to G creates the largest increase of the von Neumann graph entropy and |F| ≤ k. Due to the spectral nature of the von Neumann graph entropy, it is not easy to find an effective strategy to perturb the graph, especially in the scenario where there are exponential number of combinations for the subset F. If we use the structural information as a proxy of the von Neumann entropy, Problem 1 can be reduced to maximizing H1(G0 ) where G0 = (V, E ∪ F) such that |F| ≤ k. To further alleviate the computational pressure rooted in the exponential size of the search space for F, we adopt the greedy method in which the new edges are added one by one until either the structural information attains its maximum value log2 n or k new edges have already been added. We denote the graph with l new edges as Gl = (V, El), then G0 = G. Now suppose that we have Gl whose structural information is less than log2 n, then we want to find a new edge el+1 = (u, v) such that
IEEE TRANSACTIONS ON INFORMATION THEORY TABLE III: Structural information,von Neumann graph entropy,and entropy gap of specific graphs Graph Types Structural information H1 von Neumann graph entropy Hvn Entropy gap△H Complete graph Kn log2 n log2(n-1) o821+点) Complete bipartite graphKa 1+号log2(ab) 1+log2(ab)-log2(loga( a o821+2+1o82+2 Path Pn 1og2(n-1)+n log2(n -1)+1-log2 e log2e-1 Ring Rn l0g2 n log2 n+1-log2 e log2 e-1 H1(G+1)is maximized,where G+1=(V,EL U [e+}). Algorithm 1:EntropyAug Since H1(G+1)can be rewritten as Input:The graph G=(V,E)of order n,the budget k Output:A set of node pairs 1og2(2E+2)- f(d.+1)+fd,+1)+∑u,wf(d) 2E+2 1F←-O,H←-0: 2 while FT then accelerate the process of finding a single new edge that creates 8 |tail←i-l;break; a largest increase of the von Neumann entropy.EntropyAug if (V,[head],Vli])E then starts by initiating an empty set F used to contain the node u←V,[head],v←Va[, pairs to be found and an entropy value H used to record the maximum structural information in the graph evolution T←EC(u,v: tail←-i-l;break process (line 1).In each following iteration,it sorts the set 12 head←head+l: of nodes V in non-decreasing degree order (line 3).Note that 13 E←EU{(u,)},F←FU{(u,v): the edge centrality EC(u,v)has a nice monotonic property: 14 if孔1(G)>H then H←-H1(G),F*←F: EC(u1,v1)<EC(u2,v2)if min{du,du}<min{dua,doz} 15 if H=log2 n then break; and maxfdu,d<max{du2,de).With the sorted list of 16 return F* nodes Va,the monotonicity of EC(u,v)can be translated into EC(V.[i],V[])<EC(V [i2],V[j2])if the indices satisfy i<j,i2 j2,i<i2,and j<j2.Thus,using the two pointers (head,tail}and a threshold T,it can prune the Vlog2 being attained when minfpi,qi}=0 for each i E N. search space and find the desired non-adjacent node pair as However,the Jensen-Shannon distance cannot measure the fast as possible (line 4-12).It then adds the non-adjacent node similarity between high dimensional objects such as matrices pair minimizing EC(u,v)into F and update the graph G and graphs.Therefore,Majtey et al.[24]introduce a quantum (line 13).The structural information of the updated graph is Jensen-Shannon divergence to measure the similarity between computed to determine whether F is the optimal subset till mixed quantum states,the formal definition of which is current iteration (line 14-15). presented in the following, Definition 6(Quantum Jensen-Shannon divergence between B.Graph Similarity Measure quantum states):The quantum Jensen-Shannon divergence Entropy based graph similarity measure aims to compare between two quantum states p and o is defined as graphs using Jensen-Shannon divergence.The Jensen-Shannon divergence,as a symmetrized and smoothed version of the Dojs(p,a)=tr plogp+ologapta loga Kullback-Leibler divergence,is defined formally in the fol- 2 2 2 lowing Definition 5. Definition 5(Jensen-Shannon divergence):Let P and Q be where p,o are symmetric and positive semi-definite density two probability distributions on the same support set N matrices with tr(p)=1 and tr(o)=1. {1,...,N},where P=(p1,...,pN)and Q=(1,...,qN). The square root of DoJs(p,)has also been proved to be The Jensen-Shannon divergence between P and Q is defined a metric on the space of density matrices [52],[53].Since as the Laplacian matrix L of a weighted undirected graph G Ds(P,Q)=H(P+Q)/2)-H(P)/2-H(Q)/2, is symmetric and positive semi-definite,we can view as a density matrix.Therefore,the quantum Jensen-Shannon where H(P)loi is the entropy of the distri- divergence can be used to measure the similarity between two bution P. graphs Gi and G2.given that they share the same node set, The square root of DJs(P,Q),also known as Jensen- i.e.V(G1)=V(G2) Shannon distance,has been proved [51]to be a bounded metric Definition 7(Quantum Jensen-Shannon divergence between on the space of distributions over N,with its maximum value graphs):The quantum Jensen-Shannon divergence between
IEEE TRANSACTIONS ON INFORMATION THEORY 8 TABLE III: Structural information, von Neumann graph entropy, and entropy gap of specific graphs. Graph Types Structural information H1 von Neumann graph entropy Hvn Entropy gap ∆H Complete graph Kn log2 n log2 (n − 1) log2 (1 + 1 n−1 ) Complete bipartite graph Ka,b 1 + 1 2 log2 (ab) 1 + 1 2 log2 (ab) − log2(1+ b a ) 2b − log2(1+ a b ) 2a log2(1+ b a ) 2b + log2(1+ a b ) 2a Path Pn log2 (n − 1) + 1 n−1 log2 (n − 1) + 1 − log2 e log2 e − 1 Ring Rn log2 n log2 n + 1 − log2 e log2 e − 1 H1(Gl+1) is maximized, where Gl+1 = (V, El ∪ {el+1}). Since H1(Gl+1) can be rewritten as log2 (2|El | + 2) − f(du + 1) + f(dv + 1) + P i6=u,v f(di) 2|El | + 2 , the edge el+1 maximizing H1(Gl+1) should also minimize the edge centrality EC(u, v) = f(du + 1) − f(du) + f(dv + 1) − f(dv), where di is the degree of node i in Gl . We present the pseudocode of our fast algorithm EntropyAug in Algorithm 1, which leverages the pruning strategy to accelerate the process of finding a single new edge that creates a largest increase of the von Neumann entropy. EntropyAug starts by initiating an empty set F used to contain the node pairs to be found and an entropy value H used to record the maximum structural information in the graph evolution process (line 1). In each following iteration, it sorts the set of nodes V in non-decreasing degree order (line 3). Note that the edge centrality EC(u, v) has a nice monotonic property: EC(u1, v1) ≤ EC(u2, v2) if min{du1 , dv1 } ≤ min{du2 , dv2 } and max{du1 , dv1 } ≤ max{du2 , dv2 }. With the sorted list of nodes Vs, the monotonicity of EC(u, v) can be translated into EC(Vs[i1], Vs[j1]) ≤ EC(Vs[i2], Vs[j2]) if the indices satisfy i1 H then H ← H1(G), F ∗ ← F; 15 if H = log2 n then break; 16 return F ∗ . √ log 2 being attained when min{pi , qi} = 0 for each i ∈ ΩN . However, the Jensen-Shannon distance cannot measure the similarity between high dimensional objects such as matrices and graphs. Therefore, Majtey et al. [24] introduce a quantum Jensen-Shannon divergence to measure the similarity between mixed quantum states, the formal definition of which is presented in the following, Definition 6 (Quantum Jensen-Shannon divergence between quantum states): The quantum Jensen-Shannon divergence between two quantum states ρ and σ is defined as DQJS(ρ,σ) = tr ρ log ρ + σ log σ 2 − ρ + σ 2 log ρ + σ 2 where ρ,σ are symmetric and positive semi-definite density matrices with tr(ρ) = 1 and tr(σ) = 1. The square root of DQJS(ρ,σ) has also been proved to be a metric on the space of density matrices [52], [53]. Since the Laplacian matrix L of a weighted undirected graph G is symmetric and positive semi-definite, we can view L tr(L) as a density matrix. Therefore, the quantum Jensen-Shannon divergence can be used to measure the similarity between two graphs G1 and G2, given that they share the same node set, i.e. V (G1) = V (G2). Definition 7 (Quantum Jensen-Shannon divergence between graphs): The quantum Jensen-Shannon divergence between
IEEE TRANSACTIONS ON INFORMATION THEORY two weighted,undirected graphs G=(V,E1,A1)and Algorithm 2:IncreSim G2 =(V,E2,A2)on the same node set V is defined as aput:G1and{(△CT DQJs(G1,G2)=Hvn(G)-(Hvn(G1)+Hvn(G2))/2, Output:(DsI(Gk,G where G=(V,E1UE2,A)is an weighted graph with A= 1 d the degree sequence of the graph G1; A1/2vol(G1)+A2/2vol(G2). 2m←-1d4/2: Generally,the node sets of two graphs to be compared 3t(G)←log2(2m)-品∑-1fd4): 4 fork =1,...,K-1 do are not necessarily aligned.For simplicity,we restrict our 5 △d←the degree sequence of the signed graph discussion to the aligned case and recommend [54]for a more △Gk: detailed treatment of the unaligned case. Based on the quantum Jensen-Shannon divergence between △m←∑iew△d/2: 7 Compute a,b,y,z in Lemma 3 via iterating over graphs,we consider the following problem that has found applications in anomaly detection and multiplex network com- Vie: Compute H(Gk+1)and H1(Gk)based on pression. Lemma 3; Problem 2:Compute the square root of the quantum Jensen- 9 Shannon divergence between adjacent graphs in a stream of DsI(Gk,Gk+1)← graphs {Gk=(V,Ek,t)where tk is the timestamp of VH(Gk)-(H(G)+H1(Gk+1)/2: the graph Gk and tDsI(G1,G3); weight +1 and the edge in Ek has negative weight-1. The degree of node i V in the operation AGk refers ·Ds is upper bounded by 1,ie,Ds(G1,G2)≤1 where to∑jew{6,)∈E+,k}-I{(i,i)∈E-,k}.Using the the equality holds if and only if min{di.1,di.2)=0 for information about previous graph Gk and current operation every node iV where di.j is the degree of node i in AGk,we can compute the entropy statistics of the current Gi graph G+1 incrementally and efficiently via the following Besides the metric properties,we further establish a connection lemma,whose proof can be found in the appendix between DsI and VDoJs by studying their extreme values, Lemma 3:Using the degree sequence d of the graph Gk. the results of which are summarized in Theorem 6. the structural information H(G),and the degree sequence Theorem 6(Connection between Doss and Dsu):Both Ad of the signed graph AGk,the structural information of the VDoJs(G1,G2)and DsI(G1,G2)attain the same maximum graph G+1 can be efficiently computed as value of 1 under the identical condition that min{di.1,di.2}= 0 for every node iV where di.j is the degree of node i in h(Gk+H)=f2m+△m》-a-f2m)+2mh1(G) G1. 2(m+△m) In order to compute the structural information distance between adjacent graphs in the graph stream {G where where m=∑-1d/2,△m=∑ieM△d4i/2,anda= G=(V,Ek,t),we first compute the structural informa- iev f(di+Adi)-f(di).Moreover,the structural infor- tion H(G)for each k EK,which takes e(Kn)time. mation of the averaged graph Gk between G&and Gk+1 can
IEEE TRANSACTIONS ON INFORMATION THEORY 9 two weighted, undirected graphs G1 = (V, E1, A1) and G2 = (V, E2, A2) on the same node set V is defined as DQJS(G1, G2) = Hvn(G) − (Hvn(G1) + Hvn(G2))/2, where G = (V, E1 ∪ E2, A) is an weighted graph with A = A1/2vol(G1) + A2/2vol(G2). Generally, the node sets of two graphs to be compared are not necessarily aligned. For simplicity, we restrict our discussion to the aligned case and recommend [54] for a more detailed treatment of the unaligned case. Based on the quantum Jensen-Shannon divergence between graphs, we consider the following problem that has found applications in anomaly detection and multiplex network compression. Problem 2: Compute the square root of the quantum JensenShannon divergence between adjacent graphs in a stream of graphs {Gk = (V, Ek, tk)} K k=1 where tk is the timestamp of the graph Gk and tk < tk+1. Since p DQJS(G1, G2) is computationally expensive, we propose a new distance measure based on structural information in Definition 8 and analyze its metric properties in Theorem 5. Definition 8 (Structural information distance between two graphs): The structural information distance between two weighted, undirected graphs G1 = (V, E1, A1) and G2 = (V, E2, A2) on the same node set V is defined as DSI(G1, G2) = q H1(G) − (H1(G1) + H1(G2)) /2, where G = (V, E1 ∪ E2, A) is an weighted graph with A = A1/2vol(G1) + A2/2vol(G2). Theorem 5 (Properties of the distance measure DSI): The distance measure DSI(G1, G2) is a pseudometric on the space of undirected graphs: • DSI is symmetric, i.e., DSI(G1, G2) = DSI(G2, G1); • DSI is non-negative, i.e., DSI(G1, G2) ≥ 0 where the equality holds if and only if P di,1 n k=1 dk,1 = P di,2 n k=1 dk,2 for every node i ∈ V where di,j is the degree of node i in Gj ; • DSI obeys the triangle inequality, i.e., DSI(G1, G2) + DSI(G2, G3) ≥ DSI(G1, G3); • DSI is upper bounded by 1, i.e., DSI(G1, G2) ≤ 1 where the equality holds if and only if min{di,1, di,2} = 0 for every node i ∈ V where di,j is the degree of node i in Gj . Besides the metric properties, we further establish a connection between DSI and p DQJS by studying their extreme values, the results of which are summarized in Theorem 6. Theorem 6 (Connection between p p DQJS and DSI): Both DQJS(G1, G2) and DSI(G1, G2) attain the same maximum value of 1 under the identical condition that min{di,1, di,2} = 0 for every node i ∈ V where di,j is the degree of node i in Gj . In order to compute the structural information distance between adjacent graphs in the graph stream {Gk} K k=1 where Gk = (V, Ek, tk), we first compute the structural information H1(Gk) for each k ∈ [K], which takes Θ(Kn) time. Algorithm 2: IncreSim Input: G1 and {∆Gk} K−1 k=1 Output: {DSI(Gk, Gk+1)} K−1 k=1 1 d ← the degree sequence of the graph G1; 2 m ← Pn i=1 di/2; 3 H1(G1) ← log2 (2m) − 1 2m Pn i=1 f(di); 4 for k = 1, . . . , K − 1 do 5 ∆d ← the degree sequence of the signed graph ∆Gk; 6 ∆m ← P i∈Vk ∆di/2; 7 Compute a, b, y, z in Lemma 3 via iterating over Vk; 8 Compute H1(Gk+1) and H1(Gk) based on Lemma 3; 9 D q SI(Gk, Gk+1) ← H1(Gk) − (H1(Gk) + H1(Gk+1))/2; 10 m ← m + ∆m; 11 foreach i ∈ Vk do di ← di + ∆di ; 12 return {DSI(Gk, Gk+1)} K−1 k=1 Then we compute the structural information of Gk whose adjacent matrix Ak = Ak/2vol(Gk) + Ak+1/2vol(Gk+1) for each k ∈ [K − 1]. Since the degree of node i in Gk is di,k = di,k 2vol(Gk) + di,k+1 2vol(Gk+1) and Pn i=1 di,k = 1, the structural information of Gk is H1(Gk) = − Pn i=1 f(di,k) which takes Θ(n) time for each k. Therefore, the total computational cost is Θ((2K − 1)n). In practice, the graph stream is fully dynamic such that it would be more efficient to represent the graph stream as a stream of operations over time, rather than a sequence of graphs. Suppose that the graph stream is represented as an initial base graph G1 = (V, E1, t1) and a sequence of operations {∆Gk = (Vk, E+,k, E−,k, tk)} K−1 k=1 on the graph where tk is the timestamp of the set E+,k of edge insertions and the set E−,k of edge deletions, and Vk is the subset of nodes covered by E+,k ∪ E−,k. We can view the operation ∆Gk as a signed network where the edge in E+,k has positive weight +1 and the edge in E−,k has negative weight −1. The degree of node i ∈ Vk in the operation ∆Gk refers to P j∈Vk I{(i, j) ∈ E+,k} − I{(i, j) ∈ E−,k}. Using the information about previous graph Gk and current operation ∆Gk, we can compute the entropy statistics of the current graph Gk+1 incrementally and efficiently via the following lemma, whose proof can be found in the appendix. Lemma 3: Using the degree sequence d of the graph Gk, the structural information H1(Gk), and the degree sequence ∆d of the signed graph ∆Gk, the structural information of the graph Gk+1 can be efficiently computed as H1(Gk+1) = f(2(m + ∆m)) − a − f(2m) + 2mH1(Gk) 2(m + ∆m) , where m = Pn i=1 di/2, ∆m = P P i∈Vk ∆di/2, and a = i∈Vk f(di + ∆di) − f(di). Moreover, the structural information of the averaged graph Gk between Gk and Gk+1 can
IEEE TRANSACTIONS ON INFORMATION THEORY 10 be efficiently computed as Algorithm 4:3-Spectral Clustering H1(Gk)=-b-(2m-)f(c)-c(f(2m)-2mH1(Gk)-2): Input:The graph G=(V,E)of order n wherey=∑ewd,z=∑efd).c=,and Output:A cluster membership vector cl f0,1,2]n 1LLaplacian matrix of the graph G; b=∑ieMf( (+4m+公m d+△d 2v2←-eigenvector corresponding toλ2ofL: The pseudocode of our fast algorithm IncreSim for com- 3 v3 eigenvector corresponding to A3 of L; puting the structural information distance in a graph stream is 4 cl k-means clustering of [v2,v3]with k =3; shown in Algorithm 2.It starts by computing the structural 5 return cl information of the base graph Gi (line 1-3),which takes (n)time.In each following iteration,it first computes the value of a,b,c,y,z(line 5-7),then calculates the structural where a ranges over all bijections a:[→[and△ information distance between two adjacent graphs (line 8-9). represents the symmetric difference. finally updates the edge count m and the degree sequence d (line 10-11).The time cost of each iteration is e(V), 3)Empirical Results:The results are shown in Fig.2 and consequently the total time complexity is(n). Fig.3,from which we have the following observations: Observation I (Dynamics of graph entropy):Both the VI.CONNECTIONS WITH COMMUNITY STRUCTURE von Neumann graph entropy and structural information In this section.we discuss the connections between the are stationary with small fluctuations and linearly corre- graph entropy and the community structure in graphs. lated as cout varies. Observation 2 (Dynamics of eigenvalues):In the stochas- A.Empirical Analysis of Stochastic Block Model tic block model with two clusters of equal size,the second smallest eigenvalue A2 linearly increases and 1)Preparations:To study the connections between graph entropy and community structures in a specific ensemble of finally reaches a steady state as cout increases,while the eigenvalues A3 and A4 above it are stationary all the time. graphs,suppose that a graph G is generated by the stochastic block model.There are g groups of nodes,and each node In the stochastic block model with three clusters of equal v has a group label go{1,...,q}.Edges are generated size,both the second smallest eigenvalue A2 and the third smallest eigenvalue Ag linearly increase and finally reach independently according to a probability matrix P E[0,1]9x4, with P(Au=1)=P[gu,go].In the sparse case,we have a steady state as cout increases,while the eigenvaluesA and s above them are stationary all the time. P[a,b]=Cla,b]/n,where the affinity matrix C stays constant .Observation 3 (Phase transition):In the stochastic block in the limit noo.For simplicity we make a common assumption that the affinity matrix C has two distinct entries model with two clusters of equal size,both the detection cin and cout where Cla,b]cin if a b and cout if error of the spectral algorithm and the spectral gap A3-A2 ab.For any graph generated from the stochastic block undergoes a same phase transition as cout varies.The spectral gap A4-A3 is stationary and close to 0 all the model with two(three)groups,we use the spectral algorithm in Algorithm 3(Algorithm 4)to detect the community structure. time.For example,in Fig.2(b)when cout 11, Algorithm 3:2-Spectral Clustering the spectral algorithm works like a random guess and Input:The graph G=(V,E)of order n λ3-2 is mixed with入4-入g. Output:A cluster membership vector cl {0,1)" In the stochastic block model with three clusters of equal 1cl←0: size,both the detection error of the spectral algorithm and 2L Laplacian matrix of the graph G; the spectral gap A4-A3 undergo a same phase transition 3v2←-eigenvector corresponding to入2ofL; as cout varies.The spectral gap As-A is stationary and 4 for i=1,...,n do close to 0 all the time. 5ifv2[lP△Qalg icant community structure that could be easily detected by some algorithms. i=1
IEEE TRANSACTIONS ON INFORMATION THEORY 10 be efficiently computed as H1(Gk) = −b−(2m−y)f(c)−c(f(2m)−2mH1(Gk)−z), where y = P i∈Vk di , z = P i∈Vk f(di), c = 2m+∆m 4m(m+∆m) , and b = P i∈Vk f di 4m + di+∆di 4(m+∆m) . The pseudocode of our fast algorithm IncreSim for computing the structural information distance in a graph stream is shown in Algorithm 2. It starts by computing the structural information of the base graph G1 (line 1-3), which takes Θ(n) time. In each following iteration, it first computes the value of a, b, c, y, z (line 5-7), then calculates the structural information distance between two adjacent graphs (line 8-9), finally updates the edge count m and the degree sequence d (line 10-11). The time cost of each iteration is Θ(|Vk|), consequently the total time complexity is Θ(n+ PK−1 k=1 |Vk|). VI. CONNECTIONS WITH COMMUNITY STRUCTURE In this section, we discuss the connections between the graph entropy and the community structure in graphs. A. Empirical Analysis of Stochastic Block Model 1) Preparations: To study the connections between graph entropy and community structures in a specific ensemble of graphs, suppose that a graph G is generated by the stochastic block model. There are q groups of nodes, and each node v has a group label gv ∈ {1, . . . , q}. Edges are generated independently according to a probability matrix P ∈ [0, 1]q×q , with P(Auv = 1) = P[gu, gv]. In the sparse case, we have P[a, b] = C[a, b]/n, where the affinity matrix C stays constant in the limit n → ∞. For simplicity we make a common assumption that the affinity matrix C has two distinct entries cin and cout where C[a, b] = cin if a = b and cout if a 6= b. For any graph generated from the stochastic block model with two(three) groups, we use the spectral algorithm in Algorithm 3(Algorithm 4) to detect the community structure. Algorithm 3: 2-Spectral Clustering Input: The graph G = (V, E) of order n Output: A cluster membership vector cl ∈ {0, 1} n 1 cl ← 0; 2 L ← Laplacian matrix of the graph G; 3 v2 ← eigenvector corresponding to λ2 of L; 4 for i = 1, . . . , n do 5 if v2[i] 11, the spectral algorithm works like a random guess and λ3 − λ2 is mixed with λ4 − λ3. In the stochastic block model with three clusters of equal size, both the detection error of the spectral algorithm and the spectral gap λ4 −λ3 undergo a same phase transition as cout varies. The spectral gap λ5 −λ4 is stationary and close to 0 all the time. Empirically, we conclude that 1) Graph entropy and community structure: Both the von Neumann graph entropy and structural information reveal nothing about the community structure and the assortativity/disassortativity of graphs. 2) Spectral gaps and community structure: If a graph has significant community structure with k clusters, then the spectral gap λk+1 − λk should be significantly larger than λk+2 − λk+1 and λk − λk−1. Conversely, if there is a significant peak in the sequence of spectral gaps {λi+1 −λi} n−1 i=1 of a graph, the graph should have significant community structure that could be easily detected by some algorithms