正在加载图片...
B)=Bi-8g2 Bik. found by the algorithm described here and three other sions [61 Table 1. Comparison of modularities for the network din k∈g previously published methods as described in the text, for six Because Eq 5 has the same form as Eq 2 we can now apply the networks of varying sizes spectral approach to this generalized modularity matrix, just as Modularity Q before, to maximize A@. Note that the rows and columns of B g) still sum to zero and that AQ is correctly zero if group g is Network Size n This article undivided. Note also that for a complete network Eq 6 reduces Karate 0.419 to the previous definition of the modularity matrix, Eq. 3, Jazz musicians 0.439 0.445 0.442 because k Bi is zero in that case. In repeatedly subdividing the network, an important question E-mail 0.574 we need to address is at what point to halt the subdivision Key signing 10,680081607330.846 ess. a nice feature of this method is that it provides a clear Physicists 0.6680.679 0.723 answer to this question: if there exists no division of a subgraph that will increase the modularity of the network, or equivalent order, the karate club network of Zachary(23), the that gives a positive value for A@, then there is nothing to (27), a metabolic network for the nematode Caenorhabditis elegans(28), a gained by dividing the subgraph and it should be left alone; it is network of e-mail contacts at a university(29), a trust network of mutual indivisible in the sense of the previous section. This ha signing of cryptography keys (30), when there are no positive eigenvalues to the matrix B), and working in condensed matter physics(31). No modularity figure is given for thus the leading eigenvalue provides a simple check for the the last network with the GN algorithm because the slow o(n)operation of termination of the subdivision process: if the leading eigenvalue the algorithm prevents its application to such large systems. GN, Glrvan and is zero, which is the smallest value it can take, then the subgraph Newman (10): CNM, Clauset et al. (26): DA, Duch and Arenas (19) is indivisible Note, however, that although the absence of positive eigen- algorithm(25), and indeed the Kernighan-Lin algorithm pro- alues is a sufficient condition for indivisibility, it is not necessary one. In particular, if there are only small positive ided the inspiration for the method. Despite its simplicity, we find that this method works mod negative B, may outweigh those for positive. It is straightforward ately well. It is not competitive with the best previous methods nodularity contribution sibility, however; we simply calculate the but it gives respectable modularity values in the trial applications d split directly and we have made, However, the method really comes into its own confirm that it is greater than zero Thus the algorithm is as follows. We construct the modularity duced earlier. It a common approach in standard graph eigenvalue and the corresponding eigenvector. We divide the graph Laplacian o s to use spectral partitioning based on the matrix, Eq 3, for the network and find its leading(most positive) partitioning probler an initial broad division of a network network into two parts according to the signs of the elements of Kernighan-Lin algorithm. For community structure problems sing the generalized modularity matrix, Eq 6. If at any stage we spectral approach based on the leading eigenvector of the to the total modularity, we leave the corresponding subgraph that the communities should take and this general form can then undivided.When the entire network has been decomposed into be fine-tuned by the vertex moving method to reach the best indivisible subgraphs in this way, the algorithm ends One immediate corollary of this approach is that all "com- subdivide the network until every remaining subgraph is indi- munities"in the network are, by definition, indivisible sub- visible, and no further improvement in the modularity is possible graphs. A number of authors have in the past proposed formal (We note in passing that in principle the fine-tuning method definitions of what a community is(9, 16, 24). The present could also be used to refine results from other modularity method provides an alternative first-principles definition of a maximization algorithms, such as the extremal optimization ble sub rithm of ref. 19) Further Techniques for Modularity Maximization Typically, the fine-tuning stages of the algorithm add only a few percent to the final value of the modularity. For the karate In this section I describe briefly another method for dividing club network of Fig. 2, for instance, the spectral method on its networks in two by modularity optimization, which is entirely own finds a division of the network with modularity 0=0.39 ifferent from the spectral method. Although not of special which improves to Q=0.419 upon fine-tuning. Nonetheless, an interest on its own. this second method is. as will be show provement of this magnitude is enough, as we will see, to make shortly, very effective when combined with the spectral method the difference between a method that is merely good and one Suppose we are given some initial division of our vertices into that is exceptional. two groups. We then find among the vertices the one that, when moved to the other group, will give the biggest increase in the Example Applications modularity of the complete network, or the smallest decrease if In practice, the algorithm developed here gives excellent results no increase is possible. We make such moves repeatedly, with the For a quantitative comparison between this algorithm and others constraint that each vertex is moved only once. When all n we follow Duch and Arenas(19) and compare values of the vertices have been moved, we search the set of intermediate modularity for a variety of networks drawn from the literature states occupied by the network during the operation of the Results are shown in Table 1 for six different networks, the exact algorithm to find the state that has the greatest modularity. same six used by Duch and Arenas. We compare modularity Starting again from this state, we repeat the entire process figures against three previously published algorithms: the be- teratively until no further improvement in the modularity tweenness-based algorithm of Girvan and Newman(10), which results. Those familiar with the literature on graph partitioning is widely used and has been incorporated into some of the more may find this algorithm reminiscent of the Kernighan-Lin popular network analysis programs; the fast algorithm of Clauset 8580 pnas org/cgi/doi/10.1073/pnas. 0601602103Bij g  Bij ij kg Bik. [6] Because Eq. 5 has the same form as Eq. 2 we can now apply the spectral approach to this generalized modularity matrix, just as before, to maximize Q. Note that the rows and columns of B(g) still sum to zero and that Q is correctly zero if group g is undivided. Note also that for a complete network Eq. 6 reduces to the previous definition of the modularity matrix, Eq. 3, because k Bik is zero in that case. In repeatedly subdividing the network, an important question we need to address is at what point to halt the subdivision process. A nice feature of this method is that it provides a clear answer to this question: if there exists no division of a subgraph that will increase the modularity of the network, or equivalently that gives a positive value for Q, then there is nothing to be gained by dividing the subgraph and it should be left alone; it is indivisible in the sense of the previous section. This happens when there are no positive eigenvalues to the matrix B(g) , and thus the leading eigenvalue provides a simple check for the termination of the subdivision process: if the leading eigenvalue is zero, which is the smallest value it can take, then the subgraph is indivisible. Note, however, that although the absence of positive eigen￾values is a sufficient condition for indivisibility, it is not a necessary one. In particular, if there are only small positive eigenvalues and large negative ones, the terms in Eq. 4 for negative i may outweigh those for positive. It is straightforward to guard against this possibility, however; we simply calculate the modularity contribution Q for each proposed split directly and confirm that it is greater than zero. Thus the algorithm is as follows. We construct the modularity matrix, Eq. 3, for the network and find its leading (most positive) eigenvalue and the corresponding eigenvector. We divide the network into two parts according to the signs of the elements of this vector, and then repeat the process for each of the parts, using the generalized modularity matrix, Eq. 6. If at any stage we find that a proposed split makes a zero or negative contribution to the total modularity, we leave the corresponding subgraph undivided. When the entire network has been decomposed into indivisible subgraphs in this way, the algorithm ends. One immediate corollary of this approach is that all ‘‘com￾munities’’ in the network are, by definition, indivisible sub￾graphs. A number of authors have in the past proposed formal definitions of what a community is (9, 16, 24). The present method provides an alternative first-principles definition of a community as an indivisible subgraph. Further Techniques for Modularity Maximization In this section I describe briefly another method for dividing networks in two by modularity optimization, which is entirely different from the spectral method. Although not of special interest on its own, this second method is, as will be shown shortly, very effective when combined with the spectral method. Suppose we are given some initial division of our vertices into two groups. We then find among the vertices the one that, when moved to the other group, will give the biggest increase in the modularity of the complete network, or the smallest decrease if no increase is possible. We make such moves repeatedly, with the constraint that each vertex is moved only once. When all n vertices have been moved, we search the set of intermediate states occupied by the network during the operation of the algorithm to find the state that has the greatest modularity. Starting again from this state, we repeat the entire process iteratively until no further improvement in the modularity results. Those familiar with the literature on graph partitioning may find this algorithm reminiscent of the Kernighan–Lin algorithm (25), and indeed the Kernighan–Lin algorithm pro￾vided the inspiration for the method. Despite its simplicity, we find that this method works moder￾ately well. It is not competitive with the best previous methods, but it gives respectable modularity values in the trial applications we have made. However, the method really comes into its own when it is used in combination with the spectral method intro￾duced earlier. It is a common approach in standard graph partitioning problems to use spectral partitioning based on the graph Laplacian to give an initial broad division of a network into two parts, and then refine that division by using the Kernighan–Lin algorithm. For community structure problems we find that the equivalent joint strategy works very well. The spectral approach based on the leading eigenvector of the modularity matrix gives an excellent guide to the general form that the communities should take and this general form can then be fine-tuned by the vertex moving method to reach the best possible modularity value. The whole procedure is repeated to subdivide the network until every remaining subgraph is indi￾visible, and no further improvement in the modularity is possible. (We note in passing that in principle the fine-tuning method could also be used to refine results from other modularity maximization algorithms, such as the extremal optimization algorithm of ref. 19.) Typically, the fine-tuning stages of the algorithm add only a few percent to the final value of the modularity. For the karate club network of Fig. 2, for instance, the spectral method on its own finds a division of the network with modularity Q  0.393, which improves to Q  0.419 upon fine-tuning. Nonetheless, an improvement of this magnitude is enough, as we will see, to make the difference between a method that is merely good and one that is exceptional. Example Applications In practice, the algorithm developed here gives excellent results. For a quantitative comparison between this algorithm and others we follow Duch and Arenas (19) and compare values of the modularity for a variety of networks drawn from the literature. Results are shown in Table 1 for six different networks, the exact same six used by Duch and Arenas. We compare modularity figures against three previously published algorithms: the be￾tweenness-based algorithm of Girvan and Newman (10), which is widely used and has been incorporated into some of the more popular network analysis programs; the fast algorithm of Clauset Table 1. Comparison of modularities for the network divisions found by the algorithm described here and three other previously published methods as described in the text, for six networks of varying sizes Network Size n Modularity Q GN CNM DA This article Karate 34 0.401 0.381 0.419 0.419 Jazz musicians 198 0.405 0.439 0.445 0.442 Metabolic 453 0.403 0.402 0.434 0.435 E-mail 1,133 0.532 0.494 0.574 0.572 Key signing 10,680 0.816 0.733 0.846 0.855 Physicists 27,519 — 0.668 0.679 0.723 The networks are, in order, the karate club network of Zachary (23), the network of collaborations between early jazz musicians of Gleiser and Danon (27), a metabolic network for the nematode Caenorhabditis elegans (28), a network of e-mail contacts at a university (29), a trust network of mutual signing of cryptography keys (30), and a coauthorship network of scientists working in condensed matter physics (31). No modularity figure is given for the last network with the GN algorithm because the slow O(n3) operation of the algorithm prevents its application to such large systems. GN, Glrvan and Newman (10); CNM, Clauset et al. (26); DA, Duch and Arenas (19). 8580  www.pnas.orgcgidoi10.1073pnas.0601602103 Newman
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有