IEEE/ACM TRANSACTIONS ON NETWORKING VOL I7.NO. 1 FEBRUARY 2009 Virus spread in Networks Piet Van Mieghem, Member, IEEE, Jasmina Omic, and robert Kooij Abstract-The ce of the network characteristics on the Many authors(see, e.g., 3], [6], [11], and [12])mention the read is in a new-the N-intertwined Markov existence of an epidemic threshold Tc. If the effective spreading chain-model nly approximation lies in the application rate T=(B/6)>Te, the virus persists and a nonzero fraction in detail. The -intertwined model has been compared with of the nodes are infected, whereas for. s Te the epidemic dies the exact 2-state Markov model and with previously proposed out. However, when the same model is exactly described via "homogeneous"or"local"models. The sharp epidemic threshold Markov theory as shown in Section Ill, the observation that this To, which is a consequence of mean field theory, is rigorously Markov chain(with a finite number of states) possesses an ab- shown to be equal to Tc=1/(Amax(A),where Amax(A)is the sorbing state, contradicting the existence of any threshold. Be largest eigenvalue--the spectral radius--of the adjacency matrix cause. in an irreducible Markov chain--all states are reachable A. A continued fraction expansion of the steady-state infection from each other-the existence of an absorbing state implies probability at node is presented as well as several upper bounds that all other states are transient states and that the steady state is Index Terms-Epidemic threshold, Markov theory, mean field the absorbing state. Moreover, the probability that the process is eory, spectral radius, virus spread. in a transient state exponentially tends to zero with time. How- ever, the convergence time T to the steady state can be very large, as shown in Section Ill. Ganesh et al. [9] give estimates L. INTRODUCTION complications arise. An infinite-state Markov process is consid- W甲 ding of a virus in a network that was earlier con- trated by, e. g, branching pro(1+12则ph sidered by Ganesh et al. [9] and by Wang et al.[15]in dis- bility of extinction is a characteristic feature that is not presented crete time. The model belongs to the class of susceptible-in- in a finite-state Markov chain. Although there is an absorbing fected-susceptible(SIS)models that, together with the suscep- state, in an infinite-state Markov process, there is a nonzero tible-infected-removed(SIR)models, are the standard models chance that the process never dies out. Since the exact Markov for computer virus infections. Each node in the network is ei- chain(see Section III)consists of 2--states in a network of N ther infected or healthy. An infected node can infect its neigh- nodes, features of the infinite-state Markov process rapidly pop bors with an infection rate B, but it is cured with curing rate 8. up. The apparent steady-state connected with the observation However,once cured and healthy, the node is again prone to the of an epidemic threshold is often termed the"metastable state" virus. Both infection and curing processes are independent. Re- SInce, on a sufficiently long time-scale for finite-state systems, finements like the existence of an incubation period, an infection Our major motivation is to understand the influence of graph rate that depends on the number of neighbors, a curing process characteristics on epidemic spreading. Earlier, Wang et al. [15 that takes a certain amount of time, and other sophistications presented an approximate analysis from which they concluded are not considered here, but we refer to, eg-[2],[6], [10] and that the threshold of the effective infection rate Tc equals can be applied to the spread of e-mail worms and other com- adjacency matrix A of the network. This result relates--for puter viruses, the propagation of faults or failures, and, more the first time to the best of our knowledge-the epidemiolog generally,the spread of information(e.g, news, rumors, brand ical spreading to a specific characteristic, the spectral radius awareness, and marketing of new products)and epidemic dis- Amax(A), of the network. When using mean field theory(or re- semination or/and routing in ad hoc and peer-to-peer networks. lated averaging techniques), we rigorously show in Section IV that, in the steady state, there is indeed a well-defined threshold Tc= 1/(max(A). This result relativizes the belief of the IEEE/ACM TRANSACTIONS ON NETWORKING Editor L Massoulie. First pub- physics society(see, e.g., [1] and [ 12]) that scale-free networks lished June 24. 2008: current on published February 19, 2009. This work like the Internet possess a vanishingly small epidemic threshold as supported in part by the Netherlands Organization for Scientific Research NwO) under Project 643.000.503 and by ation Infrastructures and, hence vulnerable to viruses This announcement has provok P. Van Mieghem and J Omic are with the Faculty of Electrical Engi- for scale-free complex networks, which is somehow question Mathematics and Computer Science, Delft Ur 前出抽甲mNW 600 GA Delft, The Netherlands (e-mail: P Ift. l: able In fact, since Amax(A)is smaller than the mean Omic@@ewi tudelft.nl) degree of the network, the class of connected Erdos-Renyi (e-mail: R E Kooij @ewi tudelft. nl) Digital Object Identifier 101109/TNET. 2008.925623 average degree scales with the number of nodes N, wh 1063-66922500@2008IEEE
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY 2009 1 Virus Spread in Networks Piet Van Mieghem, Member, IEEE, Jasmina Omic, and Robert Kooij Abstract—The influence of the network characteristics on the virus spread is analyzed in a new—the -intertwined Markov chain—model, whose only approximation lies in the application of mean field theory. The mean field approximation is quantified in detail. The -intertwined model has been compared with the exact -state Markov model and with previously proposed “homogeneous” or “local” models. The sharp epidemic threshold , which is a consequence of mean field theory, is rigorously shown to be equal to , where is the largest eigenvalue—the spectral radius—of the adjacency matrix . A continued fraction expansion of the steady-state infection probability at node is presented as well as several upper bounds. Index Terms—Epidemic threshold, Markov theory, mean field theory, spectral radius, virus spread. I. INTRODUCTION WE FOCUS ON A simple continuous-time model for the spreading of a virus in a network that was earlier considered by Ganesh et al. [9] and by Wang et al. [15] in discrete time. The model belongs to the class of susceptible-infected-susceptible (SIS) models that, together with the susceptible-infected-removed (SIR) models, are the standard models for computer virus infections. Each node in the network is either infected or healthy. An infected node can infect its neighbors with an infection rate , but it is cured with curing rate . However, once cured and healthy, the node is again prone to the virus. Both infection and curing processes are independent. Re- finements like the existence of an incubation period, an infection rate that depends on the number of neighbors, a curing process that takes a certain amount of time, and other sophistications are not considered here, but we refer to, e.g., [2], [6], [10], and [16]. The theory of the spreads of epidemics through a network can be applied to the spread of e-mail worms and other computer viruses, the propagation of faults or failures, and, more generally, the spread of information (e.g., news, rumors, brand awareness, and marketing of new products) and epidemic dissemination or/and routing in ad hoc and peer-to-peer networks. Manuscript received March 23, 2007; revised July 20, 2007; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor L. Massoulie. First published June 24, 2008; current version published February 19, 2009. This work was supported in part by the Netherlands Organization for Scientific Research (NWO) under Project 643.000.503 and by Next Generation Infrastructures (Bsik). P. Van Mieghem and J. Omic are with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2600 GA Delft, The Netherlands (e-mail: P.VanMieghem@ewi.tudelft.nl; J.S.Omic@ewi.tudelft.nl). R. Kooij is with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2600 GA Delft, The Netherlands, and also with TNO Information Communication Technology, P.2600 GB Delft (e-mail: R.E.Kooij@ewi.tudelft.nl). Digital Object Identifier 10.1109/TNET.2008.925623 Many authors (see, e.g., [3], [6], [11], and [12]) mention the existence of an epidemic threshold . If the effective spreading rate , the virus persists and a nonzero fraction of the nodes are infected, whereas for the epidemic dies out. However, when the same model is exactly described via Markov theory as shown in Section III, the observation that this Markov chain (with a finite number of states) possesses an absorbing state, contradicting the existence of any threshold. Because, in an irreducible Markov chain—all states are reachable from each other—the existence of an absorbing state implies that all other states are transient states and that the steady state is the absorbing state. Moreover, the probability that the process is in a transient state exponentially tends to zero with time. However, the convergence time to the steady state can be very large, as shown in Section III. Ganesh et al. [9] give estimates of . When the number of states grows unboundedly, major complications arise. An infinite-state Markov process is considerably more complex than a finite-state Markov chain as illustrated by, e.g., a branching process [14, Ch. 12] where the probability of extinction is a characteristic feature that is not presented in a finite-state Markov chain. Although there is an absorbing state, in an infinite-state Markov process, there is a nonzero chance that the process never dies out. Since the exact Markov chain (see Section III) consists of —states in a network of nodes, features of the infinite-state Markov process rapidly pop up. The apparent steady-state connected with the observation of an epidemic threshold is often termed the “metastable state” since, on a sufficiently long time-scale for finite-state systems, it disappears. Our major motivation is to understand the influence of graph characteristics on epidemic spreading. Earlier, Wang et al. [15] presented an approximate analysis from which they concluded that the threshold of the effective infection rate equals , where is the largest eigenvalue of the adjacency matrix of the network. This result relates—for the first time to the best of our knowledge—the epidemiological spreading to a specific characteristic, the spectral radius , of the network. When using mean field theory (or related averaging techniques), we rigorously show in Section IV that, in the steady state, there is indeed a well-defined threshold . This result relativizes the belief of the physics society (see, e.g., [1] and [12]) that scale-free networks like the Internet possess a vanishingly small epidemic threshold and, hence, are vulnerable to viruses. This announcement has provoked a rush of investigations on immunization strategies for scale-free complex networks, which is somehow questionable. In fact, since is never smaller than the mean degree of the network, the class of connected Erdös–Rényi random graphs [14] possesses a far larger spectral radius than any scale-free graph with a same number of nodes . Most complex networks are not small-world networks such that their average degree scales with the number of nodes , which 1063-6692/$25.00 © 2008 IEEE
IEEE/ACM TRANSACTIONS ON NETWORKING. VOL 17. NO. 1 FEBRUARY 2009 neans that, for sufficiently large N, all of these complex net- variations, possess the same type of solution, specified by a works, and not only scale-free graphs, seem prone to potential"steady-state"epidemic threshold infections After a review of basic models for epidemics in Section Il, we study the matrix structure of the infinitesimal generator Q of the exact 2-state Markov chain in Section Ill and give rather pre- Since each node has(on average)the same degree, the Kephart cise fitting results for the convergence time T' in two limiting and White model is also termed a"homogeneous"model. Many raphs: the complete graph and the line graph. The major part variations on and extensions of the Kephart and White model is devoted to our new N-intertwined Markov model: Section I have been proposed(see, e.g.[13]). The Kephart and White derives the model, assesses the influence of the mean field ap- model has already appeared in earlier work(see, e.g.[3]).The proximation, and derives precise relations and upper bounds for logistic model of population growth that was first introduced by the steady-state Sections V and Vi characterize the exponential verhulst in 1838 as mentioned by Daley and Gani [6, p 20] is, dying out for T Te and the role of the spectrum of A, respec- in fact, the same as the simple kephart and White model. More tively. The accuracy of the Kephart and White model is eval- over, the simplest stochastic analogon [6, p. 56-63]-a pure uated in Section VIl, while Section vIll compares our model birth process with transition rate An, n+1 with exact computations. Section IX concludes the paper mathematically identical to the shortest path problem [14, ch. I. REVIEW OF SOME BASIC MODELS 16] in the complete graph with i.i. d. exponential link weights This observation and relation to the complete graph shows tha Here, we review basic models that may help to understand these earlier models do not take the confining way of actual virus the finer details of our N-intertwined model. All models are transport into account. The central role of the network structure phrased in our notation used in [14]. Other more ger in the spread of viruses is the focal point of this pa models for virus d in networks based on Markov theory are found in [2] and [10] B. Model of Wang et al. A. Kephart and White Model The major merit of the model of Wang et al.[15] is the in- corporation of an arbitrary network characterized by the adja Kephart and White [11] considered a connected regular cency matrix A, which generalizes the homogeneous Kephart raph' on N nodes where each node has degree k. The number and White model. where the only network characteristic of infected nodes in the population at time t is denoted by the(average)degree. The discrete-time model of Wang et I(t). If the population N is sufficiently large, we can convert belongs to the class of mean field models. Their major and in- I(t)to yt)=I(t)/N, a continuous quantity representing triguing result is that the epidemic threshold is specified by the fraction of infected nodes. Hence, the implicit assumption is that the number of states is sufficiently large such that the asymptotic regime for an infinite number of states is reached C WCWF determined by two processes: 1)infected nodes are being cured Unfortunately, this result is proved in an approximate manner and 2)susceptible nodes are infected. For process 1), the cure which questions to what extent this remarkable result holds in rate of a fraction y of infected nodes is &y. The rate at which general. In the sequel, we show that the Wang et al. model is the fraction y grows in process 2) is proportional to the fraction only accurate when the effective spreading rate T is below the of susceptible nodes, i.e., 1-y. For every susceptible node, "steady-state "epidemic threshold Te the rate of infection is the product of the infection rate B per link, the number of infected neighbors (i.e, the degree k:)of the II. EXACt 2-STATE MARKOV CHAIN node, which is ky Combining all contributions yields the time We consider the virus spread in an undirected graph G(N, L) evolution of y(t)in the Kephart and White model, described characterized by a symmetric adjacency matrix A. We assume by the differential equation that the arrival of an infection on a link and the curing process Bk(1-3)-0y of an infected node are independent Poisson processes with rate dt )B and with rate 8, respectively. As soon as a node i receives an infection at time t. it is considered to be infected and infectious whose solution is and in state Xi(t)=l. Similarly, an infected node i is cured y(t)= with rate 8, and in the healthy state Xi (t)=0at timet. At each time t. a node is in one of these two states where yo is the initial fraction of infected node The state Y(t) of the network at time t is defined by all the steady-state fraction is 3ho limy - u(t)obeying possible combinations of states in which the N nodes can be doo/dt 0. The Kephart and White differential equation(1)is the of a large class of mean field models that, apart from Y(t)=[0(t)Yi1(t) White have modeled an Erdos-Renyi random tends. for large a. to a Hence to first order in n. the of virus spread in Erdos-Renyi random graphs and regular graphs are the same. 1(0=10,2=1(2-1 人=1
2 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY 2009 means that, for sufficiently large , all of these complex networks, and not only scale-free graphs, seem prone to potential infections. After a review of basic models for epidemics in Section II, we study the matrix structure of the infinitesimal generator of the exact -state Markov chain in Section III and give rather precise fitting results for the convergence time in two limiting graphs: the complete graph and the line graph. The major part is devoted to our new -intertwined Markov model: Section IV derives the model, assesses the influence of the mean field approximation, and derives precise relations and upper bounds for the steady-state. Sections V and VI characterize the exponential dying out for and the role of the spectrum of , respectively. The accuracy of the Kephart and White model is evaluated in Section VII, while Section VIII compares our model with exact computations. Section IX concludes the paper. II. REVIEW OF SOME BASIC MODELS Here, we review basic models that may help to understand the finer details of our -intertwined model. All models are rephrased in our notation used in [14]. Other more general models for virus spread in networks based on Markov theory are found in [2] and [10]. A. Kephart and White Model Kephart and White [11] considered a connected regular graph1 on nodes where each node has degree . The number of infected nodes in the population at time is denoted by . If the population is sufficiently large, we can convert to , a continuous quantity representing the fraction of infected nodes. Hence, the implicit assumption is that the number of states is sufficiently large such that the asymptotic regime for an infinite number of states is reached. The rate at which the fraction of infected nodes changes, is determined by two processes: 1) infected nodes are being cured and 2) susceptible nodes are infected. For process 1), the cure rate of a fraction of infected nodes is . The rate at which the fraction grows in process 2) is proportional to the fraction of susceptible nodes, i.e., . For every susceptible node, the rate of infection is the product of the infection rate per link, the number of infected neighbors (i.e., the degree ) of the node, which is . Combining all contributions yields the time evolution of in the Kephart and White model, described by the differential equation (1) whose solution is (2) where is the initial fraction of infected nodes whereas the steady-state fraction is obeying . The Kephart and White differential equation (1) is the basis of a large class of mean field models that, apart from some 1Kephart and White have modeled an Erdös–Rényi random graph with average degree , which tends, for large , to a regular graph. Hence, to first order in , the properties of virus spread in Erdös–Rényi random graphs and regular graphs are the same. variations, possess the same type of solution, specified by a “steady-state” epidemic threshold (3) Since each node has (on average) the same degree, the Kephart and White model is also termed a “homogeneous” model. Many variations on and extensions of the Kephart and White model have been proposed (see, e.g., [13]). The Kephart and White model has already appeared in earlier work (see, e.g., [3]). The logistic model of population growth that was first introduced by Verhulst in 1838 as mentioned by Daley and Gani [6, p. 20] is, in fact, the same as the simple Kephart and White model. Moreover, the simplest stochastic analogon [6, p. 56–63]—a pure birth process with transition rate —is mathematically identical to the shortest path problem [14, ch. 16] in the complete graph with i.i.d. exponential link weights. This observation and relation to the complete graph shows that these earlier models do not take the confining way of actual virus transport into account. The central role of the network structure in the spread of viruses is the focal point of this paper. B. Model of Wang et al. The major merit of the model of Wang et al. [15] is the incorporation of an arbitrary network characterized by the adjacency matrix , which generalizes the homogeneous Kephart and White model, where the only network characteristic was the (average) degree. The discrete-time model of Wang et al. belongs to the class of mean field models. Their major and intriguing result is that the epidemic threshold is specified by Unfortunately, this result is proved in an approximate manner which questions to what extent this remarkable result holds in general. In the sequel, we show that the Wang et al. model is only accurate when the effective spreading rate is below the “steady-state” epidemic threshold . III. EXACT -STATE MARKOV CHAIN We consider the virus spread in an undirected graph characterized by a symmetric adjacency matrix . We assume that the arrival of an infection on a link and the curing process of an infected node are independent Poisson processes with rate and with rate , respectively. As soon as a node receives an infection at time , it is considered to be infected and infectious and in state . Similarly, an infected node is cured with rate , and in the healthy state at time . At each time , a node is in one of these two states. The state of the network at time is defined by all possible combinations of states in which the nodes can be at time and
VAN MIEGHEM et al: VIRUS SPREAD IN NETWORKS 0000 Fig. 1. State diagram in a graph with N G Nodes and the binary numbering Hence, the state space of the Markov chain is organized with Fig. 2. Lower triangular part Qs of the infinitesimal generator Q {0,1 State number i node j, we obtain the probability that a node i is either healthy 00..000 0123 i=Oor infected Ci=l 00.00 010 Pr[Xi(t=j si(t) 11..,11 whe, in the index i=∑A1xk2k-1, every ak with k≠j takes both values from the set 0, 11, while for k The number of states with j infected nodes is (). Fig. 1 either 0(healthy)or 1(infected). Defining U; (t)=Pr[X,(t) shows an example of the Markov state diagram in a graph with 1], then the relation between the vectors s(t) and v(t)is N= 4 nodes The defined virus infection process is a contin u(t)=s7() Markov chain with 2: states specified by the infi generator Q with elements where the 2X N matrix M contains the states in binary nota tion but bit-reversed. as if i=j+2m- 2.N;xm=1 100 ;mn=0(4) 010 110 O<k=1≠hj,i讠= otherwise 001 Tk 2k-. The time dependence of the probabil 111 state vector s(t), with components s(t)=Pr[()=引] The binary representation of the network determines the structure of the Q matrix The upper tria Pr[i(t=[1, X2(t)= 2, .,Xn(t=inl denoted by QA, depends on the adjacency mat while the lower triangular part Qs does not. The dia and normalization 2i-0si(t)= l, obeys [14, p. 182] the ements of any Q matrix are the negative sum of the row ele- differential equation ments, such that Diag diag(goo, q11,..., q2N-1, 2N-1)with dst(t-s(t)Q 93=-Ck=1 k+i qkj as in(4). It is thus instructive to write Q Q=Q5+QA+ whose solution is ture of the matrix Q& is shown in Fig. 2, where the block matrix B(=812 x2 and the nondefined elements are zeros. This s()=s2(0)e nested structure is the consequence of the binary representation. The matrix QA is shown in Fig 3. The block matrices C( ition of si (t) as a joint probability distribution n QA are diagonal matrices of size 2x 23 with diagonal el- that, if we sum over all of the states of all nodes except fc ements depending on the adjacency matrix A. The first row of
VAN MIEGHEM et al.: VIRUS SPREAD IN NETWORKS 3 Fig. 1. State diagram in a graph with nodes and the binary numbering of the states. Hence, the state space of the Markov chain is organized with The number of states with infected nodes is . Fig. 1 shows an example of the Markov state diagram in a graph with nodes. The defined virus infection process is a continuous-time Markov chain with states specified by the infinitesimal generator with elements if if if otherwise (4) and . The time dependence of the probability state vector , with components and normalization , obeys [14, p. 182] the differential equation whose solution is The definition of as a joint probability distribution shows that, if we sum over all of the states of all nodes except for the Fig. 2. Lower triangular part of the infinitesimal generator . node , we obtain the probability that a node is either healthy or infected where, in the index , every with takes both values from the set {0,1}, while for is either 0 (healthy) or 1 (infected). Defining , then the relation between the vectors and is where the matrix contains the states in binary notation, but bit-reversed, as . . . . . . . . . . . . . . . The binary representation of the network states determines the structure of the matrix. The upper triangular part of , denoted by , depends on the adjacency matrix elements , while the lower triangular part does not. The diagonal elements of any matrix are the negative sum of the row elements, such that with as in (4). It is thus instructive to write as a sum of three matrices . The structure of the matrix is shown in Fig. 2, where the block matrix and the nondefined elements are zeros. This nested structure is the consequence of the binary representation. The matrix is shown in Fig. 3. The block matrices in are diagonal matrices of size with diagonal elements depending on the adjacency matrix . The first row of
IEEE/ACM TRANSACTIONS ON NETWORKING. VOL 17. NO. 1 FEBRUARY 2009 olor)Histogram eigenvalues )of( of the in the complete graph es of T gives the number of times an eis insert shows the spectrum of 1 11 for an extremely high T N 0 Fig 3. Upper triangular part( A of Theorem 1: For B=0, the eigenvalues of the matrix Q, de fined by (4), are A(QB-0)=-hS with multiplicity (%),where the matrix Q is zero, and, as a consequence, the largest block is 0k=I()=2-1 For small values of T, Q tends thus to a discrete binomial 5(t)=s(0)=丌+ spectrum. Fig. 4 illustrates that, also for larger T, the spectrum of Q for the complete graph KN is still discrete, 2 containing many eigenvalues with high multiplici where nk denotes the multiplicity of the eigenvalue Ak(with Proposition 2: For constant 8 and increasing B(and T Re Ak <0) and the vector rk m is related to the left- and right B/6), the eigenvalues of Q shift, on average to more negative eigenvector belonging to Ak and the initial conditions. Since values than those of @p=0 U(t)=(s(t)M);=2k-0 Sk Mki is a sum of certain rows Proof: We apply Gershgorin,s Theorem to Q=Q5+ of s(t), we may write Diag+QA, where QA= BTA and TA only contains(nonzero) integer elements related to the adjacency matrix A as observed from(4). Hence, gi <0 decreases with B which implies that u(t) both the center position and the possible range of each eigen- =o iEMj value Mi(Q)increases with B where M, denotes the jth column in the matrix M. Let ui be Corollary 3: The eige nvalues of Q for the complete graph the largest eigenvalue Au of the set where (Tk, m)i# 0, then smallest)possible range among all connected graphs. The Ui(t)is dominated(for not too small t) by of the real part of eigenvalues of Q for any connected graph is(-((BN+8)/2B),0 (t) Proof: From QA=BTA, defined in the proof of Theorem 2, it follows that the maximum possible sum of row elements occul urs for KN(all ai except for e min- which shows that a"bell-shape"distribution of ug(t) can only imum one for line graph (only one1-element on each row in the occur if that largest eigenvalue uu than 1 has a multiplicity larger adjacency matrix A). Gershgorin's Theorem then provides the -Random matrices of this size exhibit an almost continuous spectrum of Q For all infinitesimal generators, it holds that det Q=0 and, ( Gershgorin's Theorem shows that t wi for any infinitesimal centers bjj and radii R,N @i: Cand that the ence, the largest eigenvalue is A possible interval for real eigenvalues of( is [0, 2 maxi Cii gi
4 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY 2009 Fig. 3. Upper triangular part of . the matrix is zero, and, as a consequence, the largest block is . The elements of depend on the indexes and , where as where ; ; . The exact -state Markov chain has an absorbing state because the first row in is a zero row and the absorbing state is the zero state in which all nodes are healthy. The steady-state is just this absorbing state, with steady-state vector . The probability state vector requires the insights in the eigenstructure of because [7] where denotes the multiplicity of the eigenvalue (with ) and the vector is related to the left- and right eigenvector belonging to and the initial conditions. Since is a sum of certain rows of , we may write where denotes the th column in the matrix . Let be the largest eigenvalue of the set where , then is dominated (for not too small ) by (5) which shows that a “bell-shape” distribution of can only occur if that largest eigenvalue has a multiplicity larger than 1. A. Spectrum of For all infinitesimal generators, it holds that and, hence, the largest eigenvalue is . Fig. 4. (in color) Histogram eigenvalues of of the in the complete graph for three values of gives the number of times an eigenvalue occurs. The insert shows the spectrum of for an extremely high . Theorem 1: For , the eigenvalues of the matrix , de- fined by (4), are with multiplicity , where . Proof: For , the infinitesimal generator reduces to the lower-triangular matrix , whose eigenvalues are identical to the diagonal elements of , which are multiples of . In fact, the structure of shows that each block row has a row sum equal to for whose value appears times. Hence, has an eigenvalue at with multiplicity . These contain all of the nonzero eigenvalues of because . For small values of , tends thus to a discrete, binomial spectrum. Fig. 4 illustrates that, also for larger , the spectrum of for the complete graph is still discrete,2 containing many eigenvalues with high multiplicity. Proposition 2: For constant and increasing (and ), the eigenvalues of shift, on average, to more negative values than those of . Proof: We apply Gershgorin’s Theorem3 to , where and only contains (nonzero) integer elements related to the adjacency matrix as observed from (4). Hence, decreases with which implies that both the center position and the possible range of each eigenvalue increases with . Corollary 3: The eigenvalues of for the complete graph and line graph spread over the largest (respectively, smallest) possible range among all connected graphs. The maximum possible range of the real part of eigenvalues of for any connected graph is Proof: From , defined in the proof of Theorem 2, it follows that the maximum possible sum of row elements occurs for (all except for ) and the minimum one for line graph (only one 1-element on each row in the adjacency matrix ). Gershgorin’s Theorem then provides the 2Random matrices of this size exhibit an almost continuous spectrum. 3Every eigenvalue of a matrix lies in at least one of the circular discs with centers and radii . For any infinitesimal generator , Gershgorin’s Theorem shows that and that the maximum possible interval for real eigenvalues of is .
VAN MIEGHEM et al: VIRUS SPREAD IN NETWORKS Fig. 6. Logarithm of )22 versus the number links for 0 G Fig. 5. Four largest eigenvalues of the infinitesimal generator N for the com- 00-yoxpl0xy.033 andI GNAo- plete graph with size @G N8 and 10 as a function of 0 with[ G Npd he second largest eigenvalues are increasing with 0 as 22(N)()[Im tiplicity larger than 1, creating a bell-shape. This observation agrees with the figures in Section VIll In Fig. 6, the eigenvalues of Q for all computable complete first statement. Since the maximum eigenvalue range thus oc- graphs(up to N= 13) have been numerically calculated. The curs for a complete graph, we consider in the Q-matrix for KN second largest eigenvalue seems well fitted(for T>0.05)by the ith row with h one-bits in the binary representation. The row elements, except from the diagonal element, represents the transitions from and to a state with N-k healthy and k in- fected nodes. The row sum of these positive elements equals where L=() denotes the number of links in the complete Bk (N-h)+ ho, and, hence, ii=-Bk(n-k)-ha Opti- graph KN. The dependence on T is approximately given by mizing with respect to k proves the corollary. b(r)0.177(1+2T). Assuming that the scaling law(6)of A2 As shown in the Appendix, also for the line graph, the max- holds for any N, the convergence time T' of the virus spread be in KN towards the steady state(the zero state), Yet, there are open questions regarding the spectrum of Q. r2e-IAzIT= 10-E is found as T=O(eb()L)=O(eb(T)2N) 1)Although Q is not symmetric, computations reveal that all In other words, for large size N and T>0, the convergence eigenvalues of Q are real (and negative) time T is so large that convergence towards the zero state is 2)Perturbation theory of Q for small B(or T) expresses the in reality never reached, which explains the appearance of the eigenvalues in terms of those of QB-0 and of the corre- so-called"metastable state. ponding right and left-eigenvectors of QB=0. However, Ganesh et al. [9] show that, for T Tc [a regime that is the multiplicity of the eigenvalues of QB=0 further com- not covered by(6)], the mean epidemic lifetime El] scales as plicates the perturbation analysis O(og N)while, for T>T*> T where T* is the generalized 3)The recursive block-structure( due to the binary represen- isoperimetric constant, ET]=O(e ) for some constant a tation)of Q needs to be exploited If we may extrapolate(6)to large N, it shows that the In the sequel of this section, we confine to explicit computa- a=2 for K, tion of the Q2 matrix for two extreme types of graphs: the com- 2) Line Graph: Fig. 7 plots the second largest eigen- plete graph which has the smallest average hopcount (or the value A2 of Q for the line graph. The largest eigenvalue of fastest virus penetration)and the line graph that possesses the the adjacency matrix A of the line graph, where each row largest possible average hopcount has precisely one nonzero element in the upper triangular 1)Complete Graph KN: Fig. 5 shows the four largest part of A, is Ama(A)= 2cos(T/(N+ 1))(1/2)versus N.As eigenvalue that increases--contrary to the expectations of Ger- observed from Fig. 7, the curves A2 increase very slowly shgorin's Theorem--roughly exponentially in T and with rate with N. Via curve fitting in the range N E [ 8, 13,we increasing for increasing size N. This second largest eigenvalue found that A2(, N)N -Se-r(1.184+0.0-413N), which shows determines the speed of convergence towards the steady state. the exponential dependence on T(accurate)and the less ig. 5 also shows that, initially for small T, the third and fourth accurate dependence on N. If extrapolation to large M eigenvalue are the same and bifurcate(see dots)into distinct allowed, the convergence time T of the virus spread in values roughly around Tc 1/(Amax(A)) 1/(N-1). the line graph towards the steady-state(the zero state) Hence, (5)indicates that, below Tc, the dominant eigenval =O(1/)2)=O(er(1.181+0.0113)), which is considerably simply causing exponential decay, while above Te it has smaller than in KN, which is the other extreme case
VAN MIEGHEM et al.: VIRUS SPREAD IN NETWORKS 5 Fig. 5. Four largest eigenvalues of the infinitesimal generator for the complete graph with size , 8 and 10 as a function of with . The second largest eigenvalues are increasing with as , and . first statement. Since the maximum eigenvalue range thus occurs for a complete graph, we consider in the -matrix for the th row with one-bits in the binary representation. The row elements, except from the diagonal element, represents the transitions from and to a state with healthy and infected nodes. The row sum of these positive elements equals , and, hence, . Optimizing with respect to proves the corollary. As shown in the Appendix, also for the line graph, the maximum of the diagonal elements can be computed. Yet, there are open questions regarding the spectrum of . 1) Although is not symmetric, computations reveal that all eigenvalues of are real (and negative). 2) Perturbation theory of for small (or ) expresses the eigenvalues in terms of those of and of the corresponding right- and left-eigenvectors of . However, the multiplicity of the eigenvalues of further complicates the perturbation analysis. 3) The recursive block-structure (due to the binary representation) of needs to be exploited. In the sequel of this section, we confine to explicit computation of the matrix for two extreme types of graphs: the complete graph which has the smallest average hopcount (or the fastest virus penetration) and the line graph that possesses the largest possible average hopcount. 1) Complete Graph : Fig. 5 shows the four largest eigenvalues of for the complete graph for , 8 and 10. The second largest eigenvalue seems like the only eigenvalue that increases—contrary to the expectations of Gershgorin’s Theorem—roughly exponentially in and with rate increasing for increasing size . This second largest eigenvalue determines the speed of convergence towards the steady state. Fig. 5 also shows that, initially for small , the third and fourth eigenvalue are the same and bifurcate (see dots) into distinct values roughly around . Hence, (5) indicates that, below , the dominant eigenvalue is simply causing exponential decay, while above it has mulFig. 6. Logarithm of versus the number links in for and . tiplicity larger than 1, creating a bell-shape. This observation agrees with the figures in Section VIII. In Fig. 6, the eigenvalues of for all computable complete graphs (up to ) have been numerically calculated. The second largest eigenvalue seems well fitted (for ) by (6) where denotes the number of links in the complete graph . The dependence on is approximately given by . Assuming that the scaling law (6) of holds for any , the convergence time of the virus spread in towards the steady state (the zero state), defined by is found as . In other words, for large size and , the convergence time is so large that convergence towards the zero state is in reality never reached, which explains the appearance of the so-called “metastable state.” Ganesh et al. [9] show that, for [a regime that is not covered by (6)], the mean epidemic lifetime scales as while, for where is the generalized isoperimetric constant, , for some constant . If we may extrapolate (6) to large , it shows that the constant for . 2) Line Graph: Fig. 7 plots the second largest eigenvalue of for the line graph. The largest eigenvalue of the adjacency matrix of the line graph, where each row has precisely one nonzero element in the upper triangular part of , is . Fig. 7 (axis on the right) also shows the epidemic threshold of the line graph versus . As observed from Fig. 7, the curves increase very slowly with . Via curve fitting in the range , we found that , which shows the exponential dependence on (accurate) and the less accurate dependence on . If extrapolation to large is allowed, the convergence time of the virus spread in the line graph towards the steady-state (the zero state) is , which is considerably smaller than in , which is the other extreme case.
IEEE/ACM TRANSACTIONS ON NETWORKING. VOL 17. NO. 1 FEBRUARY 2009 I which is equivalent to conditioning to all possible combinations of the states X, (t)= 1(and their complements X, (t)=0) Hom of the neighbors of node i. Hence, the number of basic states dramatically increases. Eventually, after conditioning each node ass in such a way, we end up with a 2 -state Markov chain, defined a earlier in SectionⅢ Instead of conditioning, we replace the actual random infec- tion rate by an effective or average infection rate, which is bas cally a mean field approximation E[④1;]=E a1(x()=1y (7 45103 In general, we may take the expectation over the rate B, the net- work topology via the matrix A, and the states Xi (t). Since we Fig. 7. Second larg genvalue N2 of Q in the line graph versus the number assume that both the infection rate B and the network are con- of nodes for variou nd G Np0-. The epidemic threshold [s is shown stant and given, we only average ove ates. Using Ela]= in the dotted line on the axis on the right-hand side. Pr[](see, e.g., [ 14]), we replace q1: i by B. Conclusion aij Pr[X,(t)=1 An upper and lower bound on the spectrum of any graph are given. Via fitting, we complement the scaling laws of Ganesh which results in an effective infinitesimal generator N=13. Simulations and the analytic matrix computations are Qi(t) Elgi:il Elgi:il within the simulation accuracy, identical. This observation al lows us to replace the matrix computations by simulations be- yond graph sizes of N= 13 The effective Qi(t)allows us to proceed with Markov theory Denoting ui (t)=PrLXi(t)=1]and recalling that Pr[Xi(t) IV. N-INTERTWINED CONTINUOUS MARKOV CHAINS 0 he Markov differential equation [14, (10.11)on WITH TWO STATES p. 182] for state Xi(t=l turns out to be nonlinear By separately observing each node, we will model the virus dvi(t) pread in a bi-directional network specified by a symmetric ad- ax=B2>(-(0)(2( jacency matrix A. Every node i at time t in the network has two states:infected with probability Pr[Xi(t)=1]and healthy with Each node obeys a differential equatic probability Pr[Xi(t)=0. At each moment t, a node can only be in one of two states, thus Pr[Xi (t)=1+Pr[Xi(t=0 Ⅱ we apply Markov theory straight away,, the infinitesimal gen-(±①=B 1(y(-m1()(a∑1y(+6) erator Qi (t) of this two-state continuous Markov chain is h;1; 山A=8∑1021()-2()(∑=121()+ with (2: i=8 and 沿=∑aN1()-0((H=ay( Written in matrix form, with V(t)=[un(t)u2(t) ..uN(t n;=B2a1(x(m)=1 we arrive at where the indicator function 1r=1 if the event r is true. else it BAv(t)-diag(ui (t))(BAv(t)+Su)(9) is zero. The coupling of node i to the rest of the network is de- where u is the all-one vector and diagfvi (t) is the diagonal scribed by an infection rate q1: i that is a random variable, which matrix with elements u1(t),v2(t)., UN(t) essentially makes the process doubly stochastic. This observa- We rewrite(9) with V(t)=diag(vi(t)uas tion is crucial. Using the definition of the infinitesimal generato dv=Bav(t-5diag(u (t)u -(a (t )BAv( Pr[X(t+△)=1X4(t)=0]=q△t+o(4 =(BA-SDV(t)-Bdiag(i (t))Av(t) the continuity and differentiability shows that this process is not or Markovian anymore. The random nature of q1: i is removed by an additional conditioning to all possible combinations of rates, d2=iga-()4-6Dv(.(10)
6 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY 2009 Fig. 7. Second largest eigenvalue of in the line graph versus the number of nodes for various and . The epidemic threshold is shown in the dotted line on the axis on the right-hand side. B. Conclusion An upper and lower bound on the spectrum of any graph are given. Via fitting, we complement the scaling laws of Ganesh et al. [9]. The matrix computations (on a PC) are limited to . Simulations and the analytic matrix computations are, within the simulation accuracy, identical. This observation allows us to replace the matrix computations by simulations beyond graph sizes of . IV. -INTERTWINED CONTINUOUS MARKOV CHAINS WITH TWO STATES By separately observing each node, we will model the virus spread in a bi-directional network specified by a symmetric adjacency matrix . Every node at time in the network has two states: infected with probability and healthy with probability . At each moment , a node can only be in one of two states, thus . If we apply Markov theory straight away, the infinitesimal generator of this two-state continuous Markov chain is with and where the indicator function if the event is true, else it is zero. The coupling of node to the rest of the network is described by an infection rate that is a random variable, which essentially makes the process doubly stochastic. This observation is crucial. Using the definition of the infinitesimal generator [14, p. 181] the continuity and differentiability shows that this process is not Markovian anymore. The random nature of is removed by an additional conditioning to all possible combinations of rates, which is equivalent to conditioning to all possible combinations of the states (and their complements ) of the neighbors of node . Hence, the number of basic states dramatically increases. Eventually, after conditioning each node in such a way, we end up with a -state Markov chain, defined earlier in Section III. Instead of conditioning, we replace the actual random infection rate by an effective or average infection rate, which is basically a mean field approximation (7) In general, we may take the expectation over the rate , the network topology via the matrix , and the states . Since we assume that both the infection rate and the network are constant and given, we only average over the states. Using (see, e.g., [14]), we replace by which results in an effective infinitesimal generator The effective allows us to proceed with Markov theory. Denoting and recalling that , the Markov differential equation [14, (10.11) on p. 182] for state turns out to be nonlinear (8) Each node obeys a differential equation as (8) . . . Written in matrix form, with , we arrive at (9) where is the all-one vector and is the diagonal matrix with elements . We rewrite (9) with as or (10)
VAN MIEGHEM et al: VIRUS SPREAD IN NETWORKS The time-continuous analog of Wang et al. [15 would be 1 if node i is infected, else it is zero. If the node i is infected dv(t/dt=(BA- DV(, which thus ignores the impor- (Xi(t)=1), Si(t)can change from I to O with curing rate 8 tant nonlinear term Diag(ui(t))Av(t), and, consequently as If the node i is healthy(Xi (t)=0, Si(t) can change from shown in Section IV-B, it limits the validity to T Oand large N vi(t)of infection Next, we will address the effect on the size n by computin prISN-EISNI2xvEISx\2r es dt the variance of q1: i, Varig:1=ElR: ]-(Ea1:-1)2.First, we largea e √2丌 EGR=E∑a10=1B k1x()=1 where the last step follows after(successive) partial integration and retaining the O(-)term in the series for large T. Hence, ∑∑a;E[1(x:(=1(x()=1] for independent indicators, large deviations from the mean are very unlikely However, 91: /B =neighbor(iX,(t)=1] ∑∑a;P[X()=1,X()=1 10, 1,.,di is a sum of dependent indicators. In addi 1k=1 tion, if N is large, q1: i does not always se with N deed, gl; i< Bdmax(A)and the maximum degree dmax(A) or in terms of the conditional probabilities in a graph can be independent of N, for example, in the line graph where d max 2 for any N E团=F∑∑aPT[X()=1Xk(O We will first elaborate on the dependence. Let us consider the ime-dependent random variable Si(t)=1(x:(=1], which is X Pr[Xk(t=1
VAN MIEGHEM et al.: VIRUS SPREAD IN NETWORKS 7 The time-continuous analog of Wang et al. [15] would be , which thus ignores the important nonlinear term , and, consequently as shown in Section IV-B, it limits the validity to . An extension of the -intertwined model where the curing and infection rates are node specific is where the curing rate vector is . We note that is, in general, not symmetric anymore, unless and commute, in which case the eigenvalue and both and have a same eigenvector . In case the curing and infection rates are link-specific, the adjacency matrix can be extended to that of a multilink graph, where is an integer counting the number of links (representing the strength of infection) between node and node . Generally, can be a nonnegative real, symmetric matrix where each contains the strength of the infection of link in units of a constant . A. Mean Field Approximation At first glance, the averaging process—replacing in (7) by its mean —seems quite accurate, because a sum of independent indicators (Bernoulli random variables) is close—exactly if all Bernoulli random variables have the same distribution—to a binomial random variable, whose standard deviation is small compared to the mean . The latter implies that the random variable is closely approximated by its mean for large . More precisely, the central limit theorem for a sum of independent random variables , each with finite variance (and small compared with ) states that, for large Applied to independent indicators with shows that, for and large where the last step follows after (successive) partial integration and retaining the term in the series for large . Hence, for independent indicators, large deviations from the mean are very unlikely. However, is a sum of dependent indicators. In addition, if is large, does not always increase with . Indeed, and the maximum degree in a graph can be independent of , for example, in the line graph where for any . We will first elaborate on the dependence. Let us consider the time-dependent random variable , which is 1 if node is infected, else it is zero. If the node is infected , can change from 1 to 0 with curing rate . If the node is healthy , can change from 0 to 1 with rate . The change of in a sufficiently small time interval is After taking the expectation of both sides, we obtain (with ) Since , only the case where appears in the remaining expectation, which is where the conditional probability . Hence, when , we arrive at Assuming that the graph is connected, then because a given infection at node cannot negatively influence the probability of infection at node . When comparing with (8), we observe that the mean field approximation implicitly makes the assumption of independence that . Hence, the positive correlation is not incorporated appropriately. As a consequence, the rate of change in is always overestimated. The -intertwined Markov chain thus upper-bounds the exact probability of infection. Next, we will address the effect on the size by computing the variance of , . First, we have or in terms of the conditional probabilities
IEEE/ACM TRANSACTIONS ON NETWORKING. VOL 17. NO. 1 FEBRUARY 2009 Since PrLX, (t)=lXk(t=lkPX(0=1= max[a:elg1,t82h:计 0. and thus The variance of qr: i is aijUjoo=vioo nin+6=0. Var[a;i]=B2>ai; Pr[X, (t)=1](1-Pr[X; (t)=1]) Since all of the diagonal elements of the adjacency matrix A are ak0;(ck()-0(t)tk(t).(12) k=j+1 (13) Since chi(t)>U(t) as argued above, the second double sum Miivjoo consists of nonnegative terms such that the variance Var[qi: i] is larger than in the case of independent random variables(where This nodal steady-state is the ratio of the(average)infection rate the double sum disappears). This fact is not in favor of the mean induced by the node 's direct neighbors 2i-1aijUjoo over the field approximation since larger variations around the mean total(average)rate of both the competing infection and curing Ela1] can occur which makes the mean a less good approxi- process. Since a-j=0,(13)is equal to the steady-state prob- mation for the random variable g1 i. In particular, (12)shows ability in a two-state continuous Markov chain(see, e.g.[14 that standard deviation Varg:=O(N), whereas the p 196), which exemplifies the local (or nodal) character of our standard deviation scales as O(VN)in case of independence! w-intertwined markov model. We observe the trivial solution 0 for all i, which means that eventually all nodes will be Especially in graphs with bounded maximum degree (such as healthy. On the other hand, if 6=0, then all vin=1or,to be the line graph ), vVarlgn: may not decrease sufficiently fast slightly more precise, (13)shows that vino =1-0(T-1)for in N compared to Elgi:i]. Thus, we expect deviations between ge T. Of course, if there is no curing at all( =0), all nodes the N-intertwined and the exact model in those graphs to be will eventually be infected almost surely For small T(and t large enough to ign Lemma 4: In a connected graph, either vioo =0 for all tions), Pr[X,(t)=l pproximation. Hence, we expect that the deviations between 0, the nonzero steady-state infection probability of any node the N-intertwined and the exact model are largest for interme- i in the N-intertwined me del can be expressed as a continued diate values of T. As shown in Section VIll, in some T-region fraction around Tc, large deviations are indeed found The two observations, dependence and absence of a limiting vioo process towards the mean as N increases, complicate a more precise assessment of the averaging process at this point. Since 1 the mean field approximation is the only approximation made, 11+nd;-y a comparison of the nonlinear model(9)with the exact 2-state 1+Td-T2j=11+d-2k=11+rd solution in Section VIll further quantifies the effect of the mean 1+Tdg field approximation. Finally, the mean field approximation also excludes informa- tion about the joint probability of states, where di=Cisi aij is the degree of node i. Consequently, the exact steady-state infection probability of any node i is bounded Pr[X1(t)=m1,X2(t)=m2,,XN(t)=N] where alln,E 0, 1], as in the 2-state Markov chain 0≤vx≤1
8 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY 2009 Since , an upper bound of is (11) The variance of is (12) Since as argued above, the second double sum consists of nonnegative terms such that the variance is larger than in the case of independent random variables (where the double sum disappears). This fact is not in favor of the mean field approximation since larger variations around the mean can occur which makes the mean a less good approximation for the random variable . In particular, (12) shows that standard deviation , whereas the standard deviation scales as in case of independence! Especially in graphs with bounded maximum degree (such as the line graph), may not decrease sufficiently fast in compared to . Thus, we expect deviations between the -intertwined and the exact model in those graphs to be largest. For small (and large enough to ignore the initial conditions), and (12) shows that the double sum is . Hence, for small , the situation is close to the independence case, in which mean field theory performs generally well. An upper bound for follows from (11) such that the coefficient of variation This shows, that for large where , the coefficient of variation is small, again in favor of the mean field approximation. Hence, we expect that the deviations between the -intertwined and the exact model are largest for intermediate values of . As shown in Section VIII, in some -region around , large deviations are indeed found. The two observations, dependence and absence of a limiting process towards the mean as increases, complicate a more precise assessment of the averaging process at this point. Since the mean field approximation is the only approximation made, a comparison of the nonlinear model (9) with the exact -state solution in Section VIII further quantifies the effect of the mean field approximation. Finally, the mean field approximation also excludes information about the joint probability of states, where all , as in the -state Markov chain. B. Steady-State Under the Mean Field Approximation Assuming that the steady-state exists, we can calculate the steady-state probabilities of infection for each node. The steadystate, denoted by , implies that , and thus we obtain from (8) for each node Since all of the diagonal elements of the adjacency matrix are zero, , we find (13) This nodal steady-state is the ratio of the (average) infection rate induced by the node’s direct neighbors over the total (average) rate of both the competing infection and curing process. Since , (13) is equal to the steady-state probability in a two-state continuous Markov chain (see, e.g., [14, p. 196]), which exemplifies the local (or nodal) character of our -intertwined Markov model. We observe the trivial solution for all , which means that eventually all nodes will be healthy. On the other hand, if , then all or, to be slightly more precise, (13) shows that for large . Of course, if there is no curing at all , all nodes will eventually be infected almost surely. Lemma 4: In a connected graph, either for all nodes or none of the components is zero. Proof: If for one node in a connected graph, then it follows from (13) that which is only possible provided for all neighbors of node . Applying this argument repeatedly to the neighbors of neighbors in a connected graph proves the lemma. Apart from the exact steady-state for all , the nonlinearity gives rise to a second solution, coined the “metastable state.” That second nonzero solution can be interpreted as the fraction of time that a node is infected while the system is in the “metastable state,” i.e., there is a long-lived epidemic. Theorem 5: For any effective spreading rate , the nonzero steady-state infection probability of any node in the -intertwined model can be expressed as a continued fraction ... (14) where is the degree of node . Consequently, the exact steady-state infection probability of any node is bounded by (15)
VAN MIEGHEM et aL. VIRUS SPREAD IN NETWORKS AV∞+(1/r)u,the Ignoring the absence of curing(8=0 or T- oo), the bound (15)shows that vioo cannot be one such that the matrix (I Fig8. Difference between the exact result and the (-iterations (N-(-p diag(vioo))=diag(1-vioo)is invertible. Hence (elfo h i templete seraph and line graph (both with ) pnodes) versu 1 1+∑101元 and we end up with the equation 1+72-r∑1n1(1-v 1「∞ 1-x AVoc 0 because Ui E[O, 1] for all ]. k=1vioo,where the geometric series always converges since This proves(15) 0, and forTTc Fo >0 is an arbitrary smal ≤1 constant, Voo=ET, where is the eigenvector belonging to the 1+7d2-r∑1 largest eigenvalue of the adjacency matrix A Proof: Theorem 5 shows that the only solution atT=0is This bound improves on(15). The third iteration gives the trivial solution Voo =0. Let Vo E, where e>0 is an arbitrary small constant and each component i >0. Introduced in(16)gives, after division by a 1+7-N,a-∑N=+r4 Ax=x+=x2+O(2) For sufficiently small e >0, the steady-state equations reduce Ignoring E akg (1 -vgo)>0 yields a new upper bound to the eigenvalue equation nat sharpens the previous upper bound of the second itera Each iteration provides a tighter upper bound by putting (17) : Oin the deepest fraction process leads to an infinite continued fraction expansion(14)for which shows that a is an eigenvector of A belonging to ince A is a nonnegative matrix, the The continued fraction stopped at iteration k: includes the ef- Perron-Frobenius Theorem [14, p. 451]states that A has a fect of virus spread up to the(k-1)-hop neighbors of node i. positive largest eigenvalue Amax(A)with a corresponding As illustrated in Fig 8(and typical for other graphs that we have eigenvector whose elements are all positive and there is only simulated ) a few iterations in(14)already give an accurate ap- one eigenvector of A with nonnegative components. Hence, proximation. The accuracy seems worst around T=Tc if Amax(A)>0, then (and scaled vector Additional insight can be gained from (9), which in the Voo Ez)is the eigenvector of A belonging to Amax(A) steady-state reduces to TTc, Theorem 5 provides the nonzero solution of(13)
VAN MIEGHEM et al.: VIRUS SPREAD IN NETWORKS 9 Fig. 8. Difference between the exact result and the -iterations of (14) for the complete graph and line graph (both with nodes) versus the effective infection rate . Proof: We rewrite (13) as since because for all . This proves (15). We proceed further by introducing , such that This bound improves on (15). The third iteration gives Ignoring yields a new upper bound that sharpens the previous upper bound of the second iteration. Each iteration provides a tighter upper bound by putting in the deepest fraction. Continuing the process leads to an infinite continued fraction expansion (14) for . The continued fraction stopped at iteration includes the effect of virus spread up to the -hop neighbors of node . As illustrated in Fig. 8 (and typical for other graphs that we have simulated), a few iterations in (14) already give an accurate approximation. The accuracy seems worst around . Additional insight can be gained from (9), which in the steady-state reduces to Define the vector , then or Ignoring the absence of curing ( or ), the bound (15) shows that cannot be one such that the matrix is invertible. Hence and we end up with the equation Further, we expand each element as , where the geometric series always converges since . With the notation , we arrive at the steady-state equation (16) Lemma 6: There exists a value , and, for , there is only the trivial steady-state solution . Beside the solution, there is a second nonzero solution for all . For where is an arbitrary small constant, , where is the eigenvector belonging to the largest eigenvalue of the adjacency matrix . Proof: Theorem 5 shows that the only solution at is the trivial solution . Let , where is an arbitrary small constant and each component . Introduced in (16) gives, after division by For sufficiently small , the steady-state equations reduce to the eigenvalue equation (17) which shows that is an eigenvector of belonging to the eigenvalue . Since is a nonnegative matrix, the Perron–Frobenius Theorem [14, p. 451] states that has a positive largest eigenvalue with a corresponding eigenvector whose elements are all positive and there is only one eigenvector of with nonnegative components. Hence, if , then (and any scaled vector ) is the eigenvector of belonging to . If , then cannot be an eigenvalue of and the only possible solution is , leading to the trivial solution . For , Theorem 5 provides the nonzero solution of (13).
IEEE/ACM TRANSACTIONS ON NETWORKING. VOL 17. NO. 1 FEBRUARY 2009 Canright et al. [4] proposed the eigenvector centrality(Evc) measure of a spreading power of a node bori) where ek is the spreading power of a node h. Written in our nota- tion as vioo )>isaijUjoo, the EVC is recognized as the component representation of the eigenvalue equation(17) forT=Tc. The steady-state infection probability is the long-run fraction of time during which the node is infected. The higher the probability vioo, the faster the node i is prone to infection and the more important its role is in further spreading. This Markov Fig 9. Comparison of(19)and exact computations or precise simulations for steady-state interpretation may explain the term centrality anal- different types of graphs with( N(O nodes ogously as the betweenness centrality of a node In passing, we note that, by combining Theorem 5 and Lemma 9: In a connected graph G with minimum degree Lemma 6, a continued fraction expansion of the(scaled) dmin an largest eigenvector in any graph is found from (14)for equal d forT 1/d min, a lower bound of vioo for any node i T=T=1/Amax(a) Lemma 7: For any effective spreading rate T=(B/6)> 0, the components vioo of the steady-state infection probability vector obe Proof: Lemmas 4 and 6 show that, forT> Te, there exists a 0. vioo>0 of steady-state in fection probabilities, which obeys(13), assuming that this min imum U'min occurs at node i, to yield Proof: Summing all rows in(16), whic multiplication of both sides in(16) by the all-one vector u T, 1+T∑ aiijUmin divi z1-1+ dN] is the degree vector. After from the last inequality, it can be shown that rewriting the k-sum, we arrive at(18) mmn≥1 Equation(18)is obeyed for the trivial solution vioo =0 and if vioo =1-(1/Tdi In the case of regular graphs(where di=d for all 1 (1/dmin)2Te are exact solutions of (13). This shows that, in certain cases. the Introducing the bound (21), we also have for each node continued fraction(14)can be simplified nodes in the network, based on the estimate u≈1-(1/r),v;≥ Umin≥1 1+ Odium +ddi(Tdmin-1 which is(20) 3x(T)≈1 (19) For dmin =1, the lowest possible lower bound for node i Numerical computations in Fig. 9 assess the quality of the ap- 1+(7-1) proximation(19) Lemma 8: For all i, vioo=1-(1rdi) cannot be a solution Finally, by combining the upper bound(15)and the lower bound of(13)forTdmin is the second smallest (20)for T>1/dmin, we find that vioo belongs to the interval degree in the graph G Proof: Indeed, 1 -Vioo =(1/Tdi) dmin is important. Lemma 8 ex lains that larger variations in the degree lead to worse results This shows clearly that for T-+ oo variations between all values of(19)in Fig 9 of U; for all i will tend to o
10 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY 2009 Canright et al. [4] proposed the eigenvector centrality (EVC) measure of a spreading power of a node where is the spreading power of a node . Written in our notation as , the EVC is recognized as the component representation of the eigenvalue equation (17) for . The steady-state infection probability is the long-run fraction of time during which the node is infected. The higher the probability , the faster the node is prone to infection and the more important its role is in further spreading. This Markov steady-state interpretation may explain the term centrality analogously as the betweenness centrality of a node. In passing, we note that, by combining Theorem 5 and Lemma 6, a continued fraction expansion of the (scaled) largest eigenvector in any graph is found from (14) for . Lemma 7: For any effective spreading rate , the components of the steady-state infection probability vector obey (18) Proof: Summing all rows in (16), which is equivalent to multiplication of both sides in (16) by the all-one vector , yields where is the degree vector. After rewriting the -sum, we arrive at (18). Equation (18) is obeyed for the trivial solution and if . In the case of regular graphs (where for all ), both and are exact solutions of (13). This shows that, in certain cases, the continued fraction (14) can be simplified. The fraction of infected nodes in the network, based on the estimate , is (19) Numerical computations in Fig. 9 assess the quality of the approximation (19). Lemma 8: For all , cannot be a solution of (13) for where is the second smallest degree in the graph . Proof: Indeed, leads for to , which is impossible. The strict inequality is important. Lemma 8 explains that larger variations in the degree lead to worse results of (19) in Fig. 9. Fig. 9. Comparison of (19) and exact computations or precise simulations for different types of graphs with nodes. Lemma 9: In a connected graph with minimum degree and for , a lower bound of for any node equals (20) Proof: Lemmas 4 and 6 show that, for , there exists a nonzero minimum of steady-state infection probabilities, which obeys (13), assuming that this minimum occurs at node , to yield From the last inequality, it can be shown that (21) which is only larger than zero provided . Introducing the bound (21), we also have for each node which is (20). For , the lowest possible lower bound for node is Finally, by combining the upper bound (15) and the lower bound (20) for , we find that belongs to the interval This shows clearly that for variations between all values of for all will tend to 0.