C 2003 Society for Industrial and Applied Mathematics L.45.Na.2,pp.167-256 The structure and function of Complex Networks E.J. Newma Abstract. Inspired by empirical studies of networked systems such as the Internet, social networks, review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks Key words. networks, graph theory, complex systems, computer networks, social networks, random AMS subject classifications. 05C75, 05C90, 94015 PIlS0036144503424804 Contents I Introduction 1.2 Other resources 172 1.3 Outline of the review 174 2 Networks in the real world l74 2.1 Social networks 2.2 Information networks 176 2.3 Technological Networks 2.4 Biological Networks 179 3 Properties of Networks 3.1 The smal-World effect 180 3.2 Transitivity or Clustering .3 Degree Distributions 185 3.1 Scale-Free Networks 3.3.2 Maximum Degr 3.4 Network resilience 3.5 Mixing patterns Received by the editors January 20, 2003; accepted for publication (in revised form) March 17 2003: published elect May 2, 2003. This work was supported in part by the U.S. National rants DMS-0109086 and DMS-0234188 and by the James S McDonnell http://www.siam.org/journa t Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109(mejnQumich. edu)
SIAM REVIEW c 2003 Society for Industrial and Applied Mathematics Vol. 45,No. 2,pp. 167–256 The Structure and Function of Complex Networks∗ M. E. J. Newman† Abstract. Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks. Key words. networks, graph theory, complex systems, computer networks, social networks, random graphs, percolation theory AMS subject classifications. 05C75, 05C90, 94C15 PII. S0036144503424804 Contents. 1 Introduction 168 1.1 Types of Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 1.2 Other Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 1.3 Outline of the Review . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 2 Networks in the Real World 174 2.1 Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 2.2 Information Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 2.3 Technological Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 178 2.4 Biological Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 3 Properties of Networks 180 3.1 The Small-World Effect . . . . . . . . . . . . . . . . . . . . . . . . . . 180 3.2 Transitivity or Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 183 3.3 Degree Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 3.3.1 Scale-Free Networks . . . . . . . . . . . . . . . . . . . . . . . . 186 3.3.2 Maximum Degree . . . . . . . . . . . . . . . . . . . . . . . . . . 188 3.4 Network Resilience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 3.5 Mixing Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 ∗Received by the editors January 20, 2003; accepted for publication (in revised form) March 17, 2003; published electronically May 2, 2003. This work was supported in part by the U.S. National Science Foundation under grants DMS-0109086 and DMS-0234188 and by the James S. McDonnell Foundation and the Santa Fe Institute. http://www.siam.org/journals/sirev/45-2/42480.html †Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109 (mejn@umich.edu). 167
MLE.」 NEWMAN 3.6 Degree Correlations 3.7 Community Structure 3.8 Network Navigation 3.9 Other Network Properties m era 196 4.1 Poisson Random Graphs 4.2 Generalized Random graphs 4.2.1 The Configuration Mod 4.2.2 Example: Power-Law Degree Distribution 4.2.3 Directed Graphs 4.2.4 Bipartite Graphs .204 4.2.5 Degree Correlations 5 Exponential Random Graphs and Markov Graphs 206 6 The small-World model 208 6.1 Clustering Coefficient 6.2 Degree Distribution 210 6.3 Average Path Length 7 Models of network growth 212 7. 1 Prices Model 13 7.2 The Model of Barabasi and Albert 215 7. 3 Generalizations of the model of Barabasi and Albert 219 7. 4 Other Growth Models 7.5 Vertex Copying Models 8 Processes Taking Place on Networks 224 8.1 Percolation Theory and Network Resilience 8.2 Epidemiological Processes 8.2.1 The SIR model 8.2.2 The SIS Model 8. 3 Search on Networks 8.3.1 Exhaustive Network Search 8.3.2 Guided Network Search 8.3.3 Network Navigation 8.4 Phase Transitions on networks 8.5 Other Processes on Networks 9 Summary and Directions for Future Research 240 I. Introduction A network is a set of items which we will call vertices or some- times nodes, with connections between them, called edges(Figure 1.1). Systems king the form of networks(also called"graphs"in much of the mathematical lit erature)abound in the world. Examples include the Internet, the World Wide Web social networks of acquaintance or other connections between individuals, organiza tional networks and networks of business relations between companies, neural net works. metabolic networks food webs distribution networks such as blood vessels
168 M. E. J. NEWMAN 3.6Degree Correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 3.7 Community Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 3.8 Network Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 3.9 Other Network Properties . . . . . . . . . . . . . . . . . . . . . . . . . 196 4 Random Graphs 196 4.1 Poisson Random Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 197 4.2 Generalized Random Graphs . . . . . . . . . . . . . . . . . . . . . . . 200 4.2.1 The Configuration Model . . . . . . . . . . . . . . . . . . . . . 200 4.2.2 Example: Power-Law Degree Distribution . . . . . . . . . . . . 202 4.2.3 Directed Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 203 4.2.4 Bipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 204 4.2.5 Degree Correlations . . . . . . . . . . . . . . . . . . . . . . . . 205 5 Exponential Random Graphs and Markov Graphs 206 6 The Small-World Model 208 6.1 Clustering Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 6.2 Degree Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 6.3 Average Path Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 7 Models of Network Growth 212 7.1 Price’s Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 7.2 The Model of Barab´asi and Albert . . . . . . . . . . . . . . . . . . . . 215 7.3 Generalizations of the Model of Barab´asi and Albert . . . . . . . . . . 219 7.4 Other Growth Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 221 7.5 Vertex Copying Models . . . . . . . . . . . . . . . . . . . . . . . . . . 223 8 Processes Taking Place on Networks 224 8.1 Percolation Theory and Network Resilience . . . . . . . . . . . . . . . 225 8.2 Epidemiological Processes . . . . . . . . . . . . . . . . . . . . . . . . . 229 8.2.1 The SIR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 8.2.2 The SIS Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 8.3 Search on Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 8.3.1 Exhaustive Network Search . . . . . . . . . . . . . . . . . . . . 234 8.3.2 Guided Network Search . . . . . . . . . . . . . . . . . . . . . . 235 8.3.3 Network Navigation . . . . . . . . . . . . . . . . . . . . . . . . 236 8.4 Phase Transitions on Networks . . . . . . . . . . . . . . . . . . . . . . 238 8.5 Other Processes on Networks . . . . . . . . . . . . . . . . . . . . . . . 239 9 Summary and Directions for Future Research 240 1. Introduction. A network is a set of items, which we will call vertices or sometimes nodes, with connections between them, called edges (Figure 1.1). Systems taking the form of networks (also called “graphs” in much of the mathematical literature) abound in the world. Examples include the Internet, the World Wide Web, social networks of acquaintance or other connections between individuals, organizational networks and networks of business relations between companies, neural networks, metabolic networks, food webs, distribution networks such as blood vessels or
THE STRUC TURE AND FUNCTION OF COMPLEX NETWORKS vertex Fig. II A small example network with eight vertices and ten edges. postal delivery routes, networks of citations between papers, and many others(Fig- ure 1.2). This paper reviews recent(and some not-so-recent) work on the structure and function of networked systems such as these The study of networks, in the form of mathematical graph theory, is one of the fundamental pillars of discrete mathematics. Eulers celebrated 1735 solution of the Konigsberg bridge problem is often cited as the first true proof in the theory of net- works, and during the twentieth century graph theory has developed into a substantial body of knowledge Networks have also been studied extensively in the social sciences. Already in the 1930s, sociologists realized the importance of the patterns of connection between peo- ple to the understanding of the functioning of human society(Figure 1.3). Typical net work studies in sociology involve the circulation of questionnaires, asking respondents to detail their interactions with others. One can then use the responses to reconstruct a network in which vertices represent individuals and edges the interactions between them. Typical social network studies address issues of centrality(which individuals are best connected to others or have most influence) and connectivity (whether and how individuals are connected to one another through the network) Recent years, however, have witnessed a substantial new movement in network research, with the focus shifting away from the analysis of single small graphs and the properties of individual vertices or edges within such graphs to consideration of large-scale statistical properties of graphs. This new approach has been driven largely by the availability of computers and communication networks that allow us to gather and analyze data on a scale far larger than previously possible. Where studies use to look at networks of maybe tens or in extreme cases hundreds of vertices, it is not uncommon now to see networks with millions or even billions of vertices. This change of scale forces upon us a corresponding change in our analytic approach. Many of the questions that might previously have been asked in studies of small networks are simply not useful in much larger networks. A social network analyst might have asked which vertex in this network would prove most crucial to the network's connectivity if it were removed? But such a question has little meaning in most networks of a million vertices-no single vertex in such a network will have much effect at when removed. On the other hand, one could reasonably ask a question like, "What percentage of vertices need to be removed to substantially affect network connectivity in some given way? " and this type of statistical question has real meaning even in a very large network However, there is another reason why our approach to the study of networks has changed in recent years, a reason whose importance should not be underestimated although it often is. For networks of tens or hundreds of vertices, it is a relatively straightforward matter to draw a picture of the network with actual points and lines
THE STRUCTURE AND FUNCTION OF COMPLEX NETWORKS 169 edge vertex Fig. 1.1 A small example network with eight vertices and ten edges. postal delivery routes, networks of citations between papers, and many others (Figure 1.2). This paper reviews recent (and some not-so-recent) work on the structure and function of networked systems such as these. The study of networks, in the form of mathematical graph theory, is one of the fundamental pillars of discrete mathematics. Euler’s celebrated 1735 solution of the K¨onigsberg bridge problem is often cited as the first true proof in the theory of networks, and during the twentieth century graph theory has developed into a substantial body of knowledge. Networks have also been studied extensively in the social sciences. Already in the 1930s, sociologists realized the importance of the patterns of connection between people to the understanding of the functioning of human society (Figure 1.3). Typical network studies in sociology involve the circulation of questionnaires, asking respondents to detail their interactions with others. One can then use the responses to reconstruct a network in which vertices represent individuals and edges the interactions between them. Typical social network studies address issues of centrality (which individuals are best connected to others or have most influence) and connectivity (whether and how individuals are connected to one another through the network). Recent years, however, have witnessed a substantial new movement in network research, with the focus shifting away from the analysis of single small graphs and the properties of individual vertices or edges within such graphs to consideration of large-scale statistical properties of graphs. This new approach has been driven largely by the availability of computers and communication networks that allow us to gather and analyze data on a scale far larger than previously possible. Where studies used to look at networks of maybe tens or in extreme cases hundreds of vertices, it is not uncommon now to see networks with millions or even billions of vertices. This change of scale forces upon us a corresponding change in our analytic approach. Many of the questions that might previously have been asked in studies of small networks are simply not useful in much larger networks. A social network analyst might have asked, “Which vertex in this network would prove most crucial to the network’s connectivity if it were removed?” But such a question has little meaning in most networks of a million vertices—no single vertex in such a network will have much effect at all when removed. On the other hand, one could reasonably ask a question like, “What percentage of vertices need to be removed to substantially affect network connectivity in some given way?” and this type of statistical question has real meaning even in a very large network. However, there is another reason why our approach to the study of networks has changed in recent years, a reason whose importance should not be underestimated, although it often is. For networks of tens or hundreds of vertices, it is a relatively straightforward matter to draw a picture of the network with actual points and lines
Fig. 1. 2 Three eramples of the kinds of networks that are the topic of this review. (a) A visualization of the network structure of the Internet at the level of" autonomous systems"-local groups of computers each representing hundreds or thousands of machines. Picture by Hal Burc and Bill Cheswick, courtesy of Lumeta Corporation.(b)A social network, in this case of seral contacts, redrawn from the HIv data of Potterat et aL. 341].(c)A food web of predator-prey interactions between species in a freshwater lake [271]. Picture courtesy of ichard williams and to answer specific questions about network structure by examining this picture This has been one of the primary methods of network analysts since the field be- gan(see Figure 1.3). The human eye is an analytic tool of remarkable power, and eyeballing pictures of networks is an excellent way to gain an understanding of their
170 M. E. J. NEWMAN (b) (c) (a) Fig. 1.2 Three examples of the kinds of networks that are the topic of this review. (a) A visualization of the network structure of the Internet at the level of “autonomous systems”—local groups of computers each representing hundreds or thousands of machines. Picture by Hal Burch and Bill Cheswick, courtesy of Lumeta Corporation. (b) A social network, in this case of sexual contacts, redrawn from the HIV data of Potterat et al. [341]. (c) A food webof predator-prey interactions between species in a freshwater lake [271]. Picture courtesy of Richard Williams. and to answer specific questions about network structure by examining this picture. This has been one of the primary methods of network analysts since the field began (see Figure 1.3). The human eye is an analytic tool of remarkable power, and eyeballing pictures of networks is an excellent way to gain an understanding of their
THE STRUCTURE AND FUNCTION OF COMPLEX NETWORKS 17 Fig 1.3 An early hand-drawn twork from 1934 representing friendships between school children. After Moreno Reprinted with permission from ASGPP. structure. With a network of a million or a billion vertices, however, this approach is useless.( See Figure 1.2a for an example of a network that lies at the upper limit of hat can usefully be drawn on a piece of paper or computer screen. )One simply can- ot draw a meaningful picture of a million vertices, even with modern 3D computer rendering tools, and therefore direct analysis by eye is hopeless. The recent devel- ment of statistical methods for quantifying large networks is to a large extent an attempt to find something to play the part played by the eye in the network analysis of the twentieth century. Statistical methods answer the question, "How can I tell what this network looks like, when I can't actually look at it? The body of theory that is the primary focus of this review aims to do three things. First, it aims to find and highlight statistical properties, such as path lengths and degree distributions, that characterize the structure and behavior of networked systems, and to suggest appropriate ways to measure these properties. Second, it aims to create models of networks that can help us to understand the meaning of these properties--how they came to be as they are, and how they interact with one another Third, it aims to predict what the behavior of networked systems will be on the basis of measured structural properties and the local rules governing individual vertices. How, for example, will network structure affect traffic on the Internet, or the performance of a Web search engine, or the dynamics of social or biological systems? As we will see, the scientific community has, by drawing on ideas from a broad variety of and plines, made an excellent start on the first two of these aims, the characterization modeling of network structure. Studies of the effects of structure on system behavior on the other hand are still in their infancy. It remains to be seen what the crucial theoretical developments will be in this area I.I. Types of Networks. A set of vertices joined by edges is only the simplest type of network; there are many ways in which networks may be more this(Figure 1.4). For instance, there may be more than one different type of vertex n a network, or more than one different type of edge. And vertices or edges may have a variety of properties, numerical or otherwise, associated with them. Taking the example of a social network of people, the vertices may represent people of different nationalities, locations, ages, incomes, or many other things. Edges sent animosity acquaintance, or geographical proximity. They can carry weights, representing, say
THE STRUCTURE AND FUNCTION OF COMPLEX NETWORKS 171 Fig. 1.3 An early hand-drawn social network from 1934 representing friendships between school children. After Moreno [295]. Reprinted with permission from ASGPP. structure. With a network of a million or a billion vertices, however, this approach is useless. (See Figure 1.2a for an example of a network that lies at the upper limit of what can usefully be drawn on a piece of paper or computer screen.) One simply cannot draw a meaningful picture of a million vertices, even with modern 3D computer rendering tools, and therefore direct analysis by eye is hopeless. The recent development of statistical methods for quantifying large networks is to a large extent an attempt to find something to play the part played by the eye in the network analysis of the twentieth century. Statistical methods answer the question, “How can I tell what this network looks like, when I can’t actually look at it?” The body of theory that is the primary focus of this review aims to do three things. First, it aims to find and highlight statistical properties, such as path lengths and degree distributions, that characterize the structure and behavior of networked systems, and to suggest appropriate ways to measure these properties. Second, it aims to create models of networks that can help us to understand the meaning of these properties—how they came to be as they are, and how they interact with one another. Third, it aims to predict what the behavior of networked systems will be on the basis of measured structural properties and the local rules governing individual vertices. How, for example, will network structure affect traffic on the Internet, or the performance of a Web search engine, or the dynamics of social or biological systems? As we will see, the scientific community has, by drawing on ideas from a broad variety of disciplines, made an excellent start on the first two of these aims, the characterization and modeling of network structure. Studies of the effects of structure on system behavior on the other hand are still in their infancy. It remains to be seen what the crucial theoretical developments will be in this area. 1.1. Types of Networks. A set of vertices joined by edges is only the simplest type of network; there are many ways in which networks may be more complex than this (Figure 1.4). For instance, there may be more than one different type of vertex in a network, or more than one different type of edge. And vertices or edges may have a variety of properties, numerical or otherwise, associated with them. Taking the example of a social network of people, the vertices may represent men or women, people of different nationalities, locations, ages, incomes, or many other things. Edges may represent friendship, but they could also represent animosity, or professional acquaintance, or geographical proximity. They can carry weights, representing, say
MLE.」 NEWMAN 4 of verter and a single type of edge;(b)a network with a number of discrete verter and edge types;(c) a network with varying verter and edge weights;(d)a directed network in which each edge has a direction how well two people know each other. They can also be directed, pointing in only one direction. Graphs composed of directed edges are themselves called directed graphs or sometimes digraphs, for short. a graph representing telephone calls or email messages between individuals would be directed, since each message only goes in one direction Directed graphs can be either cyclic, meaning they contain closed loops of edges, or acyclic, meaning they do not. Some networks, such as food webs, are approximately but not perfectly acyclic One can also have hyperedges--edges that join more than two vertices together Graphs containing such edges are called hypergraphs. Hyperedges could be used to indicate family ties in a social network for example--n individuals connected to each other by virtue of belonging to the same immediate family could be represented by an n-edge joining them. Graphs may also be naturally partitioned in various ways. We will see a number of examples in this review of bipartite graphs: graphs that contain vertices of two distinct types, with edges running only between unlike types. So-called affiliation networks in which people are joined together by common membership of groups take this form, the two types of vertices representing the people and the groups Graphs may also evolve over time, with vertices or edges appearing or disappearing or values defined on those vertices and edges changing. And there are many other levels of sophistication one can add. The study of networks is by no means a complete ience yet, and many of the possibilities have yet to be explored in depth, but we will see examples of at least some of the variations described here in the work reviewed in The jargon of the study of networks is unfortunately confused by differing usages among investigators from different fields. To avoid (or at least reduce) confusion, we give in Box 1 a short glossary of terms as they are used in this paper. 1. 2. Other Resources. A number of other reviews of this area have appeared recently, which the reader may wish to consult. Albert and Barabasi [13 and Doro- govtsev and Mendes [120 have given extensive pedagogical reviews focusing on the physics literature. Both devote the larger part of their attention to the models of
172 M. E. J. NEWMAN (c) (b) (d) (a) Fig. 1.4 Examples of various types of networks: (a) an undirected network with only a single type of vertex and a single type of edge; (b) a network with a number of discrete vertex and edge types; (c) a network with varying vertex and edge weights; (d) a directed network in which each edge has a direction. how well two people know each other. They can also be directed, pointing in only one direction. Graphs composed of directed edges are themselves called directed graphs or sometimes digraphs, for short. A graph representing telephone calls or email messages between individuals would be directed, since each message only goes in one direction. Directed graphs can be either cyclic, meaning they contain closed loops of edges, or acyclic, meaning they do not. Some networks, such as food webs, are approximately but not perfectly acyclic. One can also have hyperedges—edges that join more than two vertices together. Graphs containing such edges are called hypergraphs. Hyperedges could be used to indicate family ties in a social network for example—n individuals connected to each other by virtue of belonging to the same immediate family could be represented by an n-edge joining them. Graphs may also be naturally partitioned in various ways. We will see a number of examples in this review of bipartite graphs: graphs that contain vertices of two distinct types, with edges running only between unlike types. So-called affiliation networks in which people are joined together by common membership of groups take this form, the two types of vertices representing the people and the groups. Graphs may also evolve over time, with vertices or edges appearing or disappearing, or values defined on those vertices and edges changing. And there are many other levels of sophistication one can add. The study of networks is by no means a complete science yet, and many of the possibilities have yet to be explored in depth, but we will see examples of at least some of the variations described here in the work reviewed in this paper. The jargon of the study of networks is unfortunately confused by differing usages among investigators from different fields. To avoid (or at least reduce) confusion, we give in Box 1 a short glossary of terms as they are used in this paper. 1.2. Other Resources. A number of other reviews of this area have appeared recently, which the reader may wish to consult. Albert and Barab´asi [13] and Dorogovtsev and Mendes [120] have given extensive pedagogical reviews focusing on the physics literature. Both devote the larger part of their attention to the models of
THE STRUC TURE AND FUNCTION OF COMPLEX NETWORKS 173 letter(pl. vertices): The fundamental unit of a network, also called a site (physics), a node(computer science), or an actor(sociology) Edge: The line connecting two vertices. Also called a bond (physics), a link (computer science), or a tie(sociology) Directed/undirected: An edge is directed if it runs in only one direction(such as a one-way road between two points), and undirected if it runs in both directions Directed edges, which are sometimes called arcs, can be thought of as sporting arrows indicating their orientation. A graph is directed if all of its edges are directed. An undirected graph can be represented by a directed one having two edges between each pair of connected vertices, one in each direction Degree: The number of edges connected to a vertex. Note that the degree is not necessarily equal to the number of vertices adjacent to a vertex, since there may be more than one edge between any two vertices. In a few recent articles, the degree is referred to as the "connectivity"of a vertex, but we avoid this usage because the word connectivity already has another meaning in graph theory. A directed graph has both an in-degree and an out-degree for each vertex, which are the numbers of Component: The component to which a vertex belongs is that set of vertices that can be reached from it by paths running along edges of the graph. In a directed graph a vertex has both an in-component and an out-component, which are the sets of vertices from which the vertex can be reached and which can be reached from it Geodesic path: A geodesic path is the shortest path through the network from one vertex to another. Note that there may be and often is more than one geodesic path between two vertices. Diameter: The diameter of a network is the length (in number of edges)of the longest geodesic path between any two vertices. A few authors have also used this term to mean the average geodesic distance in a graph, although strictly the two quantities are quite distinct. Box I A short glossary of terms. growing graphs that we describe in section 7. Shorter reviews taking other viewpoints have been given by Newman 308 and Hayes [188, 189, who both concentrate on the so-called small-world models( see section 6), and by Strogatz 386, who includes an interesting discussion of the behavior of dynamical systems on networks A number of books also make worthwhile reading Dorogovtsev and Mendes [122] have expanded their above-mentioned review into a book, which again focuses on models of growing graphs. The edited volumes by Bornholdt and Schuster 70 and by Pastor-Satorras and Rubi 329 both contain contributed essays on various topics by leading researchers. Detailed treatments of many of the topics covered in the esent work can be found there. The book by Newman, Barabasi, and Watts 319 is a collection of previously published papers and also contains some review material by the editors Three popular books on the subject of networks merit a mention. Barabasi's Linked 31 gives a personal account of recent developments in the study of net works, focusing particularly on Barabasi's work on scale-free networks. Watts's Sir Degrees [413 gives a sociologist's view, partly historical, of discoveries old and new Buchanan's Nerus [76 gives an entertaining portrait of the field from the point of view of a science journalist
THE STRUCTURE AND FUNCTION OF COMPLEX NETWORKS 173 Vertex (pl. vertices): The fundamental unit of a network, also called a site (physics), a node (computer science), or an actor (sociology). Edge: The line connecting two vertices. Also called a bond (physics), a link (computer science), or a tie (sociology). Directed/undirected: An edge is directed if it runs in only one direction (such as a one-way road between two points), and undirected if it runs in both directions. Directed edges, which are sometimes called arcs, can be thought of as sporting arrows indicating their orientation. A graph is directed if all of its edges are directed. An undirected graph can be represented by a directed one having two edges between each pair of connected vertices, one in each direction. Degree: The number of edges connected to a vertex. Note that the degree is not necessarily equal to the number of vertices adjacent to a vertex, since there may be more than one edge between any two vertices. In a few recent articles, the degree is referred to as the “connectivity” of a vertex, but we avoid this usage because the word connectivity already has another meaning in graph theory. A directed graph has both an in-degree and an out-degree for each vertex, which are the numbers of incoming and outgoing edges respectively. Component: The component to which a vertex belongs is that set of vertices that can be reached from it by paths running along edges of the graph. In a directed graph a vertex has both an in-component and an out-component, which are the sets of vertices from which the vertex can be reached and which can be reached from it. Geodesic path: A geodesic path is the shortest path through the network from one vertex to another. Note that there may be and often is more than one geodesic path between two vertices. Diameter: The diameter of a network is the length (in number of edges) of the longest geodesic path between any two vertices. A few authors have also used this term to mean the average geodesic distance in a graph, although strictly the two quantities are quite distinct. Box 1 A short glossary of terms. growing graphs that we describe in section 7. Shorter reviews taking other viewpoints have been given by Newman [308] and Hayes [188, 189], who both concentrate on the so-called small-world models (see section 6), and by Strogatz [386], who includes an interesting discussion of the behavior of dynamical systems on networks. A number of books also make worthwhile reading. Dorogovtsev and Mendes [122] have expanded their above-mentioned review into a book, which again focuses on models of growing graphs. The edited volumes by Bornholdt and Schuster [70] and by Pastor-Satorras and Rubi [329] both contain contributed essays on various topics by leading researchers. Detailed treatments of many of the topics covered in the present work can be found there. The book by Newman, Barab´asi, and Watts [319] is a collection of previously published papers and also contains some review material by the editors. Three popular books on the subject of networks merit a mention. Barab´asi’s Linked [31] gives a personal account of recent developments in the study of networks, focusing particularly on Barab´asi’s work on scale-free networks. Watts’s Six Degrees [413] gives a sociologist’s view, partly historical, of discoveries old and new. Buchanan’s Nexus [76] gives an entertaining portrait of the field from the point of view of a science journalist
MLE.」 NEWMAN Farther afield, there are a variety of books on the study of networks in particular fields. Within graph theory the books by Harary [187] and by Bollobas [62 are widely cited as are, among social network theorists, the books by Wasserman and Faust [408 and by Scott [362. The book by Ahuja, Magnanti, and Orlin 7 is a useful source for information on network algorithms 1.3. Outline of the Review. The outline of this paper is as follows. In section 2 we describe empirical studies of the structure of networks, including social networks. formation networks, technological networks, and biological networks. In section 3 we describe some of the common properties that are observed in many of these networks how they are measured, and why they are believed to be important for the functioning of networked systems. Sections 4 to 7 form the heart of the review. They describe work on the mathematical modeling of networks, including random graph models and their generalizations, exponential random graphs, p* models and Markov graphs the small-world model and its variations, and models of growing graphs including preferential attachment models and their many variations. In section 8 we discuss the progress, such as it is, that has been made on the study of processes taking place on networks, including epidemic processes, network failure, models displayin phase transitions, and dynamical systems like random Boolean networks and cellular automata. In section 9 we give our conclusions and point to directions for future research 2. Networks in the Real world. In this section we look at what is known about the structure of networks of different types. Recent work on the mathematics of networks has been driven largely by observations of the properties of actual networks and attempts to model them, so network data are the obvious starting point for a review such as this. It also makes sense to examine simultaneously data from different kinds of networks. One of the principal thrusts of recent work in this area, inspired particularly by a groundbreaking 1998 paper by Watts and Strogatz 415 has been the comparative study of networks from different branches of science, witl emphasis on properties that are common to many of them and the mathematical developments that mirror those properties. We here divide our summary into four loose categories of networks: social networks information networks technological networks, and biological networks. 2. 1. Social Networks. A social network is a set of people or groups of people with some pattern of contacts or interactions between them 362, 408. The pat terns of friendships between individuals 295, 347, business relationships between companies 268, 285, and intermarriages between families 326 are all examples of networks that have been studied in the past. Of the academic disciplines, the social ciences have the longest history of the substantial quantitative study of real-wor networks [162, 362. Of particular note among the early worl the following: Moreno's networks of friendships within small 295, of which Figure 1.3 is an example: the so-called southern women study Gardner 103, which focused on the social circles of women in an unnamed city in olleagues of social net works of factory workers in the late 1930s in Chicago 356; the mathematical models of Rapoport 1345, who was one of the first theorists, perhaps the first, to stress Occasionally social networks of animals have been investigated also, such as dolphins [96), not to mention networks of fictional characters, such as the protagonists of Tolstoy s Anna Karenina (243 or Marvel Comics superheroes [10]
174 M. E. J. NEWMAN Farther afield, there are a variety of books on the study of networks in particular fields. Within graph theory the books by Harary [187] and by Bollob´as [62] are widely cited as are, among social network theorists, the books by Wasserman and Faust [408] and by Scott [362]. The book by Ahuja, Magnanti, and Orlin [7] is a useful source for information on network algorithms. 1.3. Outline of the Review. The outline of this paper is as follows. In section 2 we describe empirical studies of the structure of networks, including social networks, information networks, technological networks, and biological networks. In section 3 we describe some of the common properties that are observed in many of these networks, how they are measured, and why they are believed to be important for the functioning of networked systems. Sections 4 to 7 form the heart of the review. They describe work on the mathematical modeling of networks, including random graph models and their generalizations, exponential random graphs, p∗ models and Markov graphs, the small-world model and its variations, and models of growing graphs including preferential attachment models and their many variations. In section 8 we discuss the progress, such as it is, that has been made on the study of processes taking place on networks, including epidemic processes, network failure, models displaying phase transitions, and dynamical systems like random Boolean networks and cellular automata. In section 9 we give our conclusions and point to directions for future research. 2. Networks in the Real World. In this section we look at what is known about the structure of networks of different types. Recent work on the mathematics of networks has been driven largely by observations of the properties of actual networks and attempts to model them, so network data are the obvious starting point for a review such as this. It also makes sense to examine simultaneously data from different kinds of networks. One of the principal thrusts of recent work in this area, inspired particularly by a groundbreaking 1998 paper by Watts and Strogatz [415], has been the comparative study of networks from different branches of science, with emphasis on properties that are common to many of them and the mathematical developments that mirror those properties. We here divide our summary into four loose categories of networks: social networks, information networks, technological networks, and biological networks. 2.1. Social Networks. A social network is a set of people or groups of people with some pattern of contacts or interactions between them [362, 408]. The patterns of friendships between individuals [295, 347], business relationships between companies [268, 285], and intermarriages between families [326] are all examples of networks that have been studied in the past.1 Of the academic disciplines, the social sciences have the longest history of the substantial quantitative study of real-world networks [162, 362]. Of particular note among the early works on the subject are the following: Moreno’s networks of friendships within small groups [295], of which Figure 1.3 is an example; the so-called southern women study of Davis, Gardner, and Gardner [103], which focused on the social circles of women in an unnamed city in the American south in 1936; the study by Elton Mayo and colleagues of social networks of factory workers in the late 1930s in Chicago [356]; the mathematical models of Rapoport [345], who was one of the first theorists, perhaps the first, to stress 1Occasionally social networks of animals have been investigated also, such as dolphins [96], not to mention networks of fictional characters, such as the protagonists of Tolstoy’s Anna Karenina [243] or Marvel Comics superheroes [10]
THE STRUCTURE AND FUNCTION OF COMPLEX NETWORKS the importance of the degree distribution in networks of all kinds, not just social networks; and the studies of friendship networks of school children by rapoport and others 347 In more recent years udies of business communities 167, 168, 268 and of patterns of sexual contacts 45, 217, 242, 265 have attracted particular atten- Another important set of experiments are the famous"small-world"experiments of Milgram 282, 392. No actual networks were reconstructed in these experiments, but nonetheless they tell us about network structure. The experiments probed the distribution of path lengths in an acquaintance network by asking participants to pass a letter to one of their first-name acquaintances in an attempt to get it to an assigned target individual. Most of the letters in the experiment were lost, but about a quarter reached the target and passed on average through the hands of only about six people in doing so. This experiment was the origin of the popular concept of the six degrees of separation, "although that phrase did not appear in Milgram's writing, being coined some decades later by Guare [182. A brief but useful early review of Milgram's work and work stemming from it was given by Garfield [1691 Traditional social network studies often suffer from problems of inaccuracy, sub- jectivity, and small sample size. With the exception of a few ingenious indirect studies such as Milgram's, data collection is usually carried out by querying participants di- rectly using questionnaires or interviews. Such methods are labor-intensive and there- fore limit the size of the network that can be observed. Survey data are, moreover, influenced by subjective biases on the part of respondents: how one respondent defines a friend, for example, could be quite different from how another does. Although much effort is put into eliminating possible sources of inconsistency, it is generally accepted that there are large and essentially uncontrolled errors in most of these studies. A review of the issues has been given by Marsden 270 Because of these problems many researchers have turned to other methods for probing social networks. One source of copious and relatively reliable data is col- laboration networks. These are typically affiliation networks in which participants collaborate in groups of one kind or another, and links between pairs of individuals are established by common group membership. A classic, though rather frivolous example of such a network is the collaboration network of film actors, which is thor- oughly documented in the online Internet Movie Database. In this network, actors collaborate in films and two actors are considered connected if they have appeared in a film together. Statistical properties of this network have been analyzed by a number of authors [ 4, 20, 322, 415. Other examples of networks of this type are networks of company directors, in which two directors are linked if they belong to the same board of directors [104, 105, 268: networks of coauthorship among aca- demics, in which individuals are linked if they have coauthored one or m 43, 68, 107, 181, 278, 291, 310, 311, 312: and coappearance networks, in which individuals are linked by mention in the same context, particularly on Web pages 3, 226 or in newspaper articles 99(see Figure 1. 2b) Another source of reliable data about personal connections between people is communication records of certain kinds. For example, one could construct a network in which each(directed)edge between two people represented a letter or package sent by mail from one to the other. No study of such a network has been published as far as ve are aware, but some similar things have. Aiello, Chung, and Lu 8, 9 have analyzed ler containing several documents
THE STRUCTURE AND FUNCTION OF COMPLEX NETWORKS 175 the importance of the degree distribution in networks of all kinds, not just social networks; and the studies of friendship networks of school children by Rapoport and others [347, 149]. In more recent years, studies of business communities [167, 168, 268] and of patterns of sexual contacts [45, 217, 242, 265] have attracted particular attention. Another important set of experiments are the famous “small-world” experiments of Milgram [282, 392]. No actual networks were reconstructed in these experiments, but nonetheless they tell us about network structure. The experiments probed the distribution of path lengths in an acquaintance network by asking participants to pass a letter2 to one of their first-name acquaintances in an attempt to get it to an assigned target individual. Most of the letters in the experiment were lost, but about a quarter reached the target and passed on average through the hands of only about six people in doing so. This experiment was the origin of the popular concept of the “six degrees of separation,” although that phrase did not appear in Milgram’s writing, being coined some decades later by Guare [182]. A brief but useful early review of Milgram’s work and work stemming from it was given by Garfield [169]. Traditional social network studies often suffer from problems of inaccuracy, subjectivity, and small sample size. With the exception of a few ingenious indirect studies such as Milgram’s, data collection is usually carried out by querying participants directly using questionnaires or interviews. Such methods are labor-intensive and therefore limit the size of the network that can be observed. Survey data are, moreover, influenced by subjective biases on the part of respondents; how one respondent defines a friend, for example, could be quite different from how another does. Although much effort is put into eliminating possible sources of inconsistency, it is generally accepted that there are large and essentially uncontrolled errors in most of these studies. A review of the issues has been given by Marsden [270]. Because of these problems many researchers have turned to other methods for probing social networks. One source of copious and relatively reliable data is collaboration networks. These are typically affiliation networks in which participants collaborate in groups of one kind or another, and links between pairs of individuals are established by common group membership. A classic, though rather frivolous, example of such a network is the collaboration network of film actors, which is thoroughly documented in the online Internet Movie Database.3 In this network, actors collaborate in films and two actors are considered connected if they have appeared in a film together. Statistical properties of this network have been analyzed by a number of authors [4, 20, 322, 415]. Other examples of networks of this type are networks of company directors, in which two directors are linked if they belong to the same board of directors [104, 105, 268]; networks of coauthorship among academics, in which individuals are linked if they have coauthored one or more papers [36, 43, 68, 107, 181, 278, 291, 310, 311, 312]; and coappearance networks, in which individuals are linked by mention in the same context, particularly on Web pages [3, 226] or in newspaper articles [99] (see Figure 1.2b). Another source of reliable data about personal connections between people is communication records of certain kinds. For example, one could construct a network in which each (directed) edge between two people represented a letter or package sent by mail from one to the other. No study of such a network has been published as far as we are aware, but some similar things have. Aiello, Chung, and Lu [8, 9] have analyzed 2Actually a folder containing several documents. 3http://www.imdb.com/
MLE.」 NEWMAN a network of telephone calls made over the at&T long-distance network on a single day. The vertices of this network represent telephone numbers and the directed edges calls from one number to another. Even for just a single day this graph is enormous, having about 50 million vertices, one of the largest graphs yet studied after the graph of the World Wide Web. Ebel, Mielsch, and Bornholdt [136 have reconstructed the pattern of email communications between five thousand students at Kiel University from logs maintained by email servers. In this network the vertices represent email addresses and directed edges represent a message passing from one address to another Email networks have also been studied by Newman, Forrest, and Balthrop 320 and by Guimera et al. [184, and similar networks have been constructed for an"instant messaging "system by Smith 370, and for an Internet community Web site by Holme Edling, and Liljeros [195. Dodds, Muhamad, and Watts [110 have carried out an email version of Milgram's small-world experiment in which participants were asked to forward an email message to one of their friends in an effort to get the message ultimately to some chosen target individual. Response rates for the experiment were quite low, but a few hundred completed chains of messages were recorded, enough to allow various statistical analyses 2. 2. Information Networks. Our second network category is what we will call information networks(also sometimes called"knowledge networks"). The classic example of an information network is the network of citations between academic [138. Most learned articles cite previous works by others on related topics These citations form a network in which the vertices are articles and a directed edge from article a to article b indicates that a cites b. the structure of the citation network then reflects the structure of the information stored at its vertices. hence the term"information network, although certainly there are social aspects to the citation patterns of papers too 419 Citation networks are acyclic(see section 1. 1)because papers can only cite other papers that have already been written, not those that have yet to be written. Thus all edges in the network point backwards in time, making closed loops impossible, or at least extremely rare(see Figure 2.1) As an object of scientific study, citation networks have a great advantage the copious and accurate data available for them. Quantitative study of publication patterns stretches back at least as far as Alfred Lotka's groundbreaking 1926 discover of the so-called Law of Scientific Productivity, which states that the distribution of the numbers of papers written by individual scientists follows a power law. That is, the number of scientists who have written k papers falls off as k-a for some constant a (In fact, this result extends to the arts and humanities as well. The first serious work on citation patterns was conducted in the 1960s as large citation databases became available through the work of Eugene Garfield and other pioneers in the field of bibliometrics. The network formed by citations was discussed in an early paper by Price 342, in which, among other things, the author points out for the first time that both the in- and out-degree distributions of the network follow power laws,a far-reaching discovery which we discuss further in section 3. 3. Many other studies of citation networks have been performed since then, using the ever better resources available in citation databases. Of particular note are the studies by Seglen 363 and Redner 350 4 An interesting development in the study of citation patterns has been the arrival of automatic citation "crawlers" that construct citation networks from online papers. Example seer(http://citeseer.nj.neccom/),SpirEs(hTtP://www.slac.stanfordedu/spires/hep/),andCite base(http://citebase.eprintsorg/)
176 M. E. J. NEWMAN a network of telephone calls made over the AT&T long-distance network on a single day. The vertices of this network represent telephone numbers and the directed edges calls from one number to another. Even for just a single day this graph is enormous, having about 50 million vertices, one of the largest graphs yet studied after the graph of the World Wide Web. Ebel, Mielsch, and Bornholdt [136] have reconstructed the pattern of email communications between five thousand students at Kiel University from logs maintained by email servers. In this network the vertices represent email addresses and directed edges represent a message passing from one address to another. Email networks have also been studied by Newman, Forrest, and Balthrop [320] and by Guimer`a et al. [184], and similar networks have been constructed for an “instant messaging” system by Smith [370], and for an Internet community Web site by Holme, Edling, and Liljeros [195]. Dodds, Muhamad, and Watts [110] have carried out an email version of Milgram’s small-world experiment in which participants were asked to forward an email message to one of their friends in an effort to get the message ultimately to some chosen target individual. Response rates for the experiment were quite low, but a few hundred completed chains of messages were recorded, enough to allow various statistical analyses. 2.2. Information Networks. Our second network category is what we will call information networks (also sometimes called “knowledge networks”). The classic example of an information network is the network of citations between academic papers [138]. Most learned articles cite previous works by others on related topics. These citations form a network in which the vertices are articles and a directed edge from article A to article B indicates that A cites B. The structure of the citation network then reflects the structure of the information stored at its vertices, hence the term “information network,” although certainly there are social aspects to the citation patterns of papers too [419]. Citation networks are acyclic (see section 1.1) because papers can only cite other papers that have already been written, not those that have yet to be written. Thus all edges in the network point backwards in time, making closed loops impossible, or at least extremely rare (see Figure 2.1). As an object of scientific study, citation networks have a great advantage in the copious and accurate data available for them. Quantitative study of publication patterns stretches back at least as far as Alfred Lotka’s groundbreaking 1926discovery of the so-called Law of Scientific Productivity, which states that the distribution of the numbers of papers written by individual scientists follows a power law. That is, the number of scientists who have written k papers falls off as k−α for some constant α. (In fact, this result extends to the arts and humanities as well.) The first serious work on citation patterns was conducted in the 1960s as large citation databases became available through the work of Eugene Garfield and other pioneers in the field of bibliometrics. The network formed by citations was discussed in an early paper by Price [342], in which, among other things, the author points out for the first time that both the in- and out-degree distributions of the network follow power laws, a far-reaching discovery which we discuss further in section 3.3. Many other studies of citation networks have been performed since then, using the ever better resources available in citation databases. Of particular note are the studies by Seglen [363] and Redner [350].4 4An interesting development in the study of citation patterns has been the arrival of automatic citation “crawlers” that construct citation networks from online papers. Examples include Citeseer (http://citeseer.nj.nec.com/), SPIRES (http://www.slac.stanford.edu/spires/hep/), and Citebase (http://citebase.eprints.org/)