Planetary-Scale Views on a Large Instant-Messaging Network

Jure Leskovec ∗
Carnegie Mellon University
jure@cs.cmu.edu

Eric Horvitz
Microsoft Research
horvitz@microsoft.com

ABSTRACT

We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well connected and robust to node removal. We investigate on a planetary scale the oft-cited report that people are separated by “six degrees of separation” and find that the average path length among Messenger users is 6.6. We also find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications – Data mining

General Terms: Measurement; Experimentation.

Keywords: Social networks; Communication networks; User demographics; Large data; Online communication.

1. INTRODUCTION

Large-scale web services provide unprecedented opportunities to capture and analyze behavioral data on a planetary scale.
We discuss findings drawn from aggregations of anonymized data representing one month (June 2006) of high-level communication activities of people using the Microsoft Messenger instant-messaging (IM) network. We neither had nor sought access to the content of messages. Rather, we consider structural properties of a communication graph and study how structure and communication relate to user demographic attributes, such as gender, age, and location. The data set provides a unique lens for studying patterns of human behavior on a wide scale.

∗ Jure Leskovec performed this research during an internship at Microsoft Research.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
WWW 2008, April 21–25, 2008, Beijing, China.
ACM 978-1-60558-085-2/08/04.

We explore a dataset of 30 billion conversations generated by 240 million distinct users over one month. We found that approximately 90 million distinct Messenger accounts were accessed each day and that these users produced about 1 billion conversations, with approximately 7 billion exchanged messages per day. Of the 240 million active accounts, 180 million had at least one conversation during the observation period. We found that 99% of the conversations occurred between two people, and the rest involved more participants. To our knowledge, our investigation represents the largest and most comprehensive study to date of presence and communications in an IM system. A recent report [6] estimated that approximately 12 billion instant messages are sent each day. Given the estimate and the growth of IM, we estimate that we captured approximately half of the world’s IM communication during the observation period.
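As a quick consistency check on the figures quoted above, the following Python sketch derives a few ratios from them. The constant and function names are illustrative and do not come from the paper. The 30-day month and the assumption that the 12 billion figure from [6] applies unchanged to June 2006 are simplifications made only for this sketch.

```python
# Figures as quoted in the text (June 2006 observation period).
CONVERSATIONS_PER_DAY = 1e9     # "about 1 billion conversations" per day
MESSAGES_PER_DAY = 7e9          # "approximately 7 billion exchanged messages per day"
WORLD_MESSAGES_PER_DAY = 12e9   # worldwide estimate attributed to report [6]
ACTIVE_ACCOUNTS = 240e6         # distinct users over the month
CONVERSING_ACCOUNTS = 180e6     # accounts with at least one conversation
DAYS_OBSERVED = 30              # assumption: June has 30 days


def derived_figures():
    """Return ratios implied by the quoted figures (illustrative helper)."""
    return {
        # Daily conversation volume scaled to the whole month.
        "conversations_in_month": CONVERSATIONS_PER_DAY * DAYS_OBSERVED,
        # Average number of messages exchanged per conversation.
        "messages_per_conversation": MESSAGES_PER_DAY / CONVERSATIONS_PER_DAY,
        # Messenger's share of the worldwide daily estimate, assuming the
        # 12 billion figure is taken at face value with no growth.
        "share_of_world_messages": MESSAGES_PER_DAY / WORLD_MESSAGES_PER_DAY,
        # Fraction of active accounts that held at least one conversation.
        "share_of_accounts_conversing": CONVERSING_ACCOUNTS / ACTIVE_ACCOUNTS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.3g}")
```

Under these assumptions the daily rate scales to the stated 30 billion conversations for the month, and three quarters of active accounts conversed. The raw ratio of 7 billion to 12 billion messages is somewhat above one half, so the "approximately half" estimate presumably allows for growth in worldwide IM traffic since the report.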
We created an undirected communication network from the data where each user is represented by a node and an edge is placed between users if they exchanged at least one message during the month of observation. The network represents accounts that were active during June 2006. In summary, the communication graph has 180 million nodes, representing users who participated in at least one conversation, and 1.3 billion undirected edges among active users, where an edge indicates that a pair of people communicated. We note that this graph should be distinguished from a buddy graph where two people are connected if they appear on each other’s contact lists. The buddy graph for the data contains 240 million nodes and 9.1 billion edges. On average each account has approximately 50 buddies on a contact list.

To highlight several of our key findings, we discovered that the communication network is well connected, with 99.9% of the nodes belonging to the largest connected component. We evaluated the oft-cited finding by Travers and Milgram that any two people are linked to one another on average via a chain with “6-degrees-of-separation” [17]. We found that the average shortest path length in the Messenger network is 6.6 (median 6), which is half a link more than the path length measured in the classic study. However, we also found that longer paths exist in the graph, with lengths up to 29. We observed that the network is well clustered, with a clustering coefficient [19] that decays with exponent −0.37. This decay is significantly lower than the value we had expected given prior research [11]. We found strong homophily [9, 12] among users; people have more conversations and converse for longer durations with people who are similar to themselves. We find the strongest homophily for the language used, followed by conversants’ geographic lo-