[Figure 12: Numbers of pairs of people of different ages. (a) Randomly selected pairs of people; (b) people who communicate. Correlation between age and communication is captured by the diagonal trend.]

[Figure 13: Communication characteristics and age difference between conversants. (a) Number of conversations; (b) conversation duration. Horizontal axes: age difference. Vertical axes: number of conversations (logarithmic) and time per conversation [min].]

who communicate. As we noted earlier, gender and communication are slightly negatively correlated; people tend to communicate more with people of the opposite gender.

Another method for identifying association is to measure the probability that a pair of users will show an exact match in values of an attribute, i.e., identifying whether two users come from the same country, speak the same language, etc. Table 2 shows the results for the probability of users sharing the same attribute value. We make similar observations as before. People who communicate are more likely to share common characteristics, including age, location, language, and they are less likely to be of the same gender. We note that the most common attribute of people who communicate is language. On the flip side, the amount of communication tends to decrease with increasing user dissimilarity. This relationship is highlighted in Figure 11, which shows how communication among pairs of people decreases with distance.

Figure 12 further illustrates the results displayed in Table 2, where we randomly sample pairs of users from the Messenger user base, and then plot the distribution over reported ages.
As most of the population comes from the age group 10–30, the distribution of random pairs of people reaches its mode at those ages, but there is no correlation. Figure 12(b) shows the distribution of ages over the pairs of people who communicate. Note the correlation, as represented by the diagonal trend on the plot, where people tend to communicate more with others of a similar age.

Next, we further explore communication patterns by the differences in the reported ages among users. Figure 13(a) plots on a log-linear scale the number of conversations in the social network with participants of varying age differences. Again we see that links and conversations are strongly correlated with the age differences among participants. Figure 13(b) shows the average conversation duration as a function of the age difference among the users. Interestingly, the mean conversation duration peaks at an age difference of 20 years between participants. We speculate that the peak may correspond roughly to the gap between generations.

The plots reveal that there is strong homophily in the communication network for age; people tend to communicate more with people of similar reported age. This is especially salient for the number of buddies and conversations among people of the same ages. We also observe that the links between people of similar attributes are used more often, for shorter and more intense (more messages exchanged) communications. The intensity of communication decays linearly with the difference in age. In contrast to findings of previous studies, we observe that the number of cross-gender communication links is consistent with random chance. However, cross-gender communication takes longer and is faster paced, as it seems that people tend to pay more attention when communicating with the opposite sex.
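The aggregation behind plots such as Figure 13 can be sketched as follows. This is a minimal illustration, not the procedure used for the paper: the record layout `(age_a, age_b, duration_min)`, the function name `summarize_by_age_difference`, and the toy records are all hypothetical.

```python
from collections import defaultdict


def summarize_by_age_difference(conversations):
    """Group conversations by the absolute age difference of the two participants.

    `conversations` is an iterable of (age_a, age_b, duration_min) tuples
    (a hypothetical record layout). Returns a dict that maps each age
    difference to (number of conversations, mean duration in minutes).
    """
    counts = defaultdict(int)
    total_duration = defaultdict(float)
    for age_a, age_b, duration_min in conversations:
        diff = abs(age_a - age_b)
        counts[diff] += 1
        total_duration[diff] += duration_min
    return {d: (counts[d], total_duration[d] / counts[d]) for d in sorted(counts)}


# Toy records, invented for illustration only.
toy = [(20, 22, 4.0), (30, 28, 6.0), (25, 25, 5.0), (40, 20, 5.5)]
print(summarize_by_age_difference(toy))
# {0: (1, 5.0), 2: (2, 5.0), 20: (1, 5.5)}
```

Plotting the first component against the age difference on a logarithmic vertical axis would correspond to panel (a), and the second component to panel (b).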
Recently, using the data we generated, Singla and Richardson further investigated the homophily within the Messenger network and found that people who communicate are also more likely to search the web for content on similar topics [14].

7. THE COMMUNICATION NETWORK

So far we have examined communication patterns based on pairwise communications. We now create a more general communication network from the data. Using this network, we can examine the typical social distance between people, i.e., the number of links that separate a random pair of people. This analysis seeks to understand how many people can be reached within certain numbers of hops among people who communicate. Also, we test the transitivity of the network, i.e., the degree to which pairs with a common friend tend to be connected.

We constructed a graph from the set of all two-user conversations, where each node corresponds to a person and there is an undirected edge between a pair of nodes if the users were engaged in an active conversation during the observation period (users exchanged at least one message). The resulting network contains 179,792,538 nodes and 1,342,246,427 edges. Note that this is not simply a buddy network; we only connect people who are buddies and have communicated during the observation period.

Figures 14–15 show the structural properties of the communication network. The network degree distribution shown in Figure 14(a) is heavy tailed but does not follow a power-law distribution. Using maximum likelihood estimation, we fit a power law with exponential cutoff, $p(k) \propto k^{-a} e^{-bk}$, with fitted parameter values a = 0.8 and b = 0.03. We found a strong cutoff parameter and low power-law exponent, suggesting a distribution with high variance.

Figure 14(b) displays the degree distribution of a buddy graph. We did not have access to the full buddy network; we only had access to data on the length of the user contact list, which allowed us to create the plot.
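The graph construction and the fit of $p(k) \propto k^{-a} e^{-bk}$ described above can be sketched as follows. The text states only that maximum likelihood estimation was used, so everything else here is an assumption made for illustration: the record layout `(user_u, user_v, n_messages)`, the function names, the truncation of the support to k = 1..`k_max`, and the damped Newton iteration used to maximize the likelihood.

```python
import math
import random
from collections import Counter


def degree_sequence(conversations):
    """Degrees of the undirected graph with an edge for every pair of users
    that exchanged at least one message (hypothetical record layout)."""
    edges = {frozenset((u, v)) for u, v, n_messages in conversations
             if u != v and n_messages >= 1}
    degree = Counter()
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return list(degree.values())


def fit_powerlaw_cutoff(degrees, k_max, a=1.0, b=0.01, max_iter=50, tol=1e-8):
    """Maximum likelihood fit of p(k) ∝ k^(-a) * exp(-b*k) on k = 1..k_max.

    Assumption of this sketch: the support is truncated at k_max and the
    likelihood is maximized with a damped Newton iteration.
    """
    if min(degrees) < 1 or max(degrees) > k_max:
        raise ValueError("degrees must lie in 1..k_max")
    n = len(degrees)
    mean_log = sum(math.log(k) for k in degrees) / n
    mean_k = sum(degrees) / n
    ks = list(range(1, k_max + 1))
    logs = [math.log(k) for k in ks]

    def stats(a, b):
        # Model moments under (a, b), computed with a log-sum-exp shift.
        lw = [-a * lk - b * k for lk, k in zip(logs, ks)]
        top = max(lw)
        w = [math.exp(x - top) for x in lw]
        z = sum(w)
        e_log = sum(wi * lk for wi, lk in zip(w, logs)) / z
        e_k = sum(wi * k for wi, k in zip(w, ks)) / z
        v_log = sum(wi * (lk - e_log) ** 2 for wi, lk in zip(w, logs)) / z
        v_k = sum(wi * (k - e_k) ** 2 for wi, k in zip(w, ks)) / z
        cov = sum(wi * (lk - e_log) * (k - e_k)
                  for wi, lk, k in zip(w, logs, ks)) / z
        loglik = -a * mean_log - b * mean_k - (top + math.log(z))
        return loglik, e_log, e_k, v_log, v_k, cov

    cur = stats(a, b)
    for _ in range(max_iter):
        loglik, e_log, e_k, v_log, v_k, cov = cur
        # Gradient of the average log-likelihood with respect to (a, b).
        g_a, g_b = e_log - mean_log, e_k - mean_k
        if max(abs(g_a), abs(g_b)) < tol:
            break
        # Newton direction: inverse covariance of (log k, k) times the gradient.
        det = v_log * v_k - cov * cov
        d_a = (v_k * g_a - cov * g_b) / det
        d_b = (v_log * g_b - cov * g_a) / det
        step = 1.0
        for _ in range(40):  # halve the step until the likelihood does not drop
            new = stats(a + step * d_a, b + step * d_b)
            if new[0] >= loglik:
                break
            step /= 2
        a, b, cur = a + step * d_a, b + step * d_b, new
    return a, b


def sample_powerlaw_cutoff(a, b, k_max, n, seed=0):
    """Draw n synthetic degrees from the truncated model (for checking the fit)."""
    ks = list(range(1, k_max + 1))
    weights = [k ** -a * math.exp(-b * k) for k in ks]
    return random.Random(seed).choices(ks, weights=weights, k=n)
```

A simple sanity check is to draw a synthetic sample with `sample_powerlaw_cutoff(0.8, 0.03, 1000, 20000)` and confirm that `fit_powerlaw_cutoff` returns values close to the parameters used to generate it. The truncation at `k_max` is harmless when the cutoff term makes the tail beyond `k_max` negligible, but it would bias the fit for b close to zero.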
We found a total of 9.1 billion buddy edges in the graph with 49 buddies per user. We fit the data with a power-law distribution with exponential cutoff and identified parameters of a = 0.6 and b = 0.01. The power-law exponent is now even smaller. This model describes the data well. We note a spike at 600, which is the limit on the maximal number of buddies imposed by the Messenger software client. The maximal number of buddies was increased to 300 from 150 in March