IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING DRIMUX:Dynamic Rumor Influence Minimization with User Experience in Social Networks Biao Wang,Ge Chen,Luoyi Fu,Li Song,and Xinbing Wang,Senior Member,IEEE Abstract-With the soaring development of large scale online social networks,online information sharing is becoming ubiquitous everyday.Various information is propagating through online social networks including both the positive and negative.In this paper,we focus on the negative information problems such as the online rumors.Rumor blocking is a serious problem in large-scale social networks.Malicious rumors could cause chaos in society and hence need to be blocked as soon as possible after being detected.In this paper,we propose a model of dynamic rumor influence minimization with user experience(DRIMUX).Our goal is to minimize the influence of the rumor(i.e.,the number of users that have accepted and sent the rumor)by blocking a certain subset of nodes.A dynamic Ising propagation model considering both the global popularity and individual attraction of the rumor is presented based on realistic scenario.In addition,different from existing problems of influence minimization,we take into account the constraint of user experience utility.Specifically,each node is assigned a tolerance time threshold.If the blocking time of each user exceeds that threshold,the utility of the network will decrease.Under this constraint,we then formulate the problem as a network inference problem with survival theory,and propose solutions based on maximum likelihood principle.Experiments are implemented based on large-scale real world networks and validate the effectiveness of our method. Index Terms-Social network,rumor blocking,survival theory. 1 INTRODUCTION Wggeiowmh2mine0Tac companies then took measures to block related accounts delete relevant posts and fanpages on their social network and Chinese Sina Weibo,etc.,hundreds of millions of people platforms to prevent the ISIS from spreading malicious are able to become friends [2]and share all kinds of infor- information.Additionally,Facebook et al.also have issued mation with each other.Online social network analysis has relevant security policies and standards to claim the au- also attracted growing interest among researchers [3],[4], thority to block accounts of users when they are against [5],[6].On one hand,these online social platforms provide rules or at risk [13].Undoubtedly,malicious rumors should great convenience to the diffusion of positive information be stopped as soon as possible once detected so that their such as new ideas,innovations,and hot topics [7],[8].On negative influence can be minimized. the other hand,however,they may become a channel for the Most of the previous works studied the problem of spreading of malicious rumors or misinformation [9],[10], maximizing the influence of positive information through [11].For example,some people may post on social networks social networks [14],[15],[16].Fast approximation methods a rumor about an upcoming earthquake,which will cause were also proposed to influence maximization problem chaos among the crowd and hence may hinder the normal [17],[18].In contrast,the negative influence minimization public order.In this case,it is necessary to detect the rumor problem has gained much less attention,but still there have source and delete related messages,which may be enough been consistent efforts on designing effective strategies for to prevent the rumor from further spreading.However,in blocking malicious rumors and minimizing the negative certain extreme circumstances such as terrorist online attack, influence.Budak et al.[9]introduced the notion of a "good" it might be necessary to disable or block related Social campaign in a social network to counteract the negative Network(SN)accounts to avoid serious negative influences. influence of a "bad"one by convincing users to adopt For instance,in 2016,the families of three out of the forty the "good"one.Kimura et al.[19]studied the problem nine victims from the Orlando nightclub shooting incident of minimizing the propagation of malicious rumors by filed a lawsuit against Twitter,Facebook and Google for blocking a limited number of links in a social network.They providing "material support"to the terrorism organization provided two different definitions of contamination degree of the Islamic State of Iraq and Syria (ISIS)[12].These and proposed corresponding optimization algorithms. Fan et al.[20]investigated the least cost rumor blocking problem in social networks.They introduced the concept of .B.Wang,Ge Chen,Luoyi Fu,Li Song and Xinbing Wang are with the 'protectors"and try to select a minimal number of them to Department of Electronic Engineering,Shanghai Jiao Tong University, 800 Dongchuan Road,Shanghai,China. limit the bad influence of rumors by triggering a protection E-mail:{sweetmvp24,chenge,yiluofu,song _li,xwang8y@sjtu.edu.cn. cascade against the rumor cascade.However,there are a few limitations in those works.First,they consider the The early version of this paper appeared in the Proceedings of AAAl 2016[1]. rumor popularity as constant during the whole propagation
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1 DRIMUX: Dynamic Rumor Influence Minimization with User Experience in Social Networks Biao Wang, Ge Chen, Luoyi Fu, Li Song, and Xinbing Wang, Senior Member, IEEE Abstract—With the soaring development of large scale online social networks, online information sharing is becoming ubiquitous everyday. Various information is propagating through online social networks including both the positive and negative. In this paper, we focus on the negative information problems such as the online rumors. Rumor blocking is a serious problem in large-scale social networks. Malicious rumors could cause chaos in society and hence need to be blocked as soon as possible after being detected. In this paper, we propose a model of dynamic rumor influence minimization with user experience (DRIMUX). Our goal is to minimize the influence of the rumor (i.e., the number of users that have accepted and sent the rumor) by blocking a certain subset of nodes. A dynamic Ising propagation model considering both the global popularity and individual attraction of the rumor is presented based on realistic scenario. In addition, different from existing problems of influence minimization, we take into account the constraint of user experience utility. Specifically, each node is assigned a tolerance time threshold. If the blocking time of each user exceeds that threshold, the utility of the network will decrease. Under this constraint, we then formulate the problem as a network inference problem with survival theory, and propose solutions based on maximum likelihood principle. Experiments are implemented based on large-scale real world networks and validate the effectiveness of our method. Index Terms—Social network, rumor blocking, survival theory. ✦ 1 INTRODUCTION WITH the soaring development and rising popularity of large-scale social networks such as Twitter, Facebook, and Chinese Sina Weibo, etc., hundreds of millions of people are able to become friends [2] and share all kinds of information with each other. Online social network analysis has also attracted growing interest among researchers [3], [4], [5], [6]. On one hand, these online social platforms provide great convenience to the diffusion of positive information such as new ideas, innovations, and hot topics [7], [8]. On the other hand, however, they may become a channel for the spreading of malicious rumors or misinformation [9], [10], [11]. For example, some people may post on social networks a rumor about an upcoming earthquake, which will cause chaos among the crowd and hence may hinder the normal public order. In this case, it is necessary to detect the rumor source and delete related messages, which may be enough to prevent the rumor from further spreading. However, in certain extreme circumstances such as terrorist online attack, it might be necessary to disable or block related Social Network (SN) accounts to avoid serious negative influences. For instance, in 2016, the families of three out of the forty nine victims from the Orlando nightclub shooting incident filed a lawsuit against Twitter, Facebook and Google for providing “material support” to the terrorism organization of the Islamic State of Iraq and Syria (ISIS) [12]. These • B. Wang, Ge Chen, Luoyi Fu, Li Song and Xinbing Wang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China. E-mail: {sweetmvp24, chenge, yiluofu, song li, xwang8}@sjtu.edu.cn. The early version of this paper appeared in the Proceedings of AAAI 2016 [1]. companies then took measures to block related accounts, delete relevant posts and fanpages on their social network platforms to prevent the ISIS from spreading malicious information. Additionally, Facebook et al. also have issued relevant security policies and standards to claim the authority to block accounts of users when they are against rules or at risk [13]. Undoubtedly, malicious rumors should be stopped as soon as possible once detected so that their negative influence can be minimized. Most of the previous works studied the problem of maximizing the influence of positive information through social networks [14], [15], [16]. Fast approximation methods were also proposed to influence maximization problem [17], [18]. In contrast, the negative influence minimization problem has gained much less attention, but still there have been consistent efforts on designing effective strategies for blocking malicious rumors and minimizing the negative influence. Budak et al. [9] introduced the notion of a “good” campaign in a social network to counteract the negative influence of a “bad” one by convincing users to adopt the “good” one. Kimura et al. [19] studied the problem of minimizing the propagation of malicious rumors by blocking a limited number of links in a social network. They provided two different definitions of contamination degree and proposed corresponding optimization algorithms. Fan et al. [20] investigated the least cost rumor blocking problem in social networks. They introduced the concept of “protectors” and try to select a minimal number of them to limit the bad influence of rumors by triggering a protection cascade against the rumor cascade. However, there are a few limitations in those works. First, they consider the rumor popularity as constant during the whole propagation
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2 process,which is not close to the realistic scenarios.Second, in the design of the rumor blocking strategies,either blocking nodes or links,they fail to take into account the issue of user experience in real world social networks.We have to avoid blocking the accounts of users for such a long time that they may lodge complaints or even quit the social network.Therefore,it is necessary to consider the impact of blocking time on both individual node and the entire network. In this paper,we investigate the problem of dynamic rumor influence minimization with user experience.First, based on existing works on information diffusion in social networks [21],[22],[23],[24],we incorporate the rumor popularity dynamics in the diffusion model.We analyze Fig.1.The random graph denotation of online social networks.Nodes existing investigations on topic propagation dynamics [25] with different colors illustrate the different communities they belong to. and bursty topic patterns [26].Then we choose Chi-squared The size of the nodes indicate their degrees or "influence"in the social distribution to approximate the global rumor popularity. network.Normally,the more influential a node is,the more contributions Inspired by the novel energy model proposed by Han et it will make to rumor propagation. al.[27],we then analyze the individual tendency towards the rumor and present the probability of successful rumor formulation in Section 4,the solutions in Section 5,and the propagation between a pair of nodes.Finally,inspired by experiments in Section 6.Finally,we conclude the paper in the concept of Ising model [28],we derive the cooperative Section 7. succeeding probability of rumor propagation that integrates the global rumor popularity with individual tendency.After that,we introduce the concept of user experience utility 2 PRELIMINARIES function and analyze the impact of blocking time of nodes to 2.1 Social Network Definition the rumor propagation process.We then adopt the survival A social network,in mathematical context,can be formu- theory to explain the likelihood of nodes getting activated, and propose both greedy and dynamic algorithms based on lated as a directed graph G=(V,E)consisting of a set of nodes V representing the users,and a set of directed edges maximum likelihood principle. E denoting the relationship between users(e.g.following or The contributions of our work are as follows: being followed).Figure 1 shows the random graph illustra- We propose a rumor propagation model taking into tion of a social network.LetV=N denote the number of account the following three elements:First,the glob- nodes,and(u,v)EE denote the directed edge from node al popularity of the rumor over the entire social u to node v(u,v∈V),and auv∈{0,l}denote the edge network,i.e.,the general topic dynamics.Second, coefficient,where ouv =1 represents the existence of edge the attraction dynamics of the rumor to a potential (u,v),and auv=0,otherwise.We use puv to denote the spreader,i.e.,the individual tendency to forward the probability of u sending the rumor to v and v accepting it, rumor to its neighbors.Third,the acceptance proba- i.e.,the success probability of u activating v.Let D(u)denote bility of the rumor recipients.In our model,inspired the in-degree of node u.From Figure 1,we can see nodes in by the Ising model,we combine all three factors larger size have higher degree than those in smaller size. together to propose a cooperative rumor propagation The degree of a node is also an indication of "influence"in probability. a social network since higher degree denotes more connec- In our rumor blocking strategies,we consider the tions to other nodes,thus it implies more opportunities to influence of blocking time to user experience in real share information (both positive and negative)with other world social networks.Thus we propose a blocking nodes time constraint into the traditional rumor influence minimization objective function.In that case,our 2.2 Rumor Diffusion Model method optimizes the rumor blocking strategy with- Rumor diffusion mechanism is similar with that of epidemic out sacrificing the online user experience. We use survival theory to analyze the likelihood of propagation [29].During the propagation of rumors,each node could have one of the following three states:Suscep- nodes becoming activated or infected by the rumor before a time threshold which is determined by the tible (S),Infected (I)and Recovered (R),which is known user experience constraint.Then we propose both as the SIR model [30],[31].The state of being susceptible greedy and dynamic blocking algorithms using the represents the node has the potential to accept and spread the rumor at any time;Infected indicates the node has maximum likelihood principle. already accepted and spread the rumor;Recovered denotes The rest of the paper is organized as follows.In Section the state of the node identifying the rumor and denying 2,we introduce the preliminaries of social network and it.In this paper,we consider the rumor propagation as a information diffusion models.Next we give an overview of progressive process,i.e.,once a node is infected,it will stay the related work in Section 3.Then we propose the problem infected and not recover,which is the SI model
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2 process, which is not close to the realistic scenarios. Second, in the design of the rumor blocking strategies, either blocking nodes or links, they fail to take into account the issue of user experience in real world social networks. We have to avoid blocking the accounts of users for such a long time that they may lodge complaints or even quit the social network. Therefore, it is necessary to consider the impact of blocking time on both individual node and the entire network. In this paper, we investigate the problem of dynamic rumor influence minimization with user experience. First, based on existing works on information diffusion in social networks [21], [22], [23], [24], we incorporate the rumor popularity dynamics in the diffusion model. We analyze existing investigations on topic propagation dynamics [25] and bursty topic patterns [26]. Then we choose Chi-squared distribution to approximate the global rumor popularity. Inspired by the novel energy model proposed by Han et al. [27], we then analyze the individual tendency towards the rumor and present the probability of successful rumor propagation between a pair of nodes. Finally, inspired by the concept of Ising model [28], we derive the cooperative succeeding probability of rumor propagation that integrates the global rumor popularity with individual tendency. After that, we introduce the concept of user experience utility function and analyze the impact of blocking time of nodes to the rumor propagation process. We then adopt the survival theory to explain the likelihood of nodes getting activated, and propose both greedy and dynamic algorithms based on maximum likelihood principle. The contributions of our work are as follows: • We propose a rumor propagation model taking into account the following three elements: First,the global popularity of the rumor over the entire social network, i.e., the general topic dynamics. Second, the attraction dynamics of the rumor to a potential spreader, i.e., the individual tendency to forward the rumor to its neighbors. Third, the acceptance probability of the rumor recipients. In our model, inspired by the Ising model, we combine all three factors together to propose a cooperative rumor propagation probability. • In our rumor blocking strategies, we consider the influence of blocking time to user experience in real world social networks. Thus we propose a blocking time constraint into the traditional rumor influence minimization objective function. In that case, our method optimizes the rumor blocking strategy without sacrificing the online user experience. • We use survival theory to analyze the likelihood of nodes becoming activated or infected by the rumor before a time threshold which is determined by the user experience constraint. Then we propose both greedy and dynamic blocking algorithms using the maximum likelihood principle. The rest of the paper is organized as follows. In Section 2, we introduce the preliminaries of social network and information diffusion models. Next we give an overview of the related work in Section 3. Then we propose the problem Fig. 1. The random graph denotation of online social networks. Nodes with different colors illustrate the different communities they belong to. The size of the nodes indicate their degrees or “influence” in the social network. Normally, the more influential a node is, the more contributions it will make to rumor propagation. formulation in Section 4, the solutions in Section 5, and the experiments in Section 6. Finally, we conclude the paper in Section 7. 2 PRELIMINARIES 2.1 Social Network Definition A social network, in mathematical context, can be formulated as a directed graph G = (V, E) consisting of a set of nodes V representing the users, and a set of directed edges E denoting the relationship between users (e.g. following or being followed). Figure 1 shows the random graph illustration of a social network. Let |V | = N denote the number of nodes, and (u, v) ∈ E denote the directed edge from node u to node v (u, v ∈ V ), and αuv ∈ {0, 1} denote the edge coefficient, where αuv = 1 represents the existence of edge (u, v), and αuv = 0, otherwise. We use puv to denote the probability of u sending the rumor to v and v accepting it, i.e., the success probability of u activating v. Let D(u) denote the in-degree of node u. From Figure 1, we can see nodes in larger size have higher degree than those in smaller size. The degree of a node is also an indication of “influence” in a social network since higher degree denotes more connections to other nodes, thus it implies more opportunities to share information (both positive and negative) with other nodes. 2.2 Rumor Diffusion Model Rumor diffusion mechanism is similar with that of epidemic propagation [29]. During the propagation of rumors, each node could have one of the following three states: Susceptible (S), Infected (I) and Recovered (R), which is known as the SIR model [30], [31]. The state of being susceptible represents the node has the potential to accept and spread the rumor at any time; Infected indicates the node has already accepted and spread the rumor; Recovered denotes the state of the node identifying the rumor and denying it. In this paper, we consider the rumor propagation as a progressive process, i.e., once a node is infected, it will stay infected and not recover, which is the SI model
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 3 1400000 Samples of Rumor Evolution Pattern ■Normal 1200000 100000( ■Blocked A0000 0000 Natural Rumor Propagation Propagation Under Blockage 40000 200000 Ap Oct Apr Oct Fig.2.Typical cases of the topic evolution pattern curves extracted from Sina Weibo from Jan.2011 to Jan.2013.Different colors represent different hot topics and their evolution tracks respectively. Fig.3.Illustration of the process of natural rumor propagation and Diffusion models describe the process of information the process with rumor blockage,i.e.,blocking a certain amount of propagating through the network.Two classic diffusion "important"nodes in the propagation path. models are the Linear Threshold (LT)[32]and the Inde- pendent Cascade (IC)model [33],[34].In LT model,an million-scale tweets and blog posts.Crane et al.[35]demon- inactive node u becomes activated if the ratio of its activat- strated the existence of Poisson distribution and Power- ed parent nodes surpasses a certain pre-defined threshold 01,each node u which was activated at time contagion process [39]with several special characteristics. step (t-1)has a single opportunity to activate any of its Firstly,people's interest of a rumor tends to decrease with currently inactive neighbors v with a success probability time,which indicates the probability of a node willing to pv.In our context,it means in each time step,any node forward the rumor.That process is similar to the simulated that has accepted the rumor in previous time step has a annealing process [40].Han et al.[27]proposed a novel en- chance to let their inactive neighbors accept the rumor.For ergy model to describe the rumor propagation process.They simplicity,we assume that once a node is activated by the introduce the heat energy calculation formula AE=cmAT rumor,it will stay activated until the end of the diffusion in Physics to analogize the rumor impact.The rumor's process. influence on individual node is formulated as the amount of accumulated heat energy.Based on the model proposed by Han et al.[27],we define the expression of individual 3 RELATED WORK tendency with respect to the success activation probability 3.1 Topic Dynamics between a pair of nodes.In addition,even though an ac- Researchers have studied the temporal dynamics of we- tivated node does transmit the rumor to its neighbors,the b topics based on real-world statistics.Yang et al.[25] probability of these neighbors accepting the rumor is still to analyzed how the number of tweets related to a specific be determined.In that case,we can define the acceptance theme(i.e.,the popularity of a topic)changes with time, probability of the rumor recipient.By combining the rumor and revealed that a topic evolution generally consists of sending probability at the transmitting end with the rumor three phases,i.e.,a rising phase from the start,a peak acceptance probability at the receiving end,we can obtain period and then a fading phase.Fluctuations in each phase the ultimate rumor propagation probability. may result in different temporal characteristics.Yang et al. [25]proposed K-Spectral Centroids clustering algorithm 3.3 Ising Model for classifying online content according to their temporal The Ising model [41]is a widely applicable model in the patterns and finally extract six representative patterns from research of Physics theory.It is a simple theoretical de-
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 3 Jan 2011 Apr Jul Oct Jan 2012 Apr Jul Oct Jan 2013 Date 0 200000 400000 600000 800000 1000000 1200000 1400000 Amount Samples of Rumor Evolution Pattern Fig. 2. Typical cases of the topic evolution pattern curves extracted from Sina Weibo from Jan. 2011 to Jan. 2013. Different colors represent different hot topics and their evolution tracks respectively. Diffusion models describe the process of information propagating through the network. Two classic diffusion models are the Linear Threshold (LT) [32] and the Independent Cascade (IC) model [33], [34]. In LT model, an inactive node u becomes activated if the ratio of its activated parent nodes surpasses a certain pre-defined threshold 0 < θ < 1. In this paper, since we mainly focus on pairwise probabilistic rumor propagation among nodes including individual tendency of a node to a rumor as well as the global popularity probability of a rumor, it is more suitable to adopt the IC model in our work. The IC model has been widely adopted in information diffusion problems. The whole propagation process proceeds in discrete time steps t0, t1, t2, . . . . Initially, the cascade is triggered by a set of activated nodes, i.e., the seed nodes at t0. In our rumor diffusion context, we assume the rumor is started by one source node s in the network, and the other nodes are inactive. We use su(t) ∈ {0, 1} to denote the state of node u at time step t, where su(t) = 1 represents u is activated and su(t) = 0, otherwise. For every following time step t ≥ 1, each node u which was activated at time step (t − 1) has a single opportunity to activate any of its currently inactive neighbors v with a success probability puv. In our context, it means in each time step, any node that has accepted the rumor in previous time step has a chance to let their inactive neighbors accept the rumor. For simplicity, we assume that once a node is activated by the rumor, it will stay activated until the end of the diffusion process. 3 RELATED WORK 3.1 Topic Dynamics Researchers have studied the temporal dynamics of web topics based on real-world statistics. Yang et al. [25] analyzed how the number of tweets related to a specific theme (i.e., the popularity of a topic) changes with time, and revealed that a topic evolution generally consists of three phases, i.e., a rising phase from the start, a peak period and then a fading phase. Fluctuations in each phase may result in different temporal characteristics. Yang et al. [25] proposed K-Spectral Centroids clustering algorithm for classifying online content according to their temporal patterns and finally extract six representative patterns from Normal Infected Blocked Natural Rumor Propagation Propagation Under Blockage Fig. 3. Illustration of the process of natural rumor propagation and the process with rumor blockage, i.e., blocking a certain amount of “important” nodes in the propagation path. million-scale tweets and blog posts. Crane et al. [35] demonstrated the existence of Poisson distribution and Powerlaw relaxation in controlling the topic evolution over time. Figure 2 shows several typical topic evolution curves based on the data extracted from Sina Weibo. From the figure, in each topic evolution curve, we can see the three phases mentioned above. It is also discernable that every topic has a certain lifespan of its own, which is similar to the case of rumor propagation process. Thus, it is reasonable to utilize a function to simulate the process of rumor propagation. According to the topic evolution characteristics, in our work, we use the Chi-squared distribution [36], [37], [38] to simulate the rumor propagation dynamics. 3.2 Energy Model Rumor propagation can be considered as a type of social contagion process [39] with several special characteristics. Firstly, people’s interest of a rumor tends to decrease with time, which indicates the probability of a node willing to forward the rumor. That process is similar to the simulated annealing process [40]. Han et al. [27] proposed a novel energy model to describe the rumor propagation process. They introduce the heat energy calculation formula ∆E = cm∆T in Physics to analogize the rumor impact. The rumor’s influence on individual node is formulated as the amount of accumulated heat energy. Based on the model proposed by Han et al. [27], we define the expression of individual tendency with respect to the success activation probability between a pair of nodes. In addition, even though an activated node does transmit the rumor to its neighbors, the probability of these neighbors accepting the rumor is still to be determined. In that case, we can define the acceptance probability of the rumor recipient. By combining the rumor sending probability at the transmitting end with the rumor acceptance probability at the receiving end, we can obtain the ultimate rumor propagation probability. 3.3 Ising Model The Ising model [41] is a widely applicable model in the research of Physics theory. It is a simple theoretical de-
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING scription of the concept of ferromagnetism in Physics [42]. 4 PROBLEM FORMULATION Specifically,it describes the phenomenon that when an array of atomic spins align in the way that the magnetic moments 4.1 Dynamic Rumor Propagation with Ising Model associated to them will all point in the same direction. Kempe et al.[14]considered the success probability puv Then it will create a macroscopic magnetic moment.Gen- as a system parameter and is fixed at the very beginning erally speaking,the Ising model contains two parts-the of the cascade.However,based on the topic dynamics we microscopic and macroscopic parts.The microscopic part discussed in a previous section,at different time steps of represents the local or individual behavior which is the the propagation process,a topic can vary dramatically in alignment of each of the atomic spins.Correspondingly,its popularity.Besides,the rumor attraction [27]for each the macroscopic part stands for the global or collective individual node u E V is also a realistic factor we should behavior which is the exterior magnetic moment.Based on take into account.That means the success of rumor prop- its intrinsic attributes,the Ising model can be generalized to agation between neighbors includes two aspects:first,the other similar scenarios.In our work,we utilize it to model activated node u has to be so attracted by the rumor that the rumor propagation process in social networks. it will choose to send the rumor to its neighbors;second, one of u's inactive neighbors v decides to accept the rumor Only after those two steps,we can claim that v is activated. In other words,the success of rumor propagation depends 3.4 User Experience both on the global popularity and the individual tendency of the rumor topic,which can be regarded as a generalized User experience is an important factor for various services feature of the Ising model. including social networks [43].Existing rumor blocking strategies block either nodes (users)or links (connections Now we investigate the two steps of a successful rumor propagation.In the first step,at any time stamp ti,u is between users)in social networks to prevent the rumor from further propagation.However,none has analyzed the one of the activated nodes in time stamp tj-1.Based on the work in [27],we give the modified version of the probability impact of blocking nodes.Generally speaking,the longer the user is blocked,the less satisfactory the user feels about of node u sending the rumor to one of its inactive neighbors vas the social network.Therefore,if the blocked time surpasses Po a certain threshold,it is possible that the user may quit pd(与)=1g(10+t) (1) the social network or at least lodge a complaint to the administrator.Bhatti et al.[44]analyzed the user-perceived where po is the initial sending probability at time stamp 0. quality in web server design and found that users'tolerance On the receiving end,the probability of node v accepting for latency decreases over the duration of interaction with the rumor transmitted by its parent node u is also given as a site.A utility function was presented to measure the piuce =1/Dv, (2) customer satisfaction.Inspired by that,in our work,we apply a modified utility function to measure user experience where D is the in-degree of node v.That denotation can in rumor blocking. be interpreted as that if a node has very high degree,it will receive more information than those with lower degrees.In that case,due to the large number of pieces of information 3.5 Rumor Influence Minimization it receives,every single piece of information will have much lower possibility to be read,recognized and forwarded by Rumor influence minimization addresses the problem of the node.For example,if a user has only one friend on minimizing the propagation effect of undesirable rumors a social network,he or she would probably read all the in social networks.Figure 3 demonstrates the mechanism messages forwarded by his or her friend,and thus if the user of rumor blocking in social networks.It shows both the finds a rumor interesting,he or she will probably believe normal rumor propagation process without any blockage in and forward it.In contrast,if a user has hundreds or even the social network and the process blocking a set of nodes thousands of friends,he or she could easily miss most of the on the path of rumor propagation.The rumor influence information. minimization problem is converse to the classic influence Thus,based on the above analysis,we then give the maximization problem [14]. probability of successful rumor propagation from u to v as It has been investigated in different influence diffusion mod- els in social networks.Fan et al.[20]studied the least cost rumor blocking problem in social networks,and introduced aat,)=pd,哈=D。g10+t 1 Po (3) the notion of "protectors"to limit the bad influence of ru- which can be defined as the individual tendency between mors by initiating a protector cascade to propagate against different pair of nodes in the network. the rumor cascade.Greedy algorithm is proposed for both Now we discuss the global topic popularity of the ru- opportunistic and deterministic cascade models.However, mor.As mentioned in related work,the rumor popularity Kimura et al.[19]proposed the strategy of blocking links generally includes three phases and approximately subject instead of nodes in social networks so as to minimize the to the chi square distribution,which is given by propagation of malicious rumors.Different contamination minimization problems are defined based on different defi- 21-)tk-1e-号 nitions of contamination degree of a network. Pglb(t;k)= T() (4)
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 4 scription of the concept of ferromagnetism in Physics [42]. Specifically, it describes the phenomenon that when an array of atomic spins align in the way that the magnetic moments associated to them will all point in the same direction. Then it will create a macroscopic magnetic moment. Generally speaking, the Ising model contains two parts – the microscopic and macroscopic parts. The microscopic part represents the local or individual behavior which is the alignment of each of the atomic spins. Correspondingly, the macroscopic part stands for the global or collective behavior which is the exterior magnetic moment. Based on its intrinsic attributes, the Ising model can be generalized to other similar scenarios. In our work, we utilize it to model the rumor propagation process in social networks. 3.4 User Experience User experience is an important factor for various services including social networks [43]. Existing rumor blocking strategies block either nodes (users) or links (connections between users) in social networks to prevent the rumor from further propagation. However, none has analyzed the impact of blocking nodes. Generally speaking, the longer the user is blocked, the less satisfactory the user feels about the social network. Therefore, if the blocked time surpasses a certain threshold, it is possible that the user may quit the social network or at least lodge a complaint to the administrator. Bhatti et al. [44] analyzed the user-perceived quality in web server design and found that users’ tolerance for latency decreases over the duration of interaction with a site. A utility function was presented to measure the customer satisfaction. Inspired by that, in our work, we apply a modified utility function to measure user experience in rumor blocking. 3.5 Rumor Influence Minimization Rumor influence minimization addresses the problem of minimizing the propagation effect of undesirable rumors in social networks. Figure 3 demonstrates the mechanism of rumor blocking in social networks. It shows both the normal rumor propagation process without any blockage in the social network and the process blocking a set of nodes on the path of rumor propagation. The rumor influence minimization problem is converse to the classic influence maximization problem [14]. It has been investigated in different influence diffusion models in social networks. Fan et al. [20] studied the least cost rumor blocking problem in social networks, and introduced the notion of “protectors” to limit the bad influence of rumors by initiating a protector cascade to propagate against the rumor cascade. Greedy algorithm is proposed for both opportunistic and deterministic cascade models. However, Kimura et al. [19] proposed the strategy of blocking links instead of nodes in social networks so as to minimize the propagation of malicious rumors. Different contamination minimization problems are defined based on different defi- nitions of contamination degree of a network. 4 PROBLEM FORMULATION 4.1 Dynamic Rumor Propagation with Ising Model Kempe et al. [14] considered the success probability puv as a system parameter and is fixed at the very beginning of the cascade. However, based on the topic dynamics we discussed in a previous section, at different time steps of the propagation process, a topic can vary dramatically in its popularity. Besides, the rumor attraction [27] for each individual node u ∈ V is also a realistic factor we should take into account. That means the success of rumor propagation between neighbors includes two aspects: first, the activated node u has to be so attracted by the rumor that it will choose to send the rumor to its neighbors; second, one of u’s inactive neighbors v decides to accept the rumor. Only after those two steps, we can claim that v is activated. In other words, the success of rumor propagation depends both on the global popularity and the individual tendency of the rumor topic, which can be regarded as a generalized feature of the Ising model. Now we investigate the two steps of a successful rumor propagation. In the first step, at any time stamp tj , u is one of the activated nodes in time stamp tj−1. Based on the work in [27], we give the modified version of the probability of node u sending the rumor to one of its inactive neighbors v as p send u (tj ) = p0 lg(10 + tj ) , (1) where p0 is the initial sending probability at time stamp 0. On the receiving end, the probability of node v accepting the rumor transmitted by its parent node u is also given as p acc v = 1/Dv, (2) where Dv is the in-degree of node v. That denotation can be interpreted as that if a node has very high degree, it will receive more information than those with lower degrees. In that case, due to the large number of pieces of information it receives, every single piece of information will have much lower possibility to be read, recognized and forwarded by the node. For example, if a user has only one friend on a social network, he or she would probably read all the messages forwarded by his or her friend, and thus if the user finds a rumor interesting, he or she will probably believe and forward it. In contrast, if a user has hundreds or even thousands of friends, he or she could easily miss most of the information. Thus, based on the above analysis, we then give the probability of successful rumor propagation from u to v as pind(tj ) = p send u · p acc v = 1 Dv p0 lg(10 + tj ) , (3) which can be defined as the individual tendency between different pair of nodes in the network. Now we discuss the global topic popularity of the rumor. As mentioned in related work, the rumor popularity generally includes three phases and approximately subject to the chi square distribution, which is given by pglb(t; k) = 2 (1− k 2 ) t k−1 e − t 2 2 Γ( k 2 ) , (4)
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 5 where >0 represents the degree of freedom,I()is the propagation process. Gamma function.It explains a common social phenomenon Heterogeneous Networks.In heterogeneous social net- that when a rumor spreads for a while,it may create a works,different nodes have distinct properties.For exam- "rumor atmosphere"that could affect the judgements or ple,in real world social networks like Twitter or Weibo, decisions of users on online social networks. different users have different levels according to the time According to the Ising model [28],the"phase transition" they have spent on it,the number of followers they have of a spin involves both short-range interaction with its or the number of messages they have posted.Typically, nearest neighbors and long-term system evolution,and is we can simply divide all users into VIPs and ordinary a combined result.Inspired by that,we propose the co- users considering their levels.Intuitively,the social network operative propagation probability integrating Palb(t;k)with operators would try to avoid blocking the VIP users as Pind(t)in the form of a logistic function as possible as they can because of the impact they have on 1 the enormous followers and hence on the entire network. Pu(d=1+exp-B'psbt内+Pm可 On the other hand,from the perspective of the VIPs,since they usually have frequent interactions with their followers, 1 there is a high possibility that they can not tolerate to be 1+exp- B,--竖+六品可 blocked for a long time.As a result,the VIPs usually have a T() relatively low tolerance threshold of the blocked time. (5) Let C(u)denote the level of node u,and T(u)denote the where B1,B2 E(0,1)are the balance coefficients which tolerance threshold of node u when it is blocked.Then we satisfy B1+B2=1. propose the following expression: Based on this cooperative propagation probability,the 1 probability of node v getting activated at time stamp t;can T(u=c(西 (8) be given by The mechanism can be explained as follows.When the level Pr[s(G)=1刂=1-Π[1-su(G-)pun(tl, (6) of a user u,C(u),is approximate to zero,i.e.,the user has UEP just joined the social network,its tolerance threshold is close where Pr]represents probability,and P represents the to infinity,because even if it is blocked forever,it can just parent nodes of v. apply for a new account without any loss.On the contrary, if u is a VIP user,the higher its level is,the less blocked time it can endure.When C(u)is large enough,its tolerance will 4.2 User Experience Utility be asymptotic to zero. In the formulated optimization problem,one important There are several metrics to define the level or sig- constraint condition is the user experience utility function.nificance of a node in a social network,such as degree, Therefore,before giving the concrete algorithm,first,we eigenvector or betweenness.Also there is a widely adopt- elaborate on the user experience utility function. ed PageRank algorithm [45]which can be regarded as a It is a common sense that user experience is a critical variant of eigenvector centrality.For simplicity,we here use element in the success of modern business.A large variety the degree of a node u,D(u),to represent its level,i.e., of communication services involve user experience,such as C(u)=D(u),where is a constant coefficient.Thus,the web searching,telephone connecting,et.al.For customers user experience utility in a heterogeneous network can be using those services,the latency plays a tremendous role written as in their satisfaction extent,i.e.,the user experience utility. N Specifically,in our proposed problem,the user experience 11 >(1-SD(u)T6(u) (9) utility lies in the blocked time t of the selected node u. N In our work,we try to find the proper user experience utility function to characterize the impact of nodes being Given the user experience utility,now we analyze the blocked to the entire social network.Specifically,we analyze strategy of blocking nodes of different levels and its influ- both the homogeneous and heterogeneous scenarios. ence to the entire network.Our goal now is to select a certain Homogeneous Networks.For homogeneous social net- number of nodes and then block them so that the final active works,the simple case,we assume that all the nodes have nodes within a duration can be minimized.Nevertheless,it the same blocked time threshold Tih.In other words,we is a challenge to select the proper nodes to block without assume that all the users in the same social network have triggering the complaint of users.Let's assume that every same tolerance of the time being blocked regardless of the time a node is blocked longer than its tolerance threshold, time they have been in the social network.In this scenario, it will trigger a complaint to the operator.Besides,we also based on the user experience definition on web server in assume that a fraction of its neighbors will also trigger a [44],we define the user experience utility function as certain amount of complaints.For example,some users may attract many followers by frequently sharing interesting 1 information,so if they are blocked for a long time,and the Uhom N Th-To(u) N (7 followers fail to receive the information,it is likely that some u=l of them may lodge a complaint.In the following part,we where the Tih represents the tolerance time threshold,T(u)will use survival theory to analyze the process mentioned is used to record the blocked time of node u in the whole above
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 5 where k > 0 represents the degree of freedom, Γ(·) is the Gamma function. It explains a common social phenomenon that when a rumor spreads for a while, it may create a “rumor atmosphere” that could affect the judgements or decisions of users on online social networks. According to the Ising model [28], the “phase transition” of a spin involves both short-range interaction with its nearest neighbors and long-term system evolution, and is a combined result. Inspired by that, we propose the cooperative propagation probability integrating pglb(t; k) with pind(t) in the form of a logistic function as puv(t) = 1 1 + exp − β1 · pglb(t; k) + β2 · pind(t) = 1 1 + exp − β1 2 (1− k 2 ) t k−1e − t2 2 Γ( k 2 ) + β2 1 Dv p0 lg(10+t) , (5) where β1, β2 ∈ (0, 1) are the balance coefficients which satisfy β1 + β2 = 1. Based on this cooperative propagation probability, the probability of node v getting activated at time stamp tj can be given by P r[sv(tj ) = 1] = 1 − Y u∈Pv [1 − su(tj−1)puv(tj )], (6) where P r[·] represents probability, and Pv represents the parent nodes of v. 4.2 User Experience Utility In the formulated optimization problem, one important constraint condition is the user experience utility function. Therefore, before giving the concrete algorithm, first, we elaborate on the user experience utility function. It is a common sense that user experience is a critical element in the success of modern business. A large variety of communication services involve user experience, such as web searching, telephone connecting, et.al. For customers using those services, the latency plays a tremendous role in their satisfaction extent, i.e., the user experience utility. Specifically, in our proposed problem, the user experience utility lies in the blocked time t b u of the selected node u. In our work, we try to find the proper user experience utility function to characterize the impact of nodes being blocked to the entire social network. Specifically, we analyze both the homogeneous and heterogeneous scenarios. Homogeneous Networks. For homogeneous social networks, the simple case, we assume that all the nodes have the same blocked time threshold Tth. In other words, we assume that all the users in the same social network have same tolerance of the time being blocked regardless of the time they have been in the social network. In this scenario, based on the user experience definition on web server in [44], we define the user experience utility function as Uhom = 1 N X N u=1 Tth − Tb(u) Tth , (7) where the Tth represents the tolerance time threshold, Tb(u) is used to record the blocked time of node u in the whole propagation process. Heterogeneous Networks. In heterogeneous social networks, different nodes have distinct properties. For example, in real world social networks like Twitter or Weibo, different users have different levels according to the time they have spent on it, the number of followers they have or the number of messages they have posted. Typically, we can simply divide all users into VIPs and ordinary users considering their levels. Intuitively, the social network operators would try to avoid blocking the VIP users as possible as they can because of the impact they have on the enormous followers and hence on the entire network. On the other hand, from the perspective of the VIPs, since they usually have frequent interactions with their followers, there is a high possibility that they can not tolerate to be blocked for a long time. As a result, the VIPs usually have a relatively low tolerance threshold of the blocked time. Let L(u) denote the level of node u, and T (u) denote the tolerance threshold of node u when it is blocked. Then we propose the following expression: T (u) = 1 L(u) . (8) The mechanism can be explained as follows. When the level of a user u, L(u), is approximate to zero, i.e., the user has just joined the social network, its tolerance threshold is close to infinity, because even if it is blocked forever, it can just apply for a new account without any loss. On the contrary, if u is a VIP user, the higher its level is, the less blocked time it can endure. When L(u) is large enough, its tolerance will be asymptotic to zero. There are several metrics to define the level or significance of a node in a social network, such as degree, eigenvector or betweenness. Also there is a widely adopted PageRank algorithm [45] which can be regarded as a variant of eigenvector centrality. For simplicity, we here use the degree of a node u, D(u), to represent its level, i.e., L(u) = ζD(u), where ζ is a constant coefficient. Thus, the user experience utility in a heterogeneous network can be written as Uhet = 1 N X N u=1 (1 − ζD(u)Tb(u)). (9) Given the user experience utility, now we analyze the strategy of blocking nodes of different levels and its influence to the entire network. Our goal now is to select a certain number of nodes and then block them so that the final active nodes within a duration can be minimized. Nevertheless, it is a challenge to select the proper nodes to block without triggering the complaint of users. Let’s assume that every time a node is blocked longer than its tolerance threshold, it will trigger a complaint to the operator. Besides, we also assume that a fraction of its neighbors will also trigger a certain amount of complaints. For example, some users may attract many followers by frequently sharing interesting information, so if they are blocked for a long time, and the followers fail to receive the information, it is likely that some of them may lodge a complaint. In the following part, we will use survival theory to analyze the process mentioned above.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 4.3 Objective Formulation 5.1.1 Survival Function Now our goal is to minimize the influence of a rumor as In order to analyze the likelihood of nodes getting activated much as possible (e.g.minimize the number of activated during one time slot in a cascade,we adopt the Survival nodes at the end of propagation process)under the con- Theory to calculate the probability of a single node v getting straint of user experience utility.We formulate the DRIMUX activated in a given time period.First,we introduce the problem as follows: survival function defined as [46] min [∑s(T] S(t)=Pr(tUth, occurrence time of an event of interest,t is a specified where s(T)represents the state of a node v by end of a constant.The survival function represents the probabili- ty that the event of interest occurs after the observation time intervalT as denoted in Equation(6).E[s(T)] "deadline".If we use the terminology "death"to represent denotes the expected number of activated nodes by the the occurrence of the event,we can claim that the target end of T.The constraint condition restrains the user expe- rience above a threshold Uth.In the following section we "survives"if its occurrence takes place after the specified time t.Then we have the cumulative distribution function will discuss how the user experience constraint affects the F(t): activation likelihood of each node. F(t)=Pr(T≤t)=1-S(t) (12) Accordingly,the probability density function f(t)is given PROPOSED SOLUTIONS by In this section,we analyze the DRIMUX optimization prob- f(t)=d lem from the perspective of a network inference problem F=-S. (13) with survival theory and then propose the greedy algorithm Alternatively,there is another method named hazard and dynamic blocking algorithm based on different nodes rate to express the instant activation rate of node v.Specif- selection schemes and the maximum likelihood principle. ically,the hazard rate indicates the probability of a single node with current state s(t)getting activated at time (t+dt). 5.1 Survival Theory We define (ts(t))as the hazard rate of node v condi- tioned on the set of nodes activated by time t.Our goal now In this section,we analyze the likelihood of nodes getting is to analyze the impact of the hazard rate of different nodes activated during each time slot in the process of rumor to the rumor influence minimization problem. propagation using the Survival Theory.Firstly,in our model, we assume that a rumor has been spreading for some time 5.1.2 Hazard Rate before it is detected at time to.Specifically,we assume Based on the above analysis,the hazard rate can be viewed that when the ratio of the infected nodes reaches a certain threshold,it would draw the attention of the monitoring as an alternative interpretation of the distribution of T, which characterizes the instantaneous rate of occurrence of department and then be detected whether it is a rumor or an event.It is defined as: not.If it is a rumor,relevant blocking strategies would be initiated to block it.Mathematically,we use I(t)to denote a(ts(t))=lim Pr(t≤T≤t+dtT>t) the ratio of infected nodes in the social network at time t. dt-0 dt It is also assumed that by time to,there have already been a total number of N1 activated nodes,and N2 lim Pr(t≤T≤t+dt) dt-0 Pr(T>t)dt N-Ni nodes remain inactive.Let VN,and VN2 denote the F(t+dt)-F(t) set of activated and inactive nodes at time to respectively. lim dt-0 S(t)dt Therefore,from to on,the system can be viewed as N 1 independent cascades propagating through the network, lim F(t+dt)-F(t) and our goal is to select K nodes from N2 and block them S(t)t→0 dt so that the final number of activated nodes during the =- S'() (14) observation time window T can be minimized. S(t) S(t) LetC=(c1,...,cN)denote the set of cascades triggered by Ni activated nodes by time to.A cascade ci C where S'(t)is the derivative of S(t).Pr(t T can be represented by a N-dimensional time vector t= t)denotes the conditional probability that the event of (t,...,t),where tE [to,to+T]uoo},j=1,2,...,N2 interest will occur in time period [t,t+dt)given that it has is the activated time of node j in cascade ci.The observation not occurred before time t. time window T is decided by the user experience utility Accordingly,we can have constraint mentioned in(10),and oo means the node is not S(t)=e-oa(rls())dr (15) activated until the end of the observation time (to +T). Here,we first consider the propagation process of only one and for a certain node v,according to (12),we have cascade and then the results can be extended to the scenario of multiple cascades. Fu(ts(t))=1-e-(rls())dr (16)
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 6 4.3 Objective Formulation Now our goal is to minimize the influence of a rumor as much as possible (e.g. minimize the number of activated nodes at the end of propagation process) under the constraint of user experience utility. We formulate the DRIMUX problem as follows: min E[ X v∈V sv(T) ] s. t. Uhom(Uhet) ≥ Uth, (10) where sv(T) represents the state of a node v by end of a time interval T as denoted in Equation (6). E[ P v∈V sv(T) ] denotes the expected number of activated nodes by the end of T. The constraint condition restrains the user experience above a threshold Uth. In the following section we will discuss how the user experience constraint affects the activation likelihood of each node. 5 PROPOSED SOLUTIONS In this section, we analyze the DRIMUX optimization problem from the perspective of a network inference problem with survival theory and then propose the greedy algorithm and dynamic blocking algorithm based on different nodes selection schemes and the maximum likelihood principle. 5.1 Survival Theory In this section, we analyze the likelihood of nodes getting activated during each time slot in the process of rumor propagation using the Survival Theory. Firstly, in our model, we assume that a rumor has been spreading for some time before it is detected at time t0. Specifically, we assume that when the ratio of the infected nodes reaches a certain threshold, it would draw the attention of the monitoring department and then be detected whether it is a rumor or not. If it is a rumor, relevant blocking strategies would be initiated to block it. Mathematically, we use I(t) to denote the ratio of infected nodes in the social network at time t. It is also assumed that by time t0, there have already been a total number of N1 activated nodes, and N2 = N − N1 nodes remain inactive. Let VN1 and VN2 denote the set of activated and inactive nodes at time t0 respectively. Therefore, from t0 on, the system can be viewed as N1 independent cascades propagating through the network, and our goal is to select K nodes from N2 and block them so that the final number of activated nodes during the observation time window T can be minimized. Let C = (c1, . . . , cN1 ) denote the set of cascades triggered by N1 activated nodes by time t0. A cascade ci ∈ C can be represented by a N-dimensional time vector t ci = (t ci 1 , . . . , tci N2 ), where t ci j ∈ [t0, t0+T]∪{∞}, j = 1, 2, . . . , N2 is the activated time of node j in cascade ci . The observation time window T is decided by the user experience utility constraint mentioned in (10), and ∞ means the node is not activated until the end of the observation time (t0 + T). Here, we first consider the propagation process of only one cascade and then the results can be extended to the scenario of multiple cascades. 5.1.1 Survival Function In order to analyze the likelihood of nodes getting activated during one time slot in a cascade, we adopt the Survival Theory to calculate the probability of a single node v getting activated in a given time period. First, we introduce the survival function defined as [46] S(t) = P r(t t) dt = lim dt→0 P r(t ≤ T ≤ t + dt) P r(T > t)dt = lim dt→0 F(t + dt) − F(t) S(t)dt = 1 S(t) lim dt→0 F(t + dt) − F(t) dt = f(t) S(t) = − S 0 (t) S(t) , (14) where S 0 (t) is the derivative of S(t). P r(t ≤ T ≤ t+dt|T > t) denotes the conditional probability that the event of interest will occur in time period [t, t+dt) given that it has not occurred before time t. Accordingly, we can have S(t) = e − R t 0 αv(τ|s(τ))dτ , (15) and for a certain node v, according to (12),we have Fv(t|s(t)) = 1 − e − R t 0 αv(τ|s(τ))dτ . (16)
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 5.2 Problem Solutions 5.2.1 Greedy Algorithm Based on the survival analysis,we propose an additive The proposed Greedy algorithm tries to block the rumor survival model where we assume that the probability of as fast as possible to prevent the rumor from further prop- node v getting activated is the weighted summation of agation.The working mechanism is as following:At time the propagation probabilities mentioned in(5)of all the to when we detect the rumor,we immediately select all K previously activated nodes set fu tuauupav(t), Mathematically,the Greedy algorithm aims to minimize the (17 likelihood of inactive nodes getting activated at ti,i.e.,the u:tu<t next time stamp after the rumor is detected.The likelihood where ao=(auv),u =1,2,...,N is a non-negative pa- of nodes getting activated at time ti is given by rameter vector indicating the existence of the edge between node u and v.ouv=1 if there is an edge between them;and ftls(to》=Π QuvPuv(t1)X auv=0,otherwise. veVw2u:tu≤to We then define a coefficient matrix A:=[]ERNXN Ⅱeaeu房pen(rdrj (21) to denote the structure of the network in terms of the e:te≤to connection between any pair of nodes in the network.Let Ao be the original network coefficient matrix before any Correspondingly,the objective function is nodes are blocked.By substituting a(s(T))in (16)with minf(tls(to)】 (17),then we can have: (22) s.t.auw∈{0,1} F(ts(t))=1-e wPuu(r)dr Then,the greedy algorithm is presented as below: Algorithm 1 Greedy Algorithm =l-Ⅱeaue.mu(r)dr Input:Initial Edge matrix Ao. u:tu<t =l-e2…p()ir Initialization:VB =0. for i=1 to K do (18) u=argm[fts(to):A-i)-fts(to)片Ai-1l】 Accordingly,we have the likelihood function of the activa- Ai:=Ai-1\u, tion of node v,f(tls(t)),as following: VBVB Ufu). end for fu(ts(t))= dF(ts(t)) dt Output:VB. aupuu(t)I eaovfio Pou(dr .(19) u:tu<t e:te<t Given the activation likelihood of a single inactive node 5.2.2 Dynamic Blocking Algorithm v E VN2,now we consider any number of inactive nodes Different from the greedy blocking algorithm,which is a in a cascade,during the entire observation window T, type of static blocking algorithm,we propose a dynamic tsT=(t1,,t,,tNlto≤t≤to+T).We assume that rumor blocking algorithm aiming to incrementally block the every activation is conditionally independent on activations selected nodes instead of blocking them at once.In that case, occurring later given previous activations.Then we can the blocking strategy is split into several rounds and each compute the activation likelihood as: round can be regarded as a greedy algorithm.Thus,how to choose the number of rounds is also very important for the f(t≤T;A)=Π∑QuvPuv(t)× algorithm.In the following part,we will elaborate on the al- i:ti<Tu:tu<ti gorithm design and how we choose the specific parameters. Ⅱeooe i pes(rdrj (20) From the probabilistic perspective,we seek to formulate the likelihood of inactive nodes becoming activated in every p:te<ti round of rumor blocking.Correspondingly,the likelihood Based on the activation likelihood function,we then function is given by design the blocking algorithms.First,we choose to select and block all K nodes at the same time to.As is shown f6ls(,传-》=Π∑auPa传)与× in Eq.(20),the activation likelihood of an inactive node veV(t与)utu<t) v is related to the hazard rate coming from all previously activated nodes.Therefore,the early activated nodes play Ⅱeoeed Pew(rdrj (23) a significant role in the entire process.Hence,we propose e:te<tj the following greedy algorithm to minimize the influence where V(t;)represents the set of nodes which remain of the rumor within one time stamp after it is detected.We inactive at time stamp tj. assume that there are M time steps:t1,...,tM during the Different from the case of greedy algorithm,the objective whole observation window T,with each time step lasting function of the dynamic blocking algorithm is implemented T/M. in several rounds.In each round,it is similar to equation (22).We assume that there are n rounds of blocking in
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 7 5.2 Problem Solutions Based on the survival analysis, we propose an additive survival model where we assume that the probability of node v getting activated is the weighted summation of the propagation probabilities mentioned in (5) of all the previously activated nodes set {u : tu < t}. Thus, in our context, the hazard rate is given by αv(t|s(t)) = α T v s(t) = X u:tu<t αuvpuv(t), (17) where αv = (αuv), u = 1, 2, . . . , N is a non-negative parameter vector indicating the existence of the edge between node u and v. αuv = 1 if there is an edge between them; and αuv = 0, otherwise. We then define a coefficient matrix A := [αv] ∈ R N×N + to denote the structure of the network in terms of the connection between any pair of nodes in the network. Let A0 be the original network coefficient matrix before any nodes are blocked. By substituting αv(τ |s(τ )) in (16) with (17), then we can have: Fv(t|s(t)) =1 − e − R t tu P u:tu<t αuvpuv(τ)dτ =1 − e − P u:tu<t R t tu αuvpuv(τ)dτ =1 − Y u:tu<t e −αuv R t tu puv(τ)dτ =1 − e − P u:tu<t αuv R t tu puv(τ)dτ (18) Accordingly, we have the likelihood function of the activation of node v, fv(t|s(t)), as following: fv(t|s(t)) =dFv(t|s(t)) dt = X u:tu<t αuvpuv(t) Y %:t%<t e −α%v R t t% p%v(τ)dτ . (19) Given the activation likelihood of a single inactive node v ∈ VN2 , now we consider any number of inactive nodes in a cascade, during the entire observation window T, t ≤T = (t1, . . . , ti , . . . , tN |t0 ≤ ti ≤ t0 + T). We assume that every activation is conditionally independent on activations occurring later given previous activations. Then we can compute the activation likelihood as: f(t ≤T ; A) = Y i:ti<T X u:tu<ti αuvpuv(ti)× Y %:t%<ti e −α%v R ti t% p%v(τ)dτ . (20) Based on the activation likelihood function, we then design the blocking algorithms. First, we choose to select and block all K nodes at the same time t0. As is shown in Eq. (20), the activation likelihood of an inactive node v is related to the hazard rate coming from all previously activated nodes. Therefore, the early activated nodes play a significant role in the entire process. Hence, we propose the following greedy algorithm to minimize the influence of the rumor within one time stamp after it is detected. We assume that there are M time steps: t1, . . . , tM during the whole observation window T, with each time step lasting T /M. 5.2.1 Greedy Algorithm The proposed Greedy algorithm tries to block the rumor as fast as possible to prevent the rumor from further propagation. The working mechanism is as following: At time t0 when we detect the rumor, we immediately select all K nodes in our budget and block them (i.e., remove all the links of it so that it can not communicate with its neighbors). Mathematically, the Greedy algorithm aims to minimize the likelihood of inactive nodes getting activated at t1, i.e., the next time stamp after the rumor is detected. The likelihood of nodes getting activated at time t1 is given by f(t1|s(t0)) = Y v∈VN2 X u:tu≤t0 αuvpuv(t1)× Y %:t%≤t0 e −α%v R t1 t% p%v(τ)dτ . (21) Correspondingly, the objective function is min A f(t1|s(t0)) s. t. αuv ∈ {0, 1}. (22) Then, the greedy algorithm is presented as below: Algorithm 1 Greedy Algorithm Input: Initial Edge matrix A0. Initialization: VB = ∅. for i = 1 to K do u = arg max v∈V [f(t1|s(t0); Ai−1) − f(t1|s(t0); Ai−1\v)] Ai := Ai−1\u, VB = VB ∪ {u}. end for Output: VB. 5.2.2 Dynamic Blocking Algorithm Different from the greedy blocking algorithm, which is a type of static blocking algorithm, we propose a dynamic rumor blocking algorithm aiming to incrementally block the selected nodes instead of blocking them at once. In that case, the blocking strategy is split into several rounds and each round can be regarded as a greedy algorithm. Thus, how to choose the number of rounds is also very important for the algorithm. In the following part, we will elaborate on the algorithm design and how we choose the specific parameters. From the probabilistic perspective, we seek to formulate the likelihood of inactive nodes becoming activated in every round of rumor blocking. Correspondingly, the likelihood function is given by f(tj |s(tj−1)) = Y v∈V (tj ) X u:tu<tj αuvpuv(tj )× Y %:t%<tj e −α%v R tj t% p%v(τ)dτ , (23) where V (tj ) represents the set of nodes which remain inactive at time stamp tj . Different from the case of greedy algorithm, the objective function of the dynamic blocking algorithm is implemented in several rounds. In each round, it is similar to equation (22). We assume that there are n rounds of blocking in
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 8 total,and for round i,i=1,2....,n,we have the objective enumerate all users and pick the one with the largest function as degree,which requires traversing all connections once minf(tls(ti-1)》 A (24) for each of K iterations.Here we have K <<E. s.t. auw∈{0,1} In contrast,for our proposed greedy algorithm,most of the time cost lies in the loop of updating the max- It is noticeable that the proposed greedy algorithm(22)is imum likelihood function f.Consequently,in order to a special case of the dynamic blocking algorithm(24)with calculate f,we have to go through every user and n =1.Accordingly,the dynamic blocking algorithm can be connection at most once,which has the time complexity presented as following: of O(V+E)=O(E).Subsequently,in order to pick out the candidate node u in each iteration,we need to Algorithm 2 Dynamic Blocking Algorithm repeat calculating f for O(V)times.Combining these Input:Initial Edge matrix Ao factors,we can conclude that the total time complexity Initialization:VB(t)=0. for our proposed greedy algorithm is O(KEV).Sim- for j=1 ton do ilarly,for the dynamic blocking algorithm,which can for i=1 to kj do be viewed as an online algorithm,we decompose the △f=f(tls(t-1);A-1)-f(ts(t与-1A-1v) selection of candidate nodes into n epochs and in the u=arg ma{△f}, j-th (j=1,2,...n)epoch,we select kj nodes to block, Ai:=Ai-1\u, whereK.In our algorithm,for simplicity, VB(tj)=VB(tj)Ufu). we choose;=2(j)*K.Then the time complexity in end for each epoch can be calculated as O(EV).Finally by end for combining the time complexity of every epoch together, Output:VB(t). we obtain that the entire algorithm runs in O(KEV) time. The dynamic blocking algorithm runs as follows:at Additionally,based on the above analysis,we conclude that the very first stage of blocking,we select a number ki the main difference between the time complexity of our nodes to block based on the Edge matrix and previously proposed algorithm and the baseline greedy algorithm lies infected nodes;in the next round,we move forward with in the value of V.In order to achieve better performance the rumor diffusion,and then use the updated status to at relatively low cost,it is better to implement the algorithm block additional k2 nodes.The blocking process continues in networks with smaller V.In reality,when a rumor is at each following instants until the budget runs out at a moment tn which can be expressed as=K. generated,we expect to locate it in a short time and conduct the blocking algorithms in a local community that contains In real implementation,we decrease kj as time goes by, all the infected nodes.In that case,the value of V can be and a practical example iskj 2(-)*K.Instead of lowered to the largest extent by limiting the target network blocking K candidates at the moment of detection,as pre- to a local community instead of the entire social network. vious static blocking strategies do,this dynamic approach is carried out in a progressive way.The design philosophy is to take advantage of instantaneous information all along 7 EXPERIMENTS the dissemination,since this the activation likelihood of a Dataset:We use three datasets to verify the effectiveness of given moment is a variable which depends on the temporal our proposed algorithms.They are extracted from the real Edge matrix and previous status.Rather than sparing all world large scale social networks such as Facebook,Twitter the efforts at once,we apply consequent force to block and Sina Weibo.Details of the datasets description are listed the diffusion of rumors.In this way,the global efficiency in Table 1. outweighs the previous static decisions. Notations:Let BlkPer denote the percentage of blocked nodes in all the nodes in the social network.In our simula- 6 TIME COMPLEXITY ANALYSIS tion,we set BlkPer to three values as 1%,2%and 10%.To be In this section,based on the above algorithms we propose, noticed,in realistic large-scale social networks with millions we present the time complexity analysis of different algo- or even billions of users,1%may account for up to millions rithms. of users and blocking them is not realistic.However,in Proposition 1.The time complexity of the classic greedy real world rumor blocking,it is impossible to accomplish algorithm,the proposed greedy algorithm and the any blocking mechanism in a global network scale.It is proposed dynamic blocking algorithm is O(KE), more reasonable to first locate the rumour source and divide O(KEV)and O(KEV)respectively,where K<< the entire network into communities according to certain E. attributes of the rumour source (e.g.the geographical in- formation,related friends,etc),and then conduct blocking Proof 1.According to the algorithms we give in the original strategies within a target community with the highest risks. version of our paper,the analysis of their time complex- Then,the scale of the target network would be much smaller ity is given as follows.Assume that we perform our than the entire network and thus our parameter settings blocking algorithms in a social network graph G(V,E) would have more realistic sense. of V users and E connections.The baseline of classic In all figures,the vertical dashed line is plotted to denote greedy algorithm has time complexity of O(KE)as we the time when the rumor is detected and our blocking
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 8 total, and for round i, i = 1, 2, . . . , n, we have the objective function as min A f(ti |s(ti−1)) s. t. αuv ∈ {0, 1}. (24) It is noticeable that the proposed greedy algorithm (22) is a special case of the dynamic blocking algorithm (24) with n = 1. Accordingly, the dynamic blocking algorithm can be presented as following: Algorithm 2 Dynamic Blocking Algorithm Input: Initial Edge matrix A0. Initialization: VB(t) = ∅. for j = 1 to n do for i = 1 to kj do ∆f = f(tj |s(tj−1); Ai−1) − f(tj |s(tj−1); Ai−1\v), u = arg max v∈V {∆f }, Ai := Ai−1\u, VB(tj ) = VB(tj ) ∪ {u}. end for end for Output: VB(t). The dynamic blocking algorithm runs as follows: at the very first stage of blocking, we select a number k1 nodes to block based on the Edge matrix and previously infected nodes; in the next round, we move forward with the rumor diffusion, and then use the updated status to block additional k2 nodes. The blocking process continues at each following instants until the budget runs out at a moment tn, which can be expressed as Pn j=1 kj = K. In real implementation, we decrease kj as time goes by, and a practical example is kj = 2(−j) ∗ K. Instead of blocking K candidates at the moment of detection, as previous static blocking strategies do, this dynamic approach is carried out in a progressive way. The design philosophy is to take advantage of instantaneous information all along the dissemination, since this the activation likelihood of a given moment is a variable which depends on the temporal Edge matrix and previous status. Rather than sparing all the efforts at once, we apply consequent force to block the diffusion of rumors. In this way, the global efficiency outweighs the previous static decisions. 6 TIME COMPLEXITY ANALYSIS In this section, based on the above algorithms we propose, we present the time complexity analysis of different algorithms. Proposition 1. The time complexity of the classic greedy algorithm, the proposed greedy algorithm and the proposed dynamic blocking algorithm is O(K|E|), O(K|E||V |) and O(K|E||V |) respectively, where K << |E|. Proof 1. According to the algorithms we give in the original version of our paper, the analysis of their time complexity is given as follows. Assume that we perform our blocking algorithms in a social network graph G(V, E) of |V | users and |E| connections. The baseline of classic greedy algorithm has time complexity of O(K|E|) as we enumerate all users and pick the one with the largest degree, which requires traversing all connections once for each of K iterations. Here we have K << |E|. In contrast, for our proposed greedy algorithm, most of the time cost lies in the loop of updating the maximum likelihood function f. Consequently, in order to calculate f, we have to go through every user and connection at most once, which has the time complexity of O(|V | + |E|) = O(|E|). Subsequently, in order to pick out the candidate node u in each iteration, we need to repeat calculating f for O(|V |) times. Combining these factors, we can conclude that the total time complexity for our proposed greedy algorithm is O(K|E||V |). Similarly, for the dynamic blocking algorithm, which can be viewed as an online algorithm, we decompose the selection of candidate nodes into n epochs and in the j-th (j = 1, 2, ...n) epoch, we select kj nodes to block, where Pn j=1 kj = K. In our algorithm, for simplicity, we choose kj = 2(−j) ∗ K. Then the time complexity in each epoch can be calculated as O(kj |E||V |). Finally by combining the time complexity of every epoch together, we obtain that the entire algorithm runs in O(K|E||V |) time. Additionally, based on the above analysis, we conclude that the main difference between the time complexity of our proposed algorithm and the baseline greedy algorithm lies in the value of |V |. In order to achieve better performance at relatively low cost, it is better to implement the algorithm in networks with smaller |V |. In reality, when a rumor is generated, we expect to locate it in a short time and conduct the blocking algorithms in a local community that contains all the infected nodes. In that case, the value of |V | can be lowered to the largest extent by limiting the target network to a local community instead of the entire social network. 7 EXPERIMENTS Dataset: We use three datasets to verify the effectiveness of our proposed algorithms. They are extracted from the real world large scale social networks such as Facebook, Twitter and Sina Weibo. Details of the datasets description are listed in Table 1. Notations: Let BlkPer denote the percentage of blocked nodes in all the nodes in the social network. In our simulation, we set BlkPer to three values as 1%, 2% and 10%. To be noticed, in realistic large-scale social networks with millions or even billions of users, 1% may account for up to millions of users and blocking them is not realistic. However, in real world rumor blocking, it is impossible to accomplish any blocking mechanism in a global network scale. It is more reasonable to first locate the rumour source and divide the entire network into communities according to certain attributes of the rumour source (e.g. the geographical information, related friends, etc), and then conduct blocking strategies within a target community with the highest risks. Then, the scale of the target network would be much smaller than the entire network and thus our parameter settings would have more realistic sense. In all figures, the vertical dashed line is plotted to denote the time when the rumor is detected and our blocking
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING TABLE 1 Data Sets Description Data Sets Facebook Twitter Sina weibo Number of nodes 4036 81306 1364548 Number of edges 88234 1768149 31261651 Average degree 21.85 21.74 22.91 Number of connected components 36 278 4631 Average clustering coefficient 0.5731 0.5653 0.5974 strategies start working.All the parameters are selected up to 10%,the infection ratios are lowered by approximately based on empirical results that approximate the realistic 10%,20%and 30%for the classic greedy,proposed greedy scenario.In the experiment,three algorithms are presented and dynamic blocking algorithm respectively. for comparison which are listed as follows: One phenomenon needs to be noticed that at the very Classic Greedy Algorithm:Greedy algorithm based beginning point of blocking strategies,the proposed maxi- on descendant order of nodes degree and is used as mum likelihood greedy algorithm (i.e.the blue curve)per- forms slightly better than the dynamic blocking algorithm the baseline algorithm. (i.e.the red curve).Then after a certain amount of iterations Proposed Greedy Algorithm:the order is deter- mined by the maximum likelihood function.By the red curve surpasses the blue curve and stays better than blocking a node,we can generate a new propagation it ever since.The underlying reason is that in the dynamic blocking algorithm,the dynamic property is revealed from matrix and reach a new maximum survival likeli- hood value. the early stage of rumor blocking.In detail,during the Dynamic Blocking Algorithm:This algorithm ad- initial period of rumor blocking(approximately from 15th justs to each propagation status,and gradually in- to 25th iteration),the blue curve lies under the red one, cludes new targeted nodes as long as the cost is which indicates the slower propagation rate for the static within the scope of tolerable user experience. schema.This can be explained by the fact that the dynamic algorithm does block fewer nodes than the static greedy In Figure 4,we present the simulation results on the algorithm at the initial stage of first several iterations.How- Facebook,Twitter and Sina Weibo datasets respectively. ever,after a certain amount of time (about 25 iterations), Specifically,for each algorithm in each dataset,we repeat the dynamic blocking algorithm dominates the proposed the propagation process for 1000 times and take the average greedy algorithm because the dynamic strategy considers value as the general feature.The black curve stands for clas- the dynamic variation of the topology of the social network sic greedy algorithm,blue one for our proposed maximum in every iteration and constantly introduce new seeds for likelihood greedy algorithm,which blocks all candidates at blocking.In that case,the dynamic blocking algorithm can once and use a metric different from traditional approach, be regarded as an iterative greedy algorithm. and finally red one for the dynamic blocking algorithm.Ob- In the Twitter and Sina Weibo datasets simulation results viously,from all the figures,we can see that for all the social shown in Figure 4,we can see the similar results to those in network datasets,the rumor infection ratio is decreased to the Facebook dataset.The slight difference between them different degrees after the introduction of rumor blockage may be caused by the different topologies of the social strategies.As is shown in the results,according to the final networks.From the dataset description in Table 1,we know infection ratio,the proposed dynamic blocking algorithm that the Sina Weibo network dataset has much more nodes performs the best of all the blocking strategies,since the than the Facebook and Twitter datasets.There are also infection ratio (i.e.,the number of nodes infected by the some differences in average degree,number of connected rumor)is minimized at the end of propagation under this components and average clustering coefficient,which could schema. influence the rumor propagation process and thus the final From all three datasets,we can see that with the blocking infection ratio in the network.For instance,for Sina Weibo percentage of all the nodes increasing from 1%to 10%,the dataset with slightly higher average degree,the network has performances of different blocking strategies vary distinctly. a larger density than the other two.In that case,it is easier In general,for each algorithm,we can see the trend that for rumor to spread through the network and accelerate the the infection ratio tends to be lower with the percentage rumor contagion process.This analysis can be verified by of nodes being blocked becoming higher.However,for the normal rumor propagation curve(i.e.the green curve) different algorithms,the degrees of that trend are different. especially at the very initial stage of rumor propagation. Specifically,the dynamic blocking algorithm has relatively We can see the slope of the green curve in the Sina Weibo the sharpest trend compared to the classic and proposed dataset is slightly higher than the other two,which indicates greedy algorithms.In the Facebook dataset,when the block- the higher average degree and connected components lead ing percentage rises from 1%to 2%,the infection ratio to faster rumor contagion.After the beginning phase,when barely changes for the classic greedy algorithm.In contrast, the infection ratio reaches a certain threshold,the difference the proposed greedy algorithm obtains approximately a between different network topologies becomes minor.On 10%improvement in the infection ratio.For the dynamic the other hand,when it comes to rumor blocking,the higher blocking algorithm,the blocking performance is even better. average degree and connected components enable more In addition,when the blocking percentage becomes higher effective blockage due to its higher density
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 9 TABLE 1 Data Sets Description Data Sets Facebook Twitter Sina Weibo Number of nodes 4036 81306 1364548 Number of edges 88234 1768149 31261651 Average degree 21.85 21.74 22.91 Number of connected components 36 278 4631 Average clustering coefficient 0.5731 0.5653 0.5974 strategies start working. All the parameters are selected based on empirical results that approximate the realistic scenario. In the experiment, three algorithms are presented for comparison which are listed as follows: • Classic Greedy Algorithm: Greedy algorithm based on descendant order of nodes degree and is used as the baseline algorithm. • Proposed Greedy Algorithm: the order is determined by the maximum likelihood function. By blocking a node, we can generate a new propagation matrix and reach a new maximum survival likelihood value. • Dynamic Blocking Algorithm: This algorithm adjusts to each propagation status, and gradually includes new targeted nodes as long as the cost is within the scope of tolerable user experience. In Figure 4, we present the simulation results on the Facebook, Twitter and Sina Weibo datasets respectively. Specifically, for each algorithm in each dataset, we repeat the propagation process for 1000 times and take the average value as the general feature. The black curve stands for classic greedy algorithm, blue one for our proposed maximum likelihood greedy algorithm, which blocks all candidates at once and use a metric different from traditional approach, and finally red one for the dynamic blocking algorithm. Obviously, from all the figures, we can see that for all the social network datasets, the rumor infection ratio is decreased to different degrees after the introduction of rumor blockage strategies. As is shown in the results, according to the final infection ratio, the proposed dynamic blocking algorithm performs the best of all the blocking strategies, since the infection ratio (i.e., the number of nodes infected by the rumor) is minimized at the end of propagation under this schema. From all three datasets, we can see that with the blocking percentage of all the nodes increasing from 1% to 10%, the performances of different blocking strategies vary distinctly. In general, for each algorithm, we can see the trend that the infection ratio tends to be lower with the percentage of nodes being blocked becoming higher. However, for different algorithms, the degrees of that trend are different. Specifically, the dynamic blocking algorithm has relatively the sharpest trend compared to the classic and proposed greedy algorithms. In the Facebook dataset, when the blocking percentage rises from 1% to 2%, the infection ratio barely changes for the classic greedy algorithm. In contrast, the proposed greedy algorithm obtains approximately a 10% improvement in the infection ratio. For the dynamic blocking algorithm, the blocking performance is even better. In addition, when the blocking percentage becomes higher up to 10%, the infection ratios are lowered by approximately 10%, 20% and 30% for the classic greedy, proposed greedy and dynamic blocking algorithm respectively. One phenomenon needs to be noticed that at the very beginning point of blocking strategies, the proposed maximum likelihood greedy algorithm (i.e. the blue curve) performs slightly better than the dynamic blocking algorithm (i.e. the red curve). Then after a certain amount of iterations, the red curve surpasses the blue curve and stays better than it ever since. The underlying reason is that in the dynamic blocking algorithm, the dynamic property is revealed from the early stage of rumor blocking. In detail, during the initial period of rumor blocking (approximately from 15th to 25th iteration), the blue curve lies under the red one, which indicates the slower propagation rate for the static schema. This can be explained by the fact that the dynamic algorithm does block fewer nodes than the static greedy algorithm at the initial stage of first several iterations. However, after a certain amount of time (about 25 iterations), the dynamic blocking algorithm dominates the proposed greedy algorithm because the dynamic strategy considers the dynamic variation of the topology of the social network in every iteration and constantly introduce new seeds for blocking. In that case, the dynamic blocking algorithm can be regarded as an iterative greedy algorithm. In the Twitter and Sina Weibo datasets simulation results shown in Figure 4, we can see the similar results to those in the Facebook dataset. The slight difference between them may be caused by the different topologies of the social networks. From the dataset description in Table 1, we know that the Sina Weibo network dataset has much more nodes than the Facebook and Twitter datasets. There are also some differences in average degree, number of connected components and average clustering coefficient, which could influence the rumor propagation process and thus the final infection ratio in the network. For instance, for Sina Weibo dataset with slightly higher average degree, the network has a larger density than the other two. In that case, it is easier for rumor to spread through the network and accelerate the rumor contagion process. This analysis can be verified by the normal rumor propagation curve (i.e. the green curve) especially at the very initial stage of rumor propagation. We can see the slope of the green curve in the Sina Weibo dataset is slightly higher than the other two, which indicates the higher average degree and connected components lead to faster rumor contagion. After the beginning phase, when the infection ratio reaches a certain threshold, the difference between different network topologies becomes minor. On the other hand, when it comes to rumor blocking, the higher average degree and connected components enable more effective blockage due to its higher density
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING .2 料料 6 .6 Classic Greedy Classie Greedy -D小鱼i长Sehc组a 100 60 120 20 ”” 110 (a)Facebook,BlkPer=1% (b)Facebook,BlkPer=2% (c)Facebook,BlkPer=10% 4168888885 0.3 0.8 0.7 Normal Preps2a —Normal Propagation -Classie Greedy -出ssie Greed 3 21 30 ”me 100 12n 20 100 (d)Twitter,BlkPer=1% (e)Twitter,BlkPer=2% (f)Twitter,BlkPer=10% 03 -25 25 日 0.2 a.1 -D 6 10 20 0 100 120 100 114 (g)Sina,BlkPer=1% (h)Sina,BlkPer=2% (i)Sina,BlkPer=10% Fig.4.The experimental results of the rumor infection ratio with propagation iterations under different blocking algorithms in the Facebook ((a),(b),(c)),Twitter((d),(e),(f))and Sina Weibo ((g),(h),(i))dataset respectively.The blocking percentage of all the nodes in the social network is set to 1%,2%and 10%for each dataset. lassie G 0.2 0.14 012 022 1.013 0.2 .2 .01 0202 (a)Facebook (b)Twitter (c)Sina Fig.5.Stationary rumor infection ratio under different blocking algorithms with different blocking ratios on the Facebook,Twitter and Sina Weibo datasets respectively.The blocking ratio ranges from 1%to2%with an interval of 0.1%,which shows the sensitivity of different blocking algorithms. TABLE 2 False Positive Ratio in Nodes Blocking Data Sets Facebook Twitter Sina Weibo Classic Greedy 42% 40% 70% Proposed Greedy 34% 36% 64% Dynamic Schema 14% 18% 34%
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 10 0 20 40 60 80 100 120 0 0.2 0.4 0.6 0.8 1 1.2 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (a) Facebook, BlkPer=1% 0 20 40 60 80 100 120 0 0.2 0.4 0.6 0.8 1 1.2 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (b) Facebook, BlkPer=2% 0 20 40 60 80 100 120 0 0.2 0.4 0.6 0.8 1 1.2 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (c) Facebook, BlkPer=10% 0 20 40 60 80 100 120 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (d) Twitter, BlkPer=1% 0 20 40 60 80 100 120 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (e) Twitter, BlkPer=2% 0 20 40 60 80 100 120 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (f) Twitter, BlkPer=10% 0 20 40 60 80 100 120 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (g) Sina, BlkPer=1% 0 20 40 60 80 100 120 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (h) Sina, BlkPer=2% 0 20 40 60 80 100 120 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 Time (iterations) Infection Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (i) Sina, BlkPer=10% Fig. 4. The experimental results of the rumor infection ratio with propagation iterations under different blocking algorithms in the Facebook ((a),(b),(c)), Twitter ((d),(e),(f)) and Sina Weibo ((g),(h),(i)) dataset respectively. The blocking percentage of all the nodes in the social network is set to 1%, 2% and 10% for each dataset. 0.01 0.012 0.014 0.016 0.018 0.02 0.022 0 0.05 0.1 0.15 0.2 0.25 Blocked Nodes Ratio Rumor Affected Nodes Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (a) Facebook 0.01 0.012 0.014 0.016 0.018 0.02 0.022 0 0.05 0.1 0.15 0.2 Blocked Nodes Ratio Rumor Affected Nodes Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (b) Twitter 0.01 0.012 0.014 0.016 0.018 0.02 0.022 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 Blocked Nodes Ratio Rumor Affected Nodes Ratio Normal Propagation Classic Greedy Proposed Greedy Dynamic Schema (c) Sina Fig. 5. Stationary rumor infection ratio under different blocking algorithms with different blocking ratios on the Facebook, Twitter and Sina Weibo datasets respectively. The blocking ratio ranges from 1% to 2% with an interval of 0.1%, which shows the sensitivity of different blocking algorithms. TABLE 2 False Positive Ratio in Nodes Blocking Data Sets Facebook Twitter Sina Weibo Classic Greedy 42% 40% 70% Proposed Greedy 34% 36% 64% Dynamic Schema 14% 18% 34%