Ns9298 ARTICLE N PRESS No of Pages 14, Model 3G 12 November 2011 Information Sciences xxx(2011)xXx Contents lists available at SciVerse Science Direct SCIENCES Information sciences ELSEVIER journalhomepage:www.elsevier.com/locate/ins 2 The dynamic competitive recommendation algorithm in social 3 network services 4 Seok Jong Yu 5 Dept of Computer Science, Sookmyung Women's University, Seoul 140-742, Republic of Korea ARTICLE IN FO A B STRACT 9 Article histor 1 ncreases.it is facing with a challenge to how to help people find right people and informa- 23 15 September 2011 12 ccepted 31 October 2011 on conveniently. For this purpose, current social network services are adopting personal- 24 Available online xxxx ized recommender systems. Existing recommendation algorithms largely depend on one of 25 content-based algorithm, collaborative filtering, or influential ranking analysis. However, 26 14 Keywords e user changes, and it is due to the diversities of personal characteristics 28 social graph size, the number of followers, or sparsity of profile content To 29 mitation and to provide consistent and stable recommendation in social 30 networks, this study proposes the dynamic competitive recommendation algorithm based 31 on the competition of multiple component algorithms. This study shows that it outper 32 forms previous approaches through performance evaluation on actual Twitter dataset. e 2011 Published by Elsevier Inc. 34 37 1 Introduction For recent years, there have been significant progress ial network services(SNS) qualitatively as well as quanti- 39 tatively. As of November 2010, it was reported that more than 175 million people were using Twitter and nearly one billion 40 s were being created a week. In addition, it is said that 240 million photos are being uploaded a day on Facebook and 12,000 photos were browsed a second 42 successfully exploiting SNS-based personalized recommendation services for marketing products [1]. Unlike Facebook and 43 MySpace, Twitter is a directed social network that allows people to follow others without prior reciprocal agreement. This 44 unrestricted following mechanism makes the Twitter network grow fast and enables the large scale information propagation. 45 For this reason, Twitter is currently evaluated as a new type of media other than TV or newspaper[8, 16, 28. However, as the 46 scale of Twitter increases enormously, it is facing with a significant problem with finding right people or information to 47 subscribe. When a new user signs up in Twitter, the first thing to do is to search existing twitter friends by email accounts 48 However, this function has a limitation that can find only the people who are already known to the user. 49 Alternatively, recommender systems have been adopted for the purpose of supplementing the capability of traditional 50 search systems. Personalized recommendation is particularly effective for application domains such as movie, music, book, 51 and app stores that the number of items keeps increasing over time [9, 16, 20. The performance of recommender system is Q1 E-mail address: yusjong@gmail www.amazon.com www.myspace.coml 0020-0255/S-see front matter o 2011 Published by Elsevier Inc. doi:10.1016ins2011.10.020 Please cite this article in press as: s ]. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.do:10.1016ins2011020
1 2 The dynamic competitive recommendation algorithm in social 3 network services 4 Seok Jong Yu ⇑ 5 Dept. of Computer Science, Sookmyung Women’s University, Seoul 140-742, Republic of Korea 6 8 article info 9 Article history: 10 Received 9 March 2011 11 Received in revised form 15 September 2011 12 Accepted 31 October 2011 13 Available online xxxx 14 Keywords: 15 Social network service 16 Recommender system 17 Recommendation algorithm 18 Twitter 19 PageRank 20 2 1 abstract As the number of Twitter users exceeds 175 million and the scale of social network 22 increases, it is facing with a challenge to how to help people find right people and informa- 23 tion conveniently. For this purpose, current social network services are adopting personal- 24 ized recommender systems. Existing recommendation algorithms largely depend on one of 25 content-based algorithm, collaborative filtering, or influential ranking analysis. However, 26 these algorithms tend to suffer the performance fluctuation phenomenon in common 27 whenever an active user changes, and it is due to the diversities of personal characteristics 28 such as the local social graph size, the number of followers, or sparsity of profile content. To 29 overcome this limitation and to provide consistent and stable recommendation in social 30 networks, this study proposes the dynamic competitive recommendation algorithm based 31 on the competition of multiple component algorithms. This study shows that it outper- 32 forms previous approaches through performance evaluation on actual Twitter dataset. 33 2011 Published by Elsevier Inc. 34 35 36 37 1. Introduction 38 For recent years, there have been significant progresses in social network services (SNS) qualitatively as well as quanti- 39 tatively. As of November 2010, it was reported that more than 175 million people were using Twitter and nearly one billion 40 tweets were being created a week. In addition, it is said that 240 million photos are being uploaded a day on Facebook and 41 12,000 photos were being browsed a second on Flickr. Furthermore, major e-commerce sites like Amazon and Netflix1 are 42 successfully exploiting SNS-based personalized recommendation services for marketing products [1]. Unlike Facebook and 43 MySpace,2 Twitter is a directed social network that allows people to follow others without prior reciprocal agreement. This 44 unrestricted following mechanism makes the Twitter network grow fast and enables the large scale information propagation. 45 For this reason, Twitter is currently evaluated as a new type of media other than TV or newspaper [8,16,28]. However, as the 46 scale of Twitter increases enormously, it is facing with a significant problem with finding right people or information to 47 subscribe. When a new user signs up in Twitter, the first thing to do is to search existing Twitter friends by email accounts. 48 However, this function has a limitation that can find only the people who are already known to the user. 49 Alternatively, recommender systems have been adopted for the purpose of supplementing the capability of traditional 50 search systems. Personalized recommendation is particularly effective for application domains such as movie, music, book, 51 and app stores that the number of items keeps increasing over time [9,16,20]. The performance of recommender system is 0020-0255/$ - see front matter 2011 Published by Elsevier Inc. doi:10.1016/j.ins.2011.10.020 Q2 ⇑ Tel.: +82 2 710 9831; fax: +82 2 710 9296. Q1 E-mail address: yusjong@gmail.com 1 www.amazon.com, www.netflix.com. 2 www.myspace.com. Information Sciences xxx (2011) xxx–xxx Contents lists available at SciVerse ScienceDirect Information Sciences journal homepage: www.elsevier.com/locate/ins INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
NS9298 ARTICLE N PRESS No of Pages 14, Model 3G ber 2011 S. Yu/Information Sciences xxx(2011)xxx-XXx 52 influenced by its recommendation algorithm, and snS-supported recommendation algorithms can be categorized into con- 53 tent-based algorithm [13, 21], collaborative filtering [15, 26 and influential ranking m[16,28 Content-based algorithm is based on homophily idea, which means a tendency tha ct between similar people oc- 55 curs at a higher rate than among dissimilar people [21 This algorithm makes use of the demographic similarities between 56 people using user profile data which usually contain personal information such as age, sex, geographical location, and inter- 57 est [13]. For this structural reason, the performance of this algorithm is proportional to the fidelity of user profile informa- 58 tion. However, the profile contents of SNS users are reported to be very sparse. 59 Collaborative filtering 15, 26 is an algorithm that has been actively applied to commercial domains. While content-based 60 algorithm needs sufficient profile data for its calculation, collaborative filtering predicts virtual preferences of inexperienced 61 items using the past rating history created by a group of similar people. Despite its remarkable performance, collaborative 62 filtering is vulnerable to the following problems: cold-start, scalability, and rating matrix sparsity [15, 26 In addition, it is 63 difficult to apply collaborative filtering directly to sNS which does not provide rating data Another available strategy for SNS recommendation is influential raking algorithm. Basically, this algorithm aims to 65 search popular or influential people over a social network, and is activated by various criteria: the number of followers. 66 Friends of a Friend( FOAF)[5 Tunk Rank [29 and Page Rank [23 FOAF is a graph-based algorithm being currently used by 67 Twitter and Facebook, and it investigates the friends of an active user as recommendation candidates. In general, the number 69 algorithm being used by Google search engine, which has a numerical weighting mechanism to each document hyperlinked 70inthewww.TherehavebeenstudiesonhowtoapplyPageranktoTwitternetworkwiththestructuralsimilaritybetween 71 Twitter and the www[16]. Hybrid approaches have been designed in various fields to compensate for the weakness of above single algorithms. Chen 73 et al. [5] proposed Content-plus-link method which integrates content similarity with social link relation in SoNAR, a SNS of 74 IBM, for precise recommendation. In digital library domain, collaborative filtering has been frequently combined with con- 75 tent-based approach [ 2,3, 14, 15, 24 Porcel et al. [24 present a model of a fuzzy linguistic recommender system to help the 76 university digital library users to access for their research resources. Rodriguez et al. 25 presented a hybrid system which 77 uses a knowledge-based model to provide good cold start recommendations. Liu et al [17 proposes a novel hybrid recom- 78 mendation method that integrates the segmentation-based sequential rule method to consider the sequence of customers 79 purchase behavior over time. As a preliminary study, this work performed an experiment on a Twitter dataset from [22]. which compared the recom- 81 mendation performance of existing algorithms( PageRank, common followers, number of tweets, number of followers, and 82 profile matching). From this result, it was confirmed that the performance of each algorithm was highly dependent on the 83 active user, with showing high deviations in recommendation hit ratio whenever the active user changed. In this paper, this 84 phenomenon is defined as performance fluctuation. This phenomenon is analyzed as being caused by the different character 85 istics of the user in evaluation criteria( the number of followers or the sparsity of profile content), and each algorit 86 sitively responds only to its specific feature of them. Therefore it is possible to assume that there is no single 87 recommendation algorithm which is able to consider and satisfy the diversities of all users. as a solution to the lin 88 of single approach this study suggests the dynamic competitive recommendation algorithm which aims to provide ce limp 89 and stable performance through the competition of multiple component algorithms. The rest of the paper is organized as follows. Section 2 will review existing recommendation algorithms, and Section 3 rill describe the design and algorithms of the dynamic competitive recommendation. Section 4 will present the result of 92 performance evaluation, and the paper will be concluded in Section 5 Recommendation algorithms o Recommender system implicitly suggests new preferable items, information or people from the analysis of profile 95 description data, or the past rating history of neighbors who have a similar preference [9]. Personalized recommendation 96 is essential in such environments that the number of items increases continuously as in e-commerce sites, and it helps to 97 save time and effort for searching items [16. In this section, we will review SNS-supported recommendation algorithm 98 which largely rely on demographic information, collaborative filtering, or influential ranking analysis. 99 2.1. Content-based algorithm Originally, content-based algorithm preferable items based on the correlation between item description and 101 demographic contents in the user profi. keywords describing age, sex, location, email, interest, etc. )[6, 19, 26]. In 102 SNS content-based mendation co homophily idea, which is a tendency that similar people contact more fre- 103 quently than dissimilar people [21 Weng et al. [29] have reported that two users who follow reciprocally share topical 104 interests by mining their 50,000 links. a general disadvantage of content-based algorithm is the portfolio effect [ 9] is recommended items which he already knows or which are too similar to those he already knows. In addition, this method 3frIendofafRiendprojecthttp://www.foaf-project.org/. Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.o101016ins2011020
52 influenced by its recommendation algorithm, and SNS-supported recommendation algorithms can be categorized into con- 53 tent-based algorithm [13,21], collaborative filtering [15,26], and influential ranking algorithm [16,28]. 54 Content-based algorithm is based on homophily idea, which means a tendency that a contact between similar people oc- 55 curs at a higher rate than among dissimilar people [21]. This algorithm makes use of the demographic similarities between 56 people using user profile data which usually contain personal information such as age, sex, geographical location, and inter- 57 est [13]. For this structural reason, the performance of this algorithm is proportional to the fidelity of user profile informa- 58 tion. However, the profile contents of SNS users are reported to be very sparse. 59 Collaborative filtering [15,26] is an algorithm that has been actively applied to commercial domains. While content-based 60 algorithm needs sufficient profile data for its calculation, collaborative filtering predicts virtual preferences of inexperienced 61 items using the past rating history created by a group of similar people. Despite its remarkable performance, collaborative 62 filtering is vulnerable to the following problems: cold-start, scalability, and rating matrix sparsity [15,26]. In addition, it is 63 difficult to apply collaborative filtering directly to SNS which does not provide rating data. 64 Another available strategy for SNS recommendation is influential raking algorithm. Basically, this algorithm aims to 65 search popular or influential people over a social network, and is activated by various criteria: the number of followers, 66 Friends of a Friend (FOAF)3 [5], TunkRank [29], and PageRank [23]. FOAF is a graph-based algorithm being currently used by 67 Twitter and Facebook, and it investigates the friends of an active user as recommendation candidates. In general, the number 68 of degree in FOAF graph is limited to two or three for computational reason [8,21]. PageRank [23] is a famous graph analysis 69 algorithm being used by Google search engine, which has a numerical weighting mechanism to each document hyperlinked 70 in the WWW. There have been studies on how to apply PageRank to Twitter network with the structural similarity between 71 Twitter and the WWW [16]. 72 Hybrid approaches have been designed in various fields to compensate for the weakness of above single algorithms. Chen 73 et al. [5] proposed Content-plus-link method which integrates content similarity with social link relation in SONAR, a SNS of 74 IBM, for precise recommendation. In digital library domain, collaborative filtering has been frequently combined with con- 75 tent-based approach [2,3,14,15,24]. Porcel et al. [24] present a model of a fuzzy linguistic recommender system to help the 76 university digital library users to access for their research resources. Rodriguez et al. [25] presented a hybrid system which 77 uses a knowledge-based model to provide good cold start recommendations. Liu et al. [17] proposes a novel hybrid recom- 78 mendation method that integrates the segmentation-based sequential rule method to consider the sequence of customers’ 79 purchase behavior over time. 80 As a preliminary study, this work performed an experiment on a Twitter dataset from [22], which compared the recom- 81 mendation performance of existing algorithms (PageRank, common followers, number of tweets, number of followers, and 82 profile matching). From this result, it was confirmed that the performance of each algorithm was highly dependent on the 83 active user, with showing high deviations in recommendation hit ratio whenever the active user changed. In this paper, this 84 phenomenon is defined as performance fluctuation. This phenomenon is analyzed as being caused by the different character- 85 istics of the user in evaluation criteria (the number of followers or the sparsity of profile content), and each algorithm sen- 86 sitively responds only to its specific feature of them. Therefore, it is possible to assume that there is no single optimal 87 recommendation algorithm which is able to consider and satisfy the diversities of all users. As a solution to the limitation 88 of single approach, this study suggests the dynamic competitive recommendation algorithm, which aims to provide consistent 89 and stable performance through the competition of multiple component algorithms. 90 The rest of the paper is organized as follows. Section 2 will review existing recommendation algorithms, and Section 3 91 will describe the design and algorithms of the dynamic competitive recommendation. Section 4 will present the result of 92 performance evaluation, and the paper will be concluded in Section 5. 93 2. Recommendation algorithms 94 Recommender system implicitly suggests new preferable items, information or people from the analysis of profile 95 description data, or the past rating history of neighbors who have a similar preference [9]. Personalized recommendation 96 is essential in such environments that the number of items increases continuously as in e-commerce sites, and it helps to 97 save time and effort for searching items [16]. In this section, we will review SNS-supported recommendation algorithms, 98 which largely rely on demographic information, collaborative filtering, or influential ranking analysis. 99 2.1. Content-based algorithm 100 Originally, content-based algorithm selects preferable items based on the correlation between item description and 101 demographic contents in the user profile (e.g. keywords describing age, sex, location, email, interest, etc.) [6,19,26]. In 102 SNS, content-based recommendation comes from homophily idea, which is a tendency that similar people contact more fre- 103 quently than dissimilar people [21]. Weng et al. [29] have reported that two users who follow reciprocally share topical 104 interests by mining their 50,000 links. A general disadvantage of content-based algorithm is the portfolio effect [9]: a user 105 is recommended items which he already knows or which are too similar to those he already knows. In addition, this method 3 Friend of a Friend project, http://www.foaf-project.org/. 2 S.J. Yu / Information Sciences xxx (2011) xxx–xxx INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
Ns9298 ARTICLE N PRESS No of Pages 14, Model 3G 12 November 2011 S Yu/Information Sciences xxx(2011)xxx-xo Table 1 m] Use 106 also suffers the sparsity problem of profile contents and computation load due to the large scale of social network [ 30. The 07 former makes it hard to guarantee the recommendation quality and the latter makes it difficult to perform real time 109 2.2. Collaborative filtering 110 Collaborative filtering has been one of successful recommendation algorithms in commercial domains. Unlike other algo- thms, it needs a user-item matrix which records users' past preferences on items for lending new items 9]. In Table 112 1, rui is a rating score by user u on item i n and m are the number of users and items, respectively. Using this table, user-based 13 collaborative filtering performs similarity calculation to search people(neighbors) who showed similar preference, and then it 14 predicts rating scores of new items not experienced by the active user. Collaborative filtering often uses Pearson correlation 15 or cosine-based similarity to get the neighbor set, which can be identified by a threshold or selecting top-N[15. On the other 16 hand, item-based collaborative filtering makes prediction by finding similar items and then takes a weighted combination of 17 their ratings [26]. In contrast to content-based approach, it is able to create cross-genre or serendipitous recommendations 18 because it does not rely on item specific similarities as in content-based algorithm but on user rating-similarities [12] .As 19 drawbacks, collaborative filtering suffers cold-start, scalability, and rating matrix sparsity problems [15, 26]. 120 2.3. Influential ranking algorithm Influential ranking algorithms are also called relation-based, or graph-based algorithms. these algorithms are applicable to 122 graph-based domains as well as SNS. Their primary purpose is to find people who are active and influential on a social net 123 work throug gh the analys ysis of social graph relations [16]. Specifically, these algorithms are classified into the number of fol 124 lowers, TunkRank, Friends of a Friend, and PageRank. Imber of followers is the most intuitive criteria to determine the ranking of candidates, and it counts the number of peo- 126 ple who follow the candidate, or in-degree of a node [4]. In a ranking list, top-k candidates who are followed most are sug- 127 gested as potential friends. The number of followers indicates the degree of influence over the social network because it is 28 proportional to the strength of information propagation. High influential people accelerate information spread through 129 tweet subscription or retweeting function to its follower group. Cha et al. performed a study on how influential people 130 are related to the number of retweeted messages[4,5 TunkRank is another tool for ring the influence of user on Twitter network, which calculates how much attention 32 the followers actually give to their following person [29 The assumptions of this method are as follows: Influence(X) is the pected number of people who will read a tweet that X tweets. If X is a member of Followers(Y), then there is a 1/VertFol- 134 lowing(X)Vert probability that X will read a tweet posted by Y, where Following(X)is the set of people that X follows IfX 135 reads a tweet from Y, there is a constant probability p that x will retweet it. From this model, it is easy to measure someone's 137 influence recursively Influence(X) +px Influence(r) (1) Friends of a Friend(FOAF)indicates people who are followed by one's friend This method are used by Twitter as a function 41 'Suggestions for you', and also implemented by Facebook, Twubble, and Orkut. In semantic web domain, the FOAF is a largest 142 project name and a widely accepted standard vocabulary for representing social networks, and many large social networking 43 websites use it to produce semantic Web profiles for their users. Golbeck and Rothstein [7] presented a study of the intersection 144 of FoaF data found in many online social networks. As an application of FOAF, candidates can be ranked by how many friends of 145 an active user follow the candidate in common, and these people are called Common followers[ 18. The more common followers 46 a candidate has, the higher position it is ranked at Lo and lin [20 proposed a graph-based algorithm based on common 147 followers FriendofaFriendprojecthttp://www.foaf-project.org/. 7www.orkut.com Please cite this article in press as: s ]. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.do:10.1016ins2011020
106 also suffers the sparsity problem of profile contents and computation load due to the large scale of social network [30]. The 107 former makes it hard to guarantee the recommendation quality and the latter makes it difficult to perform real time 108 computation. 109 2.2. Collaborative filtering 110 Collaborative filtering has been one of successful recommendation algorithms in commercial domains. Unlike other algo- 111 rithms, it needs a user-item matrix which records users’ past preferences on items for recommending new items [9]. In Table 112 1, ru,i is a rating score by user u on item i. n and m are the number of users and items, respectively. Using this table, user-based 113 collaborative filtering performs similarity calculation to search people (neighbors) who showed similar preference, and then it 114 predicts rating scores of new items not experienced by the active user. Collaborative filtering often uses Pearson correlation 115 or cosine-based similarity to get the neighbor set, which can be identified by a threshold or selecting top-N [15]. On the other 116 hand, item-based collaborative filtering makes prediction by finding similar items and then takes a weighted combination of 117 their ratings [26]. In contrast to content-based approach, it is able to create cross-genre or serendipitous recommendations 118 because it does not rely on item specific similarities as in content-based algorithm but on user rating-similarities [12]. As 119 drawbacks, collaborative filtering suffers cold-start, scalability, and rating matrix sparsity problems [15,26]. 120 2.3. Influential ranking algorithms 121 Influential ranking algorithms are also called relation-based, or graph-based algorithms. These algorithms are applicable to 122 graph-based domains as well as SNS. Their primary purpose is to find people who are active and influential on a social net- 123 work through the analysis of social graph relations [16]. Specifically, these algorithms are classified into the number of fol- 124 lowers, TunkRank,4 Friends of a Friend,5 and PageRank. 125 Number of followers is the most intuitive criteria to determine the ranking of candidates, and it counts the number of peo- 126 ple who follow the candidate, or in-degree of a node [4]. In a ranking list, top-k candidates who are followed most are sug- 127 gested as potential friends. The number of followers indicates the degree of influence over the social network because it is 128 proportional to the strength of information propagation. High influential people accelerate information spread through 129 tweet subscription or retweeting function to its follower group. Cha et al. performed a study on how influential people 130 are related to the number of retweeted messages [4,5]. 131 TunkRank is another tool for measuring the influence of user on Twitter network, which calculates how much attention 132 the followers actually give to their following person [29]. The assumptions of this method are as follows: Influence(X) is the 133 expected number of people who will read a tweet that X tweets. If X is a member of Followers(Y), then there is a 1/VertFol- 134 lowing (X)Vert probability that X will read a tweet posted by Y, where Following (X) is the set of people that X follows. If X 135 reads a tweet from Y, there is a constant probability p that X will retweet it. From this model, it is easy to measure someone’s 136 influence recursively. 137 InfluenceðXÞ ¼ X Y2FollowersðXÞ 1 þ p InfluenceðYÞ kFollowingðYÞk ð1Þ 139 140 Friends of a Friend (FOAF) indicates people who are followed by one’s friend. This method are used by Twitter as a function 141 ‘Suggestions for you’, and also implemented by Facebook, Twubble,6 and Orkut.7 In semantic web domain, the FOAF is a largest 142 project name and a widely accepted standard vocabulary for representing social networks, and many large social networking 143 websites use it to produce semantic Web profiles for their users. Golbeck and Rothstein [7] presented a study of the intersection 144 of FOAF data found in many online social networks. As an application of FOAF, candidates can be ranked by how many friends of 145 an active user follow the candidate in common, and these people are called Common followers [18]. The more common followers 146 a candidate has, the higher position it is ranked at. Lo and Lin [20] proposed a graph-based algorithm based on common 147 followers. Table 1 User–item rating matrix. Item1 Item2 ... itemm User1 r11 r12 ... r1m User2 r21 r22 ... r2m ... ... ... ... ... Usern rn1 rn2 ... rnm 4 Tunk Rank, http://tunkrank.com. 5 Friend of a Friend project, http://www.foaf-project.org/. 6 www.twubble.com. 7 www.orkut.com. S.J. Yu / Information Sciences xxx (2011) xxx–xxx 3 INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
NS9298 ARTICLE N PRESS No of Pages 14, Model 3G 12 November 2011 S. Yu/Information Sciences xxx(2011) PageRank is a famous link analysis algorithm, used by the google Internet search engine which assigns a numerical ghting to each element of a hyperlinked set of documents, such as the world wide Web 23. The algorithm may be ap- 150 plied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given 151 element E is referred to as the PageRank of e and denoted by PR(E) PR(A) is calculated by the following formula. 1d+4(x+1x+CD+-) 155 where L is number of outbound links, and d is a damping factor which is set as 0.85 Kawk et al. [16 performed a study which compared the recommendation ranking of Twitter users using PageRank, and 157 the number of follower. They argued that Page Rank and the number of follower method generated similar ranking results 158 because they are both link-based algorithms. However, their study did not include the impacts of recommendation perfor 159 mance when PageRank is combined with other criteria. 160 2.4. Hybrid algorithms 161 Content-plus-link [5.8] is a typical hybrid method which enhances profile matching algorithm together with social link 62 information. This method boosts the similarity of a candidate user c and u by 50% if a valid social link from u and c exists. 63 A valid social link is defined as a sequence of three or four users, the first being recipient of the recommendation and the last 164 being the recommended user. The SONAR system [10], an IBM,s SNS, provides recommendation service by aggregating social 165 relationship information from the following data sources within the intranet of IBM: organizational chart, publication data 166 base, patent database, friending system, people tagging system, project wiki, and blogging system. a relationship is estab 167 lished if within that data source two people have interacted with each other, such as Co-authoring a paper or leaving 168 comments on each others' blog. This method is evaluated to be effective for recommending socially known people in SNS [1] 170 2.5. Limitations of current algorithms For a preliminary research, we performed experiments which compared the recommendation hit ratio of current algo- 172 rithms on actual Twitter dataset from [22 Here are the procedures. Firstly, top-ranked one hundred of active people who follow others most have been chosen from the experimental data 174 set. Secondly, an initial candidate pool has been created from a local social network of each chosen person. Thirdly, the fol- lowing algorithms have been sequentially applied to each candidate pool and the hit ratio has been measured. As 176 experimental algorithms, PageRank(PR), common followers(CF), number of tweets(TW), number of followers (FR), profiling 177 matching(PF)were included. Fig. 1 shows part of measurement result on selected users. According to this result, it is easily 178 confirmed that the performance of individual algorithm was being highly fluctuated by the active user. For examples, PF 79 showed the best results for user user3, and user, but not for user, user4, users, user,, users and user. fR was more useful r userz, user4, user,, and users rather than users and user1o Besides, both FR and pf were not good choices for users. In 181 general, most of algorithms showed high deviations from the average during the experiment. These phenomena can be 182 explained as follows: each user usually has different characteristics in evaluation criteria such as the number of followers 183 and tweets, profile contents, and common followers, and each algorithm responds only to its focusing parameters Hit ratio user3 user4 usera user10 Fig. 1. Number of recommendation hits by algorithm. Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.o101016ins2011020
148 PageRank is a famous link analysis algorithm, used by the Google Internet search engine, which assigns a numerical 149 weighting to each element of a hyperlinked set of documents, such as the World Wide Web [23]. The algorithm may be ap- 150 plied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given 151 element E is referred to as the PageRank of E and denoted by PR(E). PR(A) is calculated by the following formula. 152 PRðAÞ ¼ 1 d þ d PRðBÞ LðBÞ þ PRðCÞ LðCÞ þ PRðDÞ LðDÞ þ ð2Þ 154 155 where L is number of outbound links, and d is a damping factor which is set as 0.85. 156 Kawk et al. [16] performed a study which compared the recommendation ranking of Twitter users using PageRank, and 157 the number of follower. They argued that PageRank and the number of follower method generated similar ranking results 158 because they are both link-based algorithms. However, their study did not include the impacts of recommendation perfor- 159 mance when PageRank is combined with other criteria. 160 2.4. Hybrid algorithms 161 Content-plus-link [5,8] is a typical hybrid method which enhances profile matching algorithm together with social link 162 information. This method boosts the similarity of a candidate user c and u by 50% if a valid social link from u and c exists. 163 A valid social link is defined as a sequence of three or four users, the first being recipient of the recommendation and the last 164 being the recommended user. The SONAR system [10], an IBM’s SNS, provides recommendation service by aggregating social 165 relationship information from the following data sources within the intranet of IBM: organizational chart, publication data- 166 base, patent database, friending system, people tagging system, project wiki, and blogging system. A relationship is estab- 167 lished if within that data source two people have interacted with each other, such as co-authoring a paper or leaving 168 comments on each others’ blog. This method is evaluated to be effective for recommending socially known people in SNS 169 [11]. 170 2.5. Limitations of current algorithms 171 For a preliminary research, we performed experiments which compared the recommendation hit ratio of current algo- 172 rithms on actual Twitter dataset from [22]. Here are the procedures. 173 Firstly, top-ranked one hundred of active people who follow others most have been chosen from the experimental data- 174 set. Secondly, an initial candidate pool has been created from a local social network of each chosen person. Thirdly, the fol- 175 lowing algorithms have been sequentially applied to each candidate pool and the hit ratio has been measured. As 176 experimental algorithms, PageRank (PR), common followers (CF), number of tweets (TW), number of followers (FR), profiling 177 matching (PF) were included. Fig. 1 shows part of measurement result on selected users. According to this result, it is easily 178 confirmed that the performance of individual algorithm was being highly fluctuated by the active user. For examples, PF 179 showed the best results for user1, user3, and user10, but not for user2, user4, user5, user7, user8 and user9. FR was more useful 180 for user2, user4, user7, and user8 rather than user5 and user10. Besides, both FR and PF were not good choices for user5. In 181 general, most of algorithms showed high deviations from the average during the experiment. These phenomena can be 182 explained as follows: each user usually has different characteristics in evaluation criteria such as the number of followers 183 and tweets, profile contents, and common followers, and each algorithm responds only to its focusing parameters Fig. 1. Number of recommendation hits by algorithm. 4 S.J. Yu / Information Sciences xxx (2011) xxx–xxx INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
Ns9298 ARTICLE N PRESS No of Pages 14, Model 3G 12 November 2011 S Yu/Information Sciences xxx(2011)xxx-xo 5 184 sensitively. Accordingly, it is concluded that there is no single optimal recommendation algorithm which satisfies the diver 85 sities of users. This can be a limitation of single algorithm approach 186 3. The dynamic competitive recommendation algorithm 87 3. 1. Motivation 188 According to the study [5], while content-based algorithm tends to recommend mostly unknown people, social graph- 189 based methods recommend mostly known people. As shown in the prior study, existing algorithms do not give any consid- ration to difference in users' features. It may be the cause of making it difficult to provide consistent and stable recommen- 91 dation when active user changes. This is why a hybrid method is necessary for SNS recommendation. Therefore, to overcome 192 this limitation, this study suggests the dynamic competitive recommendation(dcr)algorithm which aims to search the best 193 result through a dynamic competition process among multiple algorithms, with reducing the risk of single algorithm 194 approach. 195 3. 2. Design of dCr algorithm Fig 2 describes the procedures of the dCr algorithm; (1)creation of initial candidate pool. (2)application of component 97 algorithms, 3)competition algorithm, (4) integration of candidate list, (5) suggestion of top ranked candidates. Before 198 describing the algorithm, a basic terminology is needed to be defined When a user u follows a user c, c is referred to 'a fol- 199 lowing of u, and u is referred to 'a follower'of c. 200 3. 1. Creation of initial candidate pool 203 First, to create an initial candidate pool, a local social graph of an active user should be traversed along with connected links based on FOaF policy. In Fig 3. assume that I is an active user to be recommended and al and a2 are people followed by I, where a node and an arrow indicate a person and a follow link, respectively In this study the range of initial candidates is restricted to followings of I(within degree 2)and followings of common 205 followers(within degree 3), where candidates should not be followed by I currently. The algorithm limits the graph degree 06 for candidates because it directly affects the computational cost. Therefore, it can be a future study topic to handle the sca 207 lability issue of recommendation algorithm. An initial candidate pool is composed of two subsets, candidate set 1(CP1)and 23 2(CP2)as fo 211 CP=CP1∩CP recommend Active user of active user nitial Candidate Pool Competition Algorithm CLinal Fig. 2. DCR algorithm. Please cite this article in press as: s ]. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.do:10.1016ins2011020
184 sensitively. Accordingly, it is concluded that there is no single optimal recommendation algorithm which satisfies the diver- 185 sities of users. This can be a limitation of single algorithm approach. 186 3. The dynamic competitive recommendation algorithm 187 3.1. Motivation 188 According to the study [5], while content-based algorithm tends to recommend mostly unknown people, social graph- 189 based methods recommend mostly known people. As shown in the prior study, existing algorithms do not give any consid- 190 eration to difference in users’ features. It may be the cause of making it difficult to provide consistent and stable recommen- 191 dation when active user changes. This is why a hybrid method is necessary for SNS recommendation. Therefore, to overcome 192 this limitation, this study suggests the dynamic competitive recommendation (DCR) algorithm which aims to search the best 193 result through a dynamic competition process among multiple algorithms, with reducing the risk of single algorithm 194 approach. 195 3.2. Design of DCR algorithm 196 Fig. 2 describes the procedures of the DCR algorithm; (1) creation of initial candidate pool, (2) application of component 197 algorithms, (3) competition algorithm, (4) integration of candidate list, (5) suggestion of top ranked candidates. Before 198 describing the algorithm, a basic terminology is needed to be defined. When a user u follows a user c, c is referred to ‘a fol- 199 lowing’ of u, and u is referred to ‘a follower’ of c. 200 3.2.1. Creation of initial candidate pool 201 First, to create an initial candidate pool, a local social graph of an active user should be traversed along with connected 202 follow links based on FOAF policy. In Fig. 3, assume that I is an active user to be recommended, and A1 and A2 are people 203 followed by I, where a node and an arrow indicate a person and a follow link, respectively. 204 In this study, the range of initial candidates is restricted to followings of I (within degree 2) and followings of common 205 followers (within degree 3), where candidates should not be followed by I currently. The algorithm limits the graph degree 206 for candidates because it directly affects the computational cost. Therefore, it can be a future study topic to handle the sca- 207 lability issue of recommendation algorithm. An initial candidate pool is composed of two subsets, candidate set 1 (CP1) and 208 set 2 (CP2) as follows. 209 211 CP ¼ CP1 \ CP2 ð3Þ Fig. 2. DCR algorithm. S.J. Yu / Information Sciences xxx (2011) xxx–xxx 5 INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
NS9298 ARTICLE N PRESS No of Pages 14, Model 3G S. Yu/ Information Sciences xxx(2011)xxx-XXx Fig. 3. A local social graph of active user 1. 213 CP1 is a set of people who satisfy the following condition, where F(, a) means that a user u follows a user a CP1=user cf(u, a) and F(a, c) and a!=c, for user u, a, and 216 For example, B1 and B2 are added into CP1. Secondly, D1 and D2 are also important to I because e ollowing people. There- are followed by Cl and 218 fore, D1 and D2 are also added into CP2 set which is defined in Eg19S 217 C2 who are common followers of I to A2. It implies that they may have a similar preference to lin CP2=user cF(u, a)and F(b, a)and F(b, c) and u!= b and al=c, for all user cH 222 To prevent from the increase of computation time, if a common follower(Cl or C2)of I follows too many people, it is desir 223 able to include the following people of the common follower to the candidate pool selectively using secondary criteria such 224 as the number of followings(DI and D2). 226 DCR has a set of component algorithms to apply to the initial candidate pool: PageRank(Pr), common followers(CF) 227 number of tweets(Tw), and number of followers(Fr), profile matching(PM). As in traditional recommender systems, each 228 component algorithm generates a corresponding candidate ranking list according to its unique evaluation criteria. Througl 229 this process, one candidate may be appeared at different ranking positions in multiple lists. Here are some formal expres 230 sions on how a component algorithm is applied to candidates. recommendation score(RS)of a candidate c for an active user 2 u after applying an algorithm RAk is defined as the position(Pos of c in candidate list(cl), where 0< Pos<n-1 RAk(u)=1,C2,…,Cn RSeA(uy(G)-n-Pos(C1, RAK(u)) 235 Formal definitions of component algorithms are as follows FR(u)=Huser cf(c, u) for all user cl 240 where Fc, u indicates a user c follows a user u. TMu returns the number of tweets posted by a user u, which reflects the activities of a user on the social network. T(u, t) indicates th er u produces a tweet t Tw(u)=Tweet tT(u,t)H Common follower 8 The idea of common follower was previously defined in[ 8. and it has been extended in this study as follows. According to 9 the graph of Fig. 4, B2 has one common follower(Al) of Is friends, and bl has two common followers (Al, A2). The assumption is that, the more common followers a candidate is followed by the more important the candidate can be Accordingly, B1 gets a higher priority than B2. The number of common followers( Cf)is defined as Eq (10) where u nd c indicate an active user and a candidate, respectively CF1(u, c)=luser a F(u, a)and f(a, c)and UF(u,c)H 257 Here is another scenario of common follower. In the same graph, I, Cl and C2 are all common followers to A2. Similarly, if the 258 importance of candidates is determined by the number of common followers, CF scores for DI and D2 will be 1 and 2, resp 259 tively. The second CF formula is defined as follows Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.o101016ins2011020
212 CP1 is a set of people who satisfy the following condition, where F(u,a) means that a user u follows a user a. 213 215 CP1 ¼ fuser cjFðu; aÞ and Fða; cÞ and a! ¼ c; for user u; a; and cg ð4Þ 216 For example, B1 and B2 are added into CP1. Secondly, D1 and D2 are also important to I because they are followed by C1 and 217 C2 who are common followers of I to A2. It implies that they may have a similar preference to I in following people. There- 218 fore, D1 and D2 are also added into CP2 set which is defined in Eq. (5). 219 221 CP2 ¼ fuser cjFðu; aÞ and Fðb; aÞ and Fðb; cÞ and u! ¼ b and a! ¼ c; for all user cg ð5Þ 222 To prevent from the increase of computation time, if a common follower (C1 or C2) of I follows too many people, it is desir- 223 able to include the following people of the common follower to the candidate pool selectively using secondary criteria such 224 as the number of followings (D1 and D2). 225 3.2.2. Application of component algorithms 226 DCR has a set of component algorithms to apply to the initial candidate pool: PageRank (PR), common followers (CF), 227 number of tweets (TW), and number of followers (FR), profile matching (PM). As in traditional recommender systems, each 228 component algorithm generates a corresponding candidate ranking list according to its unique evaluation criteria. Through 229 this process, one candidate may be appeared at different ranking positions in multiple lists. Here are some formal expres- 230 sions on how a component algorithm is applied to candidates. recommendation score (RS) of a candidate c for an active user 231 u after applying an algorithm RAk is defined as the position (Pos) of c in candidate list (cl), where 0 6 Pos < n 1. 232 RAkðuÞ¼½c1; c2; ... ; cn ð6Þ RSRAkðuÞðciÞ ¼ n Posðc1; RAkðuÞÞ n ð7Þ 234 235 Formal definitions of component algorithms are as follows 236 Number of followers 237 239 FRðuÞ ¼ jfuser cjFðc; uÞ for all user cgj ð8Þ 240 where F(c,u) indicates a user c follows a user u. 241 Number of weets 242 TW(u) returns the number of tweets posted by a user u, which reflects the activities of a user on the social network. T(u,t) 243 indicates that a user u produces a tweet t. 244 246 TWðuÞ ¼ jftweet tjTðu;tÞgj ð9Þ 247 Common follower 248 The idea of common follower was previously defined in [8], and it has been extended in this study as follows. According to 249 the graph of Fig. 4, B2 has one common follower (A1) of I’s friends, and B1 has two common followers (A1,A2). The 250 assumption is that, the more common followers a candidate is followed by, the more important the candidate can be. 251 Accordingly, B1 gets a higher priority than B2. The number of common followers (CF) is defined as Eq. (10) where u 252 and c indicate an active user and a candidate, respectively. 253 255 CF1ðu; cÞ ¼ jfuser ajFðu; aÞ and Fða; cÞ and UFðu; cÞgj ð10Þ 256 where u! = a! = c. 257 Here is another scenario of common follower. In the same graph, I, C1 and C2 are all common followers to A2. Similarly, if the 258 importance of candidates is determined by the number of common followers, CF scores for D1 and D2 will be 1 and 2, respec- 259 tively. The second CF formula is defined as follows. 260 Fig. 3. A local social graph of active user I. 6 S.J. Yu / Information Sciences xxx (2011) xxx–xxx INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
Ns9298 ARTICLE N PRESS No of Pages 14, Model 3G 12 November 2011 Fig. 4. Common followers. CF2 u, c)=Huser aF(u, v) and f(a, v) and F(a, c)and UF(u,c)l (11) 264 Although CF1 and CFz have been mentioned separately in the work of Golder et al. [8. these can be integrated in one concept 266 as Eq (12). CFu,c=CFI(u,c)+CF2(u, c) 279 For example, according to Eq. (11), CF score of each candidate in the above graph is calculated as Eq. (13). CF(L,B1)=CF1(,B1)+CF2(,B1)=1+0=1 CF(,B2)=CF1(,B2)+CF2(,B2)=2+0=3 CF(,D1)=CF1(D1)+CF2(,D1)=0+1 CF(,D2)=CF1(,D2)+CF2(,D2)=0+2=2 Page Rank Fig 5 is the score result after a social graph of Fig 3 was analyzed by PageRank(Pr)algorithm, which was defined in Eq. (4)in Section 2. According to PR score result, Bl and da become the first and the second most influential candidates in the social network 277· Profile matching Profile matching can be designed by various ways using profile attribute fields. In this study, it was implemented I ime zone information of user profile because it has been conventionally used and effective to calculate geographical proximity. If the same time zone is found in profiles of two users u and c, PM(u, c) function returns 1. otherwise 0 PM(u,c)=1, if Tz is matched 0. otherwise (14) 285 3. 2.3. Competition algorithm In previous stage, one candidate may be differently ranked in multiple candidate lists by different component algorithm. 87 Competition algorithm integrates them into one final list using a series of ranking functions(RF). Initially, multiple candidate 288 lists are merged and sorted And then, If two or more candidates have the same rs, the competition algorithm initially 289 chooses the highest recommendation score(RS) of multiple scores for one candidate using Max ranking functions. If some 90 candidates have still the same score, the next ranking function is applied only to the conflicting ones sequentially. Table 2 91 lists the application order of ranking functions. Mean function chooses a candidate with higher RS on average, and Stdev fun 0072 0072 Fig. 5. Page Rank scores. Please cite this article in press as: s ]. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.do:10.1016ins2011020
262 CF2ðu; cÞ ¼ jfuser ajFðu; vÞ and Fða; vÞ and Fða; cÞ and UFðu; cÞgj ð11Þ 263 where u! = v! = a! = c. 264 Although CF1 and CF2 have been mentioned separately in the work of Golder et al. [8], these can be integrated in one concept 265 as Eq. (12). 266 268 CFðu; cÞ ¼ CF1ðu; cÞ þ CF2ðu; cÞ ð12Þ 269 For example, according to Eq. (11), CF score of each candidate in the above graph is calculated as Eq. (13). 270 CFðI; B1Þ ¼ CF1ðI; B1Þ þ CF2ðI; B1Þ ¼ 1 þ 0 ¼ 1 CFðI; B2Þ ¼ CF1ðI; B2Þ þ CF2ðI; B2Þ ¼ 2 þ 0 ¼ 3 CFðI;D1Þ ¼ CF1ðI;D1Þ þ CF2ðI;D1Þ ¼ 0 þ 1 ¼ 1 CFðI;D2Þ ¼ CF1ðI;D2Þ þ CF2ðI;D2Þ ¼ 0 þ 2 ¼ 2 ð13Þ 272 273 PageRank 274 Fig. 5 is the score result after a social graph of Fig. 3 was analyzed by PageRank (PR) algorithm, which was defined in Eq. 275 (4) in Section 2. According to PR score result, B1 and D2 become the first and the second most influential candidates in the 276 social network. 277 Profile matching 278 Profile matching can be designed by various ways using profile attribute fields. In this study, it was implemented using 279 time zone information of user profile because it has been conventionally used and effective to calculate geographical 280 proximity. If the same time zone is found in profiles of two users u and c, PM(u,c) function returns 1, otherwise 0. 281 PMðu; cÞ ¼ 1; if TZ is matched 0; otherwise ð14Þ 283 284 285 3.2.3. Competition algorithm 286 In previous stage, one candidate may be differently ranked in multiple candidate lists by different component algorithm. 287 Competition algorithm integrates them into one final list using a series of ranking functions (RF). Initially, multiple candidate 288 lists are merged and sorted. And then, If two or more candidates have the same RS, the competition algorithm initially 289 chooses the highest recommendation score (RS) of multiple scores for one candidate using Max ranking functions. If some 290 candidates have still the same score, the next ranking function is applied only to the conflicting ones sequentially. Table 2 291 lists the application order of ranking functions. Mean function chooses a candidate with higher RS on average, and Stdev funcFig. 5. PageRank scores. Fig. 4. Common followers. S.J. Yu / Information Sciences xxx (2011) xxx–xxx 7 INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
NS9298 ARTICLE N PRESS No of Pages 14, Model 3G S. Yu/ Information Sciences xxx(2011)xxx-XXX functions Standard deviation (Stdev) Size of candidate list(cl) tion chooses a candidate with lower standar longer the list is, the higher the ranking rate is. multiple lists, and it is defined formally in from Eq (15)to Eq. (16). DCR(RA1(u), RA2(u) m(u)=c1,C2,…,cn] where RS(cs)> RS()if s<t 98 A final recommendation score(RSn of ct is determined by Eq (16). RS(C)=RFss=RSk(Ci, RAk(u)), for 1<k<mp (16) 302 For example, according to RS generated by PR and FR in Fig. 6(a)and(b), we will have two candidate lists, Clpr(u)and Cf(u) 06 CLpr(u)=[B1, D2, B2, DI], and CLf(u)=[D2, D1, B2, B1 307 because RSp(B1)=(4-(1-1)/4=1.0,RSp(D2)=(4-(2-1)/4=075 Sp(B2)=(4-(3-1)/4=0.5.RSp(D1)=(4-(4-1)/4=025 RSf(B1)=(4-(4-1)/4=025.RSf(D2)=(4-(1-1)/4=1.0 r(B2)=(4-(3-1)/4=05,Rsr(D1)=(4-(2-1)/4=075 and then, if Max function combines these two lists by max- win policy, the final Rs scores will be as follow Rrf(B1)=max{1.0.025}=1.0.RSpr(D2)=max{0.7510}=10 Sprf(B2)=max{0.5.0.5}=0.5,RSrn(D1)=max{0250.75}=0.75 However, because two candidates, B1 and D2, still have the same score, the orders of these conflicting members should be 328 resolved by the next RF(Mean), and conclusively, D2 will be ranked at a higher position than B1. Now, we have the following 329 final list for suggestion to an active md=D2,B1,D1,B2] 333 Table 3 summarizes the procedural results during ranking determination. 裂息。食 0072 (a) PageRank Scores (b) Number of Followers Fig. 6. PageRank scores and number of followers Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.o101016ins2011020
292 tion chooses a candidate with lower standard deviation of RS. Cl Chooses a candidate with longer candidate list because the 293 longer the list is, the higher the ranking rate is. Finally, DCR algorithm generates an integrated candidate list for a user u from 294 multiple lists, and it is defined formally in from Eq. (15) to Eq. (16). 295 DCRðRA1ðuÞ; RA2ðuÞ; ... ; RAmðuÞÞ ¼ ½c1; c2; ... ; cn where RSfðcsÞ P RSfðctÞ if s < t ð15Þ 297 298 A final recommendation score (RSf) of ci is determined by Eq. (16). 299 RSfðciÞ ¼ RFjfsjs ¼ RSkðci; RAkðuÞÞ; for 1 < k 6 mg; where RAk 2 fPR; CF; TW; FR; PFg and RFj 2 fMax; Mean; Stdev; CLg: ð16Þ 301 302 For example, according to RS generated by PR and FR in Fig. 6(a) and (b), we will have two candidate lists, Clpr(u) and Clfr(u) 303 which are 304 306 CLprðuÞ¼½B1;D2; B2;D1; and CLfrðuÞ¼½D2;D1; B2; B1 307 because 308 310 RSprðB1Þ¼ð4 ð1 1ÞÞ=4 ¼ 1:0; RSprðD2Þ¼ð4 ð2 1ÞÞ=4 ¼ 0:75 311 313 RSprðB2Þ¼ð4 ð3 1ÞÞ=4 ¼ 0:5; RSprðD1Þ¼ð4 ð4 1ÞÞ=4 ¼ 0:25 314 316 RSfrðB1Þ¼ð4 ð4 1ÞÞ=4 ¼ 0:25; RSfrðD2Þ¼ð4 ð1 1ÞÞ=4 ¼ 1:0 317 319 RSfrðB2Þ¼ð4 ð3 1ÞÞ=4 ¼ 0:5; RSfrðD1Þ¼ð4 ð2 1ÞÞ=4 ¼ 0:75 320 and then, if Max function combines these two lists by max-win policy, the final RS scores will be as follows. 321 323 RSpr;frðB1Þ ¼ maxf1:0; 0:25g ¼ 1:0; RSpr;frðD2Þ ¼ maxf0:75; 1:0g ¼ 1:0 324 326 RSpr;frðB2Þ ¼ maxf0:5; 0:5g ¼ 0:5; RSpr;frðD1Þ ¼ maxf0:25; 0:75g ¼ 0:75 327 However, because two candidates, B1 and D2, \still have the same score, the orders of these conflicting members should be 328 resolved by the next RF (Mean), and conclusively, D2 will be ranked at a higher position than B1. Now, we have the following 329 final list for suggestion to an active user. 330 332 CLfinal ¼ ½D2; B1;D1; B2 333 Table 3 summarizes the procedural results during ranking determination. Table 2 Application order of ranking functions. Order Ranking functions Priority 1 Maximum (Max) Higher 2 Average (Mean) Higher 3 Standard deviation (Stdev) Lower 4 Size of candidate list (cl) Larger Fig. 6. PageRank scores and number of followers. 8 S.J. Yu / Information Sciences xxx (2011) xxx–xxx INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
Ns9298 ARTICLE N PRESS No of Pages 14, Model 3G 12 November 2011 S Yu/Information Sciences xxx(2011)xxx-xo Table 3 Candidate lists generated by ranking functions B2{0.5,0.5 D1{025 0.75,1.0 Candidate ax 0.75 B1,D2.D1,B2 0875 D2.B1,D1,B2 334 4 Experiment and evaluation 335 4.1. Experimental environments To evaluate the performance of DCR algorithm, we performed experiments for measuring hit ratio and precision of rec- 337 ommendation algorithm using an Open Twitter dataset from[22]. This dataset has a very long list of following information 338 stating aaa bbb' which means aaa is followed by bbb. As listed in Table 4, the Twitter dataset was divided into two train 339 dataset and test dataset at a ratio of 8: 2 which is the best ratio for evaluation[26]. Here are the procedures of experiment. 340 For measurement of hit ratio and precision, 100 Twitter users who most followed others were chosen from test dataset. A 41 test bed system for experiments has been implemented using Python 2.7.1 on Windows 7 environment. The summary of 342 experimental data sets is described in Table 4. An initial candidate pool for an experimental user is created with 44 corresponding candidate lists, and they are integrated using ranking functions of competition algorithm With 345 date list, the system measures how many and how correctly actual followers of an active user are suggestedfina ada ble 5 lists single and composite component algorithms for the experiments. As primary single algorithm, PR,CF TW, PF, FR were included, and they were combined in from two to four in order to confirm the impact of combination of 349 4.2. Recommendation hit ratio Recommendation hit is defined as an event where a suggested person by algorithm was actually followed by active user. 51 This study also defines Recommendation hit ratio(rHr)as the number of recommendation hits(rh)to the number of actual people followed by active user (FP), where an algorithm RAk is applied to a user u. RHR(RA(u)) RH(RAk(u)) 356 Fig. 7 shows RHR measurement results by algorithm and its combination RhR were measured with top 500 and top 1000 358 and tw and prpf showed the second and the third best results. As shown in lower graph, all DCR algorithms(double, triple, 59 and quadruple) recorded higher RHR compared to single algorithms. In general, DCR algorithms improved 7% of RHr on age in comparison to single algorithms. In combination of DCR algorithm, double, triple and quadruple combinations able 4 Specification of experimental datasets. Train dataset ng edges(links) 668432 167,109 Average number of followers per Table 5 Experimental algorithms PageRank (pr) DCR prd.prtw.pf可 Quad prcfrwfr, prpfci Please cite this article in press as: s ]. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.do:10.1016ins2011020
334 4. Experiment and evaluation 335 4.1. Experimental environments 336 To evaluate the performance of DCR algorithm, we performed experiments for measuring hit ratio and precision of rec- 337 ommendation algorithm using an Open Twitter dataset from [22]. This dataset has a very long list of following information 338 stating ‘aaa bbb’ which means aaa is followed by bbb. As listed in Table 4, the Twitter dataset was divided into two train 339 dataset and test dataset at a ratio of 8:2 which is the best ratio for evaluation [26]. Here are the procedures of experiment. 340 For measurement of hit ratio and precision, 100 Twitter users who most followed others were chosen from test dataset. A 341 test bed system for experiments has been implemented using Python 2.7.1 on Windows 7 environment. The summary of 342 experimental data sets is described in Table 4. An initial candidate pool for an experimental user is created with socially 343 linked users who are within two or three degree distance from an active user. And then, component algorithms generate 344 corresponding candidate lists, and they are integrated using ranking functions of competition algorithm. With final candi- 345 date list, the system measures how many and how correctly actual followers of an active user are suggested. 346 Table 5 lists single and composite component algorithms for the experiments. As primary single algorithm, PR, CF, TW, PF, 347 and FR were included, and they were combined in from two to four in order to confirm the impact of combination of 348 algorithms. 349 4.2. Recommendation hit ratio 350 Recommendation hit is defined as an event where a suggested person by algorithm was actually followed by active user. 351 This study also defines Recommendation hit ratio (RHR) as the number of recommendation hits (RH) to the number of actual 352 people followed by active user (FP), where an algorithm RAk is applied to a user u. 353 RHRðRAkðuÞÞ ¼ RHðRAkðuÞÞ FPðuÞ ð17Þ 355 356 Fig. 7 shows RHR measurement results by algorithm and its combination. RHR were measured with top 500 and top 1000 357 ranking lists. As shown in graph, RHR increased in top 1000 ranking list. In algorithms, pfcf recorded the best hit ratio, 358 and tw and prpf showed the second and the third best results. As shown in lower graph, all DCR algorithms (double, triple, 359 and quadruple) recorded higher RHR compared to single algorithms. In general, DCR algorithms improved 7% of RHR on aver- 360 age in comparison to single algorithms. In combination of DCR algorithm, double, triple, and quadruple combinations Table 3 Candidate lists generated by ranking functions. Order RF B1 {1.0, 0.25} B2 {0.5, 0.5} D1 {0.25, 0.75} D2 {0.75, 1.0} Candidate list 1 Max 1.0 0.5 0.75 1.0 [B1, D2, D1, B2] 2 Mean 0.625 – – 0.875 [D2, B1, D1, B2] Table 4 Specification of experimental datasets. Category Train dataset Test dataset Number of users (nodes) 391,856 126,083 Following edges (links) 668,432 167,109 Average number of followers per user 1.70 1.325 Table 5 Experimental algorithms. Category Algorithm Single PageRank (pr) Common follower (cf) Number of tweets (tw) Profile matching (pf) Number of followers (fr) DCR Double prcf, prtw, pfcf, prpf, frcf, frpf Triple prcftw, prcffr, pfcffr Quad prcftwfr, prpfcffr S.J. Yu / Information Sciences xxx (2011) xxx–xxx 9 INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020
ARTICLE N PRESS No of Pages 14, Model 3G er2011 S. Yu/ Information Sciences xxx(2011)xxx-XXx Mean(hit ratio) 50 040 top 500 120 top 1000 Mean(hit ratio) 0.30 020 ad Fig. 7. Recommendation hit ratio by algorithm(upper)and by algorithm comb 361 enhanced rhr by 8%, 7%, and 6%, respectively Of DCR algorithms, double combinations were the best on average, and fol- 362 lowed by triple and quadruple ones. 363 4.3. Recommendation precision Recommendation precision(RP)refers to how accurately an algorithm suggests candidates to an active user Recommen- 65 dation score(RS)is calculated by the position of a hit candidate in the list, and rP is defined as the ratio of average RS of all hit 366 candidates( Chit)to candidate list size RP(U MEAN(RSRA(Chit) 100 370 where L is the size of candidate list generated by RAk MEAN(RSRA, ( Chit ) 74 where h is the number of recommendation hits Fig 8 is the results of RP measured by algorithm. prcftw and prow showed the best and the second best results, respectively 376 They were followed by tw single algorithm In combination of algorithms, triple combination provided the best result and 377 double one was the next. Although average precision of dCr algorithms was slightly better than that of single algorithms Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011.o101016ins2011020
361 enhanced RHR by 8%, 7%, and 6%, respectively. Of DCR algorithms, double combinations were the best on average, and fol- 362 lowed by triple and quadruple ones. 363 4.3. Recommendation precision 364 Recommendation precision (RP) refers to how accurately an algorithm suggests candidates to an active user. Recommen- 365 dation score (RS) is calculated by the position of a hit candidate in the list, and RP is defined as the ratio of average RS of all hit 366 candidates (chit) to candidate list size. 367 RPðuÞ ¼ 1 MEANðRSRAk ðchitÞÞ LðRAkðuÞÞ 100 ð18Þ 369 370 where L is the size of candidate list generated by RAk. 371 MEANðRSRAk ðchitÞÞ ¼ Ph i¼1RSRAk ðciÞ h ð19Þ 373 374 where h is the number of recommendation hits. 375 Fig. 8 is the results of RP measured by algorithm. prcftw and prtw showed the best and the second best results, respectively. 376 They were followed by tw single algorithm. In combination of algorithms, triple combination provided the best result and 377 double one was the next. Although average precision of DCR algorithms was slightly better than that of single algorithms, Fig. 7. Recommendation hit ratio by algorithm (upper) and by algorithm combination (lower). 10 S.J. Yu / Information Sciences xxx (2011) xxx–xxx INS 9298 No. of Pages 14, Model 3G 12 November 2011 Please cite this article in press as: S.J. Yu, The dynamic competitive recommendation algorithm in social network services, Inform. Sci. (2011), doi:10.1016/j.ins.2011.10.020