WCCI 2010 IEEE World Congress on Computational Intelligence uly, 18-23, 2010-CCIB, Barcelona, Spai CEC IEEE A Graph-Based Friend Recommendation System Using genetic algorithm Nitai B Silva, Ing-Ren Tsang, George D.C. Cavalcanti, and Ing-Jyh Tsang Abstract-A social network is composed by communities of used to develop novel applications such asa individuals or organizations that are connected by a common recommendation system. interest Online social networking sites like Twitter, Facebook With the increase of the e-commerce. recommendations and Orkut are among the most visited sites in the Internet. Presently, there is a great interest in trying to understand the systems have been of great interest. This is due to the complexities of this type of network from both theoretical and possibility of increase sell obtained from success applied point of view. The understanding of these social recommendation. Sites that offer different products such as network graphs is important to improve the current social books, clothes and movies, most often also provides network systems, and also to develop new applications. Here, recommendations based on previous brought products. The weproposeafriendrecommendationsystemforsocialnetworkNetflixprize(http://www.netflixprize.com)isanexampleof based on the topology of the network graphs. The topology of continuous interesting in this field[14]. The problem of on. or In more We developed an algorithm that analyses the sub-graph global context information, is growing in both commercial composed by a user and all the others connected people and academic research interest rately by three degree of separation. However, only users Here, we proposed a friend recommendation system that suggested as a friend. The algorithm uses the patterns defined suggests new links between user nodes within the network. separated by two degree of separation are candidates to be by their connections to find those users who have similar The central problem can be viewed as a procedure to propose relevant parameters for nodes relationship using the developed based on the characterization and analyses of the information from the social network topology and statistical network formed by the user's friends and friends-of-friends properties obtained by using classical metrics of complex (FOF) networks. Even though, topology based approaches fo have already been . INTRODUCTION other researchers [11],[12], [13, we proposed a different the last few years, social networks have been increasing clustering indexes and a novel user calibration procedure both size and services. Social networking services using Genetic Algorithm(GA) (SNSs) such as Facebook, MySpace, Twitter, Flickr, The rest of the paper is organized as follows: In Section 2, YouTube and Orkut are growing in popularity and we briefly describe the Oro-Aro social network used to mportance and to some extent they are also contributing to a analyze the proposed recommendation system In section 3, change in human social behavior. Some of these SNSs the recommendation mechanism is explained in details. The already provide a service to recommend friends, even though process is divided in two erin the method used is not disclosed, we believe that an FoF network measurements are also defined and three important approach is mostly used. The topology of this type of indexes are introduced. In Section 4, we describe the network has been measured and analyzed by different experiments using the proposed system in the Oro-Aro researches[1 ],[2],[3]. Some interesting structural properties comparisons of the obtained samples. Finally, the such as power-law, small-world and scale-free network concluding remarks are presented in Section 6. characteristics have been reported [1]. Also the topological patterns of activities and the structure and evolution of IL. THE ORO-ARO SOCIAL NETWORK online social networks have been studied 3.7 A social network is a structured community of individuals Knowledge of the structure and topology of these complex or organization composed of nodes that are connected networks combined with quantitative properties such as size, through one or more particular kind of interdependence,like density, average path length or cluster coefficient can b ralues. ideas p conflict, and trading [4],[5]. Analysis and measurements of by FAcEPe- Fundacao de amparo a ciencia e tecnologia do estado de social networks examines the social relations in terms of Nitai B Silva, Ing-Ren Tsang and George C. D Cavalcanti are with the individual users of the system, and connections correspo Federal University of Pernambuco (UFPE), Center of Informatics(CIn), to the relations between the users of the snss Av. Prof. Luis Freire. s/n. Cidade Universitaria. CEP In our experiments, we used the data obtained from the Ing-jyhTsangiswithAlcatel-lucent,Bell-iabs,Copernicuslan50.Oro-arosocialnetwork(http://www.oro-aro.com)thatwas 2018Antwerp,Belgium(e-mail:ing-jyhtsang@alcatel-lucent.com). 978-1-4244-8126-2/10/$26.00⊙2010IEEE
Abstract—A social network is composed by communities of individuals or organizations that are connected by a common interest. Online social networking sites like Twitter, Facebook and Orkut are among the most visited sites in the Internet. Presently, there is a great interest in trying to understand the complexities of this type of network from both theoretical and applied point of view. The understanding of these social network graphs is important to improve the current social network systems, and also to develop new applications. Here, we propose a friend recommendation system for social network based on the topology of the network graphs. The topology of network that connects a user to his friends is examined and a local social network called Oro-Aro is used in the experiments. We developed an algorithm that analyses the sub-graph composed by a user and all the others connected people separately by three degree of separation. However, only users separated by two degree of separation are candidates to be suggested as a friend. The algorithm uses the patterns defined by their connections to find those users who have similar behavior as the root user. The recommendation mechanism was developed based on the characterization and analyses of the network formed by the user’s friends and friends-of-friends (FOF). I. INTRODUCTION N the last few years, social networks have been increasing in both size and services. Social networking services (SNSs) such as Facebook, MySpace, Twitter, Flickr, YouTube and Orkut are growing in popularity and importance and to some extent they are also contributing to a change in human social behavior. Some of these SNSs already provide a service to recommend friends, even though the method used is not disclosed, we believe that an FOF approach is mostly used. The topology of this type of network has been measured and analyzed by different researches [1], [2], [3]. Some interesting structural properties such as power-law, small-world and scale-free network characteristics have been reported [1]. Also the topological patterns of activities and the structure and evolution of online social networks have been studied [3], [7]. Knowledge of the structure and topology of these complex networks combined with quantitative properties such as size, density, average path length or cluster coefficient can be Manuscript received February 8, 2010. This work was supported in part by FACEPE – Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco. Nitai B. Silva, Ing-Ren Tsang and George C. D. Cavalcanti are with the Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Prof. Luis Freire, s/n, Cidade Universitária, CEP 50740-540 (phone: +55 81 2126 8430; e-mail: {nbs,tir,gdcc}@ cin.ufpe.br). Ing-Jyh Tsang is with Alcatel-Lucent, Bell-Labs, Copernicuslaan 50, 2018 Antwerp, Belgium (e-mail: ing-jyh.tsang@alcatel-lucent.com). used to develop novel applications such as a recommendation system. With the increase of the e-commerce, recommendations systems have been of great interest. This is due to the possibility of increase sell obtained from success recommendation. Sites that offer different products such as books, clothes and movies, most often also provides recommendations based on previous brought products. The Netflix prize (http://www.netflixprize.com) is an example of continuous interesting in this field [14]. The problem of product, service, and friend recommendation, or in more global context information, is growing in both commercial and academic research interest. Here, we proposed a friend recommendation system that suggests new links between user nodes within the network. The central problem can be viewed as a procedure to propose relevant parameters for nodes relationship using the information from the social network topology and statistical properties obtained by using classical metrics of complex networks. Even though, topology based approaches for recommendation systems have already been suggested by other researchers [11], [12], [13], we proposed a different clustering indexes and a novel user calibration procedure using Genetic Algorithm (GA). The rest of the paper is organized as follows: In Section 2, we briefly describe the Oro-Aro social network used to analyze the proposed recommendation system. In section 3, the recommendation mechanism is explained in details. The process is divided in two phase filtering and ordering, some network measurements are also defined and three important indexes are introduced. In Section 4, we describe the experiments using the proposed system in the Oro-Aro social network. Also, we describe some estimates and comparisons of the obtained samples. Finally, the concluding remarks are presented in Section 6. II. THE ORO-ARO SOCIAL NETWORK A social network is a structured community of individuals or organization composed of nodes that are connected through one or more particular kind of interdependence, like values, ideas, interests, business, friendships, kinship, conflict, and trading [4], [5]. Analysis and measurements of social networks examines the social relations in terms of nodes and connections. Nodes in such network represent individual users of the system, and connections correspond to the relations between the users of the SNSs. In our experiments, we used the data obtained from the Oro-Aro social network (http://www.oro-aro.com) that was A Graph-Based Friend Recommendation System Using Genetic Algorithm Nitai B. Silva, Ing-Ren Tsang, George D.C. Cavalcanti, and Ing-Jyh Tsang I WCCI 2010 IEEE World Congress on Computational Intelligence July, 18-23, 2010 - CCIB, Barcelona, Spain CEC IEEE 978-1-4244-8126-2/10/$26.00 c 2010 IEEE 233
developed by the Recife Center for ced Studies and information on the topological characterization of the Oro- Systems(c.e.s.a.r)(http://Www.ceSa This social Aro SNS network is located in Brazil and it w ed with the The recommendation mechanism procedure utilizes the intention to facilitate the exchange of experiences of the graph topology of the SNs to filter and order a set of node student of the Center of Informatics and the software that have some important properties in relation to a given engineers at CESAR. The network is composed of a total of node V. The nodes of the resulting set are recommendations 634 nodes and 5076 edges of new edges that should be connected to the node Vi. The Some preprocessing was applied on the data, so that the creation of these new edges is used to improve the properties proposed algorithm could be implemented. The Oro-Aro of the node V, besides providing benefits to the entire allows the creation of a one-way relationship, i.e. a user can network in terms of friendship connection add another user to his list of contact without the need for The process of recommendation is divided in two steps: a ne other user to approve the link. This kind of network is filtering procedure followed by an ordering. Filtering is an similar to twitter, a microblog network. Therefore a filter important step, because it separates the nodes with higher was used to remove all one-way relationship, i.e. who did possibilities to be a recommendation, consequently reducing not have an inverse relationship. The procedure reduced by the total number of nodes to be processed in the network. 29% the number of edges and 8% in the number of nodes The ordering step considers some properties to put the most (isolated sub-networks were removed). This procedure was relevant nodes in the top of the resulting list. As expected, necessary to obtain a network with symmetric connections: the result depends on the user the recommendation is also most of the social networks just allow two-ways generated for. Therefore, we applied an adaptive solution relationshi using Genetic Algorithm Figure 1 shows a graphical representation of the Oro-Aro social network. Each node represents a user and the edges a two-way relationship between users Fig. 2. The graph shows the degree of the node versus the frequency ccurrence. This figure demonstrates the behavior of connected users in the Oro. Aro social network A. Filter The algorithm used to perform the filtering procedure employs the concept of the clustering coefficient, which is haracteristic in small world networks [1. Using the natural idea that "It's more probable that you know a friend of your presents a use havin theo ion of the oro Aro relationships betwee networks, eacn node friend than any other random person"as stated by Mitchell in [4], the filtering step is restricted to select nodes adjacent II. RECOMMENDATION MECHANISM to each node that is adjacent to the central node of the recommendation process. All nodes that can be reached with he proposed friend recommendation system is based on two hops are considered the structural properties of social networks. The topological Figure 3 shows an example of this network for a single haracteristics, the information an le metrics are derived user. The node ni is the central element of the global from the complex network theory. It is observed that these recommendation process. The nodes within circle"A"are types of networks are defined as being either small-world or directly connected to the node n The nodes within circle scale free [1],[2], [3]. This characteristic can be used to the "B"and outside circle"A"are selected by the filtering development of a reliable recommendation mechanism rocedure Figure 2 shows a plot of the degree of the node versus the e. This plot provides some
developed by the Recife Center for Advanced Studies and Systems (C.E.S.A.R) (http://www.cesar.org.br/ network is located in Brazil and it was developed with the intention to facilitate the exchange of experiences student of the Center of Informatics and the software engineers at CESAR. The network is composed 634 nodes and 5076 edges. Some preprocessing was applied on the proposed algorithm could be implemented. The Oro allows the creation of a one-way relationship, i.e. a user can add another user to his list of contact without the the other user to approve the link. This kind of network is similar to twitter, a microblog network. Therefore a filter was used to remove all one-way relationship, i.e. who did not have an inverse relationship. The procedure reduced by 29% the number of edges and 8% in the number of nodes (isolated sub-networks were removed). This procedure was necessary to obtain a network with symmetric also most of the social networks just relationship. Figure 1 shows a graphical representation of the Oro social network. Each node represents a user and the edges a two-way relationship between users. Fig. 1. Visual representation of the Oro-Aro social networks, each node represents a user having the relationships between users III. RECOMMENDATION MECHANISM The proposed friend recommendation system is based on the structural properties of social networks. The topological characteristics, the information and the metrics from the complex network theory. It is observed types of networks are defined as being either scale free [1], [2], [3]. This characteristic can be development of a reliable recommendation Figure 2 shows a plot of the degree of the node vers frequency of occurrence. This plot provides some developed by the Recife Center for Advanced Studies and http://www.cesar.org.br/). This social located in Brazil and it was developed with the facilitate the exchange of experiences of the nformatics and the software composed of a total of the data, so that the proposed algorithm could be implemented. The Oro-Aro way relationship, i.e. a user can without the need for to approve the link. This kind of network is . Therefore a filter way relationship, i.e. who did not have an inverse relationship. The procedure reduced by the number of nodes . This procedure was a network with symmetric connections; also most of the social networks just allow two-ways graphical representation of the Oro-Aro social network. Each node represents a user and the edges a Aro social networks, each node users show a link. MECHANISM The proposed friend recommendation system is based on s. The topological metrics are derived observed that these either small-world or can be used to the recommendation mechanism. Figure 2 shows a plot of the degree of the node versus the frequency of occurrence. This plot provides some information on the topological characterization of the Oro Aro SNS. The recommendation mechanism procedure utilizes the graph topology of the SNS to filter and order a set of that have some important properties in relation to a given node v . The nodes of the resulting set are of new edges that should be connec creation of these new edges is used to improve of the node v , besides providing network in terms of friendship connection. The process of recommendation is filtering procedure followed by an ordering. Filtering is an important step, because it separates the nodes with higher possibilities to be a recommendation, the total number of nodes to be processed The ordering step considers some properties to put the most relevant nodes in the top of the resulting list. the result depends on the user the generated for. Therefore, we applied using Genetic Algorithm. 0 20 40 60 80 100 120 140 1 3 5 7 9 11 13 15 17 19 21 23 frequency degree Fig. 2. The graph shows the degree of the node versus the frequency of occurrence. This figure demonstrates the behavior of connected users in the Oro-Aro social network. A. Filtering The algorithm used to perform the filtering procedure employs the concept of the clustering coefficient, which is characteristic in small world networks idea that “It’s more probable that you know a friend of your friend than any other random person” as stated by Mitchell in [4], the filtering step is restricted to to each node that is adjacent to the central node of the recommendation process. All nodes that can be reached with two hops are considered. Figure 3 shows an example of this network for a single user. The node ni is the central element of the recommendation process. The nodes directly connected to the node ni . The “B” and outside circle “A” are selected by the filtering procedure. information on the topological characterization of the OroThe recommendation mechanism procedure utilizes the graph topology of the SNS to filter and order a set of node rtant properties in relation to a given of the resulting set are recommendations connected to the node v . The creation of these new edges is used to improve the properties , besides providing benefits to the entire network in terms of friendship connection. e process of recommendation is divided in two steps: a filtering procedure followed by an ordering. Filtering is an important step, because it separates the nodes with higher to be a recommendation, consequently reducing nodes to be processed in the network. The ordering step considers some properties to put the most in the top of the resulting list. As expected, the user the recommendation is applied an adaptive solution 23 26 29 31 33 36 38 61 123 the degree of the node versus the frequency of the behavior of connected users in the The algorithm used to perform the filtering procedure the concept of the clustering coefficient, which is characteristic in small world networks [1]. Using the natural hat “It’s more probable that you know a friend of your friend than any other random person” as stated by Mitchell , the filtering step is restricted to select nodes adjacent node that is adjacent to the central node of the All nodes that can be reached with 3 shows an example of this network for a single is the central element of the global nodes within circle “A” are . The nodes within circle “B” and outside circle “A” are selected by the filtering 234
B. Ordering clustering coefficient of node V. The density is calculated as The ordering procedure consists on the measurement of the the real size of edges where the origin and destiny nodes are coefficients and their normalization by using an adjustment in C divided by the quantity of edges of the clique( complete mechanism. It is not different from a regular ordering graph)formed by all the nodes contained in C mechanism, since it also use only one numeric value related to each node to be ordered. The indexing value belonging to ∑ Ec ojec(M1)) ach node is a result of a process that measures the (C|*(C|-1)/2 nteraction strength between that node and the central node of the entire recommendation process The measurement of this interaction is chosen as the result where Mi is the element ij of the adjacency matrix of a weighted average among three independent indexes. 1) First Index These indexes measure specific properties of a sub-graph he first index is defined as the number of adjacent nodes omposed by the nodes that are analyzed. Three indicators that are linked at the same time to node i and node j, where i are analyzed in this present article. However other index can is the center node of the analysis and j is the node that is still be used to improve the precision of the weighting values. These metrics were used because of its simplicity being ordered for the recommendation system. So, we have and the intuitive ideas of how friend should be connected In the literature of complex networks several metrics to evaluate the strength of interaction between two nodes were h1i=C:n presented[4, [8], [9]. We choose the friend-of-friends In a social networks this index measures the quantity of one of the three measurements. Inspired by the concept of common mends between person i and person J lustering coefficient, we also applied two other metrics. 2) Second index Both of them measure the clustering of a set of nodes that The second index refers to the density of the result links the two nodes in question. The referred nodes are the measured by the first index recommended node. Those measures are chosen b the simplicity and linkage connection between the its possible recommended friend This index measures the cohesion level inside the " small group formed by the common friends of person i and person B If this index has a small value then the people inside this group are not well-related. Figure 4 shows a sub-graph representing common friends between tw his connected friends and friend-of-friends i ocial network. The can be spit in two different regions. The re er and his friends. Region B represents friend Fig. 4. Sub-graph of the relationship from white and black vertex. The Some variable and concept are defined to derive the regio %diacent vertexes important for both users are shown in the circled friends of the main user dexes used for the friend recommendation system. First we define Ci as the set of the nodes adjacent to node vi and Dc 3 )Third Index are defined as the adjacency density among the node The third index is a variation of the second. This index contained in the C set. This measure is inspired in the measures the density of the group formed by the adjacent clustering coefficient definition, whereas Dc is the vertices of node n and node n Instead of an intersection
B. Ordering The ordering procedure consists on the measurement of the coefficients and their normalization by using an adjustment mechanism. It is not different from a regular ordering mechanism, since it also use only one numeric value related to each node to be ordered. The indexing value belonging to each node is a result of a process that measures the interaction strength between that node and the central node of the entire recommendation process. The measurement of this interaction is chosen as the result of a weighted average among three independent indexes. These indexes measure specific properties of a sub-graph composed by the nodes that are analyzed. Three indicators are analyzed in this present article. However other index can still be used to improve the precision of the weighting values. These metrics were used because of its simplicity and the intuitive ideas of how friend should be connected. In the literature of complex networks several metrics to evaluate the strength of interaction between two nodes were presented [4], [8], [9]. We choose the friend-of-friends (FOF) concept, which is a simple and widely used idea as one of the three measurements. Inspired by the concept of clustering coefficient, we also applied two other metrics. Both of them measure the clustering of a set of nodes that links the two nodes in question. The referred nodes are the node that will receive the recommendation and the possible recommended node. Those measures are chosen based on the simplicity and linkage connection between the user and its possible recommended friend. Fig. 3. A visual example of a sub-network showing the links between single users in relation to his connected friends and friend-of-friends in a social network. The network can be spit in two different regions. The region A represents the central user and his friends. Region B represents friends of friends of the main user. Some variable and concept are defined to derive the indexes used for the friend recommendation system. First we define C as the set of the nodes adjacent to node v and D are defined as the adjacency density among the node contained in the C set. This measure is inspired in the clustering coefficient definition, whereas is the clustering coefficient of node v . The density is calculated as the real size of edges where the origin and destiny nodes are in C divided by the quantity of edges of the clique (complete graph) formed by all the nodes contained in C. = ∑∈(∑∈( )) (|| ∗ (|| − 1))⁄2 where is the element ij of the adjacency matrix. 1) First Index The first index is defined as the number of adjacent nodes, that are linked at the same time to node i and node j, where i is the center node of the analysis and j is the node that is being ordered for the recommendation system. So, we have: = ∩ In a social networks this index measures the quantity of common friends between person i and person j. 2) Second Index The second index refers to the density of the result measured by the first index: = ∩ This index measures the cohesion level inside the “small” group formed by the common friends of person i and person j. If this index has a small value then the people inside this group are not well-related. Figure 4 shows a sub-graph representing common friends between two users, ni and nj . Fig. 4. Sub-graph of the relationship from white and black vertex. The common adjacent vertexes important for both users are shown in the circled region. 3) Third Index The third index is a variation of the second. This index measures the density of the group formed by the adjacent vertices of node ni and node nj . Instead of an intersection, 235
there is a union of the two adjacencies set as shown in a measure of rightness. Since the ordering procedure Figure 5 and represented by positions to each node, the mean positions of that are already related to the central node, is the function value. The smaller this value is the better is the B3u= de weighting set being considered This index measures the cohesion between the " bisset weigth formed by the friends of i and friends of j. A good index 2 weigth weigth does not necessarily means a good index 3. In other words, a rson that belongs to my work environment, for example Parent110011100101000100111 does not necessarily have friends that is relate in any way friends In gener ral. So the second index betwe een me Parent210001100101110101010101 and a work friend can appear strong, but as their friend have no relationship with my friends, the third index will 101011000010101001001101 Children100001110010101001001101 C. Index Weight Calibration Using Genetic Algorithm The calibration step is the procedure that combines the 101011001011100001001101 multiple indexes, three in this case, into a single value. This value is used to obtain the final set of ordered results for the ecommendation mechanism. A simple procedure to obtain this value is to use a weighted average among the indexes. Fig. 6. Generation of several children strings obtained from two parents However, an Genetic Algorithm is used to optimize The crossover of the gene from the parents is randomly obtained dexes value so to obtain the best ordered result A self-adjustment mechanism should be more practic he purpose of the recommendation system. This mechanism should choose different indexes for different users. Since each user has his particular relationships pattern, preserving these patterns is the best choice for a recommendation mechanism for it to be more effective. A self-adjustment mechanism needs to be able to calibrate the weights in relation to the optimization function, which is given by n A(n, w)=I(ne,n).W,+I2 (nc,n).w2+I3(nc, n).w The calibration step can be defined as an optimization problem. In this case, we use a genetic algorithm to solve this optimization. The fitness function is defined in the above equation, where Ii, represents the index and wi the weights that we wish to optimize order to find the solution. Each individual of the population is composed of 3 Fig.5.The rounded vertex, minus the black and white vertex, composes the weight of each index. The evolutionary process is done with analyzed set on third index. [10]. Each new generation replaces the worst individuals by The weight calibration of each single index must be children of the best individuals. The crossover is done in a adjusted to obtain an optimized result. In this case, single random cutoff point as shown in Figure 6. Each pair optimization means classifying the most important users in of parents can produce many children as long as there are no the beginning of the list. The importance of a user on the duplicates in generation. After a generation of children every social network depends in the context. This context can bit of the new string is passed through the process of change depending on the user who is searching for the mutation. A high mutation rate was applied, 4%o for each bit, recommendation. Therefore, the optimization function must enough to prevent the loss of diversity in the population, consider the existing relationship structure of the user which could lead to premature convergence. The initial To create the optimization function for the weights, a population contained 200 individuals randomly generated modification has been proposed in the filtering step The The algorithm until no improvement occurs in the filtering step needs also to include the nodes directly related fitness of the best individual for 4 consecutive generation to the central node of the global recommendation process The fitness function used the classification of these nodes as
there is a union of the two adjacencies set as shown in Figure 5 and represented by: = ∪ This index measures the cohesion between the “big” set formed by the friends of i and friends of j. A good index 2 does not necessarily means a good index 3. In other words, a person that belongs to my work environment, for example, does not necessarily have friends that is relate in any way with my friends in general. So the second index between me and a work friend can appear strong, but as their friends have no relationship with my friends, the third index will appear weak. C. Index Weight Calibration Using Genetic Algorithm The calibration step is the procedure that combines the multiple indexes, three in this case, into a single value. This value is used to obtain the final set of ordered results for the recommendation mechanism. A simple procedure to obtain this value is to use a weighted average among the indexes. However, an Genetic Algorithm is used to optimize the indexes value so to obtain the best ordered results. Fig. 5. The rounded vertex, minus the black and white vertex, composes the analyzed set on third index. The weight calibration of each single index must be adjusted to obtain an optimized result. In this case, optimization means classifying the most important users in the beginning of the list. The importance of a user on the social network depends in the context. This context can change depending on the user who is searching for the recommendation. Therefore, the optimization function must consider the existing relationship structure of the user. To create the optimization function for the weights, a modification has been proposed in the filtering step. The filtering step needs also to include the nodes directly related to the central node of the global recommendation process. The fitness function used the classification of these nodes as a measure of rightness. Since the ordering procedure defines positions to each node, the mean positions of these nodes that are already related to the central node, is the fitness function value. The smaller this value is the better is the weighting set being considered. Fig. 6. Generation of several children strings obtained from two parents. The crossover of the gene from the parents is randomly obtained. A self-adjustment mechanism should be more practical for the purpose of the recommendation system. This mechanism should choose different indexes for different users. Since each user has his particular relationships pattern, preserving these patterns is the best choice for a recommendation mechanism for it to be more effective. A self-adjustment mechanism needs to be able to calibrate the weights in relation to the optimization function, which is given by: (!, #) = (!$ , !). # + (!$ , !). # + (!$ , !). # The calibration step can be defined as an optimization problem. In this case, we use a genetic algorithm to solve this optimization. The fitness function is defined in the above equation, where , represents the index and # the weights that we wish to optimize in order to find the best solution. Each individual of the population is composed of 3 bytes. Each byte, ranging from 0 to 255 represents the weight of each index. The evolutionary process is done with a binary genetic algorithm with uniform state replacement [10]. Each new generation replaces the worst individuals by children of the best individuals. The crossover is done in a single random cutoff point as shown in Figure 6. Each pair of parents can produce many children as long as there are no duplicates in generation. After a generation of children every bit of the new string is passed through the process of mutation. A high mutation rate was applied, 4% for each bit, enough to prevent the loss of diversity in the population, which could lead to premature convergence. The initial population contained 200 individuals randomly generated. The algorithm runs until no improvement occurs in the fitness of the best individual for 4 consecutive generations. 236
TABLE I In the first step a set of candidate is chosen. This set is EVOLUTION OF 3 GENERATIONS IN GENETIC ALGORITHM omposed by users with higher chances to be recommended Iteration Optimization in relation to the central user. In the second step, the index measures between each candidate and the central user is 58 27, 341 evaluated once, then the genetic algorithm is used to select the best weight combination to balance the indexes for each 7207 叫2700 candidate. Below the summary of the main code used for th 12 147 18,706 genetic algorithm procedure is presented. 8400 The main procedure that return the best weight combination for recommend friends for a specific user: 142 11,129 Generate an initial population with random weight Until the fitness function value of the best Example of the weights obtained with algorithm. The goal is to minimize the value of the optimization function. individual of the population do not improve for four generations, d e In Table I three iterations of the evolutionary process is o Evaluate the fitness function for each shown. The interaction occurs until the convergence criteria individual in the popular are satisfied. Only the top 3 combinations of weights in each o Exclude the worst individuals in the generation are shown. In the last column the number population according the fitmess function represent the value of the function optimization for each weight combination. Figure 6 shows a graphical o Generate new individuals applying representation of the optimization function of the best crossover in remaining individuals weights combination in each generation. The gray color o Apply mutation operations on children represents the classification of the nodes already connected o Merge children and parents eliminating to the central node. The black color represents the average duplicates. classification of all nodes in gray, in other words the fitness function value. And the white color represents the nodes that are not connected to the central node, i.e. nodes that will be The function to generate population sons from two parents Posit I"iteration 2 Iteration iTeration n"Iteration Crossover is applied over each pair from the Cartesian product of individual of the elite Crossover is applied on same chromosomes type of two parents. Two chromosomes gen nerate two nel chromosomes, and the combination of three chromosomes from each parent six different new performed in a single random cutoff point The fitness function, considering the candidate set of users as the real candidates plus all friends already connected to the central user. Also all three indexes between each candidate and the central user are evaluated, this procedure Is summarized for the best weights in each iteration of the example in the table. The weight combination is received as parameter and normalized This step was developed using the TSQL and SQL Server For each candidate, the weighted average is 2008 database. As the network is considered small. it takes evaluated using indexes and a set of weigh no more than 20 seconds however the time can increase The result set is sorted in descending order over th depending on the network size. Below, we describe part of the algorithm focusing on genetic algorithm procedure. weighted average measure Below it is described the recommendation process for A new set is created selecting the position of all the only one user. The experiment is executed for a set of user, andidates that are friends of the central user i.e. so the recommendation process is performed for each one of that are already connected to it. The fitness function measure is defined as th average of all entries in the previous set
TABLE I EVOLUTION OF 3 GENERATIONS IN GENETIC ALGORITHM Iteration 1 st Weight 2 nd Weight 3 rd Weight Optimization function 1 st 71 69 134 27,341 247 58 104 35,089 104 7 32 43,764 2 nd 75 12 147 18,706 87 10 104 21,142 104 7 200 25,997 3 rd 70 10 155 10,975 70 11 142 11,129 71 11 141 11,439 n th … … … … Example of the weights obtained with three iterations using genetic algorithm. The goal is to minimize the value of the optimization function. In Table 1 three iterations of the evolutionary process is shown. The interaction occurs until the convergence criteria are satisfied. Only the top 3 combinations of weights in each generation are shown. In the last column the numbers represent the value of the function optimization for each weight combination. Figure 6 shows a graphical representation of the optimization function of the best weights combination in each generation. The gray color represents the classification of the nodes already connected to the central node. The black color represents the average classification of all nodes in gray, in other words the fitness function value. And the white color represents the nodes that are not connected to the central node, i.e. nodes that will be recommended. Fig. 7. Graphical representation of the values of the optimization functions for the best weights in each iteration of the example in the table. This step was developed using the TSQL and SQL Server 2008 database. As the network is considered small, it takes no more than 20 seconds, however the time can increase depending on the network size. Below, we describe part of the algorithm focusing on genetic algorithm procedure. Below it is described the recommendation process for only one user. The experiment is executed for a set of user, so the recommendation process is performed for each one of this group. In the first step a set of candidate is chosen. This set is composed by users with higher chances to be recommended in relation to the central user. In the second step, the index measures between each candidate and the central user is evaluated once, then the genetic algorithm is used to select the best weight combination to balance the indexes for each candidate. Below the summary of the main code used for the genetic algorithm procedure is presented. The main procedure that return the best weight combination for recommend friends for a specific user: • Generate an initial population with random weight values. • Until the fitness function value of the best individual of the population do not improve for four generations, do: o Evaluate the fitness function for each individual in the population. o Exclude the worst individuals in the population according the fitness function value. o Generate new individuals applying crossover in remaining individuals. o Apply mutation operations on children. o Merge children and parents eliminating duplicates. • Return the weight combination of the best individual The function to generate population sons from two parents is summarized as: • Crossover is applied over each pair from the Cartesian product of individual of the elite population. • Crossover is applied on same chromosomes type of two parents. Two chromosomes generate two new chromosomes, and the combination of three chromosomes from each parent six different new individual can be generated. Crossover is performed in a single random cutoff point The fitness function, considering the candidate set of users as the real candidates plus all friends already connected to the central user. Also all three indexes between each candidate and the central user are evaluated, this procedure is summarized: • The weight combination is received as parameter and normalized. • For each candidate, the weighted average is evaluated using indexes and a set of weight. • The result set is sorted in descending order over the weighted average measure. • A new set is created selecting the position of all the candidates that are friends of the central user i.e. that are already connected to it. • The fitness function measure is defined as the average of all entries in the previous set 237
The weight combination solution is obtained by appl second condition focuses on the users most likely to respond 70 selected user. it w the genetic algorithm. Using this combination the initial set to the experiment. For each of of candidate is ordered to produce the recommendation list. generated a set of 10 recommendations, among which 20% The filtering step can reduce the set of candidate based on using the FOF algorithm. However, before sending the the first index, avoiding measuring the second and third recommended list of friends to the recipients, we still have to index for less relevant nodes. The indexes evaluated from replace those that were already part of their friends list nodes ny to n] are the same as n2 to ni, so reducing by half relationships. This step conveys the first result. We call it the the time needed for the solution for the entire network This does not mean that if n2 is recommended for nI, nI will be correct set of one-way relationships. Our proposed solution presents a rate of 20.83% while the FOF algorithm achieved recommended for n2, since each user has his own context 13.33%. Note that the low value for this rate does not imply with singular weight combination. In a applied situation, the bad results since a person can link to another person recommendation can be previously generated, and the without having a common interest. These are common cases weight combination could be preserved for the user for a of celebrity user such as movie actors. In the twitter site the period of time and them recalculated, depending on increase best connect person does not necessary is connected to all of the friend network list of the user the users that are connected to him. After this procedure, the IV. EXPERIMENTS recommendations are sent in the form of links to the profile of users recommended and instructions to accept or reject Usually experiments to validate solutions in machine the recommendation. After a period of 10 days, 31 of 70 learning uses well established data sets. This data are users used the recommendations. It was counted only the normally divided in training, validation and test set which acceptance of recommendation without the need of the re prepared specifically for the purpose of evaluating the recommended user to have done the reverse action. Using performance of the algorithm used to solve the problem. The our algorithm 77.69% of the recommendations were procedures enable the comp of different solutions accepted while the FOF algorithm achieved 72.22% success led from different method applied in the same data ship recommendation in TABLE I social networks does not have a straightforward method to EXPERIMENTAL RESULTS validate the results since the objective is to generate a list of GBFRGA FOF recommendation that is dependable on the user Complex networks have been shown to be a promising 20.83% 13,33% area for research in recent years. Several problems can be Acceptance of the modeled in network structures. Despite all the progress, commendation 77,69% 72,22% recommendation in complex networks has been a problem poorly explored. A validation technique to assess the quality of the proposed algorithm would be necessary to compare V. CONCLUSION AND FUTURE WORKS the recommended list of relationships, to a certain user, wit the desired objective. For privacy reasons social networks We have presented a friend recommendation system based normlly do not provide or give access to information on the topology of a social network. The knowledge of the necessary to perform this experiments. Most of the times. structure and topology of these SNSs combined with each SNSs develops their own methods and algorithms for quantitative properties of the graph are used to develop the recommendation but the limited access to this method make recommendation system. The social network used, the Oro- it difficult a cross-validation. A direct consequence of this Aro, is smaller than the most of the popular social networks, fact is that there is no public data set prepared for such as Facebook, Orkut or Myspace. Besides being smaller experiments on the recommendation of relationships in size. it is also less accessed. hence the rate of 44% of user social networks response to the experiment. As expected, our algorithm was To solve the problem of lack of a common public data, w better than the fof another solution that is also based obtained through C.E. S.A. R the data of their social network, etwork topology. Despite the small difference between the Oro-Aro to test the proposed method. The experiment performances, we believe that the size and dynamics of the consisted of analyzing the algorithm being used by network plays role. The Oro-Aro network has only selected group of users. For each user we present a list of 634 users, and the average of the friend per users is only 8.1 recommendations of new relationships. Then, we evaluated Thus, we can assume that the FOF algorithm performs well in small networks; however the resulting list is probably used the algorithm FOF (friend-of-friend) for a subset of the small. In larger networks, like Facebook or Orkut in which users. Of the 655 users of the Oro-Aro 70 were selected. the average friend size exceeds 200, the FOF could not They satisfy the condition of having at least 13 relationships distinguish the best recommendations when there are small and have accessed the network in the last 45 days. The first differences in FOF criteria. The reason why our solution can condition is necessary because the algorithm uses the perform better in larger networks is because of its hybrid topology, making sure that the subnet of each user with a ature of taking in consideration three different indexes. himum size is relevant and that it display a pattern. The Since the FOF is used as part of our solution which implic 238
The weight combination solution is obtained by applying the genetic algorithm. Using this combination the initial set of candidate is ordered to produce the recommendation list. The filtering step can reduce the set of candidate based on the first index, avoiding measuring the second and third index for less relevant nodes. The indexes evaluated from nodes n1 to n2 are the same as n2 to n1, so reducing by half the time needed for the solution for the entire network. This does not mean that if n2 is recommended for n1, n1 will be recommended for n2, since each user has his own context with singular weight combination. In a applied situation, the recommendation can be previously generated, and the weight combination could be preserved for the user for a period of time and them recalculated, depending on increase of the friend network list of the user. IV. EXPERIMENTS Usually experiments to validate solutions in machine learning uses well established data sets. This data are normally divided in training, validation and test set which are prepared specifically for the purpose of evaluating the performance of the algorithm used to solve the problem. The procedures enable the comparison of different solutions obtained from different method applied in the same data set. However, the problem of relationship recommendation in social networks does not have a straightforward method to validate the results since the objective is to generate a list of recommendation that is dependable on the user. Complex networks have been shown to be a promising area for research in recent years. Several problems can be modeled in network structures. Despite all the progress, recommendation in complex networks has been a problem poorly explored. A validation technique to assess the quality of the proposed algorithm would be necessary to compare the recommended list of relationships, to a certain user, with the desired objective. For privacy reasons social networks normally do not provide or give access to information necessary to perform this experiments. Most of the times, each SNSs develops their own methods and algorithms for recommendation but the limited access to this method make it difficult a cross-validation. A direct consequence of this fact is that there is no public data set prepared for experiments on the recommendation of relationships in social networks. To solve the problem of lack of a common public data, we obtained through C.E.S.A.R the data of their social network, Oro-Aro to test the proposed method. The experiment consisted of analyzing the algorithm being used by a selected group of users. For each user we present a list of recommendations of new relationships. Then, we evaluated the acceptance of the recommendations. For comparison, we used the algorithm FOF (friend-of-friend) for a subset of the users. Of the 655 users of the Oro-Aro 70 were selected. They satisfy the condition of having at least 13 relationships and have accessed the network in the last 45 days. The first condition is necessary because the algorithm uses the topology, making sure that the subnet of each user with a minimum size is relevant and that it display a pattern. The second condition focuses on the users most likely to respond to the experiment. For each of the 70 selected user, it was generated a set of 10 recommendations, among which 20% using the FOF algorithm. However, before sending the recommended list of friends to the recipients, we still have to replace those that were already part of their friends list before the pre-processing network, i.e., the one-way relationships. This step conveys the first result. We call it the correct set of one-way relationships. Our proposed solution presents a rate of 20.83% while the FOF algorithm achieved 13.33%. Note that the low value for this rate does not imply a bad results since a person can link to another person without having a common interest. These are common cases of celebrity user such as movie actors. In the twitter site the best connect person does not necessary is connected to all the users that are connected to him. After this procedure, the recommendations are sent in the form of links to the profile of users recommended and instructions to accept or reject the recommendation. After a period of 10 days, 31 of 70 users used the recommendations. It was counted only the acceptance of recommendation without the need of the recommended user to have done the reverse action. Using our algorithm 77.69% of the recommendations were accepted while the FOF algorithm achieved 72.22% success. TABLE II EXPERIMENTAL RESULTS GBFRGA FOF correct set of one-way relationships 20,83% 13,33% Acceptance of the recommendation 77,69% 72,22% V. CONCLUSION AND FUTURE WORKS We have presented a friend recommendation system based on the topology of a social network. The knowledge of the structure and topology of these SNSs combined with quantitative properties of the graph are used to develop the recommendation system. The social network used, the OroAro, is smaller than the most of the popular social networks, such as Facebook, Orkut or Myspace. Besides being smaller in size, it is also less accessed, hence the rate of 44% of user response to the experiment. As expected, our algorithm was better than the FOF, another solution that is also based on network topology. Despite the small difference between the performances, we believe that the size and dynamics of the network plays a major role. The Oro-Aro network has only 634 users, and the average of the friend per users is only 8.1. Thus, we can assume that the FOF algorithm performs well in small networks; however the resulting list is probably small. In larger networks, like Facebook or Orkut in which the average friend size exceeds 200, the FOF could not distinguish the best recommendations when there are small differences in FOF criteria. The reason why our solution can perform better in larger networks is because of its hybrid nature of taking in consideration three different indexes. Since the FOF is used as part of our solution which implies 238
that in the worst case the algorithm have the same [9] R. Albert and AL Barabasi. 200 Topology of networks. performance, and that the combined clustering indexes with ts and universality. Phys. Rev. Let. 85. 5234-5237, 2000 the adaptive feature of the Genetic Algorithm is responsible [101 Goldberg. David E. Genetic Algorithms in Search Optimization and for the improved performance. The experiments results achine Learning. Addison Wesley (1989). showed to be very promising: however it is still necessary to lm coye to invite into your social network. Proc. IUI pp. 77-86. 200 9 Ronen L. and wilcox E. Do you know? Recommend apply this algorithm in networks much larger in size and activity. Several papers [1], [2], [3], [4] and [7) have [12 Chen, J, Geyer, W, Dugan, C. and Guy, I Make new friends,but investigated the topological structure of SNSs and others H,pp.201-210.2009 complex networks. However, we have showed that these analyses can be used to develop some practical applications [13] Liben-Nowell, D, and Kleinberg, J. The link pre social networks. Proceedings of the twelfth intemational in the field of recommendation system. on Information and knowledge management, pp 556-559. For future work, it is important to test the proposed [14] R Bell, Y Koren, C Volinsky(2008)."The BellKor solution to the mechanism more intensively in a larger network using Netflix Prize 2008 several test groups. In addition, this approach based on http://www.netflixprize.com/assets/progressprize2008_beLlkor.pdf network topology can be also used for other type of ecommendation networks system apart from SNSs. An example is the innumerous e-commerce systems that constitute bipartite networks composed by users and products and could be mapped as single entity graphs e would like to acknowledge the recife center for Advanced Studies and Systems(C.E.S. A R) for providing the Oro-Aro database In particular we would like to thanks Ricardo Araujo Costa for assistance in managing the database information. Also, FACEPE Fundacao de Amparo a Ciencia e Tecnologia do estado de Pernambuco, for financial support. [1] Mislove, A, Marcon, M, Gummadi, K. Networks. Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (San Diego, California, USA. October 24-26, 2007). IMC07. ACM Press, New York, NY, 29-42. 2] Ahn Y.Y. Han, S, Kwak, H, Moon, S. and Topological Characteristics of Huge Online Services. Proceeding of the 16th Intemationa Wide Web Conference, (Banff, Alberta, Canada, May 8-12, 2007).Www'07 ACM Press, New York, NY, 835-844. 2007. [3] Wilson, M, and Nicholas, C. Topological Analysis of an Online ocial Network for older Adults. Proceeding of the 2008 ACm October 30, 2008). SSM08. ACM Press. New York, NY, 51-58 M1008甲1B12时1时2 tz, S. D. An Introduction to Structural Analy ork Approach to Social Research. Toronto: Butterwort. 1982 6 Chau, D. H, Pandit, s. Wang, S, and Faloutsos. C, Parallel Crawling for Online Social Networks. Proceedings of the 16th international conference on World wide Web(Banff, Alberta Canada, May 8-12, 2007).Www'07. ACM Press, New York, NY, 1283-1284. 2007 [7 Kumar, R, Novak, J, and Tomkins, A. Structure and Evolution of 8] Karrer. B, Levina. E. and Newman. M. EJ. 2008. Robustness of ommunity structure in networks. Phys Rev. E77046119, 2008
that in the worst case the algorithm have the same performance, and that the combined clustering indexes with the adaptive feature of the Genetic Algorithm is responsible for the improved performance. The experiments results showed to be very promising; however it is still necessary to apply this algorithm in networks much larger in size and activity. Several papers [1], [2], [3], [4] and [7] have investigated the topological structure of SNSs and others complex networks. However, we have showed that these analyses can be used to develop some practical applications in the field of recommendation system. For future work, it is important to test the proposed mechanism more intensively in a larger network using several test groups. In addition, this approach based on network topology can be also used for other type of recommendation networks system apart from SNSs. An example is the innumerous e-commerce systems that constitute bipartite networks composed by users and products and could be mapped as single entity graphs. ACKNOWLEDGMENT We would like to acknowledge the Recife Center for Advanced Studies and Systems (C.E.S.A.R) for providing the Oro-Aro database. In particular we would like to thanks Ricardo Araújo Costa for assistance in managing the database information. Also, FACEPE – Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco, for financial support. REFERENCES [1] Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., and Bhattacherjee, B. Measurement and Analysis of Online Social Networks. Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. (San Diego, California, USA. October 24-26, 2007). IMC’07. ACM Press, New York, NY, 29 - 42. [2] Ahn,Y.Y., Han,S., Kwak, H., Moon, S., and Jeong, H. Analysis of Topological Characteristics of Huge Online Social Networking Services. Proceeding of the 16th International World Wide Web Conference, (Banff, Alberta, Canada, May 8-12, 2007).WWW '07. ACM Press, New York, NY, 835-844, 2007. [3] Wilson, M., and Nicholas, C. Topological Analysis of an Online Social Network for Older Adults. Proceeding of the 2008 ACM workshop on Search in Social Media. Napa Valley, California, USA, October 30, 2008). SSM’08. ACM Press, New York, NY, 51-58, 2008. [4] Mitchell, M. Complex Systems: Network Thinking. Artificial Intelligence, 170(18), pp. 1194-1212, 2006. [5] Berkowitz, S. D. An Introduction to Structural Analysis: The Network Approach to Social Research. Toronto: Butterwort, 1982 [6] Chau, D. H., Pandit, S., Wang, S., and Faloutsos, C., Parallel Crawling for Online Social Networks. Proceedings of the 16th international conference on World Wide Web. (Banff, Alberta, Canada, May 8-12, 2007).WWW '07. ACM Press, New York, NY, 1283 – 1284, 2007. [7] Kumar, R., Novak, J., and Tomkins, A. Structure and Evolution of Online Social Networks. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. (Philadelphia, PA, USA, August 20-23, 2006). KDD’06. ACM Press, New York, NY, 611 – 617. 2006. [8] Karrer, B., Levina, E., and Newman, M.E.J. 2008. Robustness of community structure in networks. Phys Rev. E 77 046119, 2008. [9] R. Albert and A.-L. Barabási. 200. Topology of complex networks: Local events and universality. Phys. Rev. Let. 85, 5234-5237, 2000. [10] Goldberg, David E. Genetic Algorithms in Search Optimization and Machine Learning. Addison Wesley (1989). [11] Guy, I., Ronen I., and Wilcox E. Do you know? Recommending people to invite into your social network. Proc. IUI pp. 77-86. 2009. [12] Chen, J., Geyer, W., Dugan, C., and Guy, I. Make new friends, but keep the old: recommending people on social networking sites. Proc. CHI, pp. 201-210. 2009. [13] Liben-Nowell, D., and Kleinberg, J. The link prediction problem for social networks. Proceedings of the twelfth international conference on Information and knowledge management, pp 556-559. 2003. [14] R. Bell, Y. Koren, C. Volinsky (2008). "The BellKor solution to the Netflix Prize 2008". http://www.netflixprize.com/assets/ProgressPrize2008_BellKor.pdf 239