User Profiling System Using Social Networks for Recommendation

Hajime Hotta and Masafumi Hagiwara
Faculty of Science and Technology, Keio University
3-14-1, Kohoku, Yokohama, Kanagawa, Japan
hotta@soft.ics.keio.ac.jp, hagiwara@soft.ics.keio.ac.jp

Abstract: In this paper, we propose a user profiling system using social networks for recommendation. Recent developments in information technology have produced many community services. The social network service (SNS) is one of the community services on the World Wide Web. Meanwhile, recommendation algorithms have become one of the best-known topics because of their use on various kinds of web sites. The purpose of the proposed system is to recommend web contents to users of social network services. The main idea of the proposed system is the interpolation of profile data by the data of other users who are linked with the focused user. The proposed system is a contents delivery system that employs information filtering. It consists of the following two stages: (1) learning profile data; (2) filtering contents by profile data. In the first stage, the profile data are learned from the behavior of the user and the profile data of the other users. In the second stage, the learned profile data are used to filter the appropriate information.

I. INTRODUCTION

Recent developments in information technology have produced many community services [1]. The social network service (SNS) is one of the community services on the World Wide Web [2], [3]. In an SNS, a user can register other users as friends and communicate through virtual messages and diaries such as blogs. Popular SNS websites include "MySpace"¹, "Friendster"² and "Orkut"³. The number of MySpace users is over 75 million and is still increasing. The SNS has therefore become one of the most important communication infrastructures from the viewpoint of the communication network.
Meanwhile, recommendation algorithms have become one of the best-known topics because of their use on various kinds of web sites in recent years [4]. They use input about a user's interests to generate a list of recommended contents. Many recommendation systems use information that indicates customer interest (for example, clicks, views or purchases on e-commerce sites) [5]. They can also use other attributes, such as demographic data. Recommendation systems that operate on web sites face some technical challenges: (1) many users, especially new users, have extremely limited information; (2) the recommendation results must be returned in real time and with high quality.

¹ http://www.myspace.com
² http://www.friendster.com
³ http://www.orkut.com

[Fig. 1. An example of a social network model. Figure label: "link as a friend".]

In this paper, we propose a new user profiling system using social networks for recommendation on the web. The main idea of the proposed system is the interpolation of profile data by the data of other users who are linked with the focused user.

The proposed system is a contents delivery system that employs information filtering. When a user accesses the system, it responds with contents which the user may prefer. If the user is actually interested in a content item, he/she clicks its hyperlink to view the details. The click logs are then saved in the system as the user's behavior logs. An advertisement system is one example of this kind of contents delivery system.

This paper is organized as follows. Section II describes the proposed system, Section III presents the experiments and their results, and Section IV concludes the paper.

II. THE PROPOSED SYSTEM

A. Overview

A social network is a social structure made of nodes (which are generally individuals) that are tied by one or more types of relationship, such as friendship or web links [2]. Fig. 1 shows an example of the social network model.
Social network analysis views social relationships in terms of nodes and links.
[Fig. 2. An example of Social Network Services. Diagram labels: Interfaces, Logics, Databases; Access (user's request), Response, Click Data (if the user clicks the contents); 1. Get User Profile Data, 2. Get Web Contents, 3. Learning Profile Data; Profile Data, Contents, User Links.]

The proposed system is a contents delivery system that employs information filtering. When a user accesses the system, it sends contents data which the user may prefer. If the user is actually interested in a content item, he/she clicks its hyperlink to view the details. The click logs are then saved in the system as the user's behavior logs. An advertisement system is one example of this kind of contents delivery system.

Fig. 2 shows the structure of the proposed system. The system consists of three parts: interfaces, logics and databases. The system has the following three interfaces.

• Access
  When a user sends a request to a web service, the system receives the contents request together with a user ID such as an e-mail address or a subscriber ID number.
• Get Web Contents
  The system gets web contents from a database and sends the contents data to the user.
• Click Data
  When a user receives web contents through the proposed system, the user can click the hyperlinks to get more detail. If the user clicks a hyperlink, the system learns the profile data.

The following databases are employed in the proposed system.

• User profile data
  The structure of the user profile data is shown in Fig. 3. The user profile data consist of the following two parts: (1) Demographic Data; (2) Preference Table.
  (1) Demographic Data are the basic information about the user. In this study, the demographics are sex, age and place of residence.
  (2) The Preference Table consists of sets of tags and weights. Tags are keywords which express the categories of the user's preferences.
• Contents data
  Fig. 4 shows an example of the contents data. All contents are linked with related tags.
  The relationship between contents and tags is many-to-many.
• User Links
  User links are always defined in social network services. For example, Facebook⁴, one of the most famous services in the U.S., has released an open API (application programming interface) through which friend links can be obtained.

⁴ http://www.facebook.com

Using the databases and interfaces above, the system learns user profile data and delivers web contents to the user in the following two stages.

• 1. Learning profile data
  In the first stage, the profile data are learned from the behavior of the user and the profile data of the other users. As shown in logics-3 of Fig. 2, the user's behaviors (clicks) are learned in this stage.
• 2. Filtering contents by profile data
  In the second stage, the learned profile data are used to filter the appropriate information. The system shows contents selected by information filtering with the profile data. When an access from a user reaches the system, the system gets the user's profile data and the profile
  data of users who have friendship links with the focused user (logics-1 in Fig. 2). After this process, the system gets web contents by information filtering (logics-2 in Fig. 2).

The following subsections describe the details of the two stages above.

[Fig. 3. An example of user profile data.]
User Profiling Data
(1) Demographic Data
    Sex: male
    Age: 24
    Country: Japan
(2) Preference Table
    Tag          Weight
    jazz music   0.9
    rock music   0.3

[Fig. 4. An example of contents data which are linked with tags. Tags: rock, female, hip hop. Contents: New album of Avril Lavigne; Red Hot Chili Peppers Official Web; Destiny's Child Live; Linkin Park & Jay Z Collaboration; EMINEM new DVD video release!]

[Fig. 5. A flow of filtering contents. Steps shown: Getting Preference Table of the user; Getting Preference Table of the linked user; Generating assembled Preference Table; Roughly filtering contents by demographic data; Selecting contents by the assembled Preference Table; Contents.]

B. Learning profile data

In the first stage, the system requires the user's behavior logs. The output of the stage is the preference tables. The tags which are attached to the contents, the preference weights and the confidence coefficients are saved in the preference tables.

Tags are labels of the contents, such as finance, cosmetics and games. The preference weight (w) is the predicted strength of the user's preference for the category, and the confidence coefficient (c) is the confidence of that prediction.

Suppose that $N_{view}$ is the number of view logs of the contents which are linked with the focused tag and $N_{click}$ is the number of click logs. The weight w and the confidence coefficient c are calculated by the following equations.

$$w = \left( \frac{N_{click}}{N_{view}} \right)^{\alpha} \qquad (1)$$

$$c = N_{view}^{\beta} \qquad (2)$$

where α and β are constants. The weight w is a power of the CTR (click-through rate), which is one of the general parameters used in analyzing log data [6].
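As a minimal sketch of Eqs. (1) and (2), the two quantities can be computed per tag as below. The function names, the example values of α and β, and the handling of a tag with no views are illustrative assumptions; the paper does not specify them.

```python
def preference_weight(n_click: float, n_view: float, alpha: float) -> float:
    """Eq. (1): w = (N_click / N_view) ** alpha, a power of the CTR."""
    if n_view <= 0:
        # Assumption: a tag that was never viewed carries no preference evidence.
        return 0.0
    return (n_click / n_view) ** alpha


def confidence(n_view: float, beta: float) -> float:
    """Eq. (2): c = N_view ** beta, growing with the amount of evidence."""
    return n_view ** beta


# Hypothetical example: 25 clicks out of 100 views, alpha = beta = 0.5.
w = preference_weight(25, 100, alpha=0.5)
c = confidence(100, beta=0.5)
print(w, c)
```

With these assumed constants, a tag viewed more often receives a larger confidence coefficient even when its CTR is unchanged, which is the role c plays when the preference tables are combined later.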
In order to extend $N_{view}$ and $N_{click}$ to express "time decay", they are calculated by the following equations.

$$N_{view} = \sum_{\text{all view logs}} \exp\left( -\frac{t_{now} - t_{logged}}{T} \right) \qquad (3)$$

$$N_{click} = \sum_{\text{all click logs}} \exp\left( -\frac{t_{now} - t_{logged}}{T} \right) \qquad (4)$$

where T is a constant, $t_{now}$ is the time stamp of the calculation and $t_{logged}$ is the time stamp of the logging. With these equations, $N_{view}$ and $N_{click}$ can be updated incrementally without referring to past logs: when time advances by Δt, every term is multiplied by the same factor exp(−Δt/T), so it suffices to scale the stored sum by that factor and add 1 for each new log.

The weight and the confidence coefficient are recalculated for all tags at fixed intervals.

C. Filtering contents by the profile data

Fig. 5 illustrates the flow of filtering contents by the profile data. This stage consists of the following five modules.

• Getting Preference Table of the user
  When a user accesses the system, the preference table of the user is referred to.
• Getting Preference Table of the linked users
  The preference tables of the users who have friendship links with the user are referred to.
• Generating assembled Preference Table
  The preference table of the user and those of the linked users are assembled into a new preference table, which is used in the contents filtering.
• Roughly filtering contents by demographic data
  Contents are filtered by demographic data such as sex. For example, cosmetic advertisements should not be delivered to male users.
• Selecting contents by the assembled Preference Table
  Contents are selected by the tag information and the assembled preference table.

[Fig. 6. An example of the contents-tags matrix.]
               Tag 1  Tag 2  ...
Contents I       0      0     1   0   1  ...
Contents II      1      0     0   0   1  ...
Contents III     0      1     0   0   0  ...
Contents IV      0      0     1   1   1  ...

D. Assembled Preference Table

To assemble the preference table of the focused user and the preference tables of the users linked with the focused user, the system needs weight data for the relationships among users.

In the field of network analysis [7], there are two kinds of evaluation methods for the similarity of nodes: cohesion and structural equivalence. In general, structural equivalence is more effective than cohesion, so in this method the relationship weight between users A and B is calculated by the following equation.

$$rel(A, B) = \frac{N(A \cap B)}{\min(N(A), N(B))} \qquad (5)$$

where N(X) is the number of users who are linked with the focused user X and N(A ∩ B) is the number of users who are linked with both A and B. These relationship weights are prepared when the system is used.

The preference tables are assembled by the following equation.

$$\begin{pmatrix} w_1 \\ w_2 \\ \vdots \end{pmatrix} = \begin{pmatrix} w_{(1,self)} c_{(1,self)} & w_{(1,A)} c_{(1,A)} & \cdots \\ w_{(2,self)} c_{(2,self)} & w_{(2,A)} c_{(2,A)} & \cdots \\ w_{(3,self)} c_{(3,self)} & w_{(3,A)} c_{(3,A)} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} \gamma \\ rel(self, A) \\ rel(self, B) \\ \vdots \end{pmatrix} \qquad (6)$$

where $w_x$ is the assembled weight for tag x, $w_{(x,Y)}$ is the weight of tag x for user Y, $c_{(x,Y)}$ is the corresponding confidence coefficient and γ is a constant.

[Fig. 7. Distribution of the user. Axes: Users (0 to 450); Number of friends (0 to 16).]

E. Contents Selection

In selecting contents, stochastic selection using the roulette rule is employed. The selection weights for the roulette selection are calculated by the following equation.

$$\begin{pmatrix} p_{I} \\ p_{II} \\ \vdots \end{pmatrix} = M \begin{pmatrix} w_1 \\ w_2 \\ \vdots \end{pmatrix} \qquad (7)$$

where $p_X$ is the selection weight of contents X for the roulette selection and M is the contents-tags matrix shown in Fig. 6.

III. EXPERIMENT

A. Implementation

To evaluate the proposed system, we implemented it as an internet advertisement system. The contents for recommendation are advertisements. Several advertisement programs have been released recently. We use a Japanese affiliate program called Pocket Affiliate⁵ as the contents data.

⁵ http://smaf.jp

B. Evaluation

The system ran for 10 days as an internet advertisement system in a sample SNS. The total number of users is 949, and the distribution of the number of friends is shown in Fig. 7. The number of contents is 150 and the number of tags is 32.

Fig. 8 shows the CTR (click-through rate) of the system. As shown in this figure, the proposed system worked efficiently, especially for users who have many friends. For a user who has no friends, the system can be regarded as a recommendation system without a social network, so this comparison suggests that the recommendation algorithm works more efficiently when the social network is used.
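The selection procedure of Sections II-D and II-E can be illustrated with a short Python sketch of Eqs. (5) to (7) followed by a roulette draw. It is a sketch under stated assumptions, not the authors' implementation: the data layout (dictionaries of friend sets and preference tables), all names, the value of γ, and the fallback to a uniform draw when every weight is zero are hypothetical choices made for the example.

```python
import random


def rel(friends, a, b):
    """Eq. (5): users linked with both a and b, over the smaller friend count."""
    fa, fb = friends[a], friends[b]
    smaller = min(len(fa), len(fb))
    if smaller == 0:
        return 0.0  # assumption: an isolated user has no relationship weight
    return len(fa & fb) / smaller


def assemble(tables, friends, user, gamma):
    """Eq. (6): tables[u][tag] = (w, c); returns the assembled weight per tag."""
    assembled = {}
    for tag, (w, c) in tables.get(user, {}).items():
        assembled[tag] = gamma * w * c            # the user's own column
    for friend in friends[user]:
        r = rel(friends, user, friend)
        for tag, (w, c) in tables.get(friend, {}).items():
            assembled[tag] = assembled.get(tag, 0.0) + r * w * c
    return assembled


def selection_weights(content_tags, assembled):
    """Eq. (7): p = M w, with M stored as content -> set of linked tags."""
    return {content: sum(assembled.get(tag, 0.0) for tag in tags)
            for content, tags in content_tags.items()}


def roulette(weights, rng):
    """Draw one content with probability proportional to its weight."""
    contents = list(weights)
    values = [weights[c] for c in contents]
    if sum(values) <= 0:
        return rng.choice(contents)  # assumption: uniform draw without evidence
    return rng.choices(contents, weights=values, k=1)[0]


# Hypothetical toy data: three mutually linked users and three contents.
friends = {"self": {"A", "B"}, "A": {"self", "B"}, "B": {"self", "A"}}
tables = {"self": {"jazz": (0.5, 2.0)},
          "A": {"rock": (0.5, 4.0)},
          "B": {"jazz": (1.0, 2.0)}}
content_tags = {"I": {"jazz"}, "II": {"rock"}, "III": {"jazz", "rock"}}

p = selection_weights(content_tags, assemble(tables, friends, "self", gamma=1.0))
print(p, roulette(p, random.Random(0)))
```

In this toy example the focused user has no "rock" entry of their own, yet contents tagged "rock" still receive a nonzero selection weight through the linked user A, which is the interpolation effect the system relies on.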
[Fig. 8. An example of the contents-tags matrix. Axes: CTR Average [%]; Number of friends (groups 0-1, 2-4, 5-9, 10-14, 15-19, 20-).]

IV. CONCLUSION

In this study, we proposed a user profiling system using social networks for recommendation. The proposed system is a contents delivery system that employs information filtering. It consists of the following two stages: (1) learning profile data; (2) filtering contents by profile data. The experimental results show that the proposed system works better than the algorithm without a social network.

REFERENCES

[1] H. Rollett, M. Lux, M. Strohmaier, G. Dosinger, and K. Tochtermann, "The web 2.0 way of learning with technologies," International Journal of Learning Technology, 3(1):87-107, November 2007.
[2] B. Wellman, "For a social network analysis of computer networks: A sociological perspective on collaborative work and virtual community," Proceedings of the ACM SIGCPR/SIGMIS Conference, Denver, USA, 1996, 11-13.
[3] J. Guare, "Six degrees of separation: A play," Vintage, New York.
[4] J. Herlocker and J. Konstan, "Content-independent task-focused recommendation," IEEE Internet Computing, 5(6), 2001, 40-47.
[5] R. F. Wilson, "Preparing a Customer Profile for Your Internet Marketing Plan," Issue 76, 2000.
[6] R. Briggs and N. Hollis, "Advertising on the Web: Is There Response before Click-Through," Journal of Advertising Research, Vol. 37, 1997.
[7] R. Albert and A.-L. Barabasi, "Statistical mechanics of complex networks," Reviews of Modern Physics 74, pp. 47-97 (2002).