3.3 Recommendation Algorithm

The basic intuition of subject-based recommendation is that a user's interest in an item can be expressed through subjects. The probability of a user visiting an item is given by

p(i|u) = ∑_s p(s|u) · p(i|s),

where p(s|u) indicates the probability of user u's interest in subject s and p(i|s) represents the probability of saving item i when users are interested in subject s. This formula can be rewritten in matrix form as

UI_PM = US_PM · SI_PM.

According to the above formula, estimating US_PM and SI_PM becomes another crucial step of our approach besides CONMF.
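As a concrete illustration, the estimation of US_PM and SI_PM and the final matrix product UI_PM can be sketched in NumPy. This is a minimal sketch on hypothetical toy matrices, not the paper's data; f_urs denotes the unit-row-sum normalization used throughout this section, and SI_PM follows equation (2) below.

```python
import numpy as np

def f_urs(m):
    """Unit-row-sum normalization: scale each row to sum to 1."""
    s = m.sum(axis=1, keepdims=True)
    return m / np.where(s == 0, 1, s)

# Hypothetical toy data: 2 users x 3 subjects, 4 items x 3 subjects.
US_CM = np.array([[2.0, 1.0, 1.0],
                  [0.0, 3.0, 1.0]])
IS_CM = np.array([[1.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0],
                  [0.0, 2.0, 2.0],
                  [1.0, 1.0, 1.0]])
item_freq = np.array([4.0, 2.0, 3.0, 1.0])  # how often each item was saved

US_PM = f_urs(US_CM)                                  # p(s|u)
SI_PM = f_urs((np.diag(item_freq) @ f_urs(IS_CM)).T)  # equation (2): p(i|s)
UI_PM = US_PM @ SI_PM                                 # p(i|u)

# Each row of UI_PM is a probability distribution over items.
assert np.allclose(UI_PM.sum(axis=1), 1.0)

# Rank the highest-probability items for user 0, skipping items she has
# already bookmarked (here we assume items 0 and 2).
bookmarked = {0, 2}
ranked = [i for i in np.argsort(-UI_PM[0]) if i not in bookmarked]
```

Because both factors are row-stochastic, each row of UI_PM is itself a distribution, so the recommendation step reduces to a per-user sort.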
Since we already have the coordinate values of users in the subject space, US_PM can be computed by US_PM = f_urs(US_CM), i.e., the probability that a user is interested in a subject is proportional to her coordinate value on this subject. Likewise, IS_PM can be calculated based on IS_CM. However, simply transposing IS_PM will not generate SI_PM. Since IS_PM and SI_PM are actually marginal probability matrices of the joint distribution F(I, S) on items and subjects respectively, we compute the joint distribution matrix F(I, S) from IS_PM and then estimate SI_PM based on F(I, S). The details of computing SI_PM are illustrated in Figure 3:

1. Compute IS_PM by normalizing IS_CM to unit row sum.
2. Multiply each row of IS_PM by the frequency of this item being saved in the training set, which is equivalent to multiplying the diagonal matrix of items' frequencies diag(I) by IS_PM.
3. Transpose IS so as to get SI.
4. Obtain SI_PM by normalizing SI to unit row sum.

[Figure 3. Estimation of transition probability matrix SI_PM]

In short, SI_PM is calculated as

SI_PM = f_urs((diag(I) · f_urs(IS_CM))^T).    (2)

Finally, we can compute UI_PM based on US_PM and SI_PM and then recommend the items with the highest probabilities to each user (if the items have not been bookmarked before).

3.4 Tag Generalization

As discussed in Section 1, some users tend to use tags that are meaningless to other users. These tags are noise disturbing the matrices UT and IT. Since UT records the tag frequencies of individual users, it contains considerable noise. However, IT is the aggregation of all users' tagging behavior on items, so it is much more resistant to noisy tags. Because of this difference in noise ratio between UT and IT, combining them directly for factorization may lead to poor results. To generate a more reliable UT, one possible approach (not without loss of user-specific information) is as follows: for a user and an item bookmarked by this user, we reset her tag vector to the corresponding tag vector from IT, reflecting crowd wisdom. In matrix form, UT = UI · f_urs(IT).
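A minimal sketch of this step, on hypothetical toy matrices rather than the paper's data, with f_urs again denoting unit-row-sum normalization:

```python
import numpy as np

def f_urs(m):
    """Unit-row-sum normalization: scale each row to sum to 1."""
    s = m.sum(axis=1, keepdims=True)
    return m / np.where(s == 0, 1, s)

# Hypothetical toy data: 2 users x 3 items (bookmark indicators) and
# 3 items x 4 tags (tag counts aggregated over all users).
UI = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0]])
IT = np.array([[8.0, 2.0, 0.0, 0.0],
               [0.0, 0.0, 5.0, 5.0],
               [1.0, 1.0, 1.0, 1.0]])

# Replace each user's (noisy) personal tag vector by the sum of the
# normalized tag profiles of the items she bookmarked: UT = UI . f_urs(IT).
UT = UI @ f_urs(IT)
```

Because every row of f_urs(IT) sums to 1, each row of the regenerated UT sums to the number of items the user bookmarked; the user's tag profile is now entirely crowd-derived.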
We call this idea tag generalization and have tested its effectiveness in Section 4.

4. An Empirical Study

The datasets used in our study were crawled from Delicious, the largest social bookmarking site, from June 2008 to December 2008. Two raw datasets, each consisting of the bookmarking data of 5,000 users, were used for our reported experiments. To reduce the size of the raw data, we removed users that had posted fewer than 15 URLs and items that had been saved by fewer than 15 users. To create an experimental testbed that can highlight differences between recommendation algorithms, we also removed the URLs that had been bookmarked by more than 75 users, which accounted for about 1-2% of all URLs. This removal

19th Workshop on Information Technologies and Systems
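The preprocessing filters described above can be sketched as follows. The helper name and the toy data are ours, and this is a single-pass sketch; it ignores that removing sparse users can in turn push some items below the item threshold.

```python
from collections import Counter

def filter_bookmarks(bookmarks, min_user=15, min_item=15, max_item=75):
    """One filtering pass over (user, url) pairs: drop users with fewer
    than min_user URLs, items saved by fewer than min_item users, and
    over-popular URLs saved by more than max_item users."""
    user_cnt = Counter(u for u, _ in bookmarks)
    item_cnt = Counter(i for _, i in bookmarks)
    return [(u, i) for u, i in bookmarks
            if user_cnt[u] >= min_user and min_item <= item_cnt[i] <= max_item]

# Tiny illustration with lowered thresholds (hypothetical data):
# bob and carol have too few bookmarks and are dropped.
data = [("alice", "u1"), ("alice", "u2"), ("bob", "u1"), ("carol", "u3")]
kept = filter_bookmarks(data, min_user=2, min_item=1, max_item=2)
```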