matrices whose product approximates the input matrix. CONMF takes two matrices as input and produces two correlated factorizations to approximate them. The imposed correlation guarantees the consistency needed to reinterpret the two resulting factorizations in a meaningful manner in the tagging domain. Based on the CONMF results, we are able to build a computational model for tagging data, as shown in Figure 1(b), where USCM, ISCM and STBM denote the user-subject, item-subject and tag-subject interrelationships, respectively. We call this model a "subject-centered" model. It can be viewed as a generalization of the tripartite graph: if no tag clustering is performed, the subjects are an identity mapping of the tags and the subject-centered model degenerates into the tripartite graph. In our subject-centered model, the subject becomes the driving force motivating a user to save an item and assign tags. In other words, a user bookmarks an item because the item falls into her subject interests, and she assigns a tag because the tag describes her subject interest in the item being saved. The solid edges in Figure 1(b) indicate the presumed structure of tagging data, whereas the dashed edges indicate the real-world observations of users' tagging behavior. Since the clustered subjects are represented by weighted tag vectors, the subject-centered model can help solve some of the key problems facing the tripartite graph-based method. More specifically, (i) the range of subjects is meaningfully reduced; (ii) tag semantic ambiguity is lessened; (iii) synonymous tags are grouped together to describe the same subject; and (iv) the noise from meaningless tags is discounted.

3. Subject-based Recommendation

In this section, we present how to make recommendations based on the subject-centered model.
We first extract the hidden subjects from users' tagging activities via CONMF, and then estimate the probabilities of items being saved by users based on the constructed subject-centered model.

3.1 Notation

We view users, items, tags, and subjects as four random variables and denote them by U, I, T, S, respectively. The joint distribution of any two random variables X and Y is denoted as F(X, Y). The co-occurrence matrix of two entities is represented by the combination of their corresponding letters; for example, UT stands for the user-tag co-occurrence matrix. We use different subscripts to distinguish different types of matrices. Subscript "PM" indicates a transition probability matrix (e.g., USPM stands for the probabilities of users getting interested in different subjects); subscript "BM" indicates a basis matrix (e.g., STBM stands for the basis matrix of the subject space); and subscript "CM" represents a coordinate matrix (e.g., USCM holds the coordinate values of users in the subject space). In addition, the symbol furl(M) / furs(M) means normalizing matrix M to unit row length / unit row sum.

3.2 Subject Extraction

CONMF is employed to discover the subjects hidden in tagging data. In Figure 2(b), the weights for matrices UT and IT are assumed to be equal, although different weights can be used as well. In our approach, UT and IT are normalized to unit row length so that all users are equally weighted within UT and all items are equally weighted within IT. We choose these weights such that the sum of the weights for UT and IT is 2. CONMF can then be formally written as:

    [ c · furl(UT) ; (2 − c) · furl(IT) ] ≈ [ USCM ; ISCM ] · STBM        (1)

where c (0 < c < 2) reflects the tradeoff between the contributions of UT and IT to the extracted subjects. In fact, combining UT and IT for factorization not only guarantees a consistent clustering result on tags, but also enables the clustered subjects to aggregate information from both matrices. In our experiments, CONMF is implemented with the Matlab library function nnmf.
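The factorization in Eq. (1) can be sketched as follows. The paper's actual solver is Matlab's nnmf; here plain multiplicative-update NMF in NumPy serves as a generic stand-in, and the matrix shapes, k, c, and iteration count are illustrative assumptions rather than values from the paper. The key point is that the row-normalized UT and IT are stacked and factorized jointly, so both coordinate matrices share one basis STBM:

```python
import numpy as np

def unit_row_length(M):
    """furl(M): normalize each row of M to unit Euclidean length."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return M / norms

def conmf(UT, IT, k, c=1.0, iters=500, seed=0):
    """Sketch of Eq. (1):
        [c * furl(UT); (2 - c) * furl(IT)] ~= [USCM; ISCM] * STBM
    Because USCM and ISCM share the single basis STBM, the tag
    clustering into k subjects is consistent across both matrices."""
    X = np.vstack([c * unit_row_length(UT),
                   (2.0 - c) * unit_row_length(IT)])
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-4   # stacked coordinates [USCM; ISCM]
    H = rng.random((k, X.shape[1])) + 1e-4   # shared subject basis STBM
    eps = 1e-12
    for _ in range(iters):                   # Lee-Seung multiplicative updates
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    # Split the stacked coordinates back into user and item parts.
    return W[:UT.shape[0]], W[UT.shape[0]:], H   # USCM, ISCM, STBM
```

Given small user-tag and item-tag count matrices, this returns nonnegative USCM and ISCM, whose rows are user and item coordinates in the k-dimensional subject space, and STBM, whose rows are the weighted tag vectors representing the extracted subjects.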
The desired number of subjects, k, is selected manually beforehand, and the optimal value of this parameter depends on the dataset. However, according to our experiments, quite stable results can be obtained over a large range of k (e.g., 50-250).

19th Workshop on Information Technologies and Systems