2. Consistent Nonnegative Matrix Factorization for Tag Clustering

Figure 1. (a) Tripartite graph structure (b) Subject-centered model [node labels: User, Item, Tag, Subject; edge labels: UT, IT, UI, USCM, ISCM, STBM]

In order to overcome the above problems of associating each tag with a distinct topic under the tripartite structure, we propose to cluster tags to gain a more compact and informative representation of the underlying subjects that users actually intend with these tags. The common feature-based hard clustering algorithms (such as k-means) are not suitable for clustering tags for two reasons. First, the tags do not have a well-defined feature space associated with them. Second, many tags can be used to represent different subjects. In our work, we employ Nonnegative Matrix Factorization (NMF) to perform soft clustering on tags.
NMF has proven to be an effective technique for extracting hidden structures in data that can be represented as matrices, e.g., in image processing (Lee et al. 1999) and text mining (Lee et al. 1999; Xu et al. 2003). Given a nonnegative matrix $\mathbf{V}$, an NMF can be defined as $\mathbf{V} \approx \mathbf{W}\mathbf{H}$, where $\mathbf{W}$ and $\mathbf{H}$ are constrained to be nonnegative. The columns of $\mathbf{W}$ are often restricted to be unit length so as to make the factorization result unique. From the aspect of clustering, factor matrix $\mathbf{W}$ contains cluster centroids and $\mathbf{H}$ contains cluster membership indicators. For notational convenience, a variant of the standard NMF is adopted in this paper, that is, $\mathbf{V}^{T} \approx \mathbf{H}^{T}\mathbf{W}^{T}$, where each row of $\mathbf{W}^{T}$ indicates a cluster centroid and each row of $\mathbf{H}^{T}$ holds the cluster membership of this row's corresponding object.

Figure 2. (a) NMFs on user-tag and item-tag matrices (b) Consistent NMF on user-item-tag matrix [matrix labels: UT, IT, USCM, ISCM, STBM; dimension labels: User, Item, Tag, Subject]

As shown in Figure 2 (a), two types of NMF can be applied to find the cluster centroids of tags, with each centroid representing a subject. UT and IT represent the user-tag and item-tag co-occurrence matrices, respectively, with each entry being the frequency of the corresponding <user,tag> or <item,tag> pair in the training set. Each row of STBM indicates a clustered subject, and the entire matrix constitutes the bases of the subject space. USCM/ISCM holds the coordinate values of users/items in the subject space. Nevertheless, this NMF approach is only able to capture the cluster membership of one entity (either user or item) on the clustered subjects. We argue, however, that from both modeling and computational standpoints it is critical to consider these two types of cluster membership together.
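The single-matrix factorization described above can be illustrated with a minimal sketch. It factors a toy user-tag matrix UT into a user-by-subject matrix and a subject-by-tag matrix, matching the row-oriented variant used in the text. The solver (multiplicative updates for the squared-error objective), the helper names `matmul`, `transpose`, `frobenius_error` and `nmf`, and the toy counts are all assumptions made for illustration; the paper does not state which NMF solver it uses, and this is not the CONMF method.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(A, C, B):
    """Squared Frobenius norm of A - C B."""
    approx = matmul(C, B)
    return sum((a - p) ** 2
               for ra, rp in zip(A, approx) for a, p in zip(ra, rp))

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative n x m matrix A as C (n x k) times B (k x m).

    Assumption: multiplicative updates for squared error are used here;
    the solver choice is illustrative and not taken from the paper.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    # Strictly positive random start so the updates never divide by zero.
    C = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    B = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update the basis rows: B <- B * (C^T A) / (C^T C B)
        Ct = transpose(C)
        num, den = matmul(Ct, A), matmul(matmul(Ct, C), B)
        B = [[b * x / (y + eps) for b, x, y in zip(rb, rx, ry)]
             for rb, rx, ry in zip(B, num, den)]
        # Update the memberships: C <- C * (A B^T) / (C B B^T)
        Bt = transpose(B)
        num, den = matmul(A, Bt), matmul(C, matmul(B, Bt))
        C = [[c * x / (y + eps) for c, x, y in zip(rc, rx, ry)]
             for rc, rx, ry in zip(C, num, den)]
    return C, B

# Hypothetical user-tag counts: 4 users x 5 tags.
UT = [
    [3, 2, 0, 0, 1],
    [2, 3, 0, 0, 0],
    [0, 0, 4, 2, 1],
    [0, 0, 2, 3, 0],
]

# Rows of STBM play the role of subject centroids over tags;
# rows of USCM hold each user's soft membership in the subjects.
USCM, STBM = nmf(UT, k=2)
```

Calling the same routine on an item-tag matrix IT would return ISCM together with a subject basis of its own, computed independently of the one obtained from UT.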
To this end, we propose a new method called Consistent NMF (CONMF), as shown in Figure 2 (b), to capture both memberships and to obtain a consistent clustering result on tags. The computational definition of CONMF will be presented in Section 3.2. Here we provide the basic intuition. NMF takes one matrix as input and produces two
19th Workshop on Information Technologies and Systems