Learning V

The gradient vector and Hessian matrix of $V_{i*}$ can be computed as follows:

$$\frac{\partial L}{\partial V_{i*}^T} = -\frac{1}{\gamma} V_{i*}^T + \frac{1}{2} U^T \big[ A_{*i} - (Z_{*i} \circ S_{*i}) \big],$$

$$\frac{\partial^2 L}{\partial V_{i*}^T \partial V_{i*}} = -\frac{1}{\gamma} I - \frac{1}{4} \sum_{k, k \neq i} \Big\{ Z_{ki} S_{ki} (1 - S_{ki}) U_{k*}^T U_{k*} \Big\}.$$

Let $G_i = -\frac{1}{\gamma} I - \frac{1}{16} \sum_{k, k \neq i} \big\{ Z_{ki} U_{k*}^T U_{k*} \big\}$. We can prove that $\frac{\partial^2 L}{\partial V_{i*}^T \partial V_{i*}} \succeq G_i$.

Similar to the update rule for $U_{i*}$, we can obtain the update rule for $V_{i*}$ as follows:

$$V_{i*}(t+1) = V_{i*}(t) - \Big[ \frac{\partial L}{\partial V_{i*}^T}(t) \Big]^T \times G_i(t)^{-1},$$

where $V_{i*}(t)$ denotes the value from the previous iteration and $G_i(t)$ is computed with the updated parameters except for $V_{i*}$.

Learning $\mu$

Using learning techniques similar to those for U and V, we can obtain the update rule for $\mu$:

$$\mu(t+1) = \mu(t) + \frac{4 \big[ \sum_{k \neq i} (A_{ik} - Z_{ik} S_{ik}) - \tau \mu(t) \big]}{4\tau + \sum_{k \neq i} Z_{ik}}.$$
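To make these updates concrete, below is a minimal NumPy sketch of one MM step for $V_{i*}$ and for $\mu$. The link function is not restated in this excerpt, so the parameterization $S_{ki} = \sigma(U_{k*} V_{i*}^T + \mu)$ used here is an assumption, as are the function names and array conventions; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mm_step_V_row(i, A, Z, U, V, mu, gamma):
    """One MM step for V[i, :] (V_{i*} in the paper's notation).

    Assumption: S[k, i] = sigmoid(U[k] @ V[i] + mu); A and Z are N x N
    arrays, U and V are N x D factor matrices.
    """
    N, D = U.shape
    mask = np.arange(N) != i                    # the sums run over k != i
    S_col = sigmoid(U @ V[i] + mu)              # assumed S[:, i]

    # Gradient: -V[i]/gamma + (1/2) U^T [A[:, i] - Z[:, i] o S[:, i]]
    grad = -V[i] / gamma + 0.5 * U[mask].T @ (
        A[mask, i] - Z[mask, i] * S_col[mask])

    # Curvature bound G_i = -I/gamma - (1/16) sum_k Z[k, i] U[k]^T U[k];
    # it bounds the Hessian because S(1 - S) <= 1/4.
    G = -np.eye(D) / gamma - (U[mask].T * Z[mask, i]) @ U[mask] / 16.0

    # Update V[i] <- V[i] - grad G^{-1}; G is D x D, so the solve is O(D^3).
    return V[i] - np.linalg.solve(G, grad)

def mm_step_mu(i, A, Z, U, V, mu, tau):
    """MM update for mu, transcribed as printed (the sums range over
    k != i for a fixed node i in the excerpt)."""
    N = A.shape[0]
    mask = np.arange(N) != i
    S_row = sigmoid(V @ U[i] + mu)              # assumed S[i, :]
    num = 4.0 * (np.sum(A[i, mask] - Z[i, mask] * S_row[mask]) - tau * mu)
    den = 4.0 * tau + np.sum(Z[i, mask])
    return mu + num / den
```

Because $G_i$ lower-bounds the Hessian, the quadratic surrogate minorizes the objective, which yields the MM ascent guarantee discussed next.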
4.3 Convergence and Computational Complexity

With the MM algorithm, the learning procedure of GLFM is guaranteed to converge to a local maximum. The time complexity of computing the gradient and Hessian for node i is linear in the total number of ones in $Z_{*i}$ and $Z_{i*}$. In general, this number is O(1) because the observations in real networks are always very sparse. Furthermore, since $H_i$ and $G_i$ are D×D matrices, the computational complexity of inverting them is $O(D^3)$, and D is typically very small. Hence, updating the whole U and V requires only O(N) time.

5 Experiment

There exist many different SNA tasks, such as social position and role estimation [Wasserman and Faust, 1994], link prediction [Hoff, 2009], node classification [Li et al., 2009a; Li and Yeung, 2009; Li et al., 2009b], community detection [Yang et al., 2009a], and so on. In this paper, we adopt the same evaluation strategy as that in [Yang et al., 2009a; 2009b] for social community detection. The main reason for choosing this task is that our model formulation makes the difference between GLFM and other latent factor models easy to see. However, many other models from different research communities have also been proposed for SNA, and it is difficult to establish their connections to and differences from GLFM at the level of model formulation. Hence, we compare them through empirical evaluation. Most mainstream models have been compared in [Yang et al., 2009a; 2009b] for community detection, which provides a good platform for our empirical comparison.

For MLFM and GLFM, we adopt k-means to perform clustering based on the normalized latent representation U, where normalization means that the latent representation of each node is divided by its length. Because the magnitude of $U_{i*}$ reflects the activity of node i, we select the most active user as the first seed of k-means, and then repeatedly choose as the next seed the point whose summed distance to all existing seeds is largest. Hence, the initialization of k-means is fixed. We set Z = A. For fair comparison, the hyper-parameters in GLFM and in all the baselines, such as the τ in (7), are chosen from a wide range and the best results are reported. More specifically, for GLFM, τ is fixed to $10^6$, β and γ are set to 2, and D = 20.
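As a concrete illustration of this seeding scheme, here is a minimal sketch; it assumes U is the learned N×D factor matrix whose row norms serve as activity scores, and the function name and the use of scikit-learn's KMeans are ours, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_nodes(U, n_communities):
    """K-means on length-normalized latent factors with deterministic
    farthest-first seeding that starts from the most active node."""
    norms = np.linalg.norm(U, axis=1)
    X = U / np.maximum(norms, 1e-12)[:, None]   # divide each row by its length

    seeds = [int(np.argmax(norms))]             # most active user as first seed
    while len(seeds) < n_communities:
        # summed distance from every point to all existing seeds
        d = sum(np.linalg.norm(X - X[s], axis=1) for s in seeds)
        d[seeds] = -np.inf                      # never reselect an existing seed
        seeds.append(int(np.argmax(d)))

    km = KMeans(n_clusters=n_communities, init=X[seeds], n_init=1)
    return km.fit_predict(X)
```

Because the seeds are chosen deterministically, repeated runs produce identical clusterings, matching the fixed initialization described above.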
5.1 Data Sets and Evaluation Metric

As in [Yang et al., 2009a], we use two paper citation networks, the Cora and Citeseer data sets⁴, for evaluation. Both data sets contain content information in addition to the directed links.

The Cora data set contains 2708 research papers from 7 subfields of machine learning: case-based reasoning, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning, and theory. Each paper is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from a dictionary of 1433 unique words. There are 5429 citations (links) between the papers in total.

The Citeseer data set contains 3312 papers, which can be classified into 6 categories. Each paper is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from a dictionary of 3703 unique words. There are 4732 citations (links) between the papers in total; after deleting the self-links, we obtain 4715 links for our evaluation.

As in [Yang et al., 2009a], we use Normalized Mutual Information (NMI), Pairwise F-Measure (PWF), and Modularity (Modu) as metrics to measure the clustering accuracy of our model. For all the algorithms, we set the number of communities to the ground-truth number of class labels in the data.
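For reference, below is a small sketch of two of these metrics: NMI via scikit-learn, and a pairwise F-measure under the usual pairs-of-nodes definition (precision and recall over node pairs assigned to the same cluster), which we assume agrees with the definition used in [Yang et al., 2009a]; Modularity is omitted.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def pairwise_f_measure(labels_true, labels_pred):
    """F-measure over unordered node pairs: a pair counts as a true positive
    when it shares a class and a cluster. O(N^2), fine for these data sets."""
    y, c = np.asarray(labels_true), np.asarray(labels_pred)
    n = len(y)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_class, same_cluster = y[i] == y[j], c[i] == c[j]
            tp += same_class and same_cluster
            fp += (not same_class) and same_cluster
            fn += same_class and (not same_cluster)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Usage: nmi = normalized_mutual_info_score(y_true, y_pred)
#        pwf = pairwise_f_measure(y_true, y_pred)
```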
5.2 Baselines

We compare GLFM with the closely related method MLFM [Hoff, 2009]. The U and V in both MLFM and GLFM are initialized by principal component analysis (PCA) on the content information. In addition, we also adopt the methods introduced in [Yang et al., 2009a] and [Yang et al., 2009b] for comparison. These methods can be divided into three groups: link-based methods, content-based methods, and link+content-based methods. The link-based methods include PHITS [Cohn and Chang, 2000] and LDA-Link [Erosheva et al., 2004], an extension of latent Dirichlet allocation (LDA) for link

⁴The two data sets can be downloaded from http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html.