Generalized Latent Factor Models for Social Network Analysis

Wu-Jun Li†, Dit-Yan Yeung‡, Zhihua Zhang§

† Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
‡ Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
§ College of Computer Science and Technology, Zhejiang University, China
liwujun@cs.sjtu.edu.cn, dyyeung@cse.ust.hk, zhzhang@cs.zju.edu.cn

Abstract

Homophily and stochastic equivalence are two primary features of interest in social networks. The multiplicative latent factor model (MLFM) was recently proposed to model social networks with directed links. Although MLFM can capture stochastic equivalence, it cannot model homophily well. However, many real-world networks exhibit homophily, or both homophily and stochastic equivalence, and hence the structure of these networks cannot be modeled well by MLFM. In this paper, we propose a novel model, called the generalized latent factor model (GLFM), for social network analysis by enhancing the homophily modeling in MLFM. We devise a minorization-maximization (MM) algorithm with linear-time complexity and a convergence guarantee to learn the model parameters. Extensive experiments on some real-world networks show that GLFM can effectively model homophily and thereby dramatically outperform state-of-the-art methods.

1 Introduction

A social network¹ [Wasserman and Faust, 1994] is often represented as a graph in which the nodes represent objects and the edges (also called links) represent binary relations between objects. The edges in a graph can be directed or undirected. If the edges are directed, we call the graph a directed graph; otherwise, it is an undirected graph. Unless otherwise stated, we focus on directed graphs in this paper because an undirected edge can be represented by two directed edges with opposite directions. Typical networks include friendship networks among people, web graphs, and paper citation networks.

¹ In this paper, we use the terms 'network', 'social network' and 'graph' interchangeably.

As pointed out by [Hoff, 2008], homophily and stochastic equivalence are two primary features of interest in social networks. If an edge is more likely to exist between two nodes with similar characteristics than between nodes with different characteristics, we say the graph exhibits homophily. For example, two individuals are more likely to be friends if they share common interests; hence, a friendship graph exhibits homophily. On the other hand, if the nodes of a graph can be divided into groups where members within a group have similar patterns of links, we say the graph exhibits stochastic equivalence. The web graph has this feature: some nodes can be described as hubs that connect to many other nodes called authorities, but hubs (or authorities) are seldom connected among themselves. For stochastic equivalence, the property that members within a group have similar patterns of links also implies that if two nodes link to, or are linked by, one common node, the two nodes most likely belong to the same group.

Examples of homophily and stochastic equivalence in directed graphs are illustrated in Figure 1, where the locations in the 2-dimensional space denote the characteristics of the points (nodes).
From Figure 1(a), we can see that a link is more likely to exist between two points that are close to each other, which is the property of homophily. In Figure 1(b), the points form three groups indicated by different colors, and the nodes in each group share similar patterns of links to nodes in other groups, but the nodes in the same group are not necessarily connected to each other. This is the property of stochastic equivalence. Note that in a graph exhibiting stochastic equivalence, two points that are close to each other are not necessarily connected, and connected points are not necessarily close to each other, which differs from the property of homophily.

Figure 1: Homophily and stochastic equivalence in networks. (a) homophily; (b) stochastic equivalence.
As social network analysis (SNA) is becoming more and more important in a wide range of applications, many SNA models have been proposed [Goldenberg et al., 2009]. In this paper, we focus on latent variable models [Bartholomew and Knott, 1999], which have been successfully applied to model social networks [Hoff, 2008; Nowicki and Snijders, 2001; Hoff et al., 2002; Kemp et al., 2006; Airoldi et al., 2008; Hoff, 2009]. These models include the latent class model [Nowicki and Snijders, 2001] and its extensions [Kemp et al., 2006; Airoldi et al., 2008], the latent distance model [Hoff et al., 2002], the latent eigenmodel [Hoff, 2008], and the multiplicative latent factor model (MLFM) [Hoff, 2009]. Among these models, the recently proposed latent eigenmodel, which includes both the latent class model and the latent distance model as special cases, can capture both homophily and stochastic equivalence in networks. However, it can only model undirected graphs. MLFM [Hoff, 2009] adapts the latent eigenmodel for directed graphs. However, as will be shown in our experiments, it actually cannot model homophily well.

In this paper, we propose a novel model, called the generalized latent factor model (GLFM), for social network analysis by enhancing the homophily modeling in MLFM. The learning algorithm of GLFM is guaranteed to converge to a local optimum and has linear-time complexity. Hence, GLFM can be used to model large-scale graphs. Extensive experiments on community detection in some real-world networks show that GLFM dramatically outperforms existing methods.

2 Notation and Definition

We use boldface uppercase letters, such as K, to denote matrices, and boldface lowercase letters, such as x, to denote vectors. The ith row and the jth column of a matrix K are denoted by $K_{i*}$ and $K_{*j}$, respectively. $K_{ij}$ denotes the element at the ith row and jth column of K, and $x_i$ denotes the ith element of x. We use $\operatorname{tr}(K)$ to denote the trace of K, $K^T$ its transpose, and $K^{-1}$ its inverse. $\|\cdot\|$ denotes the length of a vector, and $|\cdot|$ denotes the cardinality of a set. I denotes the identity matrix, whose dimensionality depends on the context. For a matrix K, $K \succeq 0$ means that K is positive semi-definite (psd) and $K \succ 0$ means that K is positive definite (pd); $K \succeq M$ means $K - M \succeq 0$. $\mathcal{N}(\cdot)$ denotes the normal distribution, for either scalars or vectors. $\circ$ denotes the Hadamard (element-wise) product.

Let N denote the number of nodes in a graph, and let A be the adjacency (link) matrix for the N nodes: $A_{ij} = 1$ if there exists a link from node i to node j, and $A_{ij} = 0$ otherwise. D denotes the number of latent factors. In real-world networks, $A_{ij} = 1$ indicates a relation from i to j, but $A_{ij} = 0$ does not necessarily mean that there is no relation from i to j; in most cases, it means that the relationship from i to j is missing. Hence, we use an indicator matrix Z to indicate whether or not an element is missing: $Z_{ij} = 1$ means that $A_{ij}$ is observed, while $Z_{ij} = 0$ means that $A_{ij}$ is missing.

3 Multiplicative Latent Factor Models

The latent eigenmodel is formulated as follows²:

$$\Theta_{ik} = \operatorname{log\,odds}(A_{ik} = 1 \mid X_{i*}, X_{k*}, \mu) = \mu + X_{i*} \Lambda X_{k*}^T,$$

where X is an N × D matrix with $X_{i*}$ denoting the latent representation of node i, µ is a parameter reflecting the overall density of the links in the network, and Λ is a D × D diagonal matrix whose diagonal entries can be either positive or negative. The latent eigenmodel generalizes both latent class models and latent distance models.
It can model both homophily and stochastic equivalence in undirected graphs [Hoff, 2008].

To adapt the latent eigenmodel for directed graphs, MLFM defines

$$\Theta_{ik} = \mu + X_{i*} \Lambda W_{k*}^T, \qquad (1)$$

where X and W have orthonormal columns. Note that the key difference between the latent eigenmodel and MLFM is that MLFM adopts a separate receiver factor matrix W, which enables MLFM to model directed (asymmetric) graphs. As we will show in our experiments, this modification makes MLFM fail to model homophily in networks.

Letting $\Theta = [\Theta_{ik}]_{i,k=1}^{N}$, we can rewrite MLFM as follows:

$$\Theta = \mu E + X \Lambda W^T, \qquad (2)$$

where E is an N × N matrix with all entries equal to 1. We can see that MLFM is a special case of the following model:

$$\Theta = \mu E + U V^T. \qquad (3)$$

For example, we can recover MLFM by setting U = X and V = WΛ. Furthermore, it is easy to compute the X, W and Λ in (2) from the learned U and V in (3). Hence, in the sequel, MLFM refers to the model in (3).

4 Generalized Latent Factor Models

As discussed above, MLFM can capture stochastic equivalence but cannot model homophily well in directed graphs. Here, we propose our GLFM to enhance the homophily modeling in MLFM.

4.1 Model

In GLFM, $\Theta_{ik}$ is defined as follows:

$$\Theta_{ik} = \mu + \frac{1}{2} U_{i*} U_{k*}^T + \frac{1}{2} U_{i*} V_{k*}^T. \qquad (4)$$

Comparing (4) to (3), we can see that GLFM generalizes MLFM by adding an extra term $U_{i*} U_{k*}^T$.³ It is this extra term that enables GLFM to model homophily in networks, which will be detailed in Section 4.2 when we analyze the objective function in (7), and also demonstrated empirically in our experiments.

² Note that in this paper, we assume for simplicity that there is no attribute information for the links. It is straightforward to integrate attribute information into the existing latent variable models as well as our proposed model.

³ Note that the coefficient 1/2 in (4) makes no essential difference between (4) and (3); it is only for convenience of computation.
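To make the contrast between (3) and (4) concrete, the following toy sketch (our own illustrative NumPy code, with randomly initialized factors standing in for learned ones) computes the log-odds matrices of MLFM and GLFM side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 2                      # toy network: 5 nodes, 2 latent factors
mu = -1.0                        # overall link-density parameter
U = rng.normal(size=(N, D))      # "characteristic" (sender) factors
V = rng.normal(size=(N, D))      # receiver factors

# MLFM, Eq. (3): Theta = mu * E + U V^T
theta_mlfm = mu + U @ V.T

# GLFM, Eq. (4): Theta_ik = mu + U_i U_k^T / 2 + U_i V_k^T / 2
theta_glfm = mu + 0.5 * (U @ U.T) + 0.5 * (U @ V.T)

# Link probabilities via the logistic function (Eq. (6) below)
S = 1.0 / (1.0 + np.exp(-theta_glfm))
```

The only difference between the two models is the $\frac{1}{2} U U^T$ term, which raises the log-odds of a link between nodes whose latent factors point in similar directions.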
Based on (4), the likelihood of the observations can be defined as follows:

$$p(A \mid U, V, \mu) = \prod_{i \neq k} \big[ S_{ik}^{A_{ik}} (1 - S_{ik})^{1 - A_{ik}} \big]^{Z_{ik}}, \qquad (5)$$

where

$$S_{ik} = \frac{\exp(\Theta_{ik})}{1 + \exp(\Theta_{ik})}. \qquad (6)$$

Note that, as in conventional SNA models, we ignore the diagonal elements of A; that is, in this paper we set $A_{ii} = Z_{ii} = 0$ by default.

Furthermore, we place normal priors on the parameters µ, U and V: $p(\mu) = \mathcal{N}(\mu \mid 0, \tau^{-1})$, $p(U) = \prod_{d=1}^{D} \mathcal{N}(U_{*d} \mid 0, \beta I)$, and $p(V) = \prod_{d=1}^{D} \mathcal{N}(V_{*d} \mid 0, \gamma I)$.

4.2 Learning

Although the Markov chain Monte Carlo (MCMC) algorithms designed for other latent variable models can easily be adapted for GLFM, we do not adopt MCMC here because MCMC methods typically incur very high computational cost. In this paper, we adopt the maximum a posteriori (MAP) estimation strategy to learn the parameters. The log posterior probability can be computed as follows:

$$L = \sum_{i \neq k} \Big\{ \frac{1}{2} A_{ik} U_{i*} U_{k*}^T + \frac{1}{2} A_{ik} U_{i*} V_{k*}^T + A_{ik} \mu - Z_{ik} \log\big[ 1 + \exp(\Theta_{ik}) \big] \Big\} - \frac{1}{2\beta} \operatorname{tr}(U U^T) - \frac{1}{2\gamma} \operatorname{tr}(V V^T) - \frac{\tau}{2} \mu^2 + c, \qquad (7)$$

where c is a constant independent of the parameters. Note that in (7) we assume that all existing links are observed; that is, if $A_{ik} = 1$, then $Z_{ik} = 1$.

The term $A_{ik} U_{i*} U_{k*}^T$ in (7) results from the extra term $U_{i*} U_{k*}^T$ in (4). To maximize the objective function L in (7), we have to make $U_{i*} U_{k*}^T$ as large as possible whenever there is a link between nodes i and k (i.e., $A_{ik} = 1$). This conforms to the property of homophily: a link is more likely to exist between two nodes with similar characteristics than between nodes with different characteristics. Note that here the latent factor $U_{i*}$ reflects the characteristics of node i. Therefore, the extra term $U_{i*} U_{k*}^T$ in (4) is what enables GLFM to model homophily in networks.

If we directly optimized all the parameters U, V and µ jointly, the computational cost would be very high. For example, if we want to use second-order information, we generally need to invert the Hessian matrix, whose time complexity is cubic in the number of parameters. Here, we adopt an alternating projection strategy to maximize L: each time we optimize one parameter, such as U, with the other parameters fixed.
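As a reading aid for (4)–(7), a direct evaluation of the log posterior might look as follows. This is an illustrative sketch of ours, not the authors' code; it assumes the diagonals of A and Z have already been zeroed, so full-matrix sums realize the sums over $i \neq k$:

```python
import numpy as np

def log_posterior(A, Z, U, V, mu, beta, gamma, tau):
    """Evaluate L in Eq. (7), up to the additive constant c."""
    theta = mu + 0.5 * (U @ U.T) + 0.5 * (U @ V.T)            # Eq. (4)
    # A * theta collects the three A_ik terms; logaddexp(0, x) = log(1 + e^x)
    loglik = np.sum(A * theta) - np.sum(Z * np.logaddexp(0.0, theta))
    priors = (np.sum(U * U) / (2.0 * beta)                    # tr(U U^T)
              + np.sum(V * V) / (2.0 * gamma)                 # tr(V V^T)
              + tau * mu ** 2 / 2.0)
    return loglik - priors
```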
Learning U

To learn U, we optimize each row of it with all the other rows fixed. The gradient vector and Hessian matrix can be computed as follows:

$$\frac{\partial L}{\partial U_{i*}^T} = -\frac{1}{\beta} U_{i*}^T + \frac{1}{2} V^T \big[ A_{i*}^T - (Z_{i*} \circ S_{i*})^T \big] + \frac{1}{2} U^T \big[ A_{i*}^T + A_{*i} - (Z_{i*} \circ S_{i*})^T - Z_{*i} \circ S_{*i} \big],$$

$$\frac{\partial^2 L}{\partial U_{i*}^T \partial U_{i*}} = -\frac{1}{\beta} I - \frac{1}{4} \sum_{k \neq i} Z_{ki} S_{ki} (1 - S_{ki}) U_{k*}^T U_{k*} - \frac{1}{4} \sum_{k \neq i} Z_{ik} S_{ik} (1 - S_{ik}) [U_{k*} + V_{k*}]^T [U_{k*} + V_{k*}].$$

Because both the gradient vector and the Hessian matrix depend on $S_{i*}$, which is a function of $U_{i*}$, we have to resort to iterative methods to find the optimal values. Here, we devise a minorization-maximization (MM) algorithm [Lang et al., 2000] to learn it. MM is essentially an expectation-maximization (EM) algorithm [Dempster et al., 1977] without missing data, alternating between constructing a concave lower bound of the objective function and maximizing that bound.

Because $0 < S_{ik} < 1$, we have $S_{ik}(1 - S_{ik}) \leq \frac{1}{4}$. Let us define

$$H_i = -\frac{1}{\beta} I - \frac{1}{16} \sum_{k \neq i} Z_{ik} [U_{k*} + V_{k*}]^T [U_{k*} + V_{k*}] - \frac{1}{16} \sum_{k \neq i} Z_{ki} U_{k*}^T U_{k*}.$$

It is easy to prove that $\frac{\partial^2 L}{\partial U_{i*}^T \partial U_{i*}} \succeq H_i$.

Let

$$f(U_{i*}) = L(U_{i*}(t)) + [U_{i*} - U_{i*}(t)] \frac{\partial L}{\partial U_{i*}^T}(t) + \frac{1}{2} [U_{i*} - U_{i*}(t)] H_i(t) [U_{i*} - U_{i*}(t)]^T,$$

where $U_{i*}(t)$ denotes the value from the previous iteration and $H_i(t)$ is computed with the updated U except for $U_{i*}$. Then we have the following theorem:

Theorem 1. $L(U_{i*}) \geq f(U_{i*})$; that is, $f(U_{i*})$ is a lower bound of $L(U_{i*})$.

The proof of Theorem 1 is straightforward, so we omit it here. We can see that $f(U_{i*})$ is quadratic in $U_{i*}$. Setting the gradient of $f(U_{i*})$ with respect to $U_{i*}$ to 0 yields the update rule for $U_{i*}$:

$$U_{i*}(t+1) = U_{i*}(t) - \Big[ \frac{\partial L}{\partial U_{i*}^T}(t) \Big]^T H_i(t)^{-1}.$$
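Putting the gradient, the bound $H_i$ and the update rule together, one MM step on a single row of U can be sketched as follows. This is our own illustrative code, not the authors' implementation; the updates for V and µ in the next subsections have the same structure:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mm_update_U_row(A, Z, U, V, mu, beta, i):
    """One MM step for U_i* (assumes A_ii = Z_ii = 0, as set in the paper)."""
    D = U.shape[1]
    S_row = sigmoid(mu + 0.5 * (U @ U[i]) + 0.5 * (V @ U[i]))  # S_ik over k
    S_col = sigmoid(mu + 0.5 * (U @ U[i]) + 0.5 * (U @ V[i]))  # S_ki over k
    # Gradient of L with respect to U_i*, as a length-D vector
    grad = (-U[i] / beta
            + 0.5 * (A[i] - Z[i] * S_row) @ V
            + 0.5 * (A[i] + A[:, i] - Z[i] * S_row - Z[:, i] * S_col) @ U)
    # Curvature lower bound H_i; Z_ii = 0 drops the k = i terms automatically
    W = U + V
    H = (-np.eye(D) / beta
         - (Z[i][:, None] * W).T @ W / 16.0
         - (Z[:, i][:, None] * U).T @ U / 16.0)
    # Update rule: U_i*(t+1) = U_i*(t) - [dL/dU_i*^T]^T H_i^{-1}
    U = U.copy()
    U[i] = U[i] - np.linalg.solve(H, grad)   # H is symmetric, so solve suffices
    return U
```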
Learning V

The gradient vector and Hessian matrix with respect to $V_{i*}$ can be computed as follows:

$$\frac{\partial L}{\partial V_{i*}^T} = -\frac{1}{\gamma} V_{i*}^T + \frac{1}{2} U^T \big[ A_{*i} - (Z_{*i} \circ S_{*i}) \big],$$

$$\frac{\partial^2 L}{\partial V_{i*}^T \partial V_{i*}} = -\frac{1}{\gamma} I - \frac{1}{4} \sum_{k \neq i} Z_{ki} S_{ki} (1 - S_{ki}) U_{k*}^T U_{k*}.$$

Let $G_i = -\frac{1}{\gamma} I - \frac{1}{16} \sum_{k \neq i} Z_{ki} U_{k*}^T U_{k*}$; we can prove that $\frac{\partial^2 L}{\partial V_{i*}^T \partial V_{i*}} \succeq G_i$.

Similar to the update rule for $U_{i*}$, we obtain the update rule for $V_{i*}$:

$$V_{i*}(t+1) = V_{i*}(t) - \Big[ \frac{\partial L}{\partial V_{i*}^T}(t) \Big]^T G_i(t)^{-1},$$

where $V_{i*}(t)$ denotes the value from the previous iteration and $G_i(t)$ is computed with the updated parameters except for $V_{i*}$.

Learning µ

Using learning techniques similar to those for U and V, we get the update rule for µ:

$$\mu(t+1) = \mu(t) + \frac{4 \big[ \sum_{i \neq k} (A_{ik} - Z_{ik} S_{ik}) - \tau \mu(t) \big]}{4\tau + \sum_{i \neq k} Z_{ik}}.$$

4.3 Convergence and Computational Complexity

With the MM algorithm, the learning procedure of GLFM is guaranteed to converge to a local maximum. The time complexity of computing the gradient and Hessian for node i is linear in the total number of ones in $Z_{*i}$ and $Z_{i*}$. In general, this number is O(1) because the observations in real networks are typically very sparse. Furthermore, since $H_i$ and $G_i$ are D × D, the computational complexity of inverting these matrices is $O(D^3)$, and D is typically very small. Hence, only O(N) time is needed to update the whole of U and V.

5 Experiment

There exist many different SNA tasks, such as social position and role estimation [Wasserman and Faust, 1994], link prediction [Hoff, 2009], node classification [Li et al., 2009a; Li and Yeung, 2009; Li et al., 2009b], community detection [Yang et al., 2009a], and so on. In this paper, we adopt the same evaluation strategy as that in [Yang et al., 2009a; 2009b] for social community detection. The main reason for choosing this task is that, from our model formulation, we can clearly see the difference between GLFM and other latent factor models. However, many other models from different research communities have also been proposed for SNA, and it is difficult to establish the connection and difference between those models and GLFM from the formulation perspective alone. Hence, we compare them by empirical evaluation. Most mainstream models have been compared in [Yang et al., 2009a; 2009b] for community detection, which provides a good platform for our empirical comparison.

For MLFM and GLFM, we adopt k-means to perform clustering based on the normalized latent representation U. Here, normalization means that the latent representation of each node is divided by its length. Because the magnitude of $U_{i*}$ reflects the activity of node i, we select the most active node as the first seed of k-means, and we then choose as the seed of the next community the point whose summed distance to all existing seeds is the largest; the seeding scheme is sketched below. Hence, the initialization of k-means is fixed. We set Z = A. For fair comparison, the hyper-parameters of GLFM and of all the baselines, such as the τ in (7), are chosen from a wide range and the best results are reported. More specifically, for GLFM, τ is fixed to $10^6$, β and γ are set to 2, and D = 20.
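The deterministic seeding scheme just described can be sketched as follows (our own illustrative code; `pick_seeds` is a hypothetical helper, and we read "most active" as the node whose latent factor has the largest norm):

```python
import numpy as np

def pick_seeds(U, K):
    """Deterministic farthest-point seeding on unit-normalized latent factors."""
    norms = np.linalg.norm(U, axis=1)
    X = U / norms[:, None]                 # unit-length rows (assumes no zero rows)
    seeds = [int(np.argmax(norms))]        # most active node first
    for _ in range(K - 1):
        # summed distance from every point to the seeds chosen so far
        dist = sum(np.linalg.norm(X - X[s], axis=1) for s in seeds)
        dist[seeds] = -np.inf              # never reselect an existing seed
        seeds.append(int(np.argmax(dist)))
    return X, seeds
```

The returned rows `X[seeds]` can then serve as fixed initial centers, for example via scikit-learn's `KMeans(n_clusters=K, init=X[seeds], n_init=1)`.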
5.1 Data Sets and Evaluation Metric

As in [Yang et al., 2009a], we use two paper citation networks, the Cora and Citeseer data sets⁴, for evaluation. Both data sets contain content information in addition to the directed links.

⁴ The two data sets can be downloaded from http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html

The Cora data set contains 2708 research papers from 7 subfields of machine learning: case-based reasoning, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning, and theory. Each paper is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from a dictionary of 1433 unique words. There are 5429 citations (links) between the papers in total.

The Citeseer data set contains 3312 papers, which can be classified into 6 categories. Each paper is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from a dictionary of 3703 unique words. There are 4732 citations (links) between the papers in total. After deleting the self-links, we obtain 4715 links for our evaluation.

As in [Yang et al., 2009a], we use Normalized Mutual Information (NMI), Pairwise F-Measure (PWF) and Modularity (Modu) as metrics to measure clustering accuracy. For all the algorithms, we set the number of communities to the ground-truth number of class labels in the data.
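For reference, NMI can be computed directly with scikit-learn, and PWF from the standard pairwise precision/recall definition. The sketch below is our own and assumes this standard definition matches the one used in [Yang et al., 2009a]:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def pairwise_f_measure(labels_true, labels_pred):
    """Pairwise F-measure over all node pairs (i, j) with i < j."""
    t, p = np.asarray(labels_true), np.asarray(labels_pred)
    iu = np.triu_indices(len(t), k=1)
    same_true = (t[:, None] == t[None, :])[iu]   # pair in same true class?
    same_pred = (p[:, None] == p[None, :])[iu]   # pair in same cluster?
    tp = np.sum(same_true & same_pred)
    precision = tp / max(np.sum(same_pred), 1)
    recall = tp / max(np.sum(same_true), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

labels_true = [0, 0, 1, 1, 2, 2]     # toy ground-truth communities
labels_pred = [0, 0, 1, 2, 2, 2]     # toy clustering output
print(normalized_mutual_info_score(labels_true, labels_pred))
print(pairwise_f_measure(labels_true, labels_pred))
```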
5.2 Baselines

We compare GLFM with the closely related method MLFM [Hoff, 2009]. The U and V in both MLFM and GLFM are initialized by principal component analysis (PCA) on the content information. In addition, we adopt the methods introduced in [Yang et al., 2009a] and [Yang et al., 2009b] for comparison. Those methods can be divided into three groups: link-based methods, content-based methods, and link+content-based methods.

The link-based methods include PHITS [Cohn and Chang, 2000], LDA-Link [Erosheva et al., 2004] (an extension of latent Dirichlet allocation (LDA) for link analysis), the popularity-based conditional link model (PCL) [Yang et al., 2009b], and the normalized cut (NCUT) for spectral clustering [Shi and Malik, 2000].

The content-based methods include probabilistic latent semantic analysis (PLSA) [Hofmann, 1999], LDA-Word, and NCUT with the Gaussian RBF kernel and with the probabilistic product (PP) kernel [Jebara et al., 2004], respectively.

The link+content-based methods include PHITS-PLSA [Cohn and Hofmann, 2000], LDA-Link-Word [Erosheva et al., 2004], Link-Content-Factorization (LCF) [Zhu et al., 2007], NCUT, PCL-PLSA, PHITS-DC, PCL-DC and C-PLDC [Yang et al., 2009a]. Here, PCL-PLSA denotes the combination of PCL and PLSA, PHITS-DC denotes the PHITS model combined with the discriminative content (DC) model in [Yang et al., 2009a], PCL-DC denotes the PCL model combined with DC, and C-PLDC refers to the combined popularity-driven link model and DC model [Yang et al., 2009a]. Moreover, the setting for t in C-PLDC follows that in [Yang et al., 2009a]; more specifically, C-PLDC(t = 1) denotes a special case of C-PLDC without popularity modeling [Yang et al., 2009a].

5.3 Illustration

We sample a subset from Cora for illustration. The sampled data set contains two classes. The learned latent representations U of the data instances are shown in Figure 2, where blue circles and red crosses denote the data instances from the two different classes, and the (directed) black edges are the citation relationships between the data points. In Figure 2, (a) and (c) show the originally learned latent factors of MLFM and GLFM, respectively, while (b) and (d) show the corresponding normalized latent factors of MLFM and GLFM, respectively. Here, normalization means that we divide the latent factor of each node by its length; hence, all the points in (b) and (d) have unit length. Note that for fair comparison, all the subfigures from (a) to (d) are generated automatically by our program with the same parameter settings and initial values.

In (a) and (b) of Figure 2, two instances are more likely to be close if they are connected by or connect to the same instance, which is exactly the feature of stochastic equivalence. However, there exist many links across the inner part of the circle in (b), which means that two linked instances are not necessarily close in the latent space. This violates the feature of homophily. Hence, we can conclude that MLFM cannot effectively model homophily in networks.

In (c) and (d) of Figure 2, homophily is obvious, since two nodes are in general close to each other if there exists a link between them.

Figure 2: Illustration of homophily and stochastic equivalence modeling in networks. (a) MLFM; (b) MLFM (normalized); (c) GLFM; (d) GLFM (normalized).

5.4 Convergence Speed

When D = 20, the objective function values of GLFM against the iteration number T are plotted in Figure 3, from which we can see that our learning procedure with the MM method converges very fast. We set the maximum number of iterations to T = 5 in all subsequent experiments.
Figure 3: Convergence speed of GLFM (objective function value against iteration number T). (a) Cora; (b) Citeseer.

5.5 Accuracy

We compare GLFM with all the baselines introduced in Section 5.2 in terms of NMI, PWF and Modu [Yang et al., 2009a; 2009b]. The results are reported in Table 1, from which we can see that GLFM achieves the best performance on both data sets for all three criteria. In particular, on the Citeseer data set, GLFM dramatically outperforms the second-best model. Based on prior knowledge, paper citation networks are likely to exhibit homophily, because citations often occur among papers from the same community. This can explain why GLFM achieves such good performance on these data sets. Hence, GLFM provides a way to model networks that cannot be modeled well by MLFM.

Figure 4 shows the performance of GLFM for different values of D. We can see that GLFM is not sensitive to D as long as D is not too small.
Table 1: Community detection performance on the Cora and Citeseer data sets (the best performance is shown in the GLFM row).

                                            Cora                     Citeseer
Group          Algorithm             NMI     PWF     Modu     NMI     PWF     Modu
Link           PHITS                0.0570  0.1894  0.3929   0.0101  0.1773  0.4588
               LDA-Link             0.0762  0.2278  0.2189   0.0356  0.2363  0.2211
               PCL                  0.0884  0.2055  0.5903   0.0315  0.1927  0.6436
               NCUT                 0.1715  0.2864  0.2701   0.1833  0.3252  0.6577
Content        PLSA                 0.2107  0.2864  0.2682   0.0965  0.2298  0.2885
               LDA-Word             0.2310  0.2774  0.2970   0.1342  0.2880  0.3022
               NCUT (RBF kernel)    0.1317  0.2457  0.1839   0.0976  0.2386  0.2133
               NCUT (PP kernel)     0.1804  0.2912  0.2487   0.1986  0.3282  0.4802
Link+Content   PHITS-PLSA           0.3140  0.3526  0.3956   0.1188  0.2596  0.3863
               LDA-Link-Word        0.3587  0.3969  0.4576   0.1920  0.3045  0.5058
               LCF                  0.1227  0.2456  0.1664   0.0934  0.2361  0.2011
               NCUT (RBF kernel)    0.2444  0.3062  0.3703   0.1592  0.2957  0.4280
               NCUT (PP kernel)     0.3866  0.4214  0.5158   0.1986  0.3282  0.4802
               PCL-PLSA             0.3900  0.4233  0.5503   0.2207  0.3334  0.5505
               PHITS-DC             0.4359  0.4526  0.6384   0.2062  0.3295  0.6117
               PCL-DC               0.5123  0.5450  0.6976   0.2921  0.3876  0.6857
               C-PLDC (t=1)         0.4294  0.4264  0.5877   0.2303  0.3340  0.5530
               C-PLDC               0.4887  0.4638  0.6160   0.2756  0.3611  0.5582
               MLFM                 0.3640  0.3874  0.2325   0.2558  0.3356  0.0089
               GLFM                 0.5229  0.5545  0.7234   0.3951  0.5053  0.7563

Figure 4: Sensitivity of GLFM to the parameter D (NMI, PWF and Modu against D). (a) Cora; (b) Citeseer.

6 Conclusion

In this paper, a generalized latent factor model is proposed to model homophily in social networks, together with a linear-time learning algorithm with a convergence guarantee for learning the parameters. Experimental results on community detection in real-world networks show that our model can effectively model homophily and thereby outperform state-of-the-art methods.

Acknowledgments

Li is supported by a grant from the "project of Arts & Science" of Shanghai Jiao Tong University (No. 10JCY09). Yeung is supported by General Research Fund 622209 from the Research Grants Council of Hong Kong. Zhang is supported by the NSFC (No. 61070239), the 973 Program of China (No. 2010CB327903), the Doctoral Program of Specialized Research Fund of Chinese Universities (No. 20090101120066), and the Fundamental Research Funds for the Central Universities.

References

[Airoldi et al., 2008] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.
[Bartholomew and Knott, 1999] David J. Bartholomew and Martin Knott. Latent Variable Models and Factor Analysis. Kendall's Library of Statistics, 7, second edition, 1999.
[Cohn and Chang, 2000] David Cohn and Huan Chang. Learning to probabilistically identify authoritative documents. In ICML, 2000.
[Cohn and Hofmann, 2000] David A. Cohn and Thomas Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In NIPS, 2000.
[Dempster et al., 1977] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[Erosheva et al., 2004] Elena Erosheva, Stephen Fienberg, and John Lafferty. Mixed membership models of scientific publications. In Proceedings of the National Academy of Sciences, volume 101, pages 5220-5227, 2004.
[Goldenberg et al., 2009] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg, and Edoardo M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2:129-233, 2009.
[Hoff et al., 2002] Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090-1098, 2002.
[Hoff, 2008] Peter D. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. In NIPS, 2008.
[Hoff, 2009] Peter D. Hoff. Multiplicative latent factor models for description and prediction of social networks. Computational and Mathematical Organization Theory, 15:261-272, 2009.
[Hofmann, 1999] Thomas Hofmann. Probabilistic latent semantic indexing. In SIGIR, 1999.
[Jebara et al., 2004] Tony Jebara, Risi Imre Kondor, and Andrew Howard. Probability product kernels. Journal of Machine Learning Research, 5:819-844, 2004.
[Kemp et al., 2006] Charles Kemp, Joshua B. Tenenbaum, Thomas L. Griffiths, Takeshi Yamada, and Naonori Ueda. Learning systems of concepts with an infinite relational model. In AAAI, 2006.
[Lang et al., 2000] Kenneth Lang, David R. Hunter, and Ilsoon Yang. Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics, 9, 2000.
[Li and Yeung, 2009] Wu-Jun Li and Dit-Yan Yeung. Relation regularized matrix factorization. In IJCAI, 2009.
[Li et al., 2009a] Wu-Jun Li, Dit-Yan Yeung, and Zhihua Zhang. Probabilistic relational PCA. In NIPS, 2009.
[Li et al., 2009b] Wu-Jun Li, Zhihua Zhang, and Dit-Yan Yeung. Latent Wishart processes for relational kernel learning. In AISTATS, 2009.
[Nowicki and Snijders, 2001] Krzysztof Nowicki and Tom A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077-1087, 2001.
[Shi and Malik, 2000] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888-905, 2000.
[Wasserman and Faust, 1994] Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[Yang et al., 2009a] Tianbao Yang, Rong Jin, Yun Chi, and Shenghuo Zhu. A Bayesian framework for community detection integrating content and link. In UAI, 2009.
[Yang et al., 2009b] Tianbao Yang, Rong Jin, Yun Chi, and Shenghuo Zhu. Combining link and content for community detection: a discriminative approach. In KDD, 2009.
[Zhu et al., 2007] Shenghuo Zhu, Kai Yu, Yun Chi, and Yihong Gong. Combining content and link for classification using matrix factorization. In SIGIR, 2007.