this extra term that enables GLFM to model homophily in networks, which will be detailed in Section 4.2 when we analyze the objective function in (7). This will also be demonstrated empirically later in our experiments.

Based on (4), the likelihood of the observations can be defined as follows:

$$p(A \mid U, V, \mu) = \prod_{i \neq k} \left[ S_{ik}^{A_{ik}} (1 - S_{ik})^{1 - A_{ik}} \right]^{Z_{ik}}, \qquad (5)$$

where

$$S_{ik} = \frac{\exp(\Theta_{ik})}{1 + \exp(\Theta_{ik})}. \qquad (6)$$

Note that, as in the conventional SNA model, we ignore the diagonal elements of $A$. That is, in this paper, we set $A_{ii} = Z_{ii} = 0$ by default.

Furthermore, we put normal priors on the parameters $\mu$, $U$ and $V$: $p(\mu) = \mathcal{N}(\mu \mid 0, \tau^{-1})$, $p(U) = \prod_{d=1}^{D} \mathcal{N}(U_{*d} \mid 0, \beta I)$, $p(V) = \prod_{d=1}^{D} \mathcal{N}(V_{*d} \mid 0, \gamma I)$.

4.2 Learning

Although the Markov chain Monte Carlo (MCMC) algorithms designed for other latent variable models can easily be adapted for GLFM, we do not adopt MCMC here because MCMC methods typically incur very high computational cost. In this paper, we adopt the maximum a posteriori (MAP) estimation strategy to learn the parameters. The log posterior probability can be computed as follows:

$$L = \sum_{i \neq k} \left\{ \frac{1}{2} A_{ik} U_{i*} U_{k*}^T + \frac{1}{2} A_{ik} U_{i*} V_{k*}^T + A_{ik}\mu - Z_{ik} \log\left[1 + \exp(\Theta_{ik})\right] \right\} - \frac{1}{2\beta}\operatorname{tr}(UU^T) - \frac{1}{2\gamma}\operatorname{tr}(VV^T) - \frac{\tau}{2}\mu^2 + c, \qquad (7)$$

where $c$ is a constant independent of the parameters. Note that in (7) we assume that all existing links are observed. That is to say, if $A_{ik} = 1$, then $Z_{ik} = 1$.

The term $A_{ik} U_{i*} U_{k*}^T$ in (7) results from the extra term $U_{i*} U_{k*}^T$ in (4). In (7), to maximize the objective function $L$, we have to make $U_{i*} U_{k*}^T$ as large as possible if there exists a link between nodes $i$ and $k$ (i.e., $A_{ik} = 1$). This conforms to the property of homophily, i.e., a link is more likely to exist between two nodes with similar characteristics than between two nodes with different characteristics. Note that here the latent factor $U_{i*}$ reflects the characteristics of node $i$. Therefore, the extra term $U_{i*} U_{k*}^T$ in (4) enables GLFM to model homophily in networks.

If we directly optimize all the parameters $U$, $V$ and $\mu$ jointly, the computational cost will be very high. For example, if we want to use the second-order information, generally we need to invert the Hessian matrix, whose time complexity is cubic in the number of parameters. Here, we adopt an alternating projection strategy to maximize $L$. More specifically, each time we optimize one parameter, such as $U$, with the other parameters fixed.
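For concreteness, the following NumPy sketch shows how the log posterior in (7) might be evaluated. It is only an illustrative sketch, not the authors' implementation: it assumes $\Theta_{ik} = \frac{1}{2} U_{i*} U_{k*}^T + \frac{1}{2} U_{i*} V_{k*}^T + \mu$, as implied by the terms of (7), and the variable names are hypothetical.

    import numpy as np

    def log_posterior(U, V, mu, A, Z, beta, gamma, tau):
        """Log posterior L in (7), up to the additive constant c.

        A, Z: n x n 0/1 matrices with zero diagonals (A_ii = Z_ii = 0).
        U, V: n x D latent factor matrices; mu: scalar offset.
        Theta_ik = 0.5*U_i U_k^T + 0.5*U_i V_k^T + mu is assumed here.
        """
        Theta = 0.5 * (U @ U.T) + 0.5 * (U @ V.T) + mu
        # sum over i != k of A_ik*Theta_ik - Z_ik*log(1 + exp(Theta_ik))
        loglik = np.sum(A * Theta - Z * np.logaddexp(0.0, Theta))
        # Gaussian priors on U, V and mu
        log_prior = (-np.sum(U * U) / (2.0 * beta)
                     - np.sum(V * V) / (2.0 * gamma)
                     - 0.5 * tau * mu ** 2)
        return loglik + log_prior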
Learning U

To learn $U$, we optimize each row of it with all the other rows fixed. The gradient vector and Hessian matrix can be computed as follows:

$$\frac{\partial L}{\partial U_{i*}^T} = -\frac{1}{\beta} U_{i*}^T + \frac{1}{2} V^T \left[ A_{i*}^T - (Z_{i*} \circ S_{i*})^T \right] + \frac{1}{2} U^T \left[ A_{i*}^T + A_{*i} - (Z_{i*} \circ S_{i*})^T - Z_{*i} \circ S_{*i} \right],$$

$$\frac{\partial^2 L}{\partial U_{i*}^T \partial U_{i*}} = -\frac{1}{\beta} I - \frac{1}{4} \sum_{k, k \neq i} \left\{ Z_{ki} S_{ki} (1 - S_{ki}) U_{k*}^T U_{k*} \right\} - \frac{1}{4} \sum_{k, k \neq i} \left\{ Z_{ik} S_{ik} (1 - S_{ik}) [U_{k*} + V_{k*}]^T [U_{k*} + V_{k*}] \right\},$$

where $\circ$ denotes the element-wise (Hadamard) product.

Because both the gradient vector and the Hessian matrix depend on $S_{i*}$, which is a function of $U_{i*}$, we have to resort to iterative methods to find the optimal values. Here, we devise a minorization-maximization (MM) algorithm [Lang et al., 2000] to learn it. MM is a so-called expectation-maximization (EM) algorithm [Dempster et al., 1977] without missing data, alternating between constructing a concave lower bound of the objective function and maximizing that bound.

Because $0 < S_{ik} < 1$, we have $S_{ik}(1 - S_{ik}) \le \frac{1}{4}$. Let us define:

$$H_i = -\frac{1}{\beta} I - \frac{1}{16} \sum_{k, k \neq i} \left\{ Z_{ik} [U_{k*} + V_{k*}]^T [U_{k*} + V_{k*}] \right\} - \frac{1}{16} \sum_{k, k \neq i} \left\{ Z_{ki} U_{k*}^T U_{k*} \right\}.$$

It is easy to prove that $\frac{\partial^2 L}{\partial U_{i*}^T \partial U_{i*}} \succeq H_i$.
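Before turning to the quadratic minorizer built from $H_i$, here is a sketch, continuing the one above and under the same assumption on $\Theta_{ik}$, of how the gradient and the bound $H_i$ for a single row $U_{i*}$ might be computed; the function name and arguments are again hypothetical.

    import numpy as np

    def grad_and_bound(i, U, V, mu, A, Z, beta):
        """Gradient of L w.r.t. U_i* and the MM curvature bound H_i."""
        n, D = U.shape
        theta_row = 0.5 * (U[i] @ U.T) + 0.5 * (U[i] @ V.T) + mu  # Theta_{i,k}
        theta_col = 0.5 * (U @ U[i]) + 0.5 * (U @ V[i]) + mu      # Theta_{k,i}
        s_row = 1.0 / (1.0 + np.exp(-theta_row))                  # S_{i*}
        s_col = 1.0 / (1.0 + np.exp(-theta_col))                  # S_{*i}

        grad = (-U[i] / beta
                + 0.5 * V.T @ (A[i] - Z[i] * s_row)
                + 0.5 * U.T @ (A[i] + A[:, i] - Z[i] * s_row - Z[:, i] * s_col))

        W = U + V
        # H_i replaces each S(1-S) factor in the Hessian by its bound 1/4;
        # the zero diagonals of Z exclude the k = i terms automatically.
        H_i = (-np.eye(D) / beta
               - (W.T * Z[i]) @ W / 16.0
               - (U.T * Z[:, i]) @ U / 16.0)
        return grad, H_i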
Let

$$f(U_{i*}) = L(U_{i*}(t)) + \left[ U_{i*} - U_{i*}(t) \right] \frac{\partial L}{\partial U_{i*}^T}(t) + \frac{1}{2} \left[ U_{i*} - U_{i*}(t) \right] H_i(t) \left[ U_{i*} - U_{i*}(t) \right]^T,$$

where $U_{i*}(t)$ denotes the value from the former iteration and $H_i(t)$ is computed with the updated $U$ except for $U_{i*}$. Then we have the following theorem:

Theorem 1. $L(U_{i*}) \ge f(U_{i*})$, which means that $f(U_{i*})$ is a lower bound of $L(U_{i*})$.

The proof of Theorem 1 is simple and we omit it here. We can see that $f(U_{i*})$ is quadratic in $U_{i*}$. By setting the gradient of $f(U_{i*})$ with respect to $U_{i*}$ to $0$, we obtain the update rule for $U_{i*}$:

$$U_{i*}(t+1) = U_{i*}(t) - \left[ \frac{\partial L}{\partial U_{i*}^T}(t) \right]^T H_i(t)^{-1}.$$
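As a usage illustration, one possible inner MM loop that applies this update rule to row $U_{i*}$ is sketched below, reusing grad_and_bound (and the numpy import) from the previous sketch; the number of inner iterations is a hypothetical choice, not prescribed by the paper.

    def update_row_mm(i, U, V, mu, A, Z, beta, n_inner=5):
        """Run a few MM steps on U_i* with all other rows of U, V and mu fixed."""
        for _ in range(n_inner):
            grad, H_i = grad_and_bound(i, U, V, mu, A, Z, beta)
            # U_i*(t+1) = U_i*(t) - [dL/dU_i*^T(t)]^T H_i(t)^{-1};
            # H_i is symmetric negative definite, so solve rather than invert.
            U[i] = U[i] - np.linalg.solve(H_i, grad)
        return U

    # One sweep of the alternating scheme over the rows of U:
    # for i in range(U.shape[0]):
    #     U = update_row_mm(i, U, V, mu, A, Z, beta)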