If we use $\Upsilon$ to denote the Gaussian noise process and assume that $\Upsilon$ and the latent variable matrix $X$ follow these distributions:
$$\Upsilon \sim \mathcal{N}_{d,N}(\mathbf{0},\, \sigma^2 I_d \otimes I_N), \qquad X \sim \mathcal{N}_{q,N}(\mathbf{0},\, I_q \otimes I_N), \tag{1}$$
we can express a generative model as follows: $T = WX + \mu e^T + \Upsilon$.
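To make the generative process in (1) concrete, here is a minimal NumPy sketch (not part of the original paper; the dimensions, parameter values, and variable names are chosen purely for illustration). It draws $X$ and $\Upsilon$ with i.i.d. Gaussian entries, as implied by the identity Kronecker factors in (1), and then forms an observation matrix $T = WX + \mu e^T + \Upsilon$:

```python
# Illustrative sketch of the PPCA generative model in Eq. (1);
# all dimensions and parameter values below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

d, q, N = 5, 2, 1000              # data dim., latent dim., number of instances
W = rng.standard_normal((d, q))   # loading matrix (parameter to learn)
mu = rng.standard_normal((d, 1))  # mean vector (parameter to learn)
sigma2 = 0.1                      # noise variance (parameter to learn)

# X ~ N_{q,N}(0, I_q ⊗ I_N): entries are i.i.d. standard normal.
X = rng.standard_normal((q, N))

# Upsilon ~ N_{d,N}(0, sigma^2 I_d ⊗ I_N): i.i.d. Gaussian noise.
Upsilon = np.sqrt(sigma2) * rng.standard_normal((d, N))

# e is the N-dimensional all-one vector, so mu @ e.T repeats mu over columns.
e = np.ones((N, 1))
T = W @ X + mu @ e.T + Upsilon    # each column T[:, n] is one observation
```

Because both Kronecker factors in (1) are identity matrices, the entries of $X$ and $\Upsilon$ are i.i.d., so each column of $T$ is generated independently of the others.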
Based on some properties of matrix variate normal distributions in [11], we get the following results:
$$T \mid X \sim \mathcal{N}_{d,N}(WX + \mu e^T,\, \sigma^2 I_d \otimes I_N), \qquad T \sim \mathcal{N}_{d,N}\big(\mu e^T,\, (WW^T + \sigma^2 I_d) \otimes I_N\big). \tag{2}$$
Let $C = WW^T + \sigma^2 I_d$. The corresponding log-likelihood of the observation matrix $T$ is then
$$\mathcal{L} = \ln p(T) = -\frac{N}{2}\Big[\, d\ln(2\pi) + \ln|C| + \operatorname{tr}(C^{-1}S) \,\Big], \tag{3}$$
where $S = \frac{1}{N}(T - \mu e^T)(T - \mu e^T)^T = \frac{1}{N}\sum_{n=1}^{N}(T_{*n} - \mu)(T_{*n} - \mu)^T$. We can see that $S$ is just the sample covariance matrix of the content observations. It is easy to see that this log-likelihood form is the same as that in [21]. Using matrix notations, the graphical model of PPCA based on matrix variate normal distributions is shown in Figure 1(a).

(a) Model of PPCA    (b) Model of PRPCA
Figure 1: Graphical models of PPCA and PRPCA, in which $T$ is the observation matrix, $X$ is the latent variable matrix, $\mu$, $W$ and $\sigma^2$ are the parameters to learn, and the other quantities are kept constant.

4 Probabilistic Relational PCA

PPCA assumes that all the observations are independent and identically distributed. Although this i.i.d. assumption makes the modeling process much simpler and has achieved great success in many traditional applications, it is very unreasonable for relational data [10]. In relational data, the attributes of connected (linked) instances are often correlated.

In this section, a probabilistic relational PCA model, called PRPCA, is proposed to integrate both the relational information and the content information seamlessly into a unified framework by eliminating the i.i.d. assumption. Based on our reformulation of PPCA using matrix variate notations as presented in the previous section, we can obtain PRPCA just by introducing some relatively simple (but very effective) modifications. A promising property is that the computation needed for PRPCA is as simple as that for PPCA even though we have eliminated the restrictive i.i.d. assumption.

4.1 Model Formulation

Assume that the latent variable matrix $X$ has the following distribution:
$$X \sim \mathcal{N}_{q,N}(\mathbf{0},\, I_q \otimes \Phi). \tag{4}$$
According to Corollary 2.3.3.1 in [11], we get $\operatorname{cov}(X_{i*}) = \Phi$ for $i \in \{1, \ldots, q\}$, which means that $\Phi$ actually reflects the covariance between the instances. From (1), we can see that $\operatorname{cov}(X_{i*}) = I_N$ for PPCA, which coincides with the i.i.d. assumption of PPCA.

Hence, to eliminate the i.i.d. assumption for relational data, one direct way is to use a non-identity covariance matrix $\Phi$ for the distribution of $X$ in (4). This $\Phi$ should reflect the physical meaning (semantics) of the relations between instances, which will be discussed in detail later. Similarly, we can also change the $I_N$ in (1) to $\Phi$ for $\Upsilon$ to eliminate the i.i.d. assumption for the noise process.

4.1.1 Relational Covariance Construction

Because the covariance matrix $\Phi$ in PRPCA is constructed from the relational information in the data, we refer to it as the relational covariance here.

The goal of PCA and PPCA is to find those principal axes onto which the retained variance under projection is maximal [13, 21]. For one specific $X$, the retained variance is $\operatorname{tr}[XX^T]$. If we rewrite $p(X)$ in (1) as
$$p(X) = \frac{\exp\{\operatorname{tr}[-\tfrac{1}{2}XX^T]\}}{(2\pi)^{qN/2}} = \frac{\exp\{-\tfrac{1}{2}\operatorname{tr}[XX^T]\}}{(2\pi)^{qN/2}},$$
we have the following observation: