Observation 1 For PPCA, the larger the retained variance of $X$, i.e., the more $X$ approaches the destination point, the lower the probability density at $X$ given by the prior.

Here, the destination point refers to the point where the goal of PPCA is achieved, i.e., where the retained variance is maximal. Moreover, we use the retained variance as a measure to define the gap between two different points: the smaller the gap between the retained variances of two points, the more closely they approach each other. Because the design principle of PRPCA is similar to that of PPCA, our working hypothesis is that Observation 1 can also guide us in designing the relational covariance of PRPCA. Its effectiveness will be empirically verified in Section 5.

In PRPCA, we assume that the attributes of two linked instances are positively correlated.² Under this assumption, the ideal goal of PRPCA should be to make the latent representations of two instances as close as possible whenever there exists a relation (link) between them. Hence, the measure defining the gap between two points refers to the closeness of the linked instances, i.e., the summation of the Euclidean distances between the linked instances. Based on Observation 1, the more $X$ approaches the destination point, the lower the probability density at $X$ given by the prior should be.
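The closeness measure just described can be made concrete with a small numerical sketch. The adjacency matrix `A`, the latent matrix `X`, and the helper `linked_closeness` below are all hypothetical, chosen only to illustrate the summation of Euclidean distances over linked pairs:

```python
import numpy as np

def linked_closeness(X, A):
    """Sum of Euclidean distances ||X_{*i} - X_{*j}|| over all linked pairs (i < j)."""
    N = A.shape[0]
    return sum(np.linalg.norm(X[:, i] - X[:, j])
               for i in range(N) for j in range(i + 1, N) if A[i, j] == 1)

# Toy example: two linked instances whose latent points are (0, 0) and (3, 4).
A = np.array([[0, 1],
              [1, 0]])
X = np.array([[0.0, 3.0],
              [0.0, 4.0]])   # columns X_{*i} are the latent representations
print(linked_closeness(X, A))  # -> 5.0
```

The smaller this quantity is under a latent representation $X$, the closer $X$ is to the ideal goal of PRPCA.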
Hence, under the latent space representation $X$, the closer the linked instances are, the lower the probability density at $X$ given by the prior should be. We will prove that if we set $\Phi = \Delta^{-1}$, where $\Delta \triangleq \gamma I_N + (I_N + A)^T (I_N + A)$ with $\gamma$ typically being a very small positive number to make $\Delta \succ 0$, we can get an appropriate prior for PRPCA. Note that $A_{ij} = 1$ if there exists a relation between instances $i$ and $j$, and otherwise $A_{ij} = 0$. Because $A^T = A$, we can also express $\Delta$ as
$$\Delta = \gamma I_N + (I_N + A)(I_N + A).$$
Let $\tilde{D}$ denote a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_j A_{ij}$. It is easy to prove that $(AA)_{ii} = \tilde{D}_{ii}$. Let $B = AA - \tilde{D}$, which means that $B_{ij} = (AA)_{ij}$ if $i \neq j$ and $B_{ii} = 0$. We can get $\Delta = (1+\gamma)I_N + 2A + AA = (1+\gamma)I_N + \tilde{D} + (2A + B)$. Because $B_{ij} = \sum_{k=1}^N A_{ik} A_{kj}$ for $i \neq j$, we can see that $B_{ij}$ is the number of paths, each with path length 2, from instance $i$ to instance $j$ in the original adjacency graph $A$. Because the attributes of two linked instances are positively correlated, $B_{ij}$ actually reflects the degree of correlation between instances $i$ and $j$. Let us take the paper citation graph as an example to illustrate this. The existence of a citation relation between two papers often implies that they are about the same topic. If paper $i$ cites paper $k$ and paper $k$ cites paper $j$, it is highly likely that papers $i$ and $j$ are about the same topic. If there exists another paper $a \neq k$ linking both paper $i$ and paper $j$ as well, the confidence that papers $i$ and $j$ are about the same topic will increase. Hence, the larger $B_{ij}$ is, the stronger the correlation between instances $i$ and $j$. Because $B_{ij} = \sum_{k=1}^N A_{ik} A_{kj} = A_{*i}^T A_{*j}$, $B_{ij}$ can also be seen as the similarity between the link vectors of instances $i$ and $j$. Therefore, $B$ can be seen as a weight matrix (corresponding to a weight graph) derived from the original adjacency matrix $A$, and $B$ is also consistent with the physical meaning underlying $A$.
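The decomposition $\Delta = (1+\gamma)I_N + \tilde{D} + (2A + B)$ and the path-counting interpretation of $B$ can be checked numerically. The following is a minimal sketch on a hypothetical 4-node symmetric 0/1 adjacency matrix (the specific graph is illustrative only):

```python
import numpy as np

# Hypothetical 4-node undirected adjacency graph (symmetric, 0/1).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
N = A.shape[0]
gamma = 1e-6          # small positive number to make Delta positive definite
I = np.eye(N)

Delta = gamma * I + (I + A).T @ (I + A)

# D~ is diagonal with the node degrees; for a 0/1 symmetric A, (AA)_ii = D~_ii.
D_tilde = np.diag(A.sum(axis=1))
assert np.allclose(np.diag(A @ A), A.sum(axis=1))

B = A @ A - D_tilde   # off-diagonal B_ij: number of length-2 paths from i to j

# Check Delta = (1 + gamma) I_N + D~ + (2A + B).
Delta_alt = (1 + gamma) * I + D_tilde + 2 * A + B
assert np.allclose(Delta, Delta_alt)

# B_ij also equals the inner product of the link (column) vectors of i and j.
assert B[0, 3] == A[:, 0] @ A[:, 3]   # two length-2 paths: via node 1 and node 2
```

Here `B[0, 3] == 2` because instances 0 and 3 are connected by two distinct length-2 paths, matching the citation-graph intuition above.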
Letting $G = 2A + B$,³ we find that $G$ combines the original graph reflected by $A$ and the derived graph reflected by $B$ into a new graph, putting a weight $2A_{ij} + B_{ij}$ on the edge between instances $i$ and $j$. The new weight graph reflected by $G$ is also consistent with the physical meaning underlying $A$. Letting $L \triangleq D - G$, where $D$ is a diagonal matrix whose diagonal elements are $D_{ii} = \sum_j G_{ij}$ and $L$ is called the Laplacian matrix [6] of $G$, we get $\Delta = (1+\gamma)I_N + \tilde{D} + D - L$. If we define another diagonal matrix $\hat{D} \triangleq (1+\gamma)I_N + \tilde{D} + D$, we get $\Delta = \hat{D} - L$. Then we have
$$\mathrm{tr}[X \Delta X^T] = \sum_{i=1}^N \hat{D}_{ii} \|X_{*i}\|^2 - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N G_{ij} \|X_{*i} - X_{*j}\|^2. \tag{5}$$

²Links with other physical meanings, such as the directed links in web graphs [25], can be transformed into links satisfying the assumption in PRPCA via some preprocessing strategies. One such strategy, used to preprocess the WebKB data set [8], will be given as an example in Section 5.

³This means that we put a 2:1 ratio between $A$ and $B$. Other ratios can be obtained by setting $\Delta = \gamma I_N + (\alpha I_N + A)(\alpha I_N + A) = \gamma I_N + \alpha^2 I_N + \tilde{D} + (2\alpha A + B)$. Preliminary results show that PRPCA is not sensitive to $\alpha$ as long as $\alpha$ is not too large, but we omit the detailed results here because they are out of the scope of this paper.
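The identity in Eq. (5) follows from $\Delta = \hat{D} - L$ together with the standard Laplacian quadratic form. It can be verified numerically; the sketch below uses a hypothetical 3-node chain graph and a random latent matrix $X$, both illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric 0/1 adjacency matrix A: a chain of 3 instances.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
N = A.shape[0]
gamma = 1e-6
I = np.eye(N)

Delta = gamma * I + (I + A).T @ (I + A)
D_tilde = np.diag(A.sum(axis=1))
B = A @ A - D_tilde
G = 2 * A + B                     # combined weight graph
D = np.diag(G.sum(axis=1))
L = D - G                         # Laplacian matrix of G
D_hat = (1 + gamma) * I + D_tilde + D
assert np.allclose(Delta, D_hat - L)   # Delta = D^ - L

X = rng.standard_normal((2, N))   # columns X_{*i}: 2-dim latent representations

# Left- and right-hand sides of Eq. (5).
lhs = np.trace(X @ Delta @ X.T)
rhs = sum(D_hat[i, i] * X[:, i] @ X[:, i] for i in range(N)) \
      - 0.5 * sum(G[i, j] * np.sum((X[:, i] - X[:, j]) ** 2)
                  for i in range(N) for j in range(N))
assert np.isclose(lhs, rhs)
```

The second term of the right-hand side is exactly the (weighted) summation of squared Euclidean distances between linked instances, which is how the prior built from $\Delta$ encodes the closeness of linked instances.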