4.2.1 Closed-Form Solution

Theorem 1. The log-likelihood in (7) is maximized when
\[
W_{\mathrm{ML}} = U_q\bigl(\Lambda_q - \sigma_{\mathrm{ML}}^{2} I_q\bigr)^{1/2} R, \qquad
\sigma_{\mathrm{ML}}^{2} = \frac{\sum_{i=q+1}^{d} \lambda_i}{d - q},
\]
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ are the eigenvalues of $H$, $\Lambda_q$ is a $q \times q$ diagonal matrix containing the $q$ largest eigenvalues, $U_q$ is a $d \times q$ matrix whose $q$ columns are the principal eigenvectors of $H$ corresponding to $\Lambda_q$, and $R$ is an arbitrary $q \times q$ orthogonal rotation matrix.

The proof of Theorem 1 makes use of techniques similar to those in Appendix A of [21] and is omitted here.

4.2.2 EM Algorithm

During the EM learning process, we treat $\{W, \sigma^2\}$ as parameters, $X$ as missing data, and $\{T, X\}$ as complete data. The EM algorithm alternates between the E-step and the M-step. Here we only briefly describe the updating rules; their derivation can be found in a longer version, which can be downloaded from http://www.cse.ust.hk/~liwujun.

In the E-step, the expectation of the complete-data log-likelihood with respect to the distribution of the missing data $X$ is computed. For this expectation, only the following sufficient statistics are needed:
\[
\langle X \rangle = M^{-1} W^T (T - \mu e^T), \qquad
\langle X \Delta X^T \rangle = N \sigma^2 M^{-1} + \langle X \rangle \Delta \langle X \rangle^T, \tag{8}
\]
where $M = W^T W + \sigma^2 I_q$. Note that all these statistics are computed from the parameter values obtained in the previous iteration.

In the M-step, to maximize the expectation of the complete-data log-likelihood, the parameters $\{W, \sigma^2\}$ are updated as follows:
\[
\widetilde{W} = H W \bigl(\sigma^2 I_q + M^{-1} W^T H W\bigr)^{-1}, \qquad
\widetilde{\sigma}^2 = \frac{\operatorname{tr}\bigl(H - H W M^{-1} \widetilde{W}^T\bigr)}{d}. \tag{9}
\]
Note that $W$ here denotes the old value and $\widetilde{W}$ the updated value.
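To make the two estimation procedures above concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' released code). It assumes the $d \times d$ matrix $H$ from (7) has already been computed, takes the rotation $R$ in Theorem 1 to be the identity, and uses function names of our own choosing.

```python
import numpy as np


def prpca_closed_form(H, q):
    """Closed-form ML estimate of Theorem 1, taking the rotation R = I_q."""
    d = H.shape[0]
    evals, evecs = np.linalg.eigh(H)            # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder so that lambda_1 >= ... >= lambda_d
    sigma2_ml = evals[q:].sum() / (d - q)       # average of the d - q smallest eigenvalues
    # W_ML = U_q (Lambda_q - sigma2_ML I_q)^{1/2}; clip at 0 for numerical safety
    W_ml = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2_ml, 0.0))
    return W_ml, sigma2_ml


def prpca_em_step(H, W, sigma2):
    """One EM update of {W, sigma^2} as in Eq. (9); W and sigma2 are the old values."""
    d, q = W.shape
    M = W.T @ W + sigma2 * np.eye(q)            # M = W^T W + sigma^2 I_q
    HW = H @ W
    W_new = HW @ np.linalg.inv(sigma2 * np.eye(q) + np.linalg.solve(M, W.T @ HW))
    sigma2_new = np.trace(H - HW @ np.linalg.solve(M, W_new.T)) / d
    return W_new, sigma2_new
```

The direct form of $\widetilde{\sigma}^2$ above mirrors (9) exactly; in practice the trace can be accumulated without forming the full $d \times d$ residual.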
4.3 Complexity Analysis

Suppose there are $\delta$ nonzero elements in $\Delta$. The computation cost for $H$ is $O(dN + d\delta)$. In many applications $\delta$ is typically a constant multiple of $N$, so the time complexity for computing $H$ is $O(dN)$. The closed-form solution requires the eigendecomposition of the $d \times d$ matrix $H$, so its computation cost is $O(dN + d^3)$. For EM, because $d$ is typically larger than $q$, the computation cost is $O(dN + d^2 q T)$, where $T$ is the number of EM iterations. If the data are of very high dimensionality, EM will be more efficient than the closed-form solution.

5 Experiments

Although PPCA possesses additional advantages over the original non-probabilistic formulation of PCA, the two give similar DR results when there are no missing values in the data. If the task is to classify instances in the low-dimensional embedding, classifiers based on the embedding results of PCA and PPCA are expected to achieve comparable performance. Hence, in this paper, we adopt only PCA as the baseline for studying the performance of PRPCA. For the EM algorithm of PRPCA, we use PCA to initialize $W$, $\sigma^2$ is initialized to $10^{-6}$, and $\gamma = 10^{-6}$. Because the EM algorithm and the closed-form solution achieve similar results, we only report the results of the EM algorithm of PRPCA in the following experiments.

5.1 Data Sets and Evaluation Scheme

To save space, we only briefly describe the data sets and evaluation scheme here. More detailed information about them can be found in the longer version.
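As a usage illustration of the initialization described in Section 5 (again a hypothetical sketch under our own naming, reusing the prpca_em_step function from the sketch after Section 4.2.2): $W$ is initialized by PCA on the observations and $\sigma^2$ starts at $10^{-6}$; the setting $\gamma = 10^{-6}$ pertains to quantities defined earlier in the paper and does not appear in this fragment.

```python
import numpy as np
# Assumes prpca_em_step from the sketch after Section 4.2.2 is in scope.


def run_prpca_em(H, T, q, n_iter=100):
    """Hypothetical driver: PCA-initialized EM, following the setup in Section 5."""
    d, N = T.shape
    Tc = T - T.mean(axis=1, keepdims=True)      # center the observations
    S = Tc @ Tc.T / N                           # sample covariance, used only for PCA init
    evecs = np.linalg.eigh(S)[1]                # eigenvectors in ascending eigenvalue order
    W = evecs[:, ::-1][:, :q]                   # top-q principal directions initialize W
    sigma2 = 1e-6                               # sigma^2 initialized to 1e-6 (Section 5)
    for _ in range(n_iter):                     # iterate the M-step update of Eq. (9)
        W, sigma2 = prpca_em_step(H, W, sigma2)
    return W, sigma2
```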