Sparse Probabilistic Relational Projection

Wu-Jun Li† and Dit-Yan Yeung‡
† Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
‡ Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
liwujun@cs.sjtu.edu.cn, dyyeung@cse.ust.hk

Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Probabilistic relational PCA (PRPCA) can learn a projection matrix to perform dimensionality reduction for relational data. However, the results learned by PRPCA lack interpretability because each principal component is a linear combination of all the original variables. In this paper, we propose a novel model, called sparse probabilistic relational projection (SPRP), to learn a sparse projection matrix for relational dimensionality reduction. The sparsity in SPRP is achieved by imposing on the projection matrix a sparsity-inducing prior such as the Laplace prior or the Jeffreys prior. We propose an expectation-maximization (EM) algorithm to learn the parameters of SPRP. Compared with PRPCA, the sparsity in SPRP not only makes the results more interpretable but also makes the projection operation much more efficient without compromising accuracy. All these claims are verified by experiments on several real applications.

Introduction

Principal component analysis (PCA) (Jolliffe 2002) and probabilistic PCA (PPCA) (Tipping and Bishop 1999) are popular dimensionality reduction methods that have been widely used to explore the structure of a high-dimensional data set by mapping it into a low-dimensional space via a projection (or transformation) matrix. However, it is difficult to interpret the results of PCA and PPCA because each principal component is a linear combination of all the original variables.

To achieve interpretability, sparse versions of PCA and PPCA have been proposed that force many entries of the projection matrix to zero. Sparse PCA (SPCA) (Zou, Hastie, and Tibshirani 2006) first reformulates PCA as a regression-type optimization problem and then applies the elastic net (Zou and Hastie 2005) constraint to the regression coefficients to obtain a sparse projection matrix. In (Sigg and Buhmann 2008), sparsity is achieved by putting a 1-norm (L1) constraint on the projection matrix during the expectation-maximization (EM) (Dempster, Laird, and Rubin 1977) learning procedure of PPCA. In (Archambeau and Bach 2008) and (Guan and Dy 2009), sparse versions of PPCA are obtained by placing sparsity-inducing priors, such as the Jeffreys prior, on the projection matrix.
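To make the interpretability argument concrete, the following minimal sketch contrasts dense PCA loadings with sparse loadings. It uses scikit-learn's SparsePCA, whose L1-penalized formulation is related to, but not identical to, the elastic-net reformulation of SPCA; the data, the number of components, and the penalty weight are illustrative assumptions rather than settings from this paper.

```python
# Illustrative only: contrast dense PCA loadings with sparse PCA loadings.
# The data, n_components, and alpha below are arbitrary assumptions.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(200, 30)            # 200 instances described by 30 original variables

pca = PCA(n_components=5).fit(X)
spca = SparsePCA(n_components=5, alpha=1.0, random_state=0).fit(X)

# Dense PCA: essentially every loading is nonzero, so each component mixes
# all 30 variables and is hard to interpret.
print("nonzero PCA loadings:", np.sum(np.abs(pca.components_) > 1e-12))

# Sparse PCA: many loadings are exactly zero, so each component involves
# only a handful of the original variables.
print("nonzero sparse loadings:", np.count_nonzero(spca.components_))
```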
All the variants of PCA and sparse PCA mentioned above assume that the instances are independent and identically distributed (i.i.d.). Hence, they are not suitable for modeling relational data (Getoor and Taskar 2007; Li and Yeung 2009; Li, Zhang, and Yeung 2009; Li 2010; Li and Yeung 2011; Li, Yeung, and Zhang 2011), in which the instances are not i.i.d. In relational data, besides the content information,¹ there also exist links (i.e., relations) between the instances. The attributes of linked instances are often correlated rather than i.i.d. (Li, Yeung, and Zhang 2009). A typical example of relational data is a collection of research papers, which contains both paper content and citations between the papers; the existence of a citation relation between two papers often implies that they are about the same topic. In (Li, Yeung, and Zhang 2009), probabilistic relational PCA (PRPCA), which extends PPCA by eliminating the i.i.d. assumption, is proposed to perform dimensionality reduction for relational data. By explicitly modeling the covariance between instances, PRPCA dramatically outperforms PCA and PPCA. However, as with PCA and PPCA, the results learned by PRPCA lack interpretability.

¹ As in (Li, Yeung, and Zhang 2009), we use the term 'content information' to refer to the feature vectors describing the instances.

In this paper, we propose a novel model, called sparse probabilistic relational projection (SPRP), to learn a sparse projection matrix for relational dimensionality reduction. Compared with PRPCA, the sparsity in SPRP not only makes the results more interpretable but also makes the projection operation much more efficient without compromising accuracy.
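The SPRP model and its EM algorithm are developed in the following sections. Purely as a conceptual sketch of the two claims above, and not as the updates derived in this paper, the snippet below uses soft-thresholding as a stand-in for the shrinkage effect of a Laplace (L1-type) prior on the projection matrix, and shows that the resulting sparse matrix makes projecting a new instance cheap because only the nonzero entries participate; the matrix sizes and the threshold are assumed for illustration.

```python
# Conceptual sketch only: soft-thresholding stands in for the shrinkage effect
# of a Laplace (L1) prior on the projection matrix; it is NOT the SPRP update
# derived in this paper.  Dimensions, threshold, and data are illustrative.
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
d, q = 10000, 50                                # original / reduced dimensionality (assumed)
W = rng.randn(d, q) * (rng.rand(d, q) < 0.05)   # a few large "informative" loadings
W += 0.01 * rng.randn(d, q)                     # plus small dense noise

def soft_threshold(M, lam):
    """MAP-style shrinkage corresponding to an L1 (Laplace-prior) penalty."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

W_sparse = soft_threshold(W, lam=0.05)          # most entries become exactly zero
W_csr = sparse.csr_matrix(W_sparse)             # store only the nonzero entries

x = rng.randn(d)                                # a new high-dimensional instance
z_dense = W.T @ x                               # dense projection: d * q multiplications
z_sparse = W_csr.T @ x                          # sparse projection: roughly nnz(W) multiplications

print("nonzero entries kept:", W_csr.nnz, "out of", d * q)
```

Here z_sparse differs from z_dense only through the entries removed by thresholding; in SPRP the sparsity pattern is instead learned during EM from the sparsity-inducing prior rather than imposed post hoc.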
Notation

For the convenience of presentation and comparison, we adopt the same notation as that in (Li, Yeung, and Zhang 2009). More specifically, we use boldface lowercase letters, such as v, to denote vectors, and v_i to denote the ith element of v. Boldface uppercase letters, such as F, are used to denote matrices, with the ith row and the jth column of F denoted by F_{i*} and F_{*j}, respectively. F_{ij} is the element at the ith row and jth column of F. We use |F| to denote the determinant of a matrix F, tr(F) to denote its trace, and F^T to denote its transpose.
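For reference, this notation maps directly onto standard array operations; the small example below (in Python/NumPy, with an arbitrary matrix) is only meant to make the conventions concrete.

```python
# Illustrative mapping from the notation above to NumPy; the matrix is arbitrary.
import numpy as np

F = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])

row_2   = F[1, :]            # F_{i*} with i = 2: the 2nd row (0-based index 1)
col_3   = F[:, 2]            # F_{*j} with j = 3: the 3rd column (0-based index 2)
f_23    = F[1, 2]            # F_{ij}: element in the 2nd row and 3rd column
det_F   = np.linalg.det(F)   # |F|: determinant
trace_F = np.trace(F)        # tr(F): trace
F_T     = F.T                # F^T: transpose
```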