Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10)

Gaussian Process Latent Random Field

Guoqiang Zhong†, Wu-Jun Li‡, Dit-Yan Yeung‡, Xinwen Hou†, Cheng-Lin Liu†

† National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
‡ Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China

gqzhong@nlpr.ia.ac.cn, {liwujun, dyyeung}@cse.ust.hk, {xwhou, liucl}@nlpr.ia.ac.cn

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

The Gaussian process latent variable model (GPLVM) is an unsupervised probabilistic model for nonlinear dimensionality reduction. A supervised extension, called discriminative GPLVM (DGPLVM), incorporates supervisory information into GPLVM to enhance the classification performance. However, its restriction of the latent-space dimensionality to at most C − 1 (C is the number of classes) leads to unsatisfactory performance when the intrinsic dimensionality of the application is higher than C − 1. In this paper, we propose a novel supervised extension of GPLVM, called Gaussian process latent random field (GPLRF), by enforcing the latent variables to be a Gaussian Markov random field with respect to a graph constructed from the supervisory information. In GPLRF, the dimensionality of the latent space is no longer restricted to at most C − 1. This makes GPLRF much more flexible than DGPLVM in applications. Experiments conducted on both synthetic and real-world data sets demonstrate that GPLRF performs comparably with DGPLVM and other state-of-the-art methods on data sets with intrinsic dimensionality at most C − 1, and dramatically outperforms DGPLVM on data sets whose intrinsic dimensionality exceeds C − 1.
Introduction

In many artificial intelligence applications, one often has to deal with high-dimensional data. Such data require dimensionality reduction to reveal the low-dimensional latent structure of the data, so that the underlying tasks, such as visualization, classification and clustering, can benefit from it. Many dimensionality reduction methods have been proposed over the past few decades. Nevertheless, classical linear dimensionality reduction methods such as principal component analysis (PCA) (Jolliffe 1986) and multidimensional scaling (MDS) (Cox and Cox 2001) remain popular choices due to their simplicity and efficiency. However, they fail to discover the nonlinear latent structure of more complex data sets. Starting from about a decade ago, a number of nonlinear manifold learning methods such as isometric feature mapping (Isomap) (Tenenbaum, Silva, and Langford 2000) and locally linear embedding (LLE) (Roweis and Saul 2000) have been proposed. They can discover the low-dimensional manifold structure of data embedded in a high-dimensional space. However, these methods were found not to perform well on relatively sparse or noisy data sets (Geiger, Urtasun, and Darrell 2009).
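To make the linear/nonlinear distinction concrete, the following minimal sketch (not part of the original paper; it assumes scikit-learn is available, and the data set and parameter choices are illustrative) embeds the classic S-curve with both PCA and Isomap. PCA, being linear, cannot unroll the curved manifold, while Isomap recovers its two intrinsic coordinates.

```python
# Illustrative comparison (not from the paper): linear PCA vs. the
# nonlinear manifold learner Isomap on the classic S-curve data set.
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, color = make_s_curve(n_samples=1000, noise=0.05, random_state=0)

# PCA finds the best linear 2-D subspace; the S-curve stays folded.
X_pca = PCA(n_components=2).fit_transform(X)

# Isomap approximates geodesic distances along the manifold and
# recovers the two intrinsic coordinates of the S-curve.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X_pca.shape, X_iso.shape)  # (1000, 2) (1000, 2)
```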
The Gaussian process latent variable model (GPLVM) (Lawrence 2005) is a fully probabilistic, nonlinear latent variable model based on Gaussian processes (Rasmussen and Williams 2006). GPLVM can learn a nonlinear mapping from the latent space to the observation space. It has achieved very promising performance in many real-world applications, especially in situations with only a small number of training examples, i.e., sparse data sets. As pointed out by Lawrence and Quiñonero-Candela (2006), GPLVM can preserve the dissimilarity between points, which means that points will be far apart in the latent space if they are far apart in the observation space. However, GPLVM cannot preserve the similarity between points, i.e., points that are close in the observation space are not necessarily close in the latent space. Typically, points that are close in the observation space are expected to belong to the same class. Consequently, in GPLVM there is no guarantee that data points from the same class are close in the latent space, which makes the learned latent representation not necessarily good for discriminative applications.
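As a rough numerical illustration of the model just described (a minimal sketch under our own simplifying assumptions, not the authors' implementation; the RBF kernel parameters, noise level, and optimizer are assumptions), GPLVM places a GP prior with an RBF kernel over the mapping from latent coordinates X to observations Y, and learns X by maximizing the GP marginal likelihood:

```python
# Minimal GPLVM sketch (illustrative only; our own simplified
# implementation of the model described above).
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def neg_log_likelihood(x_flat, Y, q, noise=1e-2):
    """Negative GP marginal log-likelihood of Y given latent X."""
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    # RBF kernel over the latent coordinates, plus observation noise.
    K = np.exp(-0.5 * cdist(X, X, 'sqeuclidean')) + noise * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    Kinv_Y = np.linalg.solve(K, Y)
    # log p(Y|X) = -D/2 log|K| - 1/2 tr(K^{-1} Y Y^T) + const
    return 0.5 * D * logdet + 0.5 * np.sum(Y * Kinv_Y)

def fit_gplvm(Y, q=2, seed=0):
    """Learn an N x q latent embedding by maximum likelihood."""
    N = Y.shape[0]
    X0 = np.random.default_rng(seed).normal(scale=0.1, size=(N, q))
    res = minimize(neg_log_likelihood, X0.ravel(), args=(Y, q),
                   method='L-BFGS-B')  # finite-difference gradients
    return res.x.reshape(N, q)

Y = np.random.default_rng(1).normal(size=(30, 5))  # toy observations
X_latent = fit_gplvm(Y)                            # (30, 2) embedding
```

Note that nothing in this objective encourages same-class points to stay close in the latent space, which is exactly the weakness the supervised extensions discussed next try to address.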
In many applications, we often have some supervisory information, such as class labels, though the information may be rather limited. However, GPLVM is unsupervised in nature. If we can explicitly incorporate the supervisory information into the learning procedure of GPLVM to make points from the same class close in the latent space, we can obtain a more discriminative latent representation. To the best of our knowledge, only one work, called discriminative GPLVM (DGPLVM) (Urtasun and Darrell 2007), has integrated supervisory information into the GPLVM framework. However, since DGPLVM is based on the linear discriminant analysis (LDA) (Fukunaga 1991) or generalized discriminant analysis (GDA) (Baudat and Anouar 2000) criterion, the dimensionality of the learned latent space in DGPLVM is restricted to at most C − 1, where C is the number of classes: the between-class scatter matrix underlying these criteria has rank at most C − 1. For applications with intrinsic dimensionality equal to or higher than C, DGPLVM might not be able to deliver satisfactory performance. This will be verified by the experimental results.
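To give a concrete sense of the construction announced in the abstract (a minimal sketch under our own assumptions about the details, which the paper develops formally later; in particular, connecting all same-class points and the choice of the scale parameter beta are our assumptions), a Gaussian Markov random field prior over the latent variables can be induced by the Laplacian of a graph built from the class labels:

```python
# Illustrative sketch of a GMRF prior over latent coordinates, built
# from class labels (our own simplified reading of the construction
# named in the abstract; the paper gives the formal definition).
import numpy as np

def label_graph_laplacian(labels):
    """Graph Laplacian of the graph linking same-class points."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)            # no self-loops
    return np.diag(W.sum(axis=1)) - W   # L = D - W

def gmrf_log_prior(X, L, beta=1.0):
    """log p(X) up to a constant: -(beta/2) * tr(X^T L X).
    This equals -(beta/4) * sum_ij W_ij ||x_i - x_j||^2, so the
    prior pulls points that share a label together."""
    return -0.5 * beta * np.trace(X.T @ L @ X)

labels = [0, 0, 1, 1, 2]
L = label_graph_laplacian(labels)
X = np.random.default_rng(0).normal(size=(5, 2))  # toy latent coords
print(gmrf_log_prior(X, L))
```

Combined with the GPLVM likelihood, such a prior yields a MAP objective in which the GP term preserves dissimilarity while the GMRF term draws same-class points together, and, unlike the LDA/GDA criterion, it places no C − 1 cap on the latent dimensionality.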