cosine baseline score has been coarsely optimized by hand, MED, CRAN: A = 1/2, CACM, CISI: A = 2/3. The experiments consistently validate the advantages of PLSI over LSI. Substantial performance gains have been achieved for all 4 data sets. Notice that the relative precision gain compared to the baseline method is typically around 100% in the most interesting intermediate regime of recall! In particular, PLSI works well even in cases where LSI fails completely (these problems of LSI are in accordance with the original results reported in [3]). The benefits of model combination are also very substantial. In all cases the (uniformly) combined model performed better than the best single model. As a side effect, model averaging also liberated us from having to select the correct model dimensionality. These experiments demonstrate that the advantages of PLSA over standard LSA are not restricted to applications with performance criteria directly depending on the perplexity. Statistical objective functions like the perplexity (log-likelihood) may thus provide a general yardstick for analysis methods in text learning and information retrieval. To stress this point we ran an experiment on the MED data, where both perplexity and average precision have been monitored simultaneously as a function of β. The resulting curves, which show a striking correlation, are plotted in Figure 7.

5 Conclusion

We have proposed a novel method for unsupervised learning, called Probabilistic Latent Semantic Analysis, which is based on a statistical latent class model. We have argued that this approach is more principled than standard Latent Semantic Analysis, since it possesses a sound statistical foundation. Tempered Expectation Maximization has been presented as a powerful fitting procedure. We have experimentally verified the claimed advantages, achieving substantial performance gains.
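Tempered Expectation Maximization for the aspect model, together with the perplexity criterion used as a yardstick above, can be sketched in a few lines. The following is a minimal illustration under stated assumptions (a toy 4x4 document-word count matrix, K = 2 aspects, and a single fixed tempering parameter β rather than the annealed schedule used in the experiments); it is not the implementation behind the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document-word count matrix (assumed for illustration, not a paper corpus).
n = np.array([[4, 2, 0, 0],
              [3, 3, 1, 0],
              [0, 1, 3, 4],
              [0, 0, 2, 5]], dtype=float)
D, W = n.shape
K = 2          # number of latent aspects (assumed)
beta = 0.8     # fixed tempering parameter (the paper anneals beta instead)

# Random initialization of the aspect model P(z), P(d|z), P(w|z).
Pz = np.full(K, 1.0 / K)
Pd_z = rng.random((K, D)); Pd_z /= Pd_z.sum(axis=1, keepdims=True)
Pw_z = rng.random((K, W)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)

def perplexity():
    # P(d, w) = sum_z P(z) P(d|z) P(w|z); perplexity = exp(-log-lik / #tokens)
    Pdw = np.einsum('k,kd,kw->dw', Pz, Pd_z, Pw_z)
    return float(np.exp(-(n * np.log(Pdw + 1e-12)).sum() / n.sum()))

for it in range(200):
    # Tempered E-step: P(z|d,w) proportional to P(z) [P(d|z) P(w|z)]^beta
    post = Pz[:, None, None] * (Pd_z[:, :, None] * Pw_z[:, None, :]) ** beta
    post /= post.sum(axis=0, keepdims=True)
    # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w)
    nz = n[None, :, :] * post
    Pw_z = nz.sum(axis=1); Pw_z /= Pw_z.sum(axis=1, keepdims=True)
    Pd_z = nz.sum(axis=2); Pd_z /= Pd_z.sum(axis=1, keepdims=True)
    Pz = nz.sum(axis=(1, 2)); Pz /= Pz.sum()

print(perplexity())
```

Lowering β flattens the posteriors in the E-step, which is the source of the regularizing effect of tempered EM; β = 1 recovers standard EM for the latent class model.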
Probabilistic Latent Semantic Analysis should thus be considered a promising novel unsupervised learning method with a wide range of applications in text learning and information retrieval.

Acknowledgments

The author would like to thank Jan Puzicha, Andrew McCallum, Mike Jordan, Joachim Buhmann, Tali Tishby, Nelson Morgan, Jerry Feldman, Dan Gildea, Andrew Ng, Sebastian Thrun, and Tom Mitchell for stimulating discussions and helpful hints. This work has been supported by a DAAD fellowship.

References

[1] J.R. Bellegarda. Exploiting both local and global constraints for multi-span statistical language modeling. In Proceedings of ICASSP'98, volume 2, pages 677-680, 1998.
[2] N. Coccaro and D. Jurafsky. Towards better integration of semantic predictors in statistical language modeling. In Proceedings of ICSLP-98, 1998. To appear.
[3] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 1990.
[4] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. B, 39:1-38, 1977.
[5] P.W. Foltz and S.T. Dumais. An analysis of information filtering methods. Communications of the ACM, 35(12):51-60, 1992.
[6] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of SIGIR'99, 1999.
[7] T. Hofmann, J. Puzicha, and M.I. Jordan. Unsupervised learning from dyadic data. In Advances in Neural Information Processing Systems, volume 11. MIT Press, 1999.
[8] T.K. Landauer and S.T. Dumais. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240, 1997.
[9] R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental and other variants. In M.I. Jordan, editor, Learning in Graphical Models, pages 355-368. Kluwer Academic Publishers, 1998.
[10] F.C.N.
Pereira, N.Z. Tishby, and L. Lee. Distributional clustering of English words. In Proceedings of the ACL, pages 183-190, 1993.
[11] K. Rose, E. Gurewitz, and G. Fox. A deterministic annealing approach to clustering. Pattern Recognition Letters, 11(11):589-594, 1990.
[12] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[13] L. Saul and F. Pereira. Aggregate and mixed-order Markov models for statistical language processing. In Proceedings of the 2nd International Conference on Empirical Methods in Natural Language Processing, 1997.