cosine baseline score has been coarsely optimized by hand (MED, CRAN: λ = 1/2; CACM, CISI: λ = 2/3).

The experiments consistently validate the advantages of PLSI over LSI. Substantial performance gains have been achieved for all 4 data sets. Notice that the relative precision gain compared to the baseline method is typically around 100% in the most interesting intermediate regime of recall! In particular, PLSI works well even in cases where LSI fails completely (these problems of LSI are in accordance with the original results reported in [3]).

The benefits of model combination are also very substantial. In all cases the (uniformly) combined model performed better than the best single model. As a side effect, model averaging also removes the need to select the correct model dimensionality.

These experiments demonstrate that the advantages of PLSA over standard LSA are not restricted to applications with performance criteria directly depending on the perplexity. Statistical objective functions like the perplexity (log-likelihood) may thus provide a general yardstick for analysis methods in text learning and information retrieval. To stress this point we ran an experiment on the MED data, where both perplexity and average precision were monitored simultaneously as a function of β. The resulting curves, which show a striking correlation, are plotted in Figure 7.

5 Conclusion

We have proposed a novel method for unsupervised learning, called Probabilistic Latent Semantic Analysis, which is based on a statistical latent class model. We have argued that this approach is more principled than standard Latent Semantic Analysis, since it possesses a sound statistical foundation.
Tempered Expectation Maximization has been presented as a powerful fitting procedure. We have experimentally verified the claimed advantages, achieving substantial performance gains. Probabilistic Latent Semantic Analysis should thus be considered a promising novel unsupervised learning method with a wide range of applications in text learning and information retrieval.

Acknowledgments

The author would like to thank Jan Puzicha, Andrew McCallum, Mike Jordan, Joachim Buhmann, Tali Tishby, Nelson Morgan, Jerry Feldman, Dan Gildea, Andrew Ng, Sebastian Thrun, and Tom Mitchell for stimulating discussions and helpful hints. This work has been supported by a DAAD fellowship.

References

[1] J.R. Bellegarda. Exploiting both local and global constraints for multi-span statistical language modeling.