No. 2 · YANG Xiao, et al.: Multi-document automatic summarization with the LDA topic model · 175 ·


5 Conclusion

Based on the topic probability distribution and the sentence probability distribution in the LDA model, this paper proposes two methods for computing sentence weights: 1) using the generative topic model to compute the probability of the words in a sentence; 2) treating each sentence in the document set as a generative model of the document set. On the generic summarization dataset DUC2002, experimental results obtained with the ROUGE evaluation tool show that both sentence-weighting methods clearly outperform traditional methods and also have an advantage over other LDA-based summarization systems.

The LDA model makes a bag-of-words assumption: it considers neither the positions of words and sentences nor the structural relationships among sentences, documents, and document collections. Future work will incorporate syntactic structure information into the topic model and build a unified probabilistic generative framework for sentences, documents, and document collections. In addition, the sentence generative model currently assumes that all sentences have the same prior probability; future work will also consider the influence of sentence prior probabilities.
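As a rough illustration of the two sentence-weighting ideas summarized in the conclusion, the following Python sketch scores sentences from a toy topic model. The exact scoring formulas are not stated in this section, so both are assumptions: variant 1 uses the mean log-probability of a sentence's words under the document set's topic mixture, and variant 2 uses the log-likelihood of the document set's word counts under each sentence's own topic mixture, with the equal sentence prior dropped as a constant. All distributions, names, and numbers are hypothetical; in practice they would come from a trained LDA model.

```python
import math

# Toy topic-word distributions p(w | k); hypothetical values, each row sums to 1.
TOPIC_WORD = [
    {"model": 0.5, "topic": 0.3, "summary": 0.1, "news": 0.1},
    {"model": 0.1, "topic": 0.1, "summary": 0.3, "news": 0.5},
]
EPS = 1e-12  # guards against log(0) for words unseen by every topic


def word_prob(word, topic_mix):
    """p(w) = sum_k p(w | k) * p(k) for a given topic mixture p(k)."""
    return sum(p_k * TOPIC_WORD[k].get(word, 0.0) for k, p_k in enumerate(topic_mix))


def score_by_word_probability(sentence, doc_topic_mix):
    """Variant 1 (assumed form): mean log-probability of the sentence's
    words under the topic mixture of the whole document set."""
    logs = [math.log(word_prob(w, doc_topic_mix) + EPS) for w in sentence]
    return sum(logs) / len(logs)


def score_by_generating_collection(sentence_topic_mix, collection_counts):
    """Variant 2 (assumed form): log-likelihood of the document set's word
    counts when the sentence's topic mixture is the generator. The equal
    sentence prior is a constant and is therefore omitted."""
    return sum(
        n * math.log(word_prob(w, sentence_topic_mix) + EPS)
        for w, n in collection_counts.items()
    )


def rank(scores):
    """Sentence indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical inputs: two tokenized sentences and their inferred topic mixtures.
sentences = [["model", "topic"], ["news", "summary"]]
sentence_topic_mixes = [[0.9, 0.1], [0.1, 0.9]]
doc_topic_mix = [0.8, 0.2]
collection_counts = {"model": 4, "topic": 3, "summary": 1, "news": 1}

scores_v1 = [score_by_word_probability(s, doc_topic_mix) for s in sentences]
scores_v2 = [
    score_by_generating_collection(mix, collection_counts)
    for mix in sentence_topic_mixes
]
print(rank(scores_v1), rank(scores_v2))
```

A summary would then be built by taking sentences in rank order up to a length limit. Replacing the constant prior in variant 2 with a per-sentence term would be one way to explore the sentence-prior extension mentioned as future work.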

