Learning Algorithms for Keyphrase Extraction

E-business course reading: Learning Algorithms for Keyphrase Extraction


each word or phrase in the document into one of two categories: either it is a keyphrase or it is not a keyphrase. We evaluate automatic keyphrase extraction by the degree to which its classifications correspond to human-generated classifications. Our performance measure is precision (the number of matches divided by the number of machine-generated keyphrases), using a variety of cut-offs for the number of machine-generated keyphrases.

3. Related Work

Although there are several papers that discuss automatically extracting important phrases, as far as we know, we are the first to treat this problem as supervised learning from examples.

Krulwich and Burkey (1996) use heuristics to extract keyphrases from a document. The heuristics are based on syntactic clues, such as the use of italics, the presence of phrases in section headers, and the use of acronyms. Their motivation is to produce phrases for use as features when automatically classifying documents. Their algorithm tends to produce a relatively large list of phrases, with low precision.

Muñoz (1996) uses an unsupervised learning algorithm to discover two-word keyphrases. The algorithm is based on Adaptive Resonance Theory (ART) neural networks. Muñoz’s algorithm tends to produce a large list of phrases, with low precision. Also, the algorithm is not applicable to one-word or more-than-two-word keyphrases.

Steier and Belew (1993) use the mutual information statistic to discover two-

Table 1: Samples of the behaviour of three different stemming algorithms.
Word           Porter Stem   Lovins Stem   Iterated Lovins Stem
believes       believ        belief        belief
belief         belief        belief        belief
believable     believ        belief        belief
jealousness    jealous       jeal          jeal
jealousy       jealousi      jealous       jeal
police         polic         polic         pol
policy         polici        polic         pol
assemblies     assembli      assembl       assembl
assembly       assembli      assemb        assemb
probable       probabl       prob          prob
probability    probabl       prob          prob
probabilities  probabl       probabil      probabil
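The Iterated Lovins column in Table 1 suggests reapplying the Lovins stemmer to its own output, which can shorten stems beyond a single pass (jealousy: jealous becomes jeal; police: polic becomes pol). The sketch below illustrates only that iteration-to-a-fixed-point idea. The suffix list and the names toy_stem and iterate_stemmer are hypothetical, and toy_stem is a crude suffix stripper, not the Lovins algorithm, which relies on a much larger table of endings, context conditions, and recoding rules.

```python
# Hypothetical toy suffix list, checked in order; NOT the Lovins ending table.
TOY_SUFFIXES = ("ness", "ous", "ies", "y", "e", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """One pass: strip the first listed suffix that leaves at least min_stem characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def iterate_stemmer(stem, word: str) -> str:
    """Reapply `stem` to its own output until the result stops changing."""
    current = word
    while True:
        shorter = stem(current)
        if shorter == current:  # fixed point reached
            return current
        current = shorter  # strictly shorter, so the loop terminates


if __name__ == "__main__":
    for w in ["jealousness", "jealousy", "police", "policy"]:
        # prints: word, single-pass stem, iterated stem
        print(w, toy_stem(w), iterate_stemmer(toy_stem, w))
```

On these words the toy rules happen to merge jealousness and jealousy into jeal after iteration, echoing the Iterated Lovins column, but they stop at polic for police and policy, because the toy suffix list differs from Lovins.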

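The text frames extraction as classifying each word or phrase in a document as either a keyphrase or not a keyphrase. A minimal sketch of how such labeled training examples might be built follows, assuming that candidates are all contiguous sequences of one to three words and that a candidate is positive only when it exactly matches (ignoring case) a human-generated keyphrase. The names candidate_phrases and label_candidates, the three-word limit, the ASCII-only tokenizer, and the exact-match rule are illustrative assumptions, not the paper's procedure.

```python
import re


def candidate_phrases(text: str, max_words: int = 3) -> list[str]:
    """Hypothetical candidate generator: every contiguous run of 1..max_words words,
    lowercased and de-duplicated, grouped by length and then by position."""
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: ASCII letters only (assumption)
    phrases: list[str] = []
    seen: set[str] = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i : i + n])
            if phrase not in seen:
                seen.add(phrase)
                phrases.append(phrase)
    return phrases


def label_candidates(
    text: str, human_keyphrases: list[str], max_words: int = 3
) -> list[tuple[str, int]]:
    """Pair each candidate with 1 (keyphrase) or 0 (not a keyphrase), using an
    assumed exact, case-insensitive match against the human-generated list."""
    targets = {k.lower() for k in human_keyphrases}
    return [(p, int(p in targets)) for p in candidate_phrases(text, max_words)]


if __name__ == "__main__":
    doc = "Supervised learning for keyphrase extraction."
    for phrase, label in label_candidates(doc, ["Keyphrase extraction", "supervised learning"]):
        print(label, phrase)
```

Each (phrase, label) pair is one example for a supervised learner; in practice each candidate would also need a feature vector, which this sketch leaves out.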
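The performance measure in the text is precision: the number of matches divided by the number of machine-generated keyphrases, computed at several cut-offs. For example, if 2 of the top 5 machine-generated phrases match human keyphrases, precision at cut-off 5 is 2/5 = 0.4. The sketch assumes the machine output is ranked best-first, keeps the first k phrases for cut-off k, and counts a match when a phrase equals a human keyphrase ignoring case; the function name precision_at_cutoffs, the cut-off values, and the sample phrases are hypothetical.

```python
def precision_at_cutoffs(
    machine: list[str], human: list[str], cutoffs: tuple[int, ...] = (5, 10)
) -> dict[int, float]:
    """Precision = matches / number of machine-generated keyphrases kept, per cut-off.

    Assumptions for this sketch: `machine` is ranked best-first, cut-off k keeps
    its first k phrases, and a match is a case-insensitive exact string match.
    """
    targets = {h.lower() for h in human}
    results: dict[int, float] = {}
    for k in cutoffs:
        kept = [m.lower() for m in machine[:k]]
        if not kept:  # no machine output: define precision as 0.0
            results[k] = 0.0
            continue
        matches = len(set(kept) & targets)  # each human keyphrase counted at most once
        results[k] = matches / len(kept)
    return results


if __name__ == "__main__":
    ranked = ["keyphrase extraction", "learning", "supervised learning", "document",
              "precision", "stemming", "phrase", "features", "corpus", "heuristics"]
    human = ["keyphrase extraction", "supervised learning", "stemming"]
    print(precision_at_cutoffs(ranked, human))  # {5: 0.4, 10: 0.3}
```

Dividing by the number of phrases actually kept, rather than by k, keeps the measure faithful to "the number of machine-generated keyphrases" when a document yields fewer than k candidates.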

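The sentence on Steier and Belew (1993) breaks off at the page boundary, so their exact formulation of the mutual information statistic is not shown here. As a rough illustration only, the sketch scores adjacent word pairs with pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), using maximum-likelihood estimates from a single token list. This is one common variant of the statistic, assumed rather than taken from their paper, and bigram_pmi and min_count are hypothetical names.

```python
import math
from collections import Counter


def bigram_pmi(tokens: list[str], min_count: int = 1) -> dict[tuple[str, str], float]:
    """Pointwise mutual information for each adjacent word pair (one common variant).

    Maximum-likelihood estimates: P(x) = count(x) / N and
    P(x, y) = count(x, y) / (N - 1), where N is the number of tokens
    and N - 1 is the number of adjacent pairs.
    """
    n = len(tokens)
    if n < 2:
        return {}
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores: dict[tuple[str, str], float] = {}
    for (x, y), count in bigrams.items():
        if count < min_count:  # skip rare pairs, whose PMI estimates are unreliable
            continue
        p_xy = count / (n - 1)
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    tokens = "neural network models use neural network layers".split()
    for pair, score in sorted(bigram_pmi(tokens, min_count=2).items()):
        print(pair, round(score, 2))  # ('neural', 'network') 2.03
```

Because PMI tends to give high scores to pairs seen only once, a minimum count such as min_count is a common safeguard when using it to propose two-word phrases.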