Learning Algorithms for Keyphrase Extraction

each word or phrase in the document into one of two categories: either it is a keyphrase or it is not a keyphrase. We evaluate automatic keyphrase extraction by the degree to which its classifications correspond to human-generated classifications. Our performance measure is precision (the number of matches divided by the number of machine-generated keyphrases), using a variety of cut-offs for the number of machine-generated keyphrases.

3. Related Work

Although there are several papers that discuss automatically extracting important phrases, as far as we know, we are the first to treat this problem as supervised learning from examples.

Krulwich and Burkey (1996) use heuristics to extract keyphrases from a document. The heuristics are based on syntactic clues, such as the use of italics, the presence of phrases in section headers, and the use of acronyms. Their motivation is to produce phrases for use as features when automatically classifying documents. Their algorithm tends to produce a relatively large list of phrases, with low precision.

Muñoz (1996) uses an unsupervised learning algorithm to discover two-word keyphrases. The algorithm is based on Adaptive Resonance Theory (ART) neural networks. Muñoz's algorithm tends to produce a large list of phrases, with low precision. Also, the algorithm is not applicable to one-word or more-than-two-word keyphrases.

Steier and Belew (1993) use the mutual information statistic to discover two-

Table 1: Samples of the behaviour of three different stemming algorithms.
Word            Porter Stem   Lovins Stem   Iterated Lovins Stem
believes        believ        belief        belief
belief          belief        belief        belief
believable      believ        belief        belief
jealousness     jealous       jeal          jeal
jealousy        jealousi      jealous       jeal
police          polic         polic         pol
policy          polici        polic         pol
assemblies      assembli      assembl       assembl
assembly        assembli      assemb        assemb
probable        probabl       prob          prob
probability     probabl       prob          prob
probabilities   probabl       probabil      probabil
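
The precision measure defined just before Section 3 (matches divided by the number of machine-generated keyphrases, evaluated at several cut-offs) can be illustrated with a minimal Python sketch. The function name `precision_at_cutoff`, the sample phrase lists, and the matching rule (case-insensitive comparison after collapsing whitespace) are assumptions made for illustration only; this page does not specify how a match is decided.

```python
from typing import Callable, Iterable, Sequence


def _default_normalize(phrase: str) -> str:
    # Assumed matching rule: ignore case and extra whitespace.
    return " ".join(phrase.lower().split())


def precision_at_cutoff(
    machine_phrases: Sequence[str],
    human_phrases: Iterable[str],
    cutoff: int,
    normalize: Callable[[str], str] = _default_normalize,
) -> float:
    """Return matches / number of machine-generated keyphrases kept.

    machine_phrases is assumed to be ranked best-first; only the first
    `cutoff` entries are counted as machine-generated keyphrases.
    """
    kept = [normalize(p) for p in machine_phrases[:cutoff]]
    if not kept:
        # No machine-generated keyphrases: report 0.0 rather than divide by zero.
        return 0.0
    gold = {normalize(p) for p in human_phrases}
    matches = sum(1 for p in kept if p in gold)
    return matches / len(kept)


# Hypothetical data, not taken from the paper.
machine = ["neural networks", "keyphrase extraction", "learning",
           "document", "precision"]
human = ["Keyphrase Extraction", "machine learning", "precision"]

for k in (1, 2, 5):
    print(k, precision_at_cutoff(machine, human, k))
```

With this sample data, one of the first two machine-generated phrases matches a human-generated phrase, and two of all five do, so precision changes as the cut-off varies. A stemmer such as those compared in Table 1 could be passed as `normalize` to make the matching more tolerant of suffix differences.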