Learning Algorithms for Keyphrase Extraction

(Breiman, 1996a, 1996b; Quinlan, 1996). Bagging works by generating many different decision trees and allowing them to vote on the classification of each example. We experimented with different numbers of trees and different techniques for sampling the training data. The experiments support the hypothesis that bagging improves the performance of C4.5 when applied to automatic keyphrase extraction.

During our experiments with C4.5, we came to believe that a specialized algorithm, developed specifically for learning to extract keyphrases, might achieve better results than a general-purpose learning algorithm, such as C4.5. Section 7 introduces the GenEx algorithm. GenEx is a hybrid of the Genitor steady-state genetic algorithm (Whitley, 1989) and the Extractor parameterized keyphrase extraction algorithm (Turney, 1997, 1999).[4] Extractor works by assigning a numerical score to the phrases in the input document. The final output of Extractor is essentially a list of the highest scoring phrases. The behaviour of the scoring function is determined by a dozen numerical parameters. Genitor tunes the setting of these parameters, to maximize the performance of Extractor on a given set of training examples.

The second set of experiments (Section 8) supports the hypothesis that a specialized algorithm (GenEx) can generate better keyphrases than a general-purpose algorithm (C4.5). Both algorithms incorporate significant amounts of domain knowledge, but we avoided embedding specialized procedural knowledge in our application of C4.5. It appears that some degree of specialized procedural knowledge is necessary for automatic keyphrase extraction.

The third experiment (Section 9) looks at subjective human evaluation of the quality of the keyphrases produced by GenEx. On average, about 80% of the automatically generated keyphrases are judged to be acceptable and about 60% are judged to be good.

Section 10 discusses the experimental results and Section 11 presents our plans for future work. We conclude (in Section 12) that GenEx is performing at a level that is suitable for

[4] Extractor is an Official Mark of the National Research Council of Canada. Patent applications have been submitted for Extractor.
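The voting scheme that bagging relies on can be illustrated with a minimal sketch. It is not the configuration used in the experiments: the base learner below is a one-feature threshold rule standing in for a C4.5 decision tree, and the names `train_stump` and `bagged_classifier`, the feature tuples, and the boolean labels are all assumptions made for illustration. Each model is trained on a bootstrap sample (drawn with replacement) of the training data, and the final classification is the majority vote.

```python
import random
from collections import Counter


def train_stump(examples):
    """Fit a one-feature threshold rule (a hypothetical stand-in for a C4.5 tree).

    `examples` is a list of (features, label) pairs with boolean labels.
    """
    best = None
    n_features = len(examples[0][0])
    for f in range(n_features):
        for x, _ in examples:
            t = x[f]  # candidate threshold taken from the sample itself
            for above in (True, False):
                # The rule predicts True when (x[f] > t) equals `above`.
                correct = sum(((xi[f] > t) == above) == yi for xi, yi in examples)
                if best is None or correct > best[0]:
                    best = (correct, f, t, above)
    _, f, t, above = best
    return lambda x: (x[f] > t) == above


def bagged_classifier(examples, n_trees=11, seed=0):
    """Train `n_trees` models on bootstrap samples and combine them by voting."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        # Bootstrap sample: same size as the training set, drawn with replacement.
        sample = [rng.choice(examples) for _ in examples]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]  # majority class

    return predict
```

Calling `bagged_classifier(data)` on a list of `(features, label)` pairs returns a `predict` function. An odd `n_trees` avoids tied votes for two-class problems.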
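The division of labour in GenEx, where a parameterized scorer ranks phrases and a steady-state genetic algorithm tunes the parameters, can be sketched as follows. This is a toy under stated assumptions, not Extractor or Genitor themselves: the three parameters (`early_cutoff`, `early_boost`, `length_weight`), the candidate features, the match-count fitness, and the crossover and mutation operators are all hypothetical. What the sketch keeps from the description above is the structure: score phrases, output the highest scoring ones, and search parameter settings by producing one child at a time that replaces the worst member of the population.

```python
import random

# A candidate is (text, frequency, first_position); first_position is the
# fraction of the document read before the phrase first appears.
# These features and the three parameters are hypothetical.
BOUNDS = [(0.0, 1.0), (1.0, 10.0), (0.0, 2.0)]


def top_phrases(candidates, params, k):
    """Return the k highest scoring phrases under the given parameters."""
    early_cutoff, early_boost, length_weight = params

    def score(candidate):
        text, freq, pos = candidate
        s = freq * (early_boost if pos < early_cutoff else 1.0)
        return s + length_weight * len(text.split())

    ranked = sorted(candidates, key=score, reverse=True)
    return [c[0] for c in ranked[:k]]


def fitness(params, training_set, k=2):
    """Count extracted phrases that match the human-assigned keyphrases."""
    return sum(len(set(top_phrases(cands, params, k)) & gold)
               for cands, gold in training_set)


def steady_state_ga(training_set, pop_size=20, steps=200, seed=0):
    """Tune the scorer's parameters with a simplified steady-state GA."""
    rng = random.Random(seed)
    rank = lambda p: fitness(p, training_set)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    pop.sort(key=rank, reverse=True)  # best individual first
    for _ in range(steps):
        # Rank-biased selection: the better of two random ranks favours fit parents.
        a = pop[min(rng.randrange(pop_size), rng.randrange(pop_size))]
        b = pop[min(rng.randrange(pop_size), rng.randrange(pop_size))]
        child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
        i = rng.randrange(len(child))
        lo, hi = BOUNDS[i]
        # Mutate one parameter and clamp it to its bounds.
        child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 0.1 * (hi - lo))))
        pop[-1] = child  # steady state: one child replaces the worst individual
        pop.sort(key=rank, reverse=True)
    return pop[0]
```

Because only the worst individual is ever replaced, the best parameter setting found so far is never lost, so the fitness of `pop[0]` cannot decrease from one step to the next.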