正在加载图片...
Turney Learning Algorithms for Keyphrase extraction 1. Introduction Many journals ask their authors to provide a list of keywords for their articles. We call these keyphrases, rather than keywords, because they are often phrases of two or more words, rather than single words. We define a keyphrase list as a short list of phrases(typically five to fifteen noun phrases)that capture the main topics discussed in a given document. This paper is concerned with the automatic extraction of keyphrases from text Keyphrases are meant to serve multiple goals. For example, (1)when they are printed on the first page of a journal article, the goal is summarization. They enable the reader to quickly determine whether the given article is in the readers fields of interest. (2)When they are printed in the cumulative index for a journal, the goal is indexing. They enable the reader to quickly find a relevant article when the reader has a specific need. (3)When a search engine form has a field labelled keywords, the goal is to enable the reader to make the search more precise. A search for documents that match a given query term in the keyword field will yield a smaller, higher quality list of hits than a search for the same term in the full text of the documents. Keyphrases can serve these diverse goals and others, because the goals share the requirement for a short list of phrases that captures the main topics of the documents We define automatic keyphrase extraction as the automatic selection of important, topi cal phrases from within the body of a document. Automatic keyphrase extraction is a special case of the more general task of automatic keyphrase generation, in which the generated phrases do not necessarily appear in the body of the given document. Section 2 discusses cri teria for measuring the performance of automatic keyphrase extraction algorithms. In the experiments in this paper, we measure the performance by comparing machine-generated keyphrases with human-generated key phrases In our document collections, an average of about 75% of the authors keyphrases appear somewhere in the body of the corresponding document. Thus, an ideal keyphrase extraction algorithm could (in principle) generateTurney 2 Learning Algorithms for Keyphrase Extraction 1. Introduction Many journals ask their authors to provide a list of keywords for their articles. We call these keyphrases, rather than keywords, because they are often phrases of two or more words, rather than single words. We define a keyphrase list as a short list of phrases (typically five to fifteen noun phrases) that capture the main topics discussed in a given document. This paper is concerned with the automatic extraction of keyphrases from text. Keyphrases are meant to serve multiple goals. For example, (1) when they are printed on the first page of a journal article, the goal is summarization. They enable the reader to quickly determine whether the given article is in the reader’s fields of interest. (2) When they are printed in the cumulative index for a journal, the goal is indexing. They enable the reader to quickly find a relevant article when the reader has a specific need. (3) When a search engine form has a field labelled keywords, the goal is to enable the reader to make the search more precise. A search for documents that match a given query term in the keyword field will yield a smaller, higher quality list of hits than a search for the same term in the full text of the documents. Keyphrases can serve these diverse goals and others, because the goals share the requirement for a short list of phrases that captures the main topics of the documents. We define automatic keyphrase extraction as the automatic selection of important, topi￾cal phrases from within the body of a document. Automatic keyphrase extraction is a special case of the more general task of automatic keyphrase generation, in which the generated phrases do not necessarily appear in the body of the given document. Section 2 discusses cri￾teria for measuring the performance of automatic keyphrase extraction algorithms. In the experiments in this paper, we measure the performance by comparing machine-generated keyphrases with human-generated keyphrases. In our document collections, an average of about 75% of the author’s keyphrases appear somewhere in the body of the corresponding document. Thus, an ideal keyphrase extraction algorithm could (in principle) generate
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有