A Discriminative Approach to Topic-based Citation Recommendation

3.2 Ranking and recommendation

The objective of citation recommendation is to rank the recommended papers for a given citation context. Specifically, we apply the same modeling procedure to the citation context. Hence, we can obtain a topic representation {hc} of the citation context c. Based on the topic representation and the modeling results, we can calculate the probability of each paper being the reference paper for the citation context according to Equation (6). Finally, the papers are ranked in terms of the probabilities, and the top K ranked papers are returned as the recommended papers. It is hard to specify an accurate value of K for each citation context; a simple way is to set it to the average number of citations in a paper (i.e., 11 in our data set).

3.3 Matching Recommended Papers with Citation Sentences

The purpose of matching the recommended papers with citation sentences is to align the recommended papers with sentences in the citation context. This can be done by using each recommended paper as a keyword query to retrieve relevant citation sentences. In general, we may use any retrieval method.
In this paper, we used KL-divergence to measure the relevance between a recommended paper and a citation sentence:

    KL(d, s_ci) = sum_{k=1}^{T} p(h_k|d) log [ p(h_k|d) / p(h_k|s_ci) ]    (7)

where d is a recommended paper and s_ci is the i-th sentence in the citation context c; the probabilities p(h_k|d) and p(h_k|s_ci) can be obtained by Equation (4).

4 Experiments

4.1 Experimental Setting

Data Set. We conducted experiments on two data sets, NIPS [1] and Citeseer [2]. The NIPS data set consists of 12 volumes of NIPS papers (1,605 papers and 10,472 citation relationships). Each paper contains full text and its citations. We removed citations with incomplete information, e.g., those consisting of only authors and a publication venue but no title. We also removed citations that do not appear in the data set. The Citeseer data set consists of 3,335 papers (with 32,558 citation relationships) downloaded from the Citeseer web site. Again, we removed citations that do not appear in the data set.

Each paper was preprocessed by (a) removing stopwords and numbers; (b) removing words appearing fewer than three times in the corpus; and (c) downcasing the obtained words. Finally, we obtained V = 26,723 unique words and a total of 350,361 words in NIPS, and V = 44,548 unique words and 634,875 words in Citeseer.

[1] http://www.cs.toronto.edu/~roweis/data.html
[2] http://citeseer.ist.psu.edu/oai.html
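The ranking step in Section 3.2 reduces to sorting candidate papers by their probability of being a reference for the citation context (Equation (6)) and returning the top K. A minimal sketch, assuming those per-paper probabilities are already computed; the names `paper_probs` and `recommend` are illustrative, not from the paper:

```python
def recommend(paper_probs, k=11):
    """Rank candidate papers by the probability that each is a reference
    for the citation context (Equation (6)) and return the top-k paper IDs.
    k defaults to 11, the average number of citations per paper."""
    ranked = sorted(paper_probs.items(), key=lambda item: item[1], reverse=True)
    return [paper_id for paper_id, _ in ranked[:k]]
```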
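The matching step of Section 3.3 can be sketched as follows, assuming the topic distributions p(h_k|d) and p(h_k|s_ci) from Equation (4) are given as lists; the function names and the `eps` smoothing constant are illustrative assumptions:

```python
import math

def kl_divergence(p_d, p_s, eps=1e-12):
    """Equation (7): KL(d, s_ci) = sum_k p(h_k|d) * log(p(h_k|d) / p(h_k|s_ci)).
    eps guards against taking the log of (or dividing by) zero probabilities."""
    return sum(pd * math.log((pd + eps) / (ps + eps)) for pd, ps in zip(p_d, p_s))

def align(paper_topics, sentence_topics):
    """Return the index of the citation sentence whose topic distribution
    is most relevant (lowest KL divergence) to the recommended paper's."""
    scores = [kl_divergence(paper_topics, s) for s in sentence_topics]
    return min(range(len(scores)), key=scores.__getitem__)
```

A lower divergence means a closer topical match, so each recommended paper is aligned to the sentence minimizing Equation (7).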
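The three preprocessing steps of Section 4.1 can be sketched as below; the stopword list is a placeholder, and splitting on whitespace stands in for whatever tokenizer was actually used:

```python
from collections import Counter

def preprocess(docs, stopwords, min_count=3):
    """(a) remove stopwords and numbers, (c) downcase tokens, then
    (b) drop words appearing fewer than min_count times in the corpus."""
    tokenized = [
        [w.lower() for w in doc.split()
         if w.lower() not in stopwords and not w.isdigit()]
        for doc in docs
    ]
    # Corpus-wide frequency filter: keep words seen at least min_count times.
    counts = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in tokenized]
```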