A Discriminative Approach to Topic-based Citation Recommendation*

Jie Tang and Jing Zhang
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
jietang@tsinghua.edu.cn, zhangjing@keg.cs.tsinghua.edu.cn

Abstract. In this paper, we present a study of a novel problem, topic-based citation recommendation, which involves recommending papers to be referred to. Traditionally, this problem is treated as an engineering issue and dealt with using heuristics. This paper gives a formalization of topic-based citation recommendation and proposes a discriminative approach to the problem. Specifically, it proposes a two-layer Restricted Boltzmann Machine model, called RBM-CS, which can discover topic distributions of paper content and citation relationships simultaneously. Experimental results demonstrate that RBM-CS can significantly outperform baseline methods for citation recommendation.

1 Introduction

Citation recommendation is concerned with recommending papers that should be referred to. When starting work on a new research topic or brainstorming for novel ideas, a researcher usually wants a quick understanding of the existing literature in the field, including which papers are the most relevant and what sub-topics are presented in these papers. Two common ways to find reference papers are: (1) search for documents on search engines such as Google, and (2) trace the cited references starting from a small number of initial papers (seed papers). Unfortunately, for (1) it would be difficult to find a comprehensive keyword list to cover all papers, especially for beginning researchers; it is quite possible to miss important developments in areas outside a researcher's specialty. For (2), an average paper may cite more than twenty papers. It would be quite time-consuming to analyze each of the cited references to see whether it is useful or not, especially as the tracing depth increases. Additionally, even a well-organized paper may miss some important "related work", due to space limitations or other reasons.

Previously, paper recommendation has been studied, for example, by exploring collaborative filtering [7]. Our problem is related to, but different from, this kind of work. Firstly, in citation recommendation, the user is interested not only in a list of recommended papers, but also in the sub-topics presented in these papers. Secondly, conventional methods can only recommend papers; they cannot suggest the citation position (i.e., which sentences should refer to the citation).

* The work is supported by the National Natural Science Foundation of China (60703059), Chinese National Key Foundation Research and Development Plan (2007CB310803), and Chinese Young Faculty Research Funding (20070003093).
[Fig. 1: Example of citation recommendation. The left part shows a citation context and a paper collection; the right part shows the discovered topics, the suggested references, and the matching of references with citation sentences.]

In this paper, we formalize citation recommendation as the combination of topic discovery, topic-based recommendation, and matching of citation sentences with the recommended papers. We propose a unified and discriminative approach to citation recommendation. This approach can automatically discover topical aspects of each paper and recommend papers based on the discovered topic distribution. Experimental results show that the proposed approach significantly outperforms the baseline methods.

2 Problem Formulation

We first define the notations used throughout this paper. Assume that a paper $d$ contains a vector $\mathbf{w}_d$ of $N_d$ words, in which each word $w_{di}$ is chosen from a vocabulary of size $V$, and a list $\mathbf{l}_d$ of $L_d$ references. Then a collection of $D$ papers can be represented as $\mathcal{D} = \{(\mathbf{w}_1, \mathbf{l}_1), \cdots, (\mathbf{w}_D, \mathbf{l}_D)\}$. We only consider references that appear in the paper collection $\mathcal{D}$; thus the size $L$ of the vocabulary of references is $D$. Further, we consider that each paper is associated with a distribution over $T$ topics, as is each citation context.

Definition 1. (Citation Context and Citation Sentence) A citation context consists of the context words occurring in, for instance, a user-written proposal. For example, the words "... We use Cosine computation [x] to evaluate the similarity ..." would be a citation context. One reference paper is expected to be cited at the position "[x]". We use $c$ to denote a citation context. Each sentence in the citation context is called a citation sentence. The position "[x]" at which the reference paper is cited is called the citation position.

Figure 1 shows an example of citation recommendation. The left part of Figure 1 includes a citation context provided by the user and a paper collection. The right part shows the recommended result that we expect a citation recommendation algorithm to output. For instance, two topics, i.e., "text summarization" and "information retrieval", have been extracted from the citation context. For the first topic, "text summarization", two papers have been recommended, and for the second topic, "information retrieval", three papers have been recommended. Further, the recommended papers are matched with the citation sentences and the corresponding citation positions have been identified. We see that the recommended papers are topic dependent. By nature, the problem of citation recommendation can be formalized as topic discovery, reference paper recommendation, and matching of the recommended papers with the citation sentences.
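For concreteness, the notation above maps onto simple data structures. The following is a minimal sketch (the class and field names are ours, not part of the formulation):

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Paper:
    """One paper d: word counts f(w_i) over the vocabulary and the list l_d of cited papers."""
    word_counts: Dict[str, int]                       # f(w_i): count of word w_i in the paper
    cited: List[int] = field(default_factory=list)    # indices of cited papers within the collection

@dataclass
class CitationContext:
    """A user-provided citation context c, split into citation sentences."""
    sentences: List[str]

# A collection D = {(w_1, l_1), ..., (w_D, l_D)}. Since only references inside the
# collection are kept, the reference vocabulary size L equals the number of papers D.
collection: List[Paper] = [
    Paper(word_counts={"markov": 3, "model": 2}, cited=[1]),
    Paper(word_counts={"kernel": 4, "margin": 1}, cited=[]),
]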
3 Our Approach

At a high level, our approach consists of three steps:

1. We propose a two-layer Restricted Boltzmann Machine (RBM) model, referred to as RBM-CS. Given a collection of papers with citation relationships, the model learns a mixture of topic distributions over paper contents and citation relationships.
2. We present a method to rank papers for a given citation context, based on the learned topic model. We take the top-ranked papers as the recommended papers.
3. We describe a method to find the correspondence between the recommended papers and the citation sentences.

3.1 The RBM-CS Model

Restricted Boltzmann Machines (RBMs) [8] are undirected graphical models that use a layer of hidden variables to model a (topic) distribution over visible variables. In this work, we propose a two-layer RBM model, called RBM-CS, to jointly model papers and citations.

A graphical representation of the RBM-CS model is shown in Figure 2. In RBM-CS, the hidden layer $\mathbf{h}$ is associated with two visible layers, words $\mathbf{w}$ and citation relationships $\mathbf{l}$, coupled through the interaction matrices $M$ and $U$, respectively. The basic idea in RBM-CS is to capture the topic distribution of papers with a hidden topic layer, which is conditioned on both words and citation relationships. Words and citation relationships are assumed to be generated independently from the hidden topics.

[Fig. 2: Graphical representation of the RBM-CS model.]

To train such a graphical model, we could consider maximizing the generative log-likelihood $\log p(\mathbf{w}, \mathbf{l})$. However, since we are dealing with a predictive problem, our interest ultimately lies only in correct prediction of $p(\mathbf{l}|\mathbf{w})$, not necessarily in having a good model of $p(\mathbf{w})$. Therefore, we define a discriminative objective function as a conditional log-likelihood:

$$\mathcal{L} = \sum_{d=1}^{D} \log p(\mathbf{l}_d|\mathbf{w}_d) = \sum_{d=1}^{D} \log \left( \prod_{j=1}^{L} p(l_j|\mathbf{w}_d) \right) \qquad (1)$$
The probability $p(l_j|\mathbf{w}_d)$ can be defined as:

$$p(l_j|\mathbf{w}) = \sigma\left(\sum_{k=1}^{T} U_{jk} f(h_k) + e_j\right), \quad f(h_k) = \sigma\left(\sum_{i=1}^{V} M_{ik} f(w_i) + \sum_{j=1}^{L} U_{jk} f(l_j) + a_k\right) \qquad (2)$$

where $\sigma(\cdot)$ is the sigmoid function, defined as $\sigma(x) = 1/(1 + \exp(-x))$; $e$ are bias terms for citation relationships; $f(h_k)$ is the feature function for hidden variable $h_k$; $f(l_j)$ and $f(w_i)$ are feature functions for citation relationship $l_j$ and word $w_i$, respectively; and $a$ are bias terms for hidden variables. For simplicity, we define $f(w_i)$ as the count of word $w_i$ in document $d$. We define a binary value for the feature function of a citation relationship $l_j$. For example, for document $d$, $f(l_j) = 1$ denotes that document $d$ has a citation relationship with another paper $d_j$.

Now the task is to learn the model parameters $\Theta = (M, U, a, b, e)$ given a training set $\mathcal{D}$. Maximum-likelihood (ML) learning of the parameters can be done by gradient ascent with respect to the model parameters ($b$ are bias terms for words). The exact gradient, for any parameter $\theta \in \Theta$, can be written as:

$$\frac{\partial \log p(\mathbf{l}|\mathbf{w})}{\partial \theta} = E_{P_0}[\mathbf{l}|\mathbf{w}] - E_{P_M}[\mathbf{l}|\mathbf{w}] \qquad (3)$$

where $E_{P_0}[\cdot]$ denotes an expectation with respect to the data distribution and $E_{P_M}[\cdot]$ is an expectation with respect to the distribution defined by the model. Computation of the expectation $E_{P_M}$ is intractable. In practice, we use a stochastic approximation of this gradient, called the contrastive divergence gradient [4]. The algorithm cycles through the training data and updates the model parameters according to Algorithm 1, where the probabilities $p(h_k|\mathbf{w}, \mathbf{l})$, $p(w_i|\mathbf{h})$, and $p(l_j|\mathbf{h})$ are defined as:

$$p(h_k|\mathbf{w}, \mathbf{l}) = \sigma\left(\sum_{i=1}^{V} M_{ik} f(w_i) + \sum_{j=1}^{L} U_{jk} f(l_j) + a_k\right) \qquad (4)$$

$$p(w_i|\mathbf{h}) = \sigma\left(\sum_{k=1}^{T} M_{ik} f(h_k) + b_i\right) \qquad (5)$$

$$p(l_j|\mathbf{h}) = \sigma\left(\sum_{k=1}^{T} U_{jk} f(h_k) + e_j\right) \qquad (6)$$

where $b$ are bias terms for words and $f(l_j)$ is the feature function for the citation relationship.

Algorithm 1. Parameter learning via contrastive divergence
Input: training data $\mathcal{D} = \{(\mathbf{w}_d, \mathbf{l}_d)\}$, topic number $T$, and learning rate $\lambda$
1. repeat
   (a) for each document $d$:
       i. sample each topic $h_k$ according to (4);
       ii. sample each word $w_i$ according to (5);
       iii. sample each citation relationship $l_j$ according to (6);
   (b) end for
   (c) update each model parameter $\theta \in \Theta$ by $\theta = \theta + \lambda \frac{\partial \log p(\mathbf{l}|\mathbf{w})}{\partial \theta}$
2. until all model parameters $\Theta$ converge
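To make Algorithm 1 concrete, the following is a minimal CD-1 training sketch (the function and variable names are ours; implementation details such as weight initialization, binary sampling of the reconstructions, and the exact statistics used in the update are our reading of the algorithm, not specified in the paper):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cs(W, C, T=200, lam=0.01, epochs=10):
    """CD-1 sketch for RBM-CS.
    W: D x V matrix of word counts f(w_i); C: D x L binary citation matrix f(l_j)."""
    _, V = W.shape
    L = C.shape[1]
    M = 0.01 * rng.standard_normal((V, T))   # word-topic interaction matrix
    U = 0.01 * rng.standard_normal((L, T))   # citation-topic interaction matrix
    a = np.zeros(T)                          # hidden (topic) biases
    b = np.zeros(V)                          # word biases
    e = np.zeros(L)                          # citation biases
    for _ in range(epochs):
        for w, l in zip(W, C):
            # positive phase: p(h_k | w, l), Eq. (4)
            ph0 = sigmoid(w @ M + l @ U + a)
            h0 = (rng.random(T) < ph0).astype(float)
            # negative phase: one reconstruction step, Eqs. (5) and (6)
            pw1 = sigmoid(h0 @ M.T + b)
            pl1 = sigmoid(h0 @ U.T + e)
            w1 = (rng.random(V) < pw1).astype(float)
            l1 = (rng.random(L) < pl1).astype(float)
            ph1 = sigmoid(w1 @ M + l1 @ U + a)
            # contrastive-divergence update: data statistics minus reconstruction statistics
            M += lam * (np.outer(w, ph0) - np.outer(w1, ph1))
            U += lam * (np.outer(l, ph0) - np.outer(l1, ph1))
            a += lam * (ph0 - ph1)
            b += lam * (w - w1)
            e += lam * (l - l1)
    return M, U, a, b, e

In practice the updates would be applied on mini-batches with momentum and weight decay, matching the experimental setting described in Section 4.1.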
3.2 Ranking and Recommendation

The objective of citation recommendation is to rank the recommended papers for a given citation context. Specifically, we apply the same modeling procedure to the citation context. Hence, we can obtain a topic representation $\{h_c\}$ of the citation context $c$. Based on the topic representation and the modeling results, we can calculate the probability of each paper being a reference paper for the citation context according to Equation (6). Finally, the papers are ranked in terms of these probabilities and the top $K$ ranked papers are returned as the recommended papers. It is hard to specify an accurate value of $K$ for each citation context. A simple way is to set it to the average number of citations in a paper (i.e., 11 in our data set).

3.3 Matching Recommended Papers with Citation Sentences

The purpose of matching the recommended papers with citation sentences is to align the recommended papers with sentences in the citation context. This can be done by using each recommended paper as a keyword query to retrieve relevant citation sentences. In general, we may use any retrieval method. In this paper, we used the KL-divergence to measure the relevance between a recommended paper and a citation sentence:

$$KL(d, s_{ci}) = \sum_{k=1}^{T} p(h_k|d) \log \frac{p(h_k|d)}{p(h_k|s_{ci})} \qquad (7)$$

where $d$ is a recommended paper and $s_{ci}$ is the $i$-th sentence in the citation context $c$; the probabilities $p(h_k|d)$ and $p(h_k|s_{ci})$ can be obtained by (4).

4 Experiments

4.1 Experimental Setting

Data Set. We conducted experiments on two data sets, NIPS (http://www.cs.toronto.edu/~roweis/data.html) and Citeseer (http://citeseer.ist.psu.edu/oai.html). The NIPS data set consists of 12 volumes of NIPS papers (1,605 papers and 10,472 citation relationships). Each paper contains its full text and citations. We removed citations with incomplete information, e.g., consisting of only authors and publication venue but no title. We also removed citations that do not appear in the data set. The Citeseer data set consists of 3,335 papers (with 32,558 citation relationships) downloaded from the Citeseer web site; likewise, we removed citations that do not appear in the data set.

Each paper was preprocessed by (a) removing stopwords and numbers, (b) removing words appearing fewer than three times in the corpus, and (c) downcasing the remaining words. Finally, we obtained V = 26,723 unique words and a total of 350,361 words in NIPS, and V = 44,548 unique words and 634,875 words in Citeseer.
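Returning briefly to Sections 3.2 and 3.3, the ranking and sentence-matching steps can be sketched as follows (the helper names are ours; conditioning the topic representation of a new context on its words only, and normalizing topic activations into distributions before computing the KL-divergence, are our assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def topic_rep(word_counts, M, a):
    """Topic activation of a text from its word counts, following Eq. (4) with words only."""
    return sigmoid(word_counts @ M + a)

def recommend(context_counts, M, U, a, e, K=11):
    """Rank candidate papers by p(l_j | h_c), Eq. (6), and return the top-K indices."""
    h_c = topic_rep(context_counts, M, a)
    scores = sigmoid(U @ h_c + e)            # one score per paper in the collection
    return np.argsort(-scores)[:K]

def match_to_sentence(paper_counts, sentence_counts, M, a, eps=1e-12):
    """Pick the citation sentence s_ci minimizing KL(d, s_ci), Eq. (7)."""
    p_d = topic_rep(paper_counts, M, a)
    p_d = p_d / p_d.sum()
    best_i, best_kl = -1, np.inf
    for i, counts in enumerate(sentence_counts):
        p_s = topic_rep(counts, M, a)
        p_s = p_s / p_s.sum()
        kl = float(np.sum(p_d * np.log((p_d + eps) / (p_s + eps))))
        if kl < best_kl:
            best_i, best_kl = i, kl
    return best_i, best_kl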
Table 1: Two topics discovered by RBM-CS from the NIPS data.

"Topic 12: Markov Model"
  Words:     hmm 0.091; state 0.063; markov 0.058; probability 0.057; field 0.018
  Citations: links between Markov models and multilayer perceptrons 0.0347; a tutorial on hidden Markov models and selected applications in speech recognition 0.0221; connectionist speech recognition: a hybrid approach 0.0169; global optimization of a neural network hidden Markov model hybrid 0.0169; neural network classifiers estimate Bayesian a posteriori probabilities 0.0169

"Topic 97: Support Vector Machines"
  Words:     kernel 0.083; margin 0.079; support 0.075; svm 0.075; machine 0.069
  Citations: the nature of statistical learning 0.036363; a training algorithm for optimal margin classifiers 0.026984; a tutorial on support vector machines for pattern recognition 0.026763; statistical learning theory 0.020220; support vector networks 0.015117

Evaluation Measures and Baseline Methods. We used P@1, P@2, P@3, P@5, P@10, Rprec, MAP, Bpref, and MRR as evaluation measures; for details of these measures, please refer to [1] [2]. We conducted the evaluation at both the paper level (without considering the citation position) and the sentence level (considering the citation position).

We defined two baseline methods. One is based on a language model (LM). Given a citation context $c$, we compute the score of each paper $d$ as $p(c|d) = \prod_{w \in c} p(w|d)$, where $p(w|d)$ is the maximum-likelihood estimate of word $w$ in document $d$. We rank papers according to this score and recommend the top $K$ ranked papers.

The other baseline is based on the RBM, which learns a generative model for papers and the citation context. We then use the KL-divergence to calculate a score for each paper (by an equation similar to Equation (7)). For both RBM and RBM-CS, we set the number of topics to $T = 200$ and the number of recommended references to the average number of references per paper in the data set, i.e., $K = 7$ for NIPS and $K = 11$ for Citeseer. The weights were updated using a learning rate of 0.01/batch-size, a momentum of 0.9, and a weight decay of 0.001.

4.2 Experimental Results

Estimated Topics. Table 1 shows two example topics discovered by RBM-CS from the NIPS data. We can see that our model captures the topic distributions very well.

Performance of Citation Recommendation. Table 2 shows the results of citation recommendation. We see that our proposed model clearly outperforms the two baseline models. The language model suffers from the fact that it is based only on keyword matching. The RBM uses a hidden topic layer to alleviate this problem; however, it aims at optimizing $p(\mathbf{w})$, which might be inappropriate for citation recommendation. In addition, the RBM cannot capture the dependencies between paper contents and citation relationships. Our proposed RBM-CS has the advantage of optimizing $p(\mathbf{l}|\mathbf{w})$ directly and of modeling the dependencies between paper contents and citation relationships.

We can also see from Table 2 that the recommendation performance is much better on the Citeseer data than on the NIPS data. This suggests that recommendation is more difficult on sparser data. Improving recommendation performance on sparse data is part of our ongoing work.
Table 2: Performance of citation recommendation on the two data sets.

Data      Method   P@1     P@2     P@3     P@5     P@10    Rprec   MAP     Bpref   MRR
NIPS      LM       0.0195  0.0164  0.0132  0.0125  0.0148  0.0161  0.0445  0.0108  0.0132
NIPS      RBM      0.0289  0.0313  0.0263  0.0224  0.0164  0.0245  0.0652  0.0176  0.0162
NIPS      RBM-CS   0.2402  0.2628  0.2349  0.1792  0.1170  0.1676  0.3499  0.1626  0.1082
Citeseer  LM       0.0496  0.0492  0.0454  0.0439  0.0274  0.0259  0.1103  0.0311  0.0243
Citeseer  RBM      0.1684  0.1884  0.1780  0.1519  0.0776  0.1510  0.2804  0.1189  0.0639
Citeseer  RBM-CS   0.3337  0.3791  0.3501  0.2800  0.1768  0.2375  0.4237  0.2501  0.1564

Table 3: Performance of sentence-level citation recommendation on the NIPS data set.

Model    P@1     P@2     P@3     P@5     P@10     Rprec   MAP     Bpref   MRR
LM       0.0783  0.0642  0.0582  0.0629  0.00503  0.0607  0.1178  0.0483  0.0502
RBM      0.1081  0.1061  0.1061  0.1000  0.0727   0.0914  0.2089  0.0761  0.0851
RBM-CS   0.2005  0.2136  0.2010  0.1788  0.1561   0.1782  0.2854  0.1565  0.1657

Table 3 shows the performance of sentence-level citation recommendation by RBM and RBM-CS. (As the Citeseer data contains many OCR errors and it is difficult to accurately extract the citation positions, we conducted the sentence-level evaluation on the NIPS data only.) We can again see that our proposed model significantly outperforms both the LM and the RBM baselines.

5 Related Work

We review the literature on citation analysis and related topic models. Citation analysis usually employs a graph to represent papers and their relationships, as in the Science Citation Index [3], which links authors and their corresponding papers. Bibliographic Coupling (BC) [6] and co-citation analysis have been proposed for citation analysis, for example, to measure the quality of an academic paper [3].

Recommending citations for scientific papers is a task that has not been studied exhaustively before. Strohman et al. [9] investigated this task using a graph-based framework: each paper is represented by a node and each citation relationship by a link between nodes; a new paper is a node without in- or out-links. Citation recommendation is then cast as link prediction. McNee et al. [7] employed collaborative filtering in a citation network to recommend citations for papers. Both approaches use a graph-based framework. We look at citation recommendation from a different perspective: we take advantage of the dependencies between paper contents and citation relationships by using a hidden topic layer to jointly model them.

Restricted Boltzmann Machines (RBMs) [8] are generative models that use latent (usually binary) variables to model an input distribution, and they have been applied to a large variety of problems in the past few years. Many extensions of the RBM have been proposed, for example the dual-wing RBM [12] and models for various types of input distribution [5] [11]. In this paper, we propose a two-layer Restricted Boltzmann
Machine model, called RBM-CS, which can jointly model the topic distribution of papers and citation relationships.

6 Conclusion

In this paper, we formally define the problem of topic-based citation recommendation and propose a discriminative approach to it. Specifically, we propose a two-layer Restricted Boltzmann Machine model, called RBM-CS, to model paper contents and citation relationships simultaneously. Experimental results show that the proposed RBM-CS can significantly improve recommendation performance.

There are many potential future directions for this work. It would be interesting to include other information for citation recommendation, such as conference and author information. We also plan to integrate citation recommendation as a new feature into our academic search system ArnetMiner [10] (http://arnetminer.org).

References

1. C. Buckley and E. M. Voorhees. Retrieval evaluation with incomplete information. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'04), pages 25–32, 2004.
2. N. Craswell, A. P. de Vries, and I. Soboroff. Overview of the TREC-2005 enterprise track. In TREC 2005 Conference Notebook, pages 199–205, 2005.
3. E. Garfield. Citation analysis as a tool in journal evaluation. Science, 178(4060):471–479, 1972.
4. G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
5. G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.
6. M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14:10–25, 1963.
7. S. M. McNee, I. Albert, D. Cosley, P. Gopalkrishnan, S. K. Lam, A. M. Rashid, J. A. Konstan, and J. Riedl. On the recommending of citations for research papers. In CSCW'02, pages 116–125, 2002.
8. P. Smolensky. Information processing in dynamical systems: foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, pages 194–281. MIT Press, 1986.
9. T. Strohman, W. B. Croft, and D. Jensen. Recommending citations for academic papers. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07), pages 705–706, 2007.
10. J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. ArnetMiner: Extraction and mining of academic social networks. In KDD'08, pages 990–998, 2008.
11. M. Welling, M. Rosen-Zvi, and G. E. Hinton. Exponential family harmoniums with an application to information retrieval. In Proceedings of the 17th Neural Information Processing Systems Conference (NIPS'05), 2005.
12. E. P. Xing, R. Yan, and A. G. Hauptmann. Mining associated text and images with dual-wing harmoniums. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI'05), pages 633–641, 2005.