A Personalized Paper Recommendation Approach Based on Web Paper Mining and Reviewer's Interest Modeling

Yueheng Sun, Weijie Ni, Rui Men
School of Computer Science and Technology, Tianjin University, Tianjin, China
e-mail: yhs@tju.edu.cn, axiali1314@yahoo.com

Abstract—In this article, a personalized paper recommendation approach based on the reviewer's interest model is presented in order to increase the number of reviews for online papers. To achieve this purpose, we first model the reviewer's interest based on useful data extracted from the papers in a journal database, such as titles, abstracts, keywords and the Chinese Library Classification Codes (CLCCs). According to the reviewer's interest model, we then propose a recommendation approach, which can send a paper published online to the reviewers that are experts in the scope of the paper. Experimental results show that our recommendation approach is effective and achieves 80-90% accuracy in terms of recommending different kinds of papers to the right reviewers.

Keywords—personalized paper recommendation; reviewer's interest model; online paper; Chinese Library Classification Codes

I. INTRODUCTION

As a new medium of knowledge propagation, online paper publishing platforms provide a convenient way to publish and review academic papers. However, like the traditional aggregation-based publishing platforms, the existing ones inherit the same deficiencies. For example, these platforms wait passively for paper reviews. Moreover, they lack effective personalized services to enhance the dynamic scheduling of paper resources. As a result, each paper receives only a few review comments, which are obviously not enough to improve the quality of the paper.
In order to increase the number of reviews for each online paper, we present in this article a personalized recommendation system based on the reviewer's interest model, which effectively explores the features of newly released papers and selects the most suitable group of reviewers for them.

Personalized recommendation technologies mainly fall into three categories: rule-based filtering, content-based filtering and collaborative filtering. Rule-based filtering requires the users to provide their interest information and to build and maintain their interest models by themselves. Although this approach can reflect the user's interest accurately, the availability and scalability of such systems may be poor because the users take the responsibility for modeling. Collaborative filtering systems, usually based on nearest-neighbor algorithms, have been very successful in the past. However, some real challenges, such as data sparsity and scalability, have been revealed in their widespread applications [1, 2]. The growth of the number of users and items leads to exponential computational complexity in such systems. When there is little information about a user's interest, the system may be unable to make any item recommendations for that user, which is the so-called "cold start" problem [3]. The key issue in a content-based filtering system is how to construct the interest model from the historical information collected automatically by the system. In [4], the user's interest models are built through an ontology created by domain experts and readjusted by the user's feedback. The difficulty of this method is constructing the domain ontology, because it needs a great amount of training texts for concept clustering. As is known, the task of personalized recommendation is similar to that of text classification, so some classification algorithms from machine learning can be used.
For example, a Bayesian hierarchical model has been developed for content-based recommendation in [5]. Such approaches usually need to train classifiers. In our system, this would mean that each reviewer needs a classifier, which makes these approaches impractical to apply. In this article, we collect four kinds of data, including the paper titles, keywords, abstracts and the Chinese Library Classification Codes, to build the reviewer's interest model. A similar method is used to build a model for a paper newly released online. Finally, we use the similarity obtained from the cosine formula as a measure to recommend experts capable of reviewing the online paper.

The rest of this paper is organized as follows: the modeling approach for the reviewer's interest is presented in Section 2, and the recommendation approach based on interest models is provided in Section 3. Our experiments for evaluation are presented in Section 4. Section 5 concludes the paper with a discussion of our future work.

II. REVIEWER'S INTEREST MODELING

A. Interest Modeling

Using web data mining techniques, we first download the snapshots of papers from a paper library, and then extract the title, abstract, keywords and author information from each paper. During this process, authors who are not qualified to act as reviewers, i.e., graduate or doctoral students, are filtered out. Then we segment the texts of the titles, abstracts and keywords of each author's papers, and filter out the function words to construct feature vectors upon them, as follows:

2009 International Conference on Research Challenges in Computer Science. 978-0-7695-3927-0/09 $26.00 © 2009 IEEE. DOI 10.1109/ICRCCS.2009.76
\vec{V}_{title} = \langle w_{11}, w_{21}, \ldots, w_{n1} \rangle    (1)

\vec{V}_{abstract} = \langle w_{12}, w_{22}, \ldots, w_{n2} \rangle    (2)

\vec{V}_{keywords} = \langle w_{13}, w_{23}, \ldots, w_{n3} \rangle    (3)

where w_{ij} is the weight of the i-th feature word in the j-th vector, which can be calculated as Equation (4):

w_{ij} = \left( \frac{tf_{ij}}{\sum_{i=1}^{n} tf_{ij}} + 0.5 \right) \times \log\left( \frac{N}{N_i} + 0.5 \right)    (4)

where tf_{ij} is the number of times the i-th feature word appears in the titles, abstracts or keywords of an author's papers, N is the number of papers published by the author, and N_i is the number of papers containing the feature word.

We note that the features in the title and keywords should be the most accurate summary of a paper, so if a feature word in the abstract vector also appears in these two parts, its weight is adjusted as in Equation (5):

w_{i2} = \left( \frac{tf_{ij}}{\sum_{i=1}^{n} tf_{ij}} + 0.5 \right) \times \log\left( \frac{N}{N_i} + 0.5 \right) \times \frac{1}{2} \left( \exp(\bar{w}_1) + \exp(\bar{w}_3) \right)    (5)

where \bar{w}_1 and \bar{w}_3 are the average weight values of the title vector and the keyword vector.

B. Model Update

The reviewers, i.e., the authors mined from the web papers, may publish new papers in the future, so it is necessary to update their interest models, as follows:

(1) Add the new papers into the original paper set of this author;

(2) Readjust the weight of each feature word according to Equation (4) or (5), where
(i) if a new feature word appears in the original vector, its weight is multiplied by a compensation factor cf between 1 and 2, because newly added feature words reflect the recent research directions in which the author is interested;
(ii) if an original feature word does not appear in the new paper, its weight is multiplied by a penalty factor pf between 0 and 1, because this may mean that the author is no longer interested in the research associated with that word;

(3) Rank all the feature words in non-ascending order of weight, and filter out the words with lower weights that exceed the vector dimensions.
(4) Normalize the vectors of title, abstract and keywords.

III. RECOMMENDATION APPROACH BASED ON INTEREST MODELS

A. The Similarity Calculation between Models

Each reviewer can be considered as a class, so the essence of recommendation is to classify an online paper into a group of reviewers suitable for evaluating this paper. This is similar to the problem of "soft classification", i.e., an object to be classified may belong to multiple classes. One of the important issues is to calculate the similarities between a newly released paper and its candidate reviewers. In this paper, we use Equation (6) to calculate the similarities between them:

Sim(M_1, M_2) = k_1 \times Sim(\vec{V}'_{title}, \vec{V}_{title}) + k_2 \times Sim(\vec{V}'_{abstract}, \vec{V}_{abstract}) + k_3 \times Sim(\vec{V}'_{keywords}, \vec{V}_{keywords})    (6)

where M_1 and M_2 are the models of a paper and a reviewer respectively, and \vec{V}'_{title}, \vec{V}'_{abstract} and \vec{V}'_{keywords} are the representative vectors of the paper's title, abstract and keywords respectively. k_1, k_2 and k_3 are a group of weight factors which indicate the importance of each model component. We empirically set k_1 = 0.4, k_2 = 0.2 and k_3 = 0.4. The typical cosine formula is used to calculate the similarity between two corresponding model components; for example, Sim(\vec{V}'_{title}, \vec{V}_{title}) can be obtained from Equation (7):

Sim(\vec{V}'_{title}, \vec{V}_{title}) = \cos\theta = \frac{\sum_{k=1}^{n} w'_{k1} w_{k1}}{\sqrt{\sum_{k=1}^{n} (w'_{k1})^2} \cdot \sqrt{\sum_{k=1}^{n} w_{k1}^2}}    (7)

The computation of the other two terms on the right-hand side of Equation (6) is similar to Equation (7).

Another important factor, the Chinese Library Classification Code (CLCC), is also introduced into our recommendation approach. A CLCC is usually composed of one or two letters and several digits, for example, TP391.1 or R122.4. Assume that one author publishes an article on operating systems with the CLCC TP315 and another author publishes one on management systems with the CLCC TP316.
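Before turning to the CLCC factor, the vector side of the model, Equations (4), (6) and (7), can be sketched in a few lines of Python. This is a minimal illustration and not the paper's implementation: the function names, the sparse-dict vector representation, and the toy data are all assumptions.

```python
import math
from collections import Counter

def feature_weights(docs_terms, num_papers, df):
    """Weight each feature word per Equation (4):
    w_ij = (tf_ij / sum_i tf_ij + 0.5) * log(N / N_i + 0.5).
    docs_terms: list of term lists, one per paper by this author.
    df: term -> number of the author's papers containing it (N_i)."""
    tf = Counter(t for terms in docs_terms for t in terms)
    total = sum(tf.values())
    return {t: (tf[t] / total + 0.5) * math.log(num_papers / df[t] + 0.5)
            for t in tf}

def cosine(v1, v2):
    """Cosine similarity between two sparse term-weight dicts (Equation (7))."""
    num = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return num / (n1 * n2) if n1 and n2 else 0.0

def model_similarity(paper, reviewer, k=(0.4, 0.2, 0.4)):
    """Combine the title/abstract/keyword similarities per Equation (6)."""
    parts = ("title", "abstract", "keywords")
    return sum(ki * cosine(paper[p], reviewer[p]) for ki, p in zip(k, parts))
```

The same `cosine` helper serves all three components; once the CLCC term is added, the weight factors would simply be re-balanced as in Equation (8).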
Although the two codes are not the same, both of them belong to the Computer Software (TP31) class; therefore, the model similarity between them should be higher than that between each of them and another class such as Robotic Technology (TP24). So, we define Equation (8) as a comparison with Equation (6):

Sim(M_1, M_2) = k_1 \times Sim(\vec{V}'_{title}, \vec{V}_{title}) + k_2 \times Sim(\vec{V}'_{abstract}, \vec{V}_{abstract}) + k_3 \times Sim(\vec{V}'_{keywords}, \vec{V}_{keywords}) + k_4 \times Sim(CLCC_{paper}, CLCCs_{reviewer})    (8)

where k_1, k_2, k_3 and k_4 are empirically set to 0.3, 0.2, 0.3 and 0.2, respectively. Assume that all the different CLCCs for the papers of a reviewer are represented as a vector as follows:

\vec{V}_{CLCCs} = \langle c_1 : w_{c1}, c_2 : w_{c2}, \ldots, c_n : w_{cn} \rangle
where w_{ci} (i = 1, 2, \ldots, n) is the weight of the i-th CLCC, which is the ratio of the number of occurrences of the i-th CLCC to the number of papers. Assume that, when comparing from left to right, the length of the common string between CLCC_{paper} and c_i is l_{pi}; then Sim(CLCC_{paper}, CLCCs_{reviewer}) can be defined as Equation (9):

Sim(CLCC_{paper}, CLCCs_{reviewer}) = \sum_{i=1}^{n} \frac{l_{pi}}{\max(length(CLCC_{paper}), length(c_i))} \times w_{ci}    (9)

B. Paper Recommendation

The paper recommendation process of our system involves the following steps: (1) connect to the online paper publishing platform, and if there are newly released papers, extract the snapshots of these papers and build models for them; (2) calculate the similarities between a paper model and a reviewer's interest model. In order to increase the operational speed, we only use the interest models with the same or similar CLCCs as the paper model. For instance, an interest model with the CLCC TP315 may be compared with a paper model with the CLCC TP316 but not with one coded TP242: we require that the two models have the same letter(s) and first digit in their CLCCs; (3) if the similarity between two models is greater than a preset threshold, then the author is added to the reviewer list for the paper; (4) after collecting all the reviewers capable of evaluating the paper, the system automatically sends emails to these reviewers and invites them to review the paper.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. Data Set

114,161 papers from 13 journals in the computer domain were extracted from the VIP Chinese Journal Database, and 13,236 reviewer interest models were built based on these papers. To effectively reflect the true quality of the recommendation system, we select two papers from each of three fields: agriculture engineering, medical image processing and computer software. The choice of these papers is highly selective.
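The CLCC matching of Equation (9) above and the threshold filtering of step (3) can be sketched as follows. This is an illustrative sketch, not the system's actual code: the helper names, the dict representation of the reviewer's CLCC vector, and the sample threshold are assumptions.

```python
def clcc_similarity(paper_clcc, reviewer_clccs):
    """Equation (9): weight each of the reviewer's CLCCs by the length of its
    common left-to-right prefix with the paper's CLCC, normalized by the
    longer of the two codes. reviewer_clccs maps a CLCC string to w_ci."""
    sim = 0.0
    for code, weight in reviewer_clccs.items():
        lp = 0  # length of the common string, comparing from the left
        for a, b in zip(paper_clcc, code):
            if a != b:
                break
            lp += 1
        sim += lp / max(len(paper_clcc), len(code)) * weight
    return sim

def shortlist(paper_model, reviewer_models, sim_fn, threshold=0.4):
    """Step (3): keep reviewers whose model similarity exceeds the threshold."""
    return [name for name, model in reviewer_models.items()
            if sim_fn(paper_model, model) > threshold]
```

Because step (2) already restricts candidates to interest models whose CLCCs share the same letter(s) and first digit, `shortlist` only needs to score a small candidate set.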
For example, the two papers selected from agriculture engineering have no connection with the disciplines of our review experts. Though the papers selected from medical image processing may have some relation, the research interests of our reviewers are apparently not within the scope of medicine. Reviewing one of the papers selected from computer software requires strong specialized knowledge. For example, only experts interested in graphics theory are appropriate to review the papers on face detection technologies. The other papers, on XML data query algorithms, can be recommended to the experts whose research interests lie in databases, XML and data indexing algorithms.

B. Result Analysis

Our system doesn't pick out any reviewers for the two papers selected from the field of agriculture engineering. We find that the similarity values between each of these paper models and the reviewers' interest models are far below the minimal threshold, which shows that it is useful to filter the reviewers by the CLCCs as described in step (2) of Section III.B. Table 1 gives the number of reviewers most suitable for evaluating each of the other four papers, obtained by manually checking the authors extracted from the journal papers. These reviewers for each paper can be considered as a "standard answer set". The evaluation results are presented in Tables 2 and 3 in terms of the conventional precision and recall. They are obtained based on Equation (8) and Equation (6), respectively. The "t" column shows the different preset thresholds in the two tables.

TABLE I. THE NUMBER OF REVIEWERS SUITABLE FOR EVALUATING EACH PAPER

        Paper 3   Paper 4   Paper 5   Paper 6
true      86        75       326       278

TABLE II.
THE EVALUATION RESULTS FOR THE OTHER FOUR PAPERS BASED ON EQUATION (8)

         Paper 3        Paper 4        Paper 5        Paper 6
   t     p(%)  r(%)     p(%)  r(%)     p(%)  r(%)     p(%)  r(%)
  0.2   39.23  91.42   35.69  89.78   43.68  94.46   40.06  92.06
  0.4   57.68  85.24   52.43  83.06   59.42  90.12   56.23  87.43
  0.6   75.65  77.32   64.56  70.03   75.83  85.53   72.45  83.79
  0.8   83.78  63.05   82.06  64.85   89.62  81.52   86.29  80.65

TABLE III. THE EVALUATION RESULTS FOR THE OTHER FOUR PAPERS BASED ON EQUATION (6)

         Paper 3        Paper 4        Paper 5        Paper 6
   t     p(%)  r(%)     p(%)  r(%)     p(%)  r(%)     p(%)  r(%)
  0.2   37.04  93.02   32.64  90.57   40.78  93.05   36.54  93.08
  0.4   53.52  86.37   48.75  84.25   55.34  88.02   52.03  88.67
  0.6   70.83  79.07   60.42  71.38   70.68  83.26   68.75  84.95
  0.8   80.64  63.25   79.56  66.45   84.98  80.03   82.36  82.79

Papers 3 and 4 are multi-disciplinary. For example, the paper named "Visualization of Medical Images Based on IDL" relates to both medicine and computer science. The model similarities between such a paper and the reviewer models in either discipline may not be high. However, the paper may still be recommended to reviewers in one of the disciplines as long as its similarity exceeds the threshold. Though it seems acceptable to recommend the paper to reviewers in either discipline, recommending it to reviewers with backgrounds in both disciplines is the most appropriate choice. On the contrary, papers 5 and 6 are specialized in a single field, which leads to higher precision and recall at the same threshold compared with papers 3 and 4.

Tables 2 and 3 also show the influence of the CLCCs on the results when they are taken into consideration. For the same paper and at the same threshold, the precisions based on Equation (8) are generally improved by 3-5% compared with Equation (6), while the recalls are only reduced by 1-2%. This means that although the CLCCs may filter out a small number of qualified reviewers, even more unqualified reviewers are ruled out.

V. CONCLUSION

The reviews for a paper are helpful to improve the quality of that paper.
In this article, we present a recommendation approach based on the reviewer's interest model in order to increase the number of reviews for
papers published online. From the web papers of a journal library, our approach can automatically extract the useful data for modeling a reviewer. Based on the reviewers' interest models, we can recommend more reviewers for a paper compared with the existing publishing platforms. Our approach is simple and effective, for it is based on the full utilization of paper data such as titles, abstracts, keywords and CLCCs, instead of the complicated similarity calculation or training methods that are usually involved in previous work.

Our future work will focus on the following directions: (1) introduce more useful paper data, for instance, the author's research direction, into the modeling of reviewers; (2) build domain dictionaries to help determine whether a paper to be recommended belongs to multiple disciplines; (3) based on the mining of interest models, identify trends in academic research, which may be an interesting direction.

ACKNOWLEDGMENT

This work is sponsored by the Research Project of the Humanities and Social Science Foundation of the State Education Ministry (08JC870008) and the National Natural Science Foundation of China (60603027).

REFERENCES

[1] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms", Proceedings of the 10th International World Wide Web Conference, ACM, Hong Kong, 2001, pp. 285-295.
[2] Michael J. Pazzani, "A Framework for Collaborative, Content-Based and Demographic Filtering", Artificial Intelligence Review, Vol. 13, Issue 5-6, pp. 393-408, doi:10.1023/A:1006544522159.
[3] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl, "Evaluating Collaborative Filtering Recommender Systems", ACM Transactions on Information Systems (TOIS), Vol. 22, No. 1, January 2004, pp. 5-53.
[4] M.J. Huang and G.Z. Yang, "Ontology-based Personalized Recommendation in E-Learning", Science Technology and Engineering, Vol. 7, July 2007, pp. 3394-3398.
[5] Yi Zhang and Jonathan Koren, "Efficient Bayesian Hierarchical User Modeling for Recommendation Systems", Proceedings of the 30th Annual International ACM SIGIR Conference, ACM, Amsterdam, the Netherlands, 2007, pp. 47-54.