Figure 1: Data used in this paper for building paper-reviewer preference models.

(These ratings are really bids/signs of interest to review papers, not the actual ratings reviewers assign to papers after reading and evaluating them.) A rating $r_{ui}$ indicates the preference of reviewer $u$ for paper $i$, where high values mean stronger preferences. Usually the vast majority of ratings are unknown; e.g., the ICDM data involves 529 papers, 203 reviewers, and only 6267 bids. In ICDM'07, the given bids are between 1 and 4, indicating preferences as follows: 4="High", 3="OK", 2="Low", and 1="No", and we aim to make predictions in the same space.

We distinguish predicted ratings from known ones by using the notation $\hat{r}_{ui}$ for the predicted value of $r_{ui}$. To evaluate the models we assess RMSE over 100 random 90-10 training-test splits. We hasten to add that we do not advocate the myopic view of RMSE [4] as the primary criterion for recommender systems evaluation. We use it in this section primarily due to its convenience for constructing direct optimizers. In the next section we will evaluate performance according to criteria more natural to CPAP. We also note that small improvements in overall RMSE typically translate into substantial improvements in bottom-line performance for predicting reviewer-paper preferences.

The model we learn is of the form:

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i + \sum_c \sigma_{ic}\,\theta_{uc}\,w_c + \gamma\,\frac{\sum_{j \in R(u)} s_{ij}\, r_{uj}}{\alpha + \sum_{j \in R(u)} s_{ij}} + \phi\,\frac{\sum_{v \in R(i)} s_{uv}\, r_{vi}}{\beta + \sum_{v \in R(i)} s_{uv}} \qquad (1)$$

and we proceed to explain each of the terms below.
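To make the structure of Eq. 1 concrete before each term is discussed, the following is a minimal sketch of how a single prediction could be assembled once all parameters have been learned. The data layout and names are illustrative, and the reading of the two neighborhood terms (R(u) as the set of papers reviewer u has bid on, R(i) as the set of reviewers who have bid on paper i, and s as similarity weights) is inferred from the notation rather than stated in this section.

```python
import numpy as np

def predict(u, i, mu, b_user, b_item, P, Q, sigma, theta, w,
            gamma, phi, alpha, beta, S_paper, S_reviewer, R,
            papers_bid_by, reviewers_who_bid):
    """Assemble one predicted bid following Eq. 1 (parameters assumed learned)."""
    # Global, reviewer, and paper biases (Section 2.1).
    pred = mu + b_user[u] + b_item[i]
    # Latent-factor interaction p_u^T q_i (Section 2.2).
    pred += P[u] @ Q[i]
    # Category matching term: sum_c sigma_ic * theta_uc * w_c (Section 2.3).
    pred += np.sum(sigma[i] * theta[u] * w)
    # Paper-neighborhood term: bids of reviewer u on papers j similar to paper i.
    js = papers_bid_by[u]
    pred += gamma * (sum(S_paper[i, j] * R[u, j] for j in js) /
                     (alpha + sum(S_paper[i, j] for j in js)))
    # Reviewer-neighborhood term: bids on paper i by reviewers v similar to u.
    vs = reviewers_who_bid[i]
    pred += phi * (sum(S_reviewer[u, v] * R[v, i] for v in vs) /
                   (beta + sum(S_reviewer[u, v] for v in vs)))
    return pred
```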
2.1 Baseline model

Much of the variability in the data is explained by global effects, which can be reviewer- or paper-specific. It is important to capture this variability by a separate component, thus letting the more involved models deal only with genuine reviewer-paper interactions.

We model these global effects through the first three terms of Eq. 1, i.e., $\mu + b_u + b_i$. The constant $\mu$ indicates a global bias in the data, which is taken to be the overall mean rating. The parameter $b_u$ captures reviewer-specific bias, accounting for the fact that different reviewers use different rating scales. Finally, the paper bias, $b_i$, accounts for the fact that certain papers tend to attract higher (or lower) bids than others. We learn optimal values for $b_u$ ($u = 1, \ldots, m$) and $b_i$ ($i = 1, \ldots, n$) by minimizing the associated squared error function with just these three terms (along with some regularization to avoid overfitting). The resulting average test RMSE is 0.6286.

A separate analysis of each of the two biases shows the reviewer effect ($\mu + b_u$, with RMSE 0.6336) to be much more significant than the paper bias ($\mu + b_i$, RMSE 1.2943) in reducing the error. This indicates a tendency of reviewers to concentrate all their ratings near their mean ratings, which is supported by examination of the data.

While the baseline model explains much of the data variability, as evidenced by its relatively low associated RMSE, it is useless for making actual assignments. After all, it gives all reviewers exactly the same order of paper preferences. Thus, we are really after the remaining unexplained variability, where reviewer-specific preferences are expressed. Uncovering these preferences is the subject of the next subsections.

2.2 A factor model

Latent factor models (e.g., [3]) comprise a common approach to collaborative filtering, with the goal of uncovering latent features that explain observed ratings. The premise of such models is that both reviewers and papers can be characterized as vectors in a common $f$-dimensional space. The interaction between reviewers and papers is modeled by inner products in that space, the fourth term of Eq. 1. Here, $p_u \in \mathbb{R}^f$ and $q_i \in \mathbb{R}^f$ are the factor vectors of reviewer $u$ and paper $i$, respectively. The resulting average test RMSE decreases slowly as the dimensionality of the latent factor space increases; e.g., for $f = 50$ it is 0.6240, and for $f = 100$ it is 0.6234. Henceforth, we use $f = 100$.
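As a rough illustration of how the bias and factor terms of Eq. 1 might be fit, the sketch below minimizes a regularized squared error over the observed bids with stochastic gradient descent. The learning rate, regularization strength, initialization, and epoch count are illustrative choices, not values reported in the paper.

```python
import numpy as np

def fit_bias_factor_model(bids, n_reviewers, n_papers, f=100,
                          lr=0.01, lam=0.05, epochs=30, seed=0):
    """Fit mu + b_u + b_i + p_u^T q_i on observed (reviewer, paper, bid) triples
    by stochastic gradient descent on a regularized squared error.
    Hyperparameters (lr, lam, epochs) are illustrative, not the paper's values."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in bids])             # global mean bid
    b_u = np.zeros(n_reviewers)                       # reviewer biases
    b_i = np.zeros(n_papers)                          # paper biases
    P = 0.01 * rng.standard_normal((n_reviewers, f))  # reviewer factors p_u
    Q = 0.01 * rng.standard_normal((n_papers, f))     # paper factors q_i
    for _ in range(epochs):
        for idx in rng.permutation(len(bids)):
            u, i, r = bids[idx]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - lam * P[u]),
                          Q[i] + lr * (err * P[u] - lam * Q[i]))
    return mu, b_u, b_i, P, Q

# Toy usage: bids are (reviewer index, paper index, bid in {1, 2, 3, 4}).
bids = [(0, 0, 4), (0, 1, 2), (1, 0, 3), (2, 1, 1)]
model = fit_bias_factor_model(bids, n_reviewers=3, n_papers=2, f=10)
```

Evaluating such a fit over 100 random 90-10 splits, as described above, would reproduce the kind of RMSE comparison reported for the baseline and factor models.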
2.3 Subject categories

While latent factor models automatically infer suitable categories, much can be learned from the known categories attributed to both papers and reviewers. ICDM'07 submissions specify a number of predefined categories as primary and secondary topics for a given paper. We model the entered matching between paper $i$ and category $c$ by:

$$\sigma_{ic} = \begin{cases} 1 & c \in \text{primary}(i) \\ \tfrac{1}{2} & c \in \text{secondary}(i) \\ 0 & \text{otherwise} \end{cases}$$

The value assignment (1 for "primary", 0.5 for "secondary") is derived by cross-validation and is quite intuitive. Similarly, we use the following for matching reviewers with their desired categories:

$$\theta_{uc} = \begin{cases} 1 & c \in \text{interest}(u) \\ -\tfrac{1}{2} & c \in \text{no interest}(u) \\ 0 & \text{otherwise} \end{cases}$$

Notice that in ICDM'07, reviewers could declare a lack of interest in (or an inability to review) papers from certain categories (this is different from conflicts of interest, discussed later). In the fifth term of Eq. 1, the weights $w_c$ indicate the significance of each category in linking a reviewer to a paper, and are learnt automatically by minimizing the squared error on the training set. It is plausible that, e.g., a mutual interest in some category A will strongly link a reviewer to a paper, while a mutual interest in another category B is less influential on paper choice.

Table 1 depicts the results of this analysis, showing differences of orders of magnitude in the ability of different categories to correctly predict associations of reviewers to papers. Note in particular that there is no obvious monotonic relationship between the weight imputed to categories and the number of papers/reviewers associated with them.
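For completeness, here is a small sketch of how the category indicators $\sigma_{ic}$ and $\theta_{uc}$ defined above could be assembled from the submitted metadata. The dictionary layout (paper_primary, paper_secondary, reviewer_interest, reviewer_no_interest) is a hypothetical representation of the ICDM'07 forms, not the dataset's actual schema.

```python
import numpy as np

def build_category_indicators(n_papers, n_reviewers, n_categories,
                              paper_primary, paper_secondary,
                              reviewer_interest, reviewer_no_interest):
    """Build sigma (papers x categories) and theta (reviewers x categories)
    following the case definitions in Section 2.3. Each input dict maps a
    paper (or reviewer) index to an iterable of category indices; the layout
    is a hypothetical stand-in for the real metadata."""
    sigma = np.zeros((n_papers, n_categories))
    theta = np.zeros((n_reviewers, n_categories))
    for i in range(n_papers):
        sigma[i, list(paper_primary.get(i, []))] = 1.0      # c in primary(i)
        sigma[i, list(paper_secondary.get(i, []))] = 0.5    # c in secondary(i)
    for u in range(n_reviewers):
        theta[u, list(reviewer_interest.get(u, []))] = 1.0        # c in interest(u)
        theta[u, list(reviewer_no_interest.get(u, []))] = -0.5    # c in no interest(u)
    return sigma, theta

# The fifth term of Eq. 1 for reviewer u and paper i is then
# np.sum(sigma[i] * theta[u] * w), with w holding the learned weights w_c.
```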