Table 1: Subject categories, inferred weights, number of reviewers (with expertise in that category), and number of papers (assigned to the category). For brevity, only a few categories are shown.

Category | Weight (w_c) | # reviewers | # papers primary (secondary)
Healthcare, epidemic modeling, and clinical research | 0.395121 | 31 | 7 (7)
Security, privacy, and data integrity | 0.334821 | 23 | 12 (6)
Handling imbalanced data | 0.284398 | 24 | 6 (10)
Mining textual and unstructured data | 0.245319 | 66 | 38 (30)
Mining in networked settings: web, social and computer networks, and online communities | 0.206318 | 62 | 44 (29)
Novel data mining algorithms in traditional areas (such as classification, regression, clustering, probabilistic modeling, and association analysis) | 0.089248 | 91 | 147 (71)
Dealing with cost sensitive data and loss models | 0.03453 | 12 | 4 (4)
Algorithms for new, structured, data types, such as arising in chemistry, biology, environment, and other scientific domains | 0.006015 | 60 | 21 (25)

... with the category. When adding subject categories to the baseline and factor models, the resulting RMSE is 0.6197.

2.4 Paper-paper similarities

We inject paper-paper similarities into our models in a way reminiscent of item-item recommenders [6]. The building blocks here are similarity values s_ij, which measure the similarity of paper i and paper j. The similarities could be derived from the ratings data, but those are already covered by the latent factor model. Rather, we derive the similarity of two papers by computing the cosine of their abstracts. Usually we work with the square of the cosine, which better contrasts the higher similarities against the lower ones. In the sixth term of Eq. 1, the set R(u) contains all papers on which u bid. The constant α is for regularization: it penalizes cases where the weighted average has very low support, i.e., where Σ_{j∈R(u)} s_ij is very small. In our dataset it was determined by cross-validation to be 0.001. The parameter γ sets the overall weight of the paper-paper component. It is learnt as part of the optimization process (cross-validation could have been used as well); its final value is close to 0.7. When this term is combined with the overall scheme, the RMSE drops further to 0.6038.
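To make the squared-cosine construction concrete, the sketch below computes s_ij from raw abstract strings. The TF-IDF weighting, the scikit-learn helpers, and the toy `abstracts` list are our own illustrative assumptions; the text only states that the cosine of the abstracts is taken and then squared.

```python
# Sketch of the paper-paper similarity s_ij as the squared cosine of paper
# abstracts. TF-IDF weighting is an assumption on our part; the section only
# says the cosine of the abstracts is computed and then squared.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def paper_similarities(abstracts):
    """abstracts: list of abstract strings, one per paper.
    Returns an (n_papers x n_papers) matrix of squared cosine similarities."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    cos = cosine_similarity(vectors)   # cosine in [0, 1] for nonnegative vectors
    return cos ** 2                    # squaring sharpens high vs. low similarities

# Hypothetical usage with three toy abstracts:
abstracts = [
    "clustering algorithms for sparse high dimensional data",
    "spectral clustering and graph partitioning",
    "privacy preserving data publishing",
]
print(np.round(paper_similarities(abstracts), 3))
```

Squaring keeps the ordering of similarities but shrinks weak, noisy matches toward zero, which is the contrast effect described above.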
2.5 Reviewer-reviewer similarities

We craft reviewer-reviewer similarities s_uv analogously to the paper-paper similarities, measured as the number of commonly co-authored papers as reported in DBLP. We point out that DBLP data might be incomplete, and co-authorship does not imply similarity of research interests. Nevertheless, our main contribution here is to show how to incorporate reviewer-reviewer similarities in Eq. 1; more sophisticated ways to define s_uv can be readily plugged in. By integrating this factor, the RMSE is 0.6015.
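As an illustration of the co-authorship counts behind s_uv, the sketch below tallies commonly co-authored papers for each reviewer pair. The `dblp_papers` structure (a pre-parsed list of author-name lists) and the toy names are hypothetical; parsing the actual DBLP dump and disambiguating author names are separate problems not addressed here.

```python
# Sketch of s_uv as the number of papers co-authored by reviewers u and v.
# `dblp_papers` is a hypothetical, already-parsed DBLP extract: one list of
# author names per paper.
from collections import defaultdict
from itertools import combinations

def reviewer_similarities(dblp_papers, reviewers):
    """dblp_papers: iterable of author-name lists, one per DBLP paper.
    reviewers: set of program-committee member names.
    Returns {(u, v): number of commonly co-authored papers} for u < v."""
    s = defaultdict(int)
    for authors in dblp_papers:
        pc_authors = sorted(set(authors) & reviewers)
        for u, v in combinations(pc_authors, 2):
            s[(u, v)] += 1
    return dict(s)

# Hypothetical usage:
reviewers = {"A. Reviewer", "B. Reviewer", "C. Reviewer"}
dblp_papers = [
    ["A. Reviewer", "B. Reviewer", "X. Student"],
    ["A. Reviewer", "B. Reviewer"],
    ["C. Reviewer", "Y. Student"],
]
print(reviewer_similarities(dblp_papers, reviewers))
# {('A. Reviewer', 'B. Reviewer'): 2}
```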
2.6 Conflicts of interest (CoI)

A final source of data is conflicts of interest for certain (paper, reviewer) combinations, e.g., the reviewer might be the former advisor of the author. Many conferences define what it means to have a CoI and solicit this information explicitly during the bidding phase. We do not aim to model or predict new CoIs, but show in the next section how they are incorporated to avoid making erroneous assignments.

3. OPTIMIZING PAPER ASSIGNMENT

Our predicted preference matrix can now be supplied as input to any of the assignment algorithms discussed in [2]. We chose the Taylor algorithm [7] as a representative example because it was used during ICDM'07 and thus enables a baseline comparison with an approach that does not perform any preference modeling. It can incorporate global conference constraints such as the desired number of reviewers for each paper (k_p) and a desired maximum number of papers for each reviewer (k_r). (For ICDM'07, these values are 3 and 9, respectively.) Denoting the predicted ratings matrix as R, the goal is to optimize the assignments matrix A [7]:

\operatorname*{argmax}_{A} \operatorname{trace}(R^{T} A) = \operatorname*{argmax}_{A} \sum_{u} \sum_{j} R_{uj} A_{uj}, \qquad (2)

where A_{uj} \in [0, 1] \; \forall u, j, \quad \sum_{j} A_{uj} \le k_p \; \forall u, \quad \sum_{u} A_{uj} \le k_r \; \forall j.

Here, the objective criterion, trace(R^T A), captures the global affinity of all reviewers across all their assigned papers. CoIs can be modeled by hardwiring the desired entries of A to zero and taking them 'out of play' in Eq. 2.

This integer programming problem is reformulated into an easier-to-manage linear programming problem by a series of steps, using the node-edge adjacency matrix, where every row corresponds to a node in A and every column represents an edge [7]. This reformulation is somewhat more involved, but it renders the problem solvable via methods such as Simplex or interior-point programming. In particular, as Taylor shows in [7], because the reformulated constraint matrix is totally unimodular, there exists at least one globally optimal assignment with integral (and, due to the constraints, Boolean) coefficients.
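For intuition, the sketch below solves a direct LP relaxation of Eq. 2 with SciPy's HiGHS solver rather than Taylor's node-edge reformulation; the same total-unimodularity argument applies to this transportation-style relaxation, so a simplex-type solver returns an integral (Boolean) vertex. The code follows the prose definitions, with rows of R indexing reviewers, k_p bounding reviewers per paper, and k_r bounding papers per reviewer. The ratings matrix, the CoI pair, and the small k_p/k_r values in the usage example are made up for illustration (ICDM'07 used k_p = 3 and k_r = 9).

```python
# Sketch of the assignment step (Eq. 2) as a linear program, using SciPy's
# HiGHS solver instead of the node-edge reformulation of [7]. All inputs
# below are illustrative only.
import numpy as np
from scipy.optimize import linprog

def assign(R, kp, kr, coi=()):
    """R: (n_reviewers x n_papers) predicted ratings.
    kp: max reviewers per paper, kr: max papers per reviewer.
    coi: iterable of (reviewer, paper) pairs hardwired to zero.
    Returns a Boolean assignment matrix A maximizing trace(R^T A)."""
    n_rev, n_pap = R.shape
    c = -R.ravel()                                   # maximize => minimize the negation
    rows = []
    # Each reviewer u is assigned at most kr papers: sum_j A[u, j] <= kr
    for u in range(n_rev):
        row = np.zeros(n_rev * n_pap)
        row[u * n_pap:(u + 1) * n_pap] = 1
        rows.append(row)
    # Each paper j receives at most kp reviewers: sum_u A[u, j] <= kp
    for j in range(n_pap):
        row = np.zeros(n_rev * n_pap)
        row[j::n_pap] = 1
        rows.append(row)
    b = [kr] * n_rev + [kp] * n_pap
    bounds = [(0, 1)] * (n_rev * n_pap)
    for u, j in coi:                                 # CoI: take the entry 'out of play'
        bounds[u * n_pap + j] = (0, 0)
    res = linprog(c, A_ub=np.array(rows), b_ub=b, bounds=bounds, method="highs")
    # Total unimodularity makes the optimal vertex integral; rounding only
    # cleans up floating-point noise.
    return np.round(res.x.reshape(n_rev, n_pap)).astype(int)

# Hypothetical 3-reviewer, 4-paper example with one conflict of interest:
R = np.array([[3.0, 1.0, 2.5, 0.5],
              [0.5, 2.0, 3.0, 1.0],
              [2.0, 0.5, 1.0, 3.0]])
print(assign(R, kp=2, kr=3, coi=[(0, 0)]))
```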
4. EXPERIMENTAL RESULTS

We have already shown the ability of our modeling to better capture reviewer-paper preferences. But do the improved models translate into better assignments? Note the key distinction between recommendations and assignments. To evaluate assignment quality, we extend the train-test methodology from above. In other words, neither the prediction algorithm nor the assignment algorithm can see the originally given preferences within the test set. We use the training set to learn model (1), predict all ratings using this model, and feed these predictions as input to (2). While the resulting assignment will be spread across the training and test sets, we will specifically evaluate those made from the test set and determine whether the reviewer had rated them as 'No,' 'Low,' 'OK,' or 'High.' This methodology mimics