Industrial and Government Applications Track Paper

proposal vector and the centroids of the panels. The output below illustrates this process. Top terms of NSF04XXXX2 are: sensor, wireless, hierarch, node, channel, energi, signal, rout, alloc, poor, radio, path.
Cluster WON2 is the best match for NSF04XXXX2. Alternate panels for orphan: Cluster WON3, Cluster WON, Cluster CIP-SC.

This algorithm for assigning an orphan proposal to a panel is related to Rocchio's algorithm for text classification [13]. The MailCAT system [14] used the idea of displaying a few possible folders for filing e-mail messages, analogous to the way that Revaide finds a few possible panels. In both cases, the idea is to cope with the reality that text classification is not 100% accurate while providing benefit by focusing a person on a few possibilities out of the many that are available.

3.4 Proposal Classification

Revaide has the capability of performing text classification. The algorithm for recommending a panel for orphan proposals is one use of text classification. This section describes another use: performing an initial assignment of proposals to program directors. Recall that teams of program directors sort through proposals to identify the major area before further subdividing into panels. Revaide can use a text classification algorithm to perform this initial sort. In this case, the training data is the previous year's proposals and the class is the name of the program officer who organized the review panel the previous year. That is, the goal of the text classification is to find the person who will assume initial ownership of this year's proposals based upon their responsibilities in the prior year.³ The initial program director either places a proposal into a panel they will organize or passes it to another program officer who is a better match for the proposal. In a study using cross-validation of the 2004 proposals submitted to Information and Intelligent Systems, the classification accuracy was 80.9%. This clearly is not good enough for a fully automated system. However, it provides tremendous benefits within the existing workflow.
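The orphan-panel recommendation above can be sketched in a few lines. This is an illustrative reconstruction, not Revaide's actual code: the function names are invented, the vectors are represented as sparse term-to-weight dictionaries, and cosine similarity is assumed as the matching measure between a proposal vector and each panel centroid.

```python
# Illustrative sketch: rank panel centroids by similarity to a proposal
# vector and report the best match plus a few alternates, as in the
# Revaide output shown above. Names and the similarity measure are
# assumptions; the paper does not give the implementation.

def cosine(u, v):
    """Cosine similarity of two sparse term->weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_panels(proposal, centroids, n_alternates=3):
    """Return (best panel, alternate panels) for an orphan proposal."""
    ranked = sorted(centroids,
                    key=lambda name: cosine(proposal, centroids[name]),
                    reverse=True)
    return ranked[0], ranked[1:1 + n_alternates]
```

Presenting alternates alongside the best match mirrors the MailCAT idea: the classifier need not be perfect if a person only has to choose among a few candidates.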
For example, rather than having 10 people each sort through 1000 proposals to find proposals of interest, each person is initially assigned approximately 100 by the text classification algorithm. Each program director then reviews those 100 proposals and on average needs to find a better program director for 20 proposals. This has greatly reduced the amount of effort required to identify the best program officer for each proposal.

Revaide assists with each step of the panel formation process, first by recommending an initial program officer. Once the final program officer is decided upon for each proposal,⁴ the proposals are manually subdivided into panels and the panels are checked for coherence. A proposal might be "orphaned" if it was initially misrouted or delayed, or if no program officer claimed responsibility in the initial sort. It is then assigned to a program director in the panel checking stage. In the next section, we discuss assisting in the assignment of reviewers to proposals.

³ Because many program officers are rotators who spend a short time at NSF, the initial assignment may be based upon the program officer's predecessor's proposals.

⁴ This overview slightly simplifies the process. Two program directors may decide to hold a joint panel, e.g., at the intersection of databases and artificial intelligence.

3.5 Assigning Reviewers

The most straightforward way to choose N reviewers for a proposal would simply be to select the N authors of the previous proposals that are the most similar to the new proposal to be reviewed. This is the approach that has been used in some past efforts at automatic reviewer assignment (e.g., [15]). This approach does a fair job but has some important drawbacks. The main problem occurs when a proposal has more than one topic (a fairly common occurrence) and one topic dominates the match with other proposals.
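The straightforward baseline described above can be sketched as follows. This is a hypothetical illustration (author names, fields, and the dot-product similarity are assumptions), shown mainly to make its weakness concrete: every top match tends to share the proposal's dominant topic.

```python
# Baseline reviewer selection sketched from the text: take the authors of
# the N prior proposals most similar to the target. All names and the
# similarity measure are illustrative assumptions.

def dot(u, v):
    """Dot product of two sparse term->weight dicts (normalized upstream)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def baseline_reviewers(target, past_proposals, n):
    """past_proposals: list of (author, term_vector) for prior submissions."""
    ranked = sorted(past_proposals, key=lambda av: dot(target, av[1]),
                    reverse=True)
    return [author for author, _ in ranked[:n]]
```

If the target's weight is concentrated on one topic, the top-N authors will all match that topic, leaving secondary topics uncovered, which motivates the residual-vector approach of Section 3.5.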
This leads to a set of reviewers that all have the same expertise, often leaving other topics in the target document uncovered. For example, consider a document about data mining using Gaussian mixture models to predict outcomes in a medical context. Ideally, you would want a mix of reviewer expertise for this document: general data mining, the specific technique being used, as well as the field it is being applied to. Simply selecting reviewers by document similarity would tend to select reviewers who matched most closely to the primary topic of the paper (as determined by the TF-IDF weighting process), possibly failing to select any reviewers at all for an important secondary topic of the document.

To solve this problem, we approach the task slightly differently. Instead of finding the N closest matches for the target proposal, we look for the set of N proposals that together best match the target document. We define a measure that indicates the degree of the overlap between the terms in a proposal vector and a set of expertise vectors. We represent a proposal as a normalized weighted vector of terms: $\vec{P} = \langle p_1, \ldots, p_n \rangle$. Similarly, we represent a reviewer's expertise as a normalized vector: $\vec{E} = \langle e_1, \ldots, e_n \rangle$, where $p_i$ is the weight of term $i$ in a proposal and $e_i$ is the weight of term $i$ in a reviewer's expertise vector. We define a residual term vector to represent the relevant terms in the proposal that are not in the expertise of the reviewer. The weight of each term in the residual vector is the difference between the weights in the proposal and expertise vectors, with a minimum of 0:

$\vec{R} = \langle \max(0, p_1 - e_1), \ldots, \max(0, p_n - e_n) \rangle$

More generally, there is typically more than one reviewer, and we define the residual term vector when there are $k$ reviewers to be

$\vec{R} = \langle \max(0, p_1 - \sum_{i=1}^{k} e_{i,1}), \ldots, \max(0, p_n - \sum_{i=1}^{k} e_{i,n}) \rangle$
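The residual-term-vector idea can be sketched with a greedy selection loop. The residual computation follows the definition in the text; the greedy strategy (repeatedly picking the reviewer whose expertise best covers what remains) is an assumption for illustration, since the paper does not specify how the set of N reviewers is searched. All names are hypothetical.

```python
# Sketch of reviewer selection via residual term vectors. After each pick,
# the chosen reviewer's expertise weights are subtracted from the proposal
# vector and clipped at 0, so the next reviewer is chosen for the terms
# still uncovered. The greedy search itself is an assumed strategy.

def residual(proposal, expertises):
    """R_t = max(0, p_t - sum_i e_{i,t}) over all terms t in the proposal."""
    r = {}
    for t, p in proposal.items():
        covered = sum(e.get(t, 0.0) for e in expertises)
        r[t] = max(0.0, p - covered)
    return r

def coverage(expertise, resid):
    """How much of the residual this reviewer's expertise would cover."""
    return sum(min(w, resid.get(t, 0.0)) for t, w in expertise.items())

def pick_reviewers(proposal, reviewers, n):
    """Greedily pick n reviewers that together best cover the proposal."""
    chosen = []
    while len(chosen) < n:
        resid = residual(proposal, [reviewers[r] for r in chosen])
        best = max((r for r in reviewers if r not in chosen),
                   key=lambda r: coverage(reviewers[r], resid))
        chosen.append(best)
    return chosen
```

On the Gaussian-mixture-models example above, this favors a mix of expertise: once a data-mining reviewer is chosen, the data-mining terms are removed from the residual, so the next pick is driven by the statistical and medical terms instead.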