Mining for Proposal Reviewers: Lessons Learned at the National Science Foundation

Seth Hettich
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
sjh@ics.uci.edu

Michael J. Pazzani
Rutgers University
CoRE Building, Rm 706
96 Frelinghuysen Rd
Piscataway, NJ 08854-8018
pazzani@rutgers.edu

ABSTRACT

In this paper, we discuss a prototype application deployed at the U.S. National Science Foundation for assisting program directors in identifying reviewers for proposals. The application helps program directors sort proposals into panels and find reviewers for proposals. To accomplish these tasks, it extracts information from the full text of proposals both to learn about the topics of proposals and the expertise of reviewers. We discuss a variety of alternatives that were explored, the solution that was implemented, and the experience in using the solution within the workflow of NSF.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data Mining

General Terms
Algorithms, Human Factors, Emerging applications, technology, and issues

Keywords
Keyword extraction, similarity functions, clustering, information retrieval.

1. INTRODUCTION

The National Science Foundation receives over 40,000 proposals a year. Each proposal is reviewed by several external reviewers. It is critical to the mission of the agency and the integrity of the review process that every proposal is reviewed by researchers with the expertise necessary to comment on the merit of the proposal. If there is not a good match between the topic of a proposal and the expertise of the reviewers, then it is possible that a project is funded that will not advance the progress of science or that a very promising proposal is declined. We explore the problem of using data mining technology to assist program directors in the review of proposals. Care is taken to match the technology to the existing workflow of the agency and to use technology to offer suggestions to program directors who ultimately make all decisions. Although this paper reports on reviewing proposals, we argue that the lessons and technology would also apply to the reviewing of papers submitted to conferences and journals.

Many proposals are reviewed in panels, i.e., a group of typically 8-15 reviewers who meet to discuss a set of 20-40 related proposals, with each panelist typically reviewing 6-10 proposals. Most proposals are submitted in response to a particular solicitation (e.g., "Information Technology Research") or to a specific program (e.g., "Human Computer Interaction"). Individual program directors or, for larger solicitations, teams of program officers perform a number of tasks to insure that proposals are reviewed equitably. These tasks include:

1. Dividing the proposals into "clusters" of 20-40 related proposals to create panels.
2. Finding reviewers:
   • Identify potential external reviewers to invite for each panel.
   • Assign panelists as reviewers of proposals.
   • If there is not adequate expertise on a panel to review a proposal, obtain "ad hoc" reviews from people with that expertise who are not on a panel.

In addition to this lengthy process, reviewers must not have a conflict of interest with proposals they are reviewing (e.g., they may not be from the same department as the proposal's author), and a diverse group of panelists (both scientifically and demographically) is desirable to insure that multiple perspectives are represented in the review process.
Furthermore, due to scheduling or workload conflicts, not every invited reviewer accepts the invitation, requiring an iterative process of inviting a batch of reviewers and then inviting others to fill in gaps after the initial reviewers respond to the invitation.

A particular consideration at NSF is that many proposals are multidisciplinary, e.g., mining genome data. To determine if such a proposal is meritorious, it is important to consult some experts with backgrounds in data mining (to insure that the methods proposed are likely to work) and in the biological sciences (to insure that the problem addressed is an important open problem). If all reviewers have expertise in one area, it's possible that an important problem would be addressed by a technique that isn't
very promising or that very promising technology would be applied to a problem that is already solved.

2. Exploring Potential Solutions

Over the past decade, vendors have proposed various text mining technologies to NSF to help with the reviewing process. The most common technology proposed is automated text clustering to help organize proposals into panels. A variety of alternative approaches (e.g., hierarchical [1] or k-means [2]) have been explored. While these present interesting views of proposal submission data, they do not produce results that fit easily into the workflow of NSF or that have gained universal acceptance by program officers who organize panels and assign reviewers. Automated clustering approaches suffer from a number of flaws that have reduced their utility in dividing proposals into panels.

1. The size of clusters. Most clustering algorithms produce clusters of quite different size. Often, there are a few very large clusters and a larger number of very small clusters. In contrast, NSF panels are often approximately the same size due to logistical constraints ranging from the size of rooms to the number of proposals that can be discussed per day.

2. The stability of clusters. Dividing proposals into panels often occurs incrementally. Although most solicitations have deadlines, some proposals that come in before the deadline are misrouted and then found a few weeks later. Occasionally, due to severe weather or natural disasters, a deadline is extended for some regions of the country. Many clustering algorithms, if rerun on a slightly expanded data set, produce drastically different results. Some algorithms are stochastic in nature and produce different clusters when rerun on the same data (e.g., see [3]). It is difficult to convince program officers with different backgrounds and expertise that a computer system has found an ideal organization of a group of proposals if that organization changes drastically.

3. Lack of alignment with the organizational structure of NSF. The clusters produced by clustering algorithms rarely correspond to the scientific and organizational structure of NSF. Each panel has a program officer (or occasionally a team of 2-3 program officers) with specific expertise. When clusters are created automatically without regard to the organization and program officers' expertise, some clusters do not correspond to established scientific fields, and no program director wants to be responsible for reviewing proposals that don't fall within their general area of expertise.

4. Lack of alignment with the goals of the solicitation. For example, some solicitations focus on broadening participation in the scientific workforce, and it is useful to group proposals into panels that address issues such as increasing the participation of women and others that focus on increasing the participation of underrepresented ethnic groups. These panels have heterogeneous scientific content. Other solicitations focus on advancing the frontier of science and might divide proposals into panels by scientific subfield. Within a scientific panel, proposals might have heterogeneous broader impacts, such as increasing the participation of underrepresented groups or creating results of interest to industry.

In general, the problem with fully automated text clustering solutions is that they don't leave room for human input of preferences or constraints. There has been some research that addresses the issues raised.
For example, the simplest k-means clustering algorithm is incremental and would allow for late additions to existing clusters. However, the results of k-means are not stable, so it produces different partitionings of the same data on different runs. Several investigators (e.g., [4] and [5]) have looked at adding constraints to the clustering process so that clusters are approximately the same size. However, none of these address the lack of alignment with the organizational structure and workflow. In Section 3, we discuss an approach to "cluster checking" in which algorithms related to text clustering and classification are used to suggest improvements to clusters produced by people and new proposals are added to existing panels.

NSF has also explored and experimented with technology for assigning reviewers to proposals. One approach is to create a database of reviewers with keywords indicating user expertise. These databases are populated by users filling out a form with their expertise. Experience within NSF on prototypes of reviewer databases has found mixed results. Common problems include:

1. It is difficult for a scientific community to agree upon a taxonomy of keywords. One need only examine the ACM Computing Classification Scheme at http://www.acm.org/class/1998 to gain an appreciation for the difficulty. While this classification is adequate for a coarse sorting of papers into topic areas, the topic areas tend to be too coarse to be of much use in bringing expertise into the reviewing process. For example, the most fine-grained term representing the topic area of this conference is "Data Mining." If this were used as the basis for assigning reviewers, then a system that uses a keyword-based approach would believe that anyone publishing in this conference would be considered equally qualified to review a proposal or paper on any topic in the conference. The Data Mining field has become sufficiently specialized that one can be an expert in one area (such as association rules) and not have detailed expertise in other areas (such as text classification), and an ideal reviewer for a proposal in one area may not be qualified for another area.

2. It is difficult to maintain such a keyword database over time. New topics arise in rapidly growing fields, requiring the taxonomy and database to be updated frequently. This is particularly important for a funding agency that has the goal of funding work at the frontier of science rather than
concentrating on incremental work in mature fields.

3. If unrestricted text is allowed as a description of expertise, it is rare that potential reviewers, program directors, and proposal authors all select the same free-text terms. Numerous studies of information retrieval systems have found low agreement among individuals assigning keywords to content (e.g., [8]).

4. There is not high compliance with requests for users to enter information into the database. Many researchers are too busy to fill out forms or hesitant to "volunteer" for reviewing. While agreeing to review proposals is a service to the funding agency, being asked to review proposals is as welcome to some as other forms of service, such as jury duty.

5. The interface for submitting proposals to NSF, FastLane, does not allow keywords to be entered describing the proposals. While this could be added to the interface, doing so would require consensus that it would facilitate proposal handling, and this has not been demonstrated convincingly.

Due to the limitations of keyword-based database systems, when they are used within NSF, they are limited to suggesting a pool of candidates for a panel on a given topic. While the Computer and Information Science and Engineering directorate at NSF has experimented with a keyword system (e.g., in the 2001 ITR competition), it was not used in subsequent years.

Finally, NSF has experimented with systems that allow panelists to indicate preferences for reviewing proposals within a panel. In such systems, panelists indicate their preference for reviewing a proposal on a numeric scale. Many conferences also use similar systems, such as Cyberchair [9]. In Cyberchair, a constraint satisfaction algorithm assigns people the proposals they are most interested in. These systems only address part of the reviewer assignment problem. They do not assist with identifying panelists but only with assigning proposals to panelists once they have been identified. There has been an issue with compliance on these systems as well, i.e., not every panelist promptly enters preference data, and a single person not replying can delay the assignments for all others. In addition, it isn't clear what the preference scores mean or how much thought goes into the assignments. While the intent is to judge how well qualified a reviewer is to review a proposal, we have observed many panelists having a strong preference for proposals by well-known researchers and fewer having a preference for proposals by less established researchers. While NSF typically asks for preferences on 20-30 proposals, some conferences ask for preference data on 200-300 papers. The second author admits that when presented with 300 papers in Cyberchair, not as much time is spent reviewing the abstracts of the last batch of papers as the first to determine preferences. Finally, there is also a problem with multidisciplinary proposals if people from one discipline have a preference for a paper. It can occur that all computer scientists and no biologists give high preference scores to a bioinformatics proposal, in which case a preference-based system will result in one aspect of the proposal not being reviewed.

3. Revaide

We have deployed a prototype system, Revaide, within NSF that addresses the problems with previous fully autonomous systems. The philosophy behind the system is to assist program directors and not replace their judgment with a black-box system. One key design criterion is that Revaide offers suggestions that may be accepted or declined individually.
In this section, we introduce Revaide, its tasks and solutions, and evaluate the utility of using Revaide. We introduce a measure to evaluate how well the expertise of a group of reviewers is suited to a proposal. Following the discussion of the key components of Revaide in this section, we report on our experiences using the algorithm.

3.1 Representing Proposals

Proposals are submitted to NSF in PDF form. Revaide converts the proposals to ASCII and represents proposals in the standard TF-IDF vector space [10] as term vectors in the space of all words in the document collection. The entire proposal is used, including the references and the resume of the investigator. One simple use of Revaide is to annotate spreadsheets of proposals with the 20 terms with the highest TF-IDF weights. These keywords are often more informative than the title in helping program directors determine what a proposal is about. While early versions of Revaide used stemming [11] to convert words to root forms, we found that stemming reduced the human comprehensibility of the resulting term vector representation. Experience showed that using stemming did not increase the quality of the suggestions made by Revaide. Therefore, we no longer use stemming.

One other enhancement also increased the comprehensibility of the resulting term representation. We augmented the stoplist of items that should not be used as keywords. While most stoplists include common words such as articles and prepositions, we augmented the stoplist to include words that appeared in proposals but were not descriptive of the proposal content, including the e-mail addresses of PIs and the name and city of the university. These words frequently occur within a few proposals and not in many others, giving them high TF-IDF weights, but they confused program directors when used as keywords and degraded the quality of Revaide's suggestions.

An example illustrates the representation used by Revaide for one proposal. The terms with the highest weights and their weights were: image: 0.031, judgments: 0.028, feedback: 0.027, relevance: 0.026, multimodal: 0.020, retrieval: 0.019, and preference: 0.017. To preserve the privacy of the submitter, we cannot provide the title or abstract, but we find that the automatically extracted keywords do indeed provide a compact representation that makes sense to program directors and provides a basis to assist reviewers.
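To make the representation concrete, the following is a minimal sketch of how a proposal's TF-IDF term vector and its top-20 keyword annotation might be computed. This is not Revaide's actual code; the STOPLIST entries, function names, and tokenization rule are illustrative assumptions. As described above, no stemming is applied and the stoplist is augmented with non-descriptive proposal-specific terms.

```python
import math
import re
from collections import Counter

# Hypothetical augmented stoplist: common words plus terms that occur in a few
# proposals but are not descriptive of content (e.g., a PI e-mail or university name).
STOPLIST = {"the", "of", "and", "to", "in", "a", "is",
            "pi@example.edu", "stateu", "collegetown"}

def tokenize(text):
    """Lowercase word tokens with stoplist words removed (no stemming)."""
    return [w for w in re.findall(r"[a-z][a-z0-9'-]+", text.lower())
            if w not in STOPLIST]

def tfidf_vectors(documents):
    """Return one L2-normalized TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def top_keywords(vector, k=20):
    """The k highest-weighted terms, used to annotate the proposal spreadsheet."""
    return sorted(vector, key=vector.get, reverse=True)[:k]
```

Any standard TF-IDF weighting and normalization scheme would serve equally well for the spreadsheet-annotation use described above; the essential points are the full-text input and the augmented stoplist.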
3.2 Representing Reviewer Expertise

Revaide represents the expertise of a reviewer with the TF-IDF representation of the proposals they have submitted to NSF in the past. While it would be possible to use published papers of authors downloaded from CiteSeer [12] or Google Scholar as measures of expertise, there are advantages in using NSF proposals in a practical system deployed at NSF. First, all proposals are similar in style and length. These conditions are ideal for keyword extraction with TF-IDF. Second, the proposals have a variety of meta-data that is useful in other aspects of the process. This meta-data includes the PI's name, e-mail address and other contact information, and an NSF ID for the PI's university. This meta-data simplifies contacting the PI and checking for conflicts of interest between proposals and reviewers. Third, NSF has a strong preference for using people with Ph.D. degrees as reviewers, and one can't distinguish new graduate students from professors on the basis of published papers. By using people who have submitted to NSF as a reviewer pool, this problem is avoided since those eligible to apply to NSF are also eligible to review. Finally, using proposals also avoids the problem of disambiguating people with common names, and it automatically creates a large pool of potential reviewers. A disadvantage of this approach is that it does not include people who do not submit to NSF, such as researchers from industry or from outside the US. Of course, program directors may identify such people through the usual means, such as checking the editorial boards of journals and the program committees of conferences.

In practice, we restrict Revaide's pool of reviewers to those authors of proposals that have been judged as "fundable" by the review process to insure that the reviewers were thought by their peers to have expertise in the area. We also leave out proposals with more than one author so that it is clear who has the expertise in a proposal. When more than one past proposal is available for a given author, all of the proposals are combined by adding and then re-normalizing the term vectors to form a model of the expertise. The example proposal representation in the previous section would also serve as the expertise representation of the author that submitted the proposal.
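The add-and-renormalize step described above can be sketched in a few lines. This is a simplified illustration under our own naming, not Revaide's code; proposal vectors are assumed to be the term-weight dictionaries produced by a TF-IDF step like the one sketched in Section 3.1.

```python
import math

def combine_expertise(proposal_vectors):
    """Model a reviewer's expertise by summing the TF-IDF vectors of their past
    proposals and re-normalizing to unit length (vectors are dicts: term -> weight)."""
    combined = {}
    for vec in proposal_vectors:
        for term, weight in vec.items():
            combined[term] = combined.get(term, 0.0) + weight
    norm = math.sqrt(sum(w * w for w in combined.values())) or 1.0
    return {term: w / norm for term, w in combined.items()}

# Toy example: an author with two past (already normalized) proposal vectors.
past = [{"image": 0.7, "retrieval": 0.7}, {"retrieval": 0.6, "relevance": 0.8}]
expertise = combine_expertise(past)
```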
3.3 Cluster Checking

The first task we consider is assisting groups of program directors to form panels. The most help is needed in large competitions where 500-1500 proposals may be submitted at a time. NSF's system produces a spreadsheet that includes columns containing information such as the author's name, institution, the title of the proposal, and links to the abstract and the PDF of the entire proposal. Teams of program directors manually sort these proposals first into general areas and then into panels of 20-30 proposals. Due to the short time and large number of proposals, it is possible that a proposal could be put into a panel with only a loose relationship to the majority of the proposals. Due to the distributed nature of the work, it is also possible that no one claims responsibility for a proposal.

As described earlier, attempts to use automated clustering failed at this task when program directors didn't accept the results of the clustering system. Instead of automatically clustering, Revaide checks the clusters produced by program directors for coherence and suggests improvements. In addition, Revaide suggests panels for "orphan" proposals that are not assigned to a panel. Furthermore, before program directors form panels, the spreadsheet they use is augmented first with the terms that have the highest TF-IDF weights of each proposal (although the weights are not included, the terms are ordered by weight).

The first step in cluster checking is to form a representation of the important terms of the cluster. In Revaide, this is done by finding the centroid [10] of the proposals that are in each cluster, essentially creating a term vector for each cluster that is the "average" of the term vectors of the proposals. Next, the cosine similarity [10] is found between each proposal's term vector and each cluster's term vector. Revaide produces a summary of the important terms in each cluster. These terms are chosen based on a weighted TF-IDF score. The example below illustrates such a summary. In addition to the TF-IDF weight of each term, Revaide also prints out the number of proposals in the cluster that contain each term. (This example shows output from an earlier version of Revaide that used stemming [11], perhaps also illustrating why we turned stemming off in later versions.)

The top 20 terms of panel ROB are: robot: 0.267 (in 24/28) sensor: 0.203 (in 28/28) vehicl: 0.144 (in 22/28) imag: 0.114 (in 22/28) motion: 0.107 (in 22/28) intellig: 0.104076 (in 25/28) mobil: 0.102 (in 23/28) agent: 0.094 (in 18/28) autom: 0.091 (in 25/28) movement: 0.078 (in 17/28) action: 0.077 (in 23/28) sens: 0.068554 (in 26/28) autonom: 0.068 (in 25/28) self: 0.068 (in 21/28) assembl: 0.064 (in 18/28)

If the most similar cluster to a proposal is not the cluster to which the proposal has been assigned, that is a sign that the proposal is potentially in the wrong cluster. Such discrepancies are pointed out to the program director with a suggestion to move the proposal to another panel. Below, the output of cluster checking is shown, omitting any identifying information from the output.

The top 20 terms of panel CIP-SC are: sensor: 0.355 (in 31/32) vehicl: 0.2493 (in 22/32) wireless: 0.178 (in 29/32) monitor: 0.157 (in 32/32) node: 0.147 (in 27/32) transport: 0.136 (in 29/32) devic: 0.132 (in 30/32) signal: 0.129 (in 30/32) traffic: 0.129 (in 22/32) grid: 0.119 (in 21/32) event: 0.116937 (in 32/32) energi: 0.107 (in 29/32) transmiss: 0.105 (in 25/32) protocol: 0.103 (in 27/32) flow: 0.103 (in 26/32) layer: 0.100317 (in 25/32) mobil: 0.100 (in 26/32) rout: 0.096 (in 23/32) agent: 0.092 (in 17/32) safeti: 0.091 (in 25/32)

Panel DSP is a better match for proposal NSF04XXXX1 than cluster CIP-SC.

In our experience, Revaide recommends a better panel for approximately 5% of the proposals. We have received comments from program directors that include, "Thanks, I don't know how I overlooked that," in response to Revaide's cluster checking. Often, Revaide finds a better panel based on a matter of emphasis within a proposal, e.g., determining that a proposal will make a contribution to computer vision for astronomical applications as opposed to making a contribution to astronomy using existing computer vision techniques.

A special case of cluster checking is when a proposal has not been put into any panel. This can occur if no member of the distributed team of program directors has identified that the proposal falls within the scope of a panel. In this case, the panel that is most similar to the proposal is found, together with the next three, as determined by cosine similarity between the orphan proposal vector and the centroids of the panels.
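The cluster-checking and orphan-assignment steps can be summarized in a short sketch. This is our own simplified rendering of the approach described above, not Revaide's code, and the function and variable names are illustrative: centroids are the averaged, re-normalized term vectors of each panel, a proposal is flagged when its most similar centroid is not the panel it was assigned to, and the same machinery ranks candidate panels for orphans.

```python
import math

def dot(u, v):
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def normalize(v):
    norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
    return {t: w / norm for t, w in v.items()}

def centroid(vectors):
    """Average of unit-length TF-IDF vectors, re-normalized (the panel's term vector)."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return normalize(total)

def check_clusters(assignment, proposal_vecs):
    """assignment: panel name -> list of proposal ids; proposal_vecs: id -> unit vector.
    Yields (proposal id, assigned panel, better panel) when another centroid is closer."""
    centroids = {panel: centroid([proposal_vecs[p] for p in props])
                 for panel, props in assignment.items()}
    for panel, props in assignment.items():
        for p in props:
            # Cosine similarity reduces to a dot product for unit-length vectors.
            best = max(centroids, key=lambda c: dot(proposal_vecs[p], centroids[c]))
            if best != panel:
                yield p, panel, best

def suggest_panels(orphan_vec, centroids, k=4):
    """Rank panels for an unassigned proposal; Revaide reports the best match plus three alternates."""
    return sorted(centroids, key=lambda c: dot(orphan_vec, centroids[c]), reverse=True)[:k]
```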
The output below illustrates this process.

Top terms of NSF04XXXX2 are: sensor, wireless, hierarch, node, channel, energi, signal, rout, alloc, poor, radio, path. Cluster WON2 is the best match for NSF04XXXX2. Alternate panels for orphan: Cluster WON3, Cluster WON, Cluster CIP-SC.

This algorithm for assigning an orphan proposal to a panel is related to Rocchio's algorithm for text classification [13]. The MailCAT system [14] used the idea of displaying a few possible folders for filing e-mail messages, analogous to the way that Revaide finds a few possible panels. In both cases, the idea is to cope with the reality that text classification is not 100% accurate while providing benefit by focusing a person on a few possibilities out of the many that are available.

3.4 Proposal Classification

Revaide has the capability of performing text classification. The algorithm for recommending a panel for orphan proposals is one use of text classification. This section describes another use: performing an initial assignment of proposals to program directors. Recall that teams of program directors sort through proposals to identify the major area before further subdividing into panels. Revaide can use a text classification algorithm to perform this initial sort. In this case, the training data is the previous year's proposals and the class is the name of the program officer who organized the review panel the previous year. That is, the goal of the text classification is to find the person who will assume initial ownership of this year's proposals based upon their responsibilities in the prior year. (Because many program officers are rotators who spend a short time at NSF, the initial assignment may be based upon the program officer's predecessor's proposals.) The initial program director either places a proposal into a panel they will organize or passes it to another program officer who is a better match for the proposal.

In a study using cross-validation of the 2004 proposals submitted to Information and Intelligent Systems, the classification accuracy was 80.9%. This clearly is not good enough for a fully automated system. However, it provides tremendous benefits within the existing workflow. For example, rather than having 10 people each sort through 1000 proposals to find proposals of interest, each person is initially assigned approximately 100 by the text classification algorithm. Each program director then reviews those 100 proposals and on average needs to find a better program director for 20 proposals. This has greatly reduced the amount of effort required to identify the best program officer for each proposal.

Revaide assists with each step of the panel formation process, first by recommending an initial program officer. Once the final program officer is decided upon for each proposal (this overview slightly simplifies the process; two program directors may decide to hold a joint panel, e.g., at the intersection of databases and artificial intelligence), the proposals are manually subdivided into panels and the panels are checked for coherence. A proposal might be "orphaned" if it was initially misrouted or delayed or if no program officer claimed responsibility in the initial sort. It is then assigned to a program director in the panel checking stage. In the next section, we discuss assisting in the assignment of reviewers to proposals.
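The paper does not specify which classifier performs this initial sort. As one plausible instantiation, in the spirit of the Rocchio-related panel assignment above, a nearest-centroid classifier trained on the prior year's proposals could look like the following sketch; the function names and data layout are our own assumptions, not Revaide's implementation.

```python
import math

def dot(u, v):
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def normalize(v):
    norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
    return {t: w / norm for t, w in v.items()}

def train_officer_centroids(labeled_proposals):
    """labeled_proposals: list of (tfidf_vector, program_officer) pairs from the prior year.
    Builds one unit-length centroid per program officer (a Rocchio-style profile)."""
    sums = {}
    for vec, officer in labeled_proposals:
        profile = sums.setdefault(officer, {})
        for t, w in vec.items():
            profile[t] = profile.get(t, 0.0) + w
    return {officer: normalize(profile) for officer, profile in sums.items()}

def initial_assignment(proposal_vec, officer_centroids):
    """Route a new proposal to the program officer whose profile is most similar (cosine)."""
    return max(officer_centroids, key=lambda o: dot(proposal_vec, officer_centroids[o]))
```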
3.5 Assigning Reviewers

The most straightforward way to choose N reviewers for a proposal would simply be to select the N authors of the previous proposals that are the most similar to the new proposal to be reviewed. This is the approach that has been used in some past efforts at automatic reviewer assignment (e.g., [15]). This approach does a fair job but has some important drawbacks. The main problem occurs when a proposal has more than one topic (a fairly common occurrence) and one topic dominates the match with other proposals. This leads to a set of reviewers that all have the same expertise, often leaving other topics in the target document uncovered. For example, consider a document about data mining using Gaussian mixture models to predict outcomes in a medical context. Ideally, you would want a mix of reviewer expertise for this document: general data mining, the specific technique being used, as well as the field it is being applied to. Simply selecting reviewers by document similarity would tend to select reviewers who matched most closely to the primary topic of the paper (as determined by the TF-IDF weighting process), possibly failing to select any reviewers at all for an important secondary topic of the document.

To solve this problem, we approach the task slightly differently. Instead of finding the N closest matches for the target proposal, we look for the set of N proposals that together best match the target document. We define a measure that indicates the degree of overlap between the terms in a proposal vector and a set of expertise vectors. We represent a proposal as a normalized weighted vector of terms:

$\vec{P} = \langle p_1, \ldots, p_n \rangle$

Similarly, we represent a reviewer's expertise as a normalized vector:

$\vec{E} = \langle e_1, \ldots, e_n \rangle$

where $p_i$ is the weight of term $i$ in a proposal and $e_i$ is the weight of term $i$ in a reviewer's expertise vector. We define a residual term vector to represent the relevant terms in the proposal that are not in the expertise of the reviewer. The weight of each term in the residual term vector is the difference between the weights in the proposal and expertise vectors, with a minimum of 0:

$\vec{R} = \langle \max(0,\, p_1 - e_1), \ldots, \max(0,\, p_n - e_n) \rangle$

More generally, there is typically more than one reviewer, and we define the residual term vector when there are $k$ reviewers to be

$\vec{R} = \langle \max(0,\, p_1 - \varepsilon \sum_{j=1}^{k} e_{j,1}), \ldots, \max(0,\, p_n - \varepsilon \sum_{j=1}^{k} e_{j,n}) \rangle$

where $e_{j,i}$ is the weight of term $i$ in reviewer $j$'s expertise vector and $\varepsilon$ controls the amount of overlap in expertise desired in the reviewers. If $\varepsilon$ is 1, then it is sufficient to have one reviewer whose expertise about a term equals the importance of that term to the proposal. If $\varepsilon$ is 0.5, then two reviewers should have expertise on every term in the proposal.
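As a direct transcription of the residual definition above, the following sketch (with our own function names, not Revaide's code) shows the computation when vectors are stored as term-to-weight dictionaries.

```python
def residual(proposal, expertise_vectors, eps=0.5):
    """Residual term vector: proposal weight minus eps times the summed expertise
    weights of the selected reviewers, floored at zero (all vectors are dicts)."""
    res = {}
    for term, p_w in proposal.items():
        covered = eps * sum(e.get(term, 0.0) for e in expertise_vectors)
        res[term] = max(0.0, p_w - covered)
    return res
```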
Industrial and Government Applications Track Paper where a controls the amount of overlap in expertise desired in the An important aspect algorithm is that it can easily be 1, then it is sufficient to have one reviewer started from a partial solution. This turns out to be a very useful whose expertise about a term equals the importance of that term property when considering the context in which the system is to the proposal. If E is 0.5, then two reviewers should have used By allowing program directors to provide a partial solution expertise on every term in the proposal that will then guide the system towards its final solution, we allow the experts to use Revaide as a tool to assist them to complete To compare alternative sets of reviewers and alternative their jobs rather than using it to completely replace their approaches for finding reviewers we define a measure called Sum of Residual Term Weight (SRTW)to be Another benefit of sRTM is that it may be used to determine whether a proposal has reviewers with adequate expertise. Whe SRTM Pr there is no reviewer with expertise on an aspect of the proposal the value of SRTM for that proposal would be higher than others This might occur if the pool of reviewers is too small or if the We define the goal of assigning I to be finding a set proposal is on a topic that had not received submissions in the of reviewers that reduces the sum of res ms to be o and the ast. One way to find a reviewer in this case is to use the terms one set of reviewers is better suited to review a proposal than a with the highest residual weights as query to a specialized search number if that set of reviewers has a lower Srtm engine such as Google Scholar. Figure I illustrates the results of We have implemented a hill-climbing search algorithm to Google Scholar using the three terms with the highest residual weights from table 1. Although Google Scholar is not integrated find a set of reviewers for each proposal. We start by finding the with the entire workflow of Revaide(e.g, it doesn't identify the best" reviewer and then iteratively select another reviewer until N are found. At each step, the reviewer that minimizes SRTM is e-mail address and affiliation of the authors), it still provides a selected. This iterative process will reduce the residual term useful way of recommending reviewers eight. The residual term weight with no reviewers is 1.0(since As we have described assigning reviewers and srtm so far e work with normalized vectors). As each reviewer is selected, the goal is to find a set of reviewers for a single propos the term weights are adjusted according to the expertise of the However, at NSF panels, reviewers typically review several reviewer. By subtracting the expertise vector from the document proposals in a panel. Revaide can easily be used to recommend panelists for a set of proposals. Recall that in cluster checking, will decrease Revaide creates a term vector for each panel that is the centroids Table I shows a trace of how the residual term weights are of the proposals in the panel. This clu reduced by selecting reviewers. The row shows the most the terms that are most important to the proposals in the pan To invite panelists, Revaide simply finds the panelists whose remaining table shows the residual term vector after subtracting expertise best reduces the SRTM of the centroid of the panel.In feedback for image retrieval is to be reviewed. The first reviewer reviewers might be selected for a panel of 24 proposals. 
If ε is 1, then it is sufficient to have one reviewer whose expertise about a term equals the importance of that term to the proposal. If ε is 0.5, then two reviewers should have expertise on every term in the proposal.

To compare alternative sets of reviewers and alternative approaches for finding reviewers, we define a measure called the Sum of Residual Term Weights (SRTM):

  SRTM = Σ_i max(0, p_i − ε·Σ_j e_{j,i}).

We define the goal of assigning reviewers to be finding a set of reviewers that reduces the sum of residual term weights to 0, and we say that one set of reviewers is better suited to review a proposal than another if that set has a lower SRTM.

We have implemented a hill-climbing search algorithm to find a set of reviewers for each proposal. We start by finding the "best" reviewer and then iteratively select another reviewer until N are found. At each step, the reviewer that minimizes SRTM is selected. This iterative process reduces the residual term weight. The residual term weight with no reviewers is 1.0 (since we work with normalized vectors). As each reviewer is selected, the term weights are adjusted according to the expertise of the reviewer. By subtracting the expertise vector from the document vector, the sum of residual term weights in the document vector decreases.

Table 1 shows a trace of how the residual term weights are reduced by selecting reviewers. The first row shows the most important terms in the term vector of a proposal, and the remaining rows show the residual term vector after subtracting each expertise vector (with ε = 0.5). A proposal on relevance feedback for image retrieval is to be reviewed. The first reviewer selected is an expert on image retrieval. Once that contribution has been accounted for, we see that terms such as "image" have a lower term weight, reducing their impact on finding the next reviewer. The second reviewer has greater experience in image relevance judgments, and these terms are reduced in weight. The process repeats until the desired number of reviewers is found.

Table 1. A trace of the residual term vectors after assigning reviewers.

  Proposal:          image 0.031, judgments 0.028, feedback 0.027, relevance 0.026, multimodal 0.020
  After Reviewer 1:  judgments 0.028, feedback 0.023, relevance 0.022, image 0.020, multimodal 0.020
  After Reviewer 2:  feedback 0.023, image 0.020, multimodal 0.020, preference 0.016, judgments 0.015
  After Reviewer 3:  feedback 0.020, multimodal 0.019, preference 0.016, judgments 0.015, solicit 0.011

An important aspect of this algorithm is that it can easily be started from a partial solution. This turns out to be a very useful property when considering the context in which the system is used. By allowing program directors to provide a partial solution that will then guide the system towards its final solution, we allow the experts to use Revaide as a tool that assists them in completing their jobs rather than one that completely replaces their judgment.

Another benefit of SRTM is that it may be used to determine whether a proposal has reviewers with adequate expertise. When there is no reviewer with expertise on an aspect of the proposal, the value of SRTM for that proposal will be higher than for others. This might occur if the pool of reviewers is too small or if the proposal is on a topic that had not received submissions in the past. One way to find a reviewer in this case is to use the terms with the highest residual weights as a query to a specialized search engine such as Google Scholar. Figure 1 illustrates the results of Google Scholar using the three terms with the highest residual weights from Table 1. Although Google Scholar is not integrated with the entire workflow of Revaide (e.g., it doesn't identify the e-mail address and affiliation of the authors), it still provides a useful way of recommending reviewers.

[Figure 1. Using the terms with the highest residual weights as a query to Google Scholar.]
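The hill-climbing search described above amounts to a greedy loop over the candidate pool. The sketch below is our own illustrative code, reusing the residual_vector helper from the earlier sketch; it shows both the SRTM measure and the greedy selection, including the warm start from a partial solution that program directors can supply.

```python
def srtm(proposal, reviewers, eps=0.5):
    """Sum of residual term weights for a candidate set of reviewers."""
    return sum(residual_vector(proposal, reviewers, eps).values())


def select_reviewers(proposal, candidates, n, eps=0.5, partial=()):
    """Greedy hill-climbing selection: starting from an optional partial
    solution, repeatedly add the candidate that most reduces SRTM."""
    selected = list(partial)
    pool = [c for c in candidates if c not in selected]
    while len(selected) < n and pool:
        best = min(pool, key=lambda c: srtm(proposal, selected + [c], eps))
        selected.append(best)
        pool.remove(best)
    return selected
```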
So far we have described assigning reviewers and SRTM in terms of finding a set of reviewers for a single proposal. However, at NSF, reviewers typically review several proposals in a panel, and Revaide can easily be used to recommend panelists for a set of proposals. Recall that in cluster checking, Revaide creates a term vector for each panel that is the centroid of the proposals in the panel. This cluster term vector represents the terms that are most important to the proposals in the panel. To invite panelists, Revaide simply finds the panelists whose expertise best reduces the SRTM of the centroid of the panel. In this case, rather than assigning four reviewers to a proposal, 12 reviewers might be selected for a panel of 24 proposals. A lower value of ε is used when selecting reviewers for a panel; for example, a value of 0.2 will bias Revaide toward finding five reviewers with expertise in the major areas.

In reality, not everyone who is invited to review actually agrees to serve. Therefore, we typically ask 20 people with the expectation of getting a 50% yield. Once many reviewers have accepted, Revaide can be run again using the confirmed reviewers as a starting point and finding reviewers to complement their expertise.
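Panel-level invitation can be expressed with the same machinery by replacing the single proposal with the panel centroid and seeding the search with the reviewers who have already accepted. The sketch below is illustrative only: the helper names are ours, select_reviewers comes from the previous sketch, and the defaults simply mirror the example in the text (12 panelists, ε = 0.2).

```python
def panel_centroid(proposals):
    """Centroid term vector of a panel: the average weight of each term
    across the panel's proposal vectors."""
    centroid = {}
    for prop in proposals:
        for term, weight in prop.items():
            centroid[term] = centroid.get(term, 0.0) + weight / len(proposals)
    return centroid


def select_panelists(proposals, candidates, n=12, eps=0.2, confirmed=()):
    """Pick panelists whose combined expertise best covers the panel
    centroid; reviewers who have already accepted are passed in as a
    partial solution so the search only fills the remaining gaps."""
    return select_reviewers(panel_centroid(proposals), candidates, n,
                            eps=eps, partial=confirmed)
```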
3.6 Integration with NSF Systems

Over the course of three years, Revaide has transitioned from a set of utilities run remotely at UC Irvine to a prototype deployed within the CISE directorate at NSF. In the first year, Revaide was used only in parallel with existing systems and only had the ability to find the N previous proposals most similar to a new proposal. This showed the promise of the technique, but the utility was diminished by the loose coupling with NSF's systems and workflow. For example, the first version of Revaide could find the closest proposals but didn't have the meta-data to automatically associate a title, author, and the author's contact information with the proposal; these were later manually added to spreadsheets. At this point, Revaide also could not perform conflict-of-interest checking and would recommend a reviewer from the same institution as the proposal's author, a violation of NSF's policies. Furthermore, it would even recommend that an author review a proposal by that author, based on the author's prior proposal.

The inability to perform conflict-of-interest checking also led to a serendipitous finding: Revaide could be used to spot probable revisions of a prior year's proposals. A new proposal was typically much more similar to a prior version of that same proposal than to any other previous proposal. In a small number of instances, we found a proposal that was "too similar" to a previously submitted funded proposal, a clear violation of NSF policy. In other cases, we found too much similarity to a proposal currently under review at another part of NSF, another violation of NSF policy. Revaide merely alerted program directors to these possible violations. Program directors decided whether there was a probable violation, which in the case of resubmission of funded work was then investigated by NSF's Inspector General. While not emphasized in this paper, Revaide still retains these capabilities.

In the second year, Revaide was re-engineered to accept meta-data on proposals so that it could do conflict-of-interest checking and produce output that includes the names and contact information of potential reviewers. Revaide also has access to the previous summary reviews, rankings, and funding decisions on proposals, so only those whose expertise has been validated by the peer-review process were considered as potential reviewers. Revaide also helped NSF achieve its diversity goals by including some demographic data on reviewers. If a proposal or panel did not include female reviewers, reviewers from underrepresented groups, or reviewers from EPSCoR states (i.e., states that do not receive much federal research funding), additional reviewers were recommended from these groups, insuring that proposals are not just reviewed by an "old boys club" and that a diverse group of investigators has the benefit of participating in funding decisions.

At this point, Revaide was also changed from using cosine similarity for selecting reviewers to using the residual term weight approach described earlier. This was done in response to the problem of cosine similarity on interdisciplinary proposals leading to reviewers being recommended from only a single discipline. However, Revaide was still used remotely from California while the results and data were in Arlington, VA. Delays caused by the computation to convert proposals from PDF to ASCII and index them, by transferring gigabytes of data, by minor errors in the meta-data (for example, an unexpected carriage return in a proposal title resulted in an ill-formed tab-separated file), and by the difference in time zones typically resulted in a two- or three-day turnaround in running the system. Nonetheless, the system illustrated its utility by finding proposals that were obviously assigned to the wrong panel and by suggesting qualified reviewers who had been overlooked by program officers.
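The resubmission-spotting capability described earlier in this section is, at its core, a nearest-neighbor check over the same normalized term vectors. The sketch below is a hypothetical illustration of that idea, not Revaide's actual code: the paper does not state how similarity was computed for this check or what cutoff was used, so the cosine measure and the 0.85 threshold here are our assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_possible_resubmissions(new_proposal, prior_proposals, threshold=0.85):
    """Return (proposal_id, similarity) pairs for prior proposals that are
    suspiciously similar to a new submission. The threshold is illustrative;
    program directors make the final judgment on any flagged pair."""
    flagged = []
    for proposal_id, prior in prior_proposals.items():
        similarity = cosine(new_proposal, prior)
        if similarity >= threshold:
            flagged.append((proposal_id, similarity))
    return flagged
```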
In the third year, encouraged by the results of the second year, NSF purchased the appropriate computer equipment and ran Revaide in house. This enabled tighter integration with NSF's databases; for example, proposal meta-data was accepted in the exact format produced by NSF's systems rather than requiring an intermediate step of manually reformatting the data. Furthermore, processes were put into place to accurately record and maintain the data used by Revaide. This reduced the time required to get results from Revaide from a few days to a few hours. Plans are now being evaluated to have a contractor fully integrate Revaide with NSF's internal systems and build a web interface to Revaide. In the next section, we summarize the experiences in the third year of using Revaide.

4. Evaluation and Lessons Learned

In this section, we report on two experiments that empirically evaluate the utility of the residual term weight approach in assigning reviewers. We also report on the lessons we have learned in deploying Revaide in the government context.

4.1 Selecting Reviewers for Proposals

We first consider selecting reviewers independently for proposals. In particular, for each proposal submitted to the 2004 Information Technology Research program in the Division of Information and Intelligent Systems, a total of approximately 1,500, we compare finding the three closest reviewers as determined by cosine similarity to the three reviewers that best reduce SRTM. In each case, the pool of reviewers is the people who submitted proposals to the division in the prior three years. The average sum of residual term weights (with ε = 0.5) decreases from 0.636 for the three closest reviewers to 0.569 for Revaide's approach. Note that this average does not tell the entire story. For more than five percent of the proposals, perhaps the most interdisciplinary proposals, there was a difference of greater than 0.15 in the sum of residual terms, demonstrating the importance of finding a set of reviewers with complementary expertise. Of course, it may seem like a tautology to show that a system that attempts to minimize SRTM achieves a lower SRTM. However, this result also puts a number behind the intuition that similarity alone isn't sufficient for finding reviewers for interdisciplinary proposals.

4.2 Selecting Panelists

Here we consider alternative strategies for selecting panelists for two panels of proposals submitted to the 2005 Universal Access solicitation. For each panel, we compare using six randomly selected people funded in the prior year as reviewers (analogous to the common conference practice of inviting a program committee before papers are submitted, or the NIH practice of having a standing panel), the six reviewers closest to the centroid of the proposals in the panel, and the six reviewers
that best reduce the sum of residual term weights of the centroid of the panel (with ε = 0.5). Once the panelists are selected, four panelists are assigned to each proposal by Revaide using SRTM with ε = 0.5. The mean residual term weight under these conditions is shown in Table 2. It is apparent from Table 2 that both approaches that examine the proposals to select panelists have a benefit over picking panelists who are experts in the general subject area (Standing Panel). Furthermore, selecting panelists with complementary expertise (SRTM) has an advantage over selecting panelists whose expertise is most similar to the central theme of the proposals (Similarity).

  Standing Panel   Similarity   SRTM
       0.783          0.662     0.521

Table 2. Sum of residual term weights with three alternative approaches to selecting panelists.

4.3 Experiences and Lessons Learned

In the third year of Revaide's development, it was relied upon heavily in the Information and Intelligent Systems (IIS) division for the evaluation of a competition that received slightly over 1,000 proposals and several other competitions with 200-500 proposals. It was also used in competitions in the Computer and Communication Foundations Division and in a Computer and Information Science and Engineering interdisciplinary competition. In IIS, Revaide was relied upon to initially dispatch proposals to program officers, to check panels for coherence, to find panels for orphan proposals, and to recommend reviewers for most panels. Some summary results and lessons learned include:

• Revaide greatly reduced the time required to form panels. In one competition, this was essentially completed in two weeks, compared to approximately six weeks for a smaller competition that didn't use Revaide.

• Revaide increased the pool of reviewers beyond those normally called upon by program officers. While some members of the community had been called upon repeatedly, others with similar expertise had been overlooked. In many cases, people who had not reviewed before agreed to review nearly immediately when asked, while those frequently called upon were more reluctant to serve another time.

• Revaide greatly reduced the amount of time needed to find reviewers for panels. One program officer reported that it took a week rather than a month to finalize two panels.

• One program officer, after using Revaide, asked panelists to select which proposals they were most interested in reviewing. Frequently, the proposals the panelists most wanted to review were indeed the proposals that led to the reviewer's invitation.

• In one case, a program officer thought the reviewers suggested had expertise that wasn't relevant to the proposal. However, after the
program director read the proposal and not just the abstract, it was found that the proposal did indeed touch on all the topics for which the reviewers were selected.

We believe there are several factors responsible for the success of Revaide:

1. Rapid turnaround is quite important in getting the system accepted. Even a day's delay at a critical time cannot be tolerated. This implies a close integration between the existing databases and processes and the reviewer recommendation system.

2. The system was put within the existing workflow of the organization. Other alternatives explored, such as automated clustering, redefined the roles of people in the organization.

3. The system is not a black box that produces a solution but rather provides a basis for its recommendations in terms of automatically derived keywords. For example, the keywords for an AI panel were logic, reasoning, inference, planning, action, reinforcement, game, variables, agent, classifiers, planners, inhabitant, decision, graph, motifs, probabilistic, propositional, and rule. Similarly, a confusion matrix for proposal assignment convinced people that the solution was much better than chance but not omniscient.

4. Each recommendation was subject to validation and could be ignored independently of the others. Furthermore, the system was designed to supplement the capabilities of program officers and serves as "another set of eyes" to focus program officers' attention on potential improvements. This also means that imperfect technology (e.g., a classifier with 80% accuracy) can still be beneficial in an organization that has higher standards.

5. Related Work

Revaide addresses the challenge of assigning reviewers (cf. [16]). The main technical contribution of Revaide is the use of the sum of residual term weights measure in reviewer assignment. In our implementation, we used a well-established but simple document model: TF-IDF weights on words. The residual term weight approach is independent of the document model and could just as easily be used with hand-selected keywords, LSI terms (e.g., [17]), or author and topic models (e.g., [18] and [19]). We did indeed consider using LSI in Revaide but decided against it because LSI doesn't produce terms that are easily understood by people or easily used as queries for a text search engine. If we had access to only abstracts, LSI might prove particularly useful, but for longer documents such as full proposals, the benefits of LSI are less dramatic and not worth the loss of comprehensibility in this application.

Our goal with the residual term weight is to represent the terms in a proposal left uncovered by a partial set of reviewers. One approach to this problem is Maximal Marginal Relevance (MMR) [20]. MMR provides a way to re-rank the retrieved results of a query to produce a diverse set of documents. MMR is based on comparing retrieved documents to each other in order to select a diverse group. In contrast, SRTM is a more focused measure that seeks to achieve diversity in order to satisfy the goal of covering the terms in a source document.

6. Future Work

NSF is evaluating plans to more closely integrate Revaide into its data infrastructure and workflow. Revaide would then be able to directly access NSF databases rather than going through intermediate files. We plan on conducting further research on the general topic of reviewer assignment. In particular, we are exploring approaches that will balance reviewer assignments across reviewers on a panel.
We believe such an approach will need to consider the residual term weights, the number of proposals assigned to a reviewer, and the distance between a proposal and a reviewer's expertise (because, in our experience, reviewers have a strong aversion to reviewing proposals outside their expertise).

7. Conclusions

We have described Revaide, an emerging application deployed at NSF as a prototype. While much of Revaide relies upon existing technology for representing documents, Revaide makes two contributions to the practice of text mining. First, we have defined a new measure of similarity suited for insuring that expertise is found for all aspects of a proposal to be reviewed. Second, we have shown that text mining technology can be deployed to augment rather than replace human judgment.

8. ACKNOWLEDGMENTS

Thanks to all at NSF who were instrumental in the development and deployment of Revaide. Feedback from early users helped in the design of later versions. Many also helped navigate the approval process.

9. REFERENCES

[1] Willett, P. (1998). Recent Trends in Hierarchic Document Clustering: A Critical Review. Information Processing and Management, 24(5), 577-597.

[2] Larsen, B. and Aone, C. (1999). Fast and effective text mining using linear-time document clustering. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 16-22.

[3] Hopcroft, J., Khan, O., Kulis, B. & Selman, B. Tracking evolving communities in large linked networks. Proc. Natl Acad. Sci. USA, 101 (Suppl. 1), 5249-5253.

[4] Bradley, P., Bennett, P. and Demiriz, A. (2000). Constrained k-means clustering. Technical Report MSR-TR-2000-65, Microsoft Research.

[5] Banerjee, A. & Ghosh, J. (2002). Frequency Sensitive Competitive Learning for Clustering on High-dimensional
Hypersphere. International Joint Conference on Neural Networks (IJCNN), pp. 1590-95.

[6] Furnas, G.W., Landauer, T.K., Gomez, L.M., Dumais, S.T. (1987). The vocabulary problem in human-system communication. Communications of the ACM, 30, 964-971.

[7] Ding, W., and Marchionini, G. (1997). A Study on Video Browsing Strategies. Technical Report UMIACS-TR-97-40, University of Maryland, College Park, MD.

[8] Furnas, G.W., Landauer, T.K., Gomez, L.M., Dumais, S.T. (1987). The vocabulary problem in human-system communication. Communications of the ACM, 30, 964-971.

[9] van de Stadt, R. (2000). CyberChair, an Online Submission and Reviewing System or: A Program Chair's Best Friend. WWW9.

[10] Salton, G., & McGill, M.J. (1983). Introduction to Modern Information Retrieval. NY: McGraw-Hill.

[11] Porter, M.F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.

[12] Giles, C., Bollacker, K., Lawrence, S. (1998). CiteSeer: An Automatic Citation Indexing System. Third ACM Conference on Digital Libraries, pp. 89-98.

[13] Rocchio, J. (1971). Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall Inc., pp. 313-323.

[14] Segal, R. and Kephart, J. (1999). MailCat: An Intelligent Assistant for Organizing E-Mail. In Proceedings of the Third International Conference on Autonomous Agents.

[15] Basu, C., Hirsh, H., Cohen, W., and Nevill-Manning, C. (1999). Recommending Papers by Mining the Web. Proc. IJCAI Workshops on Learning About Users and Machine Learning for Information Filtering, IJCAI 99, Stockholm, Sweden.

[16] Geller, J. and Scherl, R. (1997). Challenge: Technology for Automated Reviewer Selection. IJCAI 1997, 55-61.

[17] Dumais, S., Nielsen, J. (1992). Automating the Assignment of Submitted Manuscripts to Reviewers. Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 233-244.

[18] Steyvers, M., Smyth, P., Griffiths, T. (2004). Probabilistic Author-Topic Models for Information Discovery. KDD'04, Seattle, Washington, USA.

[19] Mann, G., Mimno, D. and McCallum, A. (in press). Bibliometric Impact Measures Leveraging Topic Analysis. Joint Conference on Digital Libraries (JCDL).

[20] Carbonell, J. and Goldstein, J. (1998). The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. SIGIR'98, Melbourne, Australia.