Requirements for expertise location systems in biomedical science and the Semantic Web

Titus Schleyer¹, Heiko Spallek¹, Brian S. Butler², Sushmita Subramanian³, Daniel Weiss⁴, M. Louisa Poythress⁵, Phijarana Rattanathikum⁶, Gregory Mueller⁷

¹ School of Dental Medicine and ² Joseph M Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA
{titus, hspallek, bbutler}@pitt.edu
³ Intel Corporation, Santa Clara, CA
⁴ The MITRE Corporation, Bedford, MA
⁵ Brulant, Inc., Beachwood, OH
⁶ Adobe Systems Incorporated, San Jose, CA
⁷ DeepLocal, Inc., Pittsburgh, PA

Abstract. Recent trends in science are increasing the need for researchers to form collaborations. To date, however, electronic systems have played only a minor role in helping scientists do so. This study used a literature review, and contextual inquiries and semistructured interviews with biomedical scientists to develop a preliminary set of requirements for electronic systems designed to help optimize how biomedical researchers choose collaborators. We then reviewed the requirements in light of emerging research on expertise location using the Semantic Web. The requirements include aspects such as comprehensive, complete and up-to-date online profiles that are easy to create and maintain; the ability to exploit social networks when searching for collaborators; information to help gauge the compatibility of personalities and work styles; and recommendations for effective searching and making “non-intuitive” connections between researchers. The Semantic Web offers significant opportunities for operationalizing the requirements, for instance through aggregating profile data from disparate sources, annotating contributions to social media using methods such as Semantically Interlinked Online Communities, and concept-based querying using ontologies. Future work should validate the preliminary requirements and explore in detail how the Semantic Web can help address them.

Keywords: expertise location, requirements, Semantic Web, biomedical research

1 Introduction

Increased collaboration across all fields of biomedical science has emerged as one possible way to achieve greater success and progress in combating disease and improving health. “Team science,” “networked science” and inter/multi-disciplinary research [1] are terms used to denote collaborative approaches expected to solve research problems of ever-growing complexity. Programmatic initiatives such as the
Roadmap (NIH Roadmap for Medical Research; http://nihroadmap.nih.gov/) and the Clinical and Translational Science Award (CTSA; http://www.ctsaweb.org/) programs of the National Institutes of Health (NIH) demonstrate that funding agencies and research organizations are not just passively observing this trend, but are actively encouraging it. In the process, many academic/research institutions are extending the scale and scope of their research portfolio [2] and the numbers of their research faculty, thus making more individuals available for collaboration, either locally or remotely. As a wider range of collaborations is becoming recognized as valuable, many researchers are beginning to expand their collaborative horizons. At the same time, the Internet is making locating collaborators easier. In fact, modern communication and collaborative technologies increase the number of potential collaborators by making feasible many remote collaborations once considered impractical.

Nevertheless, expertise location has been, and continues to be, a significant challenge for many organizations [3,4]. Scientists often turn to colleagues or the published literature to find collaborators [5]. However, these approaches do not scale well in the context of an increasing pool of potential collaborators. As the universe of potential collaborators and information about them grows, the time and effort needed to evaluate each collaborative opportunity remains the same.

A newer method for finding collaborators is to use databases of researchers partially or exclusively designed for the purpose. Knowledge management systems of this type, which include “expertise locating systems,” [6] “knowledge communities,” [7] and “communities of practice,” [8] all embody, to varying degrees, the ability to find experts and, by extension, potential collaborators. The CSCW literature contains numerous examples of such systems [9-12]. Most of these systems are designed to help a person solve a specific problem at a particular point in time. However, scientists seeking collaborators face a bigger challenge. Not only are they looking for the most qualified expert, but they also plan to enter into a more or less long-term relationship. Evaluating an individual’s promise for such a relationship requires information, engagement and effort much beyond what is needed for finding an expert for singular (or even episodic) problem-solving.

Only a few reports of expertise location systems in academia have been published [11,13]. While many commercial offerings, such as the Community of Science (COS; www.cos.com), LinkedIn (www.linkedin.com), Index Copernicus Scientists (scientists.indexcopernicus.com), BiomedExperts (www.biomedexperts.com) and Research Crossroads (www.researchcrossroads.com), purport to help scientists find collaborators, no reports in the literature describe how well these systems actually do so.

The Semantic Web is a technology with significant promise to ameliorate the expertise location problem [14]. As individuals create an increasing number of “digital trails” of their work processes and products, more information about their activities and relationships becomes computationally accessible. However, expertise location systems that leverage data from the Semantic Web must be constructed with the needs and requirements of the end user in mind. We therefore have organized this paper in two parts. We first present a set of preliminary requirements for expertise location
systems for biomedical scientists. Second, we discuss the requirements in light of technological capabilities and challenges of the Semantic Web.

2 Methods

This study drew on several methodological approaches in order to develop a rich understanding of how scientific collaborations are established and what requirements should inform the design of expertise location systems. The methods we used included (1) affinity diagramming of issues in scientific collaboration; (2) a literature review of expertise location in computer-supported cooperative work and other disciplines; (3) contextual inquiries with 10 biomedical scientists; and (4) findings from 30 semistructured interviews with biomedical scientists from a variety of disciplines.

To develop the affinity diagram, the members of the project team (which consisted of all authors) recorded thoughts, ideas and observations regarding the establishment of scientific collaborations and then took turns arranging them into naturally-forming categories. The team then rearranged the groups to form a hierarchy that revealed the major issues of the domain. The most prominent groups were then adopted as the foci of exploratory investigations, specifically the literature search and contextual inquiries.

We searched the literature using keywords including “expertise locating systems,” “expertise location systems,” “expertise management systems,” “knowledge communities,” “knowledge management” and “knowledge management systems,” “communities of practice,” and “virtual communities” in the field of biomedical research, informatics, computer science and information science. The databases we searched were MEDLINE, the ISI Web of Science, the ACM Portal and the IEEE Digital Library (all available years).

Contextual inquiry (CI) [15] sessions were performed with ten researchers from a range of disciplines and levels of seniority at Carnegie Mellon University and the University of Pittsburgh.
Because we could not directly observe researchers in the process of forming collaborations, we mainly focused on retrospective accounts.

The contextual inquiries were complemented by findings from 30 semistructured interviews with scientists. The interviews focused on current and previous collaborations, locating collaborators, solving problems in research, and information needs and information resource use of participants. Four faculty researchers (including three authors: TKS, HS, BB) and one staff member conducted the interviews individually with a convenience sample of scientists from the six Health Science Schools at the University of Pittsburgh.

While conducting our background studies, we formulated a running list of requirements for systems that help optimize how scientists choose collaborators. We generated this list using an approach similar to grounded theory [16], in which models and hypotheses are progressively inferred from the data. We kept a record of the evidence that supported each requirement, e.g., statements of our study participants or findings from the literature, as well as of factors that would modify its validity or applicability. The studies conducted as part of this project were approved by the University of Pittsburgh Institutional Review Board (IRB approval numbers: 0612065 and PRO07050299).

Once the list of requirements was final, we reviewed the literature about the Semantic Web with a particular focus on expertise location. We used this literature to inform the discussion of the capabilities and challenges of the Semantic Web in light of the requirements we formulated.

3 Results

3.1 Preliminary requirements for expertise location systems in biomedical science

The following 10 requirements for expertise location systems have been ordered loosely in an attempt to group related items.

(1) The effort required to create and update an online profile should be commensurate with the perceived benefit of the system.

Many current online networking systems for scientists, such as the COS, require a significant amount of effort to create and maintain a comprehensive profile. Many scientists considered this investment of time and effort difficult to justify as there is no clear gain to being part of the system. Only a few researchers we interviewed, specifically junior ones or those new to the organization, indicated that COS and/or the Faculty Research Interests Project (FRIP) at the University of Pittsburgh [11] helped them find collaborators. Several commented that they had tried to use COS and/or FRIP, but abandoned them when their attempt at finding a collaborator through them was not successful.

(2) Online profiles should present rich and comprehensive information about potential collaborators in an organized manner to reduce the effort involved in making collaboration decisions.

The Internet makes a significant amount of information available about individual scientists, but unfortunately in a very fragmented and inhomogeneous manner. Our background research showed that at present, researchers sometimes use multiple information sources such as MEDLINE, Google Scholar, the ISI Web of Science and other databases to evaluate a potential collaborator.
Retrieving, collating and reviewing information from these sources, however, often takes more time and effort than the individual is willing to expend. An expertise location system should collate and organize this information and present it to collaboration seekers in an easy-to-use format in order to reduce the effort involved in choosing collaborators.

(3) Online profiles should be up-to-date, because some information they contain has a short lifespan.

At its core, choosing a collaborator is an attempt to predict how someone else will behave in the future. While knowledge about past behavior can be useful for doing so,
the value of this information declines with time. Out-of-date profiles reduce the usefulness of information that collaboration seekers require. On the other hand, not all information in a profile is subject to the same rate of decay. Information about professional degrees of a collaborator tends to be relatively static, while publication topics and activity may not always reflect an individual’s current research focus and productivity.

(4) Researchers should be able to exploit their own and others’ social networks when searching for collaborators.

Social networks have been suggested as important structures for finding expertise and information [17]. Established researchers often use existing connections with colleagues as their primary resource for locating new collaborators. Junior researchers, with few or no contacts within the desired field, may have significant difficulty initiating collaborations that way. Many scientists in our study indicated they are more likely to contact a colleague whom they think will know someone with the required expertise than cold-call a stranger. In addition, many emphasized the key role that deans, department heads and other well-connected individuals in the organization play in helping establish collaborations. The advantages of a mediated form of contact are that it may make it more likely that two parties will be compatible, increase the chances of a timely response, and provide a less intimidating method of contact.

(5) The system should model proximity, which influences the potential success of collaboration in several respects.

Physical proximity, social proximity, organizational proximity, and proximity in terms of shared research interests are all aspects of “proximity” that can affect the outcome of collaborations. Physical proximity provides access to potential collaborators, and allows the collaboration seeker to make informal and unobtrusive assessments about compatibility.
In the absence of physical proximity, shared research interests and/or common organizational or research communities can serve as surrogates. (6) The system should facilitate the assessment of personal compatibility, similarity of work styles and other “soft” traits influencing collaborations. Our background research indicated that personal compatibility and similar work style are important factors determining the success of collaborations. The literature also indicates that more than a simple overlap of interests is needed to create a successful collaboration. Expertise location systems should therefore facilitate an assessment of these factors, for instance, by identifying social connections. (7) Social networks solely based on co-authorship may only partially describe a researcher’s collaborative network. Previous attempts to automatically describe a researcher’s collaborative network based on co-authorship of papers were only partially successful [18,19]. Although coauthorship seems to be a good starting point for describing a collaboration network, it should be supplemented and validated by other data. Ideally, expertise location sys-
tems could create a preliminary network from co-authorship data that can be triangu 2The system should account for researchers'preferences regarding privacy and public availability of information about them. To varying degrees, researchers tend to be protective of information about themselves or the projects they are working on. On the other hand, researchers are motivated to share information when they feel that doing so will add value to their work. As re- search on the structure and dynamics of networks has shown [20], central nodes in a etwork attract more links than peripheral nodes. By inference, highly productive sci entists may be the focus of a disproportionately large number of contacts in profes sional networks. This type of social overload may cause them not to be favorable to additional contacts. Expertise location systems should therefore allow users to control whether they are visible at all, and, if so, which information is available about them under which circumstances () The system should provide methods to search effectively across disciplines. esearchers need to be able to effectively search for collaborators in domains outside eir own. However, researchers from one domain are unlikely to be aware of the terminology they need to search for in order to find a specific area of expertise. Stan dardized terminologies, such as Medical Subject Headings(MeSH), facilitate search ing, but create artificial boundaries(for instance between MeSH- and non-MeSH- dexed literatures). Systems that guide non-experts towards the appropriate subd ain and research category rather than making them provide keywords themselves (10) The system should help make "non-intuitive" connections between researchers Many scientific collaborations produce novel and innovative insights when the re search interests of collaborators are complementary, or, at least, not closely aligned. 
However, similarity and complementarity of research interests are difficult to defin Multidisciplinary research is often viewed as requiring complementary expertise from different fields; he even research teams within the same field are often config- ured to include investigators with slightly divergent interests. Many existing systems and resources focus on finding individuals with shared interests, which is much easier and straightforward than identifying those with complementary interests. One exam- ple for producing such connections computationally are systems that mine the litera- ture for relations among research areas that are not obvious at first glance [21, 22] Advanced implementations of expertise location systems to support collaboration seekers could integrate such functionality 3.2 The Semantic Web as a technical basis for expertise location systems Few papers have discussed the problem of expertise location in the context of the Se- mantic Web [14, 23-26]. However, Semantic Web technologies represent a rich array
tems could create a preliminary network from co-authorship data that can be triangulated and validated using other information.

(8) The system should account for researchers’ preferences regarding privacy and public availability of information about them. To varying degrees, researchers tend to be protective of information about themselves or the projects they are working on. On the other hand, researchers are motivated to share information when they feel that doing so will add value to their work. As research on the structure and dynamics of networks has shown [20], central nodes in a network attract more links than peripheral nodes. By inference, highly productive scientists may be the focus of a disproportionately large number of contacts in professional networks. This type of social overload may make them less receptive to additional contacts. Expertise location systems should therefore allow users to control whether they are visible at all, and, if so, which information is available about them under which circumstances.

(9) The system should provide methods to search effectively across disciplines. Researchers need to be able to search effectively for collaborators in domains outside their own. However, researchers from one domain are unlikely to be aware of the terminology they need to search for in order to find a specific area of expertise. Standardized terminologies, such as Medical Subject Headings (MeSH), facilitate searching, but create artificial boundaries (for instance, between MeSH- and non-MeSH-indexed literatures). Systems that guide non-experts towards the appropriate subdomain and research category rather than making them provide keywords themselves may help ameliorate this problem.

(10) The system should help make “non-intuitive” connections between researchers. Many scientific collaborations produce novel and innovative insights when the research interests of collaborators are complementary, or, at least, not closely aligned. However, similarity and complementarity of research interests are difficult to define. Multidisciplinary research is often viewed as requiring complementary expertise from different fields; however, even research teams within the same field are often configured to include investigators with slightly divergent interests. Many existing systems and resources focus on finding individuals with shared interests, which is much easier and more straightforward than identifying those with complementary interests. One way of producing such connections computationally is through systems that mine the literature for relations among research areas that are not obvious at first glance [21,22]. Advanced implementations of expertise location systems to support collaboration seekers could integrate such functionality.

3.2 The Semantic Web as a technical basis for expertise location systems

Few papers have discussed the problem of expertise location in the context of the Semantic Web [14,23-26]. However, Semantic Web technologies represent a rich array
of possibilities addressing many, but not all, of the requirements listed above. The Semantic Web is most likely to serve as a useful technological infrastructure for implementing expertise location systems, not as an end-to-end architecture.

Traditionally, a significant hurdle for adoption and use of expertise location systems has been the effort required to create and maintain comprehensive and up-to-date profiles. The Semantic Web can help ameliorate this problem by making information available that accumulates as a result of an individual's “digital activities.” For instance, the Semantic Web makes it very easy to collate information from social networks and social media, such as Friend-of-a-friend (FOAF) systems, online communities, blogs and information-sharing sites [14]. The resulting profile could include topics that the individual has discussed with others or individuals s/he has interacted with. However, this information is not likely to substitute for more formal and rigorously maintained information, such as that found in a researcher’s curriculum vitae (CV) [27]. Researchers in expertise location systems must clearly be motivated to keep their profiles current and comprehensive, regardless of the method used to generate the data.

A related issue is the aggregation of data from sources other than the Web, for instance Collaborative Work Environments (CWEs). While CWEs tend to connect individuals within organizations quite well, they fail to do so among organizations. Information from CWEs made accessible through a framework such as Semantically Interlinked Online Communities (SIOC) [14] could contribute rich information to researcher profiles. The Semantic Web also presents the opportunity to connect information created by an individual with information generated about the individual by others. MEDLINE, Google Scholar, the ISI Web of Science and other databases are examples of resources that contain information about researchers. One significant challenge is to match information from different sources unambiguously to the individual. Ideally, the various online identities and unique identifiers of an individual are explicitly linked, as described by Bojars [14].

Automatically collating information using these strategies is likely to result in profiles that are more comprehensive and up-to-date than those compiled using other means. For instance, contributions to social media can be aggregated in near real-time and combined with information that may not be widely available in public for some time (for instance, a recently accepted paper listed in a CV). Social networks constructed from FOAF systems and online interactions may be more complete than or complementary to those based on co-authorship.

Expertise location systems need to be able to search across content domains as well as social spaces. Searching effectively across content domains requires ontologies, which are central to the vision of the Semantic Web [26]. While well-developed and sophisticated ontologies exist for some domains, for instance, the Medical Subject Headings used to index the biomedical literature, they are not universally available. Semantic mapping among different ontologies is a significant problem on which considerable attention has been focused [28-30]. Computational tools to bridge queries among different ontologies have been described [23,28], but at present, no large-scale trials examining how well the approach works in practice (similar, for instance, to the National Library of Medicine’s Large Scale Vocabulary Test [31]) have been published.
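As a rough illustration of what bridging queries across vocabularies could involve, the following minimal sketch expands a search term to all labels that a mapping table assigns to the same concept before matching it against researcher profiles. It is not taken from any of the cited tools: the concept table, vocabulary names, profile identifiers and function names are all hypothetical, and a real system would draw its mappings from curated ontologies rather than a hand-written dictionary.

```python
# Toy concept table: each concept id groups labels from different vocabularies.
# All ids, vocabulary names, and labels are illustrative assumptions.
CONCEPTS = {
    "c1": {("MeSH", "Periodontitis"), ("lay", "gum disease")},
    "c2": {("MeSH", "Biofilms"), ("engineering", "microbial films")},
}

# Hypothetical researcher profiles, each indexed with labels from one vocabulary.
PROFILES = {
    "researcher_a": {"Periodontitis", "Biofilms"},
    "researcher_b": {"microbial films"},
    "researcher_c": {"Health informatics"},
}


def build_label_index(concepts):
    """Map each lower-cased label to the ids of the concepts it denotes."""
    index = {}
    for concept_id, labels in concepts.items():
        for _vocabulary, label in labels:
            index.setdefault(label.lower(), set()).add(concept_id)
    return index


def expand_query(term, concepts, index):
    """Return the term plus every label mapped to the same concept(s)."""
    expanded = {term.lower()}
    for concept_id in index.get(term.lower(), ()):
        expanded.update(label.lower() for _vocabulary, label in concepts[concept_id])
    return expanded


def find_experts(term, profiles, concepts):
    """Return sorted profile ids whose keywords match the expanded query."""
    index = build_label_index(concepts)
    wanted = expand_query(term, concepts, index)
    return sorted(
        person
        for person, keywords in profiles.items()
        if wanted & {k.lower() for k in keywords}
    )
```

With this toy data, a query for "gum disease" reaches the profile indexed under the MeSH heading "Periodontitis", which a plain keyword match would miss. The sketch treats mappings as exact equivalences; partial or hierarchical mappings, which are the harder part of the problem discussed above, are outside its scope.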
Searching across social spaces suffers from a similar problem if individual identities cannot be matched between systems. Frameworks such as OpenSocial [32] are essential to allowing users to traverse social networks without regard to system boundaries.

Building expertise location systems on top of the Semantic Web requires not only the capability to aggregate and organize data about each expert, but also the capability to present the data in a usable and useful form to collaboration seekers. The experts listed by the system must be able to view and, if necessary, change how they appear to users of the system. This includes taking individual needs for controlled access to profile information into account. For instance, researchers may prefer to limit public, anonymous access to information about them, but be more open within their social network. In addition, systems should facilitate rapid, progressively detailed review of potential collaborators. Given that choosing a collaborator is a highly subjective and idiosyncratic process, system performance may be weighted to provide a larger number of potential candidates, rather than attempting to present only a few candidates that the system perceives as “optimal.” This tradeoff between sensitivity and specificity could be adjusted as the system learns about the preferences of its users.

In summary, the Semantic Web presents many opportunities for helping implement expertise location systems. However, the Semantic Web does not exist independent of the computational tools, environment, workflow and user behavior of biomedical scientists, and thus must integrate with the current context of system use, not strive to replace it.

4 Discussion

Given the increasing trend towards collaboration in science, as well as the expanding universe of potential collaborators for scientists, electronic systems can be expected to play an increasingly important role in connecting scientists to one another. While traditional approaches will always play a role in how scientists connect with and select collaborators, expertise location systems have the potential to improve how effectively and efficiently scientists form collaborations. At their lowest level of implementation, they can reduce the workload of simple tasks related to forming collaborations, for instance collecting and organizing information about a potential collaborator. More advanced functionality would allow collaboration seekers to use information not usually available to them, for instance how potential collaborators relate to the seeker’s existing social network. Further developments could integrate computational approaches to identifying scientific opportunities, as Swanson has demonstrated [22].

Our research has shown that expertise location systems for establishing collaborations in biomedical science have a complex and multifaceted set of requirements. Clearly, one challenge for designing these systems is that seeking, evaluating and choosing scientific collaborators is a complex decision-making process that is poorly understood. Our study only presents a first step in understanding how to build systems that are truly useful tools for establishing promising and high-impact collaborations. The list of requirements we formulated is clearly preliminary, and should be validated
with a larger number of participants, at other institutions/research settings and in other geographic locations. A competitive analysis of existing systems might have provided additional and useful formal data to this study. However, the rapidly moving market for such systems would have reduced the usefulness of such an evaluation beyond a very limited time frame.

A related question is how well the requirements, which are mainly based on findings from biomedical disciplines, generalize to other scientific domains. While we drew on literature that included studies from a variety of scientific disciplines, our observations and interviews were conducted predominantly with biomedical scientists. Therefore, claims of generalizability are difficult to make, especially given the specific history, culture and structure of the biomedical research enterprise in the US. For instance, federal funding agencies, such as the NIH, play a very prominent role in shaping researcher behavior and priorities. (The current trend towards multidisciplinary research is an example.) In addition, non-research-oriented organizations, such as for-profit hospital systems, function both as data providers and employers of some researchers. This circumstance can influence collaborative behavior, for instance when the organization attempts to preserve its competitive advantage through policies limiting collaboration. Clearly, the history and tradition of collaborative work in a discipline can influence individual behavior. As a recent book suggests [33], some research areas, such as high-energy physics and astronomy, have a much stronger tradition of collaboration and data sharing than other fields. While the requirements articulated in this paper may be seen as a viable starting point, additional work is needed to understand the degree to which they can be generalized.

Additional studies, both in biomedical science and in other fields, should also be helpful in elucidating some of the implicit contradictions in the current list of requirements. For instance, the desire for privacy of selected information (Requirement 8) conflicts, to some degree, with the need to provide comprehensive information (Requirement 2) and the desire to search effectively across disciplines (Requirement 9). The trade-offs among the requirements are likely context-dependent, and further research should provide insight into situations and use cases where and how particular trade-offs should be made.

As shown above, Semantic Web technologies have significant potential for addressing the requirements for expertise location systems. Integrating information from disparate and inhomogeneous sources using ontologies and annotation frameworks is key to creating the rich and comprehensive profiles that are the basis for making connections among researchers. Several challenges present themselves for future research in this context. First, we need to understand in more depth how scientists seek, evaluate and choose collaborators. Such research should include, for instance, factors that motivate and prompt scientists to look for collaborators; the criteria they use to evaluate them; and circumstances influencing the adoption of new tools to support the formation of collaboration. Second, we need to begin the process of translating system requirements into Semantic Web applications. Early efforts in this area have been encouraging [13,23]. However, we need to ensure that these applications work in a generalizable manner, and do not result in insular applications that are difficult to apply in other contexts. Third, we need to begin to consider measurements for system performance of expertise location systems. Analogously to benchmarking systems in information retrieval, we need to define performance criteria and system outcomes.
What constitutes a “relevant hit” in an expertise location system? How does relevance vary based on different user characteristics? What role do semantic technologies play in achieving and assessing system outcomes? As we address these research questions, expertise location systems have the potential to become increasingly important in enhancing and strengthening scientific collaboration.

Acknowledgments

This project was, in part, supported by grant UL1 RR024153 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH) and NIH Roadmap for Medical Research. We appreciate Ellen Detlefsen’s and Erin Nordenberg’s help with interviewing scientists, Janice Stankowicz’s help with managing the interview process, and Michael Dziabiak’s help with formatting and submission. Special thanks go to the reviewers for their constructive comments, which helped improve the paper significantly. Daniel Weiss’s affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.

References

1. Braun, T., Schubert, A.: A Quantitative View on the Coming of Age of Interdisciplinarity in the Sciences 1980-1999. Scientometrics. 58, 183-189 (2003)
2. Moses, H. III, Dorsey, E.R., Matheson, D.H., Thier, S.O.: Financial Anatomy of Biomedical Research. JAMA. 294, 1333-1342 (2005)
3. O'Dell, C., Grayson Jr., C.J.: If Only We Knew What We Know: The Transfer of Internal Knowledge and Best Practice. Free Press, New York (1998)
4. Stenmark, D.: Leveraging Tacit Organizational Knowledge. J. Manage. Inform. Syst. 17, 9-24 (2000)
5. Kraut, R.E., Galegher, J., Egido, C.: Relationships and Tasks in Scientific Research Collaboration. Hum-Comput. Interact. 3, 31-58 (1987-1988)
6. McDonald, D.W., Ackerman, M.S.: Expertise Recommender: A Flexible Recommendation System and Architecture. In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pp. 231-240. ACM, New York (2000)
7. Erickson, T., Kellogg, W.A.: Knowledge Communities: Online Environments for Supporting Knowledge Management and Its Social Context. In: Ackerman, M.S., Pipek, V., Wulf, V. (eds.) Sharing Expertise: Beyond Knowledge Management, pp. 299-325. MIT Press, Cambridge (2003)
8. Millen, D.R., Fontaine, M.A., Muller, M.J.: Understanding the Benefit and Costs of Communities of Practice. Commun. ACM. 45, 69-73 (2002)
9. Ackerman, M.S., Palen, L.: The Zephyr Help Instance: Promoting Ongoing Activity in a CSCW System. In: Tauber, M.J. (ed.) Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Common Ground, pp. 268-275. ACM, New York (1996)
10. Jacovi, M., Soroka, V., Ur, S.: Why Do We ReachOut?: Functions of a Semi-Persistent Peer Support Tool. In: Proceedings of the 2003 International ACM SIGGROUP Conference on Supporting Group Work, pp. 161-169. ACM, New York (2003)
11. Friedman, P.W., Winnick, B.L., Friedman, C.P., Mickelson, P.C.: Development of a MeSH-based Index of Faculty Research Interests. Proc. AMIA. Symp., 265-269 (2000)