matching techniques can be used on these bags-of-concepts (BOC). However, these approaches fail to determine the right matches when there is no direct overlap or intersection in the concepts. For example, do two users with Yahoo and Google in their respective profiles have nothing in common? There does seem to be an intersection in these users' interests: Web-based IT companies, or web search tools. Such overlaps are missed because current approaches assume that the profile representations (BOW) contain all the information about the user. As a result, relationships that are not explicit in the representations are usually ignored. Furthermore, these mechanisms cannot handle user profiles that are at different levels of granularity or abstraction (e.g., jazz and music), because the implicit relationship between the concepts is ignored.

In this paper, we address the above issues in user profile matching through effective use of ontologies. We define the notion of semantic similarity between two user profiles to account for inherent relationships between the concepts/words appearing in their respective BOW representations. We use the process of spreading to add related terms to a user profile by referring to an ontology (WordNet or Wikipedia), and we experiment with multiple techniques to enable better profile matching. We propose simple metrics for computing similarity between two user profiles with ontology-based Spreading Activation Networks (SAN). We evaluate multiple mechanisms for extending user profiles (set-based and graph-based spreading) and for semantic matching of profiles (set intersection and bipartite graphs). We show the effectiveness of our user profile matching techniques for accuracy in expert ranking as well as candidate selection. From a given set of user profiles, our bipartite-graph-based algorithms can accurately place an expert within the top three ranks.
In applications where a group of candidate users needs to be found (for example, for a job interview), we also obtain very good precision and recall.

The rest of this document is organized as follows. We describe related research efforts on profile building and ontology-based semantic matching techniques in Section 2, followed by a brief section giving the background and definitions needed to understand our solution. An overview of our spreading process is presented in Section 4. We present our new similarity measures in Section 5. We describe our evaluation procedure for expert finding in Section 6 and share our improved results. We summarize our contributions and state possible future work in Section 7.

2 Related Work

Determining interest profiles of users based on their personal documents is an important research topic in information extraction, and a number of techniques to achieve this have been proposed. Expert finding techniques that combine multiple sources of expertise evidence, such as academic papers and social citation networks, have also been proposed [1]. User profiles have been extracted using multiple types of corpora: utilizing knowledge about the expert in Wikipedia [2], analysing the expert's documents [3–5], and analysing openly accessible research contributions of the expert [6]. The Wikipedia corpus has also been used to generate semantic user profiles [7]. Pre-processing the profile terms by mapping them to such ontology concepts prior to computing cosine similarity has been shown to yield better matching [3]. A number of traditional similarity measurement techniques, such as the cosine similarity measure or term vector similarity [8, 9], Dice's coefficient [10] and Jaccard's index [11], are used in profile matching.
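
As a minimal sketch of the traditional measures named above, the following Python computes Jaccard's index, Dice's coefficient and cosine similarity over bag-of-words profiles. The function names and the toy profiles are illustrative assumptions and are not taken from the cited works or from our data. The cosine variant here uses raw term counts, which is only one of several possible weightings.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard's index: |A ∩ B| / |A ∪ B| over the sets of profile terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice's coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity between raw term-count vectors of two profiles."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy profiles, chosen only to illustrate the measures.
user_a = ["yahoo", "jazz", "ontology"]
user_b = ["google", "music", "ontology"]  # shares one literal term with user_a
user_c = ["yahoo", "jazz"]
user_d = ["google", "music"]              # shares no literal term with user_c

for name, fn in (("jaccard", jaccard), ("dice", dice), ("cosine", cosine)):
    print(name, fn(user_a, user_b), fn(user_c, user_d))
```

All three measures depend only on terms that literally occur in both profiles, so the second pair scores zero on each of them even though Yahoo/Google and jazz/music are clearly related.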
For example, Jaccard's index is used in [2] to match expert profiles constructed using Wikipedia knowledge. This approach will not find a semantic, inexact match when there is no direct overlap between the concepts in the two user profiles. Use of knowledge obtained from