JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 2, NO. 4, NOVEMBER 2010. doi:10.4304/jetwi.2.4.272-281. © 2010 ACADEMY PUBLISHER

A Hybrid Recommender System Guided by Semantic User Profiles for Search in the E-learning Domain

Leyla Zhuhadar and Olfa Nasraoui
Knowledge Discovery and Web Mining Lab, Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
Emails: leyla.zhuhadar@wku.edu, olfa.nasraoui@louisville.edu

Abstract—Various concepts, methods, and technical architectures of recommender systems have been integrated into E-commerce storefronts, such as Amazon.com, Netflix, etc. As a result, Web users have recently become more familiar with the notion of recommendations. Nevertheless, little work has been done to integrate recommender systems into scientific information retrieval repositories, such as libraries, content management systems, online learning platforms, etc. This paper presents an implementation of a hybrid recommender system to personalize the user's experience on a real online learning repository and vertical search engine named HyperManyMedia. This repository contains educational content of courses, lectures, multimedia resources, etc. The main objective of this paper is to illustrate the methods, concepts, and architecture that we used to integrate a hybrid recommender system into the HyperManyMedia repository. This recommender system is driven by two types of recommendations: content-based (domain ontology model) and rule-based (learner's interest-based and cluster-based). Finally, combining the content-based and the rule-based models provides the user with hybrid recommendations that influence the ranking of the retrieved documents with different weights. Our experiments were carried out on the HyperManyMedia semantic search engine at Western Kentucky University. We used Top-n-Recall and Top-n-Precision to measure the effectiveness of re-ranking based on the learner's semantic profile. Overall, the results demonstrate the effectiveness of the re-ranking based on personalization.

Index Terms—recommender system, search engine, clustering, personalization, semantic profile

I. INTRODUCTION

The work presented in this paper describes a hybrid recommendation-based retrieval model that can filter information based on user needs. We believe that the methodology for designing an efficient recommender system, regardless of the approach used, i.e., content-based, collaborative, or hybrid, is to incorporate the following essential elements: contextual information, user interaction with the system, flexibility of receiving recommendations in a less intrusive manner, detecting the user's change of interest and responding accordingly, supporting user feedback, and, finally, the simplicity of the user interface. We noticed, by tracking user behavior in our deployed personalized vertical search engine, HyperManyMedia, that using general recommendation methods was not sufficient to make users interested in using the recommendations provided by the system. However, if the recommender system was tailored to the user's specific needs via personalization, the user became more interested and engaged in the recommendation process. This finding led us to make personalization central: we considered personalization the main building block of the recommender system architecture. This conclusion is noticeable in most of the recommender systems that succeeded.
Their success was a result not of the complexity of the theoretical methodology used to design the system, but rather of the usability and simplicity of the recommender system interface, which guides the user without interrupting his/her activities. In this paper, we present an implementation of a hybrid recommender system on a search engine frontend to a real online learning repository named HyperManyMedia. This repository contains educational content of courses, lectures, multimedia resources, etc. The main objective of this paper is to illustrate the methods, concepts, and architecture that we used to integrate the recommender system into the HyperManyMedia repository. This recommender system is driven by two types of recommendations: content-based (domain ontology model) and rule-based (learner's interest-based and cluster-based). The domain ontology model, which is used to represent the learning materials, is composed of a hierarchy of concepts and subconcepts that represent colleges, courses, and lectures; whereas the learner's ontology model represents a subset of the domain ontology (an ontology that contains only a personalized, pruned subset of the whole domain, consisting only of the colleges/courses/lectures that the learner is interested in). Finally, combining the content-based and rule-based recommendations provides the user with hybrid recommendations that influence the ranking of the retrieved documents via different weights. However, before describing the design of our system, we first
present a comprehensive background on the origin of recommender systems and other related work in Section II. Then, we present the various methodologies that we used in Section III, followed by a detailed description of our implementation and the evaluation results in Section IV. Finally, we draw our conclusions.

II. PREVIOUS WORK

The scope of the literature review in this paper concerns recommender systems in academic repositories. More specifically, we are interested in answering the following question: what is the current state of the art and the next generation of recommender systems in academic repositories, and do scientific portals, digital libraries, and e-book repositories consider the value of embedding recommender systems into their systems? To answer this question, we reviewed the most popular scientific digital libraries. In addition, we investigated some of the promising Web 2.0 digital libraries that use recommender systems.

We believe that digital libraries are among the main services that can benefit from the usage of recommender systems. In particular, when we compare the usability of search engines with that of digital libraries, we notice that the design of search engines has changed dramatically over the last decade. Web users can easily search for resources using search engines. This flexibility is provided by the simplicity of the search engines' user interface. However, digital libraries did not adapt to those changes. The complexity of combining Boolean operators with metadata fields to retrieve resources from databases is considered a tedious process, especially for the new generation of Web users, who are not used to spending a long time searching for resources. For example, many Web users now prefer using Google Scholar to search for journal articles, research papers, and e-books, rather than digital libraries such as ACM, IEEE Xplore, or CiteSeer, despite Google Scholar's limited ability to provide the user with complete access to the resource (unless the user has already configured his/her digital libraries' access in Google Scholar's advanced features). Thus, it appears that the simplicity of the Google Scholar interface outweighs the accuracy that major digital libraries provide. However, ACM, IEEE Xplore, and CiteSeer have incorporated some techniques that could be considered a form of recommendation (with little success). For example, the ACM Portal provides two types of recommendations: (1) a content-based research tool known as "find similar articles"; the mechanism used to find similar papers involves three techniques: cluster analysis, dictionaries, and thesauri. The retrieved documents are ranked based on date, publisher, or relevance, but there is no reference to the type of measure used in the ACM Portal, as cited in [14]. (2) Behavior-based recommendations presented as "Peer-to-Peer: readers of this article also read". According to [14], this recommendation is built using simple frequency counts and therefore fails to provide accurate recommendations. According to [2], IEEE Xplore announced the implementation of content-based recommendations on their portal. Nevertheless, to date, no such recommender system is embedded into the IEEE Xplore libraries. However, CiteSeer (http://citeseer.ist.psu.edu) showed a promising avenue for the usage of recommender systems.
The first prototype provided the users with three different types of recommendations: (1) link structure-based recommendations: these are based on link citations and can be distinguished into four types (recommending documents that are cited inside the searched document, recommending documents that cite the document, co-citation, and the active bibliography); (2) content-based recommendations using (TF-IDF) similarity metrics; and (3) explicit recommendations, where the user can rank the retrieved documents on a scale of 1 to 5 and, in addition, write a review or a comment about the paper. However, the progress of this portal apparently stopped in 2006. The success of Google Scholar is evident even though it provides limited recommendations, e.g., finding similar documents based on content, and the ranking of those documents may be inherited from Google's page ranking algorithm. Another limitation of Google Scholar is that it does not retrieve documents that are cited inside a specific document, but rather only the documents that cite this specific document. As we noticed, a variety of recommender system portals have been implemented in the domain of digital libraries and scientific repositories, some of which succeeded while others failed to survive. In the following paragraphs, we discuss two significant implementations of scientific recommender systems. The first is the Melvyl recommender system, which has been implemented by the California digital library (http://www.dlib.org/Architext/AT-dlib2query.html). This system uses a simple technique to provide recommendations to users. First, it generates a graph of all the purchased documents in the library; each document is then considered as a weighted node (with the weight representing the number of purchases). Therefore, the recommendation for a given document is based on the neighboring nodes (documents), which are sorted according to their edge weights. The second is TechLens (http://techlens.cs.umn.edu/tl3), which is specialized for the domain of scientific papers; it uses hybrid recommendations combining a collaborative filtering and a content-based approach. The system uses graph theory, where each research paper is considered as a node and the citations inside each paper are considered as recommended nodes. Also, the system uses a more complex collaborative filtering (CF) technique that considers each cited paper as an input, therefore also considering all citation papers as recommendations. This technique is referred to as Dense CF. Finally, the system applies a content-based recommendation technique (TF-IDF) on the list of all
recommended papers. Thus, the most similar papers are recommended to the user. The system provides two options: (1) pure content-based CF, where the similarity measure is based only on two entities, the title of the paper and the abstract; and (2) content-based Separated-CF, where the whole text of the papers is considered. The final recommendations provided to the user are a list of sorted recommendations that combine multiple factors, based on the option that the user chose. Recently, with the increased popularity of social tagging systems, portals such as CiteULike and BibSonomy are considered promising projects that use social bookmarking to derive recommendations. [1], [5], [6], [4] used a different approach, recommending documents based on user profiles, in this case learned from implicit feedback or past click history. Other ways to form a user model include using data mining, such as mining association rules [11], or partitioning a set of user sessions into clusters or groups of similar sessions. The latter groups are called session clusters [12], [10], or user profiles [12], [10]. More recently, [13] presented a Semantic Web usage mining methodology for mining evolving user profiles on dynamic Websites, by clustering the user sessions in each period and relating the user profiles of one period with those discovered in previous periods to detect profile evolution, and also to understand what type of profile evolutions have occurred. This latter branch of using data mining techniques to discover user models from Web usage data is referred to as Web Usage Mining. A previous work that used Web mining for developing smart E-learning systems [16] integrated Web usage mining, where patterns were automatically discovered from users' actions and then fed into a recommender system that could assist learners in their online learning activities by suggesting actions or resources to a user. Another type of data mining in E-learning was performed on documents rather than on the students' actions. This type of data mining is more akin to text mining (i.e., knowledge discovery from text data) than to Web usage mining [3]. This approach helps alleviate some of the problems in E-learning that are due to the volume of data, which can be overwhelming for a learner. It works by organizing the articles and documents based on their topics and also providing summaries for documents. [7] combines Web usage mining with text-based indexing and search in the content to provide hybrid recommendations. [8] uses a learning algorithm to select sequential articles based on context and user-click feedback in order to recommend news articles to users. Our approach shares some similarity with the above techniques. It is a hybrid recommender system which combines Content-based recommendations with two types of Rule-based recommendations. In Section III, we explain our methodology, followed by the implementation section. Finally, we present our evaluations and conclude with our key findings.

III. METHODOLOGY

The methodology for designing our hybrid recommender system is divided into two parts. (1) The first part centers around designing the domain ontology. First, we relied on a fine-grained taxonomy that encapsulates the whole domain of Education in general, and E-learning in particular, by borrowing an already-made taxonomy from WordNet. This attempt ended in great disappointment, since the terminology used in WordNet is far wider than and different from what the HyperManyMedia domain contains.
As a result, two major problems occurred: the overloading of the fine-grained taxonomy during the searching process, and ambiguity. Therefore, a decision was made to create a hand-made ontology using a coarse-grained taxonomy. In Section IV, we describe in detail the design of the domain ontology. (2) The second part centers around designing the learner's ontology: each learner has his or her own ontology based on his/her preferences. The learner's ontology is extracted from the domain ontology and presented as a pruned subset ontology. In Section IV, we describe in detail the design of the learner's ontology. In the following sections, we describe the methodology used to provide the learner with hybrid recommendations: (1) Ontology Content-based, (2) Cluster-based, and (3) Interest-based.

A. Building the HyperManyMedia Domain Ontology

Recently, a variety of knowledge-based framework applications became available that support modeling ontologies. The best known applications are Protégé and Altova. We used Protégé as our framework application. Figure 1 shows the design of the HyperManyMedia ontology in Protégé. Since our approach is based on a search engine recommender system, the content of each lecture is considered as a document, and the recommendation of pages is related to the degree of matching between a learner's query and the reverse index of the lecture (Webpage). The HyperManyMedia search engine uses the Vector Space Model (VSM), and the score of a query q for a document d is computed based on the cosine similarity between the document and the query vector. The implementation can be described as follows. (1) Preliminary crawling and indexing (offline): crawling and indexing the E-learning platform that contributes the content of the recommendation. (2) We then represent each of the N documents as a term vector d = (w_1, w_2, ..., w_n), where w_i is the term weight of term i, combining the term frequency tf_i and the term's inverse document frequency IDF_i = log(N/n_i), where the term occurs in n_i documents, so that w_i = tf_i · log(N/n_i). (3) Building the E-learning Domain Ontology: let R represent the root of the domain, which is represented as a tree, and let C_i represent a concept under R, so that R = ∪_{i=1}^{n} C_i, where n is the number of concepts in the domain. Each concept C_i consists either of subconcepts (C_i = ∪_{j=1}^{m} SC_{ij}) or of leaves, which are the actual documents (C_i = ∪_{k=1}^{l} d_{ki}). We encoded this semantic information into a tree-structured domain ontology in OWL, based on the hierarchy of the E-learning resources. The root concepts are the colleges, the sub-concepts are the courses, and the leaves are the resources of the domain (lectures).

Figure 1. Hierarchical Structure of the HyperManyMedia Ontology.
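As a rough illustration (ours, not the actual Nutch/Lucene implementation used by HyperManyMedia), the following minimal sketch shows the TF-IDF weighting of step (2) and the cosine scoring of a query against a document vector; tokenization is assumed to have been done already, and the two tiny example "lectures" are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Represent each document as a sparse term vector with w_i = tf_i * log(N / n_i)."""
    N = len(tokenized_docs)
    df = Counter()                         # n_i: number of documents containing term i
    for doc in tokenized_docs:
        df.update(set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)                  # raw term frequency tf_i
        vectors.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return vectors

def cosine(d, q):
    """Cosine similarity between two sparse term-weight vectors (document d, query q)."""
    dot = sum(w * q.get(t, 0.0) for t, w in d.items())
    nd = math.sqrt(sum(w * w for w in d.values()))
    nq = math.sqrt(sum(w * w for w in q.values()))
    return dot / (nd * nq) if nd and nq else 0.0

# Illustrative usage with two tiny "lectures" and a one-term query:
docs = [["flood", "water", "elevation"], ["audit", "board", "management"]]
vecs = tfidf_vectors(docs)
print([round(cosine(v, {"flood": 1.0}), 3) for v in vecs])   # -> [0.577, 0.0]
```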
IV. IMPLEMENTATION

A. Ontology Content-based Recommendations

The idea of a Content-based recommender system in an E-learning platform can be summarized as follows: given the lectures that the learner has visited, the platform recommends other lectures whose content is similar to the content of the viewed lectures. We build the learner's ontology profile by extracting the learner's interests from the user's profile. Let docs(U_i) = ∪_{k=1}^{l} d_{ki} be the documents visited by the i-th learner U_i. The learner's ontology is considered a subset of the E-learning domain ontology from Section III.A. Since the activity log of the user's activities records the visited documents (which are the leaves), a bottom-up pruning algorithm is used to extract the semantic concepts that the learner is interested in. Each learner U_i has a dynamic semantic representation. First, we collect the learner's activities over a period of time to form an initial learner profile, as follows: let docs(U_i) = ∪_{k=1}^{l} d_{ki} be the documents visited by the i-th learner U_i; then, starting from the leaves, the bottom-up pruning algorithm looks up each document visited by the learner in the "domain semantic structure" and increments the visit count (initialized to 0) of each visited node, along with its ancestors, all the way up to the root. After back-propagating the counts of all the documents in this way through the domain structure, the pruning algorithm keeps only the concepts (colleges) and sub-concepts (courses) related to the learner's interests, along with their interest weights (the numbers of visits). When a learner searches for a lecture using a specific query q, the cosine similarity measure is used to retrieve the most similar documents d that contain the terms in the query, as shown in equation (1):

S_cosine = (d · q) / (‖d‖_2 · ‖q‖_2) = Σ_{i=1}^{n} d_i q_i / ( √(Σ_{i=1}^{n} d_i²) · √(Σ_{i=1}^{n} q_i²) )    (1)

As we mentioned in Section III, the HyperManyMedia search engine's scoring algorithm is based on the VSM. For each field, the score is computed as

score(q, d) = coord(q, d) · queryNorm(q) · Σ_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t, d) )    (2)
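The bottom-up profile pruning just described can be sketched as follows (our own sketch; the Node class and the college/course/lecture names below are hypothetical, not the real HyperManyMedia ontology): visit counts are back-propagated from each visited lecture up to the root, and only the non-leaf nodes with nonzero counts are kept.

```python
class Node:
    """A node of the tree-structured domain ontology (college, course, or lecture)."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.count = name, parent, 0

def build_learner_profile(visited_lectures, leaf_index):
    """Back-propagate visit counts from each visited lecture (leaf) to the root,
    then keep only the non-leaf nodes, i.e. the concepts (colleges) and
    sub-concepts (courses), together with their visit counts."""
    touched = set()
    for lecture in visited_lectures:
        node = leaf_index[lecture]
        while node is not None:            # increment the leaf and all of its ancestors
            node.count += 1
            touched.add(node)
            node = node.parent
    leaves = set(leaf_index.values())
    return {n.name: n.count for n in touched - leaves}

# Hypothetical mini-ontology: one college, one course, two lectures.
root = Node("HyperManyMedia")
college = Node("College_of_Science", root)
course = Node("Math_101", college)
leaf_index = {"lecture1": Node("lecture1", course), "lecture2": Node("lecture2", course)}
profile = build_learner_profile(["lecture1", "lecture1", "lecture2"], leaf_index)
# resulting counts: Math_101 = 3, College_of_Science = 3, HyperManyMedia = 3
```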
Algorithm 1 Re-ranking a learner's search results
Input: U_R            // documents in the learner's semantic profile (user index)
Input: α, β, γ        // boost thresholds
Output: Rank          // the re-ranked documents
Rank ← {(d_1, sc_1), ..., (d_n, sc_n)}    // default search results for query q
R_C ← documents of the recommended cluster
foreach d_i ∈ Rank do
    if d_i ∈ U_R then           // document is in the user profile
        d_i.boost ← α
    else if d_i ∈ R_C then      // document is in the recommended cluster
        d_i.boost ← β
    else
        d_i.boost ← γ
end
Sort Rank based on the document boost field d_i.boost
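A runnable sketch of the re-ranking in Algorithm 1 is given below, assuming the boost values α = 5.0, β = 3.0, and γ = 1.0 discussed in the text. Here we simply sort by the pair (category boost, cosine score); the deployed system achieves a comparable effect by folding Lucene field boosts into the score.

```python
def rerank(results, cosine_scores, profile_docs, cluster_docs,
           alpha=5.0, beta=3.0, gamma=1.0):
    """Re-rank retrieved documents: Category 1 (in the learner profile) first,
    then Category 2 (in the recommended cluster), then everything else;
    within each category, documents are ordered by cosine similarity to q."""
    def boost(doc):
        if doc in profile_docs:     # Category 1: document is in the user profile
            return alpha
        if doc in cluster_docs:     # Category 2: document is in the recommended cluster
            return beta
        return gamma                # Category 3: all remaining documents
    return sorted(results, key=lambda d: (boost(d), cosine_scores[d]), reverse=True)

# Illustrative usage with hypothetical document ids:
ranked = rerank(results=["d1", "d2", "d3"],
                cosine_scores={"d1": 0.2, "d2": 0.9, "d3": 0.5},
                profile_docs={"d1"}, cluster_docs={"d3"})
# -> ['d1', 'd3', 'd2']
```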
Lucene (Apache) defines each term of equation (2) as follows [9] (see http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html): tf(t in d) is the number of times term t appears in the currently scored document d, defined as tf(t in d) = frequency^(1/2); idf(t) is the inverse document frequency (related to the number of documents in which term t appears), defined as idf(t) = 1 + log( numDocs / (docFreq + 1) ); coord(q, d) is a score factor based on how many of the query terms are found in the specified document; and queryNorm(q) is a normalizing factor used to make scores between queries comparable:

queryNorm(sumOfSquaredWeights) = 1 / sumOfSquaredWeights^(1/2)    (3)

The sum of squared weights (of the query terms) is computed by the query weight object. For example, for a Boolean query, we compute this value as

sumOfSquaredWeights = q.getBoost()² · Σ_{t in q} ( idf(t) · t.getBoost() )²    (4)

where t.getBoost() is a search-time boost of term t in the query q, as specified in the query text or as set by application calls to setBoost(); since boosts are only accessible per sub-query, the boost of a term in the query is obtained by calling the sub-query's getBoost(). Finally, norm(t, d) encapsulates a few indexing-time boost and length factors: (1) the document boost, set by calling doc.setBoost() before adding the document to the index; (2) the field boost, set by calling field.setBoost() before adding the field to a document; and (3) lengthNorm(field), computed when the document is added to the index, in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score (lengthNorm is computed by the Similarity class in effect at indexing time). When a document is added to the index, all the above factors are multiplied; if the document has multiple fields with the same name, all their boosts are multiplied together:

norm(t, d) = doc.getBoost() · lengthNorm(field) · Π_{field f in d named as t} f.getBoost()    (5)
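To show how equations (2)-(5) fit together, here is a conceptual re-implementation of the per-field score. It is not the actual Lucene/Nutch API: the document, field, and term boosts all default to 1.0, a single field per term is assumed (so the product in equation (5) collapses to one factor), and lengthNorm is taken as 1/√(number of tokens).

```python
import math

def tf(freq):
    return math.sqrt(freq)                                    # tf(t in d) = frequency^(1/2)

def idf(num_docs, doc_freq):
    return 1.0 + math.log(num_docs / (doc_freq + 1))          # idf(t)

def field_score(query_terms, term_freqs, field_tokens, num_docs, doc_freqs,
                doc_boost=1.0, field_boost=1.0, term_boosts=None):
    """Combine coord, queryNorm, tf, idf and norm as in equation (2).
    term_freqs[t] is the frequency of t in the scored field (nonzero field
    length assumed); doc_freqs[t] is the number of documents containing t."""
    term_boosts = term_boosts or {t: 1.0 for t in query_terms}
    matched = [t for t in query_terms if term_freqs.get(t, 0) > 0]
    coord = len(matched) / len(query_terms)                    # coord(q, d)
    sum_sq = sum((idf(num_docs, doc_freqs[t]) * term_boosts[t]) ** 2
                 for t in query_terms)                         # equation (4), q.getBoost() = 1
    query_norm = 1.0 / math.sqrt(sum_sq)                       # equation (3)
    norm = doc_boost * (1.0 / math.sqrt(field_tokens)) * field_boost   # equation (5)
    return coord * query_norm * sum(
        tf(term_freqs[t]) * idf(num_docs, doc_freqs[t]) ** 2
        * term_boosts[t] * norm
        for t in matched)                                      # equation (2)
```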
When a learner searches for lectures using a specific query q, the cosine similarity measure is used to retrieve the most similar documents that contain the terms in the query. In our approach, these results are re-ranked based on two main factors: (1) the semantic relation between these documents and the learner's semantic profile, and (2) the most similar cluster to the learner's semantic profile (the recommended cluster). Algorithm 1 maps the ranked documents to the learner's semantic profile (Category 1), where each document d_i belonging to a learner's semantic profile is assigned a priority ranking (α = 5.0), each document d_i belonging to the recommended cluster (Category 2) is assigned a priority ranking (β = 3.0), and the rest of the documents (Category 3) have the lowest priority (γ = 1.0). The threshold of each parameter was decided heuristically after several trials (α = 5.0, β = 3.0, and γ = 1.0). All the documents in each category are then re-ranked based on their cosine similarity to the query q. Our search engine (based on Nutch) uses optional boosting scores to determine the importance of each term in an indexed document when adding up the document-to-query term matches in the cosine similarity. Thus, a higher boosting factor for a term forces a larger contribution from that term in the sum. We modified the boosting score as follows: field.setBoost() = α in the case of Category 1, field.setBoost() = β in the case of Category 2, and field.setBoost() = γ in the case of Category 3. Accordingly, all documents are boosted and re-ranked based on two factors. Here, we introduce the first factor; the second factor is introduced in the following section.

Algorithm 1 maps the ranked documents to the learner's semantic profile (the learner's previously visited lectures) as Category 1, where each document d_i belonging to a learner's semantic profile is assigned a priority ranking (α = 5.0). This boosting score has been implemented using field.setBoost(); the weight is only added to the documents that the learner is interested in, based on his/her previous activities (sessions). Since we used the ontology to generate the user profile, we name this type of recommendation Ontology Content-based Recommendations.

B. Cluster-based Recommendations

A total corpus consisting of around 7,424 documents (lectures) was divided into 4,888 English documents and 2,536 Spanish documents. In both cases, we experimented with partitional algorithms, direct K-way clustering (similar to K-means), and repeated bisection or Bisecting K-Means, with all criterion functions. We also experimented with graph-partitioning-based clustering algorithms [15]. First, for clustering the English documents, we compared different hierarchical algorithms for the English corpus, consisting of 4,888 documents, using the clustering package Cluto [15].
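The clustering itself was done with the Cluto toolkit; as a hedged, analogous sketch (not the actual pipeline), the same kind of per-cluster descriptive features reported in Tables I-III can be produced with scikit-learn, where k-means stands in for the agglomerative method that performed best, and lecture_texts is assumed to hold the plain text of the crawled lecture pages.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_and_describe(lecture_texts, n_clusters=38, top_n=4):
    """Cluster the lectures and return the top_n descriptive terms per cluster
    (the terms with the highest mean TF-IDF weight within the cluster)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(lecture_texts)          # documents as TF-IDF vectors
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    terms = np.array(vectorizer.get_feature_names_out())
    summaries = {}
    for c in range(n_clusters):
        centroid = X[km.labels_ == c].mean(axis=0).A1    # mean weight of each term in cluster c
        summaries[c] = terms[centroid.argsort()[::-1][:top_n]].tolist()
    return summaries                                     # e.g. {0: ['angle', 'prime', ...], ...}
```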
TABLE I. ENGLISH CLUSTERS DESCRIPTIVE FEATURES
Cluster 0: angle 4.20%, prime 3.10%, line 2.60%, distance 2.60%
Cluster 1: terms 2.60%, child 2.40%, means 2.10%, stuttering 1.60%
Cluster 2: called 1.50%, war 1.40%, sort 1.20%, people 1.00%
Cluster 3: flood 5.80%, water 1.40%, building 1.40%, elevation 1.40%
Cluster 4: audit 4.40%, board 3.40%, internal 2.90%, management 2.40%
Cluster 5: zero 3.80%, grams 3.00%, fraction 2.80%, hundred 2.50%
Cluster 6: material 4.30%, materials 1.80%, process 1.50%, type 1.40%
Cluster 7: time 2.50%, times 1.90%, rainfall 1.80%, storm 1.50%
Cluster 8: voice 2.10%, vocal 1.90%, speech 1.40%, pitch 1.30%
Cluster 9: class 5.50%, java 4.20%, method 4.00%, methods 3.30%
Cluster 10: price 7.30%, market 4.40%, cost 2.60%, product 2.50%
Cluster 11: mean 2.10%, basically 2.10%, five 1.90%, data 1.80%
Cluster 12: income 4.70%, accounting 3.80%, balance 3.60%, statement 2.90%
Cluster 13: data 7.10%, system 2.80%, database 2.80%, server 2.60%
Cluster 14: children 2.60%, child 2.00%, program 1.70%, time 1.50%
Cluster 15: course 6.60%, assignments 5.80%, class 1.90%, topic 1.90%
Cluster 16: equal 4.10%, zero 3.30%, look 2.80%, negative 2.60%
Cluster 18: game 9.30%, theorem 7.30%, muhamet 5.20%, ergin 5.20%
Cluster 19: transport 4.60%, waves 3.70%, environment 3.30%, concentration 3.10%
Cluster 20: poem 2.20%, read 1.60%, look 1.30%, little 1.20%
Cluster 21: information 6.00%, systems 5.70%, technology 5.10%, organizational 3.60%
Cluster 22: test 2.40%, child 1.60%, score 1.60%, words 1.40%
Cluster 23: five 4.30%, times 4.00%, example 2.90%, nine 2.80%
Cluster 24: deviance 7.10%, social 6.60%, deviant 3.60%, identity 2.90%
Cluster 25: square 9.50%, squared 6.80%, equal 4.20%, times 3.00%
Cluster 26: western 1.50%, online 1.50%, literature 1.50%, course 1.40%
Cluster 27: times 5.00%, equal 3.60%, minus 3.40%, zero 2.70%
Cluster 28: game 5.70%, player 2.50%, strategic 2.30%, strategy 2.10%
Cluster 29: time 1.50%, product 1.30%, look 1.20%, example 1.10%
Cluster 30: angle 8.60%, equal 5.80%, triangle 3.80%, proposition 3.60%
Cluster 31: lecture 11.60%, global 2.40%, population 1.90%, species 1.80%
Cluster 32: metal 2.90%, formula 2.70%, name 2.60%, minus 2.30%
Cluster 33: market 11.70%, markets 8.90%, competition 8.80%, strategy 7.60%
Cluster 34: transportation 6.70%, land 3.10%, planning 2.80%, transit 2.50%
Cluster 35: time 3.40%, value 2.60%, markets 2.10%, resources 1.70%
Cluster 36: transportation 6.70%, land 3.10%, planning 2.80%, transit 2.50%
Cluster 37: time 3.40%, value 2.60%, markets 2.10%, resources 1.70%
TABLE II. SPANISH CLUSTERS DESCRIPTIVE FEATURES
Cluster 0: desagradables 33.30%, aborrecible 33.30%, repugnancia 33.30%, accionistas 0.00%
Cluster 1: ciclo 7.40%, dep 4.00%, global 3.40%, azufre 3.00%
Cluster 2: contabilidad 4.20%, balance 4.10%, pasivo 3.20%, contable 1.90%
Cluster 3: precios 5.70%, producci 2.60%, fijaci 2.20%, discriminaci 2.00%
Cluster 4: product 8.60%, design 7.10%, hill 7.10%, mcgraw 7.10%
Cluster 5: programa 2.00%, coordenadas 1.80%, gui 1.60%, pdb 1.60%
Cluster 6: teorema 11.60%, conocimiento 11.60%, espesamiento 11.10%, trade 9.40%
Cluster 7: conservaci 7.40%, masa 7.30%, difusi 6.20%, volumen 5.60%
Cluster 8: ajuste 20.70%, ruido 17.30%, persistente 7.80%, stico 6.00%
Cluster 9: patente 2.20%, stephen 1.80%, patentes 1.40%, invenciones 1.30%
Cluster 10: juego 6.20%, juegos 5.90%, nash 4.90%, prueba 2.40%
Cluster 11: subastas 10.10%, equivalencia 8.20%, subasta 4.60%, licitaci 4.60%
Cluster 12: colas 16.20%, nacimiento 6.50%, muerte 6.50%, sistemas 5.20%
Cluster 13: arrays 3.50%, lista 2.60%, array 1.40%, elemento 1.40%
Cluster 14: interpretaci 83.60%, hoy 9.20%, objetivos 6.50%, los 0.60%
Cluster 15: software 3.40%, ide 1.90%, requisitos 1.80%, desarrollo 1.70%
Cluster 16: red 2.70%, fibra 2.40%, paquetes 2.40%, redes 2.30%
Cluster 17: navegador 4.40%, html 3.50%, server 3.30%, mime 2.90%
Cluster 20: kang 11.50%, arnold 9.80%, james 9.80%, barnett 9.80%
Cluster 21: reacciones 10.90%, reacci 4.30%, concentraciones 4.20%, concentraci 3.20%
Cluster 22: xml 4.20%, web 2.50%, corba 1.60%, servidor 1.10%
Cluster 23: transporte 10.30%, suelo 3.40%, planificaci 3.20%, teor 2.90%
Cluster 24: nike 1.70%, reputaci 1.60%, industria 1.40%, empresas 1.30%
Cluster 25: pasajeros 9.10%, mortalidad 8.90%, desarrollados 6.80%, vuelos 5.30%
Cluster 26: hilo 6.30%, hilos 4.20%, eventos 2.00%, deeventos 1.60%
Cluster 27: desplazamiento 7.90%, colas 7.10%, servidores 6.50%, ciudad 3.80%
Cluster 28: productividad 19.20%, primaria 11.80%, lecturas 4.20%, ecolog 3.60%
Cluster 29: amortizaci 13.00%, fiscal 5.00%, gasto 4.30%, impuestos 4.30%
Cluster 30: replicador 22.00%, ess 10.30%, din 6.90%, evolutiva 6.80%
Figure 2. Semantic Terms Recommendations.

for the English corpus, which produced the highest Purity = 0.959 with the lowest Entropy = 0.05, was the Agglomerative Method with 38 clusters, using the clustering criterion function and the cosine similarity measure as the inter-object similarity measure, as shown in equation (1). Table I shows the descriptive features in each cluster (the features that we added to the ontology). Second, for clustering the Spanish documents, we also compared different hierarchical algorithms on the Spanish corpus, which consists of 2,536 documents. The best clustering method for this corpus, which produced the highest Purity = 0.927 with the lowest Entropy = 0.140, was the Agglomerative Method with 50 clusters, again using the clustering criterion function and the cosine similarity measure as the inter-object similarity measure. Table II and Table III show the descriptive features in each cluster (the features that we added to the ontology). We extract the most similar (recommended) cluster Ci = BestCluster, summarize it by its top n keywords (significant or frequent terms), and add the cluster's terms as semantic terms to the learner's semantic ontology under the concepts (parent nodes) to which these documents belong; this constitutes a Rule-based recommendation. In Algorithm 1, we defined this rule as Category 2, where each document di belonging to the recommended cluster is assigned a priority ranking (β = 3.0). This boosting score has been implemented using field.setBoost(). When a learner searches for lectures with a specific query q, the cosine similarity measure is used to retrieve the most similar documents containing the query terms, and those documents are then re-ranked based on the weighting factor β (a minimal sketch of this boosting step follows Table III below). We call this type of recommendation Cluster-based Recommendations.

TABLE III. SPANISH CLUSTERS DESCRIPTIVE FEATURES (CONT.)

Cluster 31: mercados 5.10%, poder 4.50%, segmentos 4.10%, marketing 4.00%
Cluster 32: lotka 10.70%, nichos 9.20%, competencia 7.50%, xplique 5.40%
Cluster 33: integraci 2.30%, organizativos 1.60%, negocio 1.60%, tecnolog 1.50%
Cluster 34: ondas 17.00%, onda 7.50%, fluido 1.90%, dispersi 1.80%
Cluster 35: etiqueta 6.70%, desviado 3.70%, desviaci 3.20%, negar 2.30%
Cluster 36: juego 5.20%, juegos 4.00%, estrat 2.80%, jugador 2.00%
Cluster 37: gen 7.30%, mendel 7.10%, mutantes 4.20%, genes 3.70%
Cluster 38: aritm 11.50%, operadores 7.30%, estructuras 5.70%, control 3.50%
Cluster 39: duraderas 5.30%, recurso 5.00%, ventajas 3.60%, podemos 2.40%
Cluster 40: consultas 1.80%, bases 1.70%, filas 1.60%, datos 1.60%
Cluster 41: memoria 2.10%, java 1.90%, clases 1.90%, clase 1.80%
Cluster 42: cognitivo 6.30%, decisi 5.90%, aprobarla 5.60%, conocimientos 5.60%
Cluster 43: nodo 9.60%, nodos 5.60%, sub 3.00%, rboles 2.70%
Cluster 44: aparcamiento 5.50%, transporte 4.50%, viajes 3.00%, mit 2.60%
Cluster 45: lagos 2.60%, especie 2.60%, norte 2.50%, avi 2.10%
Cluster 46: dise 3.10%, especificaciones 2.90%, necesidades 2.70%, piz 1.80%
Cluster 47: contestar 3.30%, redacte 1.70%, feedback 1.60%, quejaslea 1.40%
Cluster 48: poblaci 3.80%, densidad 3.60%, fecundidad 3.40%, edades 2.60%
Cluster 49: huella 12.00%, ecol 10.40%, demogr 10.00%, poblaci 5.10%
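As a minimal sketch (not the production HyperManyMedia indexer), the following shows how a per-document boost such as β = 3.0 can be attached at indexing time with the Lucene 2.4 API referenced in the footnotes. The field names, the in-memory RAMDirectory, the sample lecture identifier, and the isInBestCluster() membership test are illustrative assumptions introduced here.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    public class ClusterBoostIndexer {

        // Hypothetical membership test: returns true when the lecture belongs to
        // the learner's recommended cluster (BestCluster).
        static boolean isInBestCluster(String lectureId) {
            return true; // placeholder
        }

        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                    true, IndexWriter.MaxFieldLength.UNLIMITED);

            String lectureId = "english/FloodplainManagement/lecture10"; // illustrative id
            String text = "National Flood Insurance Program ...";        // lecture text

            Document doc = new Document();
            Field content = new Field("content", text, Field.Store.YES, Field.Index.ANALYZED);
            // Category 2 rule: documents in the recommended cluster receive beta = 3.0,
            // which is folded into their score when the cosine-based ranking is computed.
            if (isInBestCluster(lectureId)) {
                content.setBoost(3.0f);
            }
            doc.add(content);
            doc.add(new Field("id", lectureId, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            writer.close();
        }
    }

At query time, Lucene multiplies this index-time field boost into the cosine-based score, so documents from the recommended cluster rise in the ranking without any change to the query itself.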
Figure 3. Average Percentage of Improvement in Top-n Precision.

Figure 4. Average Percentage of Improvement in Top-n Recall.

Our current ontology consists of ~40,000 lines of code and can be downloaded from URL9.

B. Interest-based Recommendations

We provide the learner with semantic term recommendations based on his/her visited concepts. We consider this type of recommendation as Rule-based, since the ontology represents concepts and relationships, properties, functions, and rules among these concepts. For each query q submitted by a learner, a semantic mapping between the query and the learner's semantic profile brings up all the concept/subconcept/cluster-based recommended terms. This framework allows the learner to navigate through the semantic structure of his/her query, as shown in Figure 2, by clicking on one of the recommended terms. For more detail on modeling a learner's interests, refer to our previous work [18], [17]. The effect of this action is to add the selected term to the query and repeat the search; the search is thus personalized via a query expansion that uses the selected recommended term (see the sketch below). We name this type of recommendation Interest-based Recommendations.

9 http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
10 http://hypermanymedia.wku.edu
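As a minimal illustration (not the actual HyperManyMedia code), the sketch below assumes the learner's semantic profile is available as a map from concepts to recommended terms: recommendTerms() surfaces the terms whose parent concepts overlap the query, and expandQuery() appends the term the learner clicks before the search is re-run. All names here (InterestBasedRecommender, recommendTerms, expandQuery) and the sample profile entries are hypothetical.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class InterestBasedRecommender {

        // Hypothetical learner profile: concept -> semantic terms added from the
        // domain ontology and from the cluster-based recommendations.
        private final Map<String, List<String>> profile = new HashMap<>();

        public InterestBasedRecommender() {
            profile.put("floodplain", Arrays.asList("storm water", "insurance program"));
            profile.put("hydrology", Arrays.asList("runoff", "watershed"));
        }

        // Semantic mapping between the query and the learner's profile: collect the
        // recommended terms of every profile concept that appears in the query.
        public List<String> recommendTerms(String query) {
            List<String> recommended = new ArrayList<>();
            String q = query.toLowerCase();
            for (Map.Entry<String, List<String>> e : profile.entrySet()) {
                if (q.contains(e.getKey())) {
                    recommended.addAll(e.getValue());
                }
            }
            return recommended;
        }

        // Query expansion: the term the learner clicks is appended and the
        // search is repeated with the expanded query.
        public String expandQuery(String query, String selectedTerm) {
            return query + " " + selectedTerm;
        }

        public static void main(String[] args) {
            InterestBasedRecommender r = new InterestBasedRecommender();
            String query = "floodplain management";
            System.out.println("Recommended terms: " + r.recommendTerms(query));
            System.out.println("Expanded query: " + r.expandQuery(query, "storm water"));
        }
    }

In the actual system the expanded query would simply be resubmitted to the semantic search engine; the sketch only illustrates the mapping-then-expansion flow described above.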
The difference between Content-based Recommendations and Interest-based Recommendations (Rule-based Recommendations) is that, in the latter, the user is provided with recommendations based not only on his/her profile, as in Content-based Recommendations, but also on an extended ontology defined using First-Order Logic. In this case, we defined the following entities: has_College, has_Course, has_Language, has_Lecture, has_Professor, sub_Class_Of. In addition, each entity has different characteristics (Functional, Description); for example, the characteristics of the entity has_College are (Description: College, Equivalent classes: Colegio, Superclasses: Thing, Members: Accounting, Architecture_and_Manufacturing, Biology, etc., Disjoint: Sub classes). The two most important definitions used in our ontology design are the following: (1) Equivalent classes: the equivalence (≡) relation between entities, e.g., College ≡ Colegio, Engineering ≡ Ingenieria, English ≡ Ingles, ..., Social Work ≡ Trabajo Social, Chemistry ≡ Quimica; and (2) Sub_Class_Of: the subclass relation that captures the hierarchy design of our domain.

V. EXPERIMENTAL ANALYSIS

Several evaluation metrics have been introduced in the literature, such as Recall, Precision, F-measure, Harmonic Mean, E-Measure, User-Oriented Measures (coverage, novelty), expected search length, satisfaction, frustration, etc. The most widely used ones for evaluating search engines are Top-n Recall and Top-n Precision. Top-n Recall is the number of relevant retrieved documents among the top n retrieved documents divided by the total number of relevant documents, and Top-n Precision is the number of relevant retrieved documents within the top n divided by n (see the formulas below). For example, we start with the top 50 results and go down to the top 10 search results, i.e., n = 50, 40, 30, ..., 10; at n = 50, the top-50 search results are used to compute the recall and the precision.

We therefore used Top-n Recall and Top-n Precision to measure the effectiveness of re-ranking based on the learner's semantic profile (testing set). For the evaluation, we used our own semantic search engine10 to evaluate each query and computed the Top-n Precision and Top-n Recall of the normal search and of the personalized semantic search for each learner. The problem with evaluating a real search engine is that results obtained on different datasets cannot be compared. First, using a different dataset would return results unrelated to the content of the repository, and in that case our own search engine's evaluation results would inevitably be better on our dataset. Second, from an architectural standpoint, we cannot compare our search engine results with another search engine, because the only search engine we are aware of whose architecture supports the integration of semantics is the one we used, Nutch.
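Written out, with Dn denoting the set of top-n retrieved documents for a query and R its set of relevant documents (symbols introduced here for clarity, not taken from the paper's numbered equations), the two measures are:

\[
\text{Top-}n\text{ Precision} = \frac{|R \cap D_n|}{n},
\qquad
\text{Top-}n\text{ Recall} = \frac{|R \cap D_n|}{|R|}.
\]

At n = 50, for instance, both measures are computed over the 50 highest-ranked results, and the computation is repeated for n = 40, 30, ..., 10.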
Figure 3 shows the average percentage of improvement in Top-n Precision, whereas Figure 4 shows the average percentage of improvement in Top-n Recall, for the personalized search over the normal search with three query sizes (1, 2, and 3 keywords). We used the keyword queries that users typed most often when searching for content, extracted from the logs; for each query length, we used the Top-100 most frequent queries. The personalized semantic search shows an improvement in precision that varies between 5% and 25%. This improvement is most noticeable between the top-30 and top-50 search results for single-keyword and two-keyword queries. The recall results show a noticeable improvement between top-20 and top-40. We can also summarize the impact of the query size: Precision is better when the query size is 1 or 2, whereas Recall starts out better for queries of size 2 up to Top-20, after which queries of sizes 1 and 2 converge to almost the same results. Overall, these results show the effectiveness of the re-ranking based on the learner's semantic profile.

VI. CONCLUSION

In this paper, we presented a hybrid recommender engine to personalize search in the E-learning domain. This engine is driven by multiple ontology models: ontology content-based recommendations (the domain ontology model) and ontology rule-based recommendations (cluster-based and interest-based). We illustrated the methods, concepts, and architecture used to integrate a recommendation engine into an E-learning search system. We demonstrated the design of the HyperManyMedia ontology using the Protégé framework. In this context, the ontology is composed of a hierarchy of concepts and sub-concepts that represents colleges, courses, and lectures. We also described the implementation of Rule-based recommendations, which uses clustering techniques to extract descriptive features from clusters; those features were added to the domain ontology under the related concepts using the Protégé framework. In addition, we implemented a semantic mapping between the query and the learner's semantic profile to represent the user's interests. Finally, each of these types of recommendations influenced the re-ranking of the retrieved documents with different weighting factors. Our experiments were carried out on the HyperManyMedia semantic search engine at Western Kentucky University. We used Top-n Recall and Top-n Precision to measure the effectiveness of re-ranking based on the learner's semantic profile. Overall, the search results showed the effectiveness of the re-ranking based on personalization.

REFERENCES

[1] M. de Gemmis, G. Semeraro, P. Lops, and P. Basile. A Retrieval Model for Personalized Searching Relying on Content-based User Profiles.
[2] G. Grenier. Path to document recommendation services: Technologies that enabled the development of on-line information systems. In Presentation held at the ACS National Meeting, volume 230, 2005.
[3] K. Hammouda and M. Kamel. Data mining in e-learning. E-Learning Networked Environments and Architectures: A Knowledge Processing Perspective, series: Advanced
Information and Knowledge Processing, Springer Book Series, 2007.
[4] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133-142, 2002.
[5] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. 2007.
[6] T. Joachims and F. Radlinski. Search engines that learn from implicit feedback. Computer, 40(8):34-40, 2007.
[7] M. K. Khribi, M. Jemni, and O. Nasraoui. Automatic Recommendations for E-Learning Personalization Based on Web Usage Mining Techniques and Information Retrieval. Educational Technology and Society, 2009.
[8] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of WWW 2010, 2010.
[9]
[10] B. Mobasher, R. Cooley, and J. Srivastava. Automatic personalization based on web usage mining. Communications of the ACM, 43(8):142-151, 2000.
[11] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Effective personalization based on association rule discovery from web usage data. In Proceedings of the 3rd International Workshop on Web Information and Data Management, page 15. ACM, 2001.
[12] O. Nasraoui, R. Krishnapuram, and A. Joshi. Mining web access logs using a fuzzy relational clustering algorithm based on a robust estimator. Eighth International World Wide Web Conference, Toronto, Canada, 1999.
[13] O. Nasraoui, M. Soliman, E. Saka, A. Badia, and R. Germain. A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites. IEEE Transactions on Knowledge and Data Engineering, pages 202-215, 2008.
[14] A. W. Neumann. Recommender Systems for Information Providers: Designing Customer Centric Paths to Information. Springer Verlag, 2009.
[15] M. Rasmussen and G. Karypis. gCLUTO: An interactive clustering, visualization, and analysis system. CSE/UMN Technical Report TR# 04-021, 2008.
[16] O. R. Zaiane. Building a recommender agent for e-learning systems. In Proceedings of the International Conference on Computers in Education, pages 55-59, vol. 1, Dec. 2002.
[17] Leyla Zhuhadar, Olfa Nasraoui, and Robert Wyatt. Automated discovery, categorization and retrieval of personalized semantically enriched e-learning resources. International Conference on Semantic Computing, 0:414-419, 2009.
[18] Leyla Zhuhadar, Olfa Nasraoui, and Robert Wyatt. Dual representation of the semantic user profile for personalized web search in an evolving domain. In Proceedings of the AAAI 2009 Spring Symposium on Social Semantic Web, Where Web 2.0 meets Web 3.0, pages 84-89, 2009.

Leyla Zhuhadar received the Ph.D. degree in Computer Engineering and Computer Science from the University of Louisville, Louisville, in 2009. Currently, she is a Research Scientist at Western Kentucky University and an Adjunct Assistant Professor in the Department of Computer Engineering and Computer Science (CECS) at the University of Louisville. Her research interests are in Knowledge Acquisition from the Web, Information Retrieval, Ontology Engineering, Semantic Web, and Metadata for Accessible Learning Objects. She designed and implemented two working research platforms in the e-learning domain, HyperManyMedia and the Semantic Repository.
She is a member of IEEE, IEEE Women in Engineering, IEEE Computer Society, IEEE Education Society, the ACM, SIGKDD, SIGACCESS, SIGIR, the Web Intelligence Consortium (WIC), and the AIED.

Olfa Nasraoui is the founding Director of the Knowledge Discovery and Web Mining Laboratory at the University of Louisville, where she is also an Associate Professor of Computer Engineering and Computer Science and the Endowed Chair of e-Commerce. She received the Ph.D. degree in Computer Engineering and Computer Science from the University of Missouri, Columbia, in 1999. From 2000 to 2004, she was an Assistant Professor at the University of Memphis. Her research interests include data mining, Web mining, stream data mining, and computational intelligence. She is a member of IEEE and IEEE Women in Engineering, and over the last 10 years she has been active in the SIGKDD community, notably by organizing the WebKDD workshop on Web Mining and by serving as Vice-Chair of data mining conferences, including KDD 2009, ICDM 2009, and WI 2009. She is a recipient of a US National Science Foundation Faculty Early Career Development (CAREER) Award and a Best Paper Award at the Artificial Neural Networks in Engineering Conference. She has published more than 100 publications and acquired close to $2M in research funding from NSF, NASA, and other agencies.