Personal web-based agents such as Letizia [15], Syskill & Webert [21] and Personal Webwatcher [19] track the user's browsing and formulate user profiles. Profiles are constructed from positive and negative examples of interest, obtained from explicit feedback or heuristics analysing browsing behaviour. They then suggest which links are worth following from the current web page by recommending page links most similar to the user's profile. Just like a content-based recommender system, a few examples of interest must be observed or elicited from the user before a useful profile can be constructed.

Ontologies can be used to improve content-based search, as seen in OntoSeek [13]. Users of OntoSeek navigate the ontology in order to formulate queries. Ontologies can also be used to automatically construct knowledge bases from web pages, as in Web-KB [8]. Web-KB takes manually labelled examples of domain concepts and applies machine-learning techniques to classify new web pages. Neither system, however, captures dynamic information such as user interests.
Also of relevance are systems such as CiteSeer [6], which use content-based similarity matching to help search for interesting research papers within a digital library.

5. THE QUICKSTEP RECOMMENDER SYSTEM

Quickstep [18] is a hybrid recommender system, addressing the real-world problem of recommending on-line research papers to researchers. User browsing behaviour is unobtrusively monitored via a proxy server, logging each URL browsed during normal work activity. A nearest-neighbour algorithm classifies browsed URLs based on a training set of labelled example papers, storing each new paper in a central database. The database of known papers grows over time, building a shared pool of knowledge. Explicit feedback and browsed URLs form the basis of the interest profile for each user. Figure 1 shows an overview of the Quickstep system.

Figure 1. The Quickstep recommender system

Each day a set of recommendations is computed, based on correlations between user interest profiles and classified paper topics. Any feedback offered by users on these recommendations is recorded when the user looks at them. Users can provide new examples of topics and correct paper classifications where wrong. In this way the training set, and hence classification accuracy, improves over time.

Quickstep bases its user interest profiles on an ontology of research paper topics. This allows inferences from the ontology to assist profile generation; in our case topic inheritance is used to infer interest in super-classes of specific topics. Sharing interest profiles with the AKT ontology is not difficult since they are explicitly represented using ontological terms. Previous trials [18] of Quickstep used hand-crafted initial profiles, based on interview data, to cope with the cold-start problem.
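The topic-inheritance idea above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the topic names, the hierarchy, and the 0.5 decay factor per level are all assumptions made for the example.

```python
# Hypothetical sketch of topic inheritance: explicit interest in a
# specific topic implies discounted interest in its super-classes.
# The hierarchy and decay factor below are illustrative assumptions.

SUPER_CLASS = {
    "recommender systems": "machine learning",
    "machine learning": "artificial intelligence",
}

def inherited_profile(explicit_interest, decay=0.5):
    """Propagate interest scores up the topic hierarchy."""
    profile = dict(explicit_interest)
    for topic, score in explicit_interest.items():
        parent = SUPER_CLASS.get(topic)
        weight = score
        while parent is not None:
            weight *= decay
            profile[parent] = profile.get(parent, 0.0) + weight
            parent = SUPER_CLASS.get(parent)
    return profile

profile = inherited_profile({"recommender systems": 1.0})
# "machine learning" receives 0.5, "artificial intelligence" 0.25
```

A user who browses papers on a specific topic thus acquires weaker inferred interest in each ancestor topic, which is what lets the ontology assist profile generation before much explicit feedback exists.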
Linking Quickstep with the AKT ontology automates this process, allowing a more realistic cold-start solution that will scale to larger numbers of users.

5.1 Paper classification algorithm

Every research paper within Quickstep's central database is represented using a term frequency vector. Terms are single words within the document, so term frequency vectors are computed by counting the number of times words appear within the paper. Each dimension within a vector represents a term. Dimensionality reduction on vectors is achieved by removing common words found in a stop-list and stemming words using the Porter [22] stemming algorithm. Quickstep uses vectors with 10,000-15,000 dimensions.

Once added to the database, papers are classified using an IBk [1] classifier boosted by the AdaBoostM1 [11] algorithm. The IBk classifier is a k-Nearest Neighbour type classifier that uses example documents, called a training set, added to a vector space. Figure 2 shows the basic k-Nearest Neighbour algorithm. The closeness of an unclassified vector to its neighbours within the vector space determines its classification.

w(d_a, d_b) = \sqrt{ \sum_{j=1..T} (t_{ja} - t_{jb})^2 }

where
  w(d_a, d_b)  kNN distance between documents a and b
  d_a, d_b     document vectors
  T            number of terms in the document set
  t_{ja}       weight of term j in document a

Figure 2. k-Nearest Neighbour algorithm

Classifiers like k-Nearest Neighbour allow more training examples to be added to their vector space without the need to rebuild the entire classifier. They also degrade well, so even when incorrect the class returned is normally in the right "neighbourhood" and so at least partially relevant. This makes k-Nearest Neighbour a robust choice of algorithm for this task.
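The pipeline above, term-frequency vectors followed by k-Nearest Neighbour using the Euclidean distance of Figure 2, can be sketched as below. The tiny stop-list and training documents are illustrative assumptions; the real system additionally applies Porter stemming and works with 10,000-15,000 dimensions.

```python
# Minimal sketch: term-frequency vectors with stop-word removal,
# then k-NN classification by Euclidean distance (Figure 2).
# STOP_WORDS and the example data are illustrative, not from the paper.
from collections import Counter
import math

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}

def term_vector(text):
    """Count term frequencies, skipping stop-list words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

def distance(da, db):
    """Euclidean distance over the union of terms (Figure 2);
    Counter returns 0 for terms absent from a document."""
    terms = set(da) | set(db)
    return math.sqrt(sum((da[t] - db[t]) ** 2 for t in terms))

def knn_classify(query, training_set, k=3):
    """Majority vote among the k nearest labelled vectors."""
    neighbours = sorted(training_set, key=lambda ex: distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Because classification only sorts the stored vectors by distance, adding a new labelled example is just an append to the training set, which is the incremental-update property the text highlights.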
Boosting works by repeatedly running a weak learning algorithm on various distributions of the training set, and then combining the classifiers produced by the weak learner into a single composite classifier. The "weak" learning algorithm here is the IBk classifier. Figure 3 shows the AdaBoostM1 algorithm.
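The boosting loop just described can be sketched as follows. This is an illustrative reconstruction of the standard AdaBoost.M1 scheme (reweight the training distribution each round, then combine hypotheses by weighted vote); for brevity it uses a decision stump as the weak learner rather than the IBk classifier the paper uses.

```python
# Illustrative AdaBoost.M1 sketch. The weak learner here is a
# single-feature threshold ("decision stump"), an assumption made
# for brevity; the paper boosts an IBk (k-NN) classifier instead.
import math

def train_stump(X, y, w):
    """Weak learner: best threshold on one feature under weights w."""
    best = None
    for f in range(len(X[0])):
        for thresh in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [1 if sign * (x[f] - thresh) >= 0 else 0 for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thresh, sign)
    err, f, thresh, sign = best
    return (lambda x: 1 if sign * (x[f] - thresh) >= 0 else 0), err

def adaboost_m1(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                   # uniform initial distribution
    ensemble = []                       # (hypothesis, vote weight) pairs
    for _ in range(rounds):
        h, err = train_stump(X, y, w)
        if err >= 0.5:                  # weak learner no better than chance
            break
        beta = err / (1 - err) if err > 0 else 1e-10
        # down-weight correctly classified examples, then renormalise
        w = [wi * (beta if h(x) == yi else 1.0) for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((h, math.log(1 / beta)))
    def classify(x):
        votes = {}
        for h, alpha in ensemble:
            votes[h(x)] = votes.get(h(x), 0.0) + alpha
        return max(votes, key=votes.get)
    return classify
```

The key point for the text above is that each round trains on a different distribution of the same training set, so the composite classifier focuses successive weak hypotheses on the examples earlier ones got wrong.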