An opinion-based decision model for recommender systems

Sea Woo Kim, Division of Information and Communication Engineering, Korea Advanced Institute of Science and Technology, Seoul, South Korea
Chin-Wan Chung, Department of Computer Science, Korea Advanced Institute of Science and Technology, Seoul, South Korea, and
DaeEun Kim, School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea

Abstract
Purpose – A good recommender system helps users find items of interest on the web and can provide recommendations based on user preferences. In contrast to automatic technology-generated recommender systems, this paper aims to use dynamic expert groups that are automatically formed to recommend domain-specific documents for general users. In addition, it aims to test several effectiveness measures of rank order to determine if the top-ranked lists recommended by the experts were reliable.
Design/methodology/approach – In the approach, expert groups evaluate web documents to provide a recommender system for general users. The authority and make-up of the expert group are adjusted through user feedback. The system also uses various measures to gauge the difference between the opinions of experts and those of general users to improve the evaluation effectiveness.
Findings – The proposed system is efficient when there is major support from experts and general users. The recommender system is especially effective where there is a limited amount of evaluation data from general users.
Originality/value – This is an original study of how to effectively recommend web documents to users based on the opinions of human experts. Simulation results were provided to show the effectiveness of the dynamic expert group for recommender systems.

Keywords Information retrieval, Skills, Worldwide web
Paper type Research paper

Introduction
The development of recommender systems as a means of information retrieval has emerged as an important issue of the internet, and has drawn attention both from academics and the commercial sector.

This research was supported by the Ministry of Knowledge Economy, Korea under the Information Technology Research Center support programme supervised by the Institute of Information Technology Advancement (grant number IITA-2008-C1090-0801-0031).

Online Information Review, Vol. 33 No. 3, 2009, pp. 584-602. © Emerald Group Publishing Limited, 1468-4527. DOI 10.1108/14684520910969970. Refereed article received 1 April 2008; approved for publication 21 October 2008.
An example of the use of such systems is to recommend new products or items of interest to online customers, using customer preferences. When customers have no personal experience of an item or class of items, they are often interested in retrieving information about the items or products that many others have ordered or used. However, many recommender systems have focused on retrieving information considering the preferences of just one or a few customers, and sometimes there may be no information about user preferences for a recommender system to draw on. Recommender systems can also be applied to retrieve relevant web documents. Web documents need many users' evaluations in order to be recommendable. A simple citation search for a relevant document may not reflect well the crucial point of a document. We need a different approach to recommender systems.

In our research we have explored a method in which human agents collect useful information and provide it to general users. A group of human users or experts can cooperate to determine whether a web document includes useful information and rank the web documents in order. Such information can be made available to general users as recommendations. The feedback of the users can reorganise the group and refine the knowledge level that a group of human experts provides. This kind of adaptive organisation and feedback loop will give users access to expert knowledge in a particular domain, and will have a filtering effect on biased opinions from just a few people.

Information retrieval systems often give numeric scores to documents and then rank them based on the scores in order to make recommendations to users. There have been several approaches to information retrieval systems, the most common of which are the vector space model, the probabilistic model and the inference network model. In the vector space model, a document is represented by a vector of terms, that is, words and phrases (Salton et al., 1975). The model calculates the similarity between a query and a document. The angle between the query vector and the document vector can be measured using the cosine property (the dot product of the two vectors, normalised by their lengths, gives the cosine of the angle between them). More similar vectors will have a numeric cosine value close to 1. The probabilistic model estimates the probability of the relevance of a document to a query (Robertson, 1977) and documents can be ranked based on the relevance probability. In the inference network model, an inference process is applied to model the information retrieval (Turtle and Croft, 1991), where a document instantiates a term with a certain strength, and it accumulates the credit from multiple terms to assign a numeric score to the document. Then the strength of instantiation is taken as the weight of the term in the document.

The above scoring methods can assist in the automatic evaluation of documents. But while this kind of numeric assignment can give a rough evaluation or coarse information retrieval, in many cases it cannot provide accurate information about given documents.
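To make the vector space scoring concrete, the following is a minimal sketch of cosine similarity between a query and a document. The raw term-count weighting is purely illustrative (real systems typically use tf-idf or a similar scheme), and the function and variable names are our own, not part of the paper.

```python
import math
from collections import Counter

def cosine_similarity(query_terms, doc_terms):
    """Cosine of the angle between a query vector and a document vector.

    Both vectors use raw term counts here; a production system would
    normally apply tf-idf or another weighting scheme.
    """
    q = Counter(query_terms)
    d = Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)                    # dot product over shared terms
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)                       # closer to 1 means more similar

# Documents more similar to the query score closer to 1.
print(cosine_similarity(["expert", "recommendation"],
                        ["expert", "group", "recommendation", "web"]))
```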
Many common web searches retrieve a very small number of relevant documents. Topic distillation is a special kind of topical relevance search where the user wishes to find a few key websites rather than every relevant webpage. Because these types of searches are so common, web search evaluations have come to focus on tasks where there are very few relevant documents. Evaluations with just a few relevant documents pose special challenges for current metrics (Soboroff, 2006). The development of intelligent information retrieval techniques has large impact potential in many domains (Fern et al., 2007).
Recommender systems, as one information retrieval technique, can be broadly categorised into content-based and collaborative filtering systems (Hill et al., 1995; Resnick et al., 1994; Shardanand and Maes, 1995; Soboroff et al., 1999). Content-based filtering methods use textual descriptions of documents or items to be recommended. A user's profile is associated with the content of the documents that the user has already rated. The features of documents are extracted by information retrieval, pattern recognition or machine learning techniques. Then the content-based system recommends documents that match the user's profile (Delgado et al., 1998; Soboroff et al., 1999). In contrast, collaborative filtering systems are based on user ratings rather than the features of the documents (Breese et al., 1998; Shardanand and Maes, 1995; Soboroff et al., 1999). These systems predict the ratings of a user for given documents or items, depending on the ratings of other users with similar preferences to the user. Collaborative filtering systems, such as GroupLens (Resnick et al., 1994; Konstan et al., 1997), can be part of recommender systems for online shopping sites. They recommend items to users, using the history of products that similar users have ordered or have viewed.

Most recommender systems use analysis of the user's preferences. Such systems require the user to judge many items in order to obtain the user's preferences. In general, many online customers or users are interested in other users' opinions or ratings about items that belong to a certain category. For instance, many e-commerce customers like to see the top-ranked lists of rating scores of many users for retail items in order to help them make a purchase decision. However, recommender systems still have difficulty providing relevant rating information before they receive a large amount of user evaluation and feedback.

In this paper, we provide a new method for evaluating web documents using a representative board of human agents (an "expert group"). This is different from automatic recommender systems with software agents or feature extractions. We suggest that dynamic expert groups should be created from among users to evaluate domain-specific documents for webpage ranking, and that the group members should have dynamic authority weights depending on the performance of their ranking evaluations. This method will be quite effective in recommending web documents or items that many users have not already evaluated – in such cases it is difficult for automatic recommender services to provide effective recommendations. Because in our approach users with expertise in a domain category evaluate the documents, it is not feasible to replace human agents with intelligent software agents.

Our recommender system with dynamic expert groups may be extended to challenge search engine designs and image retrieval problems. Many search engines find relevant information and its importance by applying automatic citation analysis to the general subject of a query. The hypertext connectivity of web documents has been a good measure for automatic web citation analysis. This method works on the assumption that a webpage that is cited many times is popular and important. Many automatic page-ranking systems have used this citation metric to decide the relative importance of web documents. The IBM HITS system maintains a hub and an authority score for every document (Kleinberg, 1998).
A method called PageRank computes a ranking for every web document based on a web connectivity graph (Brin and Page, 1998) with the random walk traversal. It also considers the relative
importance by checking the rank of documents – a document is ranked as highly important when the document has backlinks from documents with high authority, such as the Yahoo homepage.

However, automatic citation analysis is limited in that it does not reflect well the importance of a document from a human perspective. There are many cases where simple citation counting does not reflect our commonsense concept of importance (Brin and Page, 1998). This research addresses this problem by exploring a method of ranking based on human interactions, where a pool of expert human agents is used to evaluate web documents and their authority is dynamically determined through user feedback on their performance.

Rocchio (1971) proposed relevance feedback for query modification, where users judge the relevance of a document for a query and leave feedback. The system then updates the query based on the feedback. This has been shown to be quite effective in query modification. Following this idea, we apply the relevance feedback of users to the ranked documents provided by the expert group. The feedback information will modify the authority weight of the expert group members. As a result, the decisions of the expert group will reflect the feedback of users as time passes.

In this paper, we suggest a novel recommender system based on human interactions. All the key decisions follow human opinions from a specialised or "expert" group, so more reasonable recommendations can be made available in situations that are vague because few users have evaluated an item. Automatic selection or ejection of expert members based on their performance can be used to maintain the expertise of the group. The relevant documents provided by the expert group are sorted in rank order. To check the effectiveness of the system, we have developed several effectiveness measures based on rank order. In this paper we validate our approach with simulations of user feedback and expert group reorganisation, and evaluate the results using the new effectiveness measures. Our preliminary work was published in conference proceedings (Kim and Kim, 2001; Kim and Chung, 2001).

Proposed method
Dynamic authority weights of experts
We define a group of people with high authority and much expertise in a special field as an "expert group". Figure 1 shows a framework for a search engine with our recommender system. A meta-search engine is used to collect good web documents from the conventional search engines (e.g. Yahoo, AltaVista, Excite and InfoSeek). The addresses of the documents cited in the search engines are stored in the document database. Also recorded for each web document are details of how many search engines in the meta-search engine referred to the document, and how many times online users had accessed the web document using the search engine.

Figure 1. Search engine architecture

For every category there is a list of top-ranked documents rated by an expert group, which are sorted by score. Authoritative webpages are determined by human expert group members. The experts examine the content of candidate webpages that are highly referenced among web documents or have been accessed by many users. The method of employing an expert group is based on the idea that for a given decision task requiring expert knowledge, many experts may be better than one if their individual
judgments are properly combined. In our system, experts decide whether a web document should be classified as a recommended document for a given category.

A simple way to combine the experts' individual judgements is majority voting (Liere and Tadepalli, 1997; Li and Jain, 1998), where each expert has a binary vote for each web document and the documents obtaining equal to or greater than half of the votes are classified into a top-ranked list. An alternative method is a weighted linear combination, where a weighted linear sum of expert voting yields the collaborative net-effect ratings of documents. In this paper, we take the adaptive weighted linear combination method, where the individual contributions of members of the expert groups are weighted by their evaluation performance. All the experts' evaluations are summed with weighted linear combinations. The expert rating results will dynamically change depending on each expert's performance. Our approach to expert group decision-making is similar to the classifier committee concept of Li and Jain (1998) and Sebastiani (1999), except that their methods use classifiers based on various statistical or learning techniques instead of human interactions and decisions. This weighted measure is useful even when the number of experts is not fixed.

How to choose experts and decide authority weights is an issue. Initially, experts will be selected from among the users who have most frequently rated products or documents. A positive authority weight will be assigned to each expert member. The voting results of experts will determine a score over a given document. The score ranking will reflect the importance or popularity of the document. As time goes on, the authority weight will be changed depending on users' feedback. An expert will receive a higher authority weight if his or her opinion agrees with those of general users, and
a lower one otherwise. If the authority weight becomes negative, the corresponding expert will be dropped from the representative board and a new member will be chosen from among the users who have the highest participation in evaluating expert opinions. If there is more than one user who has provided the most frequent feedback, one user will be randomly chosen from among them. In this way, the constitution of the expert group is dynamically changed.

We define a rating score matrix $X = [x_{ij}]$, where the $i$-th expert rates a web document $d_j$ with a score $x_{ij}$. For each web document $d_j$, the voting score of an expert committee is given as follows:

$$V(d_j) = \sum_{i=1}^{N_e} r_i x_{ij} = \sum_{i=1}^{N_e} \frac{w_i}{\sum_{k=1}^{N_e} w_k} x_{ij}$$

where $N_e$ is the number of experts for a given category, $r_i$ is the relative authority of the $i$-th expert member of the expert pool, and $w_i$ is the authority weight of the $i$-th expert member. We suppose $w_i$ should always be positive. The weight $w_i$ is a dynamic factor, and it represents each expert's authority to evaluate documents. A higher authority weight indicates that the expert has more influence in a voting decision.

We define the error measure $E$ as a squared sum of differences between desired voting scores and actual voting scores, as follows:

$$E = \frac{1}{2} \sum_{j=1}^{n} \left[ V(d_j) - V'(d_j) \right]^2 = \frac{1}{2} \sum_{j=1}^{n} \left\{ \sum_{i=1}^{N_e} \frac{w_i}{\sum_{k=1}^{N_e} w_k} x_{ij} - V'(d_j) \right\}^2$$

where $n$ is the number of documents evaluated by users and $V'(d_j)$ is the users' voting score for an expert-voted document $d_j$. We assume $V'(d_j)$ is the average over all user scores, but in reality it is rarely possible to receive feedback from all users. The authority weight for each expert is changed every session, which is a given period of time, and at the same time $V'(d_j)$ can be approximated by the central limit theorem with a set of $\tilde{V}'(d_j)$, which is the average user rating during the given session.

We use a gradient-descent method over the error measure $E$ with respect to a weight $w_i$, and the gradient is given by:

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \left( \frac{1}{2} \sum_{j=1}^{n} \left[ V(d_j) - \tilde{V}'(d_j) \right]^2 \right) = \sum_{j=1}^{n} \left[ x_{ij} - V(d_j) \right] \frac{\Delta_j}{S}$$

where $S = \sum_{k=1}^{N_e} w_k$ is the sum of weights, and $\Delta_j = V(d_j) - \tilde{V}'(d_j)$ is the difference between the predicted voting score and the users' rating score during a session for a document $d_j$.

We apply a scheme similar to the error back-propagation of multilayer perceptrons (Haykin, 1999) to our approach. If we update the weights of experts with the feedback of users about a web document $d_j$, the weight is changed each session by the following dynamic equation:

$$w_i(t+1) = w_i(t) - \eta \left[ x_{ij} - V(d_j) \right] \frac{\Delta_j}{S} + \alpha \left[ w_i(t) - w_i(t-1) \right]$$

where $\eta$ is a learning rate proportional to the number of user ratings per session and $\alpha$ is the momentum constant.

The above equation says how to reward or penalise the authority weights of experts for their share of responsibility for any error. According to the equation, the weight change involves the correlation between a voting score difference among experts and the error difference. For example, when both an expert-voted score and the desirable-rank score are larger than the weighted average voting score, or both of them are smaller than the average score, the expert is rewarded; otherwise, the expert is penalised. In this case some experts have rewards and others receive penalties depending on the weighted average voting score of the expert group.
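The following sketch illustrates, under the notation above, how the committee score V(d_j) and the session-wise weight update with momentum could be implemented. The array shapes, parameter values and the replacement rule for dropped experts are our own assumptions for illustration, not details fixed by the paper.

```python
import numpy as np

def committee_score(weights, ratings):
    """V(d_j) = sum_i (w_i / sum_k w_k) * x_ij -- weighted expert vote per document."""
    return (weights / weights.sum()) @ ratings

def session_update(weights, prev_weights, ratings, user_avg, eta=0.05, alpha=0.3):
    """One session of the gradient-descent update with momentum.

    ratings  : array (num_experts, num_docs) of expert scores x_ij
    user_avg : array (num_docs,) of average user ratings ~V'(d_j) for this session
    """
    S = weights.sum()
    V = committee_score(weights, ratings)            # predicted scores V(d_j)
    delta = V - user_avg                             # Delta_j
    grad = ((ratings - V) * delta).sum(axis=1) / S   # dE/dw_i summed over the session's documents
    return weights - eta * grad + alpha * (weights - prev_weights)

def prune_experts(weights, initial_weight=1.0):
    """Experts whose weight drops below zero are replaced; the newcomer (assumed here
    to start at initial_weight) would be the most active rater among general users."""
    return np.where(weights < 0, initial_weight, weights)
```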
Evaluation of effectiveness
When dynamic authority weights are assigned to experts for a category, the expert group ratings can form a ranking list in order. We need to determine if the given ranking list is reliable. Reliable ranking means that good experts have been selected for an expert group and they recommend relevant documents or items to general users. We evaluate the prediction performance of expert groups in terms of effectiveness – that is, a measure of the agreement between expert groups and users – in ranking a test set of web documents. We assume there are many users to evaluate the top-ranked lists, in contrast to a small number of experts in a category group.

We suggest several effectiveness measures that are related to the agreement in rank order between expert ratings and user ratings. They are the rank order window measure, the rank function measure, and the $F_\beta$ measure with rank order partition. We compared these with Spearman's correlation measure, which is a common measure in the information retrieval field.

Rank order window measure. Given a sample query or category, we can represent the effectiveness as the percentage of top-ranked lists that user ratings rank in the same or very close position as an expert group does. Given top-ranked web documents $D = \{d_1, d_2, \ldots, d_n\}$, we can define effectiveness $\Lambda_\delta$ with rank order window $\delta(d_k)$ as:

$$\Lambda_\delta = \frac{\sum_{k=1}^{n} S(d_k)}{n}$$

$$S(d_k) = 1 - \frac{1}{\delta(d_k)} \min\left( \delta(d_k),\ \sum_{i=\mu(d_k)-\delta(d_k)}^{\mu(d_k)+\delta(d_k)} \frac{\left| \mu(d_k) - Q(d_i) \right|}{2\delta(d_k)+1} \right)$$

where $d_k$ is the $k$-th web document from the test set for a given category, and $\delta(d_k)$ is the width of the window centred on the rank $\mu(d_k)$ assigned by the ratings of experts for $d_k$. $Q(d_k)$ is the rank position of the average rating score of users for a document $d_k$. $S(d_k)$ calculates the rate of the rank order difference in the window $[\mu(d_k) - \delta(d_k),\ \mu(d_k) + \delta(d_k)]$.

For this measure, we directly compare the rank of documents that the expert group provides with the rank given by users. For each of the top-ranked documents the experts recommend, we calculate how much the rank position is changed by user feedback. However, we check the position change within a window size.
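A sketch of the rank order window measure as we read the formula above. The clipping of windows at the ends of the ranked list and the use of an absolute rank difference are our reconstruction rather than details stated explicitly in the text.

```python
def rank_window_effectiveness(expert_rank, user_rank, delta=4):
    """Rank order window measure Lambda_delta.

    expert_rank[d]: rank mu(d) assigned by the expert group (1 = best)
    user_rank[d]:   rank Q(d) derived from average user ratings
    delta:          window half-width delta(d_k), fixed here for simplicity
    """
    docs = list(expert_rank)
    by_expert = sorted(docs, key=lambda d: expert_rank[d])        # d_i ordered by expert rank
    total = 0.0
    for d in docs:
        mu = expert_rank[d]
        lo, hi = max(1, mu - delta), min(len(docs), mu + delta)   # clip window to the list
        window = by_expert[lo - 1:hi]                             # documents ranked inside the window
        diff = sum(abs(mu - user_rank[w]) for w in window) / (2 * delta + 1)
        total += 1.0 - min(delta, diff) / delta                   # S(d_k)
    return total / len(docs)                                      # Lambda_delta
```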
Rank function measure. Given web resources $D = \{d_1, d_2, \ldots, d_n\}$ and a set of all rank functions $\Phi$ over the set $D$, we suppose that $d_1, d_2, \ldots, d_n$ are decreasingly ordered by their weighted rating values according to experts' evaluations. We define a measure $\rho$ to evaluate a ranking function over the given ranked web documents $D$ as follows:

$$\rho(\phi, d_k) = \mathrm{Card}\left\{ d_i \in D \mid (1 \le i < k) \wedge \phi(d_i) < \phi(d_k) \right\}$$

where Card is a cardinality function to count the number of elements in a given set, and $\phi$ is a rank function over web resources $D$, which gives a sorting order. We define a user satisfaction function $\psi$ over the expert-voted ranked sites $D$ as follows:

$$\psi(\phi) = \frac{\sum_{k=1}^{n} \rho(\phi, d_k)}{(n-1)(n-2)/2}$$

where $\phi$ is the rank function obtained from the result of all user ratings for $n$ documents, and $0 \le \psi \le 1$.

Similar to the rank order window measure, we compare the user feedback rank and the expert group rank for a document. We calculate the distance of rank difference for each document and sum all the distances for the ranked documents that the expert group recommends.
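A short sketch of the rank function measure. The normaliser follows the $(n-1)(n-2)/2$ term given in the text, and lower rank numbers are assumed to mean better positions; the function names are our own.

```python
def rank_function_effectiveness(expert_order, user_rank):
    """User satisfaction psi(phi) for the rank function measure (assumes n > 2).

    expert_order: documents d_1..d_n sorted by decreasing expert rating
    user_rank:    phi, mapping document -> rank position from user ratings (1 = best)
    """
    n = len(expert_order)
    total = 0
    for k, dk in enumerate(expert_order):
        # rho(phi, d_k): documents placed above d_k by the experts that the
        # users also rank above d_k
        total += sum(1 for di in expert_order[:k] if user_rank[di] < user_rank[dk])
    return total / ((n - 1) * (n - 2) / 2)   # normaliser as defined in the text
```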
$F_\beta$ measure with rank order partition. The evaluation of search effectiveness in a document is one of the essential components in information retrieval. In the objective evaluation of retrieval techniques, two properties – precision and recall – have been accepted as general-purpose evaluation criteria. Precision is the conditional probability that when a document is predicted to be in a positive class, it truly belongs in this class. Recall is the conditional probability that a document belonging to a positive class is truly classified in this class (Raghavan et al., 1989; Sebastiani, 1999). A good information retrieval system will have high precision and recall. Researchers have sometimes applied a variation of precision and recall, since the two properties have a trade-off depending on their application. The two properties can be assembled with the $F_\beta$ measure, a combination of precision and recall (Raghavan et al., 1989; Sebastiani, 1999).

We suggest a variation of precision and recall for the rank order system. We first partition recommended documents by their rank order and make classes. We define a positive class $i$ as the top $[10(i-1)+1, 10i]$ ranked documents by expert voting and a negative class as the others. For example, class 2 documents are the top $[11, 20]$ ranked documents. The precision probability $P_i$ and recall probability $R_i$ for ranking site class $i$ may be estimated using the contingency relations between expert ratings and user ratings, and those probabilities in our application can be calculated with transition instances between classes. A transition instance $p_{ij}$ is defined as the number of instances that are predicted to be in class $i$ by expert ratings, but that belong to class $j$ by user ratings. Here we give a distance penalty among the classes, since we consider rank order relations. If the actual rating class $j$ is closer to the predicted class $i$, then we give a higher precision probability:

$$P_i = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ij} \cdot (|i-j|+1)}, \qquad R_i = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ji} \cdot (|i-j|+1)}$$

$$\bar{P} = \frac{\sum_{i=1}^{m} P_i}{m}, \qquad \bar{R} = \frac{\sum_{i=1}^{m} R_i}{m}$$

where $m$ is the number of classes, and $\bar{P}$, $\bar{R}$ are the average precision and recall probabilities, respectively. The distance between classes is considered in calculating $P_i$ and $R_i$. Then the effectiveness measure can be computed using a value of $0 \le \beta \le \infty$ (van Rijsbergen, 1979; Cohen and Singer, 1999; Yang, 2000):

$$F_\beta = \frac{(\beta^2 + 1) \cdot \bar{P} \cdot \bar{R}}{\beta^2 \cdot \bar{P} + \bar{R}}$$

To balance precision and recall, a value $\beta = 1$ is used in our experiments. If $F_\beta$ is close to zero, then the current documents ranked in a class through expert voting results can be seen to have many false responses from the feedback of general users, or many new documents positioned in the top ranks. If $F_\beta$ is close to one, then top-ranked sites have good feedback from general users and little change occurs in the top-ranked lists.

Spearman's correlation measure. Spearman's rank order correlation measure, which is a popular measure for information retrieval systems, checks whether rank-ordered data are correlated. Let $x_i$ be the rank of a document $d_i$ in $D = \{d_1, d_2, \ldots, d_n\}$ by expert ratings and $y_i$ the rank of $d_i$ by user ratings. The non-parametric correlation is defined to be the linear correlation coefficient of the ranks:

$$r_s = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\ \sqrt{\sum_i (y_i - \bar{y})^2}}$$

where $\bar{x}$, $\bar{y}$ are the averages of $x_i$, $y_i$, respectively. A value of $r_s = 1$ indicates a completely positive correlation, which is desirable in our application, and $r_s = 0$ indicates no correlation of the data.

User confidence level. We maintain a rating record for each user. Many users evaluate web documents, but it is difficult to extract user preferences due to the lack of rating information. Before reflecting each user evaluation, we need to check whether each user has rated a sufficient number of documents and whether the user evaluations are reliable. Thus, if we assume a rating score level from 1 to $m$, the confidence level $C$ for a user $u$ is defined as follows:

$$C(u) = -\sum_{i=1}^{m} p(i) \log p(i)$$

where $p(i)$ is the probability that the user $u$ rates documents with score $i$, and it can be calculated by counting the number of documents with score $i$ among all the documents that the user $u$ has rated.

The confidence level of a user is an entropy measurement to check the distribution of score ratings. If it is more equally distributed, it is more likely that the user has given a sufficient number of ratings and also that the user has unbiased criteria for evaluations. For example, if a user consistently puts only low scores or high scores on web documents, the user has a low confidence level. The rating information of users who keep low confidence levels for many sessions will not be considered for the database of rating scores or for the analysis of user preferences.
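A sketch of the $F_\beta$ measure over rank-order classes and of the user confidence level. The exact form of the distance penalty in $P_i$ and $R_i$ follows our reading of the description above and may differ in detail from the original formulation; Spearman's coefficient itself can be obtained directly from a routine such as scipy.stats.spearmanr.

```python
import math
from collections import Counter

def f_beta_rank_partition(expert_rank, user_rank, class_size=10, num_classes=10, beta=1.0):
    """F_beta over rank-order classes with a distance penalty between classes.

    Documents are grouped into classes of `class_size` by rank (class 1 = top 10, ...).
    p[(i, j)] counts documents placed in class i by experts but in class j by users.
    """
    cls = lambda rank: min((rank - 1) // class_size + 1, num_classes)
    p = Counter()
    for d in expert_rank:
        p[(cls(expert_rank[d]), cls(user_rank[d]))] += 1

    precision, recall = [], []
    for i in range(1, num_classes + 1):
        prec_den = sum(p[(i, j)] * (abs(i - j) + 1) for j in range(1, num_classes + 1))
        rec_den = sum(p[(j, i)] * (abs(i - j) + 1) for j in range(1, num_classes + 1))
        precision.append(p[(i, i)] / prec_den if prec_den else 0.0)
        recall.append(p[(i, i)] / rec_den if rec_den else 0.0)

    P = sum(precision) / num_classes
    R = sum(recall) / num_classes
    return (beta ** 2 + 1) * P * R / (beta ** 2 * P + R) if P + R else 0.0

def confidence_level(user_scores):
    """Entropy C(u) of a user's score distribution; higher means more evenly spread ratings."""
    counts = Counter(user_scores)
    total = len(user_scores)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```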
Experiments
Metrics experiment
We simulated the dynamic process of web document ranking and the changing constitution of expert groups depending on their performance. (Evaluating the prediction performance of expert groups with real users remains for future work.) The purpose of the simulation test was to confirm that the decisions of dynamic expert groups reflect general users' opinions or ratings, and that this approach has the potential to recommend documents that have not been rated yet. The results of the test also show how effective the system is.

In the simulation we assumed ten categories, each needing an expert group; a maximum of ten experts for each group; 10,000 web documents ($d_k$, $k = 1, \ldots, 10{,}000$) in the movie search engine; and 500 random users logged into our search engine. We modelled the random log-in patterns of online users as a Poisson process (Ross, 2000; Taylor and Karlin, 1998). Each user had an arrival rate, in other words an access rate, and a transaction processing time; thus we defined the arrival rate $\lambda_i$ for a user $u_i$, for $i = 1, \ldots, 500$. For each user $u_i$, the probability that the user accessed the search engine within time $\Delta t$ was $P_i = 1 - e^{-\lambda_i \Delta t}$, where $\Delta t$ is the basic time unit. This Poisson process closely resembles the random arrival patterns of real online users, so the simulation is a reasonable approximation of a real system.

For every session we selected the top-100 ranked documents recommended by an expert group for each category and applied our effectiveness measures to the top-ranked lists. For the rank order window measure, we used window size $\delta(d_k) = 4$. For the $F_\beta$ measure, we grouped the top-100 ranked documents into ten classes, each of which contained ten documents.

Figure 2 shows the plots of effectiveness with four different measures, as the dynamic process of ranking evaluations continued. The expert group members and their knowledge levels were fixed for each category, and a random sequence of user ratings was given. The results show the agreement level between the expert groups and the users in ranking documents according to the queries or categories. The simulation was run ten times for each category, and only two category results are displayed among the ten categories. The figures show the average performance results with 95 per cent confidence intervals.

The results of the rank order window measure were similar to those of the rank function measure, while the results of the Spearman's correlation measure were similar to those of the $F_\beta$ measure. The rank order window and rank function measures can be seen as micro-view evaluations of rank order difference, and the others as macro-view. In Figure 2, the rank function or rank order window measure has a relatively low value compared with the Spearman's correlation or $F_\beta$ measure. This is because the micro-view evaluations would need a more perfect rank match with the desired ranks to reach 1. Even if there is only a small number of elements with considerable rank difference, the measures will be severely influenced. However, it is notable that the four measures show a similar trend of curves as the decision performance increased.
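To illustrate the arrival model used in the simulation, here is a small sketch that draws per-time-step user accesses from the probability $1 - e^{-\lambda_i \Delta t}$. The rate range, session length and time step are illustrative values of our own, not the ones used in the paper.

```python
import math
import random

def simulate_session_accesses(arrival_rates, session_length, dt=1.0):
    """Simulate which users access the search engine during one session.

    Each user u_i logs in according to a Poisson process with rate lambda_i:
    within a time step of length dt the user arrives with probability
    1 - exp(-lambda_i * dt).
    """
    accesses = []
    for step in range(int(session_length / dt)):
        for i, lam in enumerate(arrival_rates):
            if random.random() < 1.0 - math.exp(-lam * dt):
                accesses.append((step, i))    # user i accessed the engine at this step
    return accesses

# 500 users with illustrative access rates, one simulated session of 100 time units.
rates = [random.uniform(0.01, 0.2) for _ in range(500)]
log = simulate_session_accesses(rates, session_length=100)
```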