Dynamic Models of Expert Groups to Recommend Web Documents

DaeEun Kim (1) and Sea Woo Kim (2)

(1) Division of Informatics, University of Edinburgh, 5 Forrest Hill, Edinburgh EH1 2QL, United Kingdom
    daeeun@dai.ed.ac.uk
(2) Manna Information System, Bangbae-dong 915-9, Seocho-gu, Seoul 137-060, Korea
    seawoo@unitel.co.kr

Abstract. Most recommender systems have recently been developed to recommend items or documents based on the preferences of a particular user, but they have difficulty deriving preferences for users who have not rated many documents. In this paper we use dynamic expert groups, which are formed automatically, to recommend domain-specific documents to unspecified users. The group members have dynamic authority weights that depend on their performance in the ranking evaluations. Human evaluations of web pages are very effective for finding relevant information in a specific domain. In addition, we have tested several effectiveness measures over rank order to determine whether the current top-ranked lists recommended by experts are reliable. We show simulation results to examine the feasibility of dynamic expert group models for recommender systems.

1 Introduction

The development of recommender systems has emerged as an important issue in Internet applications and has drawn attention in both academic and commercial fields. An example of such an application is recommending new products or items of interest to online customers, based on customer preferences.

Recommender systems can be broadly categorized into content-based and collaborative filtering systems [6, 13, 16, 17]. Content-based filtering methods use textual descriptions of the documents or items to be recommended. A user's profile is associated with the content of the documents that the user has already rated. The features of documents are extracted with information retrieval, pattern recognition, or machine learning techniques. The content-based system then recommends documents that match the user's profile or tendency [4, 17]. In contrast, collaborative filtering systems are based on user ratings rather than on the features of the documents [1, 17, 16]. These systems predict the ratings of a user
over given documents or items from the ratings of other users whose tastes are similar to the user's. Collaborative filtering systems such as GroupLens [13, 9] can form part of recommender systems for online shopping sites. They recommend items to users based on the history of products that similar users have ordered before or have shown interest in.

Most recommender systems have focused on recommendations for a particular user through the analysis of user preferences. Such systems require the user to judge many items in order to obtain the user's preferences. In general, many online customers or users are interested in other users' opinions or ratings of items in a certain category before they become accustomed to searching for items of interest themselves. For instance, customers in e-commerce like to see top-ranked lists of the rating scores of many users for the items that retailers provide before purchasing specific items. However, recommender systems still have difficulty providing relevant rating information before they receive a large number of user evaluations.

In this paper, we use a method to evaluate web documents by a representative board of human agents [7]; we call it an expert group. This is different from automatic recommender systems based on software agents or feature extraction. We suggest that dynamic expert groups among users should be created automatically to evaluate domain-specific documents for web page ranking, and that the group members should have dynamic authority weights depending on their performance in the ranking evaluations. This method is quite effective in recommending web documents or items that many users have not yet evaluated. A voting board of experts with expertise in a domain category is operated to evaluate the documents. For this kind of problem, it is not feasible to replace human agents with intelligent software agents.

Our recommender system with dynamic expert groups may be extended to challenge search engine designs and image retrieval problems. Many search engines find relevant information and assess its importance using automatic citation analysis over the general subject of queries. The connectivity of hypertext documents has been a good measure for automatic web citation. This method works on the assumption that a site which is cited many times is popular and important. Many automatic page ranking systems have used this citation metric to decide the relative importance of web documents. The IBM HITS system maintains a hub score and an authority score for every document [8]. A method called PageRank has been suggested to compute a ranking for every web document based on the web connectivity graph [2] with a random-walk traversal. It also considers relative importance by checking the ranks of the citing documents, which means that a document is ranked as highly important when it has backlinks from documents with high authority, such as the Yahoo home page.

However, automatic citation analysis has the limitation that it does not reflect importance well from the viewpoint of human evaluation. There are many cases where simple citation counting does not reflect our common-sense concept of importance [2]. In this paper we explore a ranking technique based on human interactions to handle these problems.
We run a pool of experts, human agents, to evaluate web documents, and their authorities are dynamically determined by their performance. We also suggest several effectiveness measures based on rank order. We present simulation results of the expert-selection process under users' random access to web documents.

2 Method

2.1 Dynamic Authority Weights of Experts

Fig. 1. Search engine diagram (components: web crawler, meta-search engine, indexer, database file, document DB, monitor, ranking engine, expert group for category ranking, and users).

We define a group of people with high authority and much expertise in a special field as an expert group. This expert group is established automatically to evaluate web documents in a specific category. We provide a framework for a search engine with our recommender system in Fig. 1. A meta-search engine is run to collect good web documents from conventional search engines (e.g. Yahoo, AltaVista, Excite, InfoSeek). The addresses of the documents cited in those search engines are stored in the document DB. Each web document carries the information of how many search engines in the meta-search engine refer to it, and keeps a record of how many times online users have accessed it through our search engine.
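To make this bookkeeping concrete, a minimal sketch of a document DB entry, together with the per-category top-ranked list described in the next paragraph, might look as follows; all field and function names are illustrative and not taken from the paper.

from dataclasses import dataclass, field

@dataclass
class WebDocument:
    url: str
    engine_refs: int = 0            # how many engines of the meta-search engine cite this document
    access_count: int = 0           # how many times users have reached it via the search engine
    expert_scores: dict = field(default_factory=dict)   # expert id -> rating score

def top_ranked(documents, committee_score, limit=100):
    # Per-category top-ranked list: documents sorted by their combined expert score.
    return sorted(documents, key=committee_score, reverse=True)[:limit]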
For every category there is a list of top-ranked documents rated by an expert group, sorted by score. Authoritative web pages are determined by the human expert members. The experts directly examine the content of candidate web pages, which are highly referenced among web documents or accessed by many users. The method of employing an expert group is based on the idea that, for a given decision task requiring expert knowledge, many experts may be better than one if their individual judgments are properly combined. In our system, the experts decide whether a web document should be classified as a recommended document for a given category. A simple way is majority voting [11, 10], where each expert has a binary vote for a web document and the documents obtaining at least half of the votes are classified into a top-ranked list.

An alternative method is a weighted linear combination. A weighted linear sum of expert votes yields the collaborative net-effect ratings of documents. In this paper, we take the adaptive weighted linear combination method, where the individual contributions of the members in the expert group are weighted by their judgment performance. The evaluations of all the experts are summed with weighted linear combinations. The expert rating results change dynamically depending on each expert's performance. Our approach to expert group decisions is similar to the classifier committee concept in automatic text categorization [10, 15]; those methods use classifiers based on various statistical or learning techniques instead of human interactions and decisions. This weighted measure is useful even when the number of experts is not fixed.

It is an issue how to choose experts and how to decide authority weights. We define a rating score matrix X = [\chi_{ij}] where the i-th expert rates a web document d_j with a score \chi_{ij}. For each web document d_j, the voting score of an expert committee is given as follows:

V(d_j) = \sum_{i=1}^{N_e} r_i \chi_{ij} = \sum_{i=1}^{N_e} \frac{w_i}{\sum_{k=1}^{N_e} w_k} \chi_{ij}

where N_e is the number of experts for a given category, r_i is the relative authority of the i-th expert member in the expert pool, and w_i is the authority weight of the i-th expert member. We suppose w_i is positive at all times. The weight w_i is a dynamic factor, and it represents each expert's authority to evaluate documents. A higher authority weight indicates that the expert is more influential in making a voting decision.

We define the error measure E as the squared sum of differences between desired voting scores and actual voting scores as follows:

E = \frac{1}{2} \sum_{j=1}^{n} [V(d_j) - V'(d_j)]^2 = \frac{1}{2} \sum_{j=1}^{n} \Big\{ \sum_{i=1}^{N_e} \frac{w_i}{\sum_{k=1}^{N_e} w_k} \chi_{ij} - V'(d_j) \Big\}^2

where n is the number of documents evaluated by users and V'(d_j) is the users' voting score for an expert-voted document d_j. We assume V'(d_j) is the average over all user scores, although in reality it is rarely possible to receive feedback from all users. The authority weight of each expert is changed every session, which is a given period of time, and V'(d_j) can be approximated, by the central limit theorem, with the average of the user ratings received for d_j during the given session.
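As a concrete illustration, the weighted voting score and the error measure above can be computed as in the following sketch; NumPy is assumed, and the matrix layout and all variable names are ours, not the paper's.

import numpy as np

def voting_scores(X, w):
    # V(d_j): weighted average of expert ratings, using relative authorities r_i = w_i / sum_k w_k
    X = np.asarray(X, dtype=float)          # shape (num_experts, num_documents)
    w = np.asarray(w, dtype=float)
    return (w / w.sum()) @ X

def error_measure(X, w, user_scores):
    # E = 1/2 * sum_j (V(d_j) - V'(d_j))^2, with V'(d_j) the session-averaged user score
    diff = voting_scores(X, w) - np.asarray(user_scores, dtype=float)
    return 0.5 * float(diff @ diff)

# Example: three experts, four documents, scores on a 1-10 scale (all numbers illustrative)
X = [[8, 5, 9, 2],
     [7, 6, 9, 3],
     [2, 5, 8, 4]]
w = [10.0, 10.0, 10.0]
print(voting_scores(X, w))                       # committee scores V(d_j)
print(error_measure(X, w, user_scores=[7, 5, 9, 3]))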
We use a gradient-descent method over the error measure E with respect to a weight w_i, and the gradient is given by

\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \Big( \frac{1}{2} \sum_{j=1}^{n} [V(d_j) - V'(d_j)]^2 \Big) = \sum_{j=1}^{n} \frac{[\chi_{ij} - V(d_j)] \Delta_j}{S}

where S = \sum_{k=1}^{N_e} w_k is the sum of the weights and \Delta_j = V(d_j) - V'(d_j) is the difference between the predicted voting score and the users' rating score during a session for a document d_j.

We apply a scheme similar to error back-propagation in multilayer perceptrons [5] to our approach. If we update the weights of the experts based on user feedback about a web document d_j, the weight is changed each session by the following dynamic equation:

w_i(t+1) = w_i(t) - \eta \frac{[\chi_{ij} - V(d_j)] \Delta_j}{S} + \alpha (w_i(t) - w_i(t-1))

where \eta is a learning rate proportional to the number of user ratings per session, and \alpha is the momentum constant.

The above equation states how to reward or penalize authority weights for their share of the responsibility for any error. According to the equation, the weight change depends on the correlation between an expert's deviation from the committee vote and the error difference. For example, when both an expert's voted score and the desired rank score are larger than the weighted average voting score, or both of them are smaller than the average score, the expert is rewarded; otherwise the expert is penalized. Thus some experts receive rewards and others receive penalties depending on the weighted average voting score of the expert group.
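A minimal sketch of this update, assuming NumPy and summing the per-document terms over one session (the printed equation is the corresponding per-document step), could look as follows; the learning-rate and momentum values are illustrative only. Experts whose weight drifts below zero would be dropped from the committee, as described in Sect. 3.

import numpy as np

def update_weights(X, w, w_prev, user_scores, eta=0.05, alpha=0.1):
    # One session of the authority-weight update (gradient descent with momentum).
    # X: ratings chi_ij of this session's documents (experts x documents),
    # user_scores: session-averaged user scores V'(d_j).
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    S = w.sum()
    V = (w / S) @ X                                   # committee scores V(d_j)
    delta = V - np.asarray(user_scores, dtype=float)  # Delta_j = V(d_j) - V'(d_j)
    grad = ((X - V) * delta).sum(axis=1) / S          # dE/dw_i = sum_j (chi_ij - V(d_j)) Delta_j / S
    return w - eta * grad + alpha * (w - np.asarray(w_prev, dtype=float))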
2.2 Evaluation Effectiveness

When dynamic authority weights are assigned to the experts for a category, the expert group ratings form an ordered ranking list. We need to determine whether the given ranking list is reliable. Reliable ranking means that good experts are selected into the pool of the expert group and that they recommend relevant documents or items to general users. We evaluate the prediction performance of expert groups in terms of effectiveness, that is, a measure of the agreement between expert groups and users in ranking a test set of web documents. We assume there are many users evaluating the top-ranked lists, in contrast to the small number of experts in a category group.

We suggest several effectiveness measures related to the agreement in rank order between expert ratings and user ratings: the rank order window measure, the rank function measure, Spearman's correlation measure, and the Fβ measure with rank order partition.

Rank Order Window Measure. Given a sample query or category, we can represent the effectiveness as the percentage of top-ranked lists that user ratings rank in the same or nearly the same position as the expert group does. Given top-ranked web documents D = {d_1, d_2, ..., d_n}, we define the effectiveness \Lambda_\delta with rank-order window \delta(d_k) as

\Lambda_\delta = \frac{\sum_{k=1}^{n} S(d_k)}{n}

S(d_k) = 1 - \frac{1}{\delta(d_k)} \min\Big( \delta(d_k), \Big| \sum_{i=\mu(d_k)-\delta(d_k)}^{\mu(d_k)+\delta(d_k)} \frac{\mu(d_k) - Q(d_i)}{2\delta(d_k)+1} \Big| \Big)

where d_k is the k-th web document from the test set for a given category, and \delta(d_k) is the width of the window centered at the rank \mu(d_k) assigned by the ratings of the experts for d_k. Q(d_i) is the rank position of the average user rating score for a document d_i. S(d_k) measures the rank order difference within the window [\mu(d_k) - \delta(d_k), \mu(d_k) + \delta(d_k)].

Rank Function Measure. Given web resources D = {d_1, d_2, ..., d_n} and the set of all rank functions \Phi over the set D, we suppose d_1, d_2, ..., d_n are ordered decreasingly by their weighted rating values according to the experts' judgments. We define a measure \rho to evaluate a rank function \phi \in \Phi over the given ranked web documents D as follows:

\rho(\phi, d_k) = \mathrm{Card}(\{ d_i \in D \mid (1 \le i < k) \wedge \phi(d_i) < \phi(d_k) \})

where Card is the cardinality function counting the number of elements in a given set, and \phi is a rank function over the web resources D, which gives a sorting order. We define a user satisfaction function \Psi over the expert-voted ranked sites D as follows:

\Psi(\bar\phi) = \frac{\sum_{k=1}^{n} \rho(\bar\phi, d_k)}{(n-1)(n-2)/2}

where \bar\phi is the rank function obtained from the result of all user ratings for the n documents, and 0 \le \Psi \le 1.

Spearman's Correlation Measure. Spearman's rank-order correlation measure checks whether rank-ordered data are correlated. Let x_i be the rank of a document d_i in D = {d_1, d_2, ..., d_n} by expert ratings and y_i the rank of d_i by user ratings. The nonparametric correlation is defined as the linear correlation coefficient of the ranks:

r_s = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_i (x_i - \bar x)^2 \sum_i (y_i - \bar y)^2}}

where \bar x, \bar y are the averages of the x_i and y_i, respectively. A value of r_s = 1 indicates complete positive correlation, which is the desirable state in our application, and r_s = 0 indicates no correlation of the data.
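For instance, Spearman's measure over the two rankings can be computed directly from this definition. The sketch below derives rank positions from raw scores (assuming higher scores correspond to better ranks) and then takes the Pearson correlation of the rank vectors; the function names are ours.

import numpy as np

def ranks(scores):
    # Rank positions (1 = highest score) for a vector of rating scores.
    order = np.argsort(-np.asarray(scores, dtype=float))
    r = np.empty(len(order), dtype=int)
    r[order] = np.arange(1, len(order) + 1)
    return r

def spearman(x, y):
    # r_s: linear (Pearson) correlation coefficient of the two rank vectors.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Usage: r_s = spearman(ranks(expert_scores), ranks(user_scores))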
Fβ Measure with Rank Order Partition. Evaluation effectiveness can also be described in terms of precision and recall, which are widely used in information retrieval. Precision is the conditional probability that a document predicted to be in a positive class truly belongs to this class. Recall is the conditional probability that a document belonging to the positive class is truly classified into this class [12, 15]. We partition the recommended documents by their rank order and form classes. We define a positive class i as the documents ranked from (i-1)*10+1 to i*10 by expert voting and the negative class as the others. For example, class 2 documents are the documents ranked 11 to 20. The precision probability P_i and recall probability R_i for ranking class i can be estimated using the contingency relations between expert ratings and user ratings, and those probabilities in our application can be calculated from transition instances between classes. A transition instance p_{ij} is defined as the number of instances that are predicted to be in class i by expert ratings but belong to class j by user ratings.

P_i = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ij} \cdot |i - j + 1|}, \qquad R_i = \frac{p_{ii}}{\sum_{j=1}^{m} p_{ji} \cdot |i - j + 1|}

P = \frac{\sum_{i=1}^{m} P_i}{m}, \qquad R = \frac{\sum_{i=1}^{m} R_i}{m}

where m is the number of classes, and P, R are the average precision and recall probabilities, respectively. The distance between classes is taken into account in calculating P_i and R_i. Effectiveness can then be computed using the value of F_\beta for 0 \le \beta \le \infty [19, 3, 20]:

F_\beta = \frac{(\beta^2 + 1) \cdot P \cdot R}{\beta^2 \cdot P + R}

To balance precision and recall, the value \beta = 1 is used in our experiments. If F_\beta is close to zero, then the documents currently ranked in a class by expert voting can be seen to have received many false responses in the feedback from general online users, or many new documents have moved into the top ranks. If F_\beta is close to one, then the top-ranked sites have good feedback from general users and little change occurs in the top-ranked lists.
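A sketch of the Fβ computation from a transition-count matrix follows; it takes the |i - j + 1| weighting exactly as printed above, assumes every class receives at least one document, and uses NumPy with names of our own choosing.

import numpy as np

def f_beta(p, beta=1.0):
    # p[i, j]: number of documents put in class i by the experts but in class j by the users
    # (0-based indices here stand for classes 1..m).
    p = np.asarray(p, dtype=float)
    m = p.shape[0]
    idx = np.arange(m)
    weight = np.abs(idx[:, None] - idx[None, :] + 1)   # |i - j + 1|, as printed in the paper
    P_i = np.diag(p) / (p * weight).sum(axis=1)        # per-class precision
    R_i = np.diag(p) / (p.T * weight).sum(axis=1)      # per-class recall
    P, R = P_i.mean(), R_i.mean()
    return float((beta**2 + 1) * P * R / (beta**2 * P + R))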
2.3 User Confidence Level

We maintain a rating record for each user. Many users evaluate web documents, but it is difficult to extract user preferences owing to the lack of rating information. Before reflecting each user's evaluations, we need to check whether the user has rated a sufficient number of documents and whether the user's evaluations are reliable. Thus, if we assume rating score levels from 1 to m, the confidence level C for a user u is defined as follows:

C(u) = - \sum_{i=1}^{m} p(i) \log p(i)

where p(i) is the probability that the user u rates documents with score i; it can be calculated by counting the number of documents with score i among all the documents that the user u has rated.

The confidence level of a user is an entropy measurement of the distribution of the user's rating scores. If the scores are more evenly distributed, it is more likely that the user has given a sufficient number of ratings and that the user has unbiased evaluation criteria. For example, if a user consistently gives only low scores or only high scores to web documents, the user has a low confidence level. The rating information of users who keep low confidence levels for many sessions is not considered for the database of rating scores or for the analysis of user preferences.

3 Experiments

We simulated the dynamic process of web document ranking and the creation of expert groups depending on their performance. Evaluating the prediction performance of expert groups with real users remains for future work. The purpose of the simulation test is to confirm that dynamic expert groups reflect general users' opinions or ratings and have the potential to recommend documents that have not been rated yet. The simulation also provides effectiveness results for the several measures introduced above.

In the simulation, we assumed 10 categories needing expert groups, a maximum of 10 experts for each expert group, 10000 web documents d_k, k = 1, ..., 10000 in a movie search engine, and 500 random users logging into our search engine. We modeled the random login patterns of online users as a Poisson process [14, 18]. Each user has an arrival rate, in other words an access rate, and a transaction processing time; thus we define the arrival rate \lambda_i for a user u_i, for i = 1, ..., 500. For each user u_i, the probability that the user accesses a search engine document within time \Delta t is

P_i = 1 - e^{-\lambda_i \Delta t}

where \Delta t is the basic time unit.
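The two quantities used here, the entropy-based confidence level of Sect. 2.3 and the per-user access probability above, are easy to state in code; the following sketch assumes a 1-to-m rating scale and illustrative function names.

import math

def confidence_level(ratings, m=10):
    # C(u) = - sum_i p(i) log p(i) over score levels 1..m; empty levels contribute nothing.
    total = len(ratings)
    probs = [ratings.count(i) / total for i in range(1, m + 1)]
    return -sum(p * math.log(p) for p in probs if p > 0)

def access_probability(lam, dt=1.0):
    # P_i = 1 - exp(-lambda_i * dt): chance that a user with arrival rate lam logs in within dt.
    return 1.0 - math.exp(-lam * dt)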
Every session we selected the top-100 ranked documents recommended by an expert group for each category and applied our effectiveness measures to the top-ranked lists. For the rank order window measure, we used window size \delta(d_k) = 4. For the Fβ measure, we grouped the top-100 ranked documents into 10 classes, each of which contains 10 documents.

Fig. 2 shows the plots of effectiveness under the four different measures as the dynamic process of ranking evaluations continues. The expert group members and their knowledge levels are fixed for each category, and a random sequence of user ratings has been given. The results show the agreement level between expert groups and users in ranking documents according to the queries or categories. The simulation was run 10 times for each category, and only 2 category results out of the 10 categories are displayed. The figures show the average performance results with 95% confidence intervals. The results of the rank order window measure are similar to those of the rank function measure, while the results of Spearman's correlation measure are similar to those of the Fβ measure. The rank order window and rank function measures can be seen as micro-view evaluations of the rank order difference, and the others as macro-view evaluations.

Fig. 2. Results of effectiveness measures under different categories: (a) category A, (b) category B (effectiveness measure versus time in sessions, for the rank order window, rank function, Spearman correlation, and Fβ measures).

Fig. 3. An example of weight change for experts: (a) category A, (b) category B (authority weight versus time in sessions).

Fig. 3 shows the transitions of the experts' authority weights according to their rating performance. Each expert has an initial weight of 10, and a maximum weight of 30 is allowed to prevent too much authority being concentrated in only a few experts. We assume that when one of the weights becomes negative, the corresponding expert is dropped from the committee. In the simulation experiments it happened that some experts had high authority weights for a while and then yielded their authority levels to other good experts. Many experts with bad performance have been dropped from the expert groups and new members have then been added; many oscillating curves of authority weights are seen between 0 and 10 in Fig. 3. As a result, the above
process plays the role of filtering out bad experts and keeping good experts as time passes. As the iteration of weight changes continues over a long period of sessions, the authority weights may become stabilized, as shown in Fig. 4; there is no newcomer in the expert committee.

Fig. 4. An example of dynamic weights for expert committee members and their performance: (a) effectiveness performance, (b) weight change (both plotted against time in sessions).

Fig. 5. An example of the distribution of rank orders by expert group ratings and user ratings: (a) before weight change, (b) after weight change (rank of expert ratings versus rank of user ratings).

Fig. 5(b) shows an example of the agreement in rank order between the expert ratings and the user ratings in evaluating documents, while there is no regular pattern of agreement in the initial state, as shown in Fig. 5(a). After applying the adaptive change of authority weights, the rank-order prediction of the experts becomes close to the rank order of the user ratings.