MINING INFLUENCE IN RECOMMENDER SYSTEMS

A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA

Al Mamunur Rashid

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

John T. Riedl, Adviser

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UMI Number: 3250160

Copyright 2007 by Rashid, Al Mamunur. All rights reserved.

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3250160. Copyright 2007 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346
© Al Mamunur Rashid 2007
To Tulip
Abstract

The influence of an entity can be defined as its ability to affect the conduct, behavior, or actions of other entities. We provide evidence in this thesis that influence is important in recommender systems. Recommender systems help people find the things they care about from an unmanageably large number of choices by mining relationships between like-minded people. Discovering influential items and users can enhance the ability of a recommender system to deliver quality recommendations in various ways, including guiding a new member through the right set of items to evaluate so that the system learns her preferences effectively, and selecting reliable users for early evaluations of new items.

How, then, may we discover the most influential items and users in the system? We explore several sources of insight for influence algorithms in recommender systems: social network theory, information theory, and mathematical analysis of the recommender algorithms themselves. Broadly speaking, we examine a) the nature of prior research on influence in other domains and the viability of applying that research to the recommender systems domain, c) new measures of influence, based on prior research extended appropriately for recommender systems, and d) the feasibility and implications of meaningful applications of influence.
Contents

Dedication
Abstract
List of Tables
List of Figures

I Prologue
    1.1 The Problem Domain: Recommender Systems
      Collaborative Filtering
    1.2 Recommender Systems and Influence
    1.3 Contributions
    1.4 Thesis Roadmap
  2 Experimental Platform
    2.1 Introduction
    2.2 Collaborative Filtering Algorithms Considered
    2.3 Experimental Platform Datasets
    2.4 Evaluation Metrics

II Influence of Users
  3 The Idea of Users' Influence in Recommender Systems
    3.1 Introduction
    3.2 Principles of Influence Measures
    3.3 Influence Based on Earlier Work
      3.3.1 Authority
      3.3.2 Centrality-Based Measures
      3.3.3 ELPNETWORK VALUE
      3.3.4 Di
    3.4 Proposed Approach: LOO-Based Influence
      3.4.1 Computing LOO-based measures on USER-BASED kNN
    3.5 Summary
  4 ENIPD: An Algorithm-Independent Measure of Influence
    4.2 ENIPD Idea
    4.3 Computing ENIPD from Data
      4.3.1 Computing ENIPD on any CF Algorithm
    4.4 Qualitative Factors Potentially Affecting ENIPD
    4.5 Building Predictive Models of ENIPD
      4.5.2 Results
        4.5.2.1 Predictive Performance
        4.5.2.2 Relationship Between ENIPD and the Factors
    4.6 Discussion
      4.6.1 Dependence of ENIPD on the CF Algorithm
      4.6.2 Applications of ENIPD
        4.6.2.1 Reducing Model Size
        4.6.2.2 Improving Coverage
        4.6.2.3 Enhancing User Participation
    4.7 Summary
  5 ENSI: An Influence Measure to Find Early Evaluators
    5.1 Introduction
    5.2 ENSI: Influence by Reliable
    5.3 Selecting Influencers for Early Evaluation
    5.4 Empirical Study
      5.4.1 Preparing Data
      5.4.2 Other approaches compared
      5.4.3 Results
      5.4.4 Discussion
    5.5 Summary

III Influence of Items
  6 The Idea of Item Influence in Recommender Systems
    6.2 Approaches for Computing Item Influence
      6.2.1 POPULARITY
      6.2.2 EN
      6.2.3 Information Theoretic Measures
        6.2.3.1 Entropy
        6.2.3.2 ENTROPY0: Entropy Considering Missing Values
        6.2.3.3 HELF: Harmonic Mean of Entropy and Logarithm of Frequency
        6.2.3.4 IGCN: Information Gain through Clustered Neighbors
    6.3 Empirical Study
      6.3.1 Procedure
        6.3.1.1 Preparing Data
        6.3.1.2 Computing influence measures from data
        6.3.1.3 Evaluation
      6.3.2 Results
    6.4 Discussion
    6.5 Summary

IV Online Experiments with Influence
  7 Learning Preferences of New Users
    7.1 Introduction
    7.2 Offline Experiments
      7.2.1 Data
      7.2.2 Procedure
      7.2.3 Results
      7.2.4 Discussion
    7.3 Online Experiments
      7.3.1 Design
      7.3.2 Results and discussion
    7.4 Summary
  8 Motivating Contributions of Users
    8.1 Introduction
    8.2 Research Questions
    8.3 Experimental Design
      8.3.1 Experimental Groups
      8.3.2 The Experiment
      8.3.3 Hypotheses
      8.3.4 Post-Survey
    8.4 Methods
      8.4.2 Movie list
      8.4.3 Item value
    8.5 Results and discussion
    8.6 Summary

V Epilogue
    9.1 Future research directions
      9.1.1 Attacks on Recommender Systems
      9.1.2 Early Evaluations by Influencers
      9.1.3 Recommendation Accuracy
      9.1.4 Interface Issues
      9.1.5 Diffusion of Influence
      9.1.6 Other Collaborative Filtering Algorithms

Bibliography
List of Tables

2.1 Comparison between USER-BASED kNN and ITEM-BASED kNN CF algorithms
2.2 Properties of the datasets
3.1 Influence measures compared against influence principles
4.1 Relationship between ENIPD and # of ratings of the users
4.2 Squared correlation coefficient between the actual values and predicted values of ENIPD
4.3 Weights of the features near SVM modeling
6.1 Showing a limitation of entropy
6.2 ENTROPY0 computation of the item a, which has been voted 200 times and the votes are uniformly distributed across the rating scale of (1-5)
6.3 ENTROPY0 computation of the item b, which has been voted 3,000 times and the votes are unanimously 5
6.4 Effect of applying a log transformation to items' rating frequency
6.5 Average percentage of overlapped items between the pairs of item-influence measures
6.6 Properties of the user-specific top 20 items by various item-influence measures
7.1 Table of notations
7.2 Showing the strengths and weaknesses of the item influence approaches in
7.3 Group-wise participations of the subjects
7.4 Effectiveness of the learned user profiles according to the accuracy of the initial recommendations on two CF algorithms
8.1 Contrasts testing the four hypotheses