Design of a Social Network Based Recommender System for Participatory Media Content

Aaditeshwar Seth
School of Computer Science, University of Waterloo, ON, Canada

ABSTRACT

Personalized recommender systems are required to help people cope with the information glut of participatory media content such as blogs. We focus on news related content and propose the design of a social network based recommender system for this purpose. We show that the use of sociological theory helps to simplify the system design, improve scalability, and understand the behavior of the system. In this paper, we present an overview of our work in sociological theory and user modeling, outline the system design for a recommender system that makes use of this work, describe some open problems, and focus on one component of the system that is strongly grounded in social network theory.

1. INTRODUCTION

Given the rapid growth of participatory media content such as blogs, videos, and podcasts, there is a need to build personalized recommender systems to recommend only useful content to users [1]. However, the design of such systems presents many challenges.

1. Scalability: Systems should be able to scale to millions of messages published daily, and offer recommendations to hundreds of millions of users.

2. Personalization: Due to the varied interests of users, personalized recommendations should be provided to them.

3. Credibility: Mechanisms should be designed to validate the credibility of participatory media content to ensure that credible messages are recommended to users.

4. Grounded in media theory: Much of the participatory media content is related to political opinions that tend to be subjective. Considerable sociological research in the area of media and message effects has identified factors that should be considered in the presentation of such content to help people gain clarity and unbiased viewpoints about various topics. Current recommender systems do not seem to take these factors into account. For example, past studies of Google's search services showed that its algorithms tend to bias results towards more popular websites and reduce diversity [2]; although this was challenged subsequently [3], it indicates uncertainty about whether diversity is indeed assured in the results. We therefore consider it worthwhile to use insights from media research as the theoretical foundation of recommender systems for news related participatory media, so that their behavior can be explained in sociological terms, and modified if necessary. With the increased reliance of people on the Internet for their news and opinion, we feel this to be a responsible goal.

Our proposed system design fulfils these goals by making extensive use of social network information. We begin by illustrating the importance of sociological theory to identify factors that influence user preferences, and describe a user modeling approach to learn these preferences. We then use these insights to propose the design of a recommender system for participatory media content, and finally focus on one specific component of the system.

2. THE USE OF SOCIOLOGICAL THEORY

Research in the area of media effects and information science has identified factors such as simplification of news, to make it more easily understandable to readers, and the opinion diversity expressed in the news, to help people gain clarity and unbiased viewpoints [4, 5].
We hypothesize that participatory media improves both simplification and diversity of information [10]. Consider the following example. BBC News published an article on November 4, 2007 about the 2007 Emergency declared in Pakistan (http://news.bbc.co.uk/2/hi/south_asia/7077310.stm). The article described some aspects of the event, such as President Musharraf's justification of his decision, reaction by the judiciary, and condemnation by other political leaders. Following are two comments by readers:

1. "I recently graduated in electrical engineering from Comsats Islamabad and got a job after a long struggle in one of the telecom companies here in Islamabad. I am hired on the basis that they are starting a new project in NWFP and FATA areas. After this emergency declaration company is now thinking to cancel the project in that area for which I was hired for, as NWFP and FATA areas are prime hiding places for Taliban... Now my job is in jeopardy and don't know what my future holds for me..."

2. "I have family in Karachi and we are leading normal lives going about our daily work, parties, schools and all, a few changes like more uniformed men and barriers not a big problem, in fact most of us are glad that Musharraf took this action, he should have done this earlier... If any Pakistani leader is to be trusted with leadership it is Musharraf, not traitors and looters..."

Clearly, both these comments explored aspects of the event that had not been considered in the original article, and thus added extra information to the article.
What is more interesting to notice is that each comment may have been useful to different people for different reasons. The first comment may have been particularly useful for fellow graduates of the author because it could have spurred some corrective actions on their part, or explained the relevance of the article for them. In other words, the comment would have simplified the article for them. The second comment may have also been useful, but for a different reason: it increased the diversity of the article. Furthermore, it is reasonable to expect that different people may have different preferences for simplified information and diverse information, and this can influence the perceived usefulness of an article and its comments. We next explain how social network information can be used to model these user preferences.

2.1 Context and completeness

Rather than use the words simplification and diversity, we adopt a broader working terminology using the words context and completeness respectively. Contextualization of a message for a recipient helps to "situate" the message better in the recipient's context, i.e. it explains the relationship of the message to the recipient, and hence leads to simplification. Completeness of a message denotes the depth and breadth of topics covered by it [8], i.e. the scope or diversity of viewpoints it contains. Restating the example in this terminology, the first comment on the BBC article would have provided context to fellow graduates of the author of the comment, and both comments would have provided additional completeness to readers by exploring new aspects of the topic.

Although these features of context and completeness may appear to have arisen randomly in the comments, sociological theory helps explain away this randomness. Our insight is that instead of considering the comments as independent entities, the social network relationships between the comment authors and recipients should be taken into account, as explained next.

The strength-of-weak-ties hypothesis in social networks [6] states that social networks consist of clusters of people with "strong" ties among members of each cluster, and "weak" ties linking people across clusters. Whereas strong ties are typically constituted of close friends, weak ties are constituted of acquaintances or remote colleagues. The hypothesis claims that weak ties are useful for the diffusion of information, influence, and economic mobility, because weak ties help connect diverse clusters of people with each other. Also implicit in the hypothesis is the phenomenon of homophily, noticed in multiple studies, which states that people similar to one another in terms of age, ethnicity, geographical location, income status, etc., tend to cluster together [7, 9, 12]. Homophily leads to strong ties between people similar to one another, and weak ties linking diverse clusters together.

The strength-of-weak-ties hypothesis can help explain the BBC news example. Although we do not have knowledge of the underlying social network of the various comment authors, the phenomenon of homophily makes it likely that the two comment authors belong to different social network clusters, the first author being strongly connected to recent graduates from the same university, and the second author being in a higher bourgeois class cluster, possibly with weak links across the two clusters.
The comments will therefore be more contextual for other people in the same clusters, and provide more completeness across the clusters. Thus, if the social relationships between the comment authors and recipients are taken into account, they can explain why people may find an article to be useful.

This can be converted into a qualitative model, as shown in Fig. 1. Part (a) shows that people in different parts of the social network may read different messages about the same event, referred to as spatial dissemination of the messages. Now, when people comment on these messages locally within their clusters, it leads to contextualization of the messages for other members of the same clusters. This is because close friends linked through strong ties typically share the same environment and circumstances with each other, and hence message contributions made by them will be more contextual for other people sharing the same environment. Part (b) shows that when people read messages and comments written by people in adjacent clusters linked through weak ties, it provides more completeness to them. This can also happen when people cross-post across different clusters, and provide "new" viewpoints or address different aspects of the event described by the message. Part (c) shows that these new viewpoints also circulate within external clusters to gain context. This process is termed the temporal evolution of a message, to indicate that the context and completeness of the message increase with time as it circulates among different participants. In practice, all the above processes are likely to occur concurrently and there may not be any time-sequencing between the stages. Thus, a message can be visualized as spreading on the social network; depending upon the pathways that it takes, it gains context and completeness for a recipient.

Note that the model assumes that clusters of strong ties define contextual boundaries, where people within the same cluster share a common context with each other. However, people have different interests, and clusters of common context should therefore be topic specific. Hence, we introduce the notion of a topic specific social network, which consists of only those people who are interested in a particular topic. The model is specified precisely in [10] as operating within and across clusters in topic specific social networks only, rather than the generic social network over all topics.

Such insights from sociological theory allow us to model participatory messages in mathematical terms by making use of the social network information of message authors and recipients. In [10], we developed methods to classify social network ties as strong or weak. We then formulated graph-theoretic measures to determine the degree of context or completeness in a message for a recipient, based on the contributions it gained from strongly or weakly connected portions of the social network of the recipient. These formulations were validated by correlating measurements on a web-crawl of a social networking website, orkut.com, with surveys of real users subscribed to the website.
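As a concrete illustration of such graph-theoretic measures, the sketch below classifies ties with a simple embeddedness heuristic (the number of mutual friends) and scores the context and completeness that a message's commenters add for one recipient. The heuristic, threshold, and scoring rules are simplifying assumptions made for this sketch; they are not the exact formulations developed and validated in [10].

```python
# Illustrative sketch only: classify ties as strong or weak with an
# embeddedness heuristic (number of mutual friends), then score how much
# context and completeness the contributors of a message add for a recipient.
# The threshold and scoring rules are assumptions, not the measures of [10].

from typing import Dict, Set

Graph = Dict[str, Set[str]]  # adjacency sets of a topic specific social network

def tie_is_strong(g: Graph, u: str, v: str, min_mutual: int = 1) -> bool:
    """Heuristic: a tie is 'strong' if u and v share at least min_mutual friends."""
    return len(g[u] & g[v]) >= min_mutual

def context_score(g: Graph, recipient: str, contributors: Set[str]) -> float:
    """Fraction of contributors who are strong (same-cluster) ties of the recipient."""
    linked = [c for c in contributors if c in g[recipient]]
    if not linked:
        return 0.0
    strong = [c for c in linked if tie_is_strong(g, recipient, c)]
    return len(strong) / len(linked)

def completeness_score(g: Graph, recipient: str, contributors: Set[str]) -> float:
    """Fraction of contributors reachable only through weak ties or other clusters."""
    linked = [c for c in contributors if c in g[recipient]]
    weak = [c for c in linked if not tie_is_strong(g, recipient, c)]
    unlinked = [c for c in contributors if c not in g[recipient] and c != recipient]
    return (len(weak) + len(unlinked)) / max(len(contributors), 1)

if __name__ == "__main__":
    # Toy topic specific network: a, b, c form a tight cluster; d is a weak contact of a.
    g: Graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    commenters = {"b", "d"}  # users who commented on a message
    print("context for a:", context_score(g, "a", commenters))            # 0.5
    print("completeness for a:", completeness_score(g, "a", commenters))  # 0.5
```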
Usefulness model: As stated earlier, different users will have different preferences for the amounts of contextual and complete information they receive. To give a visual representation, Fig. 2 illustrates the mass-distribution of the number of strong and weak links of users over the population of users interested in a certain topic, obtained from our Orkut dataset. Clearly, there is a large variation in how users associate with other users who are strongly or weakly connected to them. Some users are "self-referential", having a large number of strong ties but few weak ties; other users are "hub users", having few strong ties but many weak ties. We noticed that these behavioral characteristics of users also translate into how much contextual or complete information the users prefer to receive from their social network. Measures for context and completeness were used to build a Bayesian network model that learns user preferences for the usefulness they derive from contextual and complete information [11]. The Orkut dataset was used to validate the usefulness model, showing that social network information can be used to predict the usefulness that people may associate with each message.
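The actual usefulness model is a Bayesian network learned per user [11]. The following toy stand-in conveys the same idea with a logistic combination of the context and completeness scores; the weights are hypothetical and merely illustrate how a "self-referential" user and a "hub" user would value the same message differently.

```python
# Illustrative stand-in for the per-user usefulness model. The actual system
# uses a Bayesian network learned per user [11]; here a simple logistic model
# over the context and completeness scores plays the same role, with
# hypothetical, hand-set weights standing in for learned preferences.

import math
from dataclasses import dataclass

@dataclass
class UserPreferences:
    w_context: float       # how much this user values contextual information
    w_completeness: float  # how much this user values complete/diverse information
    bias: float = -1.0     # base propensity to find any message useful

def usefulness_probability(prefs: UserPreferences,
                           context: float,
                           completeness: float) -> float:
    """P(user finds message useful | context, completeness) under the toy model."""
    z = prefs.bias + prefs.w_context * context + prefs.w_completeness * completeness
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    self_referential_user = UserPreferences(w_context=3.0, w_completeness=0.5)
    hub_user = UserPreferences(w_context=0.5, w_completeness=3.0)
    # A message that is mostly contextual for the recipient:
    print(usefulness_probability(self_referential_user, context=0.8, completeness=0.2))
    print(usefulness_probability(hub_user, context=0.8, completeness=0.2))
```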
Figure 1: Message evolution model (a. contextualization; b. completeness; c. contextualization of complete information; messages undergo spatial dissemination and temporal evolution)

Figure 2: Mass-distribution of strong and weak ties (y-axis: number of users)

2.2 Credibility

The phenomenon of homophily also enables us to develop a model for credibility. Evaluating the credibility of a blogger or an individual message is a complicated problem. Bloggers tend to be opinionated, and different people judge the credibilities of bloggers differently according to their own contexts. Furthermore, the credibilities are likely to be topic specific; an expert in some area may not be an expert in another area. As a norm, we can expect that whenever users annotate a message positively or negatively, as on websites like digg.com, they will always do so based on their inherent tastes and biases. However, consideration of the underlying social network of the users can help make sense of seemingly random ratings: strongly connected users are likely to share the same viewpoints and hence rate messages similarly, but weakly connected users may cause the diffusion of diverse opinions and introduce heterogeneity in the message ratings, especially for political messages [13].

Credibility model: We use this insight to hypothesize that users will have different propensities to agree with other members of their cluster, or with the general public opinion, or to be completely self-reliant in their own beliefs [19]. This was used to develop an eigenvector based method to differentiate between different kinds of credibilities: contextual credibility, based on ratings given by users in the same topic specific cluster; experienced credibility, based only on ratings given by a specific user; and public credibility, based on all available ratings. These eigenvectors were used to learn the characteristics of a user towards judging the credibilities of contextual and complete information, again by using a Bayesian network. The credibility model was validated on a dataset obtained from a web-crawl of digg.com.
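The precise eigenvector formulation is developed in [19]. As an illustration of the general idea, the sketch below runs an EigenTrust/PageRank style power iteration over normalized positive ratings, and obtains contextual, experienced, and public credibility simply by varying which raters are taken into account; the damping factor, rating scale, and all names are assumptions of this sketch, not the system's actual method.

```python
# Illustrative analogue of the eigenvector based credibility computation.
# The exact method is developed in [19]; this sketch runs an EigenTrust /
# PageRank style power iteration over normalized positive ratings. The three
# kinds of credibility differ only in which raters are taken into account:
# the recipient's cluster (contextual), the recipient alone (experienced),
# or everybody (public).

from typing import Dict, Iterable

Ratings = Dict[str, Dict[str, float]]  # ratings[rater][author] in [-1.0, +1.0]

def eigen_credibility(ratings: Ratings, users: Iterable[str],
                      raters: Iterable[str], damping: float = 0.85,
                      iters: int = 50) -> Dict[str, float]:
    """Credibility of each user, aggregated from the chosen raters' ratings."""
    users = list(users)
    raters = [r for r in raters if r in ratings]
    n = len(users)
    cred = {u: 1.0 / n for u in users}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in users}
        for r in raters:
            positive = {u: max(ratings[r].get(u, 0.0), 0.0) for u in users}
            total = sum(positive.values())
            if total == 0.0:
                continue
            for u in users:
                nxt[u] += damping * (positive[u] / total) * cred.get(r, 0.0)
        norm = sum(nxt.values())
        cred = {u: v / norm for u, v in nxt.items()}
    return cred

if __name__ == "__main__":
    users = ["alice", "bob", "carol", "dave"]
    ratings = {
        "alice": {"carol": 1.0, "dave": -1.0},
        "bob":   {"carol": 1.0, "dave": 1.0},
        "dave":  {"carol": -1.0},
    }
    cluster_of_alice = {"alice", "bob"}  # hypothetical topic specific cluster
    print("contextual:", eigen_credibility(ratings, users, cluster_of_alice))
    print("experienced:", eigen_credibility(ratings, users, {"alice"}))
    print("public:", eigen_credibility(ratings, users, ratings.keys()))
```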
3. RECOMMENDATION PROCESS

The sociological models can be used to develop a process for generating message recommendations. We provide a detailed description later in Section 4.

1. We assume that the social network structure is given in advance as a directed graph G(U, E), where users are represented as nodes U, with edges E between users who are friends or know each other.

2. We assume that a list of topics T is given in advance, and a boolean value for each (user u ∈ U, topic t ∈ T) pair indicates whether or not the user is interested in the topic. The induced subgraph Gt formed by users who are interested in some topic t is called the topic specific social network for that topic.

3. We assume that a clustering algorithm exists to find clusters in topic specific social networks such that users within each cluster have strong links between them, and users in different clusters are connected with weak links.

4. Now, when a new event occurs related to topic t, many messages M about the event will be created. For a message m ∈ M, the active environment of the message is defined as the set of participant users who write comments or replies to the message. Um refers to the participant users, and Cm refers to their contributions. Rm are the ratings given to m by other users.

5. For a recipient user u who has already read messages M′ ⊂ M, the additional amount of context and completeness a new message m will provide to u can be computed as functions Context(Gt, u, Um, Cm, M′, t) and Completeness(Gt, u, Um, Cm, M′, t), using the Bayesian usefulness model learned for u [10, 11]. The credibility of m can also be evaluated based on ratings by other users through a function Credibility(Gt, u, Um, Rm, t), using the Bayesian credibility model learned for u [19]. Together, these models eventually produce a probabilistic answer of whether or not to recommend m to the user. This can be used to rank-order or drop new messages to be pushed to the user.
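The sketch below illustrates the final step of this process for a single recipient: combining the usefulness probability with the credibility score, rank-ordering the candidate messages, and dropping the rest. The combination rule (product with a cutoff) is an assumption made for illustration rather than the system's actual decision rule.

```python
# Sketch of the per-recipient recommendation step: score each unread message
# by combining the usefulness probability (from context/completeness) with the
# message's credibility, then rank-order and keep the top-k. The combination
# rule (probability x credibility, with a drop threshold) is an assumption.

from dataclasses import dataclass
from typing import List

@dataclass
class ScoredMessage:
    message_id: str
    usefulness: float   # P(useful) from the Bayesian usefulness model
    credibility: float  # credibility score from the credibility model

def recommend(messages: List[ScoredMessage],
              top_k: int = 10,
              drop_below: float = 0.2) -> List[ScoredMessage]:
    """Rank-order messages for one recipient and drop clearly uninteresting ones."""
    kept = [m for m in messages if m.usefulness * m.credibility >= drop_below]
    kept.sort(key=lambda m: m.usefulness * m.credibility, reverse=True)
    return kept[:top_k]

if __name__ == "__main__":
    candidates = [
        ScoredMessage("m1", usefulness=0.9, credibility=0.7),
        ScoredMessage("m2", usefulness=0.4, credibility=0.3),
        ScoredMessage("m3", usefulness=0.8, credibility=0.9),
    ]
    for m in recommend(candidates, top_k=2):
        print(m.message_id)   # m3, then m1
```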
This recommendation process has many useful properties, chiefly attributable to its foundations in sociological theory. The theory helps us develop models to personalize message recommendations for users, because it allows us to identify features behind the root causes of user preferences. The model driven approach also allows us to separate the learning process individually for each user, and hence learn the user preferences in a privacy-friendly manner, without relying upon a centralized agency to aggregate personal user data. This gives us a number of advantages over other styles of producing recommendations, such as collaborative filtering (CF) [21]. First, CF requires all data to be reported to a centralized agency that tries to compute correlations between different pairs of users. This is not privacy-friendly and presents scalability challenges. Second, CF only observes correlations between users, without trying to understand the reasons behind the correlations. This does not capture contextual information which could, for example, explain why a user may seem similar to another user at certain times and not at others. Some CF systems do try to get around these problems: [23] proposes a decentralized system, and [22] takes higher order correlations between users into account. However, our process can produce recommendations that are just as good, in more desirable ways [11]: it ensures scalability, credibility, and privacy-friendliness, with a strong grounding in media theory which helps us explain the behavior of the system in sociological terms. We next describe the architecture to implement this recommendation process.

4. SYSTEM ARCHITECTURE

Primarily inspired by Cobra, the blog and RSS-feed aggregation system proposed in [14], Fig. 3 shows the proposed architecture of the system. We next describe the various components and their associated open problems. Later, we focus on one specific component and present a few results to illustrate the plausibility of our approach.

Figure 3: System architecture (1. users embedded in a social network; 2. I/O servers; 3. topic categorizers; 4. data center with large data structure; 5. reflectors; 6. client applications; new messages by users and RSS crawls from news websites flow in, message updates flow to reflectors, and RSS feeds flow to client applications)

1. Data sources: These are news or blogging websites that host content. They typically provide RSS feeds, or can be customized to "ping" aggregation services, such as our system, whenever new content is published.

2. I/O servers: These are responsible for downloading new content from the data sources. Since the I/O servers will be bandwidth bound, they should be geographically distributed to download content from sources close to them.

3. Topic categorizers: The downloaded content needs to be analyzed to categorize it into various topics. Document classification schemes such as Latent Semantic Indexing, or fast matching of tags and keywords with pre-defined topics, may prove useful here [15]. The topic categorization servers are likely to be CPU bound; they can be collocated with the I/O servers to minimize any further download of content. Metadata about new messages, including information about the message author and topic, is then pushed to the data center described next.
4. Distributed data structure: Inspired by the centralized approach followed by current Internet systems, we expect most data to be stored in large clusters of servers in data centers. This includes the topic specific social network graphs Gt, the messages M in circulation in the system, their current active environments Um with metadata about the various contributions Cm, and the credibility ratings Rm given to the messages by various users. Other than storage, the second task is that of computation. We are able to split the calculation of the Context(...), Completeness(...), and Credibility(...) functions mentioned earlier into generic components that are computable in the data centers, and user-specific components that are computable in the client application described later. The generic components computed in the data center are appended to the metadata for each message. The design of efficient data structures to store large social network graphs, such that clustering and computation operations on these graphs can be executed efficiently, is an open problem. Relevant techniques include the design of structure indices [16] using virtual coordinates and landmark methods, which can be used to optimize the computation of metrics such as shortest paths, edge betweenness, and centralities in large social network graphs.

5. Reflectors: Whenever a new message enters the system, the message and its associated metadata have to be pushed out to various client applications. This can cause significant network overhead in message exchanges if a central message registry is maintained. We therefore design a network overlay to provide scalability in message updates, as follows. A reflector instance exists for each cluster in a topic specific social network, and hosts an RSS feed for messages relevant to users of that cluster. These feeds carry the message metadata and are read by the client application. Super-nodes are elected to act as reflectors from among the client applications of users belonging to different clusters. Whenever information about a new message is received and updated in the centralized data structure, a few reflectors are notified about the message. The initial choice of reflectors may include those corresponding to the cluster of the message author, and the clusters adjacent to it. Subsequently, based on ratings given to the message, routing algorithms operating between reflectors will spread message updates to other reflectors. This helps to propagate updates about new messages to relevant reflectors without causing significant network overhead in push or pull of message updates from a central message registry. In Section 5, we describe the design of an efficient topology for the reflector overlay, such that even naive content based routing algorithms operating on the overlay will not cause a substantial downgrading in the scalability of message updates.
6. Client application: A client application runs on user devices such as laptops and mobile phones. The application periodically downloads RSS feeds from the reflector corresponding to the cluster of the user, and monitors the user's browsing history to learn the usefulness and credibility Bayesian models. When updates about new messages are received, the user-specific components of the Context(...), Completeness(...), and Credibility(...) functions are computed, and the models are used to predict the probabilities that the user may find some messages to be useful. Messages expected to be useful are then pushed to the user. Continuous connectivity with the Internet is not required, making the applications ideal to run in a delay-tolerant fashion on mobile phones. Privacy is ensured because the models are learned only at the client side. This can also make it acceptable to monitor user activities, such as the time a user spends on a message, to tune the models.

This design addresses all the challenges enumerated earlier. Scalability to millions of new messages each day is ensured by distributing the functionality of crawling, topic categorization, and statistics maintenance across different servers. User level personalization for hundreds of millions of users is provided by partitioning the computation and personalization functions into centralized and user-specific computations. Network overhead for notifications of new messages to many users is avoided by constructing a reflector overlay for efficient routing of message updates. Lastly, privacy is ensured because user preferences do not have to be reported back to the central servers.

5. REFLECTOR OVERLAY TOPOLOGY

Based on the ratings given by users, we expect to eventually assemble similarity matrices for the context and completeness of topic specific clusters. These matrices will tell us which clusters of users are contextually similar, and which clusters provide completeness to each other. Dimensionality reduction using Principal Component Analysis can be used to output high dimensional identity vectors for each cluster, such that users of clusters close to each other in this high dimensional space will benefit from exchanging messages with each other. Since each cluster is represented as a reflector instance, our goals are to find an overlay topology for reflector connectivity, and a routing algorithm on this overlay, such that message updates can be propagated efficiently.

We plan to develop an interest-gradient based routing algorithm, similar to [17, 18]. Client applications will advertise the preferences of their users towards contextual or complete information to the corresponding reflectors. The reflectors will advertise these preferences to other reflectors a few hops away in the reflector graph, so that interest-gradients are formed and relevant message updates can get forwarded to the appropriate reflectors.
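The sketch below illustrates one way such interest-gradient forwarding between reflectors could work, in the spirit of directed-diffusion style schemes [17, 18]: each reflector advertises a per-topic interest level, and an update is forwarded only towards neighbours whose advertised interest exceeds a threshold. The data model, threshold, and hop limit are assumptions made for illustration, not the final routing algorithm.

```python
# Sketch of interest-gradient forwarding between reflectors. Each reflector
# advertises a per-topic interest level aggregated from its users; an update
# is forwarded only towards neighbours whose advertised interest exceeds a
# threshold, for a bounded number of hops. All parameters are assumptions.

from typing import Dict, List, Set

Overlay = Dict[str, Set[str]]            # reflector -> neighbouring reflectors
Interest = Dict[str, Dict[str, float]]   # reflector -> {topic: interest level}

def propagate_update(overlay: Overlay, interest: Interest, topic: str,
                     seeds: List[str], threshold: float = 0.5,
                     max_hops: int = 3) -> Set[str]:
    """Return the set of reflectors that receive the update for this topic."""
    reached = set(seeds)
    frontier = list(seeds)
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for nbr in overlay.get(node, set()):
                if nbr in reached:
                    continue
                if interest.get(nbr, {}).get(topic, 0.0) >= threshold:
                    reached.add(nbr)
                    nxt.append(nbr)
        frontier = nxt
    return reached

if __name__ == "__main__":
    overlay = {"r1": {"r2", "r3"}, "r2": {"r1", "r4"},
               "r3": {"r1"}, "r4": {"r2"}}
    interest = {"r1": {"pakistan": 0.9}, "r2": {"pakistan": 0.7},
                "r3": {"pakistan": 0.1}, "r4": {"pakistan": 0.8}}
    # Seed at the author's cluster reflector and spread along the gradient.
    print(propagate_update(overlay, interest, "pakistan", seeds=["r1"]))
```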
We next present a few results from our analysis of a plausible topology for the reflector overlay. Primarily inspired by the explanation of the navigability property of social networks [12], we felt that constructing an overlay based on social network relationships might turn out to be efficient. This was indeed the case, demonstrating yet another useful property of social network graphs.

In prior work [10], we crawled the Orkut website to extract a large social network graph of users, and their membership in various communities of interest. Similar communities were clustered together to yield coarse-grained topic interests of users. For the purposes of this paper, we extracted three subgraphs of different sizes (10000, 17000, and 23000 users) from our original dataset, clustered the topic specific graphs in each dataset, and constructed a reflector overlay as follows. Every cluster in a topic specific graph corresponds to a node in the reflector overlay. An edge is added between two nodes if they have a user in common, or if there is a pair of users, one in each cluster, having a social network tie between them. This represents the intuition that users are aware of their own topic interests (of course) and of the topic interests of their friends; therefore, an edge in the reflector overlay connects nodes having common users or friends respectively. The three overlays constructed in this manner consisted of (1899, 4580, and 6593) nodes, with (92, 165, and 301) topics respectively.
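A minimal sketch of this overlay construction rule, under an assumed data layout (clusters as sets of user identifiers and social ties as unordered pairs):

```python
# Minimal sketch of the reflector overlay construction rule described above:
# every topic specific cluster becomes an overlay node, and two overlay nodes
# are joined if the clusters share a user, or if some user in one cluster has
# a social network tie to some user in the other. Data layout is assumed.

from itertools import combinations
from typing import Dict, List, Set, Tuple

def build_overlay(clusters: Dict[str, Set[str]],
                  ties: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return overlay edges between cluster (reflector) identifiers."""
    tie_set = {frozenset(t) for t in ties}  # undirected social ties
    edges = []
    for (c1, m1), (c2, m2) in combinations(clusters.items(), 2):
        shares_user = bool(m1 & m2)
        has_tie = any(frozenset((u, v)) in tie_set for u in m1 for v in m2)
        if shares_user or has_tie:
            edges.append((c1, c2))
    return edges

if __name__ == "__main__":
    clusters = {"c1": {"a", "b"}, "c2": {"b", "c"}, "c3": {"d"}}
    ties = {("a", "d")}
    print(build_overlay(clusters, ties))  # [('c1', 'c2'), ('c1', 'c3')]
```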
Figure 5: Upper and lower bounds for routing state (CDF of the fraction of nodes having routing state < X, where routing state is the number of shortest paths through a node; upper and lower bound curves for path lengths 2, 3, and 4)

What is more interesting, however, is that the distributions remain consistent across the three datasets, indicating that node separation is independent of the number of users. A possible explanation is that of topic locality, as noticed for P2P networks [20]. This observation is encouraging because it means that nodes need not flood their interests very far into the topic node graph, and hence do not cause a significant increase in the size of the routing tables of other nodes.

We next computed lower and upper bounds for the distribution of routing state per node. We did this for the worst case, where all nodes of a topic are assumed to be contextually similar or to provide completeness to each other, and hence messages should be routable between all pairs of nodes of the topic. Further, we assumed a naive routing algorithm that establishes at least one route for every pair of nodes per topic, and can hence build overlapping routes. We find the upper bound by first computing the shortest paths d_{ij} between all pairs of nodes; then, for each node x, we compute the number of pairs of 1-hop, 2-hop, etc. neighbors (y, z) of the same topic such that d_{yz} = d_{yx} + d_{xz}, for d_{yz} ∈ {2, ..., 6}. This gives an upper bound on the maximum number of shortest paths the routing algorithm may establish between nodes of the same topic that can possibly pass through some other given node. For the lower bound, for a given ordering of choosing nodes x_1, x_2, ..., we count (y, z) only the first time it occurs as a shortest path passing through some node x_i. After repeated iterations with different orderings of choosing nodes x_1, x_2, ..., the mean distribution of the number of shortest paths across the different orderings gives the lower bound distribution. (A sketch of this bound computation is given at the end of this section.)

Fig. 5 shows the distribution for the largest dataset. The routing state for d ∈ {1, 5, 6} was insignificant. The majority of nodes need state < 500 for d = 2. The state increases for d ∈ {3, 4}, but the lower bounds remain below 1000, while the upper bounds remain below 4000 for the majority of nodes. The routing state was also found to remain unchanged with the number of users, giving more evidence of topic locality in the reflector overlay.

We next analyzed the overlays consisting only of nodes for popular topics with more than 200 users, which represented a 20% cutoff from the maximum topic size. The routing state was again found to remain unchanged, which indicates that popular topics ensure connectivity, while unpopular topics "hang" off the popular topics. This is an interesting observation; in the future we plan to build mathematical models to characterize such graph properties. For now, given the small node separations and routing state, we safely conclude that the proposed topology is feasible to use.
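The sketch below spells out the bound computation referred to above, again only as an illustration: the pair-counting procedure follows the description in the text, while the node keys (topic as the first element, as in the earlier construction sketch), the networkx calls, and the number of random orderings are assumptions.

import itertools
import random
from collections import defaultdict
import networkx as nx

def routing_state_bounds(overlay, path_len, n_orderings=10):
    """Upper and lower bounds on per-node routing state: for each node x,
    count same-topic pairs (y, z) at separation path_len whose shortest
    path can pass through x (upper bound); for the lower bound, credit each
    pair only to the first node, in a random ordering, through which one of
    its shortest paths passes, and average over several orderings."""
    dist = dict(nx.all_pairs_shortest_path_length(overlay))
    nodes = list(overlay)
    # Same-topic pairs at the required separation.
    pairs = [(y, z) for y, z in itertools.combinations(nodes, 2)
             if y[0] == z[0] and dist[y].get(z) == path_len]

    def on_shortest_path(x, y, z):
        return (x not in (y, z)
                and dist[y].get(x) is not None and dist[x].get(z) is not None
                and dist[y][x] + dist[x][z] == path_len)

    upper = {x: sum(on_shortest_path(x, y, z) for y, z in pairs) for x in nodes}

    lower = defaultdict(float)
    for _ in range(n_orderings):
        order = random.sample(nodes, len(nodes))  # one random ordering x_1, x_2, ...
        claimed = set()
        for x in order:
            for y, z in pairs:
                if (y, z) not in claimed and on_shortest_path(x, y, z):
                    claimed.add((y, z))
                    lower[x] += 1.0 / n_orderings
    return upper, dict(lower)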
6. CONCLUSIONS

We have proposed the design of a personalized and scalable recommender system for participatory media content using social network theory. Other methods such as CF [21-23] are only able to replicate average user behavior, whereas our approach uses sociological theory to understand the contextual reasons behind a user's preferences, and can hence produce closer-fitted recommendations. Although many open problems remain to be solved, we are hopeful that our methodology of integrating elements of information science and sociology will improve the state-of-the-art in recommender systems for participatory media.

7. REFERENCES

[1] D. Sifry, "The State of the Live Web," http://www.sifry.com/alerts/archives/000493.html, April 2007.
[2] M. Hindman, et al, "Googlearchy: How a Few Heavily Linked Sites Dominate Politics on the Web," http://www.cs.princeton.edu/ kt/mpsa03.pdf, 2003.
[3] S. Fortunato, et al, "Topical Interests and the Mitigation of Search Engine Bias," PNAS, Vol. 103, No. 34, 2006.
[4] K. Maglaughlin and D. Sonnenwald, "User Perspectives on Relevance Criteria," American Society for Information Science and Technology, Vol. 53, No. 5, 2002.
[5] J. Bryant and D. Zillmann, "Media Effects: Advances in Theory and Research," Lawrence Erlbaum Associates, USA, 2002.
[6] M. Granovetter, "The Strength of Weak Ties," American Journal of Sociology, Vol. 78, No. 6, 1973.
[7] M. McPherson, et al, "Birds of a Feather: Homophily in Social Networks," Annual Review of Sociology, Vol. 27, 2001.
[8] X. Zhu and S. Gauch, "Incorporating Quality Metrics in Centralized/Distributed Information Retrieval on the World Wide Web," SIGIR, 2000.
[9] J. Kleinberg, "The Small-World Phenomenon: An Algorithmic Perspective," STOC, 2000.
[10] A. Seth, "Understanding Participatory Media Using Social Networks," Technical Report CS-2007-47, University of Waterloo, 2007.
[11] A. Seth and J. Zhang, "A Social Network Based Approach to Personalized Recommendation of Participatory Media Content," ICWSM, 2008.
[12] D. Watts, P. Dodds, and M. Newman, "Identity and Search in Social Networks," Science, Vol. 296, No. 5571, 2002.
[13] B. Baybeck and R. Huckfeldt, "Urban Contexts, Spatially Dispersed Networks, and the Diffusion of Political Information," Political Geography, Vol. 21, 2002.
[14] I. Rose, et al, "Cobra: Content-based Filtering and Aggregation of Blogs and RSS Feeds," USENIX NSDI, 2007.
[15] F. Fabret, et al, "Filtering Algorithms and Implementation for Very Fast Publish/Subscribe Systems," ACM SIGMOD, 2001.
[16] M. Rattigan, et al, "Using Structure Indices for Efficient Approximation of Network Properties," ACM SIGKDD, 2006.
[17] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks," ACM MOBICOM, 2000.
[18] A. Carzaniga, M. Rutherford, and A. Wolf, "A Routing Scheme for Content-Based Networking," Infocom, 2004.
[19] A. Seth, "Design of a Recommender System for Participatory Media Content," Manuscript, University of Waterloo, 2008.
[20] K. Sripanidkulchai, B. Maggs, and H. Zhang, "Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems," Infocom, 2003.
[21] A. Das, et al, "Google News Personalization: Scalable Online Collaborative Filtering," WWW, 2007.
[22] X. Song, B. Tseng, C. Lin, and M. Sun, "Personalized Recommendation Driven by Information Flow," SIGIR, 2006.
[23] J. Yang, et al, "An Epidemic-based P2P Recommender System," SIGIR Workshop on Large Scale Distributed Systems, 2007.