M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
Depending on the content and the amount of information about the user, which is stored in the user profile, a user can be modeled (Froschl, 2005; Koch, 2000). To be precise, a user model is a representation of the knowledge and personal characteristics that the system believes a user possesses.
The profiling information can be elicited from demographic data (e.g., user's age, gender, occupation, etc.), user preferences about features of the items (e.g., movie title, genre, director, year of release, leading actors, etc.), and user ratings on experienced items (e.g., previously seen movies) (Massa & Avesani, 2004). To build a user model, two questions have to be taken into account: what information has to be represented in the model, and how can this information be represented effectively? The first question is general for all applications, while the second is application-dependent (Koch, 2000). An effective, tailored recommendation depends on the underlying user model, which is in turn used for similarity computations. A representative user model would appropriately reflect the user's tastes, preferences, and needs.

Most previous work on user model construction relies only on explicit ratings (Breese et al., 1998; Goldberg et al., 1992; Resnick et al., 1994; Shardanand & Maes, 1995). However, in real life, the way in which two people are said to be similar is not based solely on whether they have close opinions on a specific subject, e.g., movie ratings, but also on other factors, such as their background and personal details. Therefore, factors such as age, gender, and preferences of movie genres should also be taken into account (Ujjin & Bentley, 2004).

2.1.3. Neighborhood set selection

Once user models have been established, the system can match the active user against the available database, and a set of neighbors for that user must be formed and ranked according to a suitable distance function. The size of the neighborhood set can be fixed by selecting the top K users, or variable by selecting the users whose similarity value is above a certain threshold (Breese et al., 1998; Vozalis & Margaritis, 2003). Various functions have been used to compute the distance, $d(a, c)$, between users $u_a$ and $u_c$ in CF recommenders.
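The two ways of sizing the neighborhood set described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `select_neighbors`, its parameters, and the sample scores are hypothetical, and it assumes the scores are similarities (higher means closer), so a true distance would need the ordering reversed.

```python
from typing import Dict, List, Optional


def select_neighbors(
    similarities: Dict[str, float],
    top_k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Rank candidate users by similarity to the active user and keep
    either the top K (fixed size) or all above a threshold (variable size).

    Assumption: higher score means more similar.
    """
    if (top_k is None) == (threshold is None):
        raise ValueError("specify exactly one of top_k or threshold")
    # Rank all candidates from most to least similar.
    ranked = sorted(similarities, key=lambda u: similarities[u], reverse=True)
    if top_k is not None:
        return ranked[:top_k]  # fixed-size neighborhood
    return [u for u in ranked if similarities[u] > threshold]  # variable size


# Hypothetical similarity scores between the active user and four others.
sims = {"u1": 0.9, "u2": 0.2, "u3": 0.6, "u4": -0.4}
fixed = select_neighbors(sims, top_k=2)
variable = select_neighbors(sims, threshold=0.1)
```

With a fixed K the neighborhood always has the same size, even if some neighbors are only weakly similar; a threshold keeps only sufficiently similar users but may return very few of them.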
The most popular function for memory-based CF is the Pearson correlation coefficient (Resnick et al., 1994), where the distance between two users is based only on the ratings both users have declared. The Pearson correlation coefficient is given by

$$\operatorname{corr}(x, y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - m_x)(r_{y,s} - m_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - m_x)^2 \sum_{s \in S_{xy}} (r_{y,s} - m_y)^2}}, \tag{1}$$

where $S_{xy}$ is the set of items rated by both users $u_x$ and $u_y$. Let us call the RS that uses formula (1) for similarity computations the Pearson RS (PRS).

Clearly, formula (1) is not appropriate if other features are included in the model, because it considers only the items common to both users. The modified Euclidean distance function gives another way to compute similarity, which takes into account multiple features (Ujjin & Bentley, 2004):

$$d(\mathbf{x}, \mathbf{y}) = \frac{1}{z} \sum_{i=1}^{z} \sqrt{\sum_{j=1}^{N} (x_{i,j} - y_{i,j})^2}. \tag{2}$$

Here $x_{i,j}$ is the $j$th feature for the common item $s_i$, $N$ is the number of features, and $z = |S_{xy}|$, the cardinality of $S_{xy}$. Note that a vector of features represents each user; therefore it is written in bold in formula (2).

2.1.4. Making recommendations

In this phase, the RS assigns a predicted rating to all the items seen by the neighborhood set and not by the active user. The predicted rating, $pr_{a,j}$, indicates the expected interestingness of the item $s_j$ to the user $u_a$ and is usually computed as an aggregate of the ratings of the neighborhood set of $u_a$ for the same item $s_j$:

$$pr_{a,j} = \operatorname{aggr}_{u_c \in C} r_{c,j}, \tag{3}$$

where $C$ denotes the set of neighbors who have rated item $s_j$.
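As an illustration of the two distance functions from Section 2.1.3, formulas (1) and (2) might be coded roughly as below. This is a sketch under stated assumptions rather than the authors' code: the names `pearson` and `modified_euclidean` and the dictionary layout are hypothetical, each user's mean is taken over all items that user has rated, formula (2) is read as the average over common items of the per-item Euclidean distance, and the return values for empty or degenerate inputs are choices made here.

```python
import math
from typing import Dict, List

Ratings = Dict[str, float]          # item id -> rating
Features = Dict[str, List[float]]   # item id -> feature vector of length N


def pearson(rx: Ratings, ry: Ratings) -> float:
    """Formula (1): correlation over the items rated by both users."""
    common = set(rx) & set(ry)
    if not common:
        return 0.0  # assumption: no overlap is treated as no correlation
    # Assumption: each mean is taken over all items the user has rated.
    mx = sum(rx.values()) / len(rx)
    my = sum(ry.values()) / len(ry)
    num = sum((rx[s] - mx) * (ry[s] - my) for s in common)
    den = math.sqrt(
        sum((rx[s] - mx) ** 2 for s in common)
        * sum((ry[s] - my) ** 2 for s in common)
    )
    return num / den if den else 0.0  # assumption: zero variance gives 0


def modified_euclidean(fx: Features, fy: Features) -> float:
    """Formula (2): mean per-item Euclidean distance over common items."""
    common = set(fx) & set(fy)
    if not common:
        return math.inf  # assumption: no common items means maximally distant
    total = 0.0
    for s in common:
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(fx[s], fy[s])))
    return total / len(common)


# Hypothetical data for two users.
corr = pearson({"a": 5, "b": 3, "c": 1}, {"a": 4, "b": 3, "c": 2})
dist = modified_euclidean(
    {"a": [1.0, 0.0], "b": [0.0, 0.0]},
    {"a": [4.0, 4.0], "b": [0.0, 0.0], "c": [1.0, 1.0]},
)
```

Note that the two functions point in opposite directions: a larger Pearson value means more similar users, whereas a larger Euclidean value means less similar users, so any ranking step has to account for which one is in use.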
The most widely used aggregation function is the weighted sum (Adomavicius & Tuzhilin, 2005; Breese et al., 1998; Vozalis & Margaritis, 2003), also called Resnick's prediction formula (Resnick et al., 1994):

$$pr_{a,j} = m_a + k \sum_{u_c \in C} d(a, c) \times (r_{c,j} - m_c). \tag{4}$$

The multiplier $k$ serves as a normalizing factor and is usually selected as $k = 1 / \sum_{u_c \in C} |d(a, c)|$, and $m_c$ is the average rating of user $u_c$:

$$m_c = \frac{1}{|S_c|} \sum_{s \in S_c} r_{c,s}. \tag{5}$$

3. A novel hybrid user model

The construction of user models is a key task: the system's success will depend to a large extent on how well the models represent the users' actual interests. A pure CF user profile (Goldberg et al., 1992; Resnick et al., 1994) consists of a vector of items with their ratings, continuously augmented as the user interacts with the system over time. This huge amount of data needs a very large space and a long processing time. At query time, searching the entire database to find the best set of neighbors is computationally very expensive. Eventually, the user will leave the Web site before the processing completes.

On the other hand, the actual user preferences cannot always be captured by explicit ratings; some content descriptions of items are required. Hybrid filtering addresses this problem. However, most current hybridization methods (Burke, 2002) construct two separate profiles and implement the online process for each filtering technique separately. Finally, some merging step is used to give the final result. What if we construct a model according to