Incremental Collaborative Filtering recommender based on Regularized Matrix Factorization

Xin Luo*, Yunni Xia, Qingsheng Zhu
College of Computer Science, Chongqing University, Chongqing 400044, China
* Corresponding author. Tel.: +86 13996169379. E-mail addresses: luoxin21@cqu.edu.cn (X. Luo), xiayunni@yahoo.com.cn (Y. Xia), qszhu@cqu.edu.cn (Q. Zhu).

Article history: Received 16 March 2011; Received in revised form 7 September 2011; Accepted 9 September 2011; Available online xxxx.

Keywords: Recommender system; Collaborative Filtering; Latent Factor Model; Regularized Matrix Factorization; Incremental learning

Abstract

Matrix-Factorization (MF) based models have become popular for building Collaborative Filtering (CF) recommenders due to their high accuracy and scalability. However, most current MF based models are batch models that are incapable of being incrementally updated, while in real world applications users expect quick responses from the system once they have given feedback. In this work, we aim to design an incremental CF recommender based on the Regularized Matrix Factorization (RMF). To achieve this objective, we first simplify the training rule of RMF to propose the SI-RMF, which provides a simple mathematical form for further investigation; whereby we design two incremental RMF models, namely the Incremental RMF (IRMF) and the Incremental RMF with linear biases (IRMF-B). The experiments on two large, real datasets suggest positive results, which prove the efficiency of our strategy.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

In this era of information, online consumption is indispensable to people's daily life. Millions of commodities are provided online by numerous web retailers, who are suffering from a serious problem: how to help their potential customers find their personal favorites? Personalized recommender systems, a specified category of knowledge based systems which can provide people with suggestions according to individual interests, have emerged to address this concern since the early 1990s [1–6]. With recommender systems, the web merchants can make their online stores more human-oriented by offering personalized recommendations which can greatly enhance the user experience. According to a recent survey, this thriving subfield of data mining is becoming more and more important to e-commerce providers [7].

Generally speaking, there are two kinds of subjects within a recommender system: the users who will benefit from the recommendation service and the items which the system is going to recommend. According to pioneering research, online recommender systems fall into two groups: the Content Based (CB) and the Collaborative Filtering (CF) [5]. CB profiles items based on their contents, and then associates users with content-matching products by comparing the corresponding profiles. On the other hand, CF accumulates users' behaviors to model individual preferences, and then makes recommendations according to the personal tastes. Compared with CB, CF requires no domain knowledge; in addition, CF offers the potential to uncover patterns that would be difficult or impossible to profile using CB techniques. Both of these characteristics make real world applications usually rely on CF recommenders [8–13].

Inside a CF recommender, the user preferences (either explicit ratings or implicit behaviors such as clicks) on the involved items are quantized into a user-item rating matrix, where high ratings suggest strong preferences.
So the problem of CF can be regarded as one of missing data estimation, where the main task is to estimate the unknown user-item pairs of the rating matrix based on the known entries. According to the recent progress in CF techniques, current CF algorithms are primarily based on two models: the Neighborhood Based Model (NBM) [14–18] and the Latent Factor Model (LFM) [19–27]. NBM constructs the relationships between users or items to build the neighborhoods of the corresponding entities, and makes predictions based on the known ratings by the active user's neighbors, or those on the active item's neighbors. Different from NBM, LFM transforms both items and users into the same latent factor space, characterizes each entity with a feature vector inferred from the existing ratings, and makes predictions for unknown ratings using the inner products of the corresponding vector pairs. Compared with NBM, LFM offers a higher expressive ability to describe various aspects of the data, usually along with higher accuracy; LFM has consequently become popular when designing CF recommenders.

One of the most successful kinds of approaches to LFM is based on Matrix Factorization (MF). The earliest work of this kind is proposed by
Sarwar et al. employing the Singular Value Decomposition (SVD) [19]. More recently, several MF techniques have been successfully applied to implementing LFM, including the Probabilistic Latent Semantic Analysis by Hofmann [20], the Maximum Margin Matrix Factorization by Srebro et al. [21], and the Expectation Maximization for Matrix Factorization by Kurucz et al. [22]. During the Netflix Prize, Brandyn Webb published the Regularized Matrix Factorization [23] (RMF, which is based on Gorrell et al.'s works on Generalized Hebbian Learning [28,29]) under the pseudonym Simon Funk; RMF is accurate, highly scalable and easy to implement. Inspired by RMF, many researchers have implemented and further investigated MF based approaches. Paterek [24], Takács et al. [25], Salakhutdinov et al. [26] and Koren [27] all have proposed sophisticated MF based CF recommenders with high recommendation performance.

Although the aforementioned LFM based recommenders have high scalability and accuracy, they all require a batch training process based on a static training dataset. However, in real-world applications, this pre-condition can rarely be satisfied. In modern e-commerce systems, user feedback is generated constantly, varying at every millisecond. One strategy to tackle this rapid data expansion is to totally rebuild the model whenever the data growth exceeds a pre-defined threshold; however, this strategy leads to a huge amount of repetitive training overhead, whilst it still cannot respond rapidly to user feedback. In fact, the desirable way to cope with such situations is to enable the recommender to learn incrementally by updating the current model in accordance with the newly arriving data.

In this work we focus on implementing an incremental CF recommender based on MF techniques. To achieve this objective, we investigate the training process of the MF based recommender, and incrementally update the trained parameters according to the newly arriving data. The main contributions of this work include:

- The sequence independent Regularized Matrix Factorization (SI-RMF) model, an RMF variant that removes the sensitivity to the input sequence of training examples and provides a simple mathematical form for making incremental updates.
- The Incremental RMF (IRMF), which can incrementally learn from new examples without retraining the entire model.
- IRMF with linear biases (IRMF-B), an incremental CF recommender which incorporates linear biases to provide more accurate predictions than IRMF while maintaining the incremental learning ability.
- Empirical studies on two large, real datasets.

Section 2 gives the preliminaries. Then Section 3 describes our methods. The empirical validations are given in Section 4. Finally, Section 5 concludes.

2. Preliminaries

The problem of CF can be defined as follows: given an item set I and a user set U, the users' preferences on items can be denoted by a |U| × |I| matrix R, where each row vector denotes a specified user, each column vector denotes a specified item, and each entry r_ui in R denotes user u's preference on item i, usually with high values indicating strong preferences (note that the entries in R could depend on either explicit user preferences like ratings, or implicit user preferences like clicks). Let R_K denote the set consisting of the known entries in R.
Given a finite set of ratings T ⊆ R_K as the training set, the objective is to construct a heuristic estimator, the so-called recommender, which can estimate the unknown ratings with minimal accumulative error based on the known ratings in T. The accumulative error of the recommender is evaluated on a validation set V ⊆ R_K; to avoid overfitting, the data of V cannot be used during the creation of the recommender, i.e., V ∩ T = ∅.

During the Netflix Prize, one of the most popular approaches to solving the CF problem was based on Matrix Factorization. This should be primarily credited to Webb's publication of the RMF model [23]. Similar to the SVD based approach proposed by Sarwar et al. [19], the goal of RMF is also to construct a low rank approximation of R. However, different from traditional SVD techniques, the RMF model can deal with matrices with missing values. As described in [23], let f denote the dimension of the feature space, P ∈ R^{m×f} denote the user feature matrix where each row represents a specified user, and Q ∈ R^{f×n} denote the item feature matrix where each column represents a specified item (naturally f ≪ m and f ≪ n); then the approximation of the rating by user u on item i is confined to the inner product of the corresponding user-item feature vector pair:

$$\hat{r}_{ui} = p_u q_i. \qquad (1)$$

Thus the task concentrates on appropriately estimating the values of the parameters in P and Q. In RMF, the parameter training is performed by applying stochastic gradient descent (SGD) to the approximation error, which is denoted by the regularized squared error (RSE) on T:

$$\mathrm{RSE} = \sum_{(u,i) \in T} \left( r_{ui} - p_u q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 \right), \qquad (2)$$

where ‖·‖ denotes the standard Euclidean norm. Note that the term behind the parameter λ is the Tikhonov regularizing term for avoiding overfitting. A local minimum of the RSE in Eq. (2) can be reached by employing the SGD solver, which loops through all ratings in T and modifies the involved parameters by moving in the opposite direction of the gradient for each training example.

There are two methods for updating the feature matrices when learning from each training example. One is to initialize each feature in the feature matrices with a constant value, and train a single feature based on the rating residuals of the already trained features for each training example. The other is to initialize each entry in P and Q with random values uniformly chosen from a pre-defined scale, and update the involved latent features simultaneously at each training example. Through empirical studies [25], Takács et al. find that these two approaches lead to comparable prediction accuracy; however, the latter provides a faster convergence rate. For each training example, when using the latter method to update the involved latent feature vectors, the updating rule is formulated by:

$$(P, Q) = \arg\min_{p,q} \mathrm{RSE} \;\Rightarrow\;
\begin{cases}
\dfrac{\partial}{\partial p_u} \mathrm{RSE} = -2(r_{ui} - p_u q_i)q_i + 2\lambda p_u \\[2pt]
\dfrac{\partial}{\partial q_i} \mathrm{RSE} = -2(r_{ui} - p_u q_i)p_u + 2\lambda q_i
\end{cases}
\;\Rightarrow\;
\begin{cases}
p_u \leftarrow p_u + \eta\left((r_{ui} - p_u q_i)q_i - \lambda p_u\right) \\[2pt]
q_i \leftarrow q_i + \eta\left((r_{ui} - p_u q_i)p_u - \lambda q_i\right)
\end{cases}, \qquad (3)$$

where η denotes the learning rate. As a standard stochastic gradient solver, the system trains all involved feature factors with Eq. (3) on the given training instances sequentially. This training process is repeated several times, each pass being called a training epoch. It is common that several training epochs are needed until the model converges; the epoch number mainly depends on the values of η and λ, as well as the size of the training dataset. Also, according to Takács et al. [25], an early stopping criterion is needed to avoid overfitting.
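For concreteness, the following sketch (our own illustration, not from the paper; all names such as train_rmf are ours) shows one way to implement the RMF training loop of Eq. (3) in Python, with random uniform initialization of P and Q and simultaneous updates of the two feature vectors involved in each example:

```python
import numpy as np

def train_rmf(ratings, m, n, f=10, eta=0.01, lam=0.02, epochs=20, seed=0):
    """ratings: list of (u, i, r) triples with 0 <= u < m and 0 <= i < n."""
    rng = np.random.default_rng(seed)
    # Initialize P (m x f) and Q (f x n) with small uniform random values.
    P = rng.uniform(-0.05, 0.05, size=(m, f))
    Q = rng.uniform(-0.05, 0.05, size=(f, n))
    for _ in range(epochs):              # one pass over T = one training epoch
        for u, i, r in ratings:
            err = r - P[u] @ Q[:, i]     # prediction error on this example
            pu = P[u].copy()             # keep the old p_u for the q_i update
            # Simultaneous update of both feature vectors, Eq. (3):
            P[u] += eta * (err * Q[:, i] - lam * P[u])
            Q[:, i] += eta * (err * pu - lam * Q[:, i])
    return P, Q  # predict r_hat(u, i) = P[u] @ Q[:, i]
```

A fixed number of epochs is used here for brevity; as noted above, an early stopping criterion driven by a validation set would normally terminate training.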
Due to its accuracy, efficiency and ease of implementation, RMF has been implemented and further investigated by many researchers [24–27]. Several variants of this model have been proposed, among which a very successful one is the RMF with linear biases (hereafter referred to as RMF-B) proposed by Paterek [24] and Koren [27]. RMF-B incorporates linear biases into the prediction rule (1) in order to decompose the observed rating into several components,
each of which explains only the relevant part of a signal. Paterek integrates two linear biases, b_u and b_i, which respectively denote the observed biases of user u and item i [24]. In addition, Koren suggests incorporating the global average μ to capture the statistical characteristics of the training data [27]. Thus, the prediction rule (1) turns into:

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u q_i, \qquad (4)$$

and the RSE (2) turns into:

$$\mathrm{RSE} = \sum_{(u,i) \in T} \left( r_{ui} - \mu - b_u - b_i - p_u q_i \right)^2 + \lambda \left( b_u^2 + b_i^2 + \|p_u\|^2 + \|q_i\|^2 \right), \qquad (5)$$

where the linear biases b_u and b_i are trained along with the latent features p_u and q_i, and the global average μ is estimated by the observed average of the training dataset, since it captures the statistical characteristics of the known data.
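To make the biased decomposition of Eq. (4) concrete, here is a minimal sketch (our own, with made-up values) showing how an estimated rating splits into the global average, the two linear biases and the latent interaction:

```python
import numpy as np

def predict_rmf_b(mu, b_u, b_i, p_u, q_i):
    """Prediction rule (4): global average + user bias + item bias + interaction."""
    return mu + b_u + b_i + p_u @ q_i

# Synthetic example: a generous user (b_u > 0) rating a below-average item (b_i < 0).
mu = 3.6                          # global average of the training set
b_u, b_i = 0.3, -0.2              # learned user / item biases
p_u = np.array([0.1, -0.4, 0.2])  # user feature vector (f = 3)
q_i = np.array([0.5, 0.1, -0.3])  # item feature vector
print(predict_rmf_b(mu, b_u, b_i, p_u, q_i))  # 3.6 + 0.3 - 0.2 - 0.05 = 3.65
```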
The aforementioned RMF based models have been proved to be highly accurate and scalable. However, since they are batch learning methods, they naturally require batch training processes based on static training datasets. The item and user sets are supposed to be stationary, and once the rating data expands to a certain extent, the only way to discover new patterns from the new data is to totally rebuild the recommender. However, in real-world applications, it is natural that the training data are gradually accumulated, whilst users always expect to receive quick feedback once they evaluate a specified item. In such situations, recommenders able to learn from the newly arriving data incrementally are necessary.

Incremental recommendation is an attractive topic in the area of recommender systems, and there are several relevant pioneering works. Papagelis et al. study incremental CF in the context of the similarity-based K-Nearest-Neighborhood (KNN) model, and propose an incremental user-oriented KNN based recommender [30]. Miranda et al. further investigate this incremental user-oriented KNN, and propose the incremental item-oriented KNN model [31]. The empirical studies in [31] show that these two models achieve similar accuracy, while the latter provides better scalability due to the relatively small number of items. Nonetheless, based on the progress in the CF area during the Netflix Prize, similarity based KNN models can be easily outperformed by global optimization based methods like RMF, which calls for designing incremental recommenders based on the state-of-the-art models. Agarwal et al. propose a powerful incremental recommender named FOBFM based on bilinear factors [32], yet their model requires extra information about users/items such as demographic data, which is unavailable under many circumstances, e.g., in the Netflix Prize.

Since MF based recommenders are highly accurate and scalable, it is natural to implement incremental recommenders based on MF techniques. There are also some pioneering works referring to this issue. Sarwar et al. propose the incremental SVD based recommender based on the fold-in technique [33]; Takács et al. [25], and Rendle and Schmidt-Thieme [34], suggest implementing the online update by fixing the current model and training the latent factors of new users/items on arrival of the corresponding feedback. However, current MF based incremental recommenders can only update the model roughly by including the new items/users but not updating the already trained latent factors. In this work, we aim at designing an MF based incremental recommender which can not only deal with new users/items, but also update the already trained features. In the following section, we will discuss our method in detail.

3. Method

First of all, we focus on designing an incremental CF recommender based on the original RMF model. From the prediction rule (1), the RSE (2) and the feature updating rule (3), we see that RMF is an iterative learning method, where during each training epoch the training examples are learned sequentially. Let R(u) denote the rating set of a specified user u; then each training instance in R(u) can be orderly marked as r_{u,1}, r_{u,2}, …, r_{u,k}, …, r_{u,K} (here we only mark the known ratings by user u, so the subscript k denotes the kth item rated by user u, not the kth item in the rating matrix), where K = |R(u)|. Let p_u^{(0)} denote the initial value of p_u at the beginning of one training epoch, and p_u^{(1)} denote the value of p_u after the training example r_{u,1} is learned. By applying the feature updating rule (3), we get:

$$p_u^{(1)} = p_u^{(0)} + \eta\left(\left(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\right)q_1^{(h_1)} - \lambda p_u^{(0)}\right) = (1 - \eta\lambda)\,p_u^{(0)} + \eta\left(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\right)q_1^{(h_1)}, \qquad (6)$$

where h_1 denotes the current status of q_1. Let

$$c = 1 - \eta\lambda; \qquad (7)$$

then Eq. (6) is transformed into:

$$p_u^{(1)} = c\,p_u^{(0)} + \eta\left(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\right)q_1^{(h_1)}. \qquad (8)$$

Similarly, we can infer the situation from r_{u,2} to r_{u,K} as follows:

$$\begin{aligned}
p_u^{(2)} &= c\,p_u^{(1)} + \eta\left(r_{u,2} - p_u^{(1)} q_2^{(h_2)}\right)q_2^{(h_2)}, \\
p_u^{(3)} &= c\,p_u^{(2)} + \eta\left(r_{u,3} - p_u^{(2)} q_3^{(h_3)}\right)q_3^{(h_3)}, \\
&\;\;\vdots \\
p_u^{(K)} &= c\,p_u^{(K-1)} + \eta\left(r_{u,K} - p_u^{(K-1)} q_K^{(h_K)}\right)q_K^{(h_K)}. \qquad (9)
\end{aligned}$$

By combining (8) and (9), we obtain the expression of the training result of p_u after one training epoch:

$$p_u^{(K)} = c^K p_u^{(0)} + c^{K-1}\eta\left(r_{u,1} - p_u^{(0)} q_1^{(h_1)}\right)q_1^{(h_1)} + c^{K-2}\eta\left(r_{u,2} - p_u^{(1)} q_2^{(h_2)}\right)q_2^{(h_2)} + \cdots + \eta\left(r_{u,K} - p_u^{(K-1)} q_K^{(h_K)}\right)q_K^{(h_K)}, \qquad (10)$$

where q_k^{(h_k)} denotes the temporal state of q_k when the system is going to learn the kth example rated by user u. Similarly, let R(i) denote the rating set on item i and mark each training instance in R(i) as r_{1,i}, r_{2,i}, …, r_{h,i}, …, r_{H,i}, where H = |R(i)| (here we only mark the known ratings on item i, so the subscript h denotes the hth user rating on item i, not the hth user in the rating matrix). We can then infer the expression of the training result of q_i after one training epoch as:

$$q_i^{(H)} = c^H q_i^{(0)} + c^{H-1}\eta\left(r_{1,i} - p_1^{(k_1)} q_i^{(0)}\right)p_1^{(k_1)} + c^{H-2}\eta\left(r_{2,i} - p_2^{(k_2)} q_i^{(1)}\right)p_2^{(k_2)} + \cdots + \eta\left(r_{H,i} - p_H^{(k_H)} q_i^{(H-1)}\right)p_H^{(k_H)}. \qquad (11)$$

By combining (10) and (11), we locate the task of incremental learning in finding the expressions of p_u^{(K+1)} and q_i^{(H+1)} based on p_u^{(K)} and q_i^{(H)} respectively. However, Eqs. (10) and (11) are too complicated for implementing the incremental update, in the following two aspects:

- The final outputs are sensitive to the input sequence of the examples. Since c < 1, the impact of the early inputted examples will be gradually weakened as the training process goes on (a numeric illustration follows below).
- The variance of a latent feature learnt from the current example depends not only on its initial value, but also on the temporal state of the counterpart latent factor relevant to the active training sample; e.g., the value of p_u^{(K)} depends on the value of q_k^{(h_k)}.

Note that it usually takes several training epochs to reach model convergence, and the training results of the current epoch are used as the initial values of the subsequent epoch. So it would be intractable to trace the variance of the features during the training process based on (10) and (11).
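The first point is easy to check numerically. The following sketch (our own; all values are synthetic) applies the per-example update of Eq. (3) to a single user vector, with the item vectors held fixed, and shows that reversing the order of the K examples changes the result of the epoch:

```python
import numpy as np

eta, lam = 0.01, 0.02
rng = np.random.default_rng(1)
f, K = 4, 5
p0 = rng.uniform(-0.05, 0.05, f)        # initial p_u
Qs = rng.uniform(-0.05, 0.05, (K, f))   # fixed item vectors q_1 .. q_K
rs = rng.uniform(1.0, 5.0, K)           # ratings r_{u,1} .. r_{u,K}

def one_epoch(p, order):
    for k in order:                      # sequential pass, Eqs. (8)-(9)
        p = (1 - eta * lam) * p + eta * (rs[k] - p @ Qs[k]) * Qs[k]
    return p

p_fwd = one_epoch(p0.copy(), range(K))
p_rev = one_epoch(p0.copy(), list(reversed(range(K))))
print(np.max(np.abs(p_fwd - p_rev)))    # small but non-zero: the order matters
```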
Nonetheless, the complexity of the training process is mainly caused by the RMF model's dependence on the input sequence of examples. Thus, we can simplify Eqs. (10) and (11) straightforwardly by removing this sequence dependence. A common way to realize sequence independence is randomizing the input order of the training instances [35]; however, this approach cannot simplify the training process since it does not alter the iterative nature of RMF. Thus, we intend to eliminate the sequence dependence of the RMF model in a different way.

3.1. Sequence independent RMF

We start by rewriting (10) and (11) as follows:

$$\begin{aligned}
p_u^{(K)} &= c^K p_u^{(0)} + A, & A &= c^{K-1}\eta\left(r_{u,1} - p_u^{(0)}q_1^{(h_1)}\right)q_1^{(h_1)} + \cdots + \eta\left(r_{u,K} - p_u^{(K-1)}q_K^{(h_K)}\right)q_K^{(h_K)}, \\
q_i^{(H)} &= c^H q_i^{(0)} + B, & B &= c^{H-1}\eta\left(r_{1,i} - p_1^{(k_1)}q_i^{(0)}\right)p_1^{(k_1)} + \cdots + \eta\left(r_{H,i} - p_H^{(k_H)}q_i^{(H-1)}\right)p_H^{(k_H)}. \qquad (12)
\end{aligned}$$

Note that A and B are both expressed in an iterative form, where the results trained so far are used as the initial values on the successive example. So, by assuming that all training examples are learned simultaneously, we can transform the expressions of A and B into:

$$\begin{aligned}
A' &= \alpha_K \eta\left(r_{u,1} - p_u^{(0)}q_1^{(0)}\right)q_1^{(0)} + \cdots + \alpha_K \eta\left(r_{u,K} - p_u^{(0)}q_K^{(0)}\right)q_K^{(0)}, \\
B' &= \beta_H \eta\left(r_{1,i} - p_1^{(0)}q_i^{(0)}\right)p_1^{(0)} + \cdots + \beta_H \eta\left(r_{H,i} - p_H^{(0)}q_i^{(0)}\right)p_H^{(0)}, \qquad (13)
\end{aligned}$$

where α_K and β_H, which depend on the sizes of R(u) and R(i) denoted by K and H respectively, are parameters for averaging the importance of each training example. In order to ensure that A' and B' maintain magnitudes consistent with those of A and B, we choose α_K and β_H by averaging the sums of the parameters related to c in (12), formulated by:

$$\alpha_K = \sum_{k=0}^{K-1} c^k \Big/ K = \frac{1 - c^K}{K(1 - c)}, \qquad \beta_H = \sum_{h=0}^{H-1} c^h \Big/ H = \frac{1 - c^H}{H(1 - c)}. \qquad (14)$$

By combining (12)–(14), we obtain:

$$\begin{aligned}
p_u^{(K)} &= c^K p_u^{(0)} + A', & A' &= \eta\,\frac{1 - c^K}{K(1 - c)} \sum_{k=1}^{K}\left(r_{u,k} - p_u^{(0)}q_k^{(0)}\right)q_k^{(0)}, \\
q_i^{(H)} &= c^H q_i^{(0)} + B', & B' &= \eta\,\frac{1 - c^H}{H(1 - c)} \sum_{h=1}^{H}\left(r_{h,i} - p_h^{(0)}q_i^{(0)}\right)p_h^{(0)}. \qquad (15)
\end{aligned}$$

We refer to the model employing the feature updating rule (15) as the sequence independent RMF (SI-RMF) model. In practice we find that the removal of sequence sensitivity makes the model slightly more accurate than the primitive RMF, alongside a much simpler mathematical form for the incremental update. However, the change of the training rule also restricts the learning rate: the SI-RMF model can only converge when the value of η is relatively small. We will further discuss this issue in Section 4.

3.2. Incremental RMF

Based on the feature updating rule (15), for a user feature vector p_u, denoting the training result after the dth training epoch by p_u^{(K)}(d), we get:

$$\begin{aligned}
p_u^{(K)}(1) &= c^K p_u^{(0)} + \alpha_K \eta \sum_{k=1}^{K}\left(r_{u,k} - p_u^{(0)}q_k^{(0)}\right)q_k^{(0)}, \\
p_u^{(K)}(d) &= c^K p_u^{(K)}(d{-}1) + \alpha_K \eta \sum_{k=1}^{K}\left(r_{u,k} - p_u^{(K)}(d{-}1)\,q_k^{(H_k)}(d{-}1)\right)q_k^{(H_k)}(d{-}1), \qquad (16)
\end{aligned}$$

where H_k denotes the size of the rating set on the kth item rated by user u. Similarly, we obtain the training results of q_i after each training epoch as follows:

$$\begin{aligned}
q_i^{(H)}(1) &= c^H q_i^{(0)} + \beta_H \eta \sum_{h=1}^{H}\left(r_{h,i} - p_h^{(0)}q_i^{(0)}\right)p_h^{(0)}, \\
q_i^{(H)}(d) &= c^H q_i^{(H)}(d{-}1) + \beta_H \eta \sum_{h=1}^{H}\left(r_{h,i} - p_h^{(K_h)}(d{-}1)\,q_i^{(H)}(d{-}1)\right)p_h^{(K_h)}(d{-}1), \qquad (17)
\end{aligned}$$

where K_h denotes the size of the rating set by the hth user who rates item i.

For implementing an incremental model, we can construct p_u^{(K+1)}(d) and q_i^{(H+1)}(d) depending on the currently trained results p_u^{(K)}(d) and q_i^{(H)}(d) in each training epoch, and iteratively repeat this process to finally reach the desired parameters p_u^{(K+1)}(D) and q_i^{(H+1)}(D).
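As a sketch (our own names and shapes; inputs are NumPy arrays), one SI-RMF epoch for a single user vector, following Eq. (16), can be fully vectorized because every example is judged against the same epoch-initial state:

```python
import numpy as np

def si_rmf_user_epoch(p_u, r_u, Q_u, eta, lam):
    """One SI-RMF training epoch for user u, Eq. (16).

    p_u : (f,)   epoch-initial user vector
    r_u : (K,)   ratings of user u
    Q_u : (K, f) current vectors of the items rated by u
    """
    K = len(r_u)
    c = 1.0 - eta * lam                        # Eq. (7)
    alpha_K = (1.0 - c**K) / (K * (1.0 - c))   # Eq. (14)
    errs = r_u - Q_u @ p_u                     # all residuals use the same epoch-initial p_u
    return c**K * p_u + alpha_K * eta * (errs @ Q_u)
```

The item-side update of Eq. (17) is symmetric, with β_H in place of α_K and the roles of user and item vectors exchanged.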
We begin by designing the incremental update rule of the user feature vector p_u by analyzing the training process in detail.

3.2.1. Constructing p_u^{(K+1)}(1) based on p_u^{(K)}(1)

Suppose the batch training has been completed, which means that, for user u, the feature vector p_u^{(K)}(D) has been successfully constructed, and the interim training results p_u^{(K)}(d) (1 ≤ d ≤ D − 1) have been cached. Then, once a new example related to user u, marked r_{u,K+1}, has arrived, by deducing the batch training result of p_u^{(K+1)}(1) from (16), we can build p_u^{(K+1)}(1) from p_u^{(K)}(1) as follows:

$$\begin{aligned}
p_u^{(K)}(1) &= c^K p_u^{(0)} + \alpha_K \eta \sum_{k=1}^{K}\left(r_{u,k} - p_u^{(0)}q_k^{(0)}\right)q_k^{(0)}, \\
p_u^{(K+1)}(1) &= c^{K+1} p_u^{(0)} + \alpha_{K+1} \eta \sum_{k=1}^{K+1}\left(r_{u,k} - p_u^{(0)}q_k^{(0)}\right)q_k^{(0)} \\
&= c^{K+1} p_u^{(0)} + \alpha_{K+1} \eta \left[\sum_{k=1}^{K}\left(r_{u,k} - p_u^{(0)}q_k^{(0)}\right)q_k^{(0)} + \left(r_{u,K+1} - p_u^{(0)}q_{K+1}^{(0)}\right)q_{K+1}^{(0)}\right] \\
&= \underbrace{\frac{\alpha_{K+1}}{\alpha_K}\left(p_u^{(K)}(1) - c^K p_u^{(0)}\right) + c^{K+1}p_u^{(0)}}_{\text{rebalancing the trained parameter}} + \underbrace{\alpha_{K+1}\eta\left(r_{u,K+1} - p_u^{(0)}q_{K+1}^{(0)}\right)q_{K+1}^{(0)}}_{\text{including the newly arriving example}}. \qquad (18)
\end{aligned}$$

Note that p_u^{(K)}(1), p_u^{(0)} and q_{K+1}^{(0)} are cached interim values; since K is known, α_{K+1} and α_K can be inferred from (14); c is the constant given in (7). As marked, Eq. (18) can be divided into two parts: the first part rebalances the trained parameters, and the second part includes the variance on the newly arriving example. So, with (18) we can obtain p_u^{(K+1)}(1) by making an incremental update on p_u^{(K)}(1).
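A sketch of Eq. (18) in code (our own naming; inputs are NumPy vectors), separating the rebalancing term from the term that injects the new example:

```python
def alpha(k, c):
    """Averaging weight of Eq. (14)."""
    return (1.0 - c**k) / (k * (1.0 - c))

def irmf_update_pu_epoch1(pK_1, p0, q0_new, r_new, K, eta, lam):
    """Eq. (18): build p_u^(K+1)(1) from the cached p_u^(K)(1), p_u^(0) and q_(K+1)^(0)."""
    c = 1.0 - eta * lam
    # Rebalancing the trained parameter:
    rebalanced = (alpha(K + 1, c) / alpha(K, c)) * (pK_1 - c**K * p0) + c**(K + 1) * p0
    # Including the newly arriving example:
    new_term = alpha(K + 1, c) * eta * (r_new - p0 @ q0_new) * q0_new
    return rebalanced + new_term
```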
3.2.2. Constructing p_u^{(K+1)}(2) based on p_u^{(K)}(2)

From (16) we see that the training result of one training epoch is used as the initial value of the subsequent one. So when constructing p_u^{(K+1)}(2) based on p_u^{(K)}(2), we must take the difference between p_u^{(K)}(1) and p_u^{(K+1)}(1) into account. Let Δp_u(1) = p_u^{(K+1)}(1) − p_u^{(K)}(1); then p_u^{(K+1)}(2) can be given by:

$$\begin{aligned}
p_u^{(K)}(2) &= c^K p_u^{(K)}(1) + \alpha_K\eta\sum_{k=1}^{K}\left(r_{u,k} - p_u^{(K)}(1)\,q_k^{(H_k)}(1)\right)q_k^{(H_k)}(1), \\
p_u^{(K+1)}(2) &= c^{K+1}p_u^{(K+1)}(1) + \alpha_{K+1}\eta\sum_{k=1}^{K+1}\left(r_{u,k} - p_u^{(K+1)}(1)\,q_k^{(H_k)}(1)\right)q_k^{(H_k)}(1) \\
&= c^{K+1}p_u^{(K+1)}(1) + \alpha_{K+1}\eta\left[\sum_{k=1}^{K}\left(r_{u,k} - \left(p_u^{(K)}(1) + \Delta p_u(1)\right)q_k^{(H_k)}(1)\right)q_k^{(H_k)}(1) + \left(r_{u,K+1} - p_u^{(K+1)}(1)\,q_{K+1}^{(H_{K+1})}(1)\right)q_{K+1}^{(H_{K+1})}(1)\right] \\
&= \underbrace{\frac{\alpha_{K+1}}{\alpha_K}\left(p_u^{(K)}(2) - c^K p_u^{(K)}(1)\right) + c^{K+1}p_u^{(K+1)}(1)}_{\text{rebalancing the trained parameter}} + \alpha_{K+1}\eta\Bigg[\underbrace{\left(r_{u,K+1} - p_u^{(K+1)}(1)\,q_{K+1}^{(H_{K+1})}(1)\right)q_{K+1}^{(H_{K+1})}(1)}_{\text{including the newly arriving example}} - \underbrace{\sum_{k=1}^{K}\left(\Delta p_u(1)\,q_k^{(H_k)}(1)\right)q_k^{(H_k)}(1)}_{\text{tackling the initial difference on } p_u}\Bigg]. \qquad (19)
\end{aligned}$$

Eq. (19) can be decomposed into three parts: the first and second parts respectively rebalance the trained parameter and include the variance on the newly arriving example; moreover, since the initial value changes at the very beginning of this training epoch, we need the third part to tackle the initial difference on p_u.

In the meanwhile, from (17) we see that for each training example r_{u,k} in R(u) except r_{u,K+1}, the training result of the corresponding item feature vector q_k^{(H_k)}(2) will be impacted by the incremental difference on p_u^{(K+1)}(1). Formally, for each item k ∈ R(u), k ≠ K + 1, the difference on q_k^{(H_k)}(2) brought by p_u^{(K+1)}(1) can be given by:

$$\begin{aligned}
q_k^{(H_k)}(2) &= c^{H_k}q_k^{(H_k)}(1) + \beta_{H_k}\eta\left[\sum_{h \ne u}\left(r_{h,k} - p_h^{(K_h)}(1)\,q_k^{(H_k)}(1)\right)p_h^{(K_h)}(1) + \left(r_{u,k} - p_u^{(K)}(1)\,q_k^{(H_k)}(1)\right)p_u^{(K)}(1)\right], \\
q_k^{\prime\,(H_k)}(2) &= c^{H_k}q_k^{(H_k)}(1) + \beta_{H_k}\eta\left[\sum_{h \ne u}\left(r_{h,k} - p_h^{(K_h)}(1)\,q_k^{(H_k)}(1)\right)p_h^{(K_h)}(1) + \left(r_{u,k} - \left(p_u^{(K)}(1) + \Delta p_u(1)\right)q_k^{(H_k)}(1)\right)\left(p_u^{(K)}(1) + \Delta p_u(1)\right)\right] \\
&= q_k^{(H_k)}(2) + \underbrace{\beta_{H_k}\eta\left[\left(r_{u,k} - p_u^{(K+1)}(1)\,q_k^{(H_k)}(1)\right)\Delta p_u(1) - \left(\Delta p_u(1)\,q_k^{(H_k)}(1)\right)p_u^{(K)}(1)\right]}_{\text{tackling the initial difference on } p_u}. \qquad (20)
\end{aligned}$$

Eq. (20) denotes that for each item k ∈ R(u), k ≠ K + 1, we can apply the term behind β_{H_k} to include the incremental difference brought by p_u^{(K+1)}(1), based on the currently trained result.

3.2.3. Constructing p_u^{(K+1)}(d) based on p_u^{(K)}(d) (d > 2)

During the subsequent training epochs, the interactions between the latent features become very complicated. After the 3rd training epoch, for each user v ∈ R(k){k ∈ R(u), k ≠ K + 1}, p_v^{(K_v)}(3) would be affected by the incremental difference on q_k^{(H_k)}(2) in (20); and after the 4th training epoch, the effect of the incremental difference will spread to each item j ∈ R(v){v ∈ R(k){k ∈ R(u), k ≠ K + 1}}. Actually, the broadcast of the effect caused by a newly arriving example can be described by the chain depicted in Fig. 1.

Fig. 1. The broadcast of the effect of the incremental difference on p_u caused by a new example.
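Before restricting attention to the first levels of this chain, note that the patch of Eq. (20) is cheap to apply: expanding the substitution of p_u^{(K)}(1) + Δp_u(1) shows that the bracketed term is exactly the difference between training q_k against the new and the old user vector. A sketch (our own naming; inputs are NumPy vectors):

```python
def patch_item_vector(q_k, p_old, delta_p, r_uk, beta_Hk, eta):
    """Eq. (20): patch the cached q_k for the change delta_p in the user vector."""
    p_new = p_old + delta_p                  # p_u^(K+1)(1) = p_u^(K)(1) + delta_p
    corr = (r_uk - p_new @ q_k) * delta_p - (delta_p @ q_k) * p_old
    return q_k + beta_Hk * eta * corr
```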
Since our objective is to design an incremental recommender able to respond quickly to user feedback, we only consider the feature interactions at Levels 1 and 2 shown in Fig. 1. Thus, for a newly arriving example r_{u,K+1}, the incremental feature updating rule of p_u can be summarized as follows:

$$\begin{aligned}
&\text{After the first training epoch, for } p_u: \\
&\quad p_u^{(K+1)}(1) = \frac{\alpha_{K+1}}{\alpha_K}\left(p_u^{(K)}(1) - c^K p_u^{(0)}\right) + c^{K+1}p_u^{(0)} + \alpha_{K+1}\eta\left(r_{u,K+1} - p_u^{(0)}q_{K+1}^{(0)}\right)q_{K+1}^{(0)}. \\
&\text{After the } d\text{th } (d > 1) \text{ training epoch, for } p_u: \\
&\quad p_u^{(K+1)}(d) = \frac{\alpha_{K+1}}{\alpha_K}\left(p_u^{(K)}(d) - c^K p_u^{(K)}(d{-}1)\right) + c^{K+1}p_u^{(K+1)}(d{-}1) \\
&\qquad\qquad + \alpha_{K+1}\eta\left[\left(r_{u,K+1} - p_u^{(K+1)}(d{-}1)\,q_{K+1}^{(H_{K+1})}(d{-}1)\right)q_{K+1}^{(H_{K+1})}(d{-}1) - \sum_{k=1}^{K}\left(\Delta p_u(d{-}1)\,q_k^{(H_k)}(d{-}1)\right)q_k^{(H_k)}(d{-}1)\right]; \\
&\text{for each item } k \in R(u),\ k \ne K{+}1: \\
&\quad q_k^{\prime\,(H_k)}(d{+}1) = q_k^{(H_k)}(d{+}1) + \beta_{H_k}\eta\left[\left(r_{u,k} - p_u^{(K+1)}(d)\,q_k^{(H_k)}(d)\right)\Delta p_u(d) - \left(\Delta p_u(d)\,q_k^{(H_k)}(d)\right)p_u^{(K)}(d)\right]. \qquad (21)
\end{aligned}$$

Similarly, we can infer the feature updating rule of item i on a newly arriving example r_{H+1,i} as:

$$\begin{aligned}
&\text{After the first training epoch, for } q_i: \\
&\quad q_i^{(H+1)}(1) = \frac{\beta_{H+1}}{\beta_H}\left(q_i^{(H)}(1) - c^H q_i^{(0)}\right) + c^{H+1}q_i^{(0)} + \beta_{H+1}\eta\left(r_{H+1,i} - p_{H+1}^{(0)}q_i^{(0)}\right)p_{H+1}^{(0)}. \\
&\text{After the } d\text{th } (d > 1) \text{ training epoch, for } q_i: \\
&\quad q_i^{(H+1)}(d) = \frac{\beta_{H+1}}{\beta_H}\left(q_i^{(H)}(d) - c^H q_i^{(H)}(d{-}1)\right) + c^{H+1}q_i^{(H+1)}(d{-}1) \\
&\qquad\qquad + \beta_{H+1}\eta\left[\left(r_{H+1,i} - p_{H+1}^{(K_{H+1})}(d{-}1)\,q_i^{(H+1)}(d{-}1)\right)p_{H+1}^{(K_{H+1})}(d{-}1) - \sum_{h=1}^{H}\left(p_h^{(K_h)}(d{-}1)\,\Delta q_i(d{-}1)\right)p_h^{(K_h)}(d{-}1)\right]; \\
&\text{for each user } h \in R(i),\ h \ne H{+}1: \\
&\quad p_h^{\prime\,(K_h)}(d{+}1) = p_h^{(K_h)}(d{+}1) + \alpha_{K_h}\eta\left[\left(r_{h,i} - p_h^{(K_h)}(d)\,q_i^{(H+1)}(d)\right)\Delta q_i(d) - \left(p_h^{(K_h)}(d)\,\Delta q_i(d)\right)q_i^{(H)}(d)\right]. \qquad (22)
\end{aligned}$$

Based on the incremental updating rules for the latent features in (21) and (22), we design the training process of our Incremental Regularized Matrix Factorization (IRMF) model in Table 1. As shown, IRMF divides the training process into two phases: the batch training phase and the incremental updating phase.

Table 1
The constructing process of IRMF

Batch training phase
1. Input: the training dataset T = {r_{u,i} | (u,i) ∈ T}; integer D specifying the number of training epochs; integer f specifying the feature space dimension.
2. Iterate over the training epochs d = 1, …, D:
   Train the latent features over T with (3);
   Cache the interim results of the dth epoch.

Incremental updating phase (on arrival of a new example)
1. Iterate over the training epochs d = 1, …, D:
   For user u and each item k ∈ R(u), k ≠ K + 1: incrementally update p_u and q_k using (21);
   For item i and each user h ∈ R(i), h ≠ H + 1: incrementally update q_i and p_h using (22);
   Restore the interim results of the dth epoch.

- During the batch training phase, the feature training rule is the same as in the primitive RMF model, whilst IRMF requires caching the interim training results after each training epoch for incremental updates. This implies that the storage complexity
– Once a new training example has arrived, the model can load the corresponding interim parameters, and then make incremental updates according to the feature updating rules (21) and (22).

3.3. Incorporating Linear Biases

As Paterek [24] and Koren [27] suggest, an effective way to further improve the prediction accuracy of RMF is to incorporate linear biases. When applying this strategy to the incremental approach, the linear biases should also be updated incrementally to keep them consistent with the latent factors. First of all, recall that in the RMF-B model the global average $\mu$, which captures the statistical characteristics of the given training data, is estimated by the average of the observed ratings as follows:

$$\mu=\sum_{(u,i)\in T} r_{u,i}\Big/|T|,\qquad(23)$$

where $T$ denotes the training set. Once a new training example $r_{w,m}$ arrives, $\mu$ can be updated by:

$$\mu'=\left(\mu\,|T|+r_{w,m}\right)\big/\left(|T|+1\right).\qquad(24)$$

Nonetheless, with regard to the observed user bias $b_u$ and item bias $b_i$, the situation is different. In the RMF-B model, the system estimates the linear biases by training them simultaneously with the feature vectors, which would lead to a complicated update process. For simplicity, we choose to estimate the linear biases via the unbiased linear estimator [16,18], which enables convenient updates. By applying this estimator, $b_u$ and $b_i$ can be denoted by:

$$b_i=\sum_{(u,i)\in R(i)}\left(r_{u,i}-\mu\right)\Big/\left(\beta_1+|R(i)|\right),\qquad b_u=\sum_{(u,i)\in R(u)}\left(r_{u,i}-\mu-b_i\right)\Big/\left(\beta_2+|R(u)|\right).\qquad(25)$$

Note that since $b_u$ and $b_i$ both depend on the global average, if we pre-estimate the linear biases for each user and item, an update of $\mu$ can lead to updating the linear biases over the entire user and item sets. So instead we only record the sum of the ratings and the size of the rating set by each user/on each item, given by:

$$s_u=\sum_{(u,i)\in R(u)} r_{u,i},\quad n_u=|R(u)|;\qquad s_i=\sum_{(u,i)\in R(i)} r_{u,i},\quad n_i=|R(i)|,\qquad(26)$$

which can be easily updated. When making predictions, we combine (25) and (26) directly to build $b_u$ and $b_i$ as:

$$b_i=\frac{s_i-\mu\,n_i}{\beta_1+n_i},\qquad b_u=\frac{s_u-\mu\,n_u-\sum_{i\in R(u)} b_i}{\beta_2+n_u}.\qquad(27)$$

Because the linear biases are not trained along with the latent factors but estimated through the unbiased estimator, the latent feature matrices are actually constructed on the residuals of the observed ratings with respect to the linear biases, given by:

$$r'_{u,i}=r_{u,i}-\mu-b_u-b_i.\qquad(28)$$

Based on the inference above, we propose the IRMF with linear biases (hereafter referred to as IRMF-B), of which the training process is summarized in Table 2. As shown, the training process of IRMF-B is also divided into the batch training phase and the incremental updating phase.

– During the batch training phase, the model first estimates the global average $\mu$ and the necessary parameters for constructing the linear biases $b_u$ and $b_i$ shown in (26); then the residual set $T'$ is built with (28); finally, the latent feature matrices are built on $T'$, and the interim results of each training epoch are cached in external files.

– During the incremental updating phase, on the arrival of a new example, the model first incrementally updates the parameters for estimating the linear biases (sketched in the code below); then the residual rating of this newly arriving example is built with (28); finally, the corresponding latent features are incrementally updated using (21) and (22).
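To make the bookkeeping of (24) and (26)–(28) concrete, the following minimal Python sketch maintains $\mu$, $s_u$, $n_u$, $s_i$ and $n_i$, and rebuilds the biases on demand via (27). This is our own illustration rather than the authors' code; the class name, the dictionary-based storage, and the default values of $\beta_1$ and $\beta_2$ are assumptions.

```python
from collections import defaultdict

class BiasTracker:
    """Incrementally maintained statistics for the linear biases of IRMF-B.

    Only the global average and the per-user/per-item rating sums and counts
    of (26) are stored, so b_u and b_i can be rebuilt on demand via (27)
    without rescanning the training set.
    """

    def __init__(self, beta1=25.0, beta2=10.0):
        self.beta1, self.beta2 = beta1, beta2   # regularizing constants of (25); values are illustrative
        self.mu, self.n_total = 0.0, 0          # global average and |T|
        self.s_u = defaultdict(float); self.n_u = defaultdict(int)
        self.s_i = defaultdict(float); self.n_i = defaultdict(int)
        self.items_of = defaultdict(set)        # R(u), needed for the sum in (27)

    def update(self, u, i, r):
        """Fold one new example r_{u,i} into the statistics, as in (24) and (26)."""
        self.mu = (self.mu * self.n_total + r) / (self.n_total + 1)
        self.n_total += 1
        self.s_u[u] += r; self.n_u[u] += 1
        self.s_i[i] += r; self.n_i[i] += 1
        self.items_of[u].add(i)

    def b_i(self, i):
        # Item bias of (27).
        return (self.s_i[i] - self.mu * self.n_i[i]) / (self.beta1 + self.n_i[i])

    def b_u(self, u):
        # User bias of (27); note the sum of the item biases over R(u).
        bias_sum = sum(self.b_i(i) for i in self.items_of[u])
        return (self.s_u[u] - self.mu * self.n_u[u] - bias_sum) / (self.beta2 + self.n_u[u])

    def residual(self, u, i, r):
        # Residual rating of (28), on which the latent features are trained.
        return r - self.mu - self.b_u(u) - self.b_i(i)
```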
Table 1
The constructing process of IRMF.

Batch training phase
1. Input: Training dataset {r_{u,i} | (u,i) ∈ T};
   Integer D specifying the number of training epochs;
   Integer f specifying the feature space dimension.
2. Iterate while d < D:
   Train the latent features over T with (3);
   Cache the interim results of the dth epoch.

Incremental updating phase
1. Input: Newly arriving example r_{u,i}.
2. Iterate while d < D:
   For user u and item k ∈ R(u), k ≠ i: incrementally update p_u and q_k using (21);
   For item i and user h ∈ R(i), h ≠ u: incrementally update q_i and p_h using (22);
   Restore the interim results of the dth epoch.

Table 2
The constructing process of IRMF-B.

Batch training phase
1. Input: Training dataset {r_{u,i} | (u,i) ∈ T};
   Integer D specifying the number of training epochs;
   Integer f specifying the feature space dimension.
2. Estimate the linear biases:
   Estimate μ using (23);
   For each user u, record s_u and n_u;
   For each item i, record s_i and n_i;
   Compute the residual training set T' using (28).
3. Iterate while d < D:
   Train the latent features over T' with (3);
   Cache the interim results of the dth epoch.

Incremental updating phase
1. Input: Newly arriving training example r_{u,i}.
2. Update the linear biases:
   Update μ using (24);
   Update s_u and n_u;
   Update s_i and n_i;
   Compute the residual rating using (28).
3. Iterate while d < D:
   For user u and item k ∈ R(u), k ≠ i: incrementally update p_u and q_k using (21);
   For item i and user h ∈ R(i), h ≠ u: incrementally update q_i and p_h using (22);
   Cache the interim results of the dth epoch.
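The two-phase organization of Tables 1 and 2 can be skeletonized as follows. This is a hypothetical sketch, not the authors' implementation: train_epoch, update_user and update_item stand in for the training rules (3), (21) and (22), which are not reproduced here, and the per-epoch caching uses pickle files only to illustrate the external storage the text describes.

```python
import pickle

def batch_phase(ratings, D, f, cache_prefix, init_features, train_epoch):
    """Batch training phase of Table 1: D epochs over T, caching the interim
    feature matrices of every epoch in an external file for later use."""
    P, Q = init_features(f)
    for d in range(D):
        P, Q = train_epoch(ratings, P, Q)               # batch rule (3) of RMF
        with open(f"{cache_prefix}_epoch{d}.pkl", "wb") as fh:
            pickle.dump((P, Q), fh)

def incremental_phase(u, i, r, D, cache_prefix, update_user, update_item):
    """Incremental updating phase of Table 1: for a new example r_{u,i},
    replay the D cached epochs and touch only the affected features."""
    for d in range(D):
        with open(f"{cache_prefix}_epoch{d}.pkl", "rb") as fh:
            P, Q = pickle.load(fh)                      # restore interim results
        update_user(u, i, r, P, Q, d)                   # p_u and q_k, k in R(u), via (21)
        update_item(u, i, r, P, Q, d)                   # q_i and p_h, h in R(i), via (22)
        with open(f"{cache_prefix}_epoch{d}.pkl", "wb") as fh:
            pickle.dump((P, Q), fh)                     # re-cache for the next arrival
```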
Note that the arrival of a new example will change the distribution of the entire training set, which strictly requires rebuilding the entire residual rating set and the feature matrices. However, this is unacceptable in real-world applications. So we only apply the incrementally updated linear biases to computing the residual rating on the new example.

3.4. Complexity analysis

Based on (21), we see that the IRMF model is incrementally updatable with regard to user u on a new example $r_{u,i}$ during each training epoch as follows:

– updating the latent user feature $p_u$, which calls for iterating over user u's rating set, with a computational complexity of $O\left(\frac{|T|}{|U|}\,f\right)$;

– updating the latent features of the items in user u's rating set, which also costs a computational complexity of $O\left(\frac{|T|}{|U|}\,f\right)$.

Since the above two operations can be done within the same loop, the computational complexity of the model update on user u over $r_{u,i}$ is $O\left(\frac{|T|}{|U|}\,f\right)$. Similarly, we can infer that the update complexity for the item feature $q_i$ on $r_{u,i}$ is $O\left(\frac{|T|}{|I|}\,f\right)$. Given D specifying the training epoch count, the computational complexity of the incremental update over $r_{u,i}$ in IRMF can be formulated as:

$$O\left(\left(\frac{|T|}{|U|}+\frac{|T|}{|I|}\right)D\,f\right).\qquad(29)$$

In terms of the complexity caused by incorporating the linear biases in IRMF-B, according to (27), it mainly depends on the construction of the observed linear bias of the active user u, which needs to iterate over user u's rating set with a complexity of $O\left(\frac{|T|}{|U|}\right)$. So, the complexity of the incremental update in IRMF-B is given by:

$$O\left(\left(\frac{|T|}{|U|}\,(f+1)+\frac{|T|}{|I|}\,f\right)D\right).\qquad(30)$$

In contrast, completely retraining the model calls for iterating over the entire training set D times, with a computational complexity of $O(|T|\,D\,f)$. However, as mentioned in Section 3.2, due to the requirement of caching the interim training results of the feature matrices, the storage complexity of IRMF would be D + 1 times that of the batch model. Fortunately, since these interim values are not necessary for making predictions, we can maintain them not in the memory but in external files, and load the required data when needed.

4. Experiments

Datasets. The experiments were conducted on two datasets. The first dataset is the MovieLens 1M dataset (hereafter referred to as the ML1M dataset). This dataset was collected by the GroupLens Research Project at the University of Minnesota through the MovieLens web site; it contains 1 million ratings applied to 3900 movies by 6040 users, with a rating scale on the [0,5] interval. The density of the rating matrix in ML1M is 4.25%.

The other dataset used in our experiments is the Netflix dataset. This dataset contains over 100 million ratings from 480 thousand randomly-chosen, anonymous Netflix customers over 17 K movie titles. The rating scale lies in the range of [1,5]. In our experiment, we extracted a subset consisting of the ratings on the first 1000 movies. This subset is hereafter referred to as the NF5M dataset; it contains 5,010,199 ratings by 404,555 users. The data density of the rating matrix in NF5M is 1.24%. In our experiments, we applied the 5-fold cross-validation experiment setting on both datasets, which is a common setting for model evaluation [36,37].
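As a rough back-of-the-envelope illustration of (29) (our own arithmetic under the ML1M statistics above, not a reported measurement): with $|T|\approx 10^6$, $|U|=6040$ and $|I|=3900$,

$$\frac{|T|}{|U|}\approx 166,\qquad \frac{|T|}{|I|}\approx 256,$$

so one incremental update costs roughly $(166+256)\,D\,f\approx 422\,D\,f$ operations, against $|T|\,D\,f=10^6\,D\,f$ for completely retraining the model, a reduction of over three orders of magnitude.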
Evaluation Metric. The recommenders' prediction accuracy is measured by the root mean squared error (RMSE) [38,39], which is a widely used metric for evaluating the statistical accuracy of recommendation algorithms, given by:

$$\mathrm{RMSE}=\sqrt{\sum_{(u,i)\in V}\left(r_{u,i}-\hat{r}_{u,i}\right)^2\Big/|V|},\qquad(31)$$

where V denotes the validation set. For a given recommender, a low RMSE means high prediction accuracy.

4.1. Performance validation for SI-RMF

Experiment settings. In this part of the experiments, we aim to compare the accuracy of RMF and SI-RMF, to check whether the removal of the sequential sensitivity leads to a compromise in recommendation accuracy. The initial values of the feature matrices of both models were randomly drawn from a uniform distribution in the range of [−0.02, 0.02], following Takács et al.'s suggestion [25]. Each experiment dataset was divided into five disjoint parts for employing the fivefold cross-validation setting.

Results. During the experiments, we found that the change of the training rule in SI-RMF leads to a restriction on the learning rate: the model would diverge without a relatively small value of η. To make SI-RMF converge, η should be no more than 7 × 10⁻⁴ on ML1M, and 3 × 10⁻⁵ on NF5M. For comparison, the corresponding threshold learning rates of RMF are 2 × 10⁻² and 1 × 10⁻² on ML1M and NF5M, respectively.

Table 3 summarizes the performance of RMF and SI-RMF with λ = 0.02 and f = 20. We also depict the RMSE comparison between RMF and SI-RMF in Fig. 2.

Table 3
Performance comparison between RMF and SI-RMF.

ML1M (λ = 0.02)
Model     η to converge   Epochs to converge   RMSE
RMF       0.02            7                    0.8673
RMF       0.0007          203                  0.8591
SI-RMF    0.0007          205                  0.8573

NF5M (λ = 0.002)
Model     η to converge   Epochs to converge   RMSE
RMF       0.001           30                   0.9557
RMF       0.00003         965                  0.9484
SI-RMF    0.00003         964                  0.9472

Fig. 2. The RMSE comparison between RMF and SI-RMF on both datasets with the value of f ranging from 20 to 500. Panel (a) depicts the situation on ML1M, and panel (b) depicts the situation on NF5M. Both panels share the legend in panel (a).

From the experiment results, we have two findings:

– With the same learning rate, RMF and SI-RMF have the same convergence rate and approximately equal recommendation accuracy (in our experiments, SI-RMF always outperformed RMF slightly).

– A smaller value of η leads to higher recommendation accuracy (represented by lower RMSE), as well as more training epochs needed to converge.

Based on these findings, we can conclude that the learning rule of SI-RMF does not affect the prediction accuracy; however, it requires a small learning rate to make the model converge, which leads to many training epochs. Nevertheless, we cannot simply conclude that SI-RMF converges slower than RMF, since their converging epoch counts are very close to each other under the same parameter setting. Moreover, from (15)–(17), we can find that during each training epoch of SI-RMF, the variation of each latent feature is only related to the initial values of the other related features at the very beginning of this epoch, which makes it convenient to speed up the training process with parallel programming models, e.g., the Map-Reduce technique [40]. A thorough discussion of this issue is beyond the scope of this paper; however, we would like to investigate it further in our future work.
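For completeness, the RMSE of (31) can be computed as in the minimal sketch below; the argument names and the predict callback are our own illustrative choices.

```python
import math

def rmse(validation, predict):
    """Root mean squared error of (31) over the validation set V.

    `validation` yields (u, i, r) triples and `predict(u, i)` returns the
    model's estimate of r_{u,i}; both interfaces are assumptions.
    """
    squared_error, count = 0.0, 0
    for u, i, r in validation:
        squared_error += (r - predict(u, i)) ** 2
        count += 1
    return math.sqrt(squared_error / count)
```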
4.2. Performance validation for incremental models

Experiment settings. In this part we intend to validate the performance of the incremental models. In order to check whether our incremental models can correctly learn from newly arriving examples, we compare the performance of RMF, IRMF, RMF-B and IRMF-B. For a further empirical study, we also implement the incremental RMF model based on retraining features described in [34] (hereafter referred to as RMF-R). Since the objective is to validate the performance of the incremental models, we design the experiment process as follows:

– Each experiment dataset was divided into two disjoint parts, respectively marked as the batch training data, BD, consisting of 10% of the original data, and the incremental dataset, ID, consisting of 90% of the original data.

– First of all, BD was used as the training set for the batch recommenders, or as the batch training set for the incremental recommenders, to build the corresponding models.

– Afterwards, we applied ID to simulate the real-world user feedbacks, and used each model to make predictions for the examples in ID one by one: for batch models, we record the RMSE of the model on each example; for incremental models, we first record the RMSE on each example, and then use the actual rating as a new example to make an incremental update (a sketch of this predict-then-update loop is given after this subsection's settings).

– Note that the batch models cannot learn new patterns from the examples in ID, whereas the incremental models can make incremental updates based on these examples. So we can expect that as the 'user feedbacks' accumulate, the incremental models will gradually outperform the corresponding batch versions.

Through such an experimental process, we can clearly detect the effect of the incremental updates in IRMF and IRMF-B. The initial values of the feature matrices of each model are randomly drawn from a uniform distribution on the range [−0.02, 0.02]. For ML1M, the regularizing parameter λ and the learning rate η are respectively set at 0.02 and 7 × 10⁻⁴; on NF5M, the values of λ and η are respectively 0.002 and 3 × 10⁻⁵. The value of f was set at 20.

Results. The experiment results on both datasets are depicted in Fig. 3. Note that each point in Fig. 3 denotes the RMSE of the corresponding recommender over all tested examples so far; e.g., (0.9096, 100,000) denotes that 100,000 examples have been tested, and the RMSE so far is 0.9096.

Fig. 3. The RMSE comparison between all tested recommenders with f = 20. Panel (a) depicts the situation on ML1M, and panel (b) depicts the situation on NF5M. Both panels share the legend in panel (a).
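The predict-then-update protocol above can be sketched as follows; incremental_data, predict and incremental_update are hypothetical stand-ins for the dataset ID and the model interface, not the authors' code.

```python
import math

def prequential_eval(incremental_data, predict, incremental_update):
    """Predict-then-update protocol of Section 4.2: each example in ID is
    first scored against the current model, then fed back as a new training
    example. Returns the running RMSE after each example, matching the
    points plotted in Fig. 3."""
    squared_error, curve = 0.0, []
    for n, (u, i, r) in enumerate(incremental_data, start=1):
        squared_error += (r - predict(u, i)) ** 2
        curve.append(math.sqrt(squared_error / n))  # RMSE over all examples tested so far
        incremental_update(u, i, r)                 # a no-op for the batch baselines
    return curve
```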
From Fig. 3, we can see that at the very beginning, the prediction accuracy of the batch models and the corresponding incremental versions was very close. However, due to the incremental updates, the incremental models gradually outperformed the batch models as the 'user feedbacks' increased. For a clearer view, we also summarize the RMSE of our incremental models and the corresponding batch models at several typical points in Tables 4 and 5, where we can see that the effects of the incremental updates tended to become more obvious as the amount of 'user feedbacks' increased. This phenomenon is reasonable, since the incremental updates enabled the incremental recommenders to keep track of the variance in user rating patterns.

Table 4
The RMSE comparison on ML1M (the D.P. rows denote the decreasing percentage of RMSE achieved by the incremental models compared to the batch models).

Method     50 K     100 K    200 K    400 K    500 K
RMF        0.9758   0.9617   0.9688   0.9639   0.9651
IRMF       0.9469   0.9314   0.9342   0.9285   0.9249
D.P. (%)   2.97     3.15     3.57     3.67     4.17
RMF-B      0.9634   0.9501   0.9571   0.9525   0.9532
IRMF-B     0.9235   0.9096   0.9103   0.9042   0.9018
D.P. (%)   4.14     4.26     4.89     5.09     5.39

Table 5
The RMSE comparison on NF5M.

Method     500 K    1 M      2 M      3 M      4.5 M
RMF        1.1315   1.0889   1.0882   1.1588   1.4792
IRMF       1.0553   1.0076   0.9969   1.0592   1.3474
D.P. (%)   6.73     7.47     8.39     8.60     8.91
RMF-B      1.1173   1.0838   1.0561   1.0429   1.0523
IRMF-B     1.0416   1.0047   0.9655   0.9518   0.9589
D.P. (%)   6.78     7.30     8.58     8.74     8.88

Moreover, from Fig. 3b and Table 5 we can observe an evident effect of incorporating linear biases into the RMF based model. This is caused by the fact that the density of the rating matrix of NF5M is much lower than that of ML1M; due to the extreme sparsity of the rating matrix, the pure RMF based methods cannot cover the user rating patterns well, which leads to a rather high prediction error rate. Nonetheless, by incorporating linear biases, which play a supplementary but very important role in describing the data characteristics, RMF-B and IRMF-B made considerably more precise recommendations than RMF and IRMF.

Based on the analysis above, we can summarize the experiment results as follows:

– As user feedbacks accumulate, IRMF and IRMF-B can obviously outperform the corresponding batch models RMF and RMF-B, as well as another incremental model, RMF-R. This indicates that our incremental designs are empirically effective.

– When dealing with extremely sparse data, the incorporation of linear biases can play a critical role in maintaining the recommendation quality in either batch or incremental models.

5. Conclusion

In this work, we aim to design an incremental recommender based on the RMF model. To achieve this objective, we first analyze the training process of RMF, whereby we find that RMF is an iterative recommender in which the inter-dependence between parameters during the training process leads to a mathematical form too complicated for incremental updates. To address this problem, we propose the SI-RMF model, which removes the sequence-sensitivity by simultaneously learning the involved examples by each user/on each item and averaging the gain on the corresponding latent factor over each example. Being sequence-insensitive, SI-RMF provides a much simpler mathematical form in which, during each training epoch, the final training output of each latent factor is decided only by the related ratings and the initial values of the corresponding latent features; in other words, the training result of each latent feature is no longer bound to the temporal states of the counterpart features on the related ratings. Empirical studies show that the prediction accuracy of SI-RMF and RMF is very close; however, the changed learning rule in SI-RMF requires a small learning rate to converge. One possible reason for this phenomenon is the relatively low updating frequency on the latent features in SI-RMF. Nonetheless, we also find that SI-RMF can be parallelized conveniently, since during each training epoch the parameters have no inter-dependence on each other. Therefore, it is feasible to speed up the training process via parallel programming.
We plan to further investigate this point in our future work.

With SI-RMF, we subsequently propose the incremental recommender IRMF, which can incrementally update the specified latent features by constructing the expressions over K + 1 examples based on the model trained over K examples. Further, by integrating the linear biases, which can also be updated incrementally, into the IRMF model, we propose the IRMF-B model. From the experimental results on two large real datasets, we find that both models can incrementally learn from newly arriving examples efficiently; however, due to the incorporation of the linear biases, which offer a supplementary description of the characteristics of the rating data, IRMF-B proves to be more accurate. A drawback of our incremental models is the requirement to cache the interim training results after each epoch, which leads to high storage complexity. Currently, we address this issue by storing these interim results in external files and loading the corresponding data when needed, to alleviate the burden on the memory. We also intend to seek a better solution to this problem in the future.

Acknowledgements

We are very grateful to the anonymous reviewers for their informative and insightful comments, which helped us to further improve this work. This research is fully supported by the National Key Technology R&D Program of China under Grant No. 2011BAH25B01.

References

[1] D. Goldberg, D. Nichols, B.M. Oki, D. Terry, Using collaborative filtering to weave an information tapestry, Commun. ACM 35 (1992) 61–70.
[2] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: an open architecture for collaborative filtering of netnews, in: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, ACM, North Carolina, 1994.
[3] K. Lang, NewsWeeder: learning to filter netnews, in: Proceedings of the 12th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1995.
[4] W. Hill, L. Stead, M. Rosenstein, G. Furnas, Recommending and evaluating choices in a virtual community of use, in: Proceedings of the 1995 SIGCHI Conference on Human Factors in Computing Systems, ACM, 1995.
[5] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. 17 (2005) 734–749.
[6] J. Ma, G. Zhang, J. Lu, A state-based knowledge representation approach for information logical inconsistency detection in warning systems, Knowl.-Based Syst. 23 (2010) 125–131.
[7] B. Pathak, R. Garfinkel, R.D. Gopal, R. Venkatesan, F. Yin, Empirical analysis of the impact of recommender systems on sales, J. Manage. Inf. Syst. 27 (2010) 159–188.
[8] Y. Peng, G. Kou, Y. Shi, Z. Chen, A descriptive framework for the field of data mining and knowledge discovery, Int. J. Inf. Tech. Decis. 7 (2008) 639–682.
[9] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. 17 (2005) 734–749.
[10] C. Hayes, P. Cunningham, Context boosting collaborative recommendations, Knowl.-Based Syst. 17 (2004) 131–138.
[11] P. Boucher-Ryan, D. Bridge, Collaborative recommending using formal concept analysis, Knowl.-Based Syst. 19 (2006) 309–315.
[12] J.-M. Yang, K.F. Li, Recommendation based on rational inferences in collaborative filtering, Knowl.-Based Syst. 22 (2009) 105–114.
[13] J. Bobadilla, F. Serradilla, A. Hernando, Collaborative filtering adapted to recommender systems of e-learning, Knowl.-Based Syst. 22 (2009) 261–265.
[14] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th International Conference on World Wide Web, ACM, Hong Kong, 2001.
[15] M. Deshpande, G. Karypis, Item-based top-N recommendation algorithms, ACM Trans. Inf. Syst. 22 (2004) 143–177.
[16] R. Bell, Y. Koren, Improved neighborhood-based collaborative filtering, in: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, California, 2007.
[17] J. Bobadilla, F. Serradilla, J. Bernal, A new collaborative filtering metric that improves the behavior of recommender systems, Knowl.-Based Syst. 23 (2010) 520–528.
[18] Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Nevada, 2008.
[19] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Application of dimensionality reduction in recommender systems – a case study, in: Proceedings of the ACM WebKDD 2000, ACM, 2000.
[20] T. Hofmann, Latent semantic models for collaborative filtering, ACM Trans. Inf. Syst. 22 (2004) 89–115.
[21] N. Srebro, J.D.M. Rennie, T.S. Jaakkola, Maximum-margin matrix factorization, Adv. Neural Inf. Process. Syst. 17 (2005).
[22] M. Kurucz, A. Benczúr, K. Csalogány, Methods for large scale SVD with missing values, in: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, California, 2007.
[23] B. Webb, Netflix Update: Try This at Home. <http://sifter.org/~simon/journal/20061211.html>, 2006 (visited 09.02.11).
[24] A. Paterek, Improving regularized singular value decomposition for collaborative filtering, in: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, California, 2007.
[25] G. Takács, I. Pilászy, B. Németh, D. Tikk, Scalable collaborative filtering approaches for large recommender systems, J. Mach. Learn. Res. 10 (2009) 623–656.
[26] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, Adv. Neural Inf. Process. Syst. 20 (2008).
[27] Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, IEEE Comput. 42 (2009) 30–37.
[28] G. Gorrell, B. Webb, Generalized Hebbian algorithm for latent semantic analysis, in: Proceedings of the Interspeech 2005, 2005.
[29] G. Gorrell, Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing, in: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, 2006.
[30] M. Papagelis, R. Ioannis, D. Plexousakis, E. Theoharopoulos, Incremental collaborative filtering for highly-scalable recommendation algorithms, in: Proceedings of the 15th International Symposium on Methodologies for Intelligent Systems, Saratoga Springs, Springer-Verlag, Berlin, 2005.
[31] C. Miranda, A. Jorge, Item-based and user-based incremental collaborative filtering for web recommendations, in: Proceedings of the 14th Portuguese Conference on Artificial Intelligence, Univ. Aveiro, Springer-Verlag, Berlin, 2009.
[32] D. Agarwal, B.-C. Chen, P. Elango, Fast online learning through offline initialization for time-sensitive recommendation, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, 2010.
[33] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Incremental singular value decomposition algorithms for highly scalable recommender systems, in: Proceedings of the 5th International Conference on Computer and Information Science, 2002.
[34] S. Rendle, L. Schmidt-Thieme, Online-updating regularized kernel matrix factorization models for large-scale recommender systems, in: Proceedings of the 2nd ACM Conference on Recommender Systems, ACM, Lausanne, Switzerland, 2008.
[35] R. Motwani, P. Raghavan, Randomized algorithms, ACM Comput. Surv. 28 (1996) 33–37.
[36] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: Proceedings of the 14th International Joint Conference on Artificial Intelligence, San Mateo, 1995.
[37] J. Pizarro, E. Guerrero, P. Galindo, Multiple comparison procedures applied to model selection, Neurocomputing 48 (2002) 155–173.
[38] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst. 22 (2004) 5–53.
[39] G. Shani, A. Gunawardana, Evaluating Recommendation Systems, 2009 (visited 09.02.11).
[40] C. Chu, S.K. Kim, Y. Lin, Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun, Map-reduce for machine learning on multicore, in: Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, 2007.