REQUEST: A Query language for Customizing recommendations Gediminas adomavicius Department of Information and Decision Sciences Carlson school of management University of Minnesota das @umn. edi Alexander tuzhilin Information, Operations Management Sciences Department Stern School of business New York University atuzhili@stern. nyu. edu Rong zheng Information, Operations Management Sciences Department Stern school of business New York University zheng @stern. nyu. edu Abstract Initially popularized by amazon. com, recommendation technologies have become widespread over the past several years. However, the types of recommendations available to the users in these recommender systems are typically determined by the vendor and therefore are not flexible. In this paper we address this problem by presenting the recommendation query language REQUEST that allows users to customize recommendations by formulating them in the ways satisfying personalized needs of the users. REQUESt is based on the multidimensional model of recommender systems that supports additional contextual dimensions besides classical User and Item dimensions and also OLAP-type aggregation and filtering capabilities The paper also presents a recommendation algebra, shows how REQUEST recommendations can be mapped into this algebra, and analyzes the expressive power of the query language and the algebra. The paper also shows how users can customize their recommendations using REQUEST queries through a series of examples Keywords: personalization, recommender systems, recommendation query language, recommendation
REQUEST: A Query Language for Customizing Recommendations Gediminas Adomavicius Department of Information and Decision Sciences Carlson School of Management University of Minnesota gedas@umn.edu Alexander Tuzhilin Information, Operations & Management Sciences Department Stern School of Business New York University atuzhili@stern.nyu.edu Rong Zheng Information, Operations & Management Sciences Department Stern School of Business New York University rzheng@stern.nyu.edu Abstract Initially popularized by Amazon.com, recommendation technologies have become widespread over the past several years. However, the types of recommendations available to the users in these recommender systems are typically determined by the vendor and therefore are not flexible. In this paper we address this problem by presenting the recommendation query language REQUEST that allows users to customize recommendations by formulating them in the ways satisfying personalized needs of the users. REQUEST is based on the multidimensional model of recommender systems that supports additional contextual dimensions besides classical User and Item dimensions and also OLAP-type aggregation and filtering capabilities. The paper also presents a recommendation algebra, shows how REQUEST recommendations can be mapped into this algebra, and analyzes the expressive power of the query language and the algebra. The paper also shows how users can customize their recommendations using REQUEST queries through a series of examples. Keywords: personalization, recommender systems, recommendation query language, recommendation algebra
Introduction Recommender systems represent an important class of personalization technologies that help users to deal with information overload in e-commerce and numerous other applications. There has been much work done in the area of recommender systems over the past decade since the introduction of the first papers on the subject [12, 21, 22], especially after these technologies were popularized by amazon and Netflix, well as after the establishment of the $1,000,000 Netflix Prize Competition that attracted over 43,000 contestants from 180 countries [6]. A recent survey of the rapidly growing field of recommender systems can be found in [3] Most of the work in recommender systems focuses on a two-dimensional paradigm of recommending items to users or users to items(e.g, books to customers or customers for books). Although there are ifferent types of approaches to deriving recommendations, including the ranking- [10 and market- basket-analysis-based [18] the majority of the academic work in recommender systems and implementations of commercial systems, including Amazon and Netflix, focuses on the rating-based approach [3], where recommendations use explicit or implicit ratings provided by the end-users Rating-based approaches are usually classified into content-based, collaborative, and hybrid [7]. In content-based recommendation methods, rating R(u, i)of item i for user u is typically estimated based on the ratings R(u, i) assigned by the same user u to other items i 'that are "similar"to item i in terms of their content. For example, in order to recommend movies to user u, the content-based approach tries to understand user preferences by analyzing commonalities among the content of the movies user u has rated highly before. Then, only the movies that have a high degree of similarity with customers past preferences are recommended Collaborative recommender systems try to predict rating R(u, i) of item i for user u based on how other" similar"users u previously rated item i. Here user similarity "is defined in terms of the distance between the ratings users u and u assigned to the items that both of them rated, the most popular types of distance metrics being correlation- and cosine-based measures between two rating vectors [3]. Then collaborative filtering methods recommend those items to the user that she has not rated yet and that were
1 1. Introduction Recommender systems represent an important class of personalization technologies that help users to deal with information overload in e-commerce and numerous other applications. There has been much work done in the area of recommender systems over the past decade since the introduction of the first papers on the subject [12, 21, 22], especially after these technologies were popularized by Amazon and Netflix, as well as after the establishment of the $1,000,000 Netflix Prize Competition that attracted over 43,000 contestants from 180 countries [6]. A recent survey of the rapidly growing field of recommender systems can be found in [3]. Most of the work in recommender systems focuses on a two-dimensional paradigm of recommending items to users or users to items (e.g., books to customers or customers for books). Although there are different types of approaches to deriving recommendations, including the ranking- [10] and marketbasket-analysis-based [18], the majority of the academic work in recommender systems and implementations of commercial systems, including Amazon and Netflix, focuses on the rating-based approach [3], where recommendations use explicit or implicit ratings provided by the end-users. Rating-based approaches are usually classified into content-based, collaborative, and hybrid [7]. In content-based recommendation methods, rating R(u,i) of item i for user u is typically estimated based on the ratings R(u,i') assigned by the same user u to other items i' that are “similar” to item i in terms of their content. For example, in order to recommend movies to user u, the content-based approach tries to understand user preferences by analyzing commonalities among the content of the movies user u has rated highly before. Then, only the movies that have a high degree of similarity with customer’s past preferences are recommended. Collaborative recommender systems try to predict rating R(u,i) of item i for user u based on how other “similar” users u' previously rated item i. Here “user similarity” is defined in terms of the distance between the ratings users u and u' assigned to the items that both of them rated, the most popular types of distance metrics being correlation- and cosine-based measures between two rating vectors [3]. Then collaborative filtering methods recommend those items to the user that she has not rated yet and that were
highly rated by the similar use Content and collaborative methods can be combined into a hybrid approach in several different ways [7. One popular way to combine them is by learning and maintaining user profiles based on the content analysis of the items preferred by the users, and then directly comparing the resulting profiles to determine similar users in order to make collaborative recommendations. Other types of hybrid method are also possible and are described in [ 3] Although the traditional two-dimensional user/item paradigm described above is suitable for some applications, such as recommending books and music CDs, it is significantly less suitable for the context-rich"applications, such as traveling or shopping applications. For example, when recommending vacations to travelers, one would likely recommend a different vacation to a customer in the winter than in the summer, i.e., the time-of-travel context is clearly important when making recommendations. Similarly, when recommending groceries, a"" shopping cart [26] needs to take into account not only information about products and customers, but also such information as shopping date/time, store, who accompanies the primary shopper, products already placed into the shopping cart and its location in the store. Clearly, the two-dimensional paradigm of classical recommender systems is less suitable for these applications To provide better recommendations in such contextually rich applications, one may need to consider ther dimensions besides Item and User. For example, when Netflix or any on-demand movie provider recommends movies, it may consider such additional dimensions as Time when the movie was seen Company in which the movie was seen(e.g, alone, with friends, parents, etc. ) and Place in which it was seen(e. g, in the theater or at home). a completely different movie may be recommended by Netflix to a student when he wants to see it on a Saturday night with his girlfriend in a movie theater than when he wants to see it on Thursday evening with his parents at home. In 4, 5 we proposed a new multidimensional approach to recommender systems where we incorporated multiple dimensions and the OLAP-based cubes of ratings into the recommendation model. To estimate missing ratings in multidimensional cubes, we proposed the reduction-based method in [4] and the heuristic-based and
2 highly rated by the similar users. Content and collaborative methods can be combined into a hybrid approach in several different ways [7]. One popular way to combine them is by learning and maintaining user profiles based on the content analysis of the items preferred by the users, and then directly comparing the resulting profiles to determine similar users in order to make collaborative recommendations. Other types of hybrid methods are also possible and are described in [3]. Although the traditional two-dimensional user/item paradigm described above is suitable for some applications, such as recommending books and music CDs, it is significantly less suitable for the “context-rich” applications, such as traveling or shopping applications. For example, when recommending vacations to travelers, one would likely recommend a different vacation to a customer in the winter than in the summer, i.e., the time-of-travel context is clearly important when making recommendations. Similarly, when recommending groceries, a “smart” shopping cart [26] needs to take into account not only information about products and customers, but also such information as shopping date/time, store, who accompanies the primary shopper, products already placed into the shopping cart, and its location in the store. Clearly, the two-dimensional paradigm of classical recommender systems is less suitable for these applications. To provide better recommendations in such contextually rich applications, one may need to consider other dimensions besides Item and User. For example, when Netflix or any on-demand movie provider recommends movies, it may consider such additional dimensions as Time when the movie was seen, Company in which the movie was seen (e.g., alone, with friends, parents, etc.), and Place in which it was seen (e.g., in the theater or at home). A completely different movie may be recommended by Netflix to a student when he wants to see it on a Saturday night with his girlfriend in a movie theater than when he wants to see it on Thursday evening with his parents at home. In [4, 5] we proposed a new multidimensional approach to recommender systems where we incorporated multiple dimensions and the OLAP-based cubes of ratings into the recommendation model. To estimate missing ratings in multidimensional cubes, we proposed the reduction-based method in [4] and the heuristic-based and
model-based methods in [2] However, the multidimensional approach described in [4] and the classical two-dimensional recommendation methods have one significant limitation in common. These methods are hard-wired by the developers into the recommender systems, are inflexible and limited in their expressiveness, and therefore, neglect some possible needs of the users. For example, a typical recommender system would recommend the top k items to a user, or the best k users for a product. This situation is quite limited especially in multidimensional settings, where the number of possible recommendations increases significantly with the number of dimensions [5]. Therefore, there is a need to empower end-users and other stakeholders by providing them with the tools for expressing recommendations that are of interest to them [3, 15]. For example, Jane Doe may need a recommendation for the best two dates to go on vacation to Jamaica with her boy friend. Also, Netflix or an on-demand movie service, such as provided by the Time warner Cable, can envision a web-based interface to a multidimensional cube of ratings that lets the users express the recommendations that are of interest to them or automatically tailors recommendations based on a given context, such as the time of day or the day of week. For example, a certain user(. g, Tom) may seek recommendations for him and his girlfriend of top 3 movies and the best times to see them over the weekend, and he enters this request into the recommender system via the web-based interface. Such query-based recommendation applications are not limited to on-demand movies but are relevant to a broad range of recommendation applications, including retailing, financial, travel and other applications. Furthermore, we believe that flexible recommendation capabilities would be appealing to a variety of different users, and not just to the end-users who are direct recipients of recommendations. For example, such functionality would be useful to the analysts of a company providing recommendation services, who may want to take advantage of all the knowledge that their recommender system holds and analyze it from a variety of different perspectives("show me the top 2 movie genres for each user age bracket", etc. One tool for expressing such requests is a recommendation language that is similar to how database sers use query languages to retrieve information from databases. In fact, one may try to use SQL for this
3 model-based methods in [2]. However, the multidimensional approach described in [4] and the classical two-dimensional recommendation methods have one significant limitation in common. These methods are hard-wired by the developers into the recommender systems, are inflexible and limited in their expressiveness, and, therefore, neglect some possible needs of the users. For example, a typical recommender system would recommend the top k items to a user, or the best k users for a product. This situation is quite limited, especially in multidimensional settings, where the number of possible recommendations increases significantly with the number of dimensions [5]. Therefore, there is a need to empower end-users and other stakeholders by providing them with the tools for expressing recommendations that are of interest to them [3, 15]. For example, Jane Doe may need a recommendation for the best two dates to go on vacation to Jamaica with her boyfriend. Also, Netflix or an on-demand movie service, such as provided by the Time Warner Cable, can envision a web-based interface to a multidimensional cube of ratings that lets the users express the recommendations that are of interest to them or automatically tailors recommendations based on a given context, such as the time of day or the day of week. For example, a certain user (e.g., Tom) may seek recommendations for him and his girlfriend of top 3 movies and the best times to see them over the weekend, and he enters this request into the recommender system via the web-based interface. Such query-based recommendation applications are not limited to on-demand movies but are relevant to a broad range of recommendation applications, including retailing, financial, travel and other applications. Furthermore, we believe that flexible recommendation capabilities would be appealing to a variety of different users, and not just to the end-users who are direct recipients of recommendations. For example, such functionality would be useful to the analysts of a company providing recommendation services, who may want to take advantage of all the knowledge that their recommender system holds and analyze it from a variety of different perspectives (“show me the top 2 movie genres for each user age bracket”, etc.). One tool for expressing such requests is a recommendation language that is similar to how database users use query languages to retrieve information from databases. In fact, one may try to use SQL for this
purpose, and the above recommendation for Tom can be specified in SQL as SELECT R. Movield, R.Timeld, R. Userld, R. Companionld, AVG(R PersonalRating) FROM MovieRecommender r, User U, Time T, Companion C WHERE R. USerld= U. Userld AND R.Timeld=T.Timeld AND R. Companionld =C Compani AND U. Name=Tom"ANDT.TimeOfWeek="weekend"" AND C Type="Girlfriend GROUP BY R. Movield, R.Timeld, R. Userld, R. Companionld where User and Companion are the relations storing information about customers and different types of companions, MovieRecommender is the ratings table, and Time is the temporal dimension table Although doable, this SQl query and, more generally, SQL at large would have the following problems when used for recommendation purposes. First, notice that SQL does not exactly provide the requested recommendation: it returns the list of tuples(movies, times to see them, users, etc. ) but does not specify what is recommended to whom and does not provide the top 3 recommended movie/time pairs. More generally, as it will be shown in the paper, there are certain recommendations that cannot be expressed in sQL (but can in the language proposed in this paper ) Second, SQL is a comprehensive, general-purpose query language and, therefore, many of the possible SQL queries do not represent recommendations Therefore, in order to help the end-user formulate recommendations correctly and meaningfully, one may want to impose elaborate constraints on SQL to be able to restrict the language for the recommendation task. We have studied this issue and realized that it is very hard to develop a simple, elegant and intuitive system of such constraints for SQL. a better alternative would be to introduce a language that is directly defined on top of the"native" " multidimensional recommendation model. Third, the above SQl query is fairly cumbersome: it constitutes a join of four relational tables, has 6 conditions in the where clause has the group by statement and the aggregation function AVG. Clearly, there should be a better and a more intuitive way to express this simple type of recommendation, and this observation served as a direct motivation for developing a special-purpose recommendation language. This necessity to replace cumbersome SQL queries with more elegant and intuitive formulations grows substantially for the significantly more complex recommendations, such as the ones presented in Section 3. Fourth, this cumbersomeness may have not only a cognitive effect on the users writing queries, but could possibly also affect query performance in some cases, since processing multiple join queries can be a very time
4 purpose, and the above recommendation for Tom can be specified in SQL as SELECT R.MovieId, R.TimeId, R.UserId, R.CompanionId, AVG(R.PersonalRating) FROM MovieRecommender R, User U, Time T, Companion C WHERE R.UserId = U.UserId AND R.TimeId = T.TimeId AND R.CompanionId = C.CompanionId AND U.Name = “Tom” AND T.TimeOfWeek = “weekend” AND C.Type = “Girlfriend” GROUP BY R.MovieId, R.TimeId, R.UserId, R.CompanionId where User and Companion are the relations storing information about customers and different types of companions, MovieRecommender is the ratings table, and Time is the temporal dimension table. Although “doable,” this SQL query and, more generally, SQL at large would have the following problems when used for recommendation purposes. First, notice that SQL does not exactly provide the requested recommendation: it returns the list of tuples (movies, times to see them, users, etc.), but does not specify what is recommended to whom and does not provide the top 3 recommended movie/time pairs. More generally, as it will be shown in the paper, there are certain recommendations that cannot be expressed in SQL (but can in the language proposed in this paper). Second, SQL is a comprehensive, general-purpose query language and, therefore, many of the possible SQL queries do not represent recommendations. Therefore, in order to help the end-user formulate recommendations correctly and meaningfully, one may want to impose elaborate constraints on SQL to be able to restrict the language for the recommendation task. We have studied this issue and realized that it is very hard to develop a simple, elegant and intuitive system of such constraints for SQL. A better alternative would be to introduce a language that is directly defined on top of the “native” multidimensional recommendation model. Third, the above SQL query is fairly cumbersome: it constitutes a join of four relational tables, has 6 conditions in the WHERE clause, has the GROUP BY statement and the aggregation function AVG. Clearly, there should be a better and a more intuitive way to express this simple type of recommendation, and this observation served as a direct motivation for developing a special-purpose recommendation language. This necessity to replace cumbersome SQL queries with more elegant and intuitive formulations grows substantially for the significantly more complex recommendations, such as the ones presented in Section 3. Fourth, this cumbersomeness may have not only a cognitive effect on the users writing queries, but could possibly also affect query performance in some cases, since processing multiple join queries can be a very time-
consuming operation. In summary, the above issues can be attributed to the task and model mismatch SQL is a general-purpose query language, which makes it a less intuitive and a less useful tool for users in the vertical" application domain of recommender systems, where SQL may not have some specialized capabilities important for recommender systems. Also, SQL is based on the relational data model, and multidimensional recommendations on the multidimensional model 4] would need to be mapped into the relational model to support SQL queries, which leads to various translation problems. To avoid these issues, it is advantageous to develop a specialized recommendation language based on the characteristics of the application domain that also supports the multidimensional recommendation model, as we do in the To provide flexible and user-driven recommendations and to address the previously specified issues with using SQL as a recommendation language, we designed a new recommendation query language REQUEST, which allows its users to express in a flexible manner a broad range of recommendations that are tailored to their own individual needs and, therefore, more accurately reflect their interests. For example, the earlier recommendation for Tom can be expressed in REQUEST as RECOMMEND Movie, Time TO User, Companion USING MoVie Recommendet RESTRICT USer Name=“Tom” AND Time. Timeofweek= weekend” AND Companion.Type=ˇ Girlfriend ASed oN Personal Rating SHOW TOP 3 where Movie Recommender is a 5-dimensional cube of ratings having dimensions User Movie. Time Companion, and Theater; also, Personal Rating represents the ratings measure for the cube The above REqueSt query is based on the olaP paradigm [9], which is a natural choice for querying multidimensional recommender systems, since the data model of REQUEST matches the multidimensional data model of the ratings cube. Besides reQuest, we also present a multidimensional REQUEST is an acronym for REcommendation QUEry STatements. The initial query language, called RQL, was introduced in an earlier workshop paper [5], where only the preliminary ideas of how to define the query language were presented. In this paper, we systematically redesigned the language by formally introducing its syntax, semantics, and the corresponding recommendation algebra. This allowed us to significantly extend capabilities of the language over its preliminary version [5]. To reflect these major changes, we renamed the language from rQl to REQUEST
5 consuming operation. In summary, the above issues can be attributed to the task and model mismatch. SQL is a general-purpose query language, which makes it a less intuitive and a less useful tool for users in the “vertical” application domain of recommender systems, where SQL may not have some specialized capabilities important for recommender systems. Also, SQL is based on the relational data model, and multidimensional recommendations on the multidimensional model [4] would need to be mapped into the relational model to support SQL queries, which leads to various translation problems. To avoid these issues, it is advantageous to develop a specialized recommendation language based on the characteristics of the application domain that also supports the multidimensional recommendation model, as we do in the paper. To provide flexible and user-driven recommendations and to address the previously specified issues with using SQL as a recommendation language, we designed a new recommendation query language REQUEST, 1 which allows its users to express in a flexible manner a broad range of recommendations that are tailored to their own individual needs and, therefore, more accurately reflect their interests. For example, the earlier recommendation for Tom can be expressed in REQUEST as RECOMMEND Movie, Time TO User, Companion USING MovieRecommender RESTRICT User.Name = “Tom” AND Time.TimeOfWeek=“weekend” AND Companion.Type = “Girlfriend” BASED ON PersonalRating SHOW TOP 3 where MovieRecommender is a 5-dimensional cube of ratings having dimensions User, Movie, Time, Companion, and Theater; also, PersonalRating represents the ratings measure for the cube. The above REQUEST query is based on the OLAP paradigm [9], which is a natural choice for querying multidimensional recommender systems, since the data model of REQUEST matches the multidimensional data model of the ratings cube. Besides REQUEST, we also present a multidimensional 1 REQUEST is an acronym for REcommendation QUEry STatements. The initial version of our recommendation query language, called RQL, was introduced in an earlier workshop paper [5], where only the preliminary ideas of how to define the query language were presented. In this paper, we systematically redesigned the language by formally introducing its syntax, semantics, and the corresponding recommendation algebra. This allowed us to significantly extend capabilities of the language over its preliminary version [5]. To reflect these major changes, we renamed the language from RQL to REQUEST
ecommendation algebra that is used for defining certain" core"parts of REQUEST queries. We also describe how these core reQueST queries can be processed by mapping them into this algebra This paper makes the following contributions. It proposes language REQUEST for expressing flexible user-driven recommendations and presents its syntax and semantics. It also presents recommendation algebra RA, which enhances the systematic definition of REQUEST. We also show how the core rEQUeSt queries can be mapped into RA, thus providing a way to process these queries and compare the expressive power of REQUESt and ra 2. Background: Multidimensional Recommender Systems A multidimensional ratings cube is defined as a tuple(D, M, H, E, L)as follows Dimensions D). D=id,, d,2,. dni is a set of n dimensions, where each d, is a dimension name. For example, in addition to the standard User and Movie dimensions of the traditional movie recommender systems, such as Movielens [19], we consider other contextual dimensions [4, 5], such as Time, Theater and Companion, i.e., D=(User, Movie, Time, Theater, Companion) Attribute Hierarchies(H). Each dimension d; is represented by a set of attributes AF(ai, ., ait) where each ai is an attribute name; e.g. Atime=(Date, DayOfWeek, TimeOfWeek, Month, Quarter, Year). The domain of attribute x of dimension d is denoted as dom(dx), e.g., dom(Time. DayOfWeek)= Mon, Tue, Wed, Thu, Fri, Sat, Sun) and dom(Time. TimeOfWeek)= weekday, weekend j The multidimensional recommendation model allows for OLAP-based aggregation hierarchies [4, 5 that help aggregate ratings according to the methods described in [4]. In particular, attributes A, of dimension d; form a directed acyclic graph(i.e, a hierarchy )H;=(Ai, Ei) with set of nodes A; (i.e, each ode corresponds to an attribute)and set of edges Ei. There exists a directed edge in Hi from attribute E Ai to attribute y e Ai, iff every value of x uniquely determines the value of y, i.e., if attribute y is functionally dependent on attribute x. Such an edge will be denoted (x, y)or x]y. We will assume that has a single root node, Root(Hi, which we will call the key dimension attribute, consistent with the standard database terminology. Let H=( H1,..., Hn) Given hierarchy H, and attribute di- x EAi, we define SubGraph(H, d; x)to be a subgraph of H; rooted
6 recommendation algebra that is used for defining certain “core” parts of REQUEST queries. We also describe how these core REQUEST queries can be processed by mapping them into this algebra. This paper makes the following contributions. It proposes language REQUEST for expressing flexible user-driven recommendations and presents its syntax and semantics. It also presents recommendation algebra RA, which enhances the systematic definition of REQUEST. We also show how the core REQUEST queries can be mapped into RA, thus providing a way to process these queries, and compare the expressive power of REQUEST and RA. 2. Background: Multidimensional Recommender Systems A multidimensional ratings cube is defined as a tuple (D, M, H, E, L) as follows. Dimensions (D). D = {d1, d2, …, dn} is a set of n dimensions, where each di is a dimension name. For example, in addition to the standard User and Movie dimensions of the traditional movie recommender systems, such as MovieLens [19], we consider other contextual dimensions [4, 5], such as Time, Theater and Companion., i.e., D = {User, Movie, Time, Theater, Companion}. Attribute Hierarchies (H). Each dimension di is represented by a set of attributes Ai={ai1, …, ait} where each aij is an attribute name; e.g. Atime={Date, DayOfWeek, TimeOfWeek, Month, Quarter, Year}. The domain of attribute x of dimension d is denoted as dom(d.x), e.g., dom(Time.DayOfWeek) = { Mon, Tue, Wed, Thu, Fri, Sat, Sun } and dom(Time.TimeOfWeek) = { weekday, weekend }. The multidimensional recommendation model allows for OLAP-based aggregation hierarchies [4, 5] that help aggregate ratings according to the methods described in [4]. In particular, attributes Ai of dimension di form a directed acyclic graph (i.e., a hierarchy) Hi = (Ai, Ei) with set of nodes Ai (i.e., each node corresponds to an attribute) and set of edges Ei. There exists a directed edge in Hi from attribute x ∈ Ai to attribute y ∈ Ai, iff every value of x uniquely determines the value of y, i.e., if attribute y is functionally dependent on attribute x. Such an edge will be denoted (x, y) or xÆy. We will assume that Hi has a single root node, Root(Hi), which we will call the key dimension attribute, consistent with the standard database terminology. Let H = { H1, …, Hn }. Given hierarchy Hi and attribute di.x ∈Ai, we define SubGraph(Hi, di.x) to be a subgraph of Hi rooted
at dix, i.e., it defines the graph containing all the nodes and edges reachable from di-x Elements(E). Each dimension di in a cube is represented by a set of elements Er. For instance dimension Movie in our example is represented by all the movies available for the users to rate. For simplicity and without loss of generality, we use the domain of the key dimension attribute to represent the set of elements of d, i.e., E;: dom(root(Hi). An example of the elements' set for the User dimension would be a set of all user IDs available in the data. Let E=( El,..., Eni Measures(M). M=(mI, m2, ., mk) represents a set of measures, where each m, is a different type of a rating from domain dom(mi). The measures can either be numeric or Boolean. A numeric measure usually represents a discrete finite ordered value, e.g, a movie rating on the scale of (1,., N.A Boolean measure can be used to represent a"status flag denoting the state of a rating or its specific characteristic, e.g., indicating whether a given movie has been seen by a given user Example 1. Consider the application for recommending movies to users that has the following dimensions, each dimension defined by the attributes specified in parentheses Movie: the set of all the movies that can be recommended it is defined as Movie(MovieID Title, Length, Release Y ear, Director, Genre) User: the people to whom movies are recommended; it is defined as User(UserlD, Name, Address, Age, Gender, Profession) Theater: the movie theaters showing the movies; it is defined as Theater(TheaterlD, Name Address, Capacity, City, Time: the time when the movie can be or has been seen; it is defined as Time(Date, DayOfWeek, TimeofWeek, Month, Quarter, Year) ompanion: represents a person or a group of persons with whom one can see the movie. It defined as Companion( companion Type), where attribute companionType has values and“ others We also use three rating measures in this example: PublicRating, a numeric measure specifying how much the general public liked the movie; Personal Rating, a numeric measure specifying how much a particular person liked or is predicted to like the movie in the settings specified by the Time, Theater, and Companion dimensions; and Consumed, a Boolean measure specifying whether or not a given user has actually seen a given movie in a given context. The Personal Rating assigned to a movie by a person depends on where and how the movie has been
7 at di.x, i.e., it defines the graph containing all the nodes and edges reachable from di.x. Elements (E). Each dimension di in a cube is represented by a set of elements Ei. For instance, dimension Movie in our example is represented by all the movies available for the users to rate. For simplicity and without loss of generality, we use the domain of the key dimension attribute to represent the set of elements of di, i.e., Ei := dom(Root(Hi)). An example of the elements’ set for the User dimension would be a set of all user IDs available in the data. Let E = { E1, …, En }. Measures (M). M = {m1, m2, …, mk} represents a set of measures, where each mi is a different type of a rating from domain dom(mi). The measures can either be numeric or Boolean. A numeric measure usually represents a discrete finite ordered value, e.g., a movie rating on the scale of {1, …, N}. A Boolean measure can be used to represent a “status flag” denoting the state of a rating or its specific characteristic, e.g., indicating whether a given movie has been seen by a given user. Example 1. Consider the application for recommending movies to users that has the following dimensions, each dimension defined by the attributes specified in parentheses: • Movie: the set of all the movies that can be recommended; it is defined as Movie(MovieID, Title, Length, ReleaseYear, Director, Genre). • User: the people to whom movies are recommended; it is defined as User(UserID, Name, Address, Age, Gender, Profession). • Theater: the movie theaters showing the movies; it is defined as Theater(TheaterID, Name, Address, Capacity, City, State, Country). • Time: the time when the movie can be or has been seen; it is defined as Time(Date, DayOfWeek, TimeOfWeek, Month, Quarter, Year). • Companion: represents a person or a group of persons with whom one can see the movie. It is defined as Companion(companionType), where attribute companionType has values “alone”, “friends”, “girlfriend/boyfriend”, “family”, “co-workers”, and “others”. We also use three rating measures in this example: PublicRating, a numeric measure specifying how much the general public liked the movie; PersonalRating, a numeric measure specifying how much a particular person liked or is predicted to like the movie in the settings specified by the Time, Theater, and Companion dimensions; and Consumed, a Boolean measure specifying whether or not a given user has actually seen a given movie in a given context. The PersonalRating assigned to a movie by a person depends on where and how the movie has been
seen, with whom and at what time. Finally, we consider the following aggregation hierarchies Movie: MovielD Genre; User: UserID- Age, UserID Gender, UserID+Profession Theater: TheaterID> City State Country; Time: Date, DayOfWeek TimeOfWeek, Date> Month, Quarter, Year Cube Cells(L). Each cube is a partially defined rating function R from an n-dimensional space of E,X E. to a k- dimensional space of measures,ie.,R:E×…×En→>dom(m1)×….xdom(mk) Alternatively a cube can be perceived as a set of cells L, each cell /E L consisting of the tuple(address, content), i.e., 7 =( address, conten), where address=(a1,…,an),a∈E, and content=(B1,….,B),B∈dom(m) Since the mapping R is partial, content can also have value NULL for some cells. We also use the notation L[address]= content to refer to a specific cell, and l[address]. m; to refer to a specific measure within a cell. Furthermore, the ratings R(al,., an)of the recommendation space S=Eix E2x...x En are either explicitly provided by the users or are implicitly inferred by the system as described below. For example, R(Aviator, Jane, theaters, 2/19/2005, boyfriend)=(6,8, True)means that Jane gave rating 6 (i.e, Personal Rating=6)to"Aviator"that she actually saw (i.e, Consumed= True) with her boyfriend on February 19, 2005 in movie theater 5, but the general public gave the movie the rating of 8(i.e ublicRating =8) Given these preliminaries, the recommendation problem is defined as follows. First, the system need to estimate the unknown ratings and make the rating function R total [4]. Second, to make a recommendation, one needs to select certain non-overlapping""dimensions di,..., dik(k < n)and certain"for whom"dimensions dl,., diI(<n)and, accordingly, recommend for each tuple (a,l, c∈E1x, nEil tuple(a1,…,a;)∈Enx,, XEik maximizing the rating R(a1,…,an) across all the tuples (al,, an)coinciding with(al,., ainE Enix..xEjr on corresponding dimensions dil,.,djl Since the rating cube is only partially filled, it is important to estimate the unspecified ratings for ecommendation purposes. This multidimensional rating estimation problem is addressed in [4], where the reduction-based method of estimating unknown ratings in terms of the known ratings is presented. To
8 seen, with whom and at what time. Finally, we consider the following aggregation hierarchies: Movie: MovieID Æ Genre; User: UserID Æ Age, UserID Æ Gender, UserID Æ Profession; Theater: TheaterID Æ City Æ State Æ Country; Time: Date Æ DayOfWeek Æ TimeOfWeek, Date Æ MonthÆ QuarterÆ Year. Cube Cells (L). Each cube is a partially defined rating function R from an n-dimensional space of E1 × … × En to a k-dimensional space of measures, i.e., R: E1 × …× En → dom(m1) × … × dom(mk). Alternatively, a cube can be perceived as a set of cells L, each cell l ∈ L consisting of the tuple (address, content), i.e., l = (address, content), where address = (α1, …, αn), αi ∈ Ei, and content = (β1, …, βk), βi ∈ dom(mi). Since the mapping R is partial, content can also have value NULL for some cells. We also use the notation L[address] = content to refer to a specific cell, and L[address].mj to refer to a specific measure within a cell. Furthermore, the ratings R(α1, …, αn) of the recommendation space S = E1× E2× …× En are either explicitly provided by the users or are implicitly inferred by the system as described below. For example, R(Aviator, Jane, theater5, 2/19/2005, boyfriend) = (6, 8, True) means that Jane gave rating 6 (i.e., PersonalRating = 6) to “Aviator” that she actually saw (i.e., Consumed = True) with her boyfriend on February 19, 2005 in movie theater 5, but the general public gave the movie the rating of 8 (i.e., PublicRating = 8). Given these preliminaries, the recommendation problem is defined as follows. First, the system needs to estimate the unknown ratings and make the rating function R total [4]. Second, to make a recommendation, one needs to select certain non-overlapping “what” dimensions di1, …, dik (k < n) and certain “for whom” dimensions dj1, …, djl (l < n) and, accordingly, recommend for each tuple (αj1, …, αjl)∈ Ej1×…×Ejl tuple (αi1, …, αik)∈ Ei1×…×Eik maximizing the rating R(α1,…, αn) across all the tuples (α1,…, αn) coinciding with (αj1, …, αjl)∈ Ej1×…×Ejl on corresponding dimensions dj1,…,djl. Since the rating cube is only partially filled, it is important to estimate the unspecified ratings for recommendation purposes. This multidimensional rating estimation problem is addressed in [4], where the reduction-based method of estimating unknown ratings in terms of the known ratings is presented. To
understand how it works assume that we want to recommend a movie to jane Doe who wants to see it with her boyfriend on Saturday in a movie theater. If the Time dimension is partitioned into weekend and weekday components and since Saturday falls on a weekend, the reduction-based approach uses only the ratings for the movies seen on weekends by customers with their boyfriends/girlfriends in the movi theaters in order to provide recommendations for Jane Doe. It was shown that this approach outperforms the standard collaborative filtering in multidimensional settings under certain conditions [4]. Alternative multidimensional rating estimation methods include heuristic- and model-based approaches [2] In this paper we focus on the querying capabilities of the REQUEST language and, therefore, we assume that the multidimensional ratings cube is fully pre-computed before users start issuin recommendation queries. In other words, we assume that all the unknown ratings have been estimated using any of the aforementioned rating estimation techniques. How to perform rating estimation"on demand" based on the query that was issued on a partially filled ratings cube constitutes an interesting future research problem, as we mention in Section The work described in [4] focuses on presenting the multidimensional recommendation model and does not specify how to express a wide variety of recommendations that are possible in multidimensional settings. In the next section we address this limitation by presenting the query language REQUESt for expressing such recommendations 3. Recommendation Query Language REQUEST In this section, we describe the language by providing various examples of rEQUESt queries in Section 3.1, then present its syntax in Section 3.2 and semantics in Section 3. 3 3.1. Introducing REQUEST via Examples All the examples presented in this section are based on the 5-dimensional MovieRecommender schema from Example 1. The first example presents the most basic and traditional recommendation request Query 1: Recommend the best movies to users RECOMMEND Movie To User USING MOVIe Recommender based oN PersonalRating
9 understand how it works, assume that we want to recommend a movie to Jane Doe who wants to see it with her boyfriend on Saturday in a movie theater. If the Time dimension is partitioned into weekend and weekday components and since Saturday falls on a weekend, the reduction-based approach uses only the ratings for the movies seen on weekends by customers with their boyfriends/girlfriends in the movie theaters in order to provide recommendations for Jane Doe. It was shown that this approach outperforms the standard collaborative filtering in multidimensional settings under certain conditions [4]. Alternative multidimensional rating estimation methods include heuristic- and model-based approaches [2]. In this paper we focus on the querying capabilities of the REQUEST language and, therefore, we assume that the multidimensional ratings cube is fully pre-computed before users start issuing recommendation queries. In other words, we assume that all the unknown ratings have been estimated using any of the aforementioned rating estimation techniques. How to perform rating estimation “on demand” based on the query that was issued on a partially filled ratings cube constitutes an interesting future research problem, as we mention in Section 6. The work described in [4] focuses on presenting the multidimensional recommendation model and does not specify how to express a wide variety of recommendations that are possible in multidimensional settings. In the next section we address this limitation by presenting the query language REQUEST for expressing such recommendations. 3. Recommendation Query Language REQUEST In this section, we describe the language by providing various examples of REQUEST queries in Section 3.1, then present its syntax in Section 3.2 and semantics in Section 3.3. 3.1. Introducing REQUEST via Examples All the examples presented in this section are based on the 5-dimensional MovieRecommender schema from Example 1. The first example presents the most basic and traditional recommendation request. Query 1: Recommend the best movies to users: RECOMMEND Movie TO User USING MovieRecommender BASED ON PersonalRating