正在加载图片...
and the worst-case execution times (on the right Y-axis) for the workflow for different subsets of the course relation with 5K, 10K. 5K and 20K courses. We sliced the course relation so that each larger set is a superset of the smaller sets. If we compare Figure 10(a)and Figure 8(b), we see that worst times for the related-course workflow are higher than the collab- orative filtering worst execution times. Comparing Figures 10(b) and 8(c), we observe that there are more course-to-course con- (a)Number of Operators Size of results nections(based on common words in course title and description) than student-to-student (based on co-rated courses). The most connected (in the sample of 500 courses) has over 10000 connections to other courses while the most-connected student con- nects to 2753 students. On average, a course is connected to 5756 courses and the least connected course connects to just 7 courses. In the whole database. over half of the courses have over 5000 con- 50000 ections. The above observations lead to an important conclusion despite the density of course-to-course, the system performance is 薛 of Operators Relation Size still very reasonable even in the worst-cases (c)Filtering Operator Output (d) Reducing Input Size Friends-of-friends. Recommend operators can be combined in of interconnected operators. We study the ef- fect of a growing sequence of recommend operators to evaluate the Figure 11: Performance Results: friends-of-friends workflow stem performance for recommendation workflows that are more Figure 11(c) shows what happens when we select only the top 100 complex than content-based and collaborative filtering ones. similar students at the output of each recommend operator. Figure We start with a workflow that contains one recommend operato 11(d)shows that if we scale down the size of the input relations, we hat computes the set of similar students to an individual. We cal an scale system performance up to workflows of many operators these students"friends of the student then. we consider a work One known method to achieve that is to pre-cluster users so tha flow of two recommend operators: the second one takes as input user is compared only to users within the same cluster. the"friends"found by the first recommend operator and computes friends" of them. In the same way, we take workflows of 3, 4 and In summary, our experiments show that with Flex Recs it is easy 5 recommend operators in a row, the output of one being the input to create multiple workflows and execute them transparently over to the other, for computing"friends "of"friends". As the individ the same flexible recommendation system that combines extensi- ual who is input to the first operator, we use the most connected bility with reasonable performance student, i.e., a student that through his ratings is connected to the greatest number of students. Hence, we will study the worst-case 6. CONCLUSIONS AND FUTURE WORK execution times of friends-of-friends workflows In this paper, we have argued that we need to decouple the defi Figure 11(a) shows execution times using different functions(but nition of a recommendation process from its execution in order to the same within any single workflow for different workflow sizes. upport flexible recommendations over structured data. We have All comparisons are performed against the whole student relation presented FlexRecs, a framework for declaratively defining recom- (10K size). We explain these results with the help of Figure 11(b), mendations combining traditional relational operators and special which shows the size of the workflow output for different work- recommendation operator ing FlexRecs, designers can easily flow sizes. Essentially, this figure shows the output of each of the capture multiple recommendation paradigms and experiment with operators. Execution times for up to 3 operators reflect the impac novel recommendation strategies and users can dynamically per- of two factors: the increasing number of operators and the increas sonalize recommendation to fit their needs. We have provided sev- ing number of tuples processed. The first operator performs few eral example workflows that capture recommendation approaches comparisons(since comparing to an individual student) and finds found in literature as well as novel paradigms from our live course a small number of friends(100). The second recommend finds planning site. As an application of declarativeness principle, we friends for these 100 people, so the additional time that it needs have presented an extensible prototype system that realizes the pro- reasonable. But it finds many friends(over 2800), which are fed posed framework over a conventional RDBMS and we have dis- into the third recommend and that explains the gradual increase in cussed experimental results that show its potential and flexibility Figure I1(b) shows that the size of the output of the third and of timization is analogous to what a query engine d k fogle to study ecution times observed for three operators in a row. The execu- A nice feature of our framework is that it makes possible to study tion time of the third operator dominates the first two. However f multiple recommendation workf in a traditional ubsequent operators reaches an upper bound. This was expected database system, except that now we have to handle new opera since the size of the output could be at most equal to the size of the tors. We are currently working on scaling them over very large students relation, and in our case it is smaller, since not all students inputs. The framework opens the door to researchers to try differ are transitively"friends"with each other. Hence, Figure 11(b)re- ent implementations and optimizations for the introduced operators veals that when having more than 3 operators, we only experience and for recommendations. For example, the implementation of the in performance the additive effect of having more operators recommend tor depends on comparison functions and there Of course, we are using a worst-case scenario solely to stress tes are many candidates, some more effective and other more efficient. our system. In practice, we may have filters that reduce the size of The system could automatically balance complexity and effective- ss and identify the best option. similar students(Figure 11(c)), or its inputs, e.g. by selecting only semantics for the operators in order to make them more efficient, a subset of students to be compared(Figure 11(d). To illustrate, for example for building execution plans that re-order the operatorsand the worst-case execution times (on the right Y-axis) for the workflow for different subsets of the course relation with 5K, 10K, 15K and 20K courses. We sliced the course relation so that each larger set is a superset of the smaller sets. If we compare Figure 10(a) and Figure 8(b), we see that worst￾times for the related-course workflow are higher than the collab￾orative filtering worst execution times. Comparing Figures 10(b) and 8(c), we observe that there are more course-to-course con￾nections (based on common words in course title and description) than student-to-student (based on co-rated courses). The most￾connected course (in the sample of 500 courses) has over 10000 connections to other courses while the most-connected student con￾nects to 2753 students. On average, a course is connected to 5756 courses and the least connected course connects to just 7 courses. In the whole database, over half of the courses have over 5000 con￾nections. The above observations lead to an important conclusion: despite the density of course-to-course, the system performance is still very reasonable even in the worst-cases. Friends-of-friends. Recommend operators can be combined in sequences or trees of interconnected operators. We study the ef￾fect of a growing sequence of recommend operators to evaluate the system performance for recommendation workflows that are more complex than content-based and collaborative filtering ones. We start with a workflow that contains one recommend operator that computes the set of similar students to an individual. We call these students “friends” of the student. Then, we consider a work- flow of two recommend operators: the second one takes as input the “friends” found by the first recommend operator and computes “friends” of them. In the same way, we take workflows of 3, 4 and 5 recommend operators in a row, the output of one being the input to the other, for computing “friends” of “friends”. As the individ￾ual who is input to the first operator, we use the most connected student, i.e., a student that through his ratings is connected to the greatest number of students. Hence, we will study the worst-case execution times of friends-of-friends workflows. Figure 11(a) shows execution times using different functions (but the same within any single workflow) for different workflow sizes. All comparisons are performed against the whole student relation (10K size). We explain these results with the help of Figure 11(b), which shows the size of the workflow output for different work- flow sizes. Essentially, this figure shows the output of each of the operators. Execution times for up to 3 operators reflect the impact of two factors: the increasing number of operators and the increas￾ing number of tuples processed. The first operator performs few comparisons (since comparing to an individual student) and finds a small number of friends (100). The second recommend finds friends for these 100 people, so the additional time that it needs is reasonable. But it finds many friends (over 2800), which are fed into the third recommend and that explains the gradual increase in execution times observed for three operators in a row. The execu￾tion time of the third operator dominates the first two. However, Figure 11(b) shows that the size of the output of the third and of subsequent operators reaches an upper bound. This was expected since the size of the output could be at most equal to the size of the students relation, and in our case it is smaller, since not all students are transitively “friends” with each other. Hence, Figure 11(b) re￾veals that when having more than 3 operators, we only experience in performance the additive effect of having more operators. Of course, we are using a worst-case scenario solely to stress test our system. In practice, we may have filters that reduce the size of the output of a recommend operator, i.e., by selecting the top k most similar students (Figure 11(c)), or its inputs, e.g. by selecting only a subset of students to be compared (Figure 11(d)). To illustrate, 210000 280000 350000 e (ms) Pearson Euclidean 0 70000 140000 12345 Tim e # of Operators (a) Number of Operators (b) Size of results 100000 150000 200000 250000 e (ms) no filter top 100 0 50000 100000 12345 Tim e # of operators (c) Filtering Operator Output 70000 140000 210000 280000 350000 Time (ms) 2500 5000 7500 10000 0 70000 1 2 3 4 5 Relation Size T # of Operators (d) Reducing Input Size Figure 11: Performance Results: friends-of-friends workflow Figure 11(c) shows what happens when we select only the top 100 similar students at the output of each recommend operator. Figure 11(d) shows that if we scale down the size of the input relations, we can scale system performance up to workflows of many operators. One known method to achieve that is to pre-cluster users so that a user is compared only to users within the same cluster. In summary, our experiments show that with FlexRecs it is easy to create multiple workflows and execute them transparently over the same flexible recommendation system that combines extensi￾bility with reasonable performance. 6. CONCLUSIONS AND FUTURE WORK In this paper, we have argued that we need to decouple the defi- nition of a recommendation process from its execution in order to support flexible recommendations over structured data. We have presented FlexRecs, a framework for declaratively defining recom￾mendations combining traditional relational operators and special recommendation operators. Using FlexRecs, designers can easily capture multiple recommendation paradigms and experiment with novel recommendation strategies and users can dynamically per￾sonalize recommendation to fit their needs. We have provided sev￾eral example workflows that capture recommendation approaches found in literature as well as novel paradigms from our live course planning site. As an application of declarativeness principle, we have presented an extensible prototype system that realizes the pro￾posed framework over a conventional RDBMS and we have dis￾cussed experimental results that show its potential and flexibility. A nice feature of our framework is that it makes possible to study the optimization of multiple recommendation workflows. Such op￾timization is analogous to what a query engine does in a traditional database system, except that now we have to handle new opera￾tors. We are currently working on scaling them over very large inputs. The framework opens the door to researchers to try differ￾ent implementations and optimizations for the introduced operators and for recommendations. For example, the implementation of the recommend operator depends on comparison functions and there are many candidates, some more effective and other more efficient. The system could automatically balance complexity and effective￾ness and identify the best option. One could define more restricted semantics for the operators in order to make them more efficient, for example for building execution plans that re-order the operators
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有