[Figure 8: Performance results: collaborative filtering. Panels: (a) Collab. Filtering phases (Gen time and Run time; time in ms vs. student relation size); (b) Collab. Filtering run times (avg and worst-case time in ms vs. student relation size); (c) Student connections (time in ms and student-to-student connections for Best, Avg, and Worst student types); (d) Comparison functions (Pearson and Euclidean; time in ms vs. student relation size).]

5. EXPERIMENTS

In this section, we examine the feasibility and performance of flexible recommendations. For this purpose, we study different workflows with different characteristics, i.e., different comparison functions, different number of operators, different input sizes, and different outputs. Our FlexRecs recommendation system is written in Java on top of MySQL. In our evaluation, we used real data from our production system. Times are in msecs.

Collaborative Filtering. This workflow (Example 3, Sec. 3.6) generates course recommendations for a student based on the ratings of similar students. It is interesting because it is a very common approach in practice and it uses two back-to-back recommend operators. Figure 8(a) shows execution times for different-size sets from the student relation. The sets have been generated so that: the 2.5K set is contained in the 5K set, which is contained in the 7.5K set, and so forth. The workflow is run on every user in the respective set and we take average times. Similarity between students is computed using Pearson. Gen Time is the time for building a recommendation plan, i.e., building the set of SQL queries. Run Time is the time required to execute a plan and collect the recommendations.
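As a rough illustration of the Pearson comparison mentioned above, the sketch below computes the Pearson correlation between two students over the courses both have rated. It is not the FlexRecs implementation, which builds and executes SQL queries over MySQL. The function name `pearson_similarity`, the dictionary-based rating layout, and the choice to return 0.0 when the correlation is undefined are all assumptions made for this example.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two students over their co-rated courses.

    ratings_a, ratings_b: dicts mapping course id -> numeric rating
    (a hypothetical in-memory layout, not the system's schema).
    Returns 0.0 when the correlation is undefined (assumption of this sketch).
    """
    # Only courses rated by both students contribute to the score.
    shared = sorted(set(ratings_a) & set(ratings_b))
    n = len(shared)
    if n < 2:
        return 0.0

    xs = [ratings_a[c] for c in shared]
    ys = [ratings_b[c] for c in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    # A student who gave identical ratings to all shared courses has zero variance.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)
```

The sums over co-rated courses are the kind of aggregation that, in a SQL-based plan, would be expressed with aggregate functions over a join of the two students' ratings.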
Query execution times dominate because the queries depend heavily on aggregations that are required for computing similarity scores. In what follows, our evaluation focuses on run times only.

Figure 8(b) shows the average execution times (on the left Y-axis) and the worst-case execution times (on the right Y-axis) for different sizes of the student relation. Increasing the number of students increases the number of comparisons (more students will be compared to the individual for whom recommendations are intended) and ultimately increases the number of similar students found by the first recommend operator, who in turn may contribute more candidate courses as input to the second operator. On average, the workflow scales very well as the number of students in the relation increases. We observed similar trends when varying the size of the course relation. For the sake of space, we do not report them here.

Depending on the courses one has rated, one may become connected with other students in multiple ways. The number of connections affects performance and explains the worst-case times observed in Figure 8(b). Figure 8(c) shows the effect of the number of connections on performance. We consider three cases in the whole student relation (10K). The best-case user has no connections to other students. The most-connected student connects to 2753 students through co-rated courses. On average, a student connects to 992 students in the database.
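To make the notion of a connection concrete, the following sketch counts, for each student, how many other students share at least one rated course with them. The input layout (a dict from student id to a set of rated course ids) and the function name `connection_counts` are hypothetical; the text does not say how these counts were actually obtained.

```python
from collections import defaultdict


def connection_counts(rated):
    """Count, per student, the other students reachable via a co-rated course.

    rated: dict mapping student id -> set of rated course ids
    (a hypothetical layout chosen for this sketch).
    """
    # Invert the mapping: course id -> set of students who rated it.
    raters = defaultdict(set)
    for student, courses in rated.items():
        for course in courses:
            raters[course].add(student)

    counts = {}
    for student, courses in rated.items():
        connected = set()
        for course in courses:
            connected |= raters[course]
        connected.discard(student)  # a student is not connected to themselves
        counts[student] = len(connected)
    return counts


example = {
    "s1": {"cs101", "cs102"},
    "s2": {"cs101"},
    "s3": {"cs102", "cs103"},
    "s4": set(),  # has rated nothing, so has no connections
}
counts = connection_counts(example)
```

Under these assumptions, the best, average, and worst cases correspond to the minimum, mean, and maximum of `counts.values()`.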
Figure 8(c), in combination with Figure 8(b), shows that it is not the input sizes that primarily affect scalability but the density of ratings. However, as in most cases, our site has relatively sparse data. Only around 1/5 of the students have more than 900 connections.

[Figure 9: Performance Results: majors workflow (avg and worst-case time in ms vs. student relation size).]

[Figure 10: Performance Results: related courses workflow. Panels: (a) Related courses run times (avg and worst-case time in ms vs. course relation size); (b) Course connections (time in ms and course-to-course connections for Best, Avg, and Worst course types).]

Figure 8(d) shows how (average) times are shaped by the comparison function used. We compare the workflow execution times using the inverse Euclidean and Pearson for computing the similarity between students. We pick these two functions from our library because they are quite representative in terms of computational complexity and practical importance. In this way, we can get a sense of how “flexible” the system is when objects are compared in different ways. As in Figure 8(a), we ran the workflow for different subsets of the student relation and on every user in the respective subset and we take average times. The figure shows that workflows using different functions can be executed in reasonable times for different sizes of the student relation. We have also observed with the other workflows that using either of the two functions does not have a significant impact on performance. Overall, the results of the collaborative filtering workflows are good news regarding the flexibility of the system w.r.t. different types of object comparisons and its scalability w.r.t. the data processed.

Major recommendation. This workflow (Example 7, Sec.
3.6) recommends majors to a student based on the majors of similar students. In the evaluation, similarity between students is estimated using the Pearson correlation. We ran the workflow for different subsets of the student relation with 2.5K, 5K and 10K students and computed majors for all students in each subset. Figure 9 shows the average (left Y-axis) and the worst-case (right Y-axis) execution times of the workflow for the different sizes of the student relation. We observe that for the average student in our dataset, the workflow is very efficient (it takes less than 25 msec in all cases). The worst-case times are still very good. The worst cases are students that are well connected. It is interesting to note that the major recommendation workflow, although very close to the collaborative filtering approach for courses, is more efficient: there are few majors but possibly many courses to recommend.

Related courses. This workflow (Example 1, Sec. 3.6) finds related courses for a given course. In the experiment, we used the Jaccard similarity function on the course title and description and a sample of 500 randomly selected courses. For each one of them, the workflow returned all related courses from the course relation. Figure 10(a) shows the average execution times (on the left Y-axis)