Algorithm    ε     P@1   P@3   P@5   P@10
Ours         0.02  0.69  0.63  0.59  0.55
Ours         0.03  0.66  0.59  0.56  0.51
Ours         0.04  0.62  0.55  0.51  0.46
Term PLSA    N/A   0.56  0.50  0.47  0.42

Table 1: Cross Validation Results: Results of term PLSA and our algorithm at different ε threshold levels.

The results are shown in Table 1. The precision is highest when ε = 0.02, with precision at 1 being 69% and precision at 5 being 59%. For term PLSA, precision at 1 is 56% and precision at 5 is 47%. We also compared the precision at 5 of our algorithm with that of term PLSA in each fold. In all folds, our algorithm, using ε ranging from 0.02 to 0.04, consistently achieved higher precision than term PLSA. Our results showed that the proposed algorithm outperforms term PLSA. This is because our algorithm, in addition to calculating the term vector cosine similarity used in term PLSA, also takes into account the similarity between one URL's tag vector and the other URL's term vector. In addition, it is worth noting that the lower the ε similarity threshold, the better the precision we are able to obtain.
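To make the scoring step concrete, the following Python sketch shows one plausible reading of the similarity computation: the term-term cosine used by term PLSA, combined with the tag-term cosine our algorithm adds, with ε acting as a cutoff on weak neighbors. The function names, the equal weighting of the two cosines, and the exact role of the threshold are illustrative assumptions; the paper does not specify them in this section.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 when either vector is all zeros.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom > 0 else 0.0

def combined_similarity(term_u, tag_u, term_v, epsilon=0.02):
    # Assumption: tag and term vectors are built over one shared vocabulary,
    # so the tag-term cosine is well defined.
    # Assumption: the two cosines are averaged 50/50; the text only states
    # that both similarities are taken into account.
    score = 0.5 * cosine(term_u, term_v) + 0.5 * cosine(tag_u, term_v)
    return score if score >= epsilon else 0.0

Under this reading, lowering ε admits more, weaker neighbor pages, which would be consistent with the observation above that smaller thresholds yielded better precision.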
Associated tags were added to the dataset due to the observation that the user-specified tag set is often incomplete. However, this step could potentially be "corrupting" our dataset. Our experiments showed that 95.52% of the correctly guessed tags are original, and only 4.48% match the associated tags. Therefore, we can deduce that the effect of "corruption", if any, would be minimal.

For the URLs that had unsatisfactory tagging results, we found that they can generally be attributed to three causes. The first cause arises when the URL is a frequently updated webpage, e.g. daily news, so the tags for that page would only be relevant to the contents of the page at the time the page content was crawled. Another situation is when the actual page contents of the webpage are small compared to the advertisements on the webpage. Advertisements are irrelevant to the contents of the page, yet they can severely distort the term vector on short webpages. Parsing HTML tags to remove sidebar information may be a solution to this problem (see the sketch below), but the myriad of webpage and sidebar formats adds to the difficulty of implementing this solution. Finally, webpages about a relatively rare topic (for example, spinning pens) have few similar webpages, resulting in decreased relevance of recommended tags.
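As an illustration of the sidebar problem, a minimal boilerplate-stripping pass over the crawled HTML might look like the sketch below. The choice of BeautifulSoup and the list of elements to drop are our assumptions for illustration, not part of the paper's pipeline, and, as noted above, no fixed list handles every page layout.

from bs4 import BeautifulSoup

# Heuristic list of elements that commonly hold navigation, ads,
# and sidebar content; real pages vary widely.
BOILERPLATE_TAGS = ["script", "style", "nav", "aside", "header", "footer", "form"]

def extract_main_text(html):
    # Parse the page, drop likely-boilerplate elements, and return the
    # remaining visible text for term-vector construction.
    soup = BeautifulSoup(html, "html.parser")
    for element in soup(BOILERPLATE_TAGS):
        element.decompose()
    return soup.get_text(separator=" ", strip=True)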
4.3 Results of User Study

In the second set of experiments, we gathered data on the precision of system-recommended tags through a user study. The user study was conducted as follows. A participant was first shown a webpage and asked to view the contents of the page. Next, the participant was asked to key in some tags that he/she would use to annotate the page. Once the participant confirmed that he/she had completed the above step, the system showed the participant the five best recommended tags provided by our algorithm and asked the participant to mark the tags they deemed irrelevant to the website. We allowed the participant to apply tags to the webpage first before viewing the recommended tags because recommended tags usually affect the user's original intent on which tags to use [Suchanek et al., 2008].

A total of 19 people, all with computer science backgrounds, participated in the experiment outlined above on 31 randomly picked URLs from the testing set of the first fold of cross validation, and we collected at least five responses for every URL. The system-recommended tags for the 31 URLs were obtained from the first fold of cross validation for the ε = 0.02 experiment. Results from our user study show that 32% of the tags the participants provided match the top five tags the system recommends, and 69.45% of the tags the system recommended were also marked as relevant by the participants.

After analyzing the results of the user study, we discovered that some webpages have more tags marked as irrelevant than other webpages, which may be because the tags recommended by the system were of poor quality. We would also like to know whether the webpages that did poorly in the user study also achieved a low accuracy in the cross validation experiment. Therefore, using the results from the cross validation experiment, we calculated the precision at 5 for each of the 31 URLs; the average precision is 70.97%, which is remarkably similar to the result obtained from the user study (69.45%). Since the obtained average precisions were very similar, we would like to know whether there is any correlation between the precision of the URLs obtained from the two methods of evaluation.
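For reference, precision at k here is the fraction of the top-k recommended tags that appear in a URL's ground-truth tag set. A minimal sketch, with names of our choosing:

def precision_at_k(recommended, ground_truth, k=5):
    # recommended: ranked list of recommended tags.
    # ground_truth: set of tags users actually applied (original plus associated).
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for tag in top_k if tag in ground_truth)
    return hits / len(top_k)

# Example: 3 of the top 5 recommended tags match, so precision at 5 is 0.6.
print(precision_at_k(["python", "web", "ads", "news", "ai"],
                     {"python", "web", "ai", "code"}))

Averaging this value over the 31 URLs gives the 70.97% figure quoted above.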
Figure 2: Cross Validation and User Study Precision: Precisions from the two evaluation methods are not strongly correlated. [Scatter plot omitted; x-axis: cross validation precision at 5 (%).]

Figure 2 is the scatter plot showing the precision obtained from cross validation versus the precision obtained from the user study. The correlation is found to be 0.47. A low correlation means that the same URL achieves different accuracies under different methods of evaluation.
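The 0.47 figure is consistent with a standard Pearson correlation over the 31 per-URL precision pairs, though the text does not name the coefficient. A minimal computation, with hypothetical placeholder values standing in for the real per-URL precisions:

import numpy as np

# Hypothetical placeholder values; the paper's 31 actual per-URL
# precision pairs are not reproduced here.
cv_precision = np.array([0.8, 0.6, 0.4, 1.0, 0.2])    # cross validation, P@5
user_precision = np.array([0.6, 0.8, 0.6, 0.8, 0.4])  # user study

# Pearson correlation coefficient between the two evaluations.
r = np.corrcoef(cv_precision, user_precision)[0, 1]
print(round(r, 2))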
For those cases where a URL achieves better precision in the user study than in cross validation, we can conclude that the ground truth set is incomplete, because there are still some relevant tags missing from the ground truth set, causing the precision to drop. For the other case, where a URL achieves better precision in cross validation than in the user study,