Table 1. Overview of the acquired dataset.

                                        delicious    citeulike
  #usernames:                             128 448       44 215
  #records:                             2 957 144    5 228 356
  #processed users:                         2 234       44 215
  #unique tags:                           220 647      294 806
  #unique items (pages, publications):    962 367    1 437 245

fits more or less the power law, with 421 840 pages tagged by one tag only and one page having 171 distinct tags². A similar graph for the publications of the CiteULike dataset is shown in Fig. 2. Again, we see the power-law distribution, with 625 658 publications tagged by one tag only and one publication tagged by 1 708 distinct tags³. Power-law distributions are known to arise in social systems where many people express their preferences among many options. Therefore, by observing the power law in both datasets, we were assured that the datasets are valid and contain enough users (which could be an issue especially for our delicious dataset) to perform a crowd-based analysis.

However, we found an interesting difference in the popularity of tags. In our delicious dataset, we can find many tags which are shared among 5% or more of all users, whereas in the CiteULike dataset, a tag marked as “most active” on the CiteULike website reaches a popularity of only 1 to 2 percent. We continue harvesting delicious feeds in order to determine whether the overall popularity of the tags marked as popular in our current dataset will decrease.

3.2 Results

We executed Algorithm 1 (see Section 2) several times on both folksonomies with different setups and observed the resulting trees. The setups of the variables for the different runs of the algorithm are summarized in Table 2.
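As an aside on the dataset checks reported before Section 3.2, both statistics (the number of distinct tags per item, and the share of users who use a given tag) can be computed from raw tagging records. The following is only a minimal sketch: it assumes that each record is a (user, item, tag) triple, and the function names and the toy data are hypothetical illustrations, not the code or data used for the harvested datasets.

```python
from collections import Counter, defaultdict


def tags_per_item_histogram(records):
    """Return a Counter mapping k -> number of items carrying exactly k distinct tags."""
    tags_by_item = defaultdict(set)
    for _user, item, tag in records:
        tags_by_item[item].add(tag)
    return Counter(len(tags) for tags in tags_by_item.values())


def tag_popularity(records):
    """Return a dict mapping tag -> fraction of all users who used it at least once."""
    users_by_tag = defaultdict(set)
    all_users = set()
    for user, _item, tag in records:
        users_by_tag[tag].add(user)
        all_users.add(user)
    return {tag: len(users) / len(all_users) for tag, users in users_by_tag.items()}


# Hypothetical toy records: (user, item, tag)
records = [
    ("u1", "p1", "python"),
    ("u2", "p1", "python"),
    ("u2", "p1", "code"),
    ("u3", "p2", "python"),
    ("u3", "p3", "music"),
    ("u4", "p4", "music"),
]

hist = tags_per_item_histogram(records)  # three items with 1 tag, one item with 2 tags
pop = tag_popularity(records)            # e.g. "python" is used by 3 of the 4 users
```

Plotting the histogram on log-log axes is a common visual check: a power-law distribution appears there as a roughly straight line. The popularity values correspond to the kind of per-tag user share compared between the two datasets above.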
The floating average overlap threshold in the table means that the actual threshold is computed “on-the-fly” as a fraction of the current parent-child overlap.

The manual inspection of the resulting hierarchies confirmed the viability of our approach: meaningful relations between keywords were created, which would provide a good basis for tag-based user modeling. The quality of the result depends highly on the configuration of the algorithm. The rest of this section is devoted to an analysis of the impact of the algorithm’s variables on the created hierarchy. We provide a summary of the basic attributes of the produced hierarchies in Table 3, with examples in Fig. 3 and 4 (complete results can be seen

² that page is http://www.scribd.com/
³ the most tagged publication according to the CiteULike linkout database is surprisingly “about:blank