data points to feasibly analyze visually).

One form of tag vocabulary growth occurs at a diminishing rate over time,⁵ which we can perhaps expect for a social bookmarking service, as it implies increasing stability in the tag vocabulary. However, for CiteULike, the tag vocabulary seems to be consistently growing. When we plotted the new tags’ cumulative frequency (their aggregate summation) across time, the relationship was linear, as the green line in Figure 4 shows.

We think this consistent growth is due to the proportional increase in the number of new users. In the CiteULike data, we identified users as new when they applied a tag for the first time. We categorized new users across time (per month), and their cumulative frequency was also linear (the red line in Figure 4), implying that the number of new users is likewise growing consistently over time.

To compare the cumulative frequencies of new tags and new users across time on the same scale, we calculated the cumulative frequency percentage. For new tags, we calculate the cumulative frequency of new tags per month as a percentage of the total number of tags; for new users, we calculate the cumulative frequency of new users per month as a percentage of the total number of users.

The cumulative frequency percentages of new tags and new users over time are almost perfectly correlated (0.997). Both grow at a linear rate and appear to depend on each other, which is consistent with our speculation that as new users apply tags, they create new ones.

Tag Reuse

For a social bookmarking service to be highly collaborative, we expect the tag vocabulary to converge and tag reuse to increase significantly over time.
We can measure tag reuse in many ways. For example, a simple metric is to calculate the number of tag reuse applications:

tag reuse applications = tag applications − distinct tags

The minimum value for tag applications is the number of distinct tags, which implies that the minimum value for the number of tag reuse applications is zero (that is, there is no tag reuse). Using this metric, CiteULike had 25,715 tag reuse applications in our analysis.

This number alone doesn’t tell us much about the amount of tag reuse, however. Thus, we use the more accurate and robust tag reuse metric Shilad Sen and colleagues developed for MovieLens,³ one that calculates the average number of distinct users per tag according to the following formula:

tag reuse = Σ (# of distinct users for each tag) / # of tags

Given that each tag will have at least one associated user, the minimum value for tag reuse is 1.0 users per tag. For CiteULike, tag reuse was 1.59 users per tag. This is fairly low for tag reuse based on baseline figures from the MovieLens analysis.³

We also calculated how many tag reuse occurrences existed for each tag (number of tag applications per tag minus one). The average number of tag reuse occurrences was 3.9; however, the median and the mode were both zero. This indicates that most tags weren’t reused, but a few tags were reused many times.

Figure 5a shows how many tags have been reused. The x-axis indicates tag reuse occurrences, whereas the y-axis indicates the number of tags. We’ve sorted the data in ascending order of tag reuse occurrences. For example, data point “A” indicates that 1,014 tags were reused once; data point “B” indicates that 514 tags were reused twice, and so on. The data resembles a power-law distribution: y = 2043.6x^(−1.6727), R² = 0.9469 (the data set included 3,058 tags for a range of 1 to 48 tag reuse occurrences).

NOVEMBER • DECEMBER 2007

Figure 4. Cumulative frequency of new tags and new users over time. New tags and new users seem to be consistently growing in a linear fashion. [Line chart. x-axis: Month (November 2004 to February 2007), months 1 to 27; y-axis: Cumulative frequency, 0 to 7,000; series: New tags, New users.]
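The measures described on this page can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout `(user, tag, month)`, the sample data, and the function names `cumulative_percent` and `tag_reuse_metrics` are assumptions made for illustration. It treats a user or tag as "new" in the month it first appears, and it applies the two tag reuse formulas given above.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical tag applications as (user, tag, month index); not CiteULike data.
APPLICATIONS = [
    ("u1", "networks", 1),
    ("u1", "tagging", 1),
    ("u2", "tagging", 2),
    ("u2", "folksonomy", 2),
    ("u3", "tagging", 3),
    ("u3", "networks", 3),
    ("u3", "networks", 3),  # the same user applies "networks" to a second article
]


def cumulative_percent(applications, index):
    """Cumulative % of first-seen users (index=0) or tags (index=1) per month."""
    first_seen = {}
    for record in sorted(applications, key=lambda r: r[2]):
        # Remember only the earliest month in which each user/tag appears.
        first_seen.setdefault(record[index], record[2])
    new_per_month = Counter(first_seen.values())
    total = len(first_seen)
    running, result = 0, {}
    for month in sorted({r[2] for r in applications}):
        running += new_per_month[month]
        result[month] = 100.0 * running / total
    return result


def tag_reuse_metrics(applications):
    """Compute the tag reuse measures defined in the text."""
    users_per_tag = defaultdict(set)
    applications_per_tag = Counter()
    for user, tag, _month in applications:
        users_per_tag[tag].add(user)
        applications_per_tag[tag] += 1
    n_tags = len(applications_per_tag)
    # Reuse occurrences per tag: applications of that tag minus one.
    occurrences = [count - 1 for count in applications_per_tag.values()]
    return {
        # tag reuse applications = tag applications - distinct tags
        "reuse_applications": len(applications) - n_tags,
        # tag reuse = sum of distinct users per tag / number of tags
        "users_per_tag": sum(len(u) for u in users_per_tag.values()) / n_tags,
        "mean_occurrences": mean(occurrences),
        "median_occurrences": median(occurrences),
    }


if __name__ == "__main__":
    print(cumulative_percent(APPLICATIONS, 0))  # new users
    print(cumulative_percent(APPLICATIONS, 1))  # new tags
    print(tag_reuse_metrics(APPLICATIONS))
```

In this sketch the mean number of reuse occurrences multiplied by the number of distinct tags equals the number of tag reuse applications, which is one way to cross-check the two measures against each other on real data.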