4. DATA ANALYSIS

The goal of this study is to explore the information sharing patterns between group members and between a group and its members. In particular, we are interested in the similarity of shared information on four levels: information items, metadata, macro-level tags, and micro-level tags.

First, item-level similarity measures the number of common items (i.e.
articles) between two group members' collections, or between a group's collection and that of one of its members. Item similarity is the most fundamental unit of measurement.

Second, we take metadata into account as a way to measure similarity beyond the item level. Because of the irregular, opportunistic nature of the bookmarking process, users with similar interests do not necessarily end up with very similar collections. Therefore, we compare the users' interest similarities using metadata. Since the information items in Citeulike are bibliographic references, we use authorship as the metadata. For instance, two members may have two different papers written by the same author. This indicates that they have similar interests even though they do not share exactly the same item. Since Citeulike users are able to navigate articles by clicking an author's name, we considered that authorship metadata may be an important way for users to find interesting papers.

Tag similarity was assessed by counting the number of shared tags on two levels: the micro level and the macro level. On the micro level, a tag was counted as shared if it was used by both users to tag the same common information item. The rationale behind this approach is that if two users apply the same tag to the same item, they understand that item in a similar way, because tags are cognitive expressions showing how users comprehend an item from different viewpoints (Hung, Huang et al. 2008). Lastly, when two users do not share many identical information items but share many identical tags, they could still be closely related. Therefore, we also explored macro-level tag similarity, which counts common tags used by both users regardless of the tagged item.

4.1 Dependent Variables

Since the sizes of item and tag collections varied dramatically from member to member and from group to group, we examined not only absolute numbers (i.e.
raw number of common items, metadata, or tags) but also relative (normalized) measurements. Specifically, we used two different sets of dependent variables: one for the comparison between group members and one for the comparison between a group and its members.

For the calculation of information similarity between two group members, we used the Jaccard similarity coefficient, i.e. the portion of shared items in the union of both members' collections (eq. 1), as an undirected relative measure (Guy, Zwerdling et al. 2009).

Figure 4. Information Overlap between Member A and B

$\text{Jaccard Similarity Coefficient} = \frac{|A \cap B|}{|A \cup B|}$    (eq. 1)

For the information similarity between a group and each group member, we measured not only the Jaccard similarity coefficient but also the group and member fractions. The latter two variables measure the direction of influence. For example, suppose user A is one of the members of group #1, and group #1 and user A have 450 items and 100 items, respectively. If there are 90 items in common, 90% of member A's collection overlaps with group #1's collection (90/100), but only 20% of group #1's collection is covered by member A's collection (90/450). The similarity therefore differs depending on how the information overlap is counted. The member fraction (eq. 2) is the portion of shared information measured relative to the group member's collection. The group fraction (eq. 3), on the other hand, is the portion of shared information measured relative to the group's collection. In the above example, the member fraction of member A for group #1 is 90%, which is the portion of shared information in user A's collection; it is the same value as the group fraction of group #1 for member A. The group fraction of member A for group #1 is 20%, which is the same value as the member fraction of group #1 for member A. These relative similarity measures were computed at all levels, from the item and metadata levels to micro- and macro-level tags.

Figure 5. Information Overlap between Member and Group

$\text{Member Fraction} = \frac{|\text{Member} \cap \text{Group}|}{|\text{Member}|}$    (eq. 2)

$\text{Group Fraction} = \frac{|\text{Member} \cap \text{Group}|}{|\text{Group}|}$    (eq. 3)

5. THE RESULTS

5.1 Information Similarity between two Group Members

In this section, we tested whether and how much two users who participated in the same group share common information. Since group activity is based on similar interests, we assumed that their personal collections may be similar enough to be a useful information source for each other.

First, we compared the absolute number of common items. The upper two rows of Table 2 show the mean numbers of common information items between pairs of members of the same group and between random pairs. The Mann-Whitney non-parametric test was used to assess the significance of the mean differences. Two users who are in the same group (M = 0.75) shared a significantly larger number of common items than random pairs (M = 0.02). The same results were observed in the comparison of relative similarity measures.

In the comparison of metadata (common authors), the absolute number of common metadata in group member pairs (M = 7.78) was almost 3 times larger than that of the random pairs (M = 2.77), and we found the same results for the relative similarity measures. These results are statistically significant (as described in the lower part of Table 2).
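
As an illustration of the measures defined in Section 4.1 (eqs. 1 to 3) and of the micro- and macro-level tag counts described in Section 4, the following minimal Python sketch reproduces the worked example of member A and group #1. The function names, the integer item identifiers, and the two toy tag collections are assumptions made for illustration only and are not taken from the study's data or implementation. The sketch also assumes that micro-level similarity counts distinct tags rather than (item, tag) pairs, which the text does not specify.

```python
def jaccard(a: set, b: set) -> float:
    """Eq. 1: |A n B| / |A u B|, an undirected measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def member_fraction(member: set, group: set) -> float:
    """Eq. 2: shared items relative to the member's collection."""
    return len(member & group) / len(member) if member else 0.0


def group_fraction(member: set, group: set) -> float:
    """Eq. 3: shared items relative to the group's collection."""
    return len(member & group) / len(group) if group else 0.0


def micro_shared_tags(tags_a: dict, tags_b: dict) -> set:
    """Tags both users applied to the same common item (assumed: distinct tags)."""
    shared = set()
    for item in tags_a.keys() & tags_b.keys():
        shared |= tags_a[item] & tags_b[item]
    return shared


def macro_shared_tags(tags_a: dict, tags_b: dict) -> set:
    """Tags used by both users, regardless of the tagged item."""
    all_a = set().union(*tags_a.values())
    all_b = set().union(*tags_b.values())
    return all_a & all_b


# Hypothetical item ids matching the sizes in the example:
# member A has 100 items, group #1 has 450 items, 90 are shared.
member_a = set(range(0, 100))    # ids 0..99
group_1 = set(range(10, 460))    # ids 10..459

common = len(member_a & group_1)
print("common items:", common)
print("member fraction:", member_fraction(member_a, group_1))
print("group fraction:", group_fraction(member_a, group_1))
print("Jaccard:", jaccard(member_a, group_1))

# Hypothetical tag collections: item id -> set of tags.
user_x = {"paper1": {"tagging", "social"}, "paper2": {"groups"}}
user_y = {"paper1": {"tagging"}, "paper3": {"groups", "social"}}
print("micro-level shared tags:", micro_shared_tags(user_x, user_y))
print("macro-level shared tags:", macro_shared_tags(user_x, user_y))
```

With these toy sets the member fraction is 90/100 and the group fraction is 90/450, matching the 90% and 20% in the example, while the Jaccard coefficient is 90/460 because the union holds 460 distinct items. The tag example shows why the two tag levels can diverge: the users share only one tag on their single common item but three tags overall.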