474 J. Mendling et al. / Information Systems 35 (2010) 467–482
verb–object style (inform complainant) received only two counts overall. The estimated probability of a label being mentioned among the three most ambiguous ones was 0.13 for verb–object labels, 0.24 for action-noun labels, and 0.45 for the rest group. The 95% confidence intervals show little overlap: 0.08–0.19 for verb–object labels, 0.17–0.31 for action-noun labels, and 0.32–0.58 for the rest, which corresponds to our expectations.

To assess the reliability of the assessments made by the study participants, we calculated the Cohen's Kappa statistic [50] to examine the level of agreement between study participants on which labels were most ambiguous. The Kappa statistic measures inter-rater reliability whilst controlling for chance agreement, and is generally agreed to be the most adequate tool to measure inter-rater reliability [51]. We obtained a Kappa value of 0.607, which can be classified as substantial or good [51].

As per our hypothesis H1, we were interested in testing whether the differences between the label types as noted are significant. An analysis of variance (ANOVA) test was not applicable, since the variance of the variable values is not homogeneously distributed and because the dependent variable is not on scale level. Instead, we applied Friedman's two-way analysis of variance by ranks [52].
For each participant, we determined an individual ranking of the three label types. This was achieved as follows. For each label type, we determined its relative proportion among the labels that were rated as most ambiguous by that participant. This gives us 29 matched evaluations, leading to rank totals for the three label types as shown in Table 2. As can be seen, verb–object labels receive the lowest rank total, which means that this type is least often considered as containing ambiguous labels. We advance the null hypothesis that there are no differences in individual rankings of the three label types, i.e., that each label type would be mentioned similarly in the top three lists in each of the 29 evaluations. In seeking to refute this null hypothesis, we computed the Friedman statistic $\chi_r^2$. Note that the Friedman statistic $\chi_r^2$ is distributed approximately as chi square [52, p. 168]. For this case, it turns out that $\chi_r^2 = 6.28$ with $df = 2$, which means a significant difference in the rankings of the three labeling styles at a 95% confidence level. This result lends support to hypothesis H1. We conclude that verb–object style labels are indeed least frequently perceived as being ambiguous, followed by action-noun style labels, and finally rest labels.

Perceived usefulness: In the third part of the questionnaire, we recorded the perceived usefulness of six activity labels, two for each label type. We used two measures for PU as described above. More specifically, the scales used measure the extent to which a label is useful for understanding and improves the performance when understanding. We received 174 responses (6 × 29) that we were able to link to label types. Based on these data, we examined the hypotheses H2a–H2c.

Before proceeding with hypothesis testing, we first examined reliability and validity of the PU measures used. Reliability refers to the internal consistency of scales.
The most widely used test for internal consistency is Cronbach's $\alpha$, which should be higher than 0.8 [53]. A second test uses the composite reliability measure $\rho_c$, which represents the proportion of measure variance attributable to the underlying trait. Scales with $\rho_c$ values greater than 0.5 are considered to be reliable [44]. For the PU measures, we obtained a Cronbach's $\alpha$ value of 0.857 and a $\rho_c$ value of 0.884, suggesting adequate reliability of the measures.

To establish validity of the measures, we examined convergent and discriminant validity of the PU measures. Convergent validity can be tested using three criteria suggested by Fornell and Larcker [54]:

(1) All indicator factor loadings should be significant and exceed 0.6.
(2) Construct composite reliabilities $\rho_c$ should exceed 0.8.
(3) Average variance extracted (AVE) by each construct should exceed the variance due to measurement error for that construct (i.e., AVE should exceed 0.50).

Factor loadings for the two PU measures were 0.936 and 0.936 and significant at $p = 0.000$. Composite reliability of the PU construct was estimated to be 0.884, and average variance extracted was computed to be 0.936. These results suggest adequate convergent validity.

To check for discriminant validity, we considered whether measures used for the PU construct would cross-load on other constructs considered (in our case, measures for notation familiarity). The test for discriminant validity is met when the AVE for each construct exceeds the squared correlation between that and any other construct considered in the factor correlation matrix. The squared correlation between the PU and the familiarity factor was computed to be 0.030, which shows that the AVE values for both PU (0.936) and notation familiarity (0.927) well exceeded the squared correlation between the factors. Appendix B summarizes factor loadings, communalities, and correlations.
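The validity criteria above can be restated as a few threshold checks. The following is a minimal sketch in Python, not the authors' procedure: it only compares the values reported in the text against the stated cut-offs, does not re-estimate anything from raw data, and does not test the significance of the loadings. The function names `convergent_validity` and `discriminant_validity` are hypothetical, chosen for illustration.

```python
def convergent_validity(loadings, composite_reliability, ave):
    """Check the three threshold criteria listed in the text.

    Significance of the loadings is NOT tested here; only the
    numeric cut-offs (0.6, 0.8, 0.50) are compared.
    """
    return {
        "loadings_exceed_0.6": all(l > 0.6 for l in loadings),
        "composite_reliability_exceeds_0.8": composite_reliability > 0.8,
        "ave_exceeds_0.50": ave > 0.50,
    }


def discriminant_validity(aves, squared_correlation):
    """True when every construct's AVE exceeds the squared correlation."""
    return all(a > squared_correlation for a in aves)


# Values as reported in the text for the PU construct.
pu_checks = convergent_validity(
    loadings=[0.936, 0.936],
    composite_reliability=0.884,
    ave=0.936,
)

# AVE for PU and for notation familiarity against their squared correlation.
discriminant_ok = discriminant_validity(
    aves=[0.936, 0.927],
    squared_correlation=0.030,
)

print(pu_checks)
print("discriminant validity met:", discriminant_ok)
```

With the reported values, every check evaluates to true, which agrees with the conclusion drawn in the text.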
Next, to test the hypotheses, we first constructed a box-plot for the average total factor scores for the PU variable, and examined the rank correlations as well as the differences in variance between the average total factor scores for the different label types. Fig. 2 gives the box plots.

As illustrated by the box-plot in Fig. 2, verb–object labels were found to be best in terms of their perceived usefulness, followed by action-noun labels, and then the rest group. Perusal of Table 3 further shows that the reported 95% confidence intervals around the means hardly overlap between the label types. In particular, the verb–object style can easily be distinguished from the action-noun style: the upper bounds of the confidence intervals for the action-noun style are strictly lower than

Table 2
Rank totals for the three label types.

                         Verb–object labels   Action-noun labels   Rest
Observed ranked total    49                   57                   68
Expected ranked total    58                   58                   58
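The Friedman statistic reported for H1 can be reproduced from the rank totals in Table 2. The sketch below uses the textbook formula without a tie correction. That is an assumption, since the text does not say how ties in the individual rankings were handled. The function names are hypothetical. The closed-form p-value, exp(-x/2), holds only for a chi-square distribution with 2 degrees of freedom, which is the case for three label types.

```python
import math


def friedman_statistic(rank_totals, n_blocks):
    """Friedman chi-square statistic from per-treatment rank totals.

    Textbook form without tie correction:
        chi2_r = 12 / (n * k * (k + 1)) * sum(R_j^2) - 3 * n * (k + 1)
    where n is the number of matched evaluations and k the number of
    treatments (here: label types).
    """
    k = len(rank_totals)
    sum_sq = sum(r * r for r in rank_totals)
    return 12.0 / (n_blocks * k * (k + 1)) * sum_sq - 3.0 * n_blocks * (k + 1)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with df = 2 only."""
    return math.exp(-x / 2.0)


# Observed rank totals from Table 2: verb-object, action-noun, rest.
rank_totals = [49, 57, 68]
n = 29  # matched evaluations, one ranking per participant

# Expected rank total per label type under the null hypothesis.
expected_total = n * (len(rank_totals) + 1) / 2

stat = friedman_statistic(rank_totals, n)
p_value = chi2_sf_df2(stat)

print(f"expected rank total = {expected_total:.0f}")
print(f"chi2_r = {stat:.2f}, df = {len(rank_totals) - 1}, p = {p_value:.3f}")
```

Under these assumptions the script gives an expected rank total of 58 and a statistic of about 6.28, matching Table 2 and the value reported in the text. The approximate p-value is 0.043, which is below 0.05.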