Information Systems 35 (2010) 467–482
Contents lists available at ScienceDirect
Information Systems
journal homepage: www.elsevier.com/locate/infosys

Activity labeling in process modeling: Empirical insights and recommendations

J. Mendling a,*, H.A. Reijers b, J. Recker c
a Humboldt-Universität zu Berlin, Spandauer Straße 1, 10178 Berlin, Germany
b Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
c Queensland University of Technology, 126 Margaret Street, QLD, 4000 Brisbane, Australia
* Corresponding author. Tel.: +49 7 313 89492.
E-mail addresses: jan.mendling@wiwi.hu-berlin.de (J. Mendling), h.a.reijers@tue.nl (H.A. Reijers), j.recker@qut.edu.au (J. Recker).
0306-4379/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.is.2009.03.009

Keywords: Business process modeling; Model quality; Survey; Systems analysis and design

Abstract

Few studies have investigated the factors contributing to the successful practice of process modeling. In particular, studies that contribute to the act of developing process models that facilitate communication and understanding are scarce. Although the value of process models is not only dependent on the choice of graphical constructs but also on their annotation with textual labels, there has been hardly any work on the quality of these labels. Accordingly, the research presented in this paper examines activity labeling practices in process modeling. Based on empirical data from process modeling practice, we identify and discuss different labeling styles and their use in process modeling praxis. We perform a grammatical analysis of these styles and use data from an experiment with process modelers to examine a range of hypotheses about the usability of the different styles. Based on our findings, we suggest specific programs of research towards better tool support for labeling practices. Our work contributes to the emerging stream of research investigating the practice of process modeling and thereby contributes to the overall body of knowledge about conceptual modeling quality. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, the conceptual mapping of processes in the form of process models has emerged as a primary reason to engage in conceptual modeling [1] and is considered as a key instrument for the analysis and design of process-aware information systems [2], service-oriented architectures [3], and web services [4] alike. To that end, process models typically describe in a graphical way at least the activities, events, states, and control flow logic that constitute a business process [5]. Additionally, process models may also include information regarding the involved data, organizational and IT resources, and potentially other artifacts such as external stakeholders and performance metrics, see e.g., [6]. Similar to other forms of conceptual modeling, process models are first and foremost required to be intuitive and easily understandable, especially in information systems project phases that are concerned with requirements documentation and communication [7].

Process modeling has been around for some 30 years. However, only of late has research started to examine quality aspects pertaining to process modeling. In fact, quality issues of conceptual modeling in general have only recently been receiving increased attention in academia [8]. Notwithstanding the research findings collected to date, surprisingly little is known about the actual “practice of process modeling” and the factors that contribute to building a “good” process model, for example one that aids human understanding of the depicted business domain [9]. Work has been carried
out, for instance, that examined the impact of process model structure, model user competency and process modeling language on process model understanding. While the impact of structural properties is clearly identified [10], it is also reported that model readers systematically overestimate their ability to draw correct conclusions from a model [9]. It was also found that the choice of languages used for process modeling (e.g., BPMN versus EPCs) has only insignificant effects on process model understanding [11]. Other research has successfully investigated the graphical constructs and their meaning in process models, e.g., [12], the expressiveness and validity of control flow aspects in process models, e.g., [13], or process-related aspects such as data and resources, e.g., [14,15].

This situation raises the question of other antecedents of process model understandability. Most of the previous work has focused on syntactic quality aspects [16]. In contrast, semantic and pragmatic aspects of model quality have mostly been neglected. In particular, little attention has been devoted to a very essential task in process modeling: the labeling of the graphical constructs, in particular of the constructs representing “activities” (or “tasks”, or “work to be performed”) in a process model. This is rather surprising given that, clearly, the true meaning of any construct in a process model is only revealed when model users read and intuitively understand the labels assigned to the construct. Current practice indicates that the labeling of activity constructs is a rather arbitrary task in modeling initiatives and one that is sometimes done without a great deal of thought [17]. This can undermine the understandability of the resulting models in cases where the meaning of the labels is ambiguous, not readily understandable, or simply counter-intuitive to the reader.

Accordingly, in our work we seek to address this gap and contribute to the existing line of work towards more understandable process models. The objective of our research is to investigate the styles that are in use to annotate activities in process models and how these styles affect the understandability of such models.¹ More precisely, the aim of this paper is to suggest, based on our empirical findings, an imperative style for modelers to create more understandable process models.

We proceed as follows. In Section 2 we discuss the theoretical foundation for our work and investigate current labeling practices in process modeling. In Section 3 we discuss the design of, conduct of, and findings from an experiment with process modelers. In Section 4 we then discuss the implications of our findings and suggest specific programs of research towards better support for process model labeling practices. We conclude in Section 5 by reviewing our contributions and discussing some conclusions.

2. Background

In presenting the background to our research, we refer to a theory of multimedia learning originating from cognitive science. This theory suggests that labeling practices are indeed significant factors contributing to how well or how poorly process models can be understood by their end users. To determine what a good labeling style is, we then identify different styles of labeling being used in practice. We describe how the exploration of a large number of real-life process models gives us this insight. One of the styles that is encountered is the usage of verb–object labels. As this style is widely promoted in the literature [18–20], we formulate several hypotheses on its presumed superiority over the other styles encountered in our exploration.

2.1. Theoretical foundation

Dual Coding Theory [21] suggests that individuals have two separate channels (visual and auditory) that they use when processing information.
The two channels complement each other, such that receiving simultaneous information through each channel improves understanding compared to receiving information through one channel only. In other words, individuals understand informational material better when it is provided through both auditory (i.e., words) and visual (i.e., images) channels.²

Based on this observation, the Cognitive Theory of Multimedia Learning (CTML) [23,24] suggests that learning material intended to be received, understood and retained by its recipients should be presented using both words and pictures. This sounds conducive to the task of process modeling, where both visual (graphical constructs) and auditory (labels and text annotations) material are available to add information about a business domain in a process model. However, due to the overall limited number of graphical constructs used in a process model (there are typically few if not only one graphical construct for representing activities), most of the critical domain information is contained in the textual labels of the constructs, viz., in auditory channels. Based on CTML it can thus be expected that model understanding can be improved if better guidance can be provided for the act of labeling of process model constructs.

The general principle that our expectation builds on is described by Mayer [24] as the “multimedia principle”. And indeed, prior research on conceptual modeling has successfully demonstrated that the multimedia principle informs model understanding. Empirically observable differences in model understanding based on the multimedia principle were found, for instance, in the data modeling domain [25,26] as well as in the process modeling domain [11].

¹ We recognize the need to extrapolate our research to other aspects of process models, such as the data, resource and control flow perspective. We deemed the focus on “activity constructs” a suitable starting point for our endeavor due to the centrality of the “activity” concept in process modeling.
² Indeed, most people read by speaking out the words of the text in their mind, which even suppresses visual activation [22].

2.2. Labeling styles in practice

For business process modeling, the labeling of constructs such as activities is often more art than science. In
practice, a number of informal guidelines exist that typically suggest a verb–object convention (e.g., approve order, verify invoice) for labeling activities, e.g., [18–20]. This convention is similar to a style that is advocated in guidelines that support the creation of understandable use case descriptions, a widely accepted requirements tool in object-oriented software engineering [27,28]. We will refer to this labeling style of activities as the verb–object style. However much promotion it receives in the process modeling domain, both anecdotal evidence and casual inspection of real process models indicate that this labeling style is neither universally nor consistently applied. Even the practical guide for process modeling with ARIS [29, pp. 66–70] shows models with actions both as verbs and as nouns. Also, one may think that the more information the labels contain, the clearer the meaning will be to the reader. Recent research, however, uncovered that shorter activity labels improve model understanding [30].

To get a better idea of the variety in labeling styles being applied in practice, we turn to the SAP Reference Model [31]. The development of the SAP reference model started in 1992 and the first models were presented at CEBIT'93 [31, p. VII]. Since then, it was developed further until version 4.6 of SAP R/3, which was released in 2000. Overall, the SAP reference model includes 604 business process models depicted using the Event-driven Process Chains (EPC) notation, capturing information about the SAP R/3 functionality to support the business processes in a wide range of organizations. With the SAP solution being the market-leading tool in the Enterprise Systems market, we feel that the examination of SAP process models gives us a good understanding of the use of process models in real-life business contexts.
Amongst other application areas, the SAP reference model denotes a frequently used tool in the implementation of SAP systems [32], and much literature has covered its development and use [31]. Furthermore, it is frequently referenced in research papers as a typical reference model and used in previous examinations of process modeling, e.g., [10,33,34].

Altogether, the 604 EPC models in the SAP reference model include 19,838 activity labels, all of which we manually inspected and classified. In 94% of these cases (18,648 instances), the activity labels refer to a certain action that should be undertaken, such as check billing block or order execution. This is not so for 6% of the labels, because they include neither a verb nor a noun that refers to an action; consider, for instance, status analysis cash position. We will refer to this style as the rest category.

Note that the EPC models considered were designed based on the functionality and the terminology of the SAP system, which might create different biases. On the one hand, system terminology could potentially be less intuitive compared to labeling in conceptual design models. On the other hand, the labels could be more precise than labels in conceptual modeling practice. Yet, neither the high frequency of verb–object styles nor the variety of labeling styles in use directly suggests such bias.

Although 18,648 of the 19,838 activity labels in the SAP reference model are “action-oriented”, this does not imply that the verb–object style is strictly enforced within this subset. Rather, it is applied to only about two-thirds of the “action-oriented” labels (60% of all activity labels). The remaining subset of the “action-oriented” labels (34% of all activity labels) consists of labels in which the action is grammatically captured as a noun. This noun can be either a gerund of the verb or a noun that is derived from a verb, like order processing or invoice verification.
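The shares quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in the text (19,838 labels in total, 18,648 action-oriented labels, and the rounded shares of 60% and 34%); the variable names are illustrative and not part of the original study.

```python
# Figures taken from the text; variable names are illustrative assumptions.
TOTAL_LABELS = 19_838      # all activity labels in the 604 EPC models
ACTION_ORIENTED = 18_648   # labels that refer to an action

# Share of action-oriented labels and of the remaining labels.
action_share = ACTION_ORIENTED / TOTAL_LABELS
rest_count = TOTAL_LABELS - ACTION_ORIENTED
rest_share = rest_count / TOTAL_LABELS

# The text reports verb-object labels as 60% and action-noun labels as 34%
# of all labels (rounded values), so within the action-oriented subset the
# verb-object share is roughly 0.60 / (0.60 + 0.34).
VERB_OBJECT_SHARE_OF_ALL = 0.60
ACTION_NOUN_SHARE_OF_ALL = 0.34
verb_object_within_action = VERB_OBJECT_SHARE_OF_ALL / (
    VERB_OBJECT_SHARE_OF_ALL + ACTION_NOUN_SHARE_OF_ALL
)

print(f"action-oriented: {action_share:.1%}")
print(f"remaining labels: {rest_count} ({rest_share:.1%})")
print(f"verb-object within action-oriented: {verb_object_within_action:.1%}")
```

Because the 60% and 34% shares are rounded, the last figure is only an approximation; it is consistent with the statement that the verb–object style covers about two-thirds of the action-oriented labels.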
We will refer to this style of labeling as the action-noun style. The overall result from classifying all 19,838 activity labels can be seen in Table 1.

Table 1
Distribution of activity label styles in the SAP reference model.

Verb–object labels   Action-noun labels   Rest    Sum
11,830               6,808                1,201   19,838
60%                  34%                  6%      100%

We will now consider these data in more detail. More precisely, for each of the labeling styles found, we perform a grammatical analysis using the lexical database WordNet [35] to identify potential types of interpretation ambiguity. This grammatical analysis builds on the identification of syntactic categories such as noun and verb. Further categories like adjective and adverb could also be used but do not pertain to activity labeling in process modeling, which is why we excluded these categories from our analysis. For many words, the syntactic category can be identified purely syntactically, as for instance with the word grammar, which is a noun. Some words, however, are ambiguous regarding the category they belong to (when analyzed in isolation). Consider the word design, which can be a verb (to design) or a noun (the design) depending on the grammatical context. As these examples from natural language processing show, ambiguity can be a significant impediment to ease of understanding. In light of this observation we thus argue that those labeling styles that are least susceptible to ambiguity should be considered in process modeling. We illustrate our argument with examples from the SAP Reference Model:

Verb–object labels: Most of the verb–object labels seem intuitively understandable to us. Still, there are some cases that are ambiguous from a grammatical point of view: beyond the derivation of verbs from nouns with the suffixes -ize and -(i)fy, the English language allows for a so-called zero derivation [36]. As a consequence, the same word can be both a noun and a verb. Consider, for example, the labels measure processing, export license check, and process cost planning. They have in common that the first word can be a verb, but reading it as an object describing an action is also possible. Measure processing could potentially refer to the processing of a measure or to the measurement of a processing. The same observation holds for the other labels. Some of these ambiguities can be resolved by considering context information, such as the labels of the other activities in the same process model. If the verb–object style were consistently used as a standard throughout a process
model, it would be clear to interpret the first term as a verb.

Action-noun labels: With respect to action-noun labels, some of these can be easily interpreted, but again there can be cases of grammatical ambiguity. Consider, for instance, notification printing. Again, there are two potential interpretations: a notification is printed, or someone is notified of a printing job. Alternatively, the verb could just have been forgotten by the modeler. This interpretation is likely in cases where the action noun could also be an object, like order, which can refer to either an action or an object. We call this type of ambiguity the action-object ambiguity. In such cases, the model reader might be tempted to infer the action by considering the context of the activity. Syntactically, the label could be easily extended with such semantically diverse verbs as start, stop, or schedule. Using a verb–object style would have avoided the problem of action-object ambiguity and the necessity of having to infer a verb to establish the appropriate meaning.

Rest labels: Some of the rest labels clearly point to a specific business object, for instance status analysis cash position, such that a verb could potentially be inferred from the context. Yet there are also activity labels like DEÜV and Jamsostek that are altogether difficult to understand. Presumably, the first one refers to the German regulation for data storage and transmission (DEÜV, Datenerfassungs- und Übertragungsverordnung) and the second to the Indonesian social security system. Clearly, labels of the "rest" category require crystal clear context information; otherwise, inferring the action to be performed is highly problematic due to verb-inference ambiguity, i.e., the problem of inferring from the context of the label the type of action to be performed as part of the considered process task.

In conclusion, the three different classes exhibit different types of ambiguities.
For the verb–object style, we found instances of zero-derivation ambiguity in the SAP reference model. Altogether, we identified exactly 600 labels with such ambiguity; these labels contained 23 different verbs including change, design, process, and report. For the action-noun style, this problem class is relevant, too. Furthermore, this style is susceptible to action-object ambiguity if an action noun can also refer to an object. We counted 615 cases of such ambiguities. Finally, the rest group of labels, which do not mention an action at all, faces verb-inference ambiguity (1190 cases). These three ambiguity classes differ in occurrence frequency: while the zero-derivation ambiguity requires the unlikely combination of a verb and an action object, the action-object ambiguity is found more often since many documents in a business context are synonymous with an action noun (e.g., order, receipt, confirmation). The verb-inference ambiguity is the most significant one, since all labels of the rest group suffer from it.

2.3. Hypotheses

On the basis of the findings discussed above, we conjecture about the influence of the choice of labeling style on the pragmatic quality of process models in terms of unambiguously facilitating action [16] and usage [37]. We summarize our expectations as follows. First, we formulated and grounded our expectation that model understanding can be improved by guiding the act of labeling following the theory of multimedia learning. In search of candidate guidelines for labeling activities, anecdotal evidence, the study of the SAP reference model, and our literature review suggest the verb–object labeling style to be the strongest candidate style. Our empirical exploration of the SAP reference model indeed confirmed the wide application of this style in practice.
Yet, we also found that this style is not the only style being applied: a large fraction of activity labels follows an action-noun style, and there are also other (rest) styles to be found in process models. Our grammatical analysis of the three modeling styles, as described in the previous section, suggested that the verb–object style appears to be the least susceptible to various types of interpretation ambiguity, indicating its superiority in terms of clarity of specification.

In light of these observations, we suggest the following primary conjecture that we seek to test in our study. Based on our grammatical analysis, we theorize that process modelers perceive the verb–object style to be superior to the action-noun and rest labeling styles along two dimensions: perceived ambiguity, the degree to which an individual believes that a label is ambiguous, and perceived usefulness (PU), the degree to which an individual believes that a label is useful for understanding the process modeled.

This conjecture rests on the observation that the verb–object style is less prone to result in misinterpretation and confounding complexity. After all, our grammatical analysis showed that it is least susceptible to ambiguity. We thus advance the following two primary hypotheses we seek to test in this study. First, we theorize that users working with process models have a clear preference for labeling styles that avoid ambiguity:

H1. Verb–object style labels are least frequently perceived as being ambiguous, followed by action-noun style labels, and finally rest labels.

Second, we theorize that end users working with process models have different perceptions of the usefulness of the labels for understanding the process modeled, dependent on the labeling style in which the label is articulated. More specifically:

H2a. Verb–object style labels are perceived as more useful for understanding the process model than action-noun style labels.

H2b. Verb–object style labels are perceived as more useful for understanding the process model than rest style labels.
H2c. Action-noun style labels are perceived as more useful for understanding the process model than rest style labels.

Hypotheses H2a–H2c rest on the assumption that the perceived usefulness of a label is negatively influenced by the perceived ambiguity of the labeling style used, based on the contention that the grammatical style of a labeling type can lead to misinterpretation and confounding complexity. To gather empirical evidence for this contention, we advance the following, additional hypothesis that we will test:

H3. Perceived ambiguity of a labeling style is negatively associated with the perceived usefulness of the label.

In our study, we also need to consider that differences in the perceptions about the ambiguity and usefulness of a process model label can also stem from differences between the study participants. Recent experimental research on conceptual modeling, most notably [26,38,39], has indicated significant differences in the understanding of conceptual models stemming from two characteristics of the conceptual model readers, these being knowledge of the application domain (e.g., [38]) and familiarity with the technique or notation used for conceptual modeling (e.g., [26]). CTML [24] suggests that previous knowledge of the domain covered in the conceptual model lowers the cognitive load required to develop a mental model of the information displayed in the conceptual model, and hence, model understanding will be easier. This is because readers can bring to bear an understanding of the semantics, relevant entities or procedures that make up the application domain depicted in a model. Similarly, expertise or knowledge of the conceptual modeling artifact (i.e., the method, technique or notation used) has been shown to increase the quality of the models produced (e.g., [40,41]), and sometimes to increase the understanding of the models produced [38].
The noted interaction effects of notation familiarity are speculated to stem from a modeler's self-perception about his or her modeling skills. In other words, a modeler who deems himself or herself to be experienced may approach modeling tasks and outcomes differently from someone who believes himself or herself to be a novice.

In light of these findings we thus advance the following, additional exploratory hypotheses that seek to investigate how knowledge about the application domain and familiarity with the process modeling notation used act as moderating variables to the propositions outlined above:

H4a. Knowledge about the application domain moderates the strength of the relationship between labeling style and perceived usefulness of the label.

H4b. Familiarity with the process modeling notation moderates the strength of the relationship between labeling style and perceived usefulness of the label.

3. Research method

3.1. Research design and conduct

To test the hypotheses advanced in the previous section, we developed a (self-administered) questionnaire to gather quantitative insights. With this questionnaire we asked participants about the perceived ambiguity of certain activity labels, as well as their perceived usefulness. Along with the questionnaire, we presented to the participant a number of activity labels as part of a specific process model. This has been done for several reasons. First, a label in a business process model is never interpreted in isolation. Various other labels in the model and the control flow relationship between the activities establish a context against which a single label is interpreted. Since we do not aim to gain insight into labels per se but into their use in process models, we have to present all the labels that are discussed in the questionnaire in the context of a model. Second, we had to choose a model from practice; otherwise there would have been the risk that we would (unconsciously) tailor it to meet our hypotheses.
Third, this process model had to show a substantial variation in the labeling styles being used so that we could limit potential bias in our research design.

Following these considerations we selected a model of a complaint process from a department of a Dutch governmental agency, which is concerned with complaint handling (see Fig. 1). The model follows the EPC notation, which is one of the most popular modeling techniques in industry [1]. Indeed, it is the same technique as applied in the SAP reference model. In an EPC, so-called functions (rectangles) correspond to the various tasks that may need to be executed (e.g., register receipt date of complaint letter). Events (hexagons) describe the situation before and after a function is executed (e.g., "customer at desk"). Logical connectors (circles) define routing rules. In particular, there are three types of connectors: the logical AND for concurrency, XOR for exclusive choices, and OR for inclusive choices. Functions, events, and connectors are the classical elements of control flow modeling. These routing elements are also included in other modeling languages like BPMN, YAWL, and UML Activity Diagrams, which supports generalizability and repeatability of our procedure.

The given model roughly describes the following procedure to handle the complaints that the agency receives. A new case is opened if a new complaint is received, be it by means of a phone call, personal contact, or letter. In some situations, the complaint must be referred to another party, either internal or external to the agency involved. Internal referrals have to be put on a so-called incident agenda, while external referrals always require a confirmation. In both cases the referral is archived in parallel. As a final step in this procedure, the complainant is informed. If no referral is required, a complaint analysis is conducted. Later, the complaint is archived and the complainant is contacted, with an optional follow up (see Fig. 1).
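To make the EPC vocabulary described above concrete, the following is a minimal sketch of how functions, events, and connectors could be represented as data. The class names `Function`, `Event`, `Connector`, and `ConnectorType`, as well as the helper `activity_labels`, are illustrative assumptions and not part of the EPC notation or of any modeling tool. The sketch also assumes that the activity labels studied here are the labels attached to functions. The node labels are taken from the examples in the text, but the list is unordered and does not reproduce the control flow of Fig. 1.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectorType(Enum):
    """The three EPC connector types named in the text."""
    AND = "concurrency"
    XOR = "exclusive choice"
    OR = "inclusive choice"


@dataclass(frozen=True)
class Function:
    """A task that may need to be executed (drawn as a rectangle)."""
    label: str


@dataclass(frozen=True)
class Event:
    """The situation before or after a function (drawn as a hexagon)."""
    label: str


@dataclass(frozen=True)
class Connector:
    """A routing rule (drawn as a circle)."""
    kind: ConnectorType


def activity_labels(nodes):
    """Return the labels of all functions (assumed to be the activity labels)."""
    return [node.label for node in nodes if isinstance(node, Function)]


# Hypothetical, unordered collection of nodes; it does not encode
# the arcs or the routing of the complaint handling process.
nodes = [
    Event("customer at desk"),
    Function("register receipt date of complaint letter"),
    Connector(ConnectorType.XOR),
    Function("inform complainant"),
]

print(activity_labels(nodes))
```

Keeping events and connectors as separate types means that a label analysis can simply filter for functions, which is the only node type whose labels the questionnaire asks about.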
The complaint process model in Fig. 1 is at the heart of our questionnaire, which is subdivided into three parts. In the first part we recorded demographic information about the participants including gender, years of tertiary education, preliminary knowledge of process modeling, and number of EPC models created. These questions were used to gather information about the demographic distribution of the study participants.

Fig. 1. The complaint handling process. (EPC diagram with connector symbols XOR and V and the node labels Incoming phone call, Customer at desk, Complaint letter, Call registration, Complaint to be written down with form AZ2, Register receipt date of complaint letter, Complaint at appropriate place, Complaint analysis, Follow up must be planned, Contact required with complainant, Complaint must be archived, Client contact procedure, Follow up, Archiving system, Internal referral, External referral, Refer with form B4, Refer with form B2, Referral in archiving system, Confirmation required, Telephone confirmation to external party, To be put on incident agenda, Incident agenda, Inform, Inform complainant, End.)
In order to measure knowledge about the application domain (in our case: complaints handling), we asked participants whether they had previous experience with complaints handling processes (yes/no). Since we did not expect much domain knowledge in a student population, the use of a more extensive scale (like the one described in [26]) was not considered. In order to measure respondents' familiarity with the EPC notation, we adapted a three-item scale for notation familiarity developed by Recker [42], which is based on Gemino and Wand's three pre-test questions about the familiarity, competence, and confidence of respondents with respect to an analysis method (see Appendix A and [26]). Accordingly, the three-item familiarity scale assesses familiarity with the (EPC) process modeling notation in a sense of generally felt familiarity (Fam1), self-perceived competence with the notation (Fam2) and self-perceived confidence in using the notation (Fam3). Appendix A lists all items used in the questionnaire.

The second part of the questionnaire shows the process model as depicted in Fig. 1. In order to gather data to examine hypothesis H1, the participants were asked to identify the top three activity labels that they consider to be the most ambiguous. In the third part, we sought to gather data to examine hypotheses H2a–H2c. In order to evaluate usefulness perceptions, we developed a two-item measurement scale that stresses the act of understanding. Specifically, we used the Perceived Usefulness scales developed by Maes and Poels [43] as a basis for our measurement development. The motivation is that their PU measures were developed specifically for the conceptual modeling context. Our scales were worded "Overall, I found [label] useful for understanding the process modeled" and "Overall, I think [label] improves my performance when understanding the process modeled" (see footnote 3). We asked the participants for their perception in these terms of six activity labels from the process model, using a 7-point Likert scale with the anchor points "Disagree strongly" and "Agree strongly". We chose not to measure perceived usefulness for each of the 12 distinct labels shown in Fig. 1 but instead to record these measures for six labels only. We have done so for the pragmatic reason of not making our data collection instrument, and the conduct of the experiment, unnecessarily long. Considering six labels allowed us to obtain 6 (labels) × 29 (number of responses) = 174 data points for hypothesis testing, which we deemed sufficient for our analysis. We arbitrarily selected two labels for each of the three styles we identified in the previous section, these being register receipt date of complaint letter and inform complainant as verb–object labels, registration and follow up that follow the action-noun style, as well as archiving system and incident agenda for the rest group. We consider our selection strategy sufficiently randomized based on the observation that neither our research objectives nor our hypotheses address the choice of word items or the specificity of the word items used within these labels. Hence, there was no motivation for us to prefer any particular label over another.

3.2. Results

Demographics: The questionnaire of our survey was filled out by 29 students who were at that time following a post-graduate course on process modeling at Eindhoven University of Technology in the Netherlands. Participation was voluntary, and as a reward we offered the students a copy of the study results. Twenty-five participants were male, while 4 were female. While some of the participants had followed university courses for only one year, most of them had done so for three years or more, with 3.8 years of study being the mean value.
Half of the population had preliminary experience with business process modeling, either professionally or through previous courses. Four persons had not yet worked with EPCs, but the average participant had known them for three months and had created 10 models so far. Altogether, 25 out of the 29 participants self-assessed their familiarity with EPCs as better than 3 (average total factor score), with the median being 4.5. We included a brief description of the EPC notation similar to [45, p. 36] such that the participants would in any case have the necessary background to understand the process model. Finally, there were six persons who had some preliminary knowledge of complaint handling processes. Overall, the study population contained individuals with some application domain knowledge and familiarity with the EPC notation, but without high levels of either.

Studies using students have often been criticized for lack of external validity. Despite this criticism, we agree with Gemino and Wand [26,46], Recker and Dreiling [11] as well as Batra et al. [47] that the selection of students over practitioners in this type of research can in fact be advisable. Results from both domain understanding and problem solving tasks could have been confounded by participants who are able to bring to bear prior application domain knowledge in one of the areas [48]. Also, postgraduate students (like the ones participating in our study) have been found to be adequate proxies for analysts with low to medium expertise levels [46,49].

Perceived ambiguity: The second part of the questionnaire focused on the relationship between label types and perceived ambiguity, as stated in hypothesis H1. We asked the participants to identify those three activity labels that they consider to be the most ambiguous.
Since there are 12 distinct labels in the model and 29 participants, we received 348 assessments of whether a particular label (belonging to a certain label type) was considered to be among the three most ambiguous ones. The labels "incident agenda", "complaint analysis", and "archiving system" were mentioned most frequently (14, 13, and 12 times, respectively). Note that the first and third labels belong to the rest group, while "complaint analysis" follows the action-noun style.

Footnote 3: We chose not to adapt the PU1 item from [43]. This item cannot be reasonably applied to text labels. The item would have read "Overall, I think the [label] would be an improvement to a textual description of the business process", which essentially is a tautology. Also note that we focus on perceived usefulness in our experiment for its importance as a key antecedent to actual usage [44]. The research by Maes and Poels [43] is much broader in its goal to reveal the contribution of different dimensions to the quality of conceptual models.

In contrast, the most ambiguous label following the
verb–object style, "inform complainant", received only two counts overall. The estimated probability of a label being mentioned among the three most ambiguous ones was 0.13 for verb–object labels, 0.24 for action-noun labels, and 0.45 for the rest group. The 95% confidence intervals show little overlap: 0.08–0.19 for verb–object labels, 0.17–0.31 for action-noun labels, and 0.32–0.58 for the rest, which corresponds to our expectations. To assess the reliability of the assessments made by the study participants, we calculated Cohen's Kappa [50] statistic to examine the level of agreement between study participants on which labels were most ambiguous. The Kappa statistic measures inter-rater reliability whilst controlling for chance agreement, and is generally agreed to be the most adequate tool to measure inter-rater reliability [51]. We obtained a Kappa value of 0.607, which can be classified as substantial or good [51].

As per our hypothesis H1, we were interested in testing whether the differences between the label types as noted are significant. An analysis of variance (ANOVA) test was not applicable, since the variance of the variable values is not homogeneously distributed and because the dependent variable is not on scale level. Instead, we applied Friedman's two-way analysis of variance by ranks [52]. For each participant, we determined an individual ranking of the three label types. This was achieved as follows. For each label type, we determined its relative proportion among the labels that were rated as most ambiguous by that participant. This gives us 29 matched evaluations, leading to rank totals for the three label types as shown in Table 2. As can be seen, verb–object labels receive the lowest rank total, which means that this type is least often considered as containing ambiguous labels.
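As an illustration, the rank totals in Table 2 can be turned into Friedman's test statistic with the textbook formula (no tie correction). The following Python sketch is not the authors' computation; the function name is hypothetical, and the closed-form p-value relies on the fact that the chi-square survival function for df = 2 is exp(-x/2). Under these assumptions the rank totals give a statistic of about 6.28 and a p-value of about 0.043.

```python
import math


def friedman_statistic(rank_totals, n_blocks):
    """Friedman chi-square statistic from per-treatment rank totals.

    Textbook formula without tie correction:
        12 / (n * k * (k + 1)) * sum(R_j ** 2) - 3 * n * (k + 1)
    where n is the number of blocks (participants) and k the number
    of treatments (label types).
    """
    k = len(rank_totals)
    sum_of_squares = sum(r * r for r in rank_totals)
    return 12.0 / (n_blocks * k * (k + 1)) * sum_of_squares - 3.0 * n_blocks * (k + 1)


# Observed rank totals from Table 2: verb-object, action-noun, rest.
rank_totals = [49, 57, 68]
n_participants = 29

chi2_r = friedman_statistic(rank_totals, n_participants)

# For df = 2, the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi2_r / 2.0)

# Expected rank total per label type if all types were ranked alike.
expected_total = n_participants * (len(rank_totals) + 1) / 2

print(f"chi2_r = {chi2_r:.2f}, df = {len(rank_totals) - 1}, p = {p_value:.3f}")
print(f"expected rank total per type = {expected_total:.0f}")
```

The expected rank total of 58 per label type matches the second row of Table 2, which serves as a quick consistency check on the inputs.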
We advance the null hypothesis that there are no differences in individual rankings of the three label types, i.e., that each label type would be mentioned similarly in the top three lists in each of the 29 evaluations. In seeking to refute this null hypothesis, we computed the Friedman statistic $\chi_r^2$. Note that the Friedman statistic $\chi_r^2$ is distributed approximately as chi square [52, p. 168]. For this case, it turns out that $\chi_r^2 = 6.28$ with $df = 2$, which means a significant difference in the rankings of the three labeling styles at a 95% confidence level. This result lends support to hypothesis H1. We conclude that verb–object style labels are indeed least frequently perceived as being ambiguous, followed by action-noun style labels, and finally rest labels.

Perceived usefulness: In the third part of the questionnaire, we recorded the perceived usefulness of six activity labels, two for each label type. We used two measures for PU as described above. More specifically, the scales measure the extent to which a label is useful for understanding and improves the performance when understanding. We received 174 responses (6 × 29) that we were able to link to label types. Based on these data, we examined the hypotheses H2a–H2c.

Before proceeding with hypothesis testing, we first examined reliability and validity of the PU measures used. Reliability refers to the internal consistency of scales. The most widely used test for internal consistency is Cronbach's $\alpha$, which should be higher than 0.8 [53]. A second test uses the composite reliability measure $\rho_c$, which represents the proportion of measure variance attributable to the underlying trait. Scales with $\rho_c$ values greater than 0.5 are considered to be reliable [44]. For the PU measures, we obtained a Cronbach's $\alpha$ value of 0.857 and a $\rho_c$ value of 0.884, suggesting adequate reliability of the measures. To establish validity of the measures, we examined convergent and discriminant validity of the PU measures.
Convergent validity can be tested using three criteria suggested by Fornell and Larcker [54]:

(1) All indicator factor loadings should be significant and exceed 0.6.
(2) Construct composite reliabilities $\rho_c$ should exceed 0.8.
(3) Average variance extracted (AVE) by each construct should exceed the variance due to measurement error for that construct (i.e., AVE should exceed 0.50).

Factor loadings for the two PU measures were 0.936 and 0.936 and significant at p = 0.000. Composite reliability of the PU construct was estimated to be 0.884, and average variance extracted was computed to be 0.936. These results suggest adequate convergent validity. To check for discriminant validity, we considered whether measures used for the PU construct would cross-load on other constructs considered (in our case, measures for notation familiarity). The test for discriminant validity is met when the AVE for each construct exceeds the squared correlation between that and any other construct considered in the factor correlation matrix. The squared correlation between the PU and the familiarity factor was computed to be 0.030, which shows that the AVE measures for both PU (0.936) and notation familiarity (0.927) well exceeded the squared correlation between the factors. Appendix B summarizes factor loadings, communalities, and correlations.

Next, to test the hypotheses, we first constructed a box-plot for the average total factor scores for the PU variable, and examined the rank correlations as well as the differences in variance between the average total factor scores for the different label types. Fig. 2 gives the box plots. As illustrated by the box-plot in Fig. 2, verb–object labels were found to be best in terms of their perceived usefulness, followed by action-noun labels, and then the rest group. Perusal of Table 3 further shows that the reported 95% confidence intervals around the means hardly overlap between the label types.
Table 2
Rank totals for the three label types.

| | Verb–object labels | Action-noun labels | Rest |
|---|---|---|---|
| Observed ranked total | 49 | 57 | 68 |
| Expected ranked total | 58 | 58 | 58 |

In particular, the verb–object style can easily be distinguished from the action-noun style: the upper bounds of the confidence intervals for the action-noun style are strictly lower than
the lower bounds for the verb–object style. These results lend initial support to hypotheses H2a–H2c.

As a next step, we examined whether the noted differences are statistically significant. In the data, we identified a significant negative Spearman rank correlation between the label style and its perceived usefulness (−0.430 at the 99% significance level). This finding suggests that a deviation from the verb–object style to any of the other two is connected with lower usefulness perceptions, hence lending further support to hypotheses H2a–H2c. Additionally, based on the data displayed in Table 3, we performed an analysis of variance test implemented in SPSS 16.0 [55] to further examine the differences in the average total factor scores for PU. Between-group differences across the different label styles were statistically significant with F = 18.495, p = 0.000, thereby confirming our test results.

To test whether there are significant pair-wise differences between the label types (verb–object versus action-noun, verb–object versus rest, and action-noun versus rest), we repeated the ANOVA analysis using the Contrast function [55] to detect pair-wise differences. For perceived usefulness, the contrast between verb–object and action-noun style was significant at contrastValue = 0.879, t = 3.665, p = 0.000, while the contrast between verb–object and rest style was significant at contrastValue = 1.448, t = 6.036, p = 0.000. Finally, the contrast between action-noun and rest style was significant at contrastValue = 0.569, t = 2.371, p = 0.019. These results further lend strong support to hypotheses H2a–H2c. In summation, the reported findings support our hypotheses H2a–H2c that verb–object styles are regarded as more useful than action-noun styles and rest styles.
Perceived ambiguity's effect on perceived usefulness: As discussed in the hypothesis development section, our study rests on the assumption that ambiguity of textual labels is an impediment to the perceived usefulness of the label for understanding the process modeled. To test this assumption as specified in hypothesis H3, we once again performed an ANOVA test. Support for hypothesis H3 exists if there are significant differences in the average total factor scores for perceived usefulness for labels that are either considered ambiguous or not, with the expectation that the average total factor score will be lower for the group that considered a particular labeling style to be ambiguous. Prior to conducting the test, ANOVA assumptions were tested and showed no violation. Table 4 provides the results.

[Fig. 2. Box-plot of perceived usefulness rankings, by label type. Vertical axis: total factor score for PU (1.00 to 7.00). Horizontal axis: label type (verb–object style, action-noun style, rest style).]

Table 3
Perceived usefulness of label types (avg. total factor score).

| Label type | 95% lower bound | Mean | 95% upper bound |
|---|---|---|---|
| Verb–object | 4.696 | 5.000 | 5.304 |
| Action-noun | 3.761 | 4.121 | 4.480 |
| Rest | 3.199 | 3.552 | 3.905 |
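The pairwise contrast values reported for H2a–H2c appear to coincide with the differences between the means in Table 3, and the claim about confidence bounds can be checked directly. The Python sketch below is only illustrative: the dictionary layout and helper names are assumptions, and the numbers are copied from Table 3.

```python
# Means and 95% confidence bounds copied from Table 3.
table3 = {
    "verb-object": {"lower": 4.696, "mean": 5.000, "upper": 5.304},
    "action-noun": {"lower": 3.761, "mean": 4.121, "upper": 4.480},
    "rest": {"lower": 3.199, "mean": 3.552, "upper": 3.905},
}


def intervals_overlap(a, b):
    """Return True if the two confidence intervals share at least one point."""
    return a["lower"] <= b["upper"] and b["lower"] <= a["upper"]


def mean_difference(a, b):
    """Difference between two group means, rounded to three decimals."""
    return round(a["mean"] - b["mean"], 3)


pairs = [
    ("verb-object", "action-noun"),
    ("verb-object", "rest"),
    ("action-noun", "rest"),
]

# Map each pair to (difference of means, whether the intervals overlap).
summary = {
    pair: (
        mean_difference(table3[pair[0]], table3[pair[1]]),
        intervals_overlap(table3[pair[0]], table3[pair[1]]),
    )
    for pair in pairs
}

for (first, second), (difference, overlap) in summary.items():
    print(f"{first} vs {second}: difference = {difference}, CIs overlap = {overlap}")
```

With the Table 3 values, the verb–object interval is separated from both other intervals, while the action-noun and rest intervals overlap slightly (3.761 to 3.905), which is consistent with the statement that the intervals "hardly overlap".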
The results displayed in Table 4 confirm our assumption and lend strong support for hypothesis H3. The average total factor score for perceived usefulness was higher for those label types that were not listed as ambiguous by the participants (reported average total factor scores are 4.538 in contrast to 3.238). The ANOVA test showed these differences to be statistically significant at p = 0.000.

Moderating effects: As discussed in the demographics section, the participants ranged in terms of their familiarity with the EPC notation used in the process model, as well as in their knowledge of the chosen application domain (complaints handling). More precisely, six participants brought to bear experience with the complaints handling domain, and 17 out of 29 participants were above the median in notation familiarity.

Again we first established reliability and validity of the measure "familiarity with the EPC notation". Cronbach's $\alpha$ for the familiarity scale was computed to be 0.914, and composite reliability was computed to be 0.859. Factor loadings for the three familiarity measures were 0.919, 0.930 and 0.931, all significant at p = 0.000. Average variance extracted of the familiarity construct was estimated to be 0.927. As described above, AVE also exceeded the squared correlation between the PU and the familiarity construct. Altogether, these results suggest adequate reliability and validity. Appendix B summarizes factor loadings, communalities, and correlations.

In order to test hypotheses H4a and H4b, we examined the differences in the average total factor scores for perceived usefulness of the labels between two sets of two groups of participants (high/low application domain knowledge and high/low familiarity with the EPC notation). Support for the hypotheses would then exist if the differences in the dependent variables between the groups were significant. We used an analysis of covariance (ANCOVA) test implemented in SPSS 16.0 to test the hypotheses.
ANCOVA is an appropriate analysis technique because it allows controlling for potential effects of covariates in the examination of dependent variable scores between two treatment groups [55]. The ANCOVA assumption of equal slopes was tested prior to conducting the analysis, and no violation of normality was found.

We used two covariates in the analysis of the effect of labeling type on perceived usefulness. The first is the binary variable "Knowledge of the complaints handling domain", which simply establishes the existence of any relevant knowledge in this domain. As a second covariate, we used the median of the total factor score of the three-item "Familiarity" scale to separate the respondent pool into two groups using a dummy variable (high familiarity/low familiarity). Both variables have been described in Section 3.1. Appendix A lists all items used in the questionnaire. We obtained the following results:

Application domain knowledge does not show a significant interaction effect on the relationship between label type and perceived usefulness (F = 1.363, p = 0.245, partial eta square = 0.008). Accordingly, hypothesis H4a must be refuted.

Notation familiarity does not show a significant interaction effect on the relationship between label type and perceived usefulness (F = 1.334, p = 0.239, partial eta square = 0.006). Accordingly, hypothesis H4b must be refuted.

These results are similar to those reported in [11,26], which also did not indicate significant moderation effects of their measures of application domain knowledge or familiarity with the notation on understanding of conceptual models, and contrary to those reported in [38,39], both of which reported some spurious effects on a number of the dependent variables they considered.
In the context of the study reported in this paper, the results indicate that understanding of textual labels contained in process models is independent of any expertise gained from previous notation usage or from previous knowledge of the considered domain. In light of the other results presented above, the findings suggest that a label's usefulness is indeed dependent on the grammatical style of the label itself.

3.3. Discussion

The support for our hypotheses strongly suggests that a verb–object labeling style is rightfully proposed as a preferred way of activity labeling. Indeed, our results indicate strong and favorable perceptions towards a superiority of the verb–object labeling style. Given the key role that usage beliefs (such as perceived ambiguity or perceived usefulness) play in informing actual usage behavior [44,56,57], we deem this finding instrumental to explaining, and supporting, process model understandability. However, whilst process modelers tend to favor verb–object styles, this situation does not necessarily reflect actual usage for activity labeling. In fact, our exploration of the usage frequency of activity labels in the SAP reference model indicates that a large proportion of labels found in practice cannot be interpreted as genuine implementations of this style (see Section 2). In contrast, our results indicate that there is wide variety in labeling.

Table 4
Average perceived usefulness scores for ambiguous versus unambiguous label types.

| | Unambiguous label (N = 132): Mean | Unambiguous label: StDev | Ambiguous label (N = 42): Mean | Ambiguous label: StDev | ANOVA F | Sig. |
|---|---|---|---|---|---|---|
| Perceived usefulness | 4.538 | 1.241 | 3.238 | 1.495 | 31.553 | 0.000 |
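The F value in Table 4 can be approximately recovered from the group sizes, means, and standard deviations alone. The Python sketch below assumes a standard one-way ANOVA with two groups; the function name is illustrative and is not taken from the paper or from SPSS. Because the table entries are rounded, the result (about 31.57) differs slightly from the reported 31.553.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic from summary statistics.

    groups: list of (n, mean, standard_deviation) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, rebuilt from the sample standard deviations.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Summary statistics copied from Table 4.
unambiguous = (132, 4.538, 1.241)
ambiguous = (42, 3.238, 1.495)

f_value, df_between, df_within = one_way_anova_f([unambiguous, ambiguous])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```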