Data & Knowledge Engineering 70 (2011) 483–503
Contents lists available at ScienceDirect
Data & Knowledge Engineering
journal homepage: www.elsevier.com/locate/datak

Recommendation-based editor for business process modeling

Agnes Koschmider a,⁎, Thomas Hornung b, Andreas Oberweis a

a Institute of Applied Informatics and Formal Description Methods, Karlsruhe Institute of Technology, Germany
b Institute of Computer Science, Albert-Ludwigs University, Freiburg, Germany

⁎ Corresponding author.
E-mail addresses: agnes.koschmider@kit.edu (A. Koschmider), hornungt@informatik.uni-freiburg.de (T. Hornung), andreas.oberweis@kit.edu (A. Oberweis).

0169-023X/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2011.02.002

Article history:
Received 7 January 2009
Received in revised form 7 February 2011
Accepted 7 February 2011
Available online 24 February 2011

Keywords:
Recommender system
Process model search
Indexing
Ranking function
Process modeling support

Abstract
To ensure proper and efficient modeling of business processes, it is important to support users of process editors adequately. With only minimal modeling support, the productivity of novice business process modelers may be low when they start process modeling. In this article, we present a theoretically sound and empirically validated recommendation-based modeling support system, which covers different aspects of business process modeling. We consider basic functionality, such as an intuitive search interface, as well as advanced concepts like patterns observed in other users' preferences. Additionally, we propose a multitude of interaction possibilities with the recommendation system, e.g., different metrics that can be used in isolation, or an overall recommender component that combines several sub-metrics into one comprehensive score. We validate a prototype implementation of the recommendation system with exhaustive user experiments based on real-life process models. To our knowledge, this is the only comprehensive recommendation system for business process modeling that is available.

1. Introduction

Although most business process models nowadays are created with the help of graphical editors, the learning curve for inexperienced users is still very steep [61]. Awareness of the modeling language syntax alone is often insufficient: the user needs a profound working knowledge in order to apply a modeling language efficiently and effectively in practice. This is confirmed by [8], who posit that the main driver for successful process modeling is the user's modeling expertise. To increase user productivity, most of the currently available modeling tools focus on providing a repository of graphical symbols and advanced visualization techniques. However, there is room for improvement, and a full-fledged modeling support system should focus on retaining high fidelity to the user's modeling intentions.

1.1. Problem description

Currently, business process modelers can choose among a variety of formal and semi-formal modeling languages and standards, e.g., [4,50,69], for which a multitude of different modeling tools exists. Usually, these tools provide a simple repository of graphical symbols, which represent the building blocks of the underlying modeling formalisms. However, during process modeling there is a lack of specific user support, i.e., the system provides no suggestions on how to appropriately complete a business process model that has already been started. New support tools that assist the user at modeling time are required to improve the quality of process models and to increase the productivity of the modeler.

One of the main problems in suggesting appropriate process models to the user is detecting her modeling intention. A similar problem is tackled by recommender systems. Here, preferences and opinions from individual users are collected and
aggregated. This information is then used to suggest appropriate items, e.g., books from a fixed collection, to new users. For a recommendation system in the business process modeling domain, the following aspects are essential:
• the user's modeling intentions,
• the modeling context, and
• the modeling history of a related community of users.

The next section introduces a running example that is used in the remainder of the paper.

1.2. Running example

Consider the scenario shown in Fig. 1. The user wants to model a process describing the handling of order requests. Her intention is to model this process from a customer perspective (this means that technical details can be neglected). Via a query interface, she can search for process model fragments concerning customer requests. The results of the query are displayed according to a ranking function, and she can then insert into the active workspace the process model fragment that best matches her modeling intention. Subsequently, if she is uncertain as to how to complete the process model, she has two options: she can either search again via the query interface for fitting process model activities, or she can invoke the recommender component, which automatically suggests appropriate process model parts for completing the model. Unlike the query component, the recommender component can only be invoked after she has already started modeling the business process. In our example, the user has opted for the recommender component, which suggests (among others) the CustomerOrder process model for completion. If the user decides to insert this recommendation into her workspace, she can configure this process model by inserting or deleting elements. Finally, she can store the modified process model version in a process repository for future reuse.

Fig. 1. User interaction scenario for finding an appropriate process model part.

1.3. Contributions

This paper describes an empirically validated business process modeling editor, which supports users in two ways in the purpose-oriented modeling of processes. First, the user can search via a query interface for business process models or process model parts (logically coherent groups of elements belonging together, e.g., approval, billing or assembly). The user can save time by reusing already existing process model parts. Second, we use an automatic tagging mechanism in order to unveil the modeling intention of a user at process modeling time and to better meet the user's modeling requirements.

We validated the support system with two experiments using a prototype implementation and real-life business processes modeled with Petri nets. For the validation, Petri nets were serialized to PNML [72]. The first experiment focused on the feasibility and usefulness of the modeling support system. This experiment confirmed that users are willing to follow recommendations and prefer reusing elements to modeling elements from scratch. The second experiment evaluated the benefits of using the recommendation system. This evaluation confirmed that the recommendation
system is equally useful for different types of users and that the quality of the recommendations improves over time through user feedback. The evaluation results highlight additional benefits of the modeling support tool:
• the tagging-based system increases the quality of the process models by highlighting the process model parts that violate correctness criteria (e.g., structural deadlocks, which occur if an alternative flow initiated by an OR-split is synchronized by an AND-join in a process model),
• the system overcomes the limitation of a controlled vocabulary for labeling process elements, and
• process fragments can be reused even if their vocabulary differs from the vocabulary used in the currently edited business process model.

The functionalities of the recommendation system have been implemented for Petri net-based process models. However, the generality of our approach makes it possible to apply the presented methods to business processes modeled in other languages as well.

The remainder of the paper is structured as follows. Section 2 presents a tagging algorithm and the creation of the process repository index. In Section 3 we describe two modes of the search interface. Additionally, we extend the search functionality in order to consider relevant process models in case they do not conform exactly to the specified query. The cumulative ranking function and the complete recommendation algorithm are illustrated in Section 4. The theoretical underpinning for two empirical studies is presented in Section 5. The findings of the studies are reported in Section 6. Section 7 compares our approach with related work, and Section 8 concludes the paper with an outlook on future research.

2. Tagging of business process models

Each business process model in the repository is associated with metadata, which is used for the recommendation and search functionality.
As a foundation, we use an Information Retrieval-based tagging approach over the process activity/state descriptions. For more elaborate queries, we identify a set of relevant criteria which users can provide for each process model. The intuition is that new process models in the repository can be found based on the automatically acquired metadata (automatic tagging, cf. Section 2.1); over time, different users can refine the available information about a process, thus enabling higher-level queries (manual tagging, cf. Section 2.2). In the remainder of this paper we use the term ‘tag’ to identify a single word occurring in a metadata criterion.

2.1. Automatic tagging

The motivation of Information Retrieval is to be able to find an item (normally a text document) by providing the search engine with only a few keywords that adequately capture the intended information need. For this, the significant keywords or tags that describe the desired item first need to be acquired. Additionally, it is valuable if a rating can be imposed on these tags; for a text document, this is usually done by counting the number of occurrences of each word in the text and assigning the highest rating to the word with the highest frequency. This automatic indexing or tagging of documents is typically the basis for efficient retrieval by a search engine. While the assignment of tags to items is straightforward in the case of text documents (just use the top-k words with the highest frequency after stop word removal), it is less obvious how to identify the salient concepts of business process models and convert them into appropriate tags for later retrieval.

In the following, we present our automatic tagging approach, which is geared towards identifying the most descriptive tags for business process models. The tag extraction and scoring for business process models is inspired by the Term and Document Frequency measure, which is relatively efficient to compute (cf. [23]).
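To make the scoring concrete, the following minimal Python sketch implements the modified tf*idf TagScore that is defined below in this section, including the preceding stop word removal. It is an illustration under stated assumptions, not the authors' implementation: the stop-word list, the word-splitting regular expression and the example labels are hypothetical; the denominator ∑ t_j is read as the summed frequencies of the model's N distinct tag candidates; the natural logarithm is used; and the scored model is assumed to be part of the repository, so every tag occurs in at least one indexed model.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real indexer would use a
# complete English stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "or", "the", "to"}


def tag_candidates(labels):
    """Lower-case place/transition labels, split them into words, drop stop words."""
    words = []
    for label in labels:
        words.extend(w for w in re.findall(r"[a-z]+", label.lower())
                     if w not in STOP_WORDS)
    return words


def tag_scores(model_labels, repository):
    """Return {tag: TagScore} for one process model.

    model_labels -- labels of the model being scored
    repository   -- label lists of all indexed models (|P| = len(repository));
                    assumed to contain the scored model, so every df is >= 1.
    """
    tf = Counter(tag_candidates(model_labels))
    if not tf:  # only stop words: nothing to score
        return {}
    # Assumed reading of sum_{j=1..N} t_j: summed frequencies of the
    # model's N distinct tag candidates.
    total = sum(tf.values())
    indexed = [set(tag_candidates(labels)) for labels in repository]
    scores = {}
    for tag, freq in tf.items():
        df = sum(1 for tags in indexed if tag in tags)  # |{p_j | t_i is a tag in p_j}|
        scores[tag] = freq / total * math.log(len(repository) / df)
    return scores


repository = [
    ["Receive order", "Check order", "Approve order"],  # model being scored
    ["Receive complaint", "Check complaint"],
    ["Send invoice", "Receive payment"],
]
scores = tag_scores(repository[0], repository)
for tag, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tag:8s} {score:.3f}")
```

In the example, order occurs three times in the first model and in no other model, so it receives the highest score, whereas receive occurs in every indexed model: its idf factor, and hence its score, is zero, which is the damping of shared words that the idf part is meant to achieve.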
Each place and transition in a Petri net representation of a business process model is labeled with a description that specifies the purpose of the respective process activity or state. This means that we can regard each word of these descriptions as a tag candidate for the business process model (or item, respectively). More generally, we associate with each business process model a tag characterization of the form [a1→T1, …, an→Tn], where the ai are the attributes of the process model that can be searched later on (cf. Section 3), Ti is the set of associated indexed tags, and n is the number of indexed attributes of the business process model. For each business process model we index the place and transition descriptions plus additional metadata criteria, as described in the remainder of this section. This allows us to use standard Information Retrieval techniques (cf. [58]) to build up an index over business process models.

We first remove common English words from the set of tag candidates because they appear so often in a typical natural language corpus that they do not convey any meaning specific to the business process. This phenomenon is often referred to as Zipf's law, which states that the frequency of any word is inversely proportional to its rank in the frequency table [76]. After stop word removal, each keyword is assigned a tag score for this business process model based on a modified version of the tf*idf (term frequency*inverse document frequency) metric:

$$\mathrm{TagScore}(t_i) := \frac{TF(t_i)}{\sum_{j=1}^{N} t_j} \times \log\left(\frac{|P|}{|\{p_j \mid t_i \text{ is a tag in } p_j\}|}\right)$$

Here, TF(ti) is the frequency of the tag ti in transition or place labels, N is the total number of distinct tag candidates (after stop word removal), |P| denotes the total number of indexed business process models, and |{pj | ti is a tag in pj}| is the number of business
process models in which the tag ti appears. The purpose of the idf part, $\log\left(|P| / |\{p_j \mid t_i \text{ is a tag in } p_j\}|\right)$, is to decrease the impact of words that all business process models have in common.

[26] observed that people often use a surprisingly great variety of words to refer to the same thing. In order to bridge the gap between different modeling vocabularies, we determine the set of synonyms for each keyword via an extended structure of WordNet² and assign the same tag score to each word in the synonym set. For this purpose, we index two versions of each relevant attribute: one with WordNet information included and one without. Based on the extended structure of tags, the system can also determine homonyms (two terms having the same pronunciation but different meanings) and tags at different abstraction levels. To uncover homonyms and different abstraction levels of tags, we use the similarity measures presented in [24].

We can always extract the abovementioned tags from the business process model because they form an intrinsic part of it. In Section 2.2, we present a set of additional metadata criteria with which the user can enhance the tag characterization of a process model. Each metadata criterion corresponds to an attribute of the tag characterization and thus can be searched either in isolation or in conjunction with other attributes, such as the default attributes mentioned above. Additionally, both the search and the recommender component can work solely on automatically acquired data.

2.2. Manual tagging

Apart from the whole process model, users can identify coherent parts within a business process model, e.g., order approval, complaints handling, or order receipt. We index these parts in the same way as regular business process models and additionally store a pointer to the business process model with which they are associated.
For example, for a business process model that consists of three distinct process model parts, we would include four tag characterizations in our index: one for the whole process model and one for each of the three parts. Fig. 2 shows the process model description window, through which the following metadata criteria can be entered:
• Process name: each business process model or part can be identified with a descriptive label (e.g., customer order),
• Purpose: the purpose fulfilled by this process model (part): analysis, documentation, execution or re-engineering (if required, the user can annotate further purpose criteria),
• Objective description: the objective fulfilled by this process model (e.g., modeling the handling of an order request),
• Process description: a textual description of the process model (part), and
• Property: a property that results from practical modeling experience, e.g., standard denotes a standardized process. If required, the user can introduce more annotation properties.

Fig. 2. Process description window.

² http://wordnet.princeton.edu/.
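The index structure described in Sections 2.1 and 2.2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: a tag characterization [a1→T1, …, an→Tn] is modeled as a mapping from attribute names to tag sets, a part stores a pointer to its enclosing model, each description is indexed twice (with and without synonyms), and a manual purpose tag is added afterwards. The small synonym table stands in for WordNet, the names SYNONYMS, description_syn and index_model as well as the example parts are hypothetical, and tag scores are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for the WordNet synonym lookup.
SYNONYMS: Dict[str, Set[str]] = {
    "order": {"purchase"},
    "check": {"verify"},
    "approval": {"approving"},
}


@dataclass
class TagCharacterization:
    """One index entry [a1->T1, ..., an->Tn]: attribute name -> set of tags."""
    entry_id: str
    attributes: Dict[str, Set[str]] = field(default_factory=dict)
    parent_id: Optional[str] = None  # pointer to the enclosing model (parts only)

    def add_tags(self, attribute: str, tags: Set[str]) -> None:
        self.attributes.setdefault(attribute, set()).update(tags)


def with_synonyms(tags: Set[str]) -> Set[str]:
    """Return the tag set extended by the synonyms of each tag."""
    expanded = set(tags)
    for tag in tags:
        expanded |= SYNONYMS.get(tag, set())
    return expanded


def characterize(entry_id: str, tags: Set[str],
                 parent_id: Optional[str] = None) -> TagCharacterization:
    """Index one model or part twice: without and with synonym expansion."""
    entry = TagCharacterization(entry_id, parent_id=parent_id)
    entry.add_tags("description", tags)
    entry.add_tags("description_syn", with_synonyms(tags))  # hypothetical name
    return entry


def index_model(model_id: str, model_tags: Set[str],
                parts: Dict[str, Set[str]]) -> List[TagCharacterization]:
    """One characterization for the whole model plus one per part."""
    entries = [characterize(model_id, model_tags)]
    for part_id, part_tags in parts.items():
        entries.append(characterize(part_id, part_tags, parent_id=model_id))
    return entries


index = index_model(
    "CustomerOrder",
    {"order", "approval", "invoice", "receipt"},
    {
        "OrderApproval": {"check", "order", "approval"},
        "Billing": {"send", "invoice"},
        "OrderReceipt": {"receive", "order"},
    },
)
# Manual tagging: a user adds the purpose of the approval part.
index[1].add_tags("purpose", {"execution"})
print(len(index), index[1].parent_id, sorted(index[1].attributes))
```

For a model with three parts, index_model returns four characterizations, matching the count given above; each part points back to CustomerOrder, and only the approval part carries the manually added purpose tag.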
These metadata criteria are annotated for process models or process model parts. In addition to this annotation, the user can annotate each process model activity with the following metadata:
• Cost: the costs for the design of a process activity, and
• Quality: the quality of the design of a process activity.

Each of these metadata types is indexed and can be used for the query-based retrieval of business process models and process model parts, as presented in the next section.

Example. Consider a process model part dealing with order approval. After the automatic tagging phase, the corresponding tag characterization looks as follows: [description→{"order", "approval", …}, …, 1st_el→{"check", "order", …}]. Now, the user additionally tags the attribute purpose, yielding the final tag characterization: [description→{…}, …, 1st_el→{…}, purpose→{"execution"}].

3. Searching for process model parts

The modeler can use the search functionality at each stage of process modeling. She can choose whether she wants to search for process model parts, whole business process models, or both. For practical reasons, we divide our search interface into two modes: (1) the most common search options and the underlying retrieval model, as discussed in Section 3.1, and (2) the more elaborate and less frequently used options, as presented in Section 3.2. In Section 3.3 we introduce a method to suggest related process models even if they do not conform exactly to the specified query.

3.1. Basic search

The implementation of the index and search functionality is based on the open source search engine Lucene. Lucene's data model is based on so-called documents, which contain fields that have a name and a set of associated values that can be indexed. To align it with the syntax introduced for tag characterizations, a Lucene document can be represented as [f1→V1, …, fn→Vn],
where the fi are the names of the fields and the vi the associated sets of values. Thus, the definition of tag characterizations can be mapped one-to-one to Lucene documents: each attribute ai is mapped to a field name fi and the set of associated tags Ti is mapped to the set of values Ve. The tag characterizations, i.e. the business process models, can then later be searched by providing search keywords for one attribute in isolation or for several attributes at the same time. The results of a query are scored by a mixture of he Vector Space Model and the Boolean Model (cf. 58)). desczintinuing our example from Section 1.2, the user is searching for both process model parts and entire business process models user activates WordNet in order to suggest process models, where process model objects have been labeled with respect to a different vocabulary. For each of the free text fields the user can use the standard Boolean operators AND, OR, and NOT. Additionally, she can po wildcard queries and perform fuzzy searches based on the Levenshtein distance, or Edit distance algorithm [18. The quality of the search results correlates positively with the metadata criteria introduced in the previous section In our xample, the search for a model with a documentation purpose thus requires a corresponding annotation of models beforehand. The first and last element search field were not mentioned earlier, but they are automatically acquired. this is done by using the labels of the first or last element(s) in the process model (part), converting them to attributes of the tag characterization, espectively. The advantage of these search criteria is that modelers can search for a specific input and output of the process model. For instance, the modeler is interested in process models starting with for instance send request. 3.2. 
Extended search To provide the range of possible designs and not to overlook variants of business processes, the process modeler can activate optional search criteria that further limit the query results(see Fig. 4) 2. 1. Process design cost and qu g To calculate the cost and the quality of a process design we adapted the functions presented in [32] for a generic process design. process design P, a business process model in our context, is a set of ordered pairs(a, Bi) P={(a1;)}
These metadata criteria are annotated for process models or process model parts. In addition to this annotation, the user can annotate each process model activity with the following metadata:
• cost: the costs for the design of a process activity, and
• quality: the quality of the design of a process activity.
Each of these metadata types is indexed and can be used for the query-based retrieval of business process models and process model parts, as presented in the next section.
Example. Consider a process model part dealing with order approval. After the automatic tagging phase the corresponding tag characterization looks as follows:
$[\mathit{description} \to \{\text{"order"}, \text{"approval"}, \ldots\}, \ldots, \mathit{1st\_el} \to \{\text{"check"}, \text{"order"}, \ldots\}]$.
Now the user additionally tags the attribute purpose, yielding the final tag characterization:
$[\mathit{description} \to \{\ldots\}, \ldots, \mathit{1st\_el} \to \{\ldots\}, \mathit{purpose} \to \{\text{"execution"}\}]$.
3. Searching for process model parts
The modeler can use the search functionality at each stage of process modeling. She can choose whether she wants to search for process model parts, the whole business process model, or both. For practical reasons, we divide our search interface into two modes: 1) the most common search options and the underlying retrieval model, as discussed in Section 3.1, and 2) the more elaborate and less frequently used options, as presented in Section 3.2. In Section 3.3 we introduce a method to suggest related process models even though they do not conform exactly to the specified query.
3.1. Basic search
The implementation of the index and search functionality is based on the open source search engine Lucene.3 Lucene's data model is based on so-called documents, which contain fields that have a name and a set of associated values that can be indexed, i.e.
to align it with the introduced syntax for tag characterizations, a Lucene document can be represented as $[f_1 \to V_1, \ldots, f_n \to V_n]$, where the $f_i$ are the names of the fields and the $V_i$ the associated sets of values. Thus, the definition of tag characterizations can be mapped one-to-one to Lucene documents: each attribute $a_i$ is mapped to a field name $f_i$, and the set of associated tags $T_i$ is mapped to the set of values $V_i$. The tag characterizations, i.e., the business process models, can later be searched by providing search keywords for one attribute in isolation or for several attributes at the same time. The results of a query are scored by a mixture of the Vector Space Model and the Boolean Model (cf. [58]).
Continuing our example from Section 1.2, the user is searching for both process model parts and entire business process models describing customer approvals and orders. She decides to use the query interface shown in Fig. 3. Additionally, the user activates WordNet so that process models whose objects have been labeled with a different vocabulary are also suggested. For each of the free text fields the user can use the standard Boolean operators AND, OR, and NOT. She can also pose wildcard queries and perform fuzzy searches based on the Levenshtein distance (edit distance) algorithm [18]. The quality of the search results correlates positively with the completeness of the metadata annotations introduced in the previous section. In our example, a search for a model with a documentation purpose therefore requires that models have been annotated accordingly beforehand. The first and last element search fields were not mentioned earlier because they are acquired automatically: the labels of the first and last element(s) in the process model (part) are converted into the corresponding attributes of the tag characterization.
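The fielded Boolean and fuzzy search described above can be mimicked on a single in-memory tag characterization. The Python sketch below assumes that each field maps to a set of tags and that a fuzzy match accepts any tag within a given Levenshtein distance. It is neither Lucene's API nor Lucene's scoring: there is no Vector Space Model weighting here, only a yes/no match.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def field_matches(tags, term, max_edits=0):
    """True if some tag of the field lies within max_edits of term."""
    return any(levenshtein(tag, term.lower()) <= max_edits for tag in tags)


def search_and(doc, query, max_edits=0):
    """Boolean AND over (field, term) pairs against one characterization."""
    return all(field_matches(doc.get(f, set()), term, max_edits)
               for f, term in query)
```

For example, `search_and(doc, [("description", "ordr")], max_edits=1)` still matches a characterization whose description contains "order". A NOT term would simply negate `field_matches` for the excluded term.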
The advantage of these search criteria is that modelers can search for a specific input and output of the process model; for instance, the modeler may be interested in process models starting with send request.
3 http://lucene.apache.org.
3.2. Extended search
To present the range of possible designs and to avoid overlooking variants of business processes, the process modeler can activate optional search criteria that further limit the query results (see Fig. 4).
3.2.1. Process design cost and quality
To calculate the cost and the quality of a process design, we adapted the functions presented in [32] for a generic process design. A process design $P$, a business process model in our context, is a set of ordered pairs $(a_i, \theta_i)$:
$$P = \{(a_i, \theta_i)\}$$
where $a_i$ is a process activity and $\theta_i$ is the starting time of $a_i$ (e.g., $\theta_i$ can be considered as an integer indicating the relative point in time at which activity $a_i$ is executed). Each process activity $a_i$ is characterized by a vector of $K$ attribute values $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,K})$, e.g., $(v_{i,\mathit{cost}}, v_{i,\mathit{quality}})$ for the cost and quality of the process design. The evaluation of the process model design in attribute $k$ is obtained from the evaluations $v_{i,k}$ of the activities contained in the design via an aggregation function $f_k$, which is specified for attribute $k$; e.g., the aggregation over all costs could be done with the function $f_{\mathit{cost}}$. The cost function for the process design is defined as follows:
$$f_{\mathit{cost}}(P) = \sum_{i:(a_i,\theta_i) \in P} v_{i,\mathit{cost}}$$
i.e., $f_{\mathit{cost}}$ is the sum of the design costs of all process activities in the design. Thus, given a set $R = \{P_1, P_2, \ldots, P_N\}$ of $N$ potential business process models matching the query, we are interested in the $P_i$ with the lowest process design cost, i.e., $\min_{i \in \{1,\ldots,N\}} f_{\mathit{cost}}(P_i)$. The quality function is defined by the lowest quality of any process activity included in the process design:
$$f_{\mathit{quality}}(P) = \min_{i:(a_i,\theta_i) \in P} v_{i,\mathit{quality}}$$
For the same set $R$, we are interested in the $P_j$ with the highest process design quality, i.e., $\max_{j \in \{1,\ldots,N\}} f_{\mathit{quality}}(P_j)$. A process model with lower design costs and higher quality is ranked higher than less significant process models.
3.2.2. Structural errors
In our scenario a structural error can only occur in the interconnected process model, i.e., the currently edited business process model, which has been extended by a selected recommendation.
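Returning to the cost and quality functions of Section 3.2.1, the Python sketch below shows the two aggregations: $f_{\mathit{cost}}$ sums activity costs and $f_{\mathit{quality}}$ takes the weakest activity quality. It also shows the two selections over a candidate set $R$. The `Activity` record and the dictionary of named candidates are hypothetical representations. $f_{\mathit{quality}}$ is undefined for an empty design, so `min` raises `ValueError` in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One pair (a_i, theta_i) together with its attribute values."""
    name: str
    start: int       # theta_i: relative point in time of execution
    cost: float      # v_{i,cost}
    quality: float   # v_{i,quality}


def f_cost(design):
    """Aggregate cost: sum of the activities' design costs."""
    return sum(a.cost for a in design)


def f_quality(design):
    """Aggregate quality: the lowest activity quality (weakest link)."""
    return min(a.quality for a in design)


def lowest_cost(candidates):
    """Name of the design P_i that minimizes f_cost over R."""
    return min(candidates, key=lambda name: f_cost(candidates[name]))


def highest_quality(candidates):
    """Name of the design P_j that maximizes f_quality over R."""
    return max(candidates, key=lambda name: f_quality(candidates[name]))
```

The two criteria can disagree: a cheap design may contain one poor-quality activity. Section 4.3 therefore combines both into a single weighted score instead of applying either one alone.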
Fig. 3. Query for all process model (parts) that are related to customer approvals.
Fig. 4. Optional search criteria.
An interconnected business process model is considered structurally correct if it complies with the well-structuredness property [68]. This structural property for business process models is
violated if, for example, an alternative flow initiated by an OR-split is synchronized by an AND-join. Syntactically correct processes are ranked higher than process recommendations that would cause deadlocks once interconnected. Besides the well-structuredness property, the verification might be extended to check free-choiceness. Free-choiceness4 is also a structural analysis property and can easily be decided. Due to capacity and resource restrictions, the recommendation-based editor performs only a partial structural analysis: a deadlock can also occur in the interconnected business process model if, for instance, the model is not s-coverable.5 We chose to verify structural errors (not behavioral ones) because they can easily be detected whenever the user inserts new nodes into her workspace.
The query posed by the user in Fig. 3 resulted in 10 business process models. These recommendations are ranked (so far) by their Lucene score and displayed to the user in a table-based result list, as shown in Fig. 5. The final ranking algorithm is a weighted sum of different ranking criteria and is explained in more detail in Section 4. If the user is interested in a recommendation, she can open a graphical view by selecting the corresponding row.
3.3. Related search results
In Fig. 5 the recommender system only suggests process models that match the user's modeling intention. However, process models may be appropriate for a particular intention even when they have been designed for a different purpose. It is therefore often useful to suggest related process models even though they do not conform exactly to the specified query. For this reason, we take into account a user profile that is based on the user's search history.
To define a user session, the following information is used:
• a sequence of recommendations accessed by the user $u$, $s_r = \langle r_{u_i}, r_{u_{i+1}}, \ldots, r_{u_n} \rangle$, where $r_{u_i} \in R$ and $R$ is the set of all recommendations accessed by the user $u$,
• a sequence of queries posed by the user $u$ (after removing stop words), $s_q = \langle q_{u_i}, q_{u_{i+1}}, \ldots, q_{u_m} \rangle$, where $q_{u_i} \in Q$ and $Q$ is the set of all queries posed by the user $u$,
• a sequence $s_m$ of models newly created or opened for editing by the user $u$, where $s_m = \langle pm_{u_i}, pm_{u_{i+1}}, \ldots, pm_{u_n} \rangle$ and $pm_{u_i} \in P$, with $P$ the set of all models newly created or edited by the user $u$.
Storing the user session data helps users to preview related process models with different modeling purposes. For this, we assume that users first create a model for documentation purposes that depicts the current state of the business process model and provides the information required to fully understand it. Based on the documentation model, the user then generates a model for analysis purposes. Next, an execution model is created based upon a model that fulfills the expectations defined in the scope of the business process modeling project. Assuming that users follow these stages of the business process model life-cycle, we can offer a preview of all phases of the BPM life-cycle, from the early documentation of a process through the subsequent phases of analysis and execution.
Based on the user profile data, the system can also provide related process model parts (created by previous users) that are used in the user's current modeling domain (e.g., Manufacturing) and that succeed or precede the model part in question. The idea is that process parts that succeed or precede the part in question, and that were used in the user's current modeling domain, can help to estimate the degree of fitness of a recommended model part.
Therefore, each business process model (and thus each sub-process model part that occurs in this model) is assigned to a modeling domain before it is added to the process repository. We assume that the set of possible domains is usually known within a company; hence, we can provide the user with an interface where she can select the process domain. The process property can also be provided at this stage.
Fig. 5. Limited table-based result list of recommendations.
4 The transitions of a forward branched place are not backward branched, or the places of a backward branched transition are not forward branched [20].
5 A model is s-coverable if it can be composed from s-components [68].
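Footnote 4's condition is indeed easy to decide. The Python sketch below checks it, assuming the interconnected model is available as a Petri net described only by the input places of each transition. This representation is hypothetical, because the editor's internal model format is not described here. The sketch checks only the free-choice condition, not well-structuredness or s-coverability.

```python
from collections import defaultdict


def is_free_choice(preset):
    """Check footnote 4 on a Petri net.

    preset maps each transition t to its set of input places (the preset •t).
    A place p is forward branched if |p•| > 1; a transition t is backward
    branched if |•t| > 1. The net passes if no output transition of a
    forward branched place is backward branched.
    """
    postset = defaultdict(set)  # place p -> p• (its output transitions)
    for t, places in preset.items():
        for p in places:
            postset[p].add(t)
    return all(len(preset[t]) == 1
               for transitions in postset.values() if len(transitions) > 1
               for t in transitions)
```

An OR-split place feeding two single-input transitions passes the check, and so does a plain AND-join transition whose input places have no other consumers. A violation occurs only when a choice place feeds a transition that also synchronizes on another place.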
Continuing our example of Fig. 5, the user decides to open the two recommendations CustomerOrder and verify customer order. If she wants to open process model parts that succeed or precede the model part in question, she pushes the button Show related process parts. The stages of the business process model life-cycle (documentation, analysis, execution) are then visualized.
4. Ranking of recommendations
The process recommendations depicted in Fig. 6 are ranked according to the Lucene score. This is a very rudimentary ranking. A more advanced ranking is required once the user has already inserted process elements into her workspace, because the modeling context (e.g., the control-flow6) then also becomes crucial for the recommendation process. To improve the ranking of recommendations, we extend the Lucene score introduced in Section 3 with three additional criteria that are described in this section:
1. Frequency score: describes how often a process model has been selected by other users; it reflects implicit user feedback and patterns observed in other users' preferences,
2. Operation score: indicates the average number of deletions or insertions made when selecting a recommendation and also reflects patterns observed in other users' preferences,
3. Process design cost and process design quality, which were introduced in Section 3.2.
Assume that the user invokes the recommender component and is searching for business process model parts as well as for process models (cf. Fig. 1). To determine relevant process model parts, we extract the labels of the process elements currently being edited and remove common stop words, which yields the set $T_{\mathit{raw}}$. The remaining query tag candidates in $T_{\mathit{raw}}$ are then expanded with their related synonym sets, as described in Section 3, resulting in the set $T_{\mathit{query}}$.7 The initial process model parts are then determined by querying the Lucene index, where the query term is the concatenation of all tags in $T_{\mathit{query}}$. Subsequently, the frequency score, structural errors, and deleted or inserted elements are computed to rerank the recommendations.
4.1. Frequency score
As already mentioned, we assume that users independently declare logically coherent process model parts, which are stored in the repository together with the metadata described in Fig. 2. This bears the risk that users store useless process model parts in the repository, because no consistency check is applied (e.g., a process model part with only one element might be regarded as useless). To remedy this, we integrate into our ranking a frequency score that reflects the past reuse of a process model part. If process model parts have been refreshed or updated, the user is informed about this with a corresponding remark. Updated process models are assigned the same frequency score as the old process model version. If users decide against the updated process model (and prefer its previous version), its frequency score automatically decreases relative to recommendations that have been selected more often.
Fig. 6. Graphical view of selected recommendations.
6 In case the user has requested a recommendation in an AND-branch, the system also looks for a corresponding AND-branch.
7 Note that $|T_{\mathit{query}}| = \sum_{i=1}^{n} |\mathit{SynSet}(\mathit{raw}_i)|$, where $T_{\mathit{raw}} = \{\mathit{raw}_1, \ldots, \mathit{raw}_n\}$.
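The query construction of Section 4, from element labels to $T_{\mathit{raw}}$ to $T_{\mathit{query}}$ to a concatenated query term, can be sketched in Python as follows. A hand-written synonym dictionary stands in for WordNet, and the stop-word list is likewise hypothetical. A raw tag without a known synonym set is assumed to form the singleton set containing itself. Tags are kept in a list, not deduplicated across synonym sets, so that footnote 7's count $|T_{\mathit{query}}| = \sum_i |\mathit{SynSet}(\mathit{raw}_i)|$ holds exactly.

```python
# Hypothetical stand-ins: a real system would use a full stop-word list
# and WordNet synsets instead of this hand-written dictionary.
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}
SYNSETS = {
    "check": {"check", "examine", "verify"},
    "order": {"order", "purchase order"},
}


def raw_tags(labels, stop_words=STOP_WORDS):
    """T_raw: distinct label tokens without stop words, first-seen order."""
    tokens = (tok for label in labels for tok in label.lower().split())
    return list(dict.fromkeys(t for t in tokens if t not in stop_words))


def expand(t_raw, synsets=SYNSETS):
    """T_query: each raw tag replaced by its synonym set (or {raw})."""
    t_query = []
    for raw in t_raw:
        t_query.extend(sorted(synsets.get(raw, {raw})))
    return t_query


def query_string(t_query):
    """Concatenate all tags of T_query into one query term."""
    return " ".join(t_query)
```

For the labels "Check the order" and "Approve order", $T_{\mathit{raw}}$ is check, order, approve, and $T_{\mathit{query}}$ contains 3 + 2 + 1 = 6 tags.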
To determine the number of times a process model has been reused, we adapt the user count algorithm presented in [51]. Let $U$ and $P$ be the sets of all users and processes, and let $p_{ij}$ be the number of selections of process $j$ by user $i$. The rating $r^{(1)}_{uk}$ for the number of users who have selected process $k$ is:
$$r^{(1)}_{uk} := \frac{\sum_{i \in U} t_{ik}}{|U|}$$
where $t_{ik}$ is calculated as follows:
$$t_{ik} := \begin{cases} 0 & (p_{ik} = 0) \\ 1 & (p_{ik} \geq 1) \end{cases}$$
i.e., $t_{ik}$ equals 0 if user $i$ has never selected process $k$ and 1 otherwise. The rating $r^{(2)}_{uk}$ for the number of selections by all users is calculated by:
$$r^{(2)}_{uk} := \frac{\sum_{i \in U} p_{ik}}{1 + \sum_{i \in U} \sum_{j \in P} p_{ij}}$$
The range of this value is $[0, 1)$. The score $\mathit{freqScr}$ for a user $u$ selecting a process $p$ can then be determined as:
$$\mathit{freqScr} := \frac{r^{(1)}_{uk} + r^{(2)}_{uk}}{2}$$
Let us assume that the second process in Fig. 5 has been selected more often than the first one (e.g., 5 vs. 3 times). After reranking by frequency, the recommendation system would list the process verify customer order higher than the process part CustomerOrder.
4.2. Operation score
The user can insert the first recommendation in Fig. 2 into her workspace even if it causes a TP-handle (an AND-split synchronized by an OR-join) [56]. She then needs to decide how to repair this business process model: she can delete the two transitions that are joined, or she can insert two places (following these two transitions), resulting in an AND-join. To calculate the operation score, we use the following equation, which determines the average number of inserted elements:
$$\mathit{avgInsNodes}_i := \frac{\mathit{avgInsNodes}_{i-1} \times \mathit{freqScore}_{i-1} + \mathit{currInsNodes}_i}{\mathit{freqScore}_i}$$
where $i = 0, \ldots, n$, $\mathit{avgInsNodes}_0 = 0$, $\mathit{freqScore}$ is the frequency score for a specific process model, and $\mathit{currInsNodes}$ describes the number of newly inserted elements in a specific process model.
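The recurrence for $\mathit{avgInsNodes}$ (and, analogously, $\mathit{avgDelNodes}$) can be evaluated step by step. The Python sketch below implements the formula literally. Its example makes one interpretive assumption: $\mathit{freqScore}_i$ is read as the number of selections after the $i$-th selection ($\mathit{freqScore}_i = i$). Under that reading the recurrence reduces to the ordinary running mean of the per-selection counts. With the normalized score of Section 4.1 instead, the same code still evaluates, but the result is no longer a plain mean.

```python
def running_averages(curr_nodes, freq_scores):
    """Apply avg_i = (avg_{i-1} * f_{i-1} + curr_i) / f_i with avg_0 = 0.

    curr_nodes[i] is currInsNodes_i (or currDelNodes_i) and freq_scores[i]
    is freqScore_i for i = 0..n. Index 0 is the initial state, so
    curr_nodes[0] is not used.
    """
    averages = [0.0]
    for i in range(1, len(curr_nodes)):
        averages.append(
            (averages[-1] * freq_scores[i - 1] + curr_nodes[i]) / freq_scores[i])
    return averages
```

With three selections that inserted 4, 2 and 6 elements and counts 1, 2, 3, the averages are 4.0, 3.0 and 4.0, i.e., the mean of the counts seen so far.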
To calculate the average number of deleted elements in a process model, we use the following definition:
$$\mathit{avgDelNodes}_i := \frac{\mathit{avgDelNodes}_{i-1} \times \mathit{freqScore}_{i-1} + \mathit{currDelNodes}_i}{\mathit{freqScore}_i}$$
where, analogously, $i = 0, \ldots, n$, $\mathit{avgDelNodes}_0 = 0$, and $\mathit{currDelNodes}$ describes the number of deleted elements in a specific process model. To calculate the number of inserted or deleted elements ($\mathit{currInsNodes}$ and $\mathit{currDelNodes}$) in a specific process model, we recursively retrieve all these elements.
4.3. Overall ranking
The basic ranking is determined with the following formula:
$$R_1 := w_1 \times \mathit{searchScr} + w_2 \times \mathit{freqScr} + w_3 \times \mathit{avgInsNodes} + w_4 \times \mathit{avgDelNodes} + w_5 \times f_{\mathit{cost}}(P) + w_6 \times f_{\mathit{quality}}(P)$$
where $\sum_{i=1}^{6} w_i = 1$. The Lucene score is assigned the greatest weight, i.e., it has the greatest impact on the ranking, as was confirmed in our empirical study. If the user has not selected any optional criteria in the extended search, then both $f_{\mathit{cost}}(P)$ and $f_{\mathit{quality}}(P)$ equal zero. For this case, the reranking of the search results of Fig. 6 gives the final descending order shown in Table 1. Process elements that would cause structural problems if edited are highlighted with a gray rectangle in the graphical view of the recommendations.
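The frequency score of Section 4.1 and the overall ranking $R_1$ can be combined in a short Python sketch. Selection counts are given as a nested dictionary (user, then process, then count), which is a hypothetical representation. `Fraction` keeps $\mathit{freqScr}$ exact. `overall_rank` implements $R_1$ term by term as printed, including the positive signs on the insertion and deletion averages. The example weights and scores used to exercise it are made up for illustration.

```python
from fractions import Fraction


def freq_scr(selections, k):
    """freqScr for process k from selection counts.

    selections[u][j] is p_uj, the number of times user u selected
    process j (missing entries count as 0).
    """
    users = list(selections)
    # r^(1): share of users who selected k at least once (t_ik = 1).
    ruk1 = Fraction(sum(1 for u in users if selections[u].get(k, 0) >= 1),
                    len(users))
    # r^(2): selections of k relative to 1 + all selections of all processes.
    total = sum(sum(row.values()) for row in selections.values())
    ruk2 = Fraction(sum(selections[u].get(k, 0) for u in users), 1 + total)
    return (ruk1 + ruk2) / 2


def overall_rank(weights, search_scr, freq, avg_ins, avg_del,
                 cost=0.0, quality=0.0):
    """R_1 as printed: weighted sum of six criteria, weights summing to 1.

    cost and quality default to 0, matching the case in which no
    extended search criteria were selected.
    """
    if len(weights) != 6 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("expected six weights that sum to 1")
    terms = (search_scr, freq, avg_ins, avg_del, cost, quality)
    return sum(w * x for w, x in zip(weights, terms))
```

With hypothetical counts in which one user selected CustomerOrder 3 times, another selected verify customer order 5 times, and a third selected nothing, `freq_scr` yields 1/3 and 4/9 respectively. The more frequently selected process therefore moves up, as in the example of Section 4.1.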
5. The usefulness and efficiency of a recommendation-based editor

The goal of the empirical study presented in this section is to investigate the usefulness (can modelers complete their modeling task?) and efficiency (how long do modelers need?) of the recommendation-based editor. Usually, these characteristics are measured through a questionnaire that asks participants about their impression of the system. During our study, we logged all actions that respondents performed when working with the recommendation system in order to reason about its usefulness and efficiency. The participants also received a questionnaire, but they were only asked to justify their choice of a recommendation rather than to rate the system's usability (system design). At this initial stage, it is necessary to focus on the usefulness and efficiency of the modeling support system rather than on its system design. The study was conducted in two steps with heterogeneous participants; therefore, a discussion of working experience with respect to the usefulness and efficiency of the support system is also essential.

5.1. Theoretical background

To find arguments for the usefulness and efficiency of the recommendation-based editor, we first studied Cognitive Load Theory (CLT) [65]. CLT is meant to provide a theoretical basis for hypothesizing about users' willingness to use the recommendation-based editor and about the reduction of modeling time when using the system. CLT [65] suggests that learning is associated with cognitive overload. Working memory should have enough capacity to learn and process new information and should not be cognitively overloaded, in order to facilitate effective learning. Cognitive load theory holds that working memory is very limited when handling new information, because initially no mechanism is available that coordinates novel information. A heavy cognitive load results if novel material has to be understood.
Rote learning, in contrast, reduces cognitive load, but understanding suffers. Sweller [65] concludes that exercises reduce cognitive load and, ultimately, working memory overload. More recently, Sweller [66] described “the imagination effect according to which learners asked to imagine concepts or procedures are superior to learners asked to study those concepts or procedures”. Translated to the intention of this study, the recommendation of suitable process model parts can be regarded as an inspiration for the modeling task. Instead of investing effort in modeling a process from scratch, users are guided in this task, and thus cognitive load decreases. On the other hand, it is worth examining whether cognitive load increases due to the amount of information that needs to be processed and understood simultaneously (e.g., the result list of recommendations and the graphical view of recommendations). Accordingly, two questions arise:
• Are process modelers willing to use our modeling support system, or do they prefer to build a process model from scratch?
• Can the modeling time be reduced by using our system? If users follow the recommendations, will the modeling time decrease when they decide to reuse process models? A positive answer to this question is desirable, since it would give strong support for modeling support tools. Following Sweller's notion, however, the exploration of this question could also yield the opposite result (cognitive overload due to the amount of new information).
We also need to investigate whether the heterogeneity of modelers has an impact on the recommendation system. Therefore, the study was conducted with participants with heterogeneous working experience. In particular, we are interested in the following questions:
• Is the tool equally useful for all users? Is the system more useful for process modeling beginners than for process modeling experts? Our assumption is that process modelers with less experience are more willing to be guided by the support system.
• If process modelers follow the recommendations, does the system increase the productivity of the modelers? This question should clarify whether process modelers can carry out the modeling task independently of their level of experience.

Table 1
Ranked process model recommendations.
# | Process name | Score | Description | Frequency | Avg. deletion | Avg. insertion
2 | Customer order | 98.02 | This process allows… | 3 | 7 | 10
1 | Verify customer order | 97.45 | This process describes… | 5 | 5 | 3

To broaden our knowledge about user behavior, we studied work on expert/novice differences [19,55,73] and the Theory of Planned Behavior (TPB) [2]. Several expert/novice studies have been conducted for programming (we refer to [19] for an overview of related work), case analysis [22], and computer interaction [55]. One result of expert/novice studies in case analysis is that the time spent on analysis did not differ between novices and experts. All studies confirm qualitative differences between novices and experts, because novices have fewer knowledge chunks than experts [73]. Another study found that experts did not make fewer errors than novices when working with the computer [55]. These results on expert/novice differences should be considered in the study of the recommendation-based editor. The question arises whether qualitative differences exist for process models that are created based upon models of previous users. The expert/novice studies suggest that the modeling time remains the same for different types of users. If perceived usefulness is measured as the time saved by using the recommendation system, then we assume an identical usefulness of our