User Modeling and User-Adapted Interaction (2009) 19:167–206
DOI 10.1007/s11257-008-9057-x

ORIGINAL PAPER

Interaction design guidelines on critiquing-based recommender systems

Li Chen · Pearl Pu

Received: 19 September 2007 / Accepted in revised form: 25 August 2008 / Published online: 3 October 2008
© Springer Science+Business Media B.V. 2008

L. Chen (✉) · P. Pu
Human Computer Interaction Group, School of Computer and Communication Sciences, Swiss Federal Institute of Technology in Lausanne (EPFL), 1015 Lausanne, Switzerland
e-mail: li.chen@epfl.ch
P. Pu
e-mail: pearl.pu@epfl.ch

Abstract A critiquing-based recommender system acts like an artificial salesperson. It engages users in a conversational dialog in which they can give feedback, in the form of critiques, on the sample items shown to them. The feedback, in turn, enables the system to refine its understanding of the user's preferences and its prediction of what the user truly wants. The system can then recommend products that may better stimulate the user's interest in the next interaction cycle. In this paper, we report an extensive investigation comparing various approaches to devising critiquing opportunities in these recommender systems. More specifically, we have investigated two major design elements that are necessary for a critiquing-based recommender system: critiquing coverage (one vs. multiple items returned during each recommendation cycle to be critiqued) and critiquing aid (system-suggested critiques, i.e., a set of critique suggestions for users to select, vs. a user-initiated critiquing facility that supports users in creating critiques on their own). Through a series of three user trials, we measured how real users reacted to systems with varied setups of the two elements. In particular, we found that giving users the choice of critiquing one of multiple items (as opposed to just one) has significantly positive impacts on increasing users' decision accuracy (particularly in the first recommendation cycle) and saving their objective effort (in the later critiquing cycles). As for critiquing aids, the hybrid design combining system-suggested critiques with user-initiated critiquing support exhibits the best performance in inspiring users' decision confidence and increasing their intention to return, in comparison with either approach used alone. The results from our studies therefore shed light on the design guidelines for determining the sweet spot that balances user initiative and system support in the development of an effective and user-centric critiquing-based recommender system.
Keywords Critiquing-based recommender systems · Decision support · Preference revision · User control · Example critiquing · Dynamic critiquing · Hybrid critiquing · User evaluation · Usability · Human–computer interaction

1 Introduction

According to adaptive decision theory (Payne et al. 1993), the human decision process is inherently constructive and adaptive to the current decision task and decision environment. In particular, when users are confronted with an unfamiliar product domain or a complex decision situation with overwhelming information, such as the current e-commerce environment, they are usually unable to state their preferences accurately at the outset (Viappiani et al. 2007) but instead construct them in a highly context-dependent fashion during the decision process (Tversky and Simonson 1993; Payne et al. 1999; Carenini and Poole 2002).

To assist people in making accurate as well as confident decisions, especially in complex decision settings, critiquing-based recommender systems have emerged in the form of both natural language models (Shimazu 2001; Thompson et al. 2004) and graphical user interfaces (Burke et al. 1996, 1997; Reilly et al. 2004; Pu and Kumar 2004). This type of system has been broadly recognized as an effective feedback mechanism that can guide users efficiently toward their ideal products, which is particularly meaningful when users are searching for high-involvement products (e.g., computers, houses and cars) with the primary goal of avoiding financial loss. Other terms for these systems are conversational recommender systems (Smyth and McGinty 2003), conversational case-based reasoning systems (Shimazu 2001), and knowledge-based recommender systems (Burke et al. 1997; Burke 2000).

More specifically, a critiquing-based recommender system acts like an artificial salesperson that engages users in a conversational dialog where they can provide feedback in the form of critiques (e.g., "I like this laptop, but prefer something cheaper" or "with faster processor speed") on one of the currently recommended items. The feedback, in turn, enables the system to predict more accurately what the user truly wants and then return products that may better interest the user in the next conversational cycle. The main component of this interaction model is therefore that of recommendation-and-critiquing, which has also been called tweaking (Burke et al. 1997), critiquing feedback (Smyth and McGinty 2003), candidate/critiquing (Linden et al. 1997), and navigation by proposing (Shimazu 2001).

To our knowledge, the critiquing concept was first mentioned in the RABBIT system (Williams and Tou 1982) as a new interface paradigm for formulating queries to a database. In recent years, it has evolved into two principal branches. One aims to proactively generate a set of knowledge-based critiques that users may be prepared to accept as ways to improve the current product (termed system-suggested critiques in this paper). This mechanism has been adopted in FindMe systems (Burke et al. 1997) and the more recent DynamicCritiquing agents (Reilly et al. 2004; McCarthy et al. 2005c).
The main advantage, as detailed in the related literature (Reilly et al. 2004; McCarthy et al. 2004b; McSherry 2004), is that system-suggested critiques can not only expose the remaining recommendation opportunities, but also potentially accelerate the user's critiquing process if they correspond well to the user's intended feedback criteria.

An alternative critiquing mechanism does not propose pre-computed critiques, but provides a facility that stimulates users to freely create and combine critiques themselves (termed user-initiated critiquing support in this paper). As a typical application, the ExampleCritiquing agent has been developed for this goal; its focus is on showing examples and helping users compose their self-initiated critiques (Pu and Kumar 2004). In essence, the ExampleCritiquing agent allows users to choose which feature(s) to critique and how to critique them, under their own control. Previous work showed that it enabled users to obtain significantly higher decision accuracy and preference certainty than non-critiquing-based systems such as a ranked list (Pu and Kumar 2004; Pu and Chen 2005).

In addition to characterizing a critiquing-based recommender system by the nature of its critiquing support (i.e., system-suggested critiques or user-initiated critiquing support), another important factor is the number of items that the system returns during each recommendation cycle for users to critique. For example, FindMe and DynamicCritiquing systems return one item, whereas ExampleCritiquing agents show k items (e.g., k = 7) per cycle. A multi-item display gives users the chance to choose the product to be critiqued after comparing several options.

Thus, a critiquing-based recommender system in essence contains two crucial design components. One is its critiquing aid: suggesting critiques for users to select, or aiding them in constructing their own critiques. The other is the number of recommended items (called critiquing coverage in this paper): presenting a single vs. multiple products for users to critique. These options are inherently related to different levels of user control, in either the process of identifying the critiqued reference or the process of specifying concrete critiquing criteria. As a matter of fact, perceived behavioral control has been regarded as an important determinant of user beliefs and actual behavior (Ajzen 1991). In the context of e-commerce, it has been found to have a positive effect on customers' attitudes, including their perceived ease of use, perceived usefulness and trust (Novak et al. 2000; Koufaris and Hampton-Sosa 2002). User control has also been identified as one of the fundamental principles of general user interface design (Shneiderman 1997) and Web usability (Nielsen 1994). However, few works have studied the effect of the locus of user initiative in critiquing-based recommender systems. There is indeed a complex tradeoff underlying a successful design: giving users too much control may cause them to perform unnecessarily complex critiquing, whereas giving little or no control may force users to accept system-suggested items even though these do not match users' truly intended choices. The goal of this paper is therefore to investigate different degrees of user control vs. system support in both critiquing aid and critiquing coverage, so as to identify the optimal combination of components that could positively influence users' actual decision performance and subjective attitudes.
To achieve our goal, we conducted a series of three trials. In the first user trial, we compared two well-known critiquing-based recommender agents, each representing a typical setup combination of critiquing coverage and critiquing aid. Concretely, one is the DynamicCritiquing system, which shows one recommended product during each interaction cycle, accompanied by a user-initiated unit critiquing area and a list of system-suggested compound critiques. The other is the ExampleCritiquing system, which returns multiple products in a display and stimulates users to build and compose critiques of one of the shown products in a self-motivated way. The experimental results show that the ExampleCritiquing agent achieved significantly higher decision accuracy (in terms of both objective and subjective measures) and stronger behavioral intentions (i.e., intention to purchase and return), while requiring a lower level of interaction and cognitive effort.

In the second trial, we modified ExampleCritiquing and DynamicCritiquing to hold their critiquing coverage (i.e., the number of recommended items during each cycle) constant and keep the difference only in their critiquing aids. The results surprisingly showed no significant difference between the two modified versions in terms of either objective or subjective measures. Further analysis of participants' comments revealed the pros and cons of system-suggested critiques and user-initiated critiquing support. Additionally, combining these results with the first trial's, we found that giving users the choice of critiquing one of multiple items (as opposed to just one) has significantly positive impacts on increasing their decision accuracy and confidence, particularly in the first recommendation cycle, and on saving objective effort in the later critiquing rounds.

The third user trial was conducted to measure users' performance in a hybrid critiquing system in which system-suggested critiques and the user-initiated critiquing aid were combined on one screen. An analysis of users' critiquing frequency in this system shows that they applied the user-initiated critiquing support to create their own critiques more often than they picked suggested critique options. Moreover, the respective practical effects of the user-initiated and system-suggested critiquing facilities were identified. That is, both contribute significantly to improving users' decision confidence and return intention, and system-suggested critiques are additionally effective in reducing perceived effort.

Therefore, all of our trial results imply that giving users multiple recommended products as critiquing candidates, and providing them with both system-suggested and user-initiated critiquing aids for specifying concrete critiquing criteria, can yield substantial benefits.

Another contribution of our work is that we have established a user-evaluation framework. It contains both objective variables, such as decision accuracy, task completion time and interaction effort, and subjective measures, such as perceived cognitive effort, decision confidence and trusting intentions. All of these factors are fundamentally important, given that a recommender system's ultimate goal should be to allow its users to achieve high decision accuracy and build high trust in it, while requiring them to expend a minimal amount of effort to obtain these benefits (Häubl and Trifts 2000; Chen and Pu 2005; Pu and Chen 2005).

The rest of this paper is organized as follows. We first introduce existing critiquing-based recommender systems, with DynamicCritiquing and ExampleCritiquing as two representatives.
According to their respective characteristics, we summarize two main elements that can be varied to reflect different degrees of user control. We then introduce a user-evaluation framework with the major dependent variables measured in our experiments. Detailed descriptions of the three user trials then follow, including their materials, recruited participants, experimental procedures, results analyses and discussions. Finally, we conclude our work and indicate its practical implications and future directions.

2 Critiquing-based recommender systems

Our investigation of existing critiquing-based recommender systems revealed that they basically follow a similar interaction model (see Fig. 1). The user first specifies her initial preferences on product attributes. The system then returns one or multiple recommended items. Either the user selects an item as her final choice and terminates the interaction, or she makes critiques by picking system-suggested critiques or defining critiques herself. If critiques were made, the system updates the recommendation(s) and the list of suggested critiques (if provided) in the next interaction cycle. This process continues until the user decides that she has found her most preferred product.

[Fig. 1 The typical interaction model of a critiquing-based recommender system: the user's initial preferences are elicited; one or multiple example outcomes are displayed; the user either accepts an item (no more effort required) or applies system-suggested and/or user-initiated critiquing.]
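To make this interaction loop concrete, the following is a minimal sketch in Python. All of the names (elicit_preferences, recommend, elicit_feedback, revise_preferences) are hypothetical placeholders for system-specific logic, not APIs from any of the systems discussed in this paper.

```python
def critiquing_session(catalog, elicit_preferences, recommend,
                       elicit_feedback, revise_preferences):
    """Sketch of the generic recommend-and-critique loop of Fig. 1.

    The four callables are hypothetical stand-ins for the system-specific
    parts: preference elicitation, the recommender, the critiquing
    interface, and the preference-revision step.
    """
    prefs = elicit_preferences()               # initial preferences are elicited
    while True:
        items = recommend(catalog, prefs)      # one or multiple examples displayed
        choice, critique = elicit_feedback(items)
        if choice is not None:                 # the user accepts an item:
            return choice                      # no more effort required
        # otherwise the user critiqued an item, either by picking a
        # system-suggested critique or by composing one herself
        prefs = revise_preferences(prefs, critique)
```

In these terms, the two design elements studied in this paper correspond to how many items recommend returns (critiquing coverage) and how elicit_feedback lets the user express a critique (critiquing aid).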
Most existing systems fall into two specific branches: one is called single-item system-suggested critiquing, since it recommends one item at a time and guides users to provide feedback by selecting a system-suggested critique; the other is called k-item user-initiated critiquing, because it provides multiple items during each recommendation cycle and a critiquing aid that assists users in choosing one product to critique and creating their self-specified critiquing criteria for that product. In the following, we introduce both approaches in detail, with two typical applications as examples.

2.1 Single-item system-suggested critiquing

The FindMe system was the first known single-item system-suggested critiquing system (Burke et al. 1996, 1997). It uses knowledge about the product domain to help users navigate through the multi-dimensional product space. An important interface component in FindMe is called tweaking, which allows users to critique the current recommendation by selecting one of the proposed simple tweaks (e.g., "cheaper", "bigger" and "nicer"). When a user finds the current recommendation short of her expectations and responds to a tweak, the remaining candidates are filtered to leave only those satisfying the tweak.

The critique suggestions in FindMe are called unit critiques, since each of them constrains a single feature at a time. More recently, the so-called dynamic critiquing method (Reilly et al. 2004; McCarthy et al. 2004a) has been developed with the objective of automatically generating a set of compound critiques, each of which can operate over multiple features simultaneously (e.g., "Different Manufacturer, Lower Resolution and Cheaper"). A live-user trial showed that integrating the dynamic critiquing method can effectively reduce users' interaction cycles from an average of 29, when purely applying unit critiques, to 6 (McCarthy et al. 2005c). The compound critiques can also serve as explanations, revealing the recommendation opportunities that remain beyond the current product (Reilly et al. 2005). We therefore use the DynamicCritiquing system as the representative to illustrate the main components that a single-item system-suggested critiquing system may comprise.

2.1.1 DynamicCritiquing

Figure 2 shows a sample DynamicCritiquing interface where both unit and compound critiques are available to users as feedback options (Reilly et al. 2004; McCarthy et al. 2005c). The DynamicCritiquing interface mainly contains three components: a single item as the current recommendation, a unit critiquing area, and a list of compound critiques.

[Fig. 2 The DynamicCritiquing interface: one recommended item, a user-initiated unit critiquing area, and system-suggested compound critiques.]

In the first recommendation cycle, the item that best matches the user's initially stated preferences is returned; after each critiquing action, a new item that satisfies the user's critique and is most similar to the previously recommended product is shown as the current recommendation.

In the unit critiquing area, the system presents a set of main features, one of which users can choose to critique at a time. For each numerical feature (e.g., price), two critiquing directions are provided: increasing the value (e.g., more expensive) or decreasing it (e.g., cheaper); for discrete features (e.g., manufacturer), all of the relevant options are displayed under a drop-down menu. This area therefore acts more like user-initiated unit critiquing support than like the limited, small set of unit critique suggestions in FindMe systems.

The list of three compound critiques is automatically computed by discovering recurring subsets of unit differences between the current recommended item and the remaining products, using the data mining algorithm Apriori (Agrawal et al. 1993). More concretely, each remaining product is first converted into a critique pattern indicating its differences from the current recommended product in terms of all main features (e.g., {(manufacturer, =), (price, <), …}). Since there will be a large number of critique patterns representing all of the remaining products, the Apriori algorithm is employed to discover frequent association rules among the features within these patterns. A set of compound critique options (as the frequent association rules) is then produced. For example, supposing the occurrence of heavier laptops is very frequently associated with the occurrence of cheaper prices in the remaining items, a compound critique of the form {[weight >], [price <]} (i.e., heavier and cheaper) will be generated. Thus, the DynamicCritiquing agent uses the Apriori algorithm to discover the most frequently recurring compound critiques representative of a given data set. It then favors the candidates with the lowest support values ("support value" refers to the percentage of products that satisfy the critique). This selection criterion was motivated by the fact that presenting critiques with lower support values provides a good balance between their likely applicability to the user and their ability to narrow the search (McCarthy et al. 2004a, 2005b,c).
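As an illustration, the following sketch mimics this generation process in Python. It assumes products are represented as dictionaries of feature values; for brevity it enumerates fixed-size combinations of unit critiques rather than running the full Apriori algorithm, and all names are our own rather than from the DynamicCritiquing implementation.

```python
from itertools import combinations

def critique_pattern(current, other, numeric_features):
    """Express `other` as a pattern of unit critiques relative to `current`,
    e.g. {('price', '<'), ('weight', '>'), ('manufacturer', '!=')}."""
    pattern = set()
    for feature, value in current.items():
        if feature in numeric_features:
            if other[feature] < value:
                pattern.add((feature, '<'))
            elif other[feature] > value:
                pattern.add((feature, '>'))
        else:
            pattern.add((feature, '=' if other[feature] == value else '!='))
    return pattern

def compound_critiques(current, remaining, numeric_features,
                       min_support=0.1, size=2, k=3):
    """Mine fixed-size compound critiques over the remaining products and
    return the k frequent ones with the LOWEST support, following the
    selection criterion described above."""
    patterns = [critique_pattern(current, p, numeric_features)
                for p in remaining]
    counts = {}
    for pattern in patterns:
        for combo in combinations(sorted(pattern), size):
            counts[combo] = counts.get(combo, 0) + 1
    n = len(patterns)
    # keep critiques that apply often enough to be useful, then prefer
    # low support, since those narrow the remaining search the most
    frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
    return sorted(frequent, key=frequent.get)[:k]
```

With the laptop example above, if heavier-but-cheaper alternatives dominate the remaining catalog, the pair (('price', '<'), ('weight', '>')) would surface as a candidate compound critique.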
In addition to functioning as critique suggestions, the dynamically generated compound critiques have also been regarded as explanations that expose the recommendation opportunities existing in the available products (McCarthy et al. 2004b; Reilly et al. 2005). They may help users become familiar with the product domain and understand the relationships among different features within the alternatives. Users can then be stimulated to express more preferences, or be prevented from making retrieval failures (Reilly et al. 2005).

2.2 K-item user-initiated critiquing

Instead of suggesting pre-computed critiques for users to select, the purely user-initiated critiquing approach focuses on showing examples and stimulating users to define critiques themselves. It does not limit the size of the critiques a user can manipulate during each cycle, so the user can freely post unit or compound critiques over any combination of features.
In fact, the purpose of this type of critiquing support is to assist users in freely executing tradeoff navigation, a process shown to improve users' decision accuracy and confidence (Pu and Kumar 2004; Pu and Chen 2005). ExpertClerk (Shimazu 2001), ATA (Automated Travel Assistant) (Linden et al. 1997) and SmartClient (Pu and Faltings 2000) were all examples of such systems. Nguyen et al. (2004) realized the idea mainly to support on-tour recommendations for mobile users.

Such a system is mainly composed of two components: a recommender agent that computes a set of k items best matching the user's current preference model, and a critiquing component that allows the user to actively create critiquing criteria and then examine a new set of k tradeoff alternatives. ExpertClerk and ATA display three items at a time, whereas SmartClient returns seven items in its recent versions. Users can select any of the displayed items and navigate to products that offer tradeoff potential. As for the critiquing aid, ExpertClerk provides a natural language dialog to request users' feedback; ATA reported a graphical interface, but without a detailed description; and SmartClient has continually improved the usability of its critiquing facility through user evaluations. We have chosen the latest version of SmartClient, called ExampleCritiquing, to explain the typical constructs of a k-item user-initiated critiquing system.

2.2.1 ExampleCritiquing

SmartClient was originally developed as an online preference-based search tool for finding flights (Pu and Faltings 2000; Torrens et al. 2002). Its elementary model is the example-and-critiquing interaction, which was subsequently applied to product catalogs of vacation packages, insurance policies, apartments and, more recently, commercial products such as tablet PCs and digital cameras (Pu and Faltings 2004; Pu and Kumar 2004; Chen and Pu 2006).

In the latest ExampleCritiquing system, the recommendation part can be further divided into two sub-components: the first set of recommendations, computed according to the user's initial preferences, and the set of tradeoff alternatives recommended after each critiquing process. For the product catalogs of digital cameras and tablet PCs, for example, k items (e.g., k = 7) are displayed in both cases. The number k was determined according to Faltings et al. (2004), who discussed the optimal number of displayed solutions based on catalog size.

In the critiquing panel (see Fig. 3), three radio buttons appear next to each main feature, letting users choose to "keep" its value, "improve" it, or accept a compromised value suggested by the system (i.e., via "Take any suggestion"). In particular, users can freely compose compound critiques by combining criteria on any set of multiple features. The interface also supports simple similarity-based critiquing (e.g., "show products similar to this one"), achieved by simply keeping all current values, as well as concrete value improvements on features (for example, under the "Improve" drop-down menu for price, there are options such as "$100 cheaper" and "$200 cheaper"). This kind of critiquing support has also been called tradeoff assistance in some of the related literature (Pu and Kumar 2004; Chen and Pu 2006), since its essence is to facilitate a user in specifying tradeoff criteria: improving on one or several attributes that are important to her, while accepting compromised values on less important ones.
Tradeoff processes involving only one feature (a unit critique) or multiple features (a compound critique) are termed simple and complex tradeoffs, respectively, by Pu and Kumar (2004).

[Fig. 3 The ExampleCritiquing interfaces: k recommended items (k = 7), the product the user selected to critique, and the user-initiated critiquing facility for creating unit or compound critiques.]

The search engine that computes recommended alternatives is adjusted to different decision environments. For configurable products, it employs sophisticated constraint satisfaction algorithms and models user preferences as soft constraints (Torrens et al. 2002). For multi-attribute products, it is in theory grounded in the weighted additive sum rule (WADD), a compensatory decision strategy for explicitly resolving conflicting values (Payne et al. 1993). As required by WADD, the user's preferences are structured as a set of (attribute's acceptable value, relative importance) pairs.

After a user specifies her initial preferences, all alternatives are ranked by their weighted utilities, and the top k items best matching the user's stated requirements are returned. Among the initial set of recommendations, the user either accepts a result or takes a near solution to activate the critiquing panel (by clicking on the button "Value Comparison" next to the product, see Fig. 3). Once the critiquing criteria have been built in the critiquing panel, the system refines the user's preference model and adjusts the relative importance of all critiqued attributes (i.e., the weight of the improved attribute(s) is increased and that of the compromised attribute(s) is decreased). The search engine then applies a combination of the elimination-by-aspects (EBA) and WADD strategies (Payne et al. 1993). The combined strategy begins with EBA to eliminate the products that do not reach the minimal acceptable value (i.e., cutoff) of the improved attribute(s); WADD is then applied to examine the remaining alternatives in more detail and select the ones that best satisfy all of the user's tradeoff criteria. This example-and-critiquing process completes one cycle of interaction, and it continues as long as the user wants to refine the results.
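The following sketch illustrates one such refinement cycle under our own simplifying assumptions: each attribute's preference is an (acceptable value, weight) pair; utility and meets_cutoff are hypothetical per-attribute functions; a critique maps improved attributes to cutoff values and compromised ones to None; and the weight-adjustment factor is an illustrative choice, not the system's actual parameter.

```python
def wadd_score(product, prefs, utility):
    """Weighted additive (WADD) score: the sum of weighted attribute
    utilities, given prefs as {attribute: (acceptable_value, weight)}."""
    return sum(weight * utility(attr, product[attr], target)
               for attr, (target, weight) in prefs.items())

def critique_cycle(products, prefs, critique, utility, meets_cutoff,
                   k=7, boost=1.5):
    """One EBA-then-WADD refinement cycle after a critique."""
    # 1. Revise the preference model: raise the weight of improved
    #    attributes and lower the weight of compromised ones.
    for attr, cutoff in critique.items():
        target, weight = prefs[attr]
        if cutoff is not None:                     # improved attribute
            prefs[attr] = (cutoff, weight * boost)
        else:                                      # compromised attribute
            prefs[attr] = (target, weight / boost)
    # 2. EBA: eliminate products missing any improved attribute's cutoff.
    survivors = [p for p in products
                 if all(meets_cutoff(attr, p[attr], cutoff)
                        for attr, cutoff in critique.items()
                        if cutoff is not None)]
    # 3. WADD: rank the survivors by weighted utility; return the top k.
    return sorted(survivors,
                  key=lambda p: wadd_score(p, prefs, utility),
                  reverse=True)[:k]
```

For a "cheaper" critique on price, for instance, meets_cutoff could be as simple as lambda attr, value, cutoff: value <= cutoff, with the critique {'price': 800, 'weight': None} meaning "cheaper than 800, and weight may be compromised".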
3 Control variables

In summary, the components of both DynamicCritiquing and ExampleCritiquing can be categorized along two independent variables: the number of recommendations that users can examine at a time, and on which they perform critiquing; and the critiquing aid with which users specify concrete feedback criteria. As introduced above, two typical combinations of the two variables are single-item system-suggested critiquing and k-item user-initiated critiquing, but more combinations are possible. In this section, we discuss each variable's possible values.

3.1 Critiquing coverage (the number of recommendations)

Here, critiquing coverage refers to the number of example products recommended to users, from which they choose their final choice or the object to critique. In the ExampleCritiquing system, multiple examples are displayed during each recommendation cycle, because its objective is to stimulate users to make self-initiated critiques. In contrast, the FindMe and DynamicCritiquing agents return only one product, based on which the system-suggested critiques are generated. This simple display strategy has the advantage of not overwhelming users with too much information, but it deprives users of the right to choose the critiquing product they are interested in, and potentially exposes them to the risk of a longer interaction session.

The critiquing coverage can be further separated into two sub-variables: the number of first-round recommendations right after users' initial preference specification (called NIR), and the number of items (i.e., tradeoff alternatives) in each later cycle after a critiquing action (called NCR). The two numbers can be equal or different. For example, in DynamicCritiquing they are both 1, and in ExampleCritiquing they are both 7. It is also possible to set them differently, for example NIR = 1 and NCR = 7, if users are interested in only the single best matching product according to their initial preferences but would like to see multiple alternatives comparable with their critiqued reference once they critique a product.
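To make the two sub-variables concrete, a coverage configuration might look as follows; the class and the constants are hypothetical illustrations of the design space, not part of any of the evaluated systems.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CritiquingCoverage:
    """The two coverage sub-variables discussed above."""
    n_ir: int  # NIR: items shown right after initial preference elicitation
    n_cr: int  # NCR: tradeoff alternatives shown after each critique

dynamic_critiquing = CritiquingCoverage(n_ir=1, n_cr=1)  # single-item display
example_critiquing = CritiquingCoverage(n_ir=7, n_cr=7)  # k-item display
mixed_coverage = CritiquingCoverage(n_ir=1, n_cr=7)      # the NIR = 1, NCR = 7 example above
```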