Artificial Intelligence Review (2005)24: 319-338 C Springer 2005 DOI10.1007/10462-005-9000-z Retrieval Failure and Recovery in Recommender Systems DAVID MCSHERRY School of Computing and Information Engineering. University of Ulster, Coleraine BT52 ISA, Northern Ireland, UK(Tel: +44(0)28 7032 4130, Fax:+44(0)28 7032 Abstract. In case-based reasoning(CBR) approaches to product recommendation, descriptions of the available products are stored in a case library and retrieved in response to a query representing the users requirements. We present an approach to recovery from the retrieval failures that often occur when the user's requirements are treated as constraints that must be satisfied. Failure to retrieve a matching case trig- gers a recovery process in which the user is invited to select from a recovery set of relaxations (or sub-queries) of her query that are guaranteed to succeed. The sug gested relaxations are ranked according to a simple measure of recovery cost defined n terms of the importance weights assigned to the query attributes. The recovery set for an unsuccessful query also serves as a guide to continued exploration of the prod uct space when none of the cases initially recommended by the system is acceptable to the user Keywords: case-based reasoning, recommender systems, retrieval failure, query 1. Introduction n spite of limitations such as lack of diversity in the recommended cases(e.g. McGinty and Smyth, 2003; McSherry, 2003c), similar y-based retrieval remains the dominant CBR approach to retrieval in recommender systems. However, it is increasingly common for the user's requirements to be treated initially as constraints that the recommended products must satisfy(Goker and Thompson, 2000: Bridge, 2002; Ricci et al., 2002; McSherry, 2004b: Thompson et al 2004). Typically these approaches rely on query relaxation to recover from the retrieval failures that occur when there is no product that satisfies all the user's requirements. We focus here on approaches in which relaxing a query means eliminating one or more constraints from the query rather than requiring the user to revise individual con- straints, for example as in Sermo(Bridge, 2002) In Ricci et al.s (2002)Intelligent Travel Recommender, the user is told how many results she will get, if any, by eliminating each of
DOI 10.1007/s10462-005-9000-z Artificial Intelligence Review (2005) 24:319–338 © Springer 2005 Retrieval Failure and Recovery in Recommender Systems DAVID MCSHERRY School of Computing and Information Engineering, University of Ulster, Coleraine BT52 1SA, Northern Ireland, UK (Tel: +44 (0)28 7032 4130; Fax: +44 (0)28 7032 4916; e-mail: dmg.mcsherry@ulster.ac.uk) Abstract. In case-based reasoning (CBR) approaches to product recommendation, descriptions of the available products are stored in a case library and retrieved in response to a query representing the user’s requirements. We present an approach to recovery from the retrieval failures that often occur when the user’s requirements are treated as constraints that must be satisfied. Failure to retrieve a matching case triggers a recovery process in which the user is invited to select from a recovery set of relaxations (or sub-queries) of her query that are guaranteed to succeed. The suggested relaxations are ranked according to a simple measure of recovery cost defined in terms of the importance weights assigned to the query attributes. The recovery set for an unsuccessful query also serves as a guide to continued exploration of the product space when none of the cases initially recommended by the system is acceptable to the user. Keywords: case-based reasoning, recommender systems, retrieval failure, query relaxation 1. Introduction In spite of limitations such as lack of diversity in the recommended cases (e.g. McGinty and Smyth, 2003; McSherry, 2003c), similarity-based retrieval remains the dominant CBR approach to retrieval in recommender systems. However, it is increasingly common for the user’s requirements to be treated initially as constraints that the recommended products must satisfy (Goker and Thompson, 2000; ¨ Bridge, 2002; Ricci et al., 2002; McSherry, 2004b; Thompson et al., 2004). Typically these approaches rely on query relaxation to recover from the retrieval failures that occur when there is no product that satisfies all the user’s requirements. We focus here on approaches in which relaxing a query means eliminating one or more constraints from the query rather than requiring the user to revise individual constraints, for example as in Sermo (Bridge, 2002). In Ricci et al.’s (2002) Intelligent Travel Recommender, the user is told how many results she will get, if any, by eliminating each of
DAVID MCSHERRY the constraints in her query. The Adaptive Place Advisor(Goker and Thompson, 2000; Thompson et al., 2004)is an in-car recommender system in which the selection of a constraint to be eliminated from an unsuccessful query, a suggestion that the user need not accept, is based on the systems current understanding of the user's personal priorities. The assistance that the user is given in these approaches is a clear improvement on traditional database approaches that force the user to adopt a trial-and-error approach to revising her query when there is no product that satisfies all her requirements(wilke et al., 1998). However, a problem that existing techniques often fail to address is that recovery from retrieval failure may not be possible by eliminating a single constraint(McSherry, 2003b, 2004b) Of course, if queries are incrementally elicited as in the Adap- tive Place Advisor, then recovery is always possible by eliminating the most recently elicited constraint. But often queries are not elicited incrementally, and even if recovery is possible by eliminating a sin- gle constraint, this may be a compromise that the user is not pre- pared to accept. In recent work we presented an approach to recovery from retrieval failure in which queries need not be elicited incremen- tally and there is no assumption that recovery is possible by eliminat ing a single constraint(McSherry, 2004b). The recovery process begins with an explanation of the retrieval failure that draws the user's atten- tion to combinations of features in her query for which there are no matching cases. The user is then guided in the selection of the most useful attribute. and associated constraint to be eliminated from her query at each stage of a mixed-initiative relaxation process. If not pre- pared to compromise on the attribute suggested for elimination at any stage. the user can select another attribute to be eliminated Go In this paper, we present an alternative and more direct approach recovery from retrieval failure in which the user is invited to selec from a recovery set of relaxations of her query that are guaranteed to succeed. We demonstrate the approach in a prototype recommender system called Front Case in which cases retrieved on successful com- pletion of the recovery process are ranked in order of similarity to initial query. In general, retrieved cases are presented to the user will depend on the systems overall recommendation strategy Adapted from techniques for providing cooperative responses to fail- a g database queries(e.g. Kaplan, 1982: Gaasterland et al.,1992: God also influ by recent work on the role of compromise in product recommendation
320 DAVID MCSHERRY the constraints in her query. The Adaptive Place Advisor (Goker and ¨ Thompson, 2000; Thompson et al., 2004) is an in-car recommender system in which the selection of a constraint to be eliminated from an unsuccessful query, a suggestion that the user need not accept, is based on the system’s current understanding of the user’s personal priorities. The assistance that the user is given in these approaches is a clear improvement on traditional database approaches that force the user to adopt a trial-and-error approach to revising her query when there is no product that satisfies all her requirements (Wilke et al., 1998). However, a problem that existing techniques often fail to address is that recovery from retrieval failure may not be possible by eliminating a single constraint (McSherry, 2003b, 2004b). Of course, if queries are incrementally elicited as in the Adaptive Place Advisor, then recovery is always possible by eliminating the most recently elicited constraint. But often queries are not elicited incrementally, and even if recovery is possible by eliminating a single constraint, this may be a compromise that the user is not prepared to accept. In recent work we presented an approach to recovery from retrieval failure in which queries need not be elicited incrementally and there is no assumption that recovery is possible by eliminating a single constraint (McSherry, 2004b). The recovery process begins with an explanation of the retrieval failure that draws the user’s attention to combinations of features in her query for which there are no matching cases. The user is then guided in the selection of the most useful attribute, and associated constraint, to be eliminated from her query at each stage of a mixed-initiative relaxation process. If not prepared to compromise on the attribute suggested for elimination at any stage, the user can select another attribute to be eliminated. In this paper, we present an alternative and more direct approach to recovery from retrieval failure in which the user is invited to select from a recovery set of relaxations of her query that are guaranteed to succeed. We demonstrate the approach in a prototype recommender system called Front Case in which cases retrieved on successful completion of the recovery process are ranked in order of similarity to the user’s initial query. In general, of course, the way in which the retrieved cases are presented to the user will depend on the system’s overall recommendation strategy. Adapted from techniques for providing cooperative responses to failing database queries (e.g. Kaplan, 1982; Gaasterland et al., 1992; Godfrey, 1997, 1998), our direct approach to recovery is also influenced by recent work on the role of compromise in product recommendation
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 321 (McSherry, 2003a, c, 2004a). Relaxations in the recovery set cover all possible sets of compromises that the user might be prepared to con- sider, at least without knowing the extent of the compromises involved The suggested relaxations are also ranked according to a measure of recovery cost defined in terms of the importance weights assigned to the query attributes. Another important benefit is that the recovery set for an unsuccessful query serves as a guide to continued exploration of the product space when none of the cases initially recommended by the system is acceptable to the user In Sections 2-4, we describe our new approach to recovery fror retrieval failure, investigate its feasibility in terms of cognitive load, and evaluate its performance on a well-known case library in the travel domain. In Section 5, we present a detailed account of the rec- ommendation process in Front Case and the assistance provided to the user when faced with retrieval failure. Related work is discussed in Section 6 and our conclusions are presented in Section 7 2. Recovery from Retrieval Failure We assume that the user's query 2, over a subset atts( @) of the case attributes, is represented as a set of constraints that the retrieved cases must satisfy. Depending on the attribute type, the constraint ca(g) ssociated with a given attribute a E atts(@) may be expressed, for example, in terms of a required value, a maximum or minimum value, or a range of acceptable values. We will refer to latts(@) as the length of th Definition 1: A given query Q1 is a sub-query of another query Q if atts(Q1)C atts(02)and ca(Q1)=Ca(02) for all a e atts(o1 If atts(@1)C atts(@2), we say that Q1 is a proper sub-query of @2 Given a query Q for which a retrieval failure has occurred, the aim of our direct approach to recovery from retrieval failure is to construct a recovery set of successful relaxations or sub-queries of Q that the user may be prepared to accept in spite of the compromises they involve. Recovery by query relaxation is always possible, in the worst case by eliminating all constraints from the unsuccessful query (or demoting them to soft constraints). As a failing query may have many successful sub-queries, our aim is to minimise the size of the
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 321 (McSherry, 2003a, c, 2004a). Relaxations in the recovery set cover all possible sets of compromises that the user might be prepared to consider, at least without knowing the extent of the compromises involved. The suggested relaxations are also ranked according to a measure of recovery cost defined in terms of the importance weights assigned to the query attributes. Another important benefit is that the recovery set for an unsuccessful query serves as a guide to continued exploration of the product space when none of the cases initially recommended by the system is acceptable to the user. In Sections 2–4, we describe our new approach to recovery from retrieval failure, investigate its feasibility in terms of cognitive load, and evaluate its performance on a well-known case library in the travel domain. In Section 5, we present a detailed account of the recommendation process in Front Case and the assistance provided to the user when faced with retrieval failure. Related work is discussed in Section 6 and our conclusions are presented in Section 7. 2. Recovery from Retrieval Failure We assume that the user’s query Q, over a subset atts(Q) of the case attributes, is represented as a set of constraints that the retrieved cases must satisfy. Depending on the attribute type, the constraint ca(Q) associated with a given attribute a ∈ atts(Q) may be expressed, for example, in terms of a required value, a maximum or minimum value, or a range of acceptable values. We will refer to |atts(Q)| as the length of the query. Definition 1: A given query Q1 is a sub-query of another query Q2 if atts(Q1) ⊆ atts(Q2) and ca(Q1) = ca(Q2) for all a ∈ atts(Q1). If atts(Q1) ⊂ atts(Q2), we say that Q1 is a proper sub-query of Q2. Given a query Q for which a retrieval failure has occurred, the aim of our direct approach to recovery from retrieval failure is to construct a recovery set of successful relaxations or sub-queries of Q that the user may be prepared to accept in spite of the compromises they involve. Recovery by query relaxation is always possible, in the worst case by eliminating all constraints from the unsuccessful query (or demoting them to soft constraints). As a failing query may have many successful sub-queries, our aim is to minimise the size of the
DAVID MCSHERRY recovery set while ensuring that it covers all possible sets of compro- mises that the user might be prepared to consider To illustrate our approach, Figure 1 shows all sub-queries of a query @24 involving four attributes a1, a2, a3, and a4. We denote by 234 the sub-query involving only attributes ar d a4, and by Q4 the sub-query involving only a3 and a4. We use a similar nota- tion for each of the other sub-queries except O, the empty query. We will assume in this example that all sub-queries of Q are successful queries except those that are marked'x'in Figure 1. As any query a sub-query of itself, Q 234 can be seen to have sixteen sub-queries, of which ten are successful Among the successful sub-queries of ol234 an obvious candidate be included in the recovery set is Q 4 as it is the only success- ful sub-query that involves only a single compromise. But if Q included in the recovery set, then there is no need to show the user any of its proper sub-queries. For example, although Q3 is also a successful sub-query, it involves the same compromise as Q4 and an additional compromise that Q 34+ does not involve. It is there fore reasonable to assume that the user would never choose Q in 1234 Q×Q×Q 34 Q× Figure 1. Sub-query lattice for an example query involving four attributes
322 DAVID MCSHERRY recovery set while ensuring that it covers all possible sets of compromises that the user might be prepared to consider. To illustrate our approach, Figure 1 shows all sub-queries of a query Q1234 involving four attributes a1, a2, a3, and a4. We denote by Q134 the sub-query involving only attributes a1, a3, and a4, and by Q34 the sub-query involving only a3 and a4. We use a similar notation for each of the other sub-queries except Ø, the empty query. We will assume in this example that all sub-queries of Q1234 are successful queries except those that are marked ‘×’ in Figure 1. As any query is a sub-query of itself, Q1234 can be seen to have sixteen sub-queries, of which ten are successful. Among the successful sub-queries of Q1234, an obvious candidate to be included in the recovery set is Q134 as it is the only successful sub-query that involves only a single compromise. But if Q134 is included in the recovery set, then there is no need to show the user any of its proper sub-queries. For example, although Q13 is also a successful sub-query, it involves the same compromise as Q134 and an additional compromise that Q134 does not involve. It is therefore reasonable to assume that the user would never choose Q13 in Figure 1. Sub-query lattice for an example query involving four attributes
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 323 preference to Q>, at least without knowing the extent of the com- promises involved Following the elimination of all the proper sub-queries of Q 34 th only remaining candidates among the successful sub-queries of @234 are 2-4 and Q-. It makes sense to include Q-+in the recovery set as the user may not be prepared to accept the compromise associated with Q 34. Inclusion of Q4 in the recovery set also means that Qcan be excluded as it involves an additional compromise. The successful sub-queries Q 34 and 024 that we have identified as successful relax ations that the user may wish to consider are in fact the maximally successful sub-queries of Q 234 in the sense of the following definition. Definition 2: A successful sub-query Q" of a given query Q is a max- imally successful sub-query of Q if there is no successful sub-query of e of which Q* is a proper sub-query For example, Q4 is a successful sub-query of @234 but not a max imally successful sub-query since @4 is also a successful sub-query In general, the recovery set for a failing query in our approach con- sists of all the maximally successful sub-queries of the failing query Definition 3: Given a failing query o, we define rs(@), the recovery set for g, to be the set of all maximally successful sub-queries of Q For the example Figure 1, rs(0234)=1034, 024. In the domain of personal computers, suppose that @24 is the query: lapto en size= 19. make=X and that its successful sub-queries are as shown in Figure 1. In front Case, the user would be informed that there is no match for her query and invited to select from the following sub-queries, both of which are guaranteed to succeed price <700, screen size= 19, make=X (Q 34) type=laptop, make=X (Q+) If not prepared to compromise on type, user may decide instead to compromise on price and screen size by selecting the sec- ond sub-query. In this case, the constraints associated with screen size are not eliminated in Front Case but demoted to soft con- straints and used to rank the retrieved cases in order of similarity to
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 323 preference to Q134, at least without knowing the extent of the compromises involved. Following the elimination of all the proper sub-queries of Q134 the only remaining candidates among the successful sub-queries of Q1234 are Q24 and Q2. It makes sense to include Q24 in the recovery set, as the user may not be prepared to accept the compromise associated with Q134. Inclusion of Q24 in the recovery set also means that Q2 can be excluded as it involves an additional compromise. The successful sub-queries Q134 and Q24 that we have identified as successful relaxations that the user may wish to consider are in fact the maximally successful sub-queries of Q1234 in the sense of the following definition. Definition 2: A successful sub-query Q∗ of a given query Q is a maximally successful sub-query of Q if there is no successful sub-query of Q of which Q∗ is a proper sub-query. For example, Q14 is a successful sub-query of Q1234 but not a maximally successful sub-query since Q134 is also a successful sub-query. In general, the recovery set for a failing query in our approach consists of all the maximally successful sub-queries of the failing query. Definition 3: Given a failing query Q, we define rs(Q), the recovery set for Q, to be the set of all maximally successful sub-queries of Q. For the example query in Figure 1, rs(Q1234) = {Q134, Q24}. In the domain of personal computers, suppose that Q1234 is the query: price ≤700,type=laptop, screen size=19, make=X and that its successful sub-queries are as shown in Figure 1. In Front Case, the user would be informed that there is no match for her query and invited to select from the following sub-queries, both of which are guaranteed to succeed: price ≤700,screen size=19, make=X (Q134) type=laptop, make=X (Q24) If not prepared to compromise on type, the user may decide instead to compromise on price and screen size by selecting the second sub-query. In this case, the constraints associated with price and screen size are not eliminated in Front Case but demoted to soft constraints and used to rank the retrieved cases in order of similarity to
324 DAVID MCSHERRY the user's initial query. A more detailed account of the recommenda tion process in Front Case is given in Section 5 As we show in the following theorem, whether or not a given sub query of a failing query Q is successful can be inferred from the recovery set for Q Lemma 1: Any sub-query Q1 of a successful query Q2 is also a suc- cessful query Proof: Any case that satisfies all the constraints in Q2 must also satisfy all the constraints in Q1 Theorem 1: A given sub-query @* of a failing query e is a successful query if and only if there exists goErs(Q) such that Q* is a sub-query of o Proof: If there exists QE rs(@) such that @* is a sub-query of Q°, then Q*isas by Lemma l Suppose now that Q is a successful sub-query of 0, and let @o be a successful sub-query of maximum length among the successful sub-queries of @(including Q* itself) of which Q* is a sub-query. Clearly Q is a maximally suc- cessful sub-query of @. We have established as required the existence ofQ∈rs(Q) such that Q* is a sub- query of g° Though also an important result in its own right, Theorem I con- firms our claim that the recovery set for a failing query e covers all possible sets of compromises that the user might be prepared to con- sider without knowing the extent of the compromises involved. Sup pose the user is prepared to consider the compromises associated with a successful sub-query @" that is not included in the recovery set for e. By Theorem l, the that Q* is a sub-query of o. As the set of compromises associated with Q must be a subset of the compromises associated with Q*, it is reasonable to assume that the user would never select Q" in pref erence to g. Moreover, it can easily be shown that the recovery set is the smallest set of successful relaxations that covers all possible sets of compromises in this way. Our algorithm for finding all maximally successful sub-queries of a failing query, called Relax, is shown in Figure 2. Sub-Queries is a list f all sub-queries, in order of decreasing query length, of a query Q for which a retrieval failure has occurred. For each sub-query e1 for which there is a matching case in the case library, Relax adds Q1 to
324 DAVID MCSHERRY the user’s initial query. A more detailed account of the recommendation process in Front Case is given in Section 5. As we show in the following theorem, whether or not a given subquery of a failing query Q is successful can be inferred from the recovery set for Q. Lemma 1: Any sub-query Q1 of a successful query Q2 is also a successful query. Proof: Any case that satisfies all the constraints in Q2 must also satisfy all the constraints in Q1. Theorem 1: A given sub-query Q∗ of a failing query Q is a successful query if and only if there exists Q◦ ∈rs(Q) such that Q∗ is a sub-query of Q◦. Proof: If there exists Q◦ ∈ rs(Q) such that Q∗ is a sub-query of Q◦, then Q∗ is a successful query by Lemma 1. Suppose now that Q∗ is a successful sub-query of Q, and let Q◦ be a successful sub-query of maximum length among the successful sub-queries of Q (including Q∗ itself) of which Q∗ is a sub-query. Clearly Q◦ is a maximally successful sub-query of Q. We have established as required the existence of Q◦ ∈ rs(Q) such that Q∗ is a sub-query of Q◦. Though also an important result in its own right, Theorem 1 con- firms our claim that the recovery set for a failing query Q covers all possible sets of compromises that the user might be prepared to consider without knowing the extent of the compromises involved. Suppose the user is prepared to consider the compromises associated with a successful sub-query Q∗ that is not included in the recovery set for Q. By Theorem 1, the recovery set includes a sub-query Q◦ such that Q∗ is a sub-query of Q◦. As the set of compromises associated with Q◦ must be a subset of the compromises associated with Q∗, it is reasonable to assume that the user would never select Q∗ in preference to Q◦. Moreover, it can easily be shown that the recovery set is the smallest set of successful relaxations that covers all possible sets of compromises in this way. Our algorithm for finding all maximally successful sub-queries of a failing query, called Relax, is shown in Figure 2. Sub-Queries is a list of all sub-queries, in order of decreasing query length, of a query Q for which a retrieval failure has occurred. For each sub-query Q1 for which there is a matching case in the case library, Relax adds Q1 to
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 325 algorithm Relax(o, Subqueries) RS←φ while Subqueries>0 d Q1←fmst( SubQueries) Deletions∈-{g if @1 is a successful query RS UO1 for all O2E rest(Sub queries)do if @2 is a sub-query of 21 then Deletions Deletions U(023 Subqueries < SubQueries-Deletions return rs Figure 2. Algorithm for finding all maximally successful sub-queries of a given query. the recovery set and deletes any sub-query Q2 which is a sub-query of Q from the remaining list of candidate sub-queries 3. Recovery Set Size Cognitive load is an important consideration in any approach to recovery from retrieval failure. Showing the user only the maximally successful sub-queries of her query helps to reduce cognitive load in our approach, but for longer queries there may be several such sub queries In the following theorem, we establish an upper bound for the number of maximally successful sub-queries, and hence the size of the recovery set in our approach Lemma 2: For any failing query Q and distinct sub-queries 01, Q rs(2), @I is not a sub-query of 22. Proof: By definition of rs(@), there can be no successful sub-query of Q of which @1 is a proper sub-query
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 325 Figure 2. Algorithm for finding all maximally successful sub-queries of a given query. the recovery set and deletes any sub-query Q2 which is a sub-query of Q1 from the remaining list of candidate sub-queries. 3. Recovery Set Size Cognitive load is an important consideration in any approach to recovery from retrieval failure. Showing the user only the maximally successful sub-queries of her query helps to reduce cognitive load in our approach, but for longer queries there may be several such subqueries. In the following theorem, we establish an upper bound for the number of maximally successful sub-queries, and hence the size of the recovery set in our approach. Lemma 2: For any failing query Q and distinct sub-queries Q1, Q2 ∈ rs(Q), Q1 is not a sub-query of Q2. Proof: By definition of rs(Q), there can be no successful sub-query of Q of which Q1 is a proper sub-query
326 DAVID MCSHERRY Theorem 2: The size of the recovery set for an unsuccessful query Q can never be more than Ci/2 where k=Latts(Q)] and Lk/2] is the integer part of k/2 Proof: First we note that for any distinct sub-queries 01, 22E rs(Q), atts(@1)and atts(22) are incomparable subsets of atts(o). For example, if atts(Q1)C atts( @2) then @1 would be a sub-query of @2 But this cannot be the case by Lemma 2. That the size of the recov ery set for Q can never exceed the specified limit immediately follows from Sperner's(1928)proof that the maximum number of incompara ble subsets of any set of size m is mCIm/2) n For a query involving 4 attributes, like the example query in Fig- l, it follows from Theorem 2 that the size of the recovery set can never exceed*C2=6. Table I shows the corresponding limits for que ries ranging in length from 1 to 10. It can be seen that recovery-set sizes are always within reasonable limits for queries involving less than five or six attributes. The same cannot be said for the recovery-set sizes that are possible, at least in theory, for longer queries. However, we now present empirical evidence which suggests that recovery sets tend to be much smaller in practice that their maximum possible sizes. Our evaluation focuses on full-length queries on the Travel case library(www.ai-cbr.org),astandardbenchmarkcontainingthedescri ptions of more than 1000 holidays in terms of eight attributes such as price, region, and holiday type. Our experimental method is based on a leave-one-out approach in which we temporarily remove each case from the case library, present its description as a query to our recom mender system, and record the size of the recovery set for each unsuc cessful query. In order for a query to be successful, there must be at least one case with exactly matching values for all attributes except price. In addition, the price for a matching case must not exceed the value specified in the query; that is, the specified price is treated as a preferred maximum. Given the length of the queries used in our exper iment, it is perhaps not surprising that less than 3% of them were successful Table 1. Maximum recovery-set sizes for queries of increasing length Query length: 1 2 3 4 5 6 Maximum size:123610203570126252
326 DAVID MCSHERRY Theorem 2: The size of the recovery set for an unsuccessful query Q can never be more than kCk/2 where k = atts(Q) and k/2 is the integer part of k/2. Proof: First we note that for any distinct sub-queries Q1, Q2∈ rs(Q), atts(Q1) and atts(Q2) are incomparable subsets of atts(Q). For example, if atts(Q1)⊂ atts(Q2) then Q1 would be a sub-query of Q2. But this cannot be the case by Lemma 2. That the size of the recovery set for Q can never exceed the specified limit immediately follows from Sperner’s (1928) proof that the maximum number of incomparable subsets of any set of size m is mCm/2. For a query involving 4 attributes, like the example query in Figure 1, it follows from Theorem 2 that the size of the recovery set can never exceed 4C2 =6. Table 1 shows the corresponding limits for queries ranging in length from 1 to 10. It can be seen that recovery-set sizes are always within reasonable limits for queries involving less than five or six attributes. The same cannot be said for the recovery-set sizes that are possible, at least in theory, for longer queries. However, we now present empirical evidence which suggests that recovery sets tend to be much smaller in practice that their maximum possible sizes. Our evaluation focuses on full-length queries on the Travel case library (www.ai-cbr.org), a standard benchmark containing the descriptions of more than 1000 holidays in terms of eight attributes such as price, region, and holiday type. Our experimental method is based on a leave-one-out approach in which we temporarily remove each case from the case library, present its description as a query to our recommender system, and record the size of the recovery set for each unsuccessful query. In order for a query to be successful, there must be at least one case with exactly matching values for all attributes except price. In addition, the price for a matching case must not exceed the value specified in the query; that is, the specified price is treated as a preferred maximum. Given the length of the queries used in our experiment, it is perhaps not surprising that less than 3% of them were successful. Table 1. Maximum recovery-set sizes for queries of increasing length Query length: 1 2 3 4 5 6 7 8 9 10 Maximum size: 1 2 3 6 10 20 35 70 126 252
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 327 12345678910111213141516171819 Recovery- Set Size Figure 3. Recovery-set sizes for full-length queries on the Travel case library. + 1234567 10111213141516171819 Figure 4. Cumulative relative frequencies of recovery-set sizes for full-length queries on the Travel case library Figure 3 shows the relative frequency of each recovery-set size over all 996 queries that were unsuccessful. By Theorem 2, the maximum possible size of the recovery set for an unsuccessful query involving 8 attributes is 70. However, recovery-set sizes are much smaller than this for all the unsuccessful queries in our experiment. The recovery set sizes that occur most frequently are 4, 5, and 6 while age size over all unsuccessful queries is 7. The maximum recovery-set size of 19 was reached by only one of the unsuccessful queries in our experiment igure 4 shows the cumulative relative frequencies of recovery-set sizes in the range from I to 19 for the same experiment. For each recovery-set size from 1 to 19, it shows the percentage of unsuccess ful queries for which the recovery set contains less than or equal to that number of relaxations. Recovery-set sizes in the range from l to 6 can be seen to account for about 50% of the unsuccessful queries while more than 80% have recovery-set sizes in the range from 1 to 10. Another striking feature of the results when viewed in this way is that recovery-set size was less than 15 for more than 99% of the successful queries
RETRIEVAL FAILURE AND RECOVERY IN RECOMMENDER SYSTEMS 327 Figure 3. Recovery-set sizes for full-length queries on the Travel case library. Figure 4. Cumulative relative frequencies of recovery-set sizes for full-length queries on the Travel case library. Figure 3 shows the relative frequency of each recovery-set size over all 996 queries that were unsuccessful. By Theorem 2, the maximum possible size of the recovery set for an unsuccessful query involving 8 attributes is 70. However, recovery-set sizes are much smaller than this for all the unsuccessful queries in our experiment. The recoveryset sizes that occur most frequently are 4, 5, and 6 while the average size over all unsuccessful queries is 7. The maximum recovery-set size of 19 was reached by only one of the unsuccessful queries in our experiment. Figure 4 shows the cumulative relative frequencies of recovery-set sizes in the range from 1 to 19 for the same experiment. For each recovery-set size from 1 to 19, it shows the percentage of unsuccessful queries for which the recovery set contains less than or equal to that number of relaxations. Recovery-set sizes in the range from 1 to 6 can be seen to account for about 50% of the unsuccessful queries, while more than 80% have recovery-set sizes in the range from 1 to 10. Another striking feature of the results when viewed in this way is that recovery-set size was less than 15 for more than 99% of the unsuccessful queries
328 DAVID MCSHERRY 4. Ranking the successful relaxations We now consider two possible approaches to ranking the suggested relaxations in order of predicted acceptability to the user. As the size of the recovery set cannot be predicted in advance, ranking the suggested relaxations provides a simple solution to the problem that arises when restrictions such as available screen space in handheld devices limit the number of suggested relaxations that can be pre- sented to the user. As in the familiar k-NN strategy of showing the user the k most similar cases, the user can be shown the k relaxations that are predicted to be most acceptable Go One simple approach to ranking the suggested relaxations would be give priority to those that involve fewest compromises, but this may not be sufficiently discriminating if there are several relaxations of equal length. It also ignores the possibility that the user may be pre- pared to compromise only on less important attributes. In Front Case the predicted acceptability of a suggested relaxation is based instead on the importance weights typically assigned to the query attributes a CBR system. Often in CBR approaches to product recommenda- ion, the weights can be adjusted to reflect the personal priorities of the user Below we define a simple measure of the recovery cost associated with a suggested relaxation of an unsuccessful query Definition 4: The recovery cost associated with a successful sub-quer Q* of an unsuccessful query Q recovery-cost(2)= a∈at(Q)-atts(g·) where for each a E atts(@), Wa is the importance weight assigned to a That is, recovery cost is the sum of the importance weights of the eliminated attributes. For the example query @234 in Figure 1, if the importance weights assigned to a1, a2, a3, and a4 are 1, 4, 2, and 3 respectively then recovery-cost(Q24)=3 while recovery-cost(Q34)=4 According to our measure of recovery cost, the preferred relaxation is predicted to be Qeven though it involves one more compromise than 04. In Front Case, the suggested relaxations in the recovery set are ranked in order of increasing recovery cost. If all query attributes are of equal importance, this is equivalent to giving priority to sug- gested relaxations that involve the fewest compromises. In this case
328 DAVID MCSHERRY 4. Ranking the Successful Relaxations We now consider two possible approaches to ranking the suggested relaxations in order of predicted acceptability to the user. As the size of the recovery set cannot be predicted in advance, ranking the suggested relaxations provides a simple solution to the problem that arises when restrictions such as available screen space in handheld devices limit the number of suggested relaxations that can be presented to the user. As in the familiar k-NN strategy of showing the user the k most similar cases, the user can be shown the k relaxations that are predicted to be most acceptable. One simple approach to ranking the suggested relaxations would be to give priority to those that involve fewest compromises, but this may not be sufficiently discriminating if there are several relaxations of equal length. It also ignores the possibility that the user may be prepared to compromise only on less important attributes. In Front Case, the predicted acceptability of a suggested relaxation is based instead on the importance weights typically assigned to the query attributes in a CBR system. Often in CBR approaches to product recommendation, the weights can be adjusted to reflect the personal priorities of the user. Below we define a simple measure of the recovery cost associated with a suggested relaxation of an unsuccessful query. Definition 4: The recovery cost associated with a successful sub-query Q∗ of an unsuccessful query Q is: recovery-cost(Q∗ )= a∈atts(Q)−atts(Q∗) wa where for each a ∈atts(Q), wa is the importance weight assigned to a. That is, recovery cost is the sum of the importance weights of the eliminated attributes. For the example query Q1234 in Figure 1, if the importance weights assigned to a1, a2, a3, and a4 are 1, 4, 2, and 3 respectively then recovery-cost(Q24) = 3 while recovery-cost(Q134) = 4. According to our measure of recovery cost, the preferred relaxation is predicted to be Q24 even though it involves one more compromise than Q134. In Front Case, the suggested relaxations in the recovery set are ranked in order of increasing recovery cost. If all query attributes are of equal importance, this is equivalent to giving priority to suggested relaxations that involve the fewest compromises. In this case