Combination of Evidence in Recommendation Systems Characterized by Distance Functions

Luis M. Rocha
Complex Systems Modeling, Los Alamos National Laboratory, MS B256, Los Alamos, NM 87545, U.S.A.
E-Mail: rocha@lanl.gov  WWW: http://www.c3.lanl.gov/~rocha

Abstract - Recommendation systems for different Document Networks (DN) such as the World Wide Web (WWW), Digital Libraries, or Scientific Databases, often make use of distance functions extracted from relationships among documents and between documents and semantic tags. For instance, documents in the WWW are related via a hyperlink network, while documents in bibliographic databases are related by citation and collaboration networks. Furthermore, documents can be related to semantic tags such as keywords used to describe their content. The distance functions computed from these relations establish associative networks among items of the DN, and allow recommendation systems to identify relevant associations for individual users. The process of recommendation can be improved by integrating associative data from different sources. Thus we are presented with a problem of combining evidence (about associations between items) from different sources characterized by distance functions. In this paper we summarize our work on (1) inferring associations from semi-metric distance functions and (2) combining evidence from different (distance) associative DN.

1. RECOMMENDATION IN DOCUMENT NETWORKS

The prime example of a Document Network (DN) is the World Wide Web (WWW). But many other types of such networks exist: bibliographic databases containing scientific publications, preprints, internal reports, as well as databases of datasets used in scientific endeavors. Each of these databases possesses several distinct relationships among documents and between documents and semantic tags or indices that classify documents appropriately.
DN typically function as information resources for communities of users who query them to obtain relevant information for their activities. Resources such as the Internet, Digital Libraries, and the like have become ubiquitous in the past decade, demanding the development of new techniques to cater to the information needs of communities of users. These techniques come from the field of Information Retrieval, and are typically known as Recommender Systems, e.g. [6] [5] [3] [16]. The algorithms we have developed in this area integrate evidence about the association amongst elements of DN, amongst users, and about the interests of individual users and their communities. In particular, a soft computing algorithm (TalkMine) has been created to integrate such evidence and also adapt DN to the expectations of their users [15]. The process of integration of knowledge in TalkMine requires the construction of distance functions on DN that characterize the associations amongst their components. Below we discuss how such distance functions are used to characterize DN and for recommendation.

2. DISTANCE FUNCTIONS IN DOCUMENT NETWORKS

2.1 Harvesting Relations from Document Networks

For each DN we can identify several distinct relations among documents and between documents and semantic tags used to classify documents appropriately. For instance, documents in the WWW are related via a hyperlink network, while documents in bibliographic databases are related by citation and collaboration networks [11]. Furthermore, documents can be related to semantic tags such as keywords used to describe their content. Although all the technology and hypotheses discussed here would apply equally to any of these relations extracted from DN, let us exemplify the problem with the datasets we have created for the Active Recommendation Project (ARP) (http://arp.lanl.gov), part of the Library Without Walls Project, at the Research Library of the Los Alamos National Laboratory [1; 8].
ARP is engaged in research and development of recommendation systems for digital libraries. The information resources available to ARP are large databases with academic articles. These databases contain bibliographic, citation, and sometimes abstract information about academic articles. One of the databases we work with is SciSearch, containing articles from scientific journals from several fields collected by ISI (Institute for Scientific Information). We collected all SciSearch data from the years 1996 to 1999. There are 2,915,258 documents, from which we extracted 839,297 keywords (semantic tags) that occurred in at least two distinct documents. We have compiled relational information between records and keywords. This relation allows us to infer the semantic value of documents and the inter-associations between keywords. Such semantic relation is stored as a very sparse Keyword-Record Matrix A. Each entry a_{i,j} in the matrix is boolean and indicates whether keyword k_i indexes (1) document d_j or not (0). The sources of keywords are the terms authors and/or editors chose to categorize (index) documents, as well as title words.

2.2 Computing Associative Distance Functions

To discern closeness between keywords according to the documents they classify, we compute the Keyword Semantic Proximity (KSP), obtained from A by the following formula:

0-7803-7280-8/02/$10.00 ©2002 IEEE
The semantic proximity¹ between two keywords, k_i and k_j, depends on N_∩(k_i, k_j), the number of documents both keywords index, and N_∪(k_i, k_j), the number of documents either keyword indexes:

KSP(k_i, k_j) = \frac{\sum_{k=1}^{m} \left( a_{i,k} \wedge a_{j,k} \right)}{\sum_{k=1}^{m} \left( a_{i,k} \vee a_{j,k} \right)} = \frac{N_{\cap}(k_i, k_j)}{N_{\cup}(k_i, k_j)}    (1)

Two keywords are near if they tend to index many of the same documents. Table I lists the values of KSP for the 10 most common keywords in the ARP dataset.

TABLE I: KSP FOR 10 MOST FREQUENT KEYWORDS

          cell  studi system express protein model activ human  rat  patient
cell      1.00  0.02  0.02   0.16    0.08   0.02  0.09  0.11  0.07  0.03
studi     0.02  1.00  0.03   0.01    0.02   0.03  0.02  0.02  0.02  0.04
system    0.02  0.03  1.00   0.02    0.02   0.05  0.02  0.01  0.02  0.01
express   0.16  0.01  0.02   1.00    0.13   0.01  0.07  0.10  0.08  0.02
protein   0.08  0.02  0.02   0.13    1.00   0.01  0.07  0.06  0.04  0.01
model     0.02  0.03  0.05   0.01    0.01   1.00  0.02  0.02  0.03  0.01
activ     0.09  0.02  0.02   0.07    0.07   0.02  1.00  0.06  0.05  0.02
human     0.11  0.02  0.01   0.10    0.06   0.02  0.06  1.00  0.03  0.02
rat       0.07  0.02  0.02   0.08    0.04   0.03  0.05  0.03  1.00  0.01
patient   0.03  0.04  0.01   0.02    0.01   0.01  0.02  0.02  0.01  1.00

From the inverse of KSP we obtain a distance function between keywords:

d(k_i, k_j) = \frac{1}{KSP(k_i, k_j)} - 1    (2)

d is a distance function because it is a nonnegative, symmetric real-valued function such that d(k, k) = 0 [20]. It defines a weighted, non-directed distance graph D whose nodes are all of the keywords extracted from a given DN, and the edges are the values of d.

3. METRIC BEHAVIOR

The distance function d (eq. 2) is not a Euclidean metric because it may violate the triangle inequality: d(k_1, k_2) > d(k_1, k_3) + d(k_3, k_2) for some keyword k_3. This means that the shortest distance between two keywords may not be the direct link but rather an indirect pathway in D. Three measures of semi-metric behavior, discussed below, compare the direct and indirect distances in D:

s(k_i, k_j) = \frac{d_{direct}(k_i, k_j)}{d_{indirect}(k_i, k_j)}    (3)

rs(k_i, k_j) = \frac{d_{direct}(k_i, k_j) - d_{indirect}(k_i, k_j)}{d_{max}}    (4)

b(k_i, k_j) = \frac{\bar{d}_{k_i}}{d_{indirect}(k_i, k_j)}    (5)
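Equations (1) and (2) can be computed directly from the boolean keyword-record matrix A. The following is a minimal sketch with a toy dense NumPy matrix (the real ARP matrix is very sparse, so a production version would use a sparse representation):

```python
import numpy as np

# Toy keyword-record matrix A: rows = keywords, columns = documents.
# A[i, j] is True iff keyword k_i indexes document d_j.
A = np.array([
    [1, 1, 0, 1],   # k_0
    [1, 1, 1, 0],   # k_1
    [0, 0, 1, 1],   # k_2
], dtype=bool)

def ksp(A, i, j):
    """Keyword Semantic Proximity (eq. 1): |intersection| / |union|."""
    n_and = np.sum(A[i] & A[j])   # documents indexed by both keywords
    n_or = np.sum(A[i] | A[j])    # documents indexed by either keyword
    return n_and / n_or

def distance(A, i, j):
    """Distance function (eq. 2): d = 1/KSP - 1, infinite when the
    keywords share no documents (KSP == 0)."""
    p = ksp(A, i, j)
    return np.inf if p == 0 else 1.0 / p - 1.0

print(ksp(A, 0, 1))       # 2 shared docs / 4 total -> 0.5
print(distance(A, 0, 1))  # 1/0.5 - 1 -> 1.0
```

Note that d(k, k) = 0 and symmetry follow directly from the symmetry of the intersection and union counts.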
Such measures of distance are referred to as semi-metrics [2]. Indeed, given that most social and knowledge-derived networks possess small-world behavior [22], we expect that nodes which tend to be clustered in a local neighborhood of related nodes have large distances to nodes in other clusters. But because of the existence of "gateway" nodes relating nodes in different clusters (the small-world phenomenon), smaller indirect distances between nodes in distinct clusters, through these "gateway" nodes, are to be expected.

Clearly, semi-metric behavior is a question of degree. For some pairs of keywords, the indirect distance provides a much shorter short-cut, a larger reduction of distance, than for others. One way to capture this property of pairs of semi-metric keywords is to compute the semi-metric ratio (eq. 3). s is positive and ≥ 1 for semi-metric pairs. Given that larger graphs tend to show a much larger spread of distances, s tends to increase with the number of keywords. Therefore, to be able to compare semi-metric behavior between different DN and their respective different sets of keywords, the relative semi-metric ratio (eq. 4) is also used. rs compares the semi-metric distance reduction to the maximum possible distance reduction, d_max, in graph D.

Often, the direct distance between two keywords is ∞ because they do not index any documents in common. As a result, s and rs are also ∞ for these cases. Thus, s and rs are not capable of discerning the degree of semi-metric behavior for pairs that do not have a finite direct distance. To detect relevant instances of this infinite semi-metric reduction, we define the below average ratio (eq. 5), where \bar{d}_{k_i} represents the average direct distance from k_i to all k_j such that d_direct(k_i, k_j) is finite. b measures how much an indirect distance falls below the average distance of all keywords directly associated with a keyword. Of course, b can also be applied to pairs with finite semi-metric reduction.
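The indirect distances in these ratios are shortest-path distances in the graph D; the paper does not name an algorithm, so the sketch below uses Floyd-Warshall (an assumption) on a toy distance matrix to compute s, rs, and b for every pair:

```python
import numpy as np

def indirect_distances(d_direct):
    """All-pairs shortest-path distances over the distance graph D.
    Floyd-Warshall is an implementation choice, not named by the paper."""
    d = d_direct.copy()
    for k in range(d.shape[0]):
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d

def semi_metric_ratios(d_direct):
    """Semi-metric ratio s (eq. 3), relative ratio rs (eq. 4),
    and below average ratio b (eq. 5) for every keyword pair."""
    d_ind = indirect_distances(d_direct)
    off_diag = ~np.eye(d_direct.shape[0], dtype=bool)
    finite = np.isfinite(d_direct) & off_diag
    d_max = d_direct[finite].max()          # largest finite direct distance in D
    with np.errstate(divide="ignore", invalid="ignore"):
        s = d_direct / d_ind                # >= 1 for semi-metric pairs
        rs = (d_direct - d_ind) / d_max
        # average finite direct distance from each k_i (numerator of eq. 5)
        avg = np.where(finite, d_direct, 0).sum(axis=1) / finite.sum(axis=1)
        b = avg[:, None] / d_ind
    return s, rs, b

# Toy graph: short edges k0 -- k1 -- k2, plus a long direct edge k0 -- k2.
d = np.array([[0.0, 1.0, 9.0],
              [1.0, 0.0, 1.0],
              [9.0, 1.0, 0.0]])
s, rs, b = semi_metric_ratios(d)
print(s[0, 2])   # direct 9.0 / indirect (via k1) 2.0 -> 4.5
```

For pairs with infinite direct distance, s and rs come out infinite as the text describes, while b stays finite and can rank them.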
We have used these three measures of semi-metric behavior to analyze several types of DN [17]. We have shown that s(k_i, k_j) and rs(k_i, k_j) are useful to infer the interests of a user associated with a collection of documents. Specifically, given a collection of documents a user has retrieved, these measures identify pairs of keyterms highly correlated with the interests of the user as implied by the collection, but which tend not to be simultaneously present in many documents. In other words, they identify pairs of keyterms which represent well the entire collection (by being highly indirectly associated in the collection of documents), but not many individual documents in the collection. Such pairs are properties of the network, but not of individual documents. This is clearly an important piece of

¹ This measure of closeness, formally, is a proximity relation [4; 9] because it is a reflexive and symmetric fuzzy relation. Its transitive closure is known as a similarity relation (ibid.).
knowledge to allow us to recommend to users documents which are similar to their interests implied by the entire collection of documents they have retrieved, but which may not be similar to individual documents in the collection.

We have also shown that s(k_i, k_j), rs(k_i, k_j), and b(k_i, k_j) are useful to identify trends in large collections of documents associated with many authors and/or users. When we deal with large DN such as the ARP database discussed above, the derived distance function reflects myriad associations amongst keywords from a very heterogeneous collection of documents. Instead of a smaller collection associated with a particular user, we deal with a collection of documents from multiple authors and/or users. In this case, the semi-metric behavior measures pick up pairs of keyterms which tend not to co-occur in the same documents, but are nonetheless highly indirectly associated in the distance graph D. We have shown that often, high semi-metric behavior can be used to predict where a given community is moving thematically. Specifically, high semi-metric pairs of keyterms are good predictors that in subsequent years, individual documents will appear which use those pairs directly. Finally, we have shown that the behavior of s(k_i, k_j), rs(k_i, k_j), and b(k_i, k_j) allows us to characterize the type of DN. By analyzing the semi-metric behavior of a DN, we can infer if it is a collection of documents with many authors/users or if it is more thematically coherent and thus associated with a single user or a very coherent community.

For details on these results please refer to [17]; here we discuss how to integrate distance information from different sources to improve recommendation.

4. INFORMATION RESOURCES AND USERS

4.1 Knowledge Context

Clearly, many other types of distance functions can be defined on the elements of a DN. Distance functions applied to citation structures or collaboration networks will require different semantic considerations from those used for keyword sets. In any case, we characterize an information resource with sets of these distance functions. Indeed, the collection of all relevant associative distance functions from a DN is an expression of the particular knowledge it conveys to its community of users as an information resource.

Notice that distinct information resources typically share a very large set of keywords and documents. However, these are organized differently in each resource, leading to different collections of relational information. Indeed, each resource is tailored to a particular community of users, with a distinct history of utilization and deployment of information by its authors and users. For instance, the same keywords will be related to different sets of documents in distinct resources. Therefore, we refer to the relational information of each information resource as a Knowledge Context [15]. More specifically, we characterize an information resource R with a structure named Knowledge Context:

KNR = \{X, R, D\}    (7)

where X is a set of available sets of elements X_i, e.g. X = {K, D, U}, where K is a set of keyterms, D a set of documents, and U a set of users. R is a set of available relations amongst the sets in X, e.g. R = {C(D, D), A(K, D)}, where C denotes a citation relation between the elements of the set of documents, and A a semantic relation between documents and keyterms, such as the keyterm-record matrix defined in section 2.1. Finally, D is a set of distance functions built from some subset of relations in R, e.g. D = {d_k}, where d_k is the distance between keyterms such as the one defined by formula (2).
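The triple KNR = {X, R, D} can be rendered schematically as a data structure. The sketch below is illustrative only; all class and field names are assumptions, not from the paper:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Tuple

@dataclass
class KnowledgeContext:
    """Schematic Knowledge Context KNR = {X, R, D} (hypothetical names)."""
    # X: the available sets of elements, e.g. keyterms K, documents D, users U
    element_sets: Dict[str, FrozenSet[str]]
    # R: relations among those sets, e.g. the keyterm-record matrix A(K, D),
    # stored here as a set of (keyterm, document) pairs
    relations: Dict[str, FrozenSet[Tuple[str, str]]]
    # D: distance functions derived from subsets of R, e.g. d_k from eq. (2)
    distances: Dict[str, Callable[[str, str], float]] = field(default_factory=dict)

# Usage: a tiny resource with two keyterms indexing two documents.
ctx = KnowledgeContext(
    element_sets={"keyterms": frozenset({"protein", "cell"}),
                  "documents": frozenset({"d1", "d2"})},
    relations={"A": frozenset({("protein", "d1"), ("cell", "d1"), ("cell", "d2")})},
)
print(sorted(ctx.element_sets["keyterms"]))  # ['cell', 'protein']
```

Keeping the distance functions alongside the relations they are derived from mirrors the paper's point that two resources sharing the same keywords can still induce different distance graphs.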
4.2 Agent Recommendation Architecture

In our architecture of recommendation [16], users are also characterized as information resources, where X may contain, among other application-specific elements, the sets of documents previously retrieved by the user and their associated keyterms. Notice that the same user may query information resources with very distinct sets of interests. For example, one day a user may search databases as a biologist looking for scientific articles, and the next as a sports fan looking for game scores. Therefore, the ARP architecture allows users to define different "personalities", each one with its distinct history of information retrieval defined by independent knowledge contexts.

The analysis of distance functions as mentioned in section 3 provides a baseline recommendation feature [17]. Indeed, given each knowledge context of a user or a larger information resource, we can infer what the important associated topics and trends are. But these knowledge contexts and respective distance functions can additionally be used in integrative algorithms useful for fine-tuning the present interests of users, as well as adapting all the knowledge contexts accessed according to user behavior. Such recommendation algorithms instantiate an automated conversation fabric amongst a population of users and a set of information resources [15]. Each user accesses the set of information resources via a browser that functions as an agent for the user as it engages in automated conversations with the agents of other users and the information resources [18] [16]. This process relies on the integration of evidence about the interests of users implied by distinct distance graphs, as discussed below.

5. EVIDENCE FROM DIFFERENT KNOWLEDGE CONTEXTS

5.1 Describing User Interest with Evidence Sets

Humans use language to communicate categories of objects in the world.
But such linguistic categories are notoriously context-dependent [7] [14], which makes it harder for computer programs to grasp the real interests of users. In information retrieval we tend to use keyterms to describe the content of documents, and sets of keyterms to describe the present interests of a given user at a particular time (e.g. a web search). One of the advantages of using the knowledge contexts in our recommendation architecture is that the same keyterms can
be differently associated in different information resources. Indeed, the distance functions of knowledge contexts allow us to regard these as connectionist memory systems [15] [16]. This way, the same set of keyterms describing the present interests (or search) of a user is associated with different sets of other keyterms in distinct knowledge contexts. Thus, the interests of the user are also context-dependent when several information resources are at stake.

In this setting, the objective of a recommendation system that takes as input the present interests of a user is to select and integrate the appropriate contexts, or perspectives, from the several ways the user interests are constructed in each information resource. We have developed an algorithm named TalkMine which implements the selective communication fabric necessary for this integration [14] [15] [16].

TalkMine uses a set structure named evidence set [12] [14], an extension of a fuzzy set [25], to model the interests of users defined as categories, or weighted sets of keyterms. Evidence sets are set structures which provide interval degrees of membership, weighted by the probability constraint of the Dempster-Shafer Theory of Evidence (DST) [19]. They are defined by two complementary dimensions: membership and belief. The first represents an interval (type-2) fuzzy degree of membership, and the second a degree of belief on that membership. Specifically, an evidence set A of X is defined, for all x ∈ X, by a membership function of the form:

A(x) = (F^x, m^x) ∈ B[0, 1]

where B[0, 1] is the set of all possible bodies of evidence (F^x, m^x) on I, the set of all subintervals of [0, 1]. Such bodies of evidence are defined by a basic probability assignment m^x on I, for every x in X.

[Figure 1: Evidence Set with 3 Perspectives]
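To make the structure concrete, an evidence set can be represented directly as a mapping from elements to bodies of evidence. The sketch below is a hypothetical rendering (the names, data layout, and toy values are ours, not from the original system): each element x carries a list of (interval, weight) pairs whose weights form a basic probability assignment.

```python
# Hypothetical sketch of an evidence set A on a universe X of keyterms.
# Each element x maps to a body of evidence: a list of (interval, weight)
# pairs, where intervals are subintervals of [0, 1] and the weights form
# a basic probability assignment m^x (they must sum to 1).

def is_valid_body(body, tol=1e-9):
    """Check that a body of evidence is well formed."""
    weights_ok = abs(sum(m for _, m in body) - 1.0) <= tol
    intervals_ok = all(0.0 <= lo <= hi <= 1.0 for (lo, hi), _ in body)
    return weights_ok and intervals_ok

# The keyterm "fuzzy" as seen from three perspectives (cf. Figure 1):
A = {
    "fuzzy": [((0.8, 1.0), 0.5),   # perspective 1: high membership
              ((0.4, 0.7), 0.3),   # perspective 2: moderate membership
              ((0.0, 0.3), 0.2)],  # perspective 3: low membership
}

assert all(is_valid_body(body) for body in A.values())
```

Each interval here plays the role of one perspective's membership degree, and its weight the evidence assigned to that perspective.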
Each interval of membership I_j^x represents the degree of importance of a particular element x of X (e.g. a keyterm) in category A (e.g. the interests of a user) according to a particular perspective (e.g. a particular database), defined by the evidential weight m^x(I_j^x). Thus, the membership of each element x of an evidence set A is defined by distinct intervals representing different perspectives.

The basic set operations of complementation, intersection, and union have been defined and establish a belief-constrained approximate reasoning theory of which fuzzy approximate reasoning and traditional set operations are special cases [13] [14]. Measures of uncertainty have also been defined for evidence sets. The total uncertainty of an evidence set A is defined by: U(A) = (IF(A), IN(A), IS(A)). The three indices of uncertainty, which vary between 0 and 1, are IF (fuzziness), IN (nonspecificity), and IS (conflict); they were introduced in [13]. IF is based on Yager's [23] [24] and Klir and Yuan's [4] measures of fuzziness, IN on the Hartley measure, and IS on the Shannon entropy as extended by Klir (1993) into the DST framework.

5.2 Inferring User Interest in Different Knowledge Contexts

Fundamental to the TalkMine algorithm is the integration of information from different knowledge contexts into an evidence set representing the category of topics (described by keywords) a user is interested in at a particular time. Thus, the keywords the user employs to describe her interests in a search need to be "decoded" into the appropriate keywords for each information resource: the perspective of each knowledge context.

The present interests of each user can be described by a set of keywords Pu = {k1, …, kp}. Using these keywords and the keyword distance function (2) of the several knowledge contexts involved, we want to infer the interests of the user as "seen" from each of the knowledge contexts involved.
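The spreading computation developed in the next paragraphs can be sketched as follows. This is a hypothetical illustration: the toy distance values and the squared-distance exponential form are our assumptions, standing in for a knowledge context's semi-metric distance function dt.

```python
import math

def spreading_interest(d_t, keywords, k_u, alpha=1.0):
    """Spreading-interest fuzzy set F_t,u: the membership of each keyword k
    of a knowledge context decays exponentially with its distance to the
    user's keyword k_u; alpha controls the spread (cf. eq. (8))."""
    return {k: math.exp(-alpha * d_t(k, k_u) ** 2) for k in keywords}

# Toy semi-metric distances within one knowledge context (assumed values):
toy_d = {("fuzzy", "fuzzy"): 0.0, ("neural", "fuzzy"): 0.5, ("opera", "fuzzy"): 3.0}

F = spreading_interest(lambda k, ku: toy_d[(k, ku)],
                       ["fuzzy", "neural", "opera"], "fuzzy")
# The query keyword itself gets membership 1.0; distant keywords fall off fast.
```

Because each knowledge context supplies its own distance function, the same call against a different resource yields a different fuzzy set for the same user keyword.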
Let us assume that r knowledge contexts Rt are involved, in addition to one from the user herself. The set of keywords contained in all the participating knowledge contexts is denoted by K. d0 is the distance function of the knowledge context of the user, while d1 … dr are the distance functions of each of the other knowledge contexts. For each knowledge context Rt and each keyword ku in the user's Pu = {k1, …, kp}, a spreading interest fuzzy set Ft,u is calculated using dt:

Ft,u(k) = exp(−α·dt²(k, ku)), ∀k ∈ Rt, t = 1, …, r, u = 1, …, p   (8)

This fuzzy set contains the keywords of Rt which are close to ku, with membership decaying according to an exponential function of dt. Ft,u spreads the interest of the user in ku to keywords of Rt that are near according to dt. The parameter α controls the spread of the exponential function. Because each knowledge context Rt has a different dt, each Ft,u is also a different fuzzy set for the same ku, possibly even containing keywords that do not exist in other knowledge contexts. There exist a total of n = r·p spreading interest fuzzy sets Ft,u, given r knowledge contexts and p keyterms in the user's present interests.

5.3 The Linguistic "And/OR" Combination

Since each knowledge context produces a distinct fuzzy set, we need a procedure for integrating several of these fuzzy sets into an evidence set to obtain the integrated representation of user interests we desire. We have proposed such a procedure [16]
based on Turksen's [21] combination of fuzzy sets into Interval Valued Fuzzy Sets (IVFS). Turksen proposed that fuzzy logic compositions could be represented by IVFSs given by the interval obtained from a composition's Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF): [DNF, CNF]. We note that in fuzzy logic, for certain families of conjugate pairs of conjunctions and disjunctions, DNF ⊆ CNF. Using Turksen's approach, the union and intersection of two fuzzy sets F1 and F2 result in the two following IVFS, respectively:

IV∪(x) = [F1(x) ∪_DNF F2(x), F1(x) ∪_CNF F2(x)]
IV∩(x) = [F1(x) ∩_DNF F2(x), F1(x) ∩_CNF F2(x)]   (9)

where, for any two fuzzy sets A and B (with Ā denoting the complement of A): A ∪_CNF B = A ∪ B, A ∪_DNF B = (A ∩ B) ∪ (Ā ∩ B) ∪ (A ∩ B̄), A ∩_CNF B = (A ∪ B) ∩ (Ā ∪ B) ∩ (A ∪ B̄), and A ∩_DNF B = A ∩ B.

Formulae (9) constitute a procedure for calculating the union and intersection IVFS from two fuzzy sets. IV∪ describes the linguistic expression "F1 or F2", while IV∩ describes "F1 and F2", capturing both the fuzziness and nonspecificity of the particular fuzzy logic operators employed, as Turksen suggested [16]. However, in common language, "and" is often used as an unspecified "and/or". In other words, what we mean by the statement "I am interested in x and y" is more correctly understood as an unspecified combination of "x and y" with "x or y". This is particularly relevant for recommendation systems, where it is precisely this kind of statement from users that we wish to respond to. One use of evidence sets is as representations of the integration of both IV∪ and IV∩ into a linguistic category that expresses this ambiguous "and/or".
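Under the standard min/max conjugate pair (an assumption of ours; Turksen's construction holds for whole families of conjugate operator pairs), the DNF and CNF forms in formulae (9) reduce to pointwise expressions that are easy to compute:

```python
def iv_union(a, b):
    """'Or' interval [F1 union_DNF F2, F1 union_CNF F2] for two
    membership degrees a, b, using min/max fuzzy operators."""
    dnf = max(min(a, b), min(1 - a, b), min(a, 1 - b))
    cnf = max(a, b)
    return (dnf, cnf)

def iv_inter(a, b):
    """'And' interval [F1 inter_DNF F2, F1 inter_CNF F2] for two
    membership degrees a, b, using min/max fuzzy operators."""
    dnf = min(a, b)
    cnf = min(max(a, b), max(1 - a, b), max(a, 1 - b))
    return (dnf, cnf)

# With a = 0.9, b = 0.6: the "or" interval is (0.6, 0.9) and the
# "and" interval collapses to (0.6, 0.6) -- DNF <= CNF, as noted above.
```

The width of each interval records the nonspecificity introduced by reading the composition either disjunctively or conjunctively.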
To make this combination more general, assume that we possess evidential weights m1 and m2 associated with F1 and F2, respectively. These are probabilistic weights (m1 + m2 = 1) which represent the strength we associate with each fuzzy set being combined. The linguistic expression at stake now becomes "I am interested in x and y, but I value x more/less than y". To combine all this information into an evidence set we use the following procedure:

ES(x) = {(IV∪(x), min(m1, m2)), (IV∩(x), max(m1, m2))}   (10)

Because IV∪ is the less restrictive combination, obtained by applying the maximum operator to the original fuzzy sets F1 and F2, its evidential weight is acquired via the minimum operator of the evidential weights associated with F1 and F2. The reverse is true for IV∩. Thus, the evidence set obtained from (10) contains IV∪ with the lowest evidence, and IV∩ with the highest. Linguistically, it describes the ambiguity of the "and/or" by giving the strongest belief weight to "and" and the weakest to "or". It expresses: "I am interested in x and y to a higher degree, but I am also interested in x or y to a lower degree".

Finally, formula (10) can be easily generalized for a combination of n fuzzy sets Fi with probability constrained weights mi:

ES(x) = {(IV∪^(Fi,Fj)(x), min(mi, mj)/(n−1)), (IV∩^(Fi,Fj)(x), max(mi, mj)/(n−1))}, for all pairs Fi, Fj   (11)

In TalkMine, this formula is used to combine the n spreading interest fuzzy sets obtained from r knowledge contexts and p keyterms in Pu, as described in section 5.2. The resulting evidence set ES(k), defined on K, represents the interests of the user inferred from spreading the initial interest set of keywords in the intervening knowledge contexts using their respective distance functions. The inferring process combines each Ft,u with the "and/or" linguistic expression entailed by formula (11). Each Ft,u contains the keywords related to keyword ku in the knowledge context Rt, that is, the perspective of Rt on ku. Thus, ES(k) contains the "and/or" combination of all the perspectives on each keyword ku ∈ {k1, …, kp} from each knowledge context Rt.
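A minimal sketch of the pairwise combination of formula (10), again assuming the min/max conjugate pair for the underlying fuzzy operators (the operator choice, function names, and toy values are ours):

```python
def and_or_combine(F1, F2, m1, m2):
    """Evidence set per formula (10): for each element x, pair the 'or'
    interval with min(m1, m2) and the 'and' interval with max(m1, m2).
    F1, F2 are fuzzy sets as dicts; m1, m2 are probabilistic weights."""
    assert abs(m1 + m2 - 1.0) < 1e-9  # weights must sum to 1
    ES = {}
    for x in set(F1) | set(F2):
        a, b = F1.get(x, 0.0), F2.get(x, 0.0)
        # 'or' interval [DNF union, CNF union] with min/max operators:
        iv_or = (max(min(a, b), min(1 - a, b), min(a, 1 - b)), max(a, b))
        # 'and' interval [DNF intersection, CNF intersection]:
        iv_and = (min(a, b), min(max(a, b), max(1 - a, b), max(a, 1 - b)))
        ES[x] = [(iv_or, min(m1, m2)), (iv_and, max(m1, m2))]
    return ES

# Two perspectives on keyword "x", with the user trusting F1 more (m1 > m2):
ES = and_or_combine({"x": 0.9}, {"x": 0.6}, m1=0.7, m2=0.3)
# ES["x"] holds the 'or' interval (0.6, 0.9) with weight 0.3 and the
# 'and' interval (0.6, 0.6) with weight 0.7.
```

As formula (10) prescribes, the less restrictive "or" reading receives the weaker evidential weight, and the "and" reading the stronger.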
As an example, without loss of generality, consider that the initial interests of a user contain one single keyword k1, and that the user is querying two distinct information resources R1 and R2. Two spreading interest fuzzy sets, F1 and F2, are generated using d1 and d2 respectively, with probabilistic weights m1 and m2, say, with m1 > m2 to indicate that the user trusts R1 more than R2. ES(k) is easily obtained straight from formula (10). This evidence set contains the keywords related to k1 in R1 "and/or" the keywords related to k1 in R2, taking into account the probabilistic weights attributed to R1 and R2. F1 is the perspective of R1 on k1 and F2 the perspective of R2 on k1.

6 DISTANCE FUNCTIONS IN RECOMMENDATION SYSTEMS

The evidence set obtained in Section 5.3 with formulas (10) and (11) is a first cut at detecting the interests of a user in a set of information resources. We can compute a more tuned interest set of keywords using an interactive conversation process between the user and the information resources being queried. Such conversation is an uncertainty reducing process based on Nakamura and Iwai's [10] IR system, which we extended to evidence sets [14] [16] with TalkMine.

TalkMine is then an algorithm for obtaining a representation of user interests in several information resources (including other users). It works by combining the perspectives of each information resource on the user interests into an evidence set, which is fine-tuned by an automated conversation process with the user's agent/browser [16]. The combination of perspectives
is based on evidence sets, and uses the semi-metric distance functions described in this article. The importance of such semi-metric distance functions is thus twofold: they allow us to analyze Document Networks for interests and trends (sec. 3), and they offer an avenue to combine user interests in distinct information resources (sec. 5).

REFERENCES

[1] Bollen, J. and Rocha, L. M., "An adaptive systems approach to the implementation and evaluation of digital library recommendation systems," Research and Advanced Technology for Digital Libraries: 4th European Conference, ECDL 2000, Lecture Notes in Computer Science. Springer-Verlag, 2000, 356-359.
[2] Galvin, F. and Shore, S. D., "Distance Functions and Topologies," American Mathematical Monthly, vol. 98, no. 7, pp. 620-623, 1991.
[3] Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J., "An algorithmic framework for performing collaborative filtering," Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 1999, 230-237.
[4] Klir, G. J. and Yuan, B., Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, NJ: Prentice Hall, 1995.
[5] Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J., "GroupLens - Applying Collaborative Filtering to Usenet News," Communications of the ACM, vol. 40, no. 3, pp. 77-87, 1997.
[6] Krulwich, B. and Burkey, C., "Learning user information interests through extraction of semantically significant phrases," Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access, 1996.
[7] Lakoff, G., Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. University of Chicago Press, 1987.
[8] Luce, R., "Evolution and scientific literature: towards a decentralized adaptive web," Nature: Web Debates, May 10, 2001.
[9] Miyamoto, S., Fuzzy Sets in Information Retrieval and Cluster Analysis. Kluwer Academic Publishers, 1990.
[10] Nakamura, K. and Iwai, S., "Representation of Analogical Inference by Fuzzy Sets and its Application to Information Retrieval System," Fuzzy Inf. and Decis. Processes, 1982, 373-386.
[11] Newman, M. E., "The structure of scientific collaboration networks," Proc. Natl. Acad. Sci. U.S.A., vol. 98, no. 2, pp. 404-409, Jan. 2001.
[12] Rocha, L. M., "Cognitive categorization revisited: extending interval valued fuzzy sets as simulation tools for concept combination," Proc. of the 1994 Int. Conference of NAFIPS/IFIS/NASA. IEEE Press, 1994, 400-404.
[13] Rocha, L. M., "Relative uncertainty and evidence sets: A constructivist framework," International Journal of General Systems, vol. 26, no. 1-2, pp. 35-61, 1997.
[14] Rocha, L. M., "Evidence sets: Modeling subjective categories," International Journal of General Systems, vol. 27, no. 6, pp. 457-494, 1999.
[15] Rocha, L. M., "Adaptive recommendation and open-ended semiosis," Kybernetes, vol. 30, no. 5-6, pp. 821-851, 2001.
[16] Rocha, L. M., "TalkMine: a Soft Computing Approach to Adaptive Knowledge Recommendation," in Vincenzo Loia and Salvatore Sessa (eds.), Soft Computing Agents: New Trends for Designing Autonomous Systems. Physica-Verlag, Springer, 2001, pp. 89-116.
[17] Rocha, L. M., "Semi-Metric Behavior in Document Networks and Adaptive Recommendation Systems," Journal of Soft Computing, 2002. In press.
[18] Rocha, L. M. and Bollen, J., "Biologically motivated distributed designs for adaptive knowledge management," in Segel, L. A. and Cohen, I. (eds.), Design Principles for the Immune System and other Distributed Autonomous Systems. Oxford University Press, 2001, pp. 305-334.
[19] Shafer, G., A Mathematical Theory of Evidence. Princeton University Press, 1976.
[20] Shore, S. D. and Sawyer, L. J., "Explicit Metrization," Annals of the New York Academy of Sciences, vol. 704, pp. 328-336, 1993.
[21] Turksen, I. B., "Non-specificity and interval-valued fuzzy sets," Fuzzy Sets and Systems, vol. 80, no. 1, pp. 87-100, May 1996.
[22] Watts, D., Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, 1999.
[23] Yager, R. R., "Measure of Fuzziness and Negation. 1. Membership in the Unit Interval," International Journal of General Systems, vol. 5, no. 4, pp. 221-229, 1979.
[24] Yager, R. R., "On the Measure of Fuzziness and Negation. 2. Lattices," Information and Control, vol. 44, no. 3, pp. 236-260, 1980.
[25] Zadeh, L. A., "Fuzzy Sets," Information and Control, vol. 8, pp. 338-353, 1965.