Semantic Reasoning: A Path to New Possibilities of Personalization*

Yolanda Blanco-Fernández, José J. Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, and Martín López-Nores

Department of Telematics Engineering, University of Vigo, 36310, Spain
{yolanda,jose,agil,mramos,mlnores}@det.uvigo.es

* Work funded by the Ministerio de Educación y Ciencia (Gobierno de España) research project TSI2007-61599, by the Consellería de Educación e Ordenación Universitaria (Xunta de Galicia) incentives file 2007/000016-0, and by the Programa de Promoción Xeral da Investigación de la Consellería de Innovación, Industria e Comercio (Xunta de Galicia) PGIDIT05PXIC32204PN.

S. Bechhofer et al. (Eds.): ESWC 2008, LNCS 5021, pp. 720–735, 2008. © Springer-Verlag Berlin Heidelberg 2008

Abstract. Recommender systems address the current information overload by automatically selecting items that match the personal preferences of each user. The so-called content-based recommenders suggest items similar to those the user liked in the past, by resorting to syntactic matching mechanisms. The rigid nature of such mechanisms means that only items bearing a strong resemblance to those the user already knows are recommended. In this paper, we propose a novel content-based strategy that diversifies the offered recommendations by employing reasoning mechanisms borrowed from the Semantic Web. These mechanisms discover extra knowledge about the user’s preferences, thus favoring more accurate and flexible personalization processes. Our approach is generic enough to be used in a wide variety of personalization applications and services, in diverse domains and recommender systems. The proposed reasoning-based strategy has been empirically evaluated with a set of real users. The results show computational feasibility and significant increases in recommendation accuracy w.r.t. existing approaches where our reasoning capabilities are disregarded.

1 Introduction

Recommender systems provide personalized advice to users about items or services they might be interested in. Currently, these tools are gaining momentum in the Digital Revolution, helping people manage content overload efficiently and reducing complexity when searching for relevant information.

To fulfill these personalization needs, three main components are required in a recommender system: (i) a database where the available items are stored, (ii) personal profiles where the users’ preferences are modeled, and (iii) recommendation strategies aimed at selecting personalized suggestions for each individual. The first such strategy was the so-called content-based filtering, which suggests to a user items similar to those he/she liked in the past. In spite of its accuracy, this technique is limited by the similarity metrics it employs. These metrics are based on rigid syntactic approaches
that only detect similarity between items that share all or some of their attributes [1]. Consequently, traditional content-based approaches lead to overspecialized suggestions that include only items bearing a strong resemblance to those the user already knows (i.e. items with attributes defined in his/her profile).

To fight overspecialization, researchers devised a new strategy named collaborative filtering, which offers each user items that appealed to others with similar preferences (named neighbors). Collaborative filtering reduces the effects of overspecialization by considering other users’ interests, but it also introduces new limitations, such as scalability problems, difficulty in selecting each user’s neighborhood when the available preferences are sparse (commonly named the sparsity problem), and privacy concerns related to the confidentiality of the users’ personal data (see [1] for details).

Bearing in mind the severe drawbacks of the collaborative solutions, we propose a novel content-based strategy that exploits the main strengths of this personalization paradigm and overcomes the overspecialized nature of its recommendations. For that purpose, our strategy diversifies the offered suggestions without resorting to other users’ preferences, thus protecting their privacy. Specifically, we fight the syntactic limitations of existing content-based approaches by employing two reasoning techniques borrowed from the Semantic Web field: the so-called semantic associations [3] and Spreading Activation techniques (henceforth, SA techniques) [7]. Instead of using the traditional syntactic similarity metrics, these associations trace semantic bonds between the user’s preferences and the items available in the recommender system, which are previously formalized in a domain ontology along with their semantic annotations.
Next, SA techniques efficiently explore these semantic relationships and discover new knowledge related to the users’ interests. This knowledge allows our strategy to compare the user’s preferences with the available items in a more flexible way, thus offering more accurate recommendations. Although the adopted reasoning mechanisms have been widely used in the Semantic Web [3,14,15], their internals must be adapted to fulfill the personalization requirements of a recommender system. In particular, these mechanisms must: (i) automatically learn new knowledge about the users’ preferences from their feedback, and (ii) dynamically adapt the strategy as these preferences evolve.

In spite of the generality of our reasoning-based approach, in this paper we have adopted a specific context with the goal of describing in detail its use in a domain where the information overload is noticeable. Specifically, we have exploited the reasoning capabilities of our content-based strategy in order to enhance the recommendations offered to viewers of Interactive Digital TV (IDTV). Today, TV viewers are exposed to overwhelming amounts of information, and challenged by the plethora of interactive functionality provided by current digital receivers. As there are hundreds of channels with an abundance of programs available, it is likely that appealing TV programs go unnoticed. To assist these viewers, it is possible to take advantage of the personalization capabilities provided by a TV recommender system, which sifts through the myriad of programs available in the digital stream and selects those that match the viewers’ preferences by using our reasoning-based strategy.

This paper is organized as follows: Sect. 2 describes the two key elements in our reasoning framework: (i) the ontology where the domain knowledge is formalized, including the available TV programs and their semantic descriptions, and (ii) the user
modeling approach employed to create the users’ profiles. Next, Sect. 3 describes how the semantic associations and SA techniques are exploited in our content-based strategy. Then, an example where a set of TV programs is suggested to a given viewer is presented in Sect. 4. The tests carried out to validate our reasoning-based approach are explained in detail in Sect. 5. Finally, Sect. 6 draws some conclusions and points out possible lines of further work.

2 Domain Ontology and User Modeling

2.1 The Domain Ontology

Two elements are needed to formalize the IDTV domain by an ontology: (i) the semantic descriptions of the TV programs that can be suggested, and (ii) a language expressive enough to represent the concepts (i.e. classes and their instances) and relationships (i.e. hierarchical links and properties) identified in the domain. In our approach, the semantic descriptions have been extracted from TV-Anytime metadata specifications [6], whereas the OWL (DL) language has been selected due to its expressive capability, which makes it possible to formalize concepts and expressions not supported in RDF and RDFS.

Starting from TV-Anytime metadata, we have defined and included in our OWL ontology several hierarchies of classes and properties, as well as specific instances of them, as shown in the TV ontology depicted in Fig. 1. The considered TV programs (identified by unique IDs) have been automatically extracted from the Internet Movie Database (IMDB) and the BBC web server¹, and are represented as specific instances belonging to a hierarchy of genres organized in several levels (e.g. fiction, leisure, romance, etc.), as shown at the top of Fig. 1. The main attributes of these programs (e.g. involved credits, topics and places, intended audience, intention, etc.) are also instances related to them by labeled properties. These attributes also belong to hierarchically organized classes.
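The structure described above (program instances linked to attribute instances by labeled properties, with attributes grouped under classes) can be illustrated with a minimal sketch. It is not the OWL ontology itself: the triples are a plain-Python stand-in, the property names HasTopic and HasDirector are taken from Fig. 1, and the "type" label, the helper names build_index, attributes_of and instances_of, and the choice of triples are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical excerpt of the TV ontology as (subject, property, object) triples.
# "type" is an illustrative label for class membership, not an OWL identifier.
TRIPLES = [
    ("Learn about WW I", "HasTopic", "World War I"),
    ("Paths of Glory", "HasTopic", "World War I"),
    ("Paths of Glory", "HasDirector", "Stanley Kubrick"),
    ("Eyes Wide Shut", "HasDirector", "Stanley Kubrick"),
    ("World War I", "type", "War Topics"),
    ("Vietnam War", "type", "War Topics"),
]


def build_index(triples):
    """Group (property, object) pairs by subject for quick lookup."""
    index = defaultdict(list)
    for subject, prop, obj in triples:
        index[subject].append((prop, obj))
    return index


def attributes_of(index, instance):
    """Return the labeled properties bound to one instance."""
    return list(index.get(instance, []))


def instances_of(triples, cls):
    """Return the instances declared as members of a class, sorted by name."""
    return sorted(s for s, p, o in triples if p == "type" and o == cls)


if __name__ == "__main__":
    index = build_index(TRIPLES)
    print(attributes_of(index, "Paths of Glory"))
    print(instances_of(TRIPLES, "War Topics"))
```

A real deployment would load the OWL (DL) ontology with a dedicated library; the flat list is used here only to keep the sketch self-contained.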
As some of these classes are already defined in existing conceptualizations, we have imported ontologies about different domains such as sports, geographical information, and credits involved in TV programs (e.g. actors), among others².

¹ See http://www.imdb.com and http://backstage.bbc.co.uk/data/7DayListingData for details.
² These ontologies were extracted from the DAML repository located at www.daml.org/ontologies and converted to the OWL language by means of a tool developed by the MINDSWAP Research Group (see http://www.mindswap.org/2002/owl.shtml for details).

2.2 Our User Modeling Approach

Our approach models the users’ profiles by reusing the knowledge available in the domain ontology, which is why we call them ontology-profiles. Specifically, we propose a semantic model for each user that gives information about: (i) the TV programs that were appealing or uninteresting for him/her (named positive and negative preferences, respectively), (ii) their main attributes, and (iii) the genres under which these programs are classified in the TV ontology (see the top of Fig. 1). This user modeling approach provides a formal representation of the users’ preferences, which makes it possible to reason about them and discover additional knowledge about their interests. Such knowledge makes it possible
to compare, in a more effective way, the users’ preferences with the available items, thus leading to personalization processes more accurate than the traditional syntactic approaches [1]. In this regard, note that our ontology-profiles greatly improve on other approaches based on flat lists, which are not well structured to favor the discovery of new knowledge (see [4] for details).

Fig. 1. Excerpt from classes, properties and instances in our TV ontology

Fulfilling the goals of our personalization strategy requires identifying the interest of the user in both the TV programs defined in his/her profile and their attributes and genres. Specifically, these Degrees Of Interest (named DOI indexes and belonging to [-1,1]) can be explicitly stated by the user or automatically inferred from his/her viewing behavior (e.g. programs accepted or rejected after recommendations, viewing time for each suggested program, etc.). Once the DOI indexes of each program in the user’s profile have been established, we compute the indexes corresponding to their attributes and to
the genres under which these programs are classified in the ontology. This computation mechanism (omitted here due to space limitations) is explained in detail in [5].

Although other ontology-based proposals have been devised in the literature, our user modeling approach differs to a great extent from these existing works. As an example, consider the Quickstep system proposed by Middleton in [10], which suggests research papers according to the users’ interests. The main difference between our work and Quickstep is related to the knowledge used for modeling purposes. In fact, Quickstep uses a simple taxonomy of research categories to represent the papers each user appreciates, whereas our proposal exploits the whole knowledge formalized in the ontology, which makes it possible to carry out reasoning processes that discover extra information about users’ preferences. The same limitation can be identified in the system proposed in [16], which recommends books according to the user’s preferences. There, the knowledge discovery is based on analyzing only hierarchical relationships, thus hampering more complex inference processes such as those pursued in our work.

3 Our Reasoning-Based Strategy

Our personalization strategy suggests programs that are semantically associated with the contents the viewer has liked in the past, improving on the syntactic similarity metrics adopted in the traditional content-based methods. Specifically, our strategy consists of two stages (named filtering phase and recommendation phase, respectively), which are sketched next and fully described in Sect. 3.1 and Sect. 3.2.

– Filtering phase. This stage selects in the OWL ontology instances of classes and properties that are relevant for the user, by considering his/her personal preferences. Next, our reasoning-based approach infers semantic associations among the selected entities identifying specific TV programs.
These hidden associations, which we borrow from [3], are discovered from the hierarchical links and properties defined in the domain ontology.
– Recommendation phase. The inferred knowledge is processed in the second phase by employing SA techniques. This intelligent mechanism works as a concept explorer, as it detects concepts that are closely related to the user’s preferences by exploring the entities and semantic associations inferred during the filtering phase.

3.1 Filtering Phase

Firstly, our strategy locates in the domain ontology the programs that were (un)appealing to the user (defined in his/her profile). Next, it successively traverses the properties bound to these programs until reaching new class instances (nodes referring to programs, actors, topics...) in the ontology. To guarantee computational feasibility, we have developed a controlled inference mechanism that works as follows. As new nodes are reached from a given instance, our approach first quantifies their relevance for the user. Then, the nodes whose relevance indexes are lower than a specific threshold (whose value depends on both the domain ontology and the recommender system that adopts our content-based strategy; in our tests in the DTV field, we have used values around 0.65) are ignored, in such a
way that our inference mechanism continues traversing successively only the properties that lead to new nodes from those that are significant for the user (according to his/her profile). Consequently, our strategy explores solely entities of interest for the user, thus filtering out those that probably do not provide knowledge useful for the personalization process.

In our filtering mechanism, the more significant the relationship between a given node and the user’s preferences (either positive or negative), the more relevant this node is. To measure this relevance value, we have developed a technique that takes into account diverse ontology-dependent filtering criteria. Some of these criteria, described in detail in [5], are sketched next.

1. Length of the chain of properties that leads to the considered node starting from the user’s preferences. Specifically, the longer this property sequence (its length being defined as the number of properties included in it), the lower the relevance index of the node, as its relationship to the user’s preferences is less significant due to the presence of many intermediate nodes.
Example: Let us consider that a user has enjoyed the documentary Learn about WW I shown in Fig. 1. Here, it is possible to find the property sequence Learn about WW I - World War I - Paths of Glory - Stanley Kubrick - Eyes Wide Shut. In this case, the relevance index of the program Paths of Glory is greater than the index of Eyes Wide Shut, as the relationship between Paths of Glory and Learn about WW I is more significant than the relationship between this documentary and Eyes Wide Shut. In other words, the relation “movie about World War I” is more relevant than “movie whose director has directed movies about World War I”.

2. Existence of hierarchical relationships between the node and the user’s preferences. The relevance of a node is increased when it is possible to find a common ancestor between it and the user’s preferences in the ontology hierarchies.
Example: Let us consider again the user who has liked the war documentary mentioned in the previous example, whose topic is represented in our ontology by the instance World War I. In this case, the filtering phase increases the relevance index of other instances that share the common ancestor War Topics with the class instance World War I (e.g. Vietnam War).

3. Existence of implicit relationships between the node and the user’s preferences detected by concepts from graph theory. In graph theory [8], the betweenness among three nodes is high when the third node is included in most of the paths existing between the first and the second node. So, from a high value of betweenness, it follows that the involved nodes are strongly related. In our approach, these nodes are the user’s preferences and the class instance whose relevance is measured.
Example: Let us consider a user who has liked the movies Vanilla Sky and Jerry Maguire with Tom Cruise as leading actor. In this case, the relevance index of the instance Born on the 4th of July gets higher, as this movie is closely related to the user’s preferences. In fact, as shown in Fig. 1, the node Tom Cruise is included in all the paths established between Born on the 4th of July and the two movies defined in the user’s profile.
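The three criteria above can be illustrated with a minimal Python sketch. It is not the relevance formula of [5], which is not reproduced in this paper: the `1 / length` decay, the tiny edge list, the `PARENT` hierarchy and every function name are assumptions introduced only for illustration, and the three signals are computed separately rather than combined into a single index.

```python
from collections import deque

# Hypothetical excerpt of the ontology, loosely following the examples above.
EDGES = [
    ("Learn about WW I", "World War I"),
    ("World War I", "Paths of Glory"),
    ("Paths of Glory", "Stanley Kubrick"),
    ("Stanley Kubrick", "Eyes Wide Shut"),
    ("Vanilla Sky", "Tom Cruise"),
    ("Jerry Maguire", "Tom Cruise"),
    ("Tom Cruise", "Born on the 4th of July"),
]
# Hypothetical class hierarchy (child -> parent).
PARENT = {"World War I": "War Topics", "Vietnam War": "War Topics"}


def build_graph(edges):
    """Undirected adjacency sets built from property links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def chain_length(graph, start, goal):
    """Criterion 1: number of properties on the shortest chain (BFS)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None  # unreachable


def length_score(graph, preference, node):
    """Assumed decay: 1 / chain length (0.0 if unreachable or same node)."""
    length = chain_length(graph, preference, node)
    return 1.0 / length if length else 0.0


def ancestors(node):
    """All ancestors of a node in the hypothetical hierarchy."""
    found = []
    while node in PARENT:
        node = PARENT[node]
        found.append(node)
    return found


def shares_ancestor(a, b):
    """Criterion 2: do the two instances have a common ancestor?"""
    return bool(set(ancestors(a)) & set(ancestors(b)))


def simple_paths(graph, start, goal, path=None):
    """Enumerate all cycle-free paths from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph[start]:
        if nxt not in path:
            yield from simple_paths(graph, nxt, goal, path)


def betweenness(graph, first, second, third):
    """Criterion 3: fraction of paths first -> second that include third."""
    paths = list(simple_paths(graph, first, second))
    if not paths:
        return 0.0
    return sum(third in p for p in paths) / len(paths)


GRAPH = build_graph(EDGES)
```

With this toy graph, `length_score(GRAPH, "Learn about WW I", "Paths of Glory")` is larger than the score for Eyes Wide Shut, and `betweenness(GRAPH, "Jerry Maguire", "Born on the 4th of July", "Tom Cruise")` equals 1.0 because Tom Cruise lies on the only path between both movies.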
Once the nodes related to the user’s interests (and the properties linking them to each other) have been selected, our strategy infers semantic associations between the instances that refer to TV programs. Specifically, we adopt three associations defined by Anyanwu and Sheth in [3]:

– ρ-path association. In our approach, two programs are ρ-pathAssociated when they are linked by a chain or sequence of properties in the ontology. For instance, in Fig. 1, it is possible to trace a sequence between the documentary Learn about WW I and the movie Paths of Glory by means of the World War I instance.
– ρ-join association. Two programs are ρ-joinAssociated when their respective attributes belong to the same class in the domain ontology. For instance, in Fig. 1 there exists a ρ-join association between the documentary Welcome to Tokyo and the movie The Last Samurai, as both programs are bound to different cities in Japan (Tokyo and Kyoto, respectively, which are classified as Japanese cities in Fig. 1).
– ρ-cp association. Two programs are ρ-cpAssociated when they share a common ancestor in the genre hierarchy defined in the ontology. For instance, note that all the movies depicted at the top of Fig. 1 are ρ-cpAssociated by the ancestor Non Fiction Contents.

By means of the filtering and knowledge inference processes, our approach builds a network for the user, whose nodes are the instances of classes selected during the filtering phase, and whose links are both the properties joining these instances in the ontology and the semantic associations inferred from it. The knowledge represented in this network is explored during the second phase of the strategy by exploiting the inference capabilities provided by SA techniques, which are among the most widely used processing frameworks for semantic networks.
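As a rough illustration of the three associations, the following Python sketch checks them over a hand-written excerpt of the ontology. All dictionaries and names are hypothetical. The ρ-path test is simplified to "the two programs share an attribute instance" (a chain of two properties), and the genre labels Documentaries and Movies are assumed rather than taken from Fig. 1.

```python
# Hypothetical data: attribute instances attached to each program.
PROGRAM_ATTRS = {
    "Learn about WW I": {"World War I"},
    "Paths of Glory": {"World War I"},
    "Welcome to Tokyo": {"Tokyo"},
    "The Last Samurai": {"Kyoto", "Tom Cruise"},
    "Vanilla Sky": {"Tom Cruise"},
}
# Class of each attribute instance (Actors is an assumed class name).
ATTR_CLASS = {
    "World War I": "War Topics",
    "Tokyo": "Japanese cities",
    "Kyoto": "Japanese cities",
    "Tom Cruise": "Actors",
}
# Assumed genre of each program and assumed genre hierarchy (child -> parent).
PROGRAM_GENRE = {
    "Learn about WW I": "Documentaries",
    "Welcome to Tokyo": "Documentaries",
    "The Last Samurai": "Action movies",
    "Vanilla Sky": "Action movies",
}
GENRE_PARENT = {"Action movies": "Movies", "Drama movies": "Movies"}


def genre_chain(program):
    """Genre of the program followed by all of its ancestors."""
    chain, genre = [], PROGRAM_GENRE.get(program)
    while genre is not None:
        chain.append(genre)
        genre = GENRE_PARENT.get(genre)
    return chain


def associations(p, q):
    """Return the set of association kinds found between programs p and q."""
    found = set()
    attrs_p, attrs_q = PROGRAM_ATTRS[p], PROGRAM_ATTRS[q]
    # rho-path (simplified): a shared instance links both programs.
    if attrs_p & attrs_q:
        found.add("rho-path")
    # rho-join: different attribute instances that belong to the same class.
    if any(a != b and ATTR_CLASS[a] == ATTR_CLASS[b]
           for a in attrs_p for b in attrs_q):
        found.add("rho-join")
    # rho-cp: a common ancestor in the genre hierarchy.
    if set(genre_chain(p)) & set(genre_chain(q)):
        found.add("rho-cp")
    return found
```

For example, `associations("Welcome to Tokyo", "The Last Samurai")` yields only the ρ-join association, through the Japanese cities class, which mirrors the example given above.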
3.2 Recommendation Phase

We emphasize the use of SA techniques as a computational mechanism able to: (i) explore efficiently the relationships among the nodes interconnected in the user’s network (henceforth SA network), and (ii) infer from them knowledge useful for the recommendation process by detecting concepts closely related to the user’s preferences. According to the guidelines established in [7], these techniques work as follows:

– The nodes of the network have an implicit relevance, named activation level. Besides, each link joining two nodes has a weight, in such a way that the stronger the relationship between both nodes, the higher the assigned weight. Initially, a set of nodes is selected and their activation levels are spread until reaching the nodes connected to them by links (named neighbor nodes).
– The activation level of a reached node is computed by considering the levels of its neighbors and the weights assigned to the links that join them to each other. Consequently, the more relevant the neighbors of a given node (i.e. the higher their activation levels), and the stronger the relationship between the node and its neighbors (i.e. the higher the weights of the links between them), the more relevant this node.
– This spreading process is repeated successively until all the nodes of the network have been reached. Finally, the highest activation levels correspond to the nodes that are most closely related to those initially selected.

SA techniques have been widely used in the fields of searching and information retrieval [15,12,9]. However, to combine their inferential capabilities with the personalization requirements of a recommender system, it is necessary to extend the existing approaches. The modifications we propose mainly affect two issues: (i) the kind of links traditionally modeled in the SA network, and (ii) their weighting process.

– On the one hand, the links considered in traditional approaches model only simple and direct relationships (for instance, in information retrieval, a link between two nodes that refer to terms indicates their co-occurrence in a document; see [7] for details), thus disregarding during the spreading activation process a huge amount of knowledge hidden behind more complex relationships. To overcome this limitation, our SA network models both the properties defined in the ontology and the semantic associations inferred in the filtering phase. This way, the links corresponding to the associations allow the relevance of the user’s preferences to spread until reaching programs appealing to him/her, which would go unnoticed in traditional SA-based approaches.
– On the other hand, as the weights of the links depend only on the strength of the relationship between the connected nodes, these values remain static in existing SA approaches. Bearing in mind the purposes of a recommender system, our SA-based strategy must also consider the user’s preferences during the weighting process, in such a way that this process adapts to changes in his/her interests.

In our approach, the proposed strategy activates in the user’s network the nodes that refer to the programs defined in his/her profile, and assigns them an initial activation level equal to their respective DOI indexes. Next, it is necessary to weight the links of the network, which represent both the explicit knowledge formalized in the ontology (i.e. properties) and the implicit knowledge discovered from it (i.e. semantic associations). As we mentioned in the previous section, our strategy adjusts these values dynamically as the user’s preferences evolve over time. In fact, the weight of a link joining two nodes in the user’s network is computed by considering their respective relevance indexes, which are measured during the filtering phase. As a result, our approach leads to a highly positive weight when the two linked nodes are strongly related to the user’s positive preferences, and to a negative weight when they are related to his/her negative preferences. This way, as the user’s preferences change, the weights of the links in his/her SA network are modified accordingly and, hence, the elaborated recommendations are also updated.

According to the traditional SA techniques, once the propagation process has reached all the nodes in the user’s network, the highest activation levels correspond to TV programs satisfying two conditions: (i) their neighbor nodes are also relevant for the user (hence their high activation levels), and (ii) they are closely related to the user’s preferences (hence the high weights of the links). For that reason, these nodes identify the TV programs finally suggested by our content-based strategy.
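The propagation and weighting steps described above can be sketched in Python as follows. This is only one possible reading, under several assumptions that the paper does not state: the link weight is taken as the mean of the two signed relevance indexes, profile nodes are seeded with the magnitude of their DOI index and kept fixed, and a neighbor contributes the magnitude of its level while the link weight supplies the sign (so that a negative DOI multiplied by a negative weight does not turn positive). All numeric values and function names are hypothetical.

```python
def link_weight(rel_a, rel_b):
    """Assumed rule: mean of the two signed relevance indexes."""
    return (rel_a + rel_b) / 2.0


def spread(links, doi, pulses=2):
    """Propagate activation over an undirected weighted network.

    links: {(node_a, node_b): weight}; doi: {profile_node: DOI index}.
    """
    neighbors = {}
    for (a, b), w in links.items():
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))
    # Seed profile nodes with |DOI|; every other node starts at 0.
    levels = {n: abs(doi.get(n, 0.0)) for n in neighbors}
    for _ in range(pulses):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in doi:
                updated[node] = levels[node]  # profile nodes stay fixed
            else:
                # Neighbor magnitude times signed link weight.
                updated[node] = sum(abs(levels[m]) * w for m, w in adjacent)
        levels = updated
    return levels


def recommend(levels, doi, programs):
    """Programs outside the profile with positive activation, best first."""
    candidates = [(n, v) for n, v in levels.items()
                  if n in programs and n not in doi and v > 0]
    return [n for n, _ in sorted(candidates, key=lambda x: x[1], reverse=True)]


# Hypothetical relevance indexes (from the filtering phase) and DOI indexes.
RELEVANCE = {
    "Jerry Maguire": 0.9, "Vanilla Sky": 0.8, "Tom Cruise": 0.8,
    "Born on the 4th of July": 0.6, "Game of Death": -0.8,
    "Danny the Dog": -0.6,
}
DOI = {"Jerry Maguire": 0.9, "Vanilla Sky": 0.8, "Game of Death": -0.8}
PROGRAMS = {"Jerry Maguire", "Vanilla Sky", "Born on the 4th of July",
            "Game of Death", "Danny the Dog"}
PAIRS = [("Jerry Maguire", "Tom Cruise"), ("Vanilla Sky", "Tom Cruise"),
         ("Tom Cruise", "Born on the 4th of July"),
         ("Game of Death", "Danny the Dog")]
LINKS = {(a, b): link_weight(RELEVANCE[a], RELEVANCE[b]) for a, b in PAIRS}
LEVELS = spread(LINKS, DOI)
```

In this toy network, Born on the 4th of July ends with a positive activation level (reached through Tom Cruise), whereas Danny the Dog ends with a negative one because its only link comes from a disliked program, so `recommend(LEVELS, DOI, PROGRAMS)` suggests only the former.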
In conclusion, note that the spreading process employed in our reasoning-based approach has three main advantages in the personalization field:

– Firstly, our strategy is able to discover that a TV program is appealing to the user even when its attributes are not defined in his/her profile. Thanks to the SA techniques, this program is relevant if it is semantically associated with the user’s preferences. Consequently, our reasoning-based strategy offers diverse recommendations, beyond the overspecialized suggestions offered by the traditional syntactic content-based techniques.
– Secondly, our reasoning mechanisms consider both the positive and negative preferences of the user. Whereas the positive preferences help to identify contents appealing to the user, the negative preferences decrease the activation levels of the nodes to which they are related (either explicitly by means of properties, or implicitly by semantic associations). This way, our strategy avoids suggesting programs associated with those the user did not like.
– Lastly, note that our approach not only favors knowledge reuse, but also permits the user’s network to adapt easily to changes in his/her preferences. This way, as these interests evolve, the filtering phase selects new nodes, properties and semantic associations, and incorporates them into the current network of the user.

4 A Sample Scenario

The example described in this section shows the differences between our reasoning-based recommendations and those offered by traditional content-based strategies. Due to space limitations, we consider only the brief excerpt from our TV ontology shown in Fig. 1. However, this restriction does not prevent us from highlighting the associations used by our SA techniques for the selection of personalized recommendations.

In this scenario, we consider that the target user is U, who has enjoyed the documentaries Welcome to Tokyo and Learn about WW I, and the movies Vanilla Sky and Jerry Maguire starring Tom Cruise. As for U’s negative preferences, we assume that this user liked neither Morgan Freeman (supporting actor in Million Dollar Baby) nor Game of Death (a movie about martial arts with Bruce Lee as leading actor).

Filtering Phase: Selecting Instances Relevant for U

Firstly, our strategy selects in the domain ontology the instances that are relevant for the user U by considering his/her personal preferences. For that purpose, the strategy locates in the ontology the nodes that refer to the programs defined in U’s profile, and explores successively the nodes joined to them by properties. For instance, from the node identifying U’s favorite actor (Tom Cruise in Fig. 1), it is possible to reach the instances Born on the 4th of July and The Last Samurai. Considering the second filtering criterion, our strategy selects the two reached instances, as they share common ancestors with U’s positive preferences. In fact, as shown in the hierarchy at the top of Fig. 1, Born on the 4th of July and Jerry Maguire are Drama movies, and The Last Samurai and Vanilla Sky are classified as Action movies.
Next, our filtering phase continues exploring the instances linked to the two previously selected nodes (i.e. Vietnam War from the node Born on the 4th of July, and Kyoto from The Last Samurai). In this case, both instances are relevant for U, as this user has appreciated other instances belonging to their classes in the ontology (specifically, World War I belonging to the War Topics class, and Tokyo belonging to Japanese cities).

We also search for class instances related to the user’s negative preferences. These instances are relevant during the personalization process, as they help to identify programs the user will probably not enjoy. Among such nodes, note Danny the Dog. As shown in Fig. 1, this movie not only involves an actor U does not like (Morgan Freeman), but is also about Karate, a topic that seems to be unappealing to this user, who has not liked the martial arts movie Game of Death. This knowledge will be used by our SA-based reasoning to elaborate recommendations for U.

Filtering Phase: Inferring Semantic Associations Between TV Programs

Once the instances relevant for U have been identified, our strategy infers associations between the programs included in the selected property sequences (see Table 1).

Table 1. Property sequences and semantic associations inferred for the user U

Property sequences:
  Learn about WW I - World War I - Paths of Glory
  Jerry Maguire - Tom Cruise - Born on the 4th of July - Vietnam War
  Vanilla Sky - Tom Cruise - The Last Samurai - Kyoto
  Welcome to Tokyo - Tokyo
  Danny the Dog - Karate
  Game of Death - Kung Fu

Semantic associations:
  ρ-path (Jerry Maguire, Born on the 4th of July)
  ρ-join (Welcome to Tokyo, The Last Samurai)
  ρ-join (Learn about WW I, Born on the 4th of July)
  ρ-cp (Vanilla Sky, The Last Samurai)
  ρ-cp (Danny the Dog, Game of Death)
  ρ-path (Danny the Dog, Million Dollar Baby)

The next step is to build U’s SA network, by including as nodes the class instances selected by the filtering process, and as links both the properties that join these instances to each other in the ontology and the associations inferred from it. Once the links have been weighted, U’s network (in Fig. 2) is processed by SA techniques, which reason about the represented knowledge to select the personalized recommendations.

[Figure: network with the nodes Tom Cruise, Born on the 4th of July, Kyoto, Tokyo, World War I, Vietnam War, Paths of Glory, Learn about WW I, Danny the Dog, Morgan Freeman, Game of Death, Welcome to Tokyo, Vanilla Sky, Jerry Maguire and The Last Samurai]
Fig. 2. Network used by SA techniques to select content-based recommendations for U
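The construction step can be pictured with a short Python sketch that turns the entries of Table 1 into a labeled network. The data structure (a dictionary keyed by unordered node pairs) and the function name are illustrative choices, not the authors' implementation, and the result is derived from Table 1 alone, so it need not coincide node by node with Fig. 2.

```python
# Entries of Table 1 (program titles written in full).
PROPERTY_SEQUENCES = [
    ["Learn about WW I", "World War I", "Paths of Glory"],
    ["Jerry Maguire", "Tom Cruise", "Born on the 4th of July", "Vietnam War"],
    ["Vanilla Sky", "Tom Cruise", "The Last Samurai", "Kyoto"],
    ["Welcome to Tokyo", "Tokyo"],
    ["Danny the Dog", "Karate"],
    ["Game of Death", "Kung Fu"],
]
ASSOCIATIONS = [
    ("rho-path", "Jerry Maguire", "Born on the 4th of July"),
    ("rho-join", "Welcome to Tokyo", "The Last Samurai"),
    ("rho-join", "Learn about WW I", "Born on the 4th of July"),
    ("rho-cp", "Vanilla Sky", "The Last Samurai"),
    ("rho-cp", "Danny the Dog", "Game of Death"),
    ("rho-path", "Danny the Dog", "Million Dollar Baby"),
]


def build_network(sequences, associations):
    """Return (nodes, links); links maps an unordered pair to its labels."""
    links = {}
    # Consecutive instances in a property sequence are joined by a property.
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            links.setdefault(frozenset((a, b)), set()).add("property")
    # Inferred semantic associations become additional links.
    for kind, a, b in associations:
        links.setdefault(frozenset((a, b)), set()).add(kind)
    nodes = set().union(*links)
    return nodes, links


NODES, NETWORK_LINKS = build_network(PROPERTY_SEQUENCES, ASSOCIATIONS)
```

Keying links by `frozenset` keeps them undirected, and storing a set of labels per pair allows a property and an association to coexist on the same link before the weighting step is applied.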