M. Bramer (Ed.): Artificial Intelligence, LNAI 5640, pp. 193–216, 2009. © Springer-Verlag Berlin Heidelberg 2009

Intelligent User Profiling

Silvia Schiaffino1,2 and Analía Amandi1,2

1 ISISTAN Research Institute, Universidad Nacional del Centro de la Provincia de Buenos Aires, Campus Universitario, Argentina
2 CONICET, Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
{sschia,amandi}@exa.unicen.edu.ar

Abstract. User profiles or user models are vital in many areas in which it is essential to obtain knowledge about users of software applications. Examples of these areas are intelligent agents, adaptive systems, intelligent tutoring systems, recommender systems, intelligent e-commerce applications, and knowledge management systems. In this chapter we study the main issues regarding user profiles from the perspectives of these research fields. We examine what information constitutes a user profile; how the user profile is represented; how the user profile is acquired and built; and how the profile information is used. We also discuss some challenges and future trends in the intelligent user profiling area.

1 Introduction

A profile is a description of someone containing the most important or interesting facts about him or her. In the context of users of software applications, a user profile or user model contains essential information about an individual user. The motivation for building user profiles is that users differ in their preferences, interests, background and goals when using software applications. Discovering these differences is vital to providing users with personalized services.

The content of a user profile varies from one application domain to another. For example, if we consider an online newspaper domain, the user profile contains the types of news (topics) the user likes to read, the types of news (topics) the user does not like to read, the newspapers he usually reads, and the user’s reading habits and patterns. In a calendar management domain, the user profile contains information about the dates and times when the user usually schedules each type of activity in which he is involved, the priorities each activity feature has for the user, the relevance of each user contact, and the user’s scheduling and rescheduling habits. In other domains, personal information about the user, such as name, age, job, and hobbies, might be important.

Not only does the content of user profiles differ from one domain to another, but so does the way in which the information they contain is acquired. The content of a user profile can be explicitly provided by the user, or it has to be learned using some intelligent
technique. User profiling implies inferring unobservable information about users from observable information about them, that is, their actions or utterances (Zukerman and Albrecht, 2001). A wide variety of Artificial Intelligence techniques have been used for user profiling, such as case-based reasoning (Lenz et al., 1998; Godoy et al., 2004), Bayesian networks (Horvitz et al., 1998; Conati et al., 2002; Schiaffino and Amandi, 2005; Garcia et al., 2007), association rules (Adomavicius and Tuzhilin, 2001; Schiaffino and Amandi, 2006), genetic algorithms (Moukas, 1996; Yannibelli et al., 2006), and neural networks (Yasdi, 1999; Villaverde et al., 2006), among others.

The purpose of obtaining user profiles is also different in the various areas that use them. In adaptive systems, the user profile is used to provide the adaptation effect, that is, to behave differently for different users (Brusilovsky and Millán, 2007). In intelligent agents, particularly in interface agents, the user profile is used to provide personalized assistance to users with respect to some software application (Maes, 1994). In intelligent tutoring systems, the user profile or student model is used to guide students in their learning process according to their knowledge and learning styles (Garcia et al., 2007). In e-commerce applications, the user or customer profile is used to make personalized offers and to suggest or recommend products the user is supposed to like (Adomavicius and Tuzhilin, 2001). In knowledge management systems, the skills a user or employee has, the roles he takes within an organization, and his performance in these roles are used by managers or project leaders to assign him to the job position that suits him best (Sure et al., 2000). In recommender systems, the user profile contains ratings for items like movies, news or books, which are used to recommend potentially interesting items to him and to other users with similar tastes or interests (Resnick and Varian, 1997).

In this chapter we study user profiles from the different perspectives mentioned above. In Section 2 we describe what information constitutes a user profile. In Section 3 we examine the different ways in which we can acquire information about a user and then build a user profile. Section 4 focuses on intelligent user profiling techniques. Finally, Section 5 presents some future trends.

2 User Profile Contents

A user profile is a representation of information about an individual user that is essential for the (intelligent) application we are considering. This section describes the most common contents of user profiles: user interests; the user’s knowledge, background and skills; the user’s goals; user behaviour; the user’s interaction preferences; the user’s individual characteristics; and the user’s context. We analyze and provide examples for the different contents in areas like intelligent agents, adaptive systems, intelligent tutoring systems, recommender systems, and knowledge management systems.
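
The profile components listed above can be pictured as fields of a single record. The following Python sketch is only an illustration under that assumption: the class name `UserProfile`, the field names, and the chosen value types are hypothetical and do not come from any system cited in this chapter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserProfile:
    """Hypothetical container with one field per profile component."""
    user_id: str
    # topic -> relevance weight for this user
    interests: Dict[str, float] = field(default_factory=dict)
    # domain item or skill -> estimated level
    knowledge: Dict[str, float] = field(default_factory=dict)
    # what the user currently wants to achieve
    goals: List[str] = field(default_factory=list)
    # observed (action, time label) pairs
    behaviour: List[Tuple[str, str]] = field(default_factory=list)
    # situation -> preferred kind of assistance
    interaction_preferences: Dict[str, str] = field(default_factory=dict)
    # personal data such as age or job
    individual_characteristics: Dict[str, str] = field(default_factory=dict)
    # current context, for example location or device
    context: Dict[str, str] = field(default_factory=dict)


profile = UserProfile(user_id="u1")
profile.interests["football"] = 0.8
profile.goals.append("schedule a meeting")
```

Each field starts empty and is independent across users, which matches the idea that a profile is filled in gradually, either explicitly by the user or by a learning component.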
2.1 Interests

User interests are one of the most important parts (and typically the only part) of the user profile in information retrieval and filtering systems, recommender systems, some interface agents, and adaptive systems that are information-driven, such as encyclopedias, museum guides, and news systems (Brusilovsky and Millán, 2007). Interests can represent news topics, web page topics, document topics, work-related topics or hobby-related topics. Sometimes user interests are classified as short-term interests or long-term interests. The interest of a user in football may be a short-term interest if the user reads or listens to news about this topic only during the World Cup, or a long-term interest if the user is always interested in this topic. For example, NewsDude (Billsus and Pazzani, 1999), an interface agent that learns about a user’s interests in daily news stories, considers information about recent events as short-term interests, and a user’s general preferences for news stories as long-term interests.

The most common representations of user interests are keyword-based models. In these models, interests are represented by weighted vectors of keywords. Weights traditionally represent the relevance of the word for the user or within the topic. These representations are common in the Information Filtering and Information Retrieval areas. For example, Letizia (Lieberman et al., 2001a), a browsing assistant, uses TF-IDF (term frequency/inverse document frequency) vectors to model user interests. In this technique, the weight of each word is calculated by comparing the word frequency in a document against the word frequency in all the documents in a corpus (Salton and McGill, 1983). This technique is also used in NewsDude (Billsus and Pazzani, 1999), where news stories are converted to TF-IDF vectors.

A more powerful representation of user interests is through topic hierarchies (Godoy et al., 2004). Each node in the hierarchy represents a topic of interest for a user, which is defined by a set of representative words. This representation technique is important when we want to model not only general user interests such as sports or economy, but also the sub-topics of these interests that are relevant to a given user. For example, the user profile can indicate that a certain user is interested in documents talking about a famous football player and not in sports or football in general. An example of a topic hierarchy containing a user’s interests is shown in Figure 1.

Often, a topic ontology is used as the reference to construct a user interest profile. An ontology is a conceptualization of a domain into a human-understandable but machine-readable format consisting of entities, attributes, relationships, and axioms (Guarino and Giaretta, 1995). For instance, in Quickstep (Middleton et al., 2004), the authors represent user profiles in terms of a research paper topic ontology. This recommender system was built to help researchers in a computer science laboratory setting, representing user profiles with a research topic ontology and using ontological inference to assist the profiling process. Similarly, in (Liang et al., 2007) students’ interests within an e-learning system are determined using a topic ontology.
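
As a minimal sketch of the keyword-based representation described in this section, the following Python code builds TF-IDF weighted vectors from tokenised documents. It assumes one common TF-IDF variant (relative term frequency multiplied by log(N/df)); the function name `tf_idf_vectors` and the sample documents are hypothetical, and the exact weighting used by Letizia or NewsDude may differ.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per tokenised document.

    weight = (term count / document length) * log(N / document frequency)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical documents a user has read, already split into words.
docs = [
    "football world cup final".split(),
    "tennis final wimbledon".split(),
    "football team wins cup".split(),
]
vectors = tf_idf_vectors(docs)

# Keywords of the first document, most distinctive first.
ranked = sorted(vectors[0].items(), key=lambda kv: kv[1], reverse=True)
```

With this variant, a word that occurs in every document gets weight 0, while a word that is frequent in one document and rare in the corpus gets a high weight. A user interest vector could then be obtained, for instance, by combining the vectors of the documents the user found relevant.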
[Figure 1 content: a topic hierarchy rooted at ROOT, labelled "User Reading Experiences" and "User Topics of Interest". Each topic node carries a relevance value and weighted keywords:
(Relevance 0.5) economy 0.9, finances 0.8, dollar 0.8
(Relevance 0.7) championship 0.9, team 0.8, player 0.7
(Relevance 0.1) politics 0.8, vote 0.9, president 0.7
(Relevance 0.4) tennis 1.0, Wimbledon 0.7, ATP 0.9
(Relevance 0.3) football 1.0, world-cup 0.8, FIFA 0.8]

Fig. 1. Hierarchical representation of a user’s interests

2.2 Knowledge, Background and Skills

The knowledge the user has about the application domain, his background experience and his skills are important features within user profiles in different areas. In intelligent tutoring systems and adaptive educational systems, the student’s knowledge about the subject taught is vital to provide proper assistance to the student or to adapt the content of courses according to it. This knowledge can be represented in different ways. The most common representation is through a model that keeps track of the student’s knowledge about every element in the course knowledge base. The idea is to mark each knowledge item X with a value calculated as “student knowledge of X”. The value could be binary (knows - does not know), qualitative (good - average - bad) or quantitative, assigned as a probability of the student’s familiarity with the item X. For instance, in Cumulate (Brusilovsky et al., 2005), the state of a student’s knowledge is represented as a weighted overlay model covering a set of topics, and each educational activity can contribute to only one topic.

Another way of representing a user’s knowledge is through errors or misconceptions. In addition to (or instead of) modelling what the user knows, some works focus on modelling what the user does not know. For example, in (Chen and Hsieh, 2005) the authors aim at diagnosing learners’ common learning misconceptions during learning processes. They try to discover relationships between misconceptions.

Also, in many applications, the user’s knowledge about the underlying domain is important. Some systems categorize users as expert, intermediate, or novice, depending on how well they know the application domain. For example, MetaDoc (Boyle and Encarnacion, 1994) considers the knowledge users have about Unix, which is the underlying application domain in this system.
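
The overlay idea described earlier in this section, one value per knowledge item that can be read as binary, qualitative or quantitative, can be sketched as follows. This is a hypothetical illustration: the class `OverlayModel`, the update rule, and the thresholds are assumptions made for the example and are not the mechanism of Cumulate or of any other cited system.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OverlayModel:
    """Hypothetical overlay: one estimate in [0, 1] per knowledge item."""
    knowledge: Dict[str, float] = field(default_factory=dict)

    def update(self, item: str, evidence: float, rate: float = 0.3) -> None:
        # Move the estimate towards the observed evidence (0 = failure,
        # 1 = success). Illustrative rule only, chosen for simplicity.
        current = self.knowledge.get(item, 0.0)
        self.knowledge[item] = current + rate * (evidence - current)

    def knows(self, item: str, threshold: float = 0.5) -> bool:
        # Binary reading: knows / does not know (assumed threshold).
        return self.knowledge.get(item, 0.0) >= threshold

    def qualitative(self, item: str) -> str:
        # Qualitative reading: good / average / bad (assumed cut-offs).
        value = self.knowledge.get(item, 0.0)
        if value >= 0.7:
            return "good"
        if value >= 0.4:
            return "average"
        return "bad"


model = OverlayModel()
model.update("loops", evidence=1.0)   # the student solves one exercise
model.update("loops", evidence=1.0)   # and then a second one
```

Keeping a single quantitative value and deriving the binary and qualitative readings from it means the three views named in the text stay consistent with each other.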
Furthermore, user skills are key in areas like Knowledge Management. Within this area, skill management systems serve as technical platforms for mostly, though not exclusively, corporate-internal market places for skills and know-how. The systems are typically built on top of a database that contains profiles of employees and applicants. In this domain, profiles consist of numerous values for different skills and may be represented as vectors. In (Sure et al., 2000) the authors use the integers “0” (no knowledge), “1” (beginner), “2” (intermediate) and “3” (expert) as skill values. Examples of skills can be “Programming in Y” or “Administration of Server X”.

Finally, the user’s background refers to those user characteristics that are not directly related to the application domain. For instance, if we consider a tutoring system, the user’s job or profession, his work experience, his traveling experience, and the languages he speaks, among other information, constitute the user’s background. As an application example, in (Cawsey et al., 2007) the authors describe an adaptive information system in the healthcare domain that considers users’ literacy and medical background to provide them with information that they can understand. The representation of users’ background and skills is commonly done via stereotypes. We discuss them in Section 3.4.

2.3 Goals

Goals represent the user’s objective or purpose with respect to the application he is working with, that is, what the user wants to achieve. Goals are target tasks or subtasks at the focus of a user’s attention (Horvitz et al., 1998). If the user is browsing the Web, his goal is obtaining relevant information (this type of goal is known as an information need). If the user is working with an e-learning system, his goal is learning a certain subject. In a calendar management system, the user’s goals are scheduling new events or rescheduling conflicting events.

Determining what a user wants to do is not a trivial task. Plan recognition is a technique that aims at identifying the goal or intention of a user from the tasks he performs. In this context, a task corresponds to an action the user can perform in the software application, and a goal is a higher-level intention of the user, which will be accomplished by carrying out a set of tasks. Systems using plan recognition observe the input tasks of a user and try to find all possible plans by which the observed tasks can be explained. These possible explanations or candidate plans are narrowed as the user continues performing further tasks. Plan recognition has been applied in different areas such as intelligent tutoring (Greer and Kohenn, 1995), interface agents (Lesh et al., 1999; Armentano and Amandi, 2006), and collaborative planning (Huber and Durfee, 1994).

Goals or intentions can be represented in different ways. Figure 2 shows a Bayesian network representation of a user’s intentions in a calendar domain (Armentano and Amandi, 2006). In this representation, nodes represent user tasks and arcs represent probabilistic dependencies between tasks. Given evidence of a task performed by the user, the system can infer the next (most probable) task, and
198 S. Schiaffino and A. Amandi hence, the user’s goal. Similarly, the Lumiere project at Microsoft Research (Horvitz et al., 1998) uses Bayesian networks to infer a user’s needs by considering a user’s background, actions and queries (help requests). Based on the beliefs of a user’s needs and the utility theory of influence diagrams (an extension to Bayesian networks), an automated assistant provides help for users. In Andes (Gertner and VanLehn, 2000), plan recognition is necessary for the problem solving coach to select what step to suggest when a student asks for help. Since Andes wants to help students solve problems in their own way, it must determine what goal the student is probably trying to achieve, and suggest the action the student cannot perform due to lack of knowledge. 2.4 Behaviour Usually, the user’s behaviour with a software application is an important part of the user profile. If a given user behaviour is repetitive, then it represents a pattern that can be used by an adaptive system or an intelligent agent to adapt a web site or to assist the user according to the behaviour learnt. The type of behaviour modelled depends on the application domain. For example, CAP (Calendar APprentice) learns the scheduling behaviour of its user and learns rules that enable it to suggest the meeting duration, location, time, and date (Mitchell et al, 1994). In an intelligent e-commerce system, a behavioural profile models the customer’s actions (Adomavicius and Tuzhilin, 2001). Examples of behaviours in this domain are “When purchasing cereal, John Doe usually buys milk” and “On weekends, John Doe usually spends more than $100 on groceries”. In intelligent tutoring systems, the student behaviour is vital to assist him properly. In (Xu, 2002), a student profile is a set of pairs, where e is a behaviour of the student and t expresses the time when the behaviour occurs. t could be a point in time or an interval of time. 
In Xu’s work, there are two main types of student behaviours: reading a particular topic and making a choice in a quiz.

Sometimes behaviours are routine, that is, they show some kind of regularity or seasonality. For example, QueryGuesser (Schiaffino and Amandi, 2005) models a user’s routine queries to a database in a Laboratory Information Management System. In this agent, the user profile is composed of the queries each user performs and the moment when each query is generally made. The agent detects hourly, daily, weekly, and monthly behavioural patterns.

Fig. 2. Bayesian representation of a user’s goals

2.5 Interaction Preferences

A relatively new component of a user profile is interaction preferences, that is, information about the user’s interaction habits and preferences when he or she interacts with an interface agent (Schiaffino and Amandi, 2006). In interface agent technology, it is vital to know which agent actions the user expects in different contexts and the modality of these actions. A user may prefer warnings, suggestions, or actions on the user’s behalf. In addition, the agent can provide assistance by interrupting or not interrupting the user’s work. A user interaction preference then expresses the preferred agent action and modality for different situations or contexts.

As an illustration, consider an agent helping a user, John Smith, organize his calendar. Smith’s current task is to schedule a meeting with several participants for the following Saturday in a free time slot. From past experience, the agent knows that one participant will disagree with the meeting date, because he never attends Saturday meetings. The agent can warn the user about this problem, suggest another meeting date that considers all participants’ preferences and priorities, or do nothing. In this situation, some users would prefer a simple warning, while others would want suggestions about an alternative meeting date. In addition, when providing user assistance, agents can either interrupt the user’s work or not. The agent must learn when the user prefers each modality. Information about these user preferences is kept in the user interaction profile, namely the situations in which the user requires a suggestion to deal with a problem, needs only a warning about a problem, accepts an interruption from the agent, expects an action on his or her behalf, or wants a notification rather than an interruption.
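A user interaction profile of this kind can be pictured as a mapping from situations to a preferred agent action and modality. The following is a minimal sketch under our own assumptions: the names `Situation`, `AgentAction`, `Modality`, and `InteractionProfile`, the default preference, and the most-frequent-choice rule are all hypothetical and are not taken from (Schiaffino and Amandi, 2006).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from enum import Enum


class AgentAction(Enum):
    WARNING = "warning"
    SUGGESTION = "suggestion"
    ACT_ON_BEHALF = "act_on_behalf"
    DO_NOTHING = "do_nothing"


class Modality(Enum):
    INTERRUPT = "interrupt"  # break into the user's current work
    NOTIFY = "notify"        # assist without interrupting


@dataclass(frozen=True)
class Situation:
    task: str     # e.g. "schedule_meeting"
    problem: str  # e.g. "participant_unavailable"


class InteractionProfile:
    """Counts which (action, modality) the user accepted in each situation."""

    def __init__(self):
        self._accepted = defaultdict(Counter)

    def record(self, situation, action, modality):
        """Store one observation of assistance the user accepted."""
        self._accepted[situation][(action, modality)] += 1

    def preferred(self, situation,
                  default=(AgentAction.WARNING, Modality.NOTIFY)):
        """Return the most frequently accepted (action, modality) pair,
        or an assumed conservative default for unseen situations."""
        counts = self._accepted.get(situation)
        if not counts:
            return default
        return counts.most_common(1)[0][0]


profile = InteractionProfile()
saturday_clash = Situation("schedule_meeting", "participant_unavailable")
profile.record(saturday_clash, AgentAction.SUGGESTION, Modality.NOTIFY)
profile.record(saturday_clash, AgentAction.SUGGESTION, Modality.NOTIFY)
profile.record(saturday_clash, AgentAction.WARNING, Modality.INTERRUPT)
action, modality = profile.preferred(saturday_clash)
print(action.value, modality.value)  # suggestion notify
```

Counting accepted assistance is only one possible learning rule; a real agent would also need to use rejected or ignored assistance, which this sketch deliberately leaves out.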

2.6 Individual Characteristics

In some domains, personal information about the user is also part of the user profile. This item includes mainly demographic information such as gender, age, marital status, city, country, and number of children, among other features. For example, Figure 3 shows the demographic profile of a customer in Traveller, a tourism recommender system that recommends package holidays and tours to customers.

Fig. 3. Demographic profile of a customer in Traveller

On the other hand, a widely used user characteristic in intelligent tutoring systems and adaptive e-learning systems is the student’s learning style. A learning style model classifies students according to where they fit on a number of scales describing the ways in which they receive and process information. Several models and frameworks for learning styles have been proposed (Kolb, 1984; Felder and Silverman, 1988; Honey and Mumford, 1992; Litzinger and Osif, 1993). For example, Felder and Silverman’s model categorizes students as sensing/intuitive, visual/verbal, active/reflective, and sequential/global, depending on how they learn. Various systems consider learning styles, such as ARTHUR (Gilbert and Han, 1999), which models three learning styles (visual-interactive, reading-listener, textual); CS388 (Carver et al., 1996) and MAS-PLANG (Peña et al., 2002), which use Felder and Silverman’s styles; and the INSPIRE system (Grigoriadou et al., 2001), which uses the styles proposed by Honey and Mumford.

Finally, personality traits are also important features in a user profile. A trait is a temporally stable, cross-situational individual difference. One of the most famous personality models is OCEAN (Goldberg, 1993). This model comprises five personality dimensions: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Personality models and the methods to determine personality are subjects widely studied in psychology (McCrae and Costa, 1996; Wiggins et al., 1988). In the area of user profiling, various methods are used to detect a user’s personality. For example, in (Arya et al., 2006) facial actions are used as visual cues for detecting personality.

2.7 Contextual Information

The user’s context is a relatively new feature in user profiling. There are several definitions of context, mostly depending on the application domain. According to (Dey and Abowd, 1999), context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves. There are different types of contexts or contextual information that can be modelled within a user profile, as defined in (Goker and Myrhaug, 2002). The environmental context captures the entities that surround the user. These entities can, for instance, be things, services, temperature, light, humidity, noise, and persons. The personal context includes the physiological context and the mental context.
The physiological context can contain information like pulse, blood pressure, weight, glucose level, retinal pattern, and hair colour. The mental context can contain information like mood, expertise, anger, and stress. The social context describes the social aspects of the current user context. It can contain information about friends, neutrals, enemies, neighbours, co-workers, and relatives, for instance. The spatio-temporal context describes aspects of the user context relating to its time and spatial extent. It can contain attributes like time, location, or direction.

Context-aware systems (agents) are computing systems (agents) that provide relevant services and information to users based on their situational conditions or contexts (Dey and Abowd, 1999). In (Schiaffino and Amandi, 2006), for example, different types of assistance actions are executed by an agent depending on the task the user is carrying out and on the situation in which the user needs assistance. As regards users’ emotions or mood, RoCo (Ahn and Picard, 2005) models different user states, namely attentive, distracted, slumped, showing pleasure, and showing displeasure, and acts accordingly. Other examples of context-aware systems are the various location-based tourist guide projects, in which information is displayed depending on the current location of the user, such as (Yang et al., 1999).

2.8 Group Profiles

In contrast to individual user profiles, group profiles aim at combining individual user profiles to model a group. Group profiles are vital in those domains where it is necessary to make recommendations to groups of users rather than to individual users. Examples of these domains are tourism recommendation systems, movie recommenders, and adaptive television. In the first type of application, we find INTRIGUE (Ardissono et al., 2002), which recommends places to visit for tourist groups, taking into account characteristics of subgroups within that group (such as children and the disabled). Similarly, CATS (Collaborative Advisory Travel System) allows a group of users to simultaneously collaborate on choosing a skiing holiday package which satisfies the group as a whole (McCarthy et al., 2006).
Group user feedback is used to suggest products that satisfy the individual and the group. As regards TV, different strategies for combining individual user profiles to adapt to groups in an adaptive television application are discussed in (Masthoff, 2004). In (Yu et al., 2006) the authors propose a recommendation scheme that merges individual user profiles to form a common user profile, and then generates common recommendations according to the common user profile.

3 Obtaining User Profiles

To build a user profile, the information needed can be obtained explicitly, that is, provided directly by the user, or implicitly, through the observation of the user’s actions. In this section we describe these alternatives.

3.1 Explicit Information

The simplest way of obtaining information about users is through the data they input via forms or other user interfaces provided for this purpose. Usually, this type of information is optional, since users are often unwilling to fill in long forms providing information about themselves. Generally, the information gathered in this way is demographic, such as the user’s age, gender, job, birthday, marital status, and hobbies. For example, in (Adomavicius and Tuzhilin, 2001) this information constitutes the factual profile (name, gender, and date of birth), which is obtained by the e-commerce system from the customer’s data.

In addition, personal interests can be stated explicitly. For example, in NewsAgent (Godoy et al., 2004) the user can indicate, through a user interface, which sections of a digital newspaper he likes to read, which newspaper he prefers, or general topics of interest, such as football, and he can also rate pages as interesting or uninteresting while he is reading. Figure 4 shows the user interfaces for these purposes. In Syskill & Webert (Pazzani et al., 1996), users make explicit relevance judgments of pages explored while browsing the Web. Syskill & Webert learns a profile from the user’s ratings of pages and uses this profile to suggest other pages. The user can rate a page as either hot (two thumbs up), lukewarm (one thumb up and one thumb down), or cold (two thumbs down). The Apt Decision agent (Shearin and Lieberman, 2001) learns user preferences in the domain of rental real estate by observing the user’s critique of apartment features. Users provide a small number of criteria in the initial interaction, consisting of number of bedrooms, city, and price, then receive a display of sample apartments, and then react to any feature of any apartment independently, in any order.
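To make the idea of learning from explicit ratings concrete, the sketch below stores hot/lukewarm/cold judgments and turns them into per-term weights that can score unseen pages. It is an illustration under stated assumptions only: the class `ExplicitRatingProfile`, the numeric values assigned to the three ratings, and the averaging rule are hypothetical, and this is not the learning algorithm actually used by Syskill & Webert.

```python
from collections import defaultdict

# Assumed numeric encoding of the three explicit ratings.
RATING_SCORES = {"hot": 1.0, "lukewarm": 0.0, "cold": -1.0}


class ExplicitRatingProfile:
    """Keeps an average rating per term, built from the user's page ratings."""

    def __init__(self):
        self._score_sum = defaultdict(float)
        self._count = defaultdict(int)

    def rate(self, page_terms, rating):
        """Record the user's explicit rating for a page described by its terms."""
        score = RATING_SCORES[rating]  # raises KeyError on an unknown rating
        for term in set(page_terms):
            self._score_sum[term] += score
            self._count[term] += 1

    def term_weight(self, term):
        """Average rating of the pages containing this term (0.0 if unseen)."""
        n = self._count.get(term, 0)
        return self._score_sum.get(term, 0.0) / n if n else 0.0

    def score(self, page_terms):
        """Mean term weight of a page; higher means more likely interesting."""
        terms = set(page_terms)
        if not terms:
            return 0.0
        return sum(self.term_weight(t) for t in terms) / len(terms)


profile = ExplicitRatingProfile()
profile.rate(["football", "league", "goal"], "hot")
profile.rate(["stock", "market"], "cold")
profile.rate(["football", "transfer", "market"], "lukewarm")

print(profile.score(["football", "goal"]))  # 0.75
print(profile.score(["stock", "market"]))   # -0.75
```

Pages could then be suggested in decreasing order of `score`. The point of the sketch is that a handful of explicit judgments already gives a usable, if crude, profile.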
Fig. 4. Providing explicit information about a user’s interests

Another way of providing explicit information is through the "Programming by Example" (PBE) or "Programming by Demonstration" paradigm (Lieberman, 2001b). In this approach, the user demonstrates examples to the computer. A