A New Ontology-Based User Modeling Method for Personalized Recommendation

Jiangling Yuan, Hui Zhang, Jiangfeng Ni
State Key Laboratory of Software Development Environment
Beihang University, School of Computer Science
100191, Beijing, China
{jalunnier, hzhang, nijf}@nlsde.buaa.edu.cn

Abstract-Personalized recommendation is an effective way to address Internet information overload. In recommendation systems, user modeling is a crucial step: whether the model accurately describes the users' interests directly determines the quality of the personalized recommendations. At present, most personalized service systems describe users' preferences with keyword models or user-item models, but the vectors or matrices used in these models carry no semantic information, so it is difficult to model the users' interests and hobbies accurately, and it is also hard to extend those interests. Ontology, as a tool for describing domain knowledge, is powerful in conceptual description and logical reasoning. Computing the neighbor set of users or resources is another important step in recommendation, but the three commonly used similarity algorithms have shortcomings that sometimes make it difficult for the system to find similar users or resources. This paper presents a new ontology-based user modeling approach and an improved similarity algorithm. Our experiments show that the proposed user model effectively describes users' personalized preferences and that the improved similarity algorithm performs better than the three commonly used similarity algorithms.

Keywords-personalized recommendation; ontology; semantic reasoning; user modeling; similarity measure

978-1-4244-5540-9/10/$26.00 ©2010 IEEE

I. INTRODUCTION

The explosive growth of information available on the Web has prompted the need for Web personalization systems that understand and exploit user preferences to dynamically serve customized content to individual users [1]. How the user model is built determines whether the model can accurately describe the users' real interests and whether the system can recommend the right items, so user modeling has become the key step in personalized recommendation systems [2-3]. At present, most personalized recommendation systems use keyword vectors or a user-resource matrix to represent users' interests. However, as the numbers of users and resources in a system increase, the vectors or matrices grow tremendously, which lowers the efficiency of the system. Moreover, there are semantic relationships between the resources visited by users, but some commonly used models take no advantage of these relationships and others make only simple use of the semantic information, so these models cannot accurately describe the users' interests [4-6].

An ontology depicts domain knowledge: it provides a common understanding of the knowledge about an area, defines a common cognitive vocabulary, and gives clear definitions of the domain terms. This paper presents a new ontology-based user modeling method that represents users' interests with the ontology concept hierarchy tree and uses the reasoning and extension techniques of the ontology to mine users' potential interests. Experimental results show that this method describes users' interests more accurately.

In recommendation systems, the similarity measure plays an important role: it is the basic procedure for finding the neighbor set of users or resources. The three commonly used similarity algorithms are cosine-based similarity, correlation-based similarity and adjusted-cosine similarity [7-9]. In this paper, we briefly discuss the inherent drawbacks of these three algorithms and present an improved similarity algorithm that effectively overcomes them.

The rest of the paper is organized as follows. Section II proposes the domain ontology building approach. Section III presents the ontology-based user modeling method. Section IV presents an improved similarity algorithm called Simi-New. Experimental results are provided in Section V. Section VI concludes the paper.

II. DOMAIN ONTOLOGY CONSTRUCTION AND DATA PREPROCESSING

In this section we use the OWL Web Ontology Language developed by the W3C to build the domain ontology. This language can define the ontology structure, name space, basic elements (classes, individuals, properties) and ontology mapping relationships. We define the attributes of the ontology concepts and a variety of property relationships between them. In this paper, we take the rock and mineral fossils domain as an example. We combine automatic construction with hand-built components to build this domain ontology. First, we use the codes of the rock and mineral fossils resources to build the meta-data classification hierarchy tree. Second, we use the meta-data classification hierarchy tree to build the ontology concept hierarchy tree. Finally, we add the properties to the concept nodes by hand. Figure 1 shows part of the rock and mineral fossils meta-data classification hierarchy tree.
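The automatic step that turns resource codes into a classification tree can be sketched as follows. This is only an illustration under a stated assumption, not the authors' implementation: it assumes that a node's parent is the longest other code that is a proper prefix of its own code (as codes such as 002317, 00231713 and 0023171321 suggest). The function name `build_code_tree` is hypothetical.

```python
from collections import defaultdict


def build_code_tree(codes):
    """Return (roots, children) for a collection of hierarchical resource codes.

    Assumption (not stated in the paper): the parent of a code is the
    longest other known code that is a proper prefix of it.
    """
    known = set(codes)
    children = defaultdict(list)
    roots = []
    # Process shorter codes first so parents are listed before children.
    for code in sorted(known, key=lambda c: (len(c), c)):
        # Try ever-shorter prefixes until one matches a known code.
        parent = next(
            (code[:k] for k in range(len(code) - 1, 0, -1) if code[:k] in known),
            None,
        )
        if parent is None:
            roots.append(code)
        else:
            children[parent].append(code)
    return roots, dict(children)


# Codes and names taken from the paper's own example.
names = {
    "002317": "fossil",
    "00231713": "ancient spinal animal",
    "0023171321": "amphibian",
}
roots, children = build_code_tree(names)
for parent, kids in children.items():
    print(names[parent], "->", [names[k] for k in kids])
```

The resulting parent-to-children mapping is the kind of structure from which OWL classes and subclass relations could then be generated.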
Figure 1. Part of the rock and mineral fossils meta-data classification hierarchy tree (Layer One, Layer Two, Layer Three).

The automatic construction procedure of the rock and mineral fossils domain ontology is as follows. First, we read the codes of the rock and mineral fossils resources from the database, where a code is stored for every resource. For example, 002317 represents the fossil, 00231713 represents the ancient spinal animal, and 0023171321 represents the amphibian. Second, we extract the resources' names and relationships according to the codes. Third, we use the resources' names and relationships to build the meta-data classification hierarchy tree. Finally, we build the rock and mineral fossils domain ontology from the meta-data classification hierarchy tree according to the OWL language grammar.

Through the automatic construction of the rock and mineral fossils domain ontology we obtain the transitive properties of this ontology. In the user modeling process we mainly use the father-child relationship to compute scores for upper concept nodes from lower nodes, so the transitive properties basically meet the requirements of user modeling. The more complete the properties of the concept nodes are, the better the query extension is, so we also need to add symmetric, inverse and functional properties to the concept nodes. In this paper, we collect some additional properties from network resources such as Wikipedia and add them to the domain hierarchy tree by hand. The result is a satisfactory domain ontology that contains 493 concept nodes.

III. USER MODELING

In this section we present a new ontology-based user modeling method.
The user models are built in three steps. First, we analyze the web server logs to obtain the users' access scores on the leaf nodes of the ontology concept hierarchy tree. Second, we use ontology reasoning technology and the access scores on the leaf nodes to derive the access scores on the non-leaf nodes. Finally, we merge the access score vector on the leaf nodes with the score vector on the non-leaf nodes to build the ontology-based user model, denoted by $V = \{v_1, v_2, \ldots, v_s\}$. Here $v_i$ ($1 \le i \le s$) denotes how much the user likes the $i$-th concept node, and $s$ denotes the number of nodes in the ontology concept hierarchy tree. The construction of the leaf-node and non-leaf-node score vectors is described below.

Let the access score vector on the leaf nodes be $v' = \{v'_1, v'_2, \ldots, v'_t\}$, where $v'_i$ ($1 \le i \le t$) represents how much the user likes the $i$-th concept leaf node and $t$ denotes the number of leaf nodes in the ontology concept hierarchy tree. The value $v'_i$ is calculated as

$$v'_i = \frac{\log_2\left(\sum_{R \in l_i} F_R\right)}{\sum_{j=1}^{t} \log_2\left(\sum_{R \in l_j} F_R\right)} \tag{1}$$

Here $F_R$ is the number of times the user visits the resource $R$ belonging to the concept leaf node $l_i$. This gives the access score vector on the leaf nodes.

There are semantic relationships between father and child nodes in the ontology concept hierarchy tree, so we can use ontology reasoning technology to derive the access scores on the non-leaf nodes from the scores on the leaf nodes. Given that the hierarchy tree has $t$ leaf nodes, we use $P_1, P_2, \ldots, P_t$ to denote all shortest paths from the root to the leaf nodes, and the node sequence $(n_{i0}, n_{i1}, \ldots, n_{iy})$ to denote the path $P_i$ from the root $n_{i0}$ to the leaf node $n_{iy}$ in the hierarchy tree.
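Before moving on to the non-leaf nodes, the leaf-node score can be illustrated with a small sketch. It assumes that Eq. (1) divides the base-2 logarithm of a leaf's total visit count by the sum of the same quantity over all leaves; it further assumes (this is not stated in the paper) that a leaf with no visits contributes 0, because the logarithm is undefined at 0. The function name `leaf_scores` and the leaf names other than "amphibian" are hypothetical.

```python
import math


def leaf_scores(visits_per_leaf):
    """Map each leaf to its normalized log2 visit score.

    visits_per_leaf: {leaf: [F_R for each resource R under that leaf]}
    """
    logs = {}
    for leaf, counts in visits_per_leaf.items():
        total = sum(counts)
        # Assumption: unvisited leaves get 0 because log2(0) is undefined.
        logs[leaf] = math.log2(total) if total > 0 else 0.0
    norm = sum(logs.values())
    if norm == 0:
        # No leaf carries any weight; return all zeros instead of dividing by 0.
        return {leaf: 0.0 for leaf in logs}
    return {leaf: value / norm for leaf, value in logs.items()}


# 8 visits under "amphibian", 2 under "reptile", none under "fish".
scores = leaf_scores({"amphibian": [3, 5], "reptile": [2], "fish": []})
print(scores)
```

With these counts the logarithms are 3, 1 and 0, so the normalized scores are 0.75, 0.25 and 0. The logarithm damps the effect of very frequently visited leaves compared with using raw counts.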
The score of the node $n_{ix}$ ($0 \le x \le y$) in the path $P_i$ is defined as $s(n_{ix})$, which is calculated as follows:

[Equation (2): a two-case definition of $s(n_{ix})$, one case for $0 \le x < y$ that involves the term $b(n_{i(x+1)}) + 1$ and one case for $x = y$; the full expression is not legible in the source.]

Here $n_{i(x+1)}$ denotes the son node of $n_{ix}$ in path $P_i$, $b(n_{i(x+1)})$ is the number of brothers of $n_{i(x+1)}$ in the whole tree, and $\alpha$ is a reasoning factor that is determined by the application (in this paper $\alpha = 1.8$). The scores for all paths are computed in the same way. The score that the user obtains for node $n_x$ is given by

$$s(n_x) = \sum_{i=1}^{t} s(n_{ix}) \tag{3}$$

From these scores we get the access score vector on the non-leaf nodes, denoted by $v'' = \{v''_1, v''_2, \ldots, v''_r\}$.

So far, we have obtained the score vector $v'$ on the leaf nodes and the vector $v''$ on the non-leaf nodes. After that, we combine vector $v'$ with $v''$ to generate the ontology-based
user model denoted by $V = \{v'_1, v'_2, \ldots, v'_t, v''_1, v''_2, \ldots, v''_r\}$, which can also be written as $V = \{v_1, v_2, \ldots, v_s\}$.

IV. NEW SIMILARITY ALGORITHM

In Section III we built the ontology-based user model, in which concept nodes describe the user's interests. A domain ontology concept hierarchy tree generally has hundreds of nodes, but most users are interested in only a few of them, so the system must be able to handle data sparsity. The sparser the data is, the harder it is for the system to find similar users or similar resources. In addition, users' differing search habits also call for a new similarity algorithm. The similarity algorithm is therefore one of the key parts of a recommendation system.

A. Three Commonly Used Similarity Algorithms

There are a number of different ways to compute the similarity between users. Here we present three commonly used methods: cosine-based similarity, correlation-based similarity and adjusted-cosine similarity [7-9].

In the cosine-based similarity algorithm, two users are thought of as two vectors in the $n$-dimensional resource space. The similarity between them is measured by the cosine of the angle between these two vectors. Let $\vec{i}$ be the score vector rated by user $i$ in the $n$-dimensional resource space, and $\vec{j}$ the score vector rated by user $j$. The similarity between users $i$ and $j$ is given by

$$sim(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \, \|\vec{j}\|} \tag{4}$$

where "$\cdot$" denotes the dot product of the two vectors.

In the correlation-based similarity algorithm, the similarity between two users $i$ and $j$ is measured by computing the Pearson-r correlation $corr_{i,j}$. To make the correlation computation accurate we must first isolate the co-rated cases. Let the set of resources that are rated by both user $i$ and user $j$ be denoted by $U_{ij}$; then the correlation similarity is given by

$$sim(i, j) = \frac{\sum_{u \in U_{ij}} (R_{i,u} - \bar{R}_i)(R_{j,u} - \bar{R}_j)}{\sqrt{\sum_{u \in U_{ij}} (R_{i,u} - \bar{R}_i)^2} \, \sqrt{\sum_{u \in U_{ij}} (R_{j,u} - \bar{R}_j)^2}} \tag{5}$$

Here $R_{i,u}$ denotes the rating of the $i$-th user on resource $u$, and $\bar{R}_i$ is the average of the $i$-th user's ratings on all resources.

Computing similarity with the basic cosine measure has one important drawback: the differences in rating scale between different users are not taken into account. The adjusted cosine similarity offsets this drawback by subtracting the corresponding resource average from each co-rated pair. Formally, the similarity between users $i$ and $j$ using this scheme is given by

$$sim(i, j) = \frac{\sum_{u \in U_{ij}} (R_{i,u} - \bar{R}_u)(R_{j,u} - \bar{R}_u)}{\sqrt{\sum_{u \in U_{ij}} (R_{i,u} - \bar{R}_u)^2} \, \sqrt{\sum_{u \in U_{ij}} (R_{j,u} - \bar{R}_u)^2}} \tag{6}$$

Here $\bar{R}_u$ is the average rating of the resource $u$.

B. An Improved Similarity Algorithm

This paper presents an improved similarity algorithm called Simi-New. The algorithm is mainly based on the following assumptions [10]:

a) If two users have scored more common resources and fewer non-common resources, then the similarity between these two users is higher;

b) If the scores given by two users to the common resources are closer, then the similarity between these two users is higher;

c) If the angle between two users' score vectors is smaller, then the similarity between these two users is higher.

Following these assumptions, we count the number of resources rated by both users $i$ and $j$, denoted by $NOC_{ij}$. We also count the number of resources rated by user $i$ or user $j$ but not by both, denoted by $NOD_{ij}$. The ratio of the two numbers, denoted by $R_{ij}$, is given by

$$R_{ij} = \frac{NOC_{ij}}{NOD_{ij}} \tag{7}$$

The closeness of the scores on the common resources is defined as

$$Dis(i, j) = \sum_{u \in U_{ij}} (R_{i,u} - R_{j,u})^2 \tag{8}$$

Here $U_{ij}$ is the set of common resources rated by both users $i$ and $j$, and $R_{i,u}$ is the score that user $i$ gives to resource $u$. The angle between two score vectors is measured by the Tanimoto coefficient [11], which is given by

$$S(i, j) = \frac{R_i \cdot R_j}{R_i \cdot R_i + R_j \cdot R_j - R_i \cdot R_j} \tag{9}$$
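The three ingredients behind assumptions a) to c) can be sketched in a few lines. This is an illustration only: it assumes that a score of 0 means "not rated" (the paper does not say so explicitly), it returns the two counts NOC and NOD rather than their ratio, it takes the closeness measure to be the sum of squared score differences over the co-rated resources, and all function names are hypothetical. The vectors are the x, y and z that the paper uses as an example in this section.

```python
def rated(v):
    """Indices of resources the user has rated (assumption: 0 means 'not rated')."""
    return {k for k, score in enumerate(v) if score != 0}


def common_and_distinct(a, b):
    """Return (NOC, NOD): resources rated by both users, and by exactly one."""
    ra, rb = rated(a), rated(b)
    return len(ra & rb), len(ra ^ rb)


def closeness(a, b):
    """Sum of squared score differences over the co-rated resources."""
    return sum((a[k] - b[k]) ** 2 for k in rated(a) & rated(b))


def tanimoto(a, b):
    """Tanimoto coefficient of two score vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    denom = sum(p * p for p in a) + sum(q * q for q in b) - dot
    # The denominator is 0 only when both vectors are all zeros.
    return dot / denom if denom else 0.0


x, y, z = [0, 5, 5, 0], [0, 5, 1, 0], [0, 5, 4, 2]
print(common_and_distinct(x, z), closeness(x, z), tanimoto(x, z))
print(common_and_distinct(x, y), closeness(x, y), tanimoto(x, y))
```

For x and z this gives 2 common and 1 non-common resource, a closeness of 1 and a Tanimoto coefficient of 0.9; for x and y it gives 2 common and 0 non-common resources, a closeness of 16 and a coefficient of about 0.65. Note that a pair with no non-common resources needs special handling if the ratio of the two counts is taken directly.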
In Eq. (9), $R_i$ is the score vector rated by the $i$-th user in the $n$-dimensional resource space. We define the similarity between users $i$ and $j$ as $sim(i, j)$, which is computed as follows:

[Equation (10): the definition of $sim(i, j)$, in which the weight $w$, the ratio $R_{ij}$, the factor $(1 - w)$ and the Tanimoto coefficient $S(i, j)$ can be made out; the full expression is not legible in the source.]

Here $w$ (0 [...] $sim(x, y)$, which accords with the real condition. Suppose the score vectors rated by users $x$, $y$ and $z$ are $x = \{0, 5, 5, 0\}$, $y = \{0, 5, 1, 0\}$ and $z = \{0, 5, 4, 2\}$, respectively. Because $x$ is a self-equal vector (its rated scores are all equal), the similarity between $x$ and $y$ and the similarity between $x$ and $z$ cannot be computed by the correlation-based similarity or the adjusted-cosine similarity. However, these similarities can be computed and distinguished by the Simi-New similarity algorithm. The Simi-New similarity algorithm not only overcomes the problem of self-equal vectors, but can also be adapted to different applications through the adjustable parameter $w$.

V. EXPERIMENTAL RESULTS AND ANALYSIS

All the experimental data used in this paper are collected from the rock and mineral fossils resources site (http://www.nimrf.net.cn). We select five months of Web logs, from July 1, 2009 to November 30, 2009. These logs contain 2695 users who view 75702 pages in 27673 visits.

A. Experiments Of The Similarity Algorithms

In experiment 1, we select 5418 records visited by 335 users as the training data, and 2051 records as the test set used to compute the Mean Absolute Error (MAE). We choose the User-Based Collaborative Filtering Algorithm as the prediction algorithm. The result of experiment 1 is shown in Figure 2.

Figure 2. MAE of four similarity algorithms (cosine-based, correlation-based, adjusted cosine, Simi-New).

From Figure 2, we can see that the Simi-New algorithm proposed in this paper outperforms the other three similarity algorithms in prediction precision.
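The evaluation loop behind experiment 1 can be sketched as follows. The paper does not spell out its prediction formula, so this sketch assumes the common mean-centered weighted average for user-based collaborative filtering; the function names, the toy ratings and the constant similarity function are all hypothetical. MAE is the average absolute difference between predicted and actual scores.

```python
def mean_rating(ratings):
    """Average of one user's scores; ratings is {item: score}."""
    return sum(ratings.values()) / len(ratings)


def predict(target, item, data, sim):
    """Predict target's score for item from the neighbours who rated it.

    data: {user: {item: score}}; sim: function(user_a, user_b) -> float.
    Assumed formula: target mean plus similarity-weighted neighbour deviations.
    """
    base = mean_rating(data[target])
    num = den = 0.0
    for other, ratings in data.items():
        if other == target or item not in ratings:
            continue
        weight = sim(target, other)
        num += weight * (ratings[item] - mean_rating(ratings))
        den += abs(weight)
    # None signals that no usable neighbour exists for this item.
    return base + num / den if den else None


def mean_absolute_error(pairs):
    """pairs: iterable of (predicted, actual) scores."""
    pairs = list(pairs)
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Toy data; any of the four similarity measures could be passed as `sim`.
data = {
    "u1": {"a": 4, "b": 2},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
guess = predict("u1", "c", data, sim=lambda a, b: 1.0)
print(guess, mean_absolute_error([(guess, 3)]))
```

Swapping the `sim` argument while keeping the training and test sets fixed is one way to compare similarity measures on equal terms, which is the comparison reported in Figure 2.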
In experiment 2, we use the same data set as in experiment 1 and compare the coverage of the four similarity algorithms. The result of experiment 2 is shown in Figure 3.

Figure 3. Coverage of four similarity algorithms (cosine-based, correlation-based, adjusted cosine, Simi-New).

From Figure 3, we can see that the Simi-New similarity algorithm is slightly better than the cosine-based similarity algorithm in recommendation coverage and much better than the adjusted-cosine and correlation-based similarity algorithms. In short, the Simi-New similarity algorithm is superior to the other three commonly used similarity algorithms in both the precision and the coverage of the recommendations.

B. Experiments Of The Ontology-Based User Model

Experiments 3 and 4 compare the ontology-based user model with the user-resource matrix-based user model to verify the accuracy and validity of the former. In experiment 3, we select 5418 records from the first four months of Web logs as the training data, and 2051 records from the last month of Web logs as the test set. There
are 335 users in the experiment data, so the sparsity of the data is 5.05%. We use the above four similarity algorithms to verify the superiority of the ontology-based user model presented in this paper. The result of experiment 3 is shown in Figure 4.

Figure 4. MAE of two user models (user-resource matrix and ontology) under the cosine-based, adjusted cosine, correlation-based, and Simi-New similarity algorithms.

As Figure 4 shows, the ontology-based user model is superior to the user-resource matrix-based user model when the sparsity of the data is 5.05%. In experiment 4, we select 7334 records from the Web logs of the first four months as the training data and 2527 records from the Web logs of the last month as the test set. There are 889 users in the experiment data, so the sparsity of the data is 2.62%. The result of experiment 4 is shown in Figure 5.

Figure 5. MAE of two user models under the cosine-based, adjusted cosine, correlation-based, and Simi-New similarity algorithms.

As Figure 5 shows, the ontology-based user model is superior to the user-resource matrix-based user model when the sparsity of the data is 2.62%. From these results we conclude that the ontology-based user model presented in this paper is better than the commonly used user-resource matrix-based user model. The two charts also show that the superiority of the Simi-New similarity algorithm becomes more obvious as the data become sparser.

VI. CONCLUSION

This paper presents a new ontology-based user modeling method. We use the ontology concept hierarchy tree to build the user models. The method introduces ontology and semantic concepts into user modeling and makes use of the semantic relationships between the concept nodes, so the user model presented in this paper can effectively describe the users' personalized preferences.
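
To make the idea of a hierarchy-based user model concrete, the following is a minimal sketch of how interest recorded on a visited concept can also be credited to its more general ancestors. The hierarchy, the concept names, the `decay` factor, and the function `build_profile` are all assumptions made for illustration; they are not the ontology or the weighting rule used in this paper.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: child -> parent (None marks the root).
PARENT = {
    "Thing": None,
    "Sports": "Thing",
    "Football": "Sports",
    "Tennis": "Sports",
    "Music": "Thing",
}


def build_profile(visits, decay=0.5):
    """Accumulate interest weights on visited concepts and their ancestors.

    visits: iterable of concept names, one per logged resource access.
    decay:  assumed factor by which a visit's weight shrinks per level upward.
    """
    profile = defaultdict(float)
    for concept in visits:
        weight = 1.0
        node = concept
        # Walk from the visited concept up to the root of the hierarchy.
        while node is not None:
            profile[node] += weight
            weight *= decay
            node = PARENT[node]
    return dict(profile)


# Two visits to Football pages and one visit to a Tennis page.
profile = build_profile(["Football", "Football", "Tennis"])
print(profile)
```

In this sketch the parent concept Sports receives weight from both Football and Tennis, so two users who visit different sports can still overlap at the Sports node, which a flat user-resource matrix would not capture.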

We also propose an improved similarity algorithm, which addresses shortcomings of the traditional similarity algorithms and improves the precision of personalized recommendation. The new ontology-based user models and the improved similarity algorithm effectively improve the quality of personalized service systems and satisfy the users' growing personalized needs.

ACKNOWLEDGMENT

The research is supported by the fund of the State Key Laboratory of Software Development Environment, SKLSDE-2009ZX-12.

REFERENCES

[1] O. Nasraoui, "World Wide Web Personalization," In: J. Wang (ed), Encyclopedia of Data Mining and Data Warehousing, Idea Group, 2005.
[2] Modi P. J. and Shen W. M., "Collaborative Multiagent learning for classification tasks," In: Proceedings of the Fifth International Conference on Autonomous Agents, 2001.
[3] Socha K. and Kisiel-Dorohinicki M., "Agent-based evolutionary multi-objective optimisation," Hawaii, USA: Proceedings of CEC'02 - Congress on Evolutionary Computation, 2002.
[4] W. Liu, F. Jin, and X. Zhang, "Ontology-Based User Modeling for E-Commerce System," In: Proc of the 3rd International Conference on Pervasive Computing and Applications. Alexandria, Egypt, 2008, pp.260-263.
[5] J. Trajkova and S. Gauch, "Improving Ontology-Based User Profiles," In: Proc of 2004'RIAO. Avignon, France, 2004.
[6] S. Berkovsky, T. Kuflik, and F. Ricci, "Mediation of user models for enhanced personalization in recommender systems," In: Proc of User Modelling and User-Adapted Interaction. 2008, pp.245-286.
[7] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," In: Proc of the 10th International World Wide Web Conference. New York, USA, 2001, pp.285-295.
[8] M. Deshpande and G. Karypis, "Item-based Top-N recommendation algorithms," ACM Transactions on Information Systems, 2004, 22(1):143-177.
[9] G. Karypis, "Evaluation of item-based top-n recommendation algorithms," In: Proc of the Tenth International Conference on Information and Knowledge Management (CIKM). New York, USA, 2001, pp.247-254.
[10] H. Dapeng, L. Qianhui, and Z. Jingmin, "An Improved Similarity Algorithm for Personalized Recommendation," International Forum on Computer Science-Technology and Applications, IFCSTA, 2009.
[11] L. Jinzhong, "Pattern Recognition Introduction," Beijing, Higher Education Press, 1994: 300-301.