
Reading material for the course "E-business": Keyword clustering for user interest profiling refinement within paper recommender systems


X. Tang, Q. Zeng / The Journal of Systems and Software 85 (2012) 87–101

Fig. 8. An example of a topic network graph.

subject ontology, the semantic relations between topics can be obtained by an inverse process of keyword clustering. Here, we give the definition of topic network graphs to reveal these semantic relations:

Definition 8 (Topic network graph).
A topic network graph is a graph TNG = (ETNS, ITNS, ES), where

(1) ETNS is the node set in which each node represents a topic in which one or more papers have been accessed by user_u; the weight of the node topic_i equals EI(topic_i, user_u).
(2) ITNS is the node set in which each node represents a topic in which no papers have been accessed by user_u; the weight of the node topic_x equals the measure of user_u's implicit interest in topic_x, denoted II(topic_x, user_u). The calculation of II(topic_x, user_u) is given in Algorithm 3.
(3) ES ⊆ ((ETNS ∪ ITNS) × (ETNS ∪ ITNS)) is the set of edges, representing the relevance between different topics in TNS.
(4) WET(E_{i,j}) = sum(MES(topic_i, topic_j)), where E_{i,j} represents the edge connecting topic_i and topic_j; WET(E_{i,j}) denotes the weight of E_{i,j}; MES(topic_i, topic_j) is the edge set in which each edge corresponds to an edge in WKG connecting one keyword belonging to topic_i and one keyword belonging to topic_j; the function sum( ) calculates the sum of the weights of the edges in MES(topic_i, topic_j).

Thereby, we can now obtain a topic network graph for each secondary subject. Fig. 8 shows a sample topic network graph, provided for illustrative purposes only. In Fig. 8, the yellow circles represent topics belonging to ETNS and the white circles represent topics belonging to ITNS.

Table 2
Topics of the secondary subject "databases".

Topic name              Connectivity of central keyword node
data mining             32
SQL                     14
clustering              11
data warehouse          10
distributed database    9
relational database     9
spatial data mining     7

The relevance between topics is a meaningful way to infer implicit interests. As long as a user has explicit interests in some topics, the implicit interests of the user in the topics that the user has not yet explored can be estimated reasonably by the following algorithm.

Algorithm 3 (Implicit interest profiler).
Let VC(topic_i) denote the connectivity of the node topic_i, which equals the sum of the weights of all edges linked to topic_i; let II(topic_x, user_u) be user_u's implicit interest value for topic_x. Then II(topic_x, user_u) can be computed as follows:

$$ \mathit{II}(topic_x, user_u) = \sum_{topic_i \in \mathit{Neighbors}(topic_x)} \frac{\mathit{WET}(E_{i,x})^2}{\mathit{VC}(topic_x) \cdot \mathit{VC}(topic_i)} \cdot \mathit{EI}(topic_i, user_u), $$

in which topic_x ∈ ITNS and topic_i ∈ ETNS; Neighbors(topic_x) stands for the set of topic nodes neighboring topic_x in TNG.

6. Prototype system and empirical evaluation

6.1. SPRS: a prototype for a scientific paper recommender system

In order to assess the precision and effectiveness of our approach, we developed a scientific paper recommender prototype system (SPRS), which is illustrated in Fig. 9. The homepage of SPRS has three columns. The left column consists of three parts: user information, the papers currently being read by the current user, and the papers that the current user wants to read. The middle column presents the paper recommendations separately, based on the records of three behaviors: download, browse and comment. The right column presents three lists of papers from the behavioral records.

Based on the user profiles obtained, SPRS is able to provide paper recommendations to users. Recommendations are produced by correlating the users' topics of interest with the papers classified under these topics. The explicit interests and the implicit interests of a user are each used, separately, for recommendation. The top-N items rule is implemented to control the number of recommendations. Papers are presented to a specific user in order of their recommendation confidence; a paper's recommendation confidence for a user equals the product of the paper's classification confidence in the topic that maximizes this classification confidence and the user's degree of interest in that topic.
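As a rough illustration, Algorithm 3 and the ranking rule just described can be sketched in Python as follows. This is a minimal sketch, not the SPRS implementation: the toy edge weights, interest values, paper identifiers and the function names `connectivity`, `implicit_interest` and `recommend` are all assumptions made for the example. It also assumes the reading of Algorithm 3 in which the squared edge weight WET(E_{i,x})^2 is divided by VC(topic_x) · VC(topic_i) and then multiplied by EI(topic_i, user_u).

```python
from collections import defaultdict

# Hypothetical topic network graph: WET(E_ij) keyed by unordered topic pairs.
# Topic names are borrowed from Table 2; the weights are invented.
EDGES = {
    ("data mining", "clustering"): 4.0,
    ("data mining", "spatial data mining"): 2.0,
    ("clustering", "spatial data mining"): 1.0,
    ("SQL", "relational database"): 3.0,
}

# Hypothetical explicit interests EI(topic, user) for the topics in ETNS.
EXPLICIT = {"data mining": 0.6, "clustering": 0.3, "SQL": 0.1}


def connectivity(edges):
    """VC(topic): sum of the weights of all edges linked to each topic."""
    vc = defaultdict(float)
    for (a, b), weight in edges.items():
        vc[a] += weight
        vc[b] += weight
    return dict(vc)


def implicit_interest(topic_x, edges, explicit):
    """II(topic_x, user): weighted sum over the ETNS neighbors of topic_x."""
    vc = connectivity(edges)
    total = 0.0
    for (a, b), weight in edges.items():
        if topic_x not in (a, b):
            continue
        neighbor = b if a == topic_x else a
        if neighbor in explicit:  # only topics with explicit interest count
            total += weight ** 2 / (vc[topic_x] * vc[neighbor]) * explicit[neighbor]
    return total


def recommend(papers, interest, top_n):
    """Rank papers by recommendation confidence and keep the top N.

    papers maps a paper id to {topic: classification confidence}.
    The confidence is the paper's highest classification confidence
    multiplied by the user's interest in that same topic.
    """
    scored = []
    for paper_id, confidences in papers.items():
        best_topic = max(confidences, key=confidences.get)
        score = confidences[best_topic] * interest.get(best_topic, 0.0)
        scored.append((score, paper_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [paper_id for _, paper_id in scored[:top_n]]


# Usage: estimate implicit interests for unexplored topics, then rank papers.
implicit = {
    topic: implicit_interest(topic, EDGES, EXPLICIT)
    for topic in ("spatial data mining", "relational database")
}
papers = {
    "p1": {"spatial data mining": 0.9, "clustering": 0.1},
    "p2": {"relational database": 0.8},
    "p3": {"spatial data mining": 0.5},
}
print(implicit)
print(recommend(papers, implicit, top_n=2))
```

Passing `EXPLICIT` instead of `implicit` to `recommend` would give the explicit-interest recommendation list, which mirrors the way the two profiles are used separately.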
Depending on which interest profile is used for recommendation, the interest degree can be either the explicit interest or the implicit interest. The behavior histories of users, including browse, download and comment histories, are recorded and used to calculate their interest profiles. Papers that a user has already accessed are ignored when recommendations are provided. The recommendations are presented separately. If users are interested in these recommendations, they will tend to click on them, unless the items have already been visited.

6.2. Empirical evaluation

In this section, we conducted three experiments using SPRS to examine the ontology extension method and the extended ontology-based profiling approach proposed in this paper.

6.2.1. Experiment 1: the acquisition of the extended subject ontology

First, we evaluated the extension of the original subject ontology by generating keyword clusters for the weighted keyword graphs of the "artificial intelligence" and "databases" subjects. We used 200 papers from each subject in this experiment. The papers were taken from the Science Paper Online website (Sciencepaper Online, 2010). The clustering result for the "artificial intelligence" subject has already been used in the examples shown in Section 4, so in this section we only discuss the clustering result for "databases". The weighted keyword graph of "databases" with an edge weight threshold value of 1 is shown in Fig. 10. Then we assign 2 as the

