distribution ρ(η). Each new vertex connects with m edges to the vertices already in the network, and the probability of connecting to a vertex i is proportional to the degree and the fitness of vertex i,

$$\Pi_i = \frac{\eta_i k_i}{\sum_j \eta_j k_j} \qquad (2)$$

where k_i and η_i denote the degree and the fitness of vertex i.

It is well known that the more frequent a word is, the more available it is for production and comprehension processes. This phenomenon is known as the frequency effect (referring to the individual's whole experience) or the recency effect (referring to the individual's recent experience). It suggests that preferential attachment is very likely to shape the scale-free distribution of degrees [7].

To deal with characteristics (2) and (3), we treated the words in the papers that a user possesses as the vertices of the scale-free network, and word co-occurrences as the edges. We also calculated the frequency and the fitness of words from Equation 2. Tracking the user's interests or specialities over time, we considered that they are determined by the words that appear frequently in the paper viewing history and the words that appear most recently. Namely, we represent the frequency of the network as the user's longer-term interest and the fitness of the network as the user's shorter-term interest.

2.2. Construction of user's model

This section outlines the construction of the user model based on the user's paper viewing history.
Our method uses papers that a user possesses to construct a network based on word frequency and word co-occurrences. The process of our method is as follows:

Step 1 Papers are written in natural language and require modification before processing. The most frequent terms, such as 'a' and 'it', are considered to be common and meaningless [14]. For this reason, we first remove the stopwords used in the SMART system [21].

Step 2 Based on the assumption that terms with a common stem usually have similar meanings, various suffixes such as -ED, -ING, -ION, and -IONS are removed to produce the stem word. For example, PLAY, PLAYS, PLAYED, and PLAYING are all reduced to PLAY. Our method employs Porter's suffix stripping algorithm [19].

Step 3 Our method continuously adds words and word co-occurrences to the network. As previously mentioned, words are the network vertices and word co-occurrences are the network edges. If a word or a word co-occurrence has already been added to the network, it is not added again.

Figure 2. A creation procedure of user's model

The construction of the user's model [22] can be seen in Figure 2. The user model generation mechanism takes the papers that a user possesses, eliminates stopwords, reduces words to their stems, and constructs or adjusts the user's model. The comparison and selection mechanism can also be seen in Figure 2: the constructed user's model is compared to papers in the Papits database. By comparing the user's paper viewing history to the user's model, Papits can recommend papers that are of interest to the user.

Figure 3 shows the user's model built from a paper [7] as a preprocessed network. Each vertex in Figure 3 is drawn as a square labelled with its word, and each edge is drawn as a line between two vertices. The square at the core is the frequent word, which represents the user's core word.
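The three steps above, together with Equation 2, can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that the text does not state: the stopword list is a tiny stand-in for the SMART list, `stem` is a crude suffix stripper and not Porter's algorithm, co-occurrence is taken to mean "appearing in the same sentence", and the fitness values passed to `attachment_probabilities` are supplied by the caller rather than derived from the reading history. The names `UserModel`, `preprocess`, `stem`, and `attachment_probabilities` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (assumption); the paper uses the SMART list.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}

# Crude stand-in for Porter's suffix stripping algorithm (assumption).
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Steps 1 and 2: lowercase, drop stopwords, reduce words to stems."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


class UserModel:
    """Step 3: words are vertices, word co-occurrences are edges."""

    def __init__(self):
        self.frequency = Counter()  # stem -> number of occurrences
        self.edges = set()          # each edge is a frozenset of two stems

    def add_paper(self, text):
        # Assumption: two words co-occur when they share a sentence.
        for sentence in re.split(r"[.!?]+", text):
            stems = preprocess(sentence)
            self.frequency.update(stems)
            for u, v in combinations(sorted(set(stems)), 2):
                # A set ignores edges that were already added.
                self.edges.add(frozenset((u, v)))

    def degree(self, word):
        """Number of distinct words that co-occur with `word`."""
        return sum(1 for edge in self.edges if word in edge)


def attachment_probabilities(degree, fitness):
    """Equation 2: Pi_i = eta_i * k_i / sum_j(eta_j * k_j)."""
    weights = {w: fitness[w] * degree[w] for w in degree}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all vertices have zero degree or zero fitness")
    return {w: x / total for w, x in weights.items()}


if __name__ == "__main__":
    model = UserModel()
    model.add_paper("The user plays networks. Played words and playing networks.")
    print(model.frequency.most_common(2))  # [('play', 3), ('network', 2)]
```

Storing edges in a set mirrors the rule in Step 3 that an existing word or co-occurrence is not added twice, while the separate `Counter` still records how often each word appears.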
For example, we measured the frequency and fitness of the words using a paper [7] (hereinafter called “paper A”), a paper [6] (hereinafter called “paper B”), and a paper [4] (hereinafter called “paper C”). Both paper A and paper B describe language and networks, and paper C describes networks.

Table 1 shows a list of the top ten most frequent words, with the frequency and fitness of each word. Fitness I in Table 1 is calculated in the order of paper A, paper B, and paper C. Fitness II in Table 1 is calculated in the order of paper A, paper C, and paper B. As the fitness values in Table 1 show, even if different users read the same papers, a different value will be calculated if they read them at different times. Thus, the latest paper which a user reads changes the fitness.

An interface for paper recommendation using Papits can be seen in Figure 4. Inside the bold line in Figure 4 is the paper recommendation with title, authors, and paper rele-

Proceedings of the 2005 International Workshop on Data Engineering Issues in E-Commerce (DEEC’05) 0-7695-2401-X/05 $20.00 © 2005 IEEE