situation in which the term is used. The difference is a characteristic of the source and is used for selection. Terms used in each source are distinguished by the words that occur in a source and their frequencies of occurrence. However, methods using only statistical data face the problems caused by polysemous words. In this thesaurus-based method, the meanings of a term are distinguished by the relationship between the term and other terms.
$$\mathrm{sim}(X, Y) = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

where $n$ is the number of terms that appear in $X$ and $Y$, and $x_i$ and $y_i$ are the elements of the $i$th row of the square matrix constructed from a document $X$ and a document $Y$.

4.3. IRM

IRM [16] denotes the unconditional probability of a frequent term $g \in G$ as the expected probability $p_g$, and the total number of co-occurrences of term $w_i$ and the frequent terms $G$ as $f_G(w_i)$. The frequency of co-occurrence of term $w_i$ and term $g \in G$ is written as $freq(w_i, g)$. The $\chi^2$ statistic is defined as follows (the subscript $j$ represents "in document $j$"):

$$\chi^2_{ij} = \sum_{g \in G} \frac{\left(freq(w_{ij}, g) - f_G(w_{ij})\, p_g\right)^2}{f_G(w_{ij})\, p_g}$$

If $\chi^2(w) > \chi^2_\alpha$, the null hypothesis is rejected with significance level $\alpha$ ($\chi^2_\alpha$ is normally obtained from statistical tables, or by integral calculation). The term $f_G(w_{ij})\, p_g$ represents the expected frequency of co-occurrence, and $(freq(w_{ij}, g) - f_G(w_{ij})\, p_g)$ represents the difference between the expected and observed frequencies. Therefore, a large $\chi^2_{ij}$ indicates that the co-occurrence of term $w_i$ in document $j$ shows a strong bias. IRM uses the $\chi^2$ measure as an index of bias, not for tests of hypotheses.

4.4. Experimental Result

Figure 6 shows the precision of recommendation correctness for each method. The horizontal axis of Figure 6 shows the checked point of the time stamp, and the vertical axis shows the precision of recommendation. The precision of the other existing methods (VSM, Thesaurus, IRM) and of Equation 6 stays level or moves downward. However, the precision of Equation 7 of our method moves upward.

[Figure 6. An Experimental Result. Vertical axis: precision of recommendation (0 to 0.6). Horizontal axis: six months ago, four months ago, two months ago. Series: VSM, Thesaurus, IRM, Our method (Eq 6), Our method (Eq 7).]

Table 2 shows how effectively each method resolves characteristic (2) mentioned in Section 1. From the point of view of characteristic (2) in Table 2, the precision of the other methods and of Equation 6 of our method moves downward.
Other methods cannot resolve characteristic (2), i.e., whether or not recommended papers are novel. However, the precision of Equation 7 of our method moves upward. That is, the topic model mentioned in Section 2.3 is effective in resolving characteristic (2), i.e., whether or not recommended papers are novel.

Table 3 shows how effectively each method resolves characteristic (3) mentioned in Section 1. From the point of view of characteristic (3) in Table 3, the precision of all other methods moves downward. Other methods cannot resolve characteristic (3), i.e., whether or not recommended papers are of interest at the present moment. However, Equations 6 and 7 are effective in resolving characteristic (3), i.e., whether or not recommended papers are of interest at the present moment.

These experimental results show that the correctness of recommendation of our method is higher than that of the other methods: VSM, the co-occurrence-based thesaurus, and IRM.

5. Related Works

Information recommendation is helpful in reducing the noise in a document and preventing information overload. Several methods of information filtering have been reported. Miura [17] proposed an adaptive Web which dynamically changed information content and exhibition due

Proceedings of the 2005 International Workshop on Data Engineering Issues in E-Commerce (DEEC'05) 0-7695-2401-X/05 $20.00 © 2005 IEEE
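
As a rough illustration of the $\chi^2$ bias index defined in Section 4.3, the following Python sketch evaluates the sum over $g \in G$ for a single term in a single document. The function name `chi_square_bias`, the dictionary-based inputs, and the sample counts and probabilities are assumptions made for this example; they do not come from IRM [16] or from the experiments reported here. The sketch also assumes that the expected probabilities $p_g$ have already been estimated elsewhere.

```python
def chi_square_bias(freq, p):
    """Chi-square bias of one term's co-occurrence with the frequent terms G.

    freq: dict mapping each frequent term g to freq(w, g), the observed
          number of co-occurrences of the term w with g (hypothetical input).
    p:    dict mapping each frequent term g to its expected probability p_g.
    """
    # f_G(w): total number of co-occurrences of w with all frequent terms.
    f_G = sum(freq.get(g, 0) for g in p)
    chi2 = 0.0
    for g, p_g in p.items():
        expected = f_G * p_g            # expected co-occurrence frequency
        if expected > 0:                # skip terms with no expected mass
            observed = freq.get(g, 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Made-up expected probabilities for three frequent terms.
p = {"model": 0.5, "data": 0.3, "web": 0.2}

# A term that co-occurs mostly with "model" is biased.
biased = {"model": 8, "data": 1, "web": 1}
# A term whose co-occurrences match the expected proportions is not.
unbiased = {"model": 5, "data": 3, "web": 2}

print(round(chi_square_bias(biased, p), 3))    # about 3.633
print(round(chi_square_bias(unbiased, p), 3))  # 0.0
```

Because IRM uses the value only as an index of bias, the sketch returns the raw statistic and does not compare it with a critical value $\chi^2_\alpha$.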