[Figure 1, eight panels. Panels (a) to (f) plot DEM, OEM-beta, OEM-appr, and OEM-full against paper batches (0 to 25): (a) arXiv-TH and (b) arXiv-PH, average log-likelihood; (c) arXiv-TH (K=250) and (d) arXiv-PH (K=250), recall; (e) arXiv-TH and (f) arXiv-PH, average normalized rank. Panels (g) and (h) plot topic proportions against time: (g) paper sets cited at the 8001st unique time (Topics 7, 15, 44, 46); (h) paper sets cited at the 8005th unique time (Topics 7, 10, 37).]

Figure 1: (a) and (b) show the average held-out log-likelihood of testing citation events. (c) and (d) show the recall of top-K recommendation lists. (e) and (f) show the average held-out normalized ranks of testing citation events. Since all models have the same initial parameters after the building and training phases, they all have the same performance on the first testing batch, as can be seen in (a) to (f). (g) and (h) show the topic evolution of the sets of papers cited at the 8001st and 8005th unique times. To avoid clutter, we show only the topics with the largest proportions (top topics).

Table 3: Computation time (in seconds) of OEM-full and OEM-appr with λ = 0.1.

CITER PERCENTAGE   2%      5%      10%     20%     30%     50%     100%
OEM-FULL           0.13    0.43    0.87    1.42    1.96    2.61    3.91
OEM-APPR           0.06    0.22    0.41    0.70    0.95    1.29    1.94

Table 4: Average held-out log-likelihood when citer percentage is 10%.

λ                  10^-4   0.01    0.1     0.5     1       2       10^4
LOG-LIKELIHOOD     -8.61   -8.33   -8.15   -8.28   -8.33   -8.35   -8.56

Table 5: Average held-out log-likelihood when λ = 0.1.

CITER PERCENTAGE   2%      5%      10%     20%     30%     50%     100%
LOG-LIKELIHOOD     -8.94   -8.43   -8.15   -8.10   -8.09   -8.03   -7.98
AVERAGE TIME       0.13    0.43    0.87    1.42    1.96    2.61    3.91

and node features [Kataria et al., 2011; Hu et al., 2012; Krafft et al., 2012]. Generally, it is very time-consuming to simultaneously update the topics and topic proportions for time-varying data.

Instead of using an existing online LDA model [Canini et al., 2009; Hoffman et al., 2010], we choose to adjust the topic proportions of papers directly. This is because online inference for LDA operates on the text contents of the papers, so updating all the LDA vectors takes much more time. In our OEM, by contrast, we only need to solve small convex optimization problems to update the vectors.

6 Conclusion

In this paper, an online egocentric model (OEM) is proposed for evolving citation network modeling.
By adaptively learning the parameters and topic features over time, OEM overcomes the problem of DEM, whose predictive accuracy decreases significantly over time. Experimental results on real-world citation networks demonstrate that OEM can achieve very promising performance in real applications. Although the experiments in this paper cover only paper citation networks, our model can be generalized to other types of networks, as stated in [Vu et al., 2011b]; we will pursue this in future work.

7 Acknowledgements

This work is supported by the NSFC (No. 61100125), the 863 Program of China (No. 2011AA01A202), and the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT1158, PCSIRT).