Online Egocentric Models for Citation Networks

Hao Wang and Wu-Jun Li
Shanghai Key Laboratory of Scalable Computing and Systems
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
js05212@sjtu.edu.cn, liwujun@cs.sjtu.edu.cn

Abstract

With the emergence of large-scale evolving (time-varying) networks, dynamic network analysis (DNA) has become a very hot research topic in recent years. Although many DNA methods have been proposed by researchers from different communities, most of them can only model snapshot data recorded at a very rough temporal granularity. Recently, some DNA models have been proposed that can model large-scale citation networks at a fine temporal granularity. However, they suffer from a significant decrease of accuracy over time because the learned parameters or node features are static (fixed) during the prediction process for evolving citation networks. In this paper, we propose a novel model, called the online egocentric model (OEM), to learn time-varying parameters and node features for evolving citation networks. Experimental results on real-world citation networks show that our OEM can not only prevent the prediction accuracy from decreasing over time but also uncover the evolution of topics in citation networks.

1 Introduction

Network analysis [Goldenberg et al., 2009; Li and Yeung, 2009; Li et al., 2009a; 2009b; Wang et al., 2010; Li et al., 2011; Li and Yeung, 2012; Zhu, 2012; McAuley and Leskovec, 2012; Kim and Leskovec, 2012; Myers et al., 2012], and especially dynamic network analysis (DNA), has become increasingly important in many fields like social science and biology. Although there has been a lot of work on DNA, most of it focuses either on large-scale data at a very rough temporal granularity [Fu et al., 2009; Wyatt et al., 2010; Hanneke et al., 2010; Richard et al., 2012; Sarkar et al., 2012; Jin et al., 2011; Wang and Groth, 2011; Nori et al., 2011] or on small networks at a fine temporal granularity [Wasserman, 1980; Snijders, 2005]. Recently, the dynamic egocentric model (DEM) [Vu et al., 2011b], which is based on multivariate counting processes, has been proposed and successfully applied to model large-scale evolving citation networks at the fine temporal granularity of individual time-stamped events.

Although DEM can dynamically update the link features (statistics) of the nodes (papers), the learned parameters and topic features of DEM¹ are static (fixed) during the prediction process for evolving networks. Hence, DEM suffers from a decrease of accuracy over time, because typically both the parameters and the topic features of the papers evolve over time.

¹In [Vu et al., 2011b], there are two variants of DEM. One models only the link features, and the other models both the link and topic features (textual information). Unless otherwise stated, the DEM in this paper refers to the variant with both link and topic features, because it achieves far better accuracy than DEM without topic features [Vu et al., 2011b], and we can always obtain the topic features for a paper if we want.

For example, one of the link features reflects the in-degree (number of citations) of a paper up to some time point. As time goes on, the cumulative number of citations for existing papers becomes larger and larger. Hence, the distribution of citations over the whole data set will change over time, and as a consequence the corresponding parameter, which typically reflects the distribution of the features, should also change over time. At first sight, it may seem confusing that the topic features of a paper can change over time, because the content of a published paper is typically static. However, the citations to an existing paper are dynamic, so it is more reasonable to combine both the citation and the content information to decide the topic of a paper.
For example, a paper about neural networks that was considered highly related to the topic of psychology in the 1950s may be more likely to be classified as a machine learning paper today, because more and more machine learning papers cite that neural network paper. Hence, it is clear that the topic features will also change over time. Without the ability to adaptively learn the parameters and topic features, DEM fails to model the evolution of networks. This phenomenon of decreasing prediction accuracy over time can also be observed in the experimental results in Figure 2 of [Vu et al., 2011b].

In this paper, we propose an online extension of DEM, called the online egocentric model (OEM), to capture the evolution of both topic features and model parameters. The contributions of this paper are briefly outlined as follows:

• OEM takes the evolution of both topic features and parameters into consideration and maintains high prediction accuracy as time elapses.
• During the online training of OEM, we can also uncover the evolution of topic features for each paper and the propagation of topic features between pairs of papers.

• Extensive experiments on two real-world citation networks are performed to demonstrate the effectiveness of our novel model.

2 Dynamic Egocentric Model

In this section, we briefly review DEM [Vu et al., 2011b], on which our work is based. For ease of understanding, we use the same notation as in [Vu et al., 2011b].

Let n denote the total number of nodes (papers) in the network. DEM models a dynamic (citation) network by placing a counting process N_i(t) on each node i (i = 1, 2, ..., n), where N_i(t) denotes the cumulative number of events associated with node i until time t. The definition of events depends on context; in citation networks, an event corresponds to a citation.

Although a continuous-time model can be estimated by maximizing the full likelihood of the counting process, for citation networks it is more practical to estimate the parameters associated with the time-dependent statistics at event times by maximizing the partial likelihood. DEM thus maximizes the following likelihood of the whole network:

$$L(\beta) = \prod_{e=1}^{m} \frac{\exp(\beta^T s_{i_e}(t_e))}{\sum_{i=1}^{n} Y_i(t_e) \exp(\beta^T s_i(t_e))}, \qquad (1)$$

where m is the total number of citation events, e is the index of each citation event, i_e denotes the paper cited in event e, t_e denotes the time of event e, Y_i(t) is 1 if node i already exists at time t and 0 otherwise, s_i(t_e) denotes the feature vector of node i at time t_e, and β is a vector of parameters to learn.

The features in s_i(t_e) can be divided into two types: link features (statistics) and topic features. In [Vu et al., 2011b], eight link features, including three preferential attachment statistics, three triangle statistics and two out-path statistics, are extracted for each node. Fifty topic features are extracted by performing Latent Dirichlet Allocation (LDA) [Blei et al., 2003] on the abstracts of the papers. More specifically, assuming the paper arriving at time t_e is i, we can compute the topic features of any existing paper j as

$$s_j^{LDA}(t_e) = \theta_i \circ \theta_j,$$

where θ_i denotes the topic proportions of paper i and ∘ denotes element-wise multiplication.

Hence, s_i(t_e) is a vector of 58 features, with the first 8 being link features and the last 50 being topic features. Correspondingly, β is a vector of length 58. More details about the features can be found in [Vu et al., 2011b].

3 Online Egocentric Model

During the prediction process for evolving networks, the link features of nodes are dynamically updated in DEM. However, neither the learned parameters (β) nor the topic features (θ_i) of DEM change with the evolution of the network, which causes the accuracy of DEM to decrease over time. In this section, we present our online egocentric model (OEM) to solve the problems faced by DEM. The basic idea is to adaptively update the parameters and topic features after some new events are observed.

Although we could relearn the whole LDA model from the collection of papers, this would in general be very time-consuming even with the online LDA model [Hoffman et al., 2010]. Hence, in this paper we learn only the topic proportions θ with the topics fixed. This is reasonable because the main topics in citation networks are relatively stable, although the topic proportions of some papers will change over time.
We only need to update the topics themselves after a long time period; from our experiments, we find that good performance can still be achieved by updating only the topic proportions. Hence, OEM minimizes the following objective function after observing some new events:

$$\min_{\beta,\,\omega} \; -\log L(\beta, \omega) + \lambda \sum_{k=1}^{n} \|\omega_k - \theta_k\|_2^2 \qquad (2)$$
$$\text{subject to: } \omega_k \succeq 0, \; \mathbf{1}^T \omega_k = 1,$$

where ω_k is the new topic-proportion vector of node k that needs to be learned, θ_k is the current topic-proportion vector of node k, and ω = {ω_k}_{k=1}^n. L(β, ω) has the same definition as L(β) in (1) except that both β and the topic proportions are treated as variables². The constraint ω_k ⪰ 0 requires each element of ω_k to be non-negative, 1 is a vector of all 1s, and together the constraints guarantee that the elements of ω_k are non-negative and sum to 1. λ is a hyperparameter controlling the tradeoff between the two terms.

²Please note that L(β, ω) is different from L(β). In L(β), only β is a variable and ω is a constant (fixed value).

When a new event or a set of new events is observed, the second term in (2) guarantees that the updated topic proportions ω_k will not move too far away from the current topic proportions θ_k. Furthermore, we use the current β as initialization to get the updated β. Hence, by effectively using the information of the existing events, we obtain an online learning algorithm.

It is easy to see that the optimization problem in (2) is not jointly convex in (β, ω). However, we can prove that the objective function is convex in either β or ω with the other variable fixed. In this paper, we design an alternating projection algorithm to find the solutions: each time we fix one variable and update the other. The procedure is briefly outlined as follows:

• online β step: Fix ω, and apply Newton's method to update the parameter β, using the current β for initialization;

• online topic step: Fix β, and minimize the problem in (2) to get the updated topic proportions ω_k based on the current topic proportions θ_k.

The above procedure is repeated for several iterations until some termination condition is satisfied. We can prove that the learning algorithm is convergent.
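To make the online β step concrete, the following is a minimal Python sketch of one Newton update on the negative log partial likelihood in (1). The data layout (a list of at-risk feature matrices per event) and the small ridge term are our own simplifications, not the paper's implementation:

```python
import numpy as np

def newton_beta_step(beta, events, ridge=1e-6):
    """One Newton update of beta on the negative log partial likelihood.

    events: list of (S, cited) pairs; S is the (n_at_risk, d) feature
    matrix of the papers at risk at one event time, and `cited` is the
    row index of the paper actually cited at that event.
    """
    d = beta.shape[0]
    grad = np.zeros(d)
    hess = ridge * np.eye(d)  # small ridge keeps the Newton solve stable
    for S, cited in events:
        logits = S @ beta
        logits -= logits.max()          # subtract max for numerical stability
        p = np.exp(logits)
        p /= p.sum()                    # softmax over the papers at risk
        mean = p @ S                    # model-expected feature vector E[s]
        grad += mean - S[cited]         # gradient of -log L for this event
        hess += S.T @ (p[:, None] * S) - np.outer(mean, mean)  # event Hessian
    return beta - np.linalg.solve(hess, grad)

# Toy usage: 3 events, 20 papers at risk each, 58 features per paper.
rng = np.random.default_rng(0)
events = [(rng.normal(size=(20, 58)), int(rng.integers(20))) for _ in range(3)]
beta = newton_beta_step(np.zeros(58), events)
```

In OEM, this update would be warm-started at the β learned from the previous mini-batch and alternated with the topic step until the objective stops decreasing.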
Mini-batches: In the OEM introduced above, every time a new paper i arrives, we can add it to the network and iterate between the online β step and the online topic step until convergence. However, this is computationally expensive for large-scale citation networks. A common technique is to consider multiple events per update. Not only can this kind of mini-batch strategy save computational cost, it also reduces noise [Hoffman et al., 2010]. Hence, in our implementation, rather than updating after each event, we perform one update for every q citation events; q is set to about 1500 in our experiments.

The rest of this section details the algorithms for online β learning and online topic-proportion learning.

3.1 Online β Step

With ω fixed, the objective function for learning β is

$$L(\beta) = \prod_{e=x}^{x+q-1} \frac{\exp(\beta^T s_{i_e}(t_e))}{\sum_{i=1}^{n} Y_i(t_e) \exp(\beta^T s_i(t_e))},$$

where x is the starting event of the mini-batch of q events.

To avoid walking through all the citation events when updating the parameter β, we can use a training window to restrict training to a small subset of the citation events. With a training window of width W_t (1 ≤ W_t ≤ q), β can be trained by optimizing

$$L_w(\beta) = \prod_{e=x+q-W_t}^{x+q-1} \frac{\exp(\beta^T s_{i_e}(t_e))}{\sum_{i=1}^{n} Y_i(t_e) \exp(\beta^T s_i(t_e))}.$$

Furthermore, we can cache the link features of each node to further cut the computation time, as done in [Vu et al., 2011b].

3.2 Online Topic Step

In this section, we first formulate the full online topic step to learn the updated topic proportions ω_k. After that, we derive an approximative online topic step to speed up the optimization process.

Full Online Topic Step

It would be very time-consuming to update all the topic proportions in ω at once, so we also design an alternating algorithm for updating ω: each time we optimize the topic proportions of one paper, say ω_k, with all the others {ω_i | i ≠ k} fixed. Given a mini-batch of size q, if node k is cited at citation events e_1, e_2, ..., e_p and not cited at events e_{p+1}, e_{p+2}, ..., e_q (note that e_2 does not necessarily happen before e_{p+2} although its subscript is smaller), the function f(ω_k) we optimize is

$$f(\omega_k) = -\log\left(\prod_{i=1}^{p} \frac{\alpha_i \exp(a_i^T \omega_k)}{A_i + \alpha_i \exp(a_i^T \omega_k)} \prod_{u=p+1}^{q} \frac{C_u}{B_u + \gamma_u \exp(b_u^T \omega_k)}\right) + \lambda \|\omega_k - \theta_k\|_2^2, \qquad (3)$$

where

$$\alpha_i = \exp(\beta_l^T s_k^l(t_{e_i})), \qquad \gamma_u = \exp(\beta_l^T s_k^l(t_{e_u})),$$
$$A_i = \sum_{j \neq k} Y_j(t_{e_i}) \exp(\beta^T s_j(t_{e_i})), \qquad B_u = \sum_{j \neq k} Y_j(t_{e_u}) \exp(\beta^T s_j(t_{e_u})),$$
$$a_i = \beta_t \circ \theta_i, \qquad b_u = \beta_t \circ \theta_u.$$

Here, β_l contains the first 8 elements of β (corresponding to the link features), β_t contains the last 50 elements of β (corresponding to the topic features), θ_i is the topic-proportion vector of the citer at citation event e_i, s_k^l(t_{e_i}) is the link features (first 8 features) of node k at citation event e_i, and C_u is a constant irrelevant to ω_k.

The first- and second-order derivatives of (3) are

$$\frac{\partial f}{\partial \omega_k} = -\sum_{i=1}^{p} a_i + \sum_{i=1}^{p} \frac{a_i \alpha_i \exp(a_i^T \omega_k)}{A_i + \alpha_i \exp(a_i^T \omega_k)} + \sum_{u=p+1}^{q} \frac{b_u \gamma_u \exp(b_u^T \omega_k)}{B_u + \gamma_u \exp(b_u^T \omega_k)} + 2\lambda (\omega_k - \theta_k), \qquad (4)$$

$$\frac{\partial^2 f}{\partial \omega_k^2} = \sum_{i=1}^{p} \frac{A_i \alpha_i a_i a_i^T \exp(a_i^T \omega_k)}{(A_i + \alpha_i \exp(a_i^T \omega_k))^2} + \sum_{u=p+1}^{q} \frac{B_u \gamma_u b_u b_u^T \exp(b_u^T \omega_k)}{(B_u + \gamma_u \exp(b_u^T \omega_k))^2} + 2\lambda I,$$

where I is an identity matrix. It is easy to prove that the second-order derivative (Hessian) is positive definite (PD). Hence, the function in (3) is convex, and we can use a standard solver to find a globally optimal solution.

Approximative Online Topic Step

In (4), A_i is far larger than a_i α_i exp(a_i^T ω_k) and α_i exp(a_i^T ω_k), and p is relatively small in each batch.
Similarly, B_u is far larger than b_u γ_u exp(b_u^T ω_k) and γ_u exp(b_u^T ω_k), and (q − p) is relatively small. Hence, the second and third terms in (4) are much smaller than the other two, and we can remove these two smaller terms to get the following approximative gradient:

$$\frac{\partial f}{\partial \omega_k} \approx -\sum_{i=1}^{p} a_i + 2\lambda (\omega_k - \theta_k).$$

From this approximative gradient, we can recover the following approximative objective function of (2):

$$\min \; -\sum_{i=1}^{p} a_i^T \omega_k + \lambda \sum_{k=1}^{n} \|\omega_k - \theta_k\|_2^2 \qquad (5)$$
$$\text{subject to: } \omega_k \succeq 0, \; \mathbf{1}^T \omega_k = 1.$$

We call the OEM variant in (5) approximative OEM and the original OEM in (2) full OEM. In our experiments, we find that approximative OEM achieves accuracy comparable to that of full OEM at much lower computational cost.
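A useful observation (our own, not stated in the paper) is that problem (5) decouples across papers: for each ω_k, minimizing −Σ_i a_i^T ω_k + λ‖ω_k − θ_k‖² over the simplex is equivalent to the Euclidean projection of θ_k + Σ_i a_i / (2λ) onto the probability simplex. A minimal sketch under this assumption, with hypothetical function names:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}
    (sort-and-threshold algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                           # sort in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]   # last feasible index
    tau = (css[rho] - 1.0) / (rho + 1.0)           # shift threshold
    return np.maximum(v - tau, 0.0)

def approx_topic_step(theta_k, a_vectors, lam):
    """Solve (5) for one paper: minimize -sum_i a_i^T w + lam*||w - theta_k||^2
    over the simplex by projecting theta_k + sum_i a_i / (2*lam)."""
    c = np.sum(a_vectors, axis=0)
    return project_to_simplex(theta_k + c / (2.0 * lam))

# Toy usage: 50 topics, three citing events, lambda = 0.1.
rng = np.random.default_rng(0)
theta_k = np.full(50, 1.0 / 50)
a_vectors = rng.uniform(size=(3, 50)) * 0.01       # stand-ins for a_i
omega_k = approx_topic_step(theta_k, a_vectors, lam=0.1)
assert omega_k.min() >= 0 and abs(omega_k.sum() - 1.0) < 1e-9
```

This follows from completing the square: −c^T ω + λ‖ω − θ‖² = λ‖ω − (θ + c/(2λ))‖² plus a constant, which explains why the approximative step is so much cheaper than solving (3) with a general convex solver.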
3.3 Convergence Analysis

In each iteration, the learning algorithm ensures that the objective function value always decreases. Furthermore, the objective function is bounded below by 0. Hence, the learning algorithm will converge.

4 Experiment

We apply DEM and OEM to two citation networks and compare their results. Furthermore, we also analyze the evolution of the papers' topic proportions.

4.1 Data Sets

Since this paper mainly focuses on citation network analysis, which is one of the most important applications of dynamic network analysis, we use two citation data sets, arXiv-TH and arXiv-PH, crawled from arXiv³. The general information of these data sets is summarized in Table 1.

³http://snap.stanford.edu/data

Table 1: Information of data sets.
DATA SET   #PAPERS   #CITATIONS   #UNIQUE TIMES
arXiv-TH   14226     100025       10500
arXiv-PH   16526     125311       1591

The arXiv-TH data set is a collection of articles on high energy physics theory. It spans from 1993 to 1997 with high-resolution timestamps (millisecond resolution). The arXiv-PH data set is a collection of articles on high energy physics phenomenology. It spans from 1993 to 1997 on a daily scale. Since the resolution is high enough, we assume that every new paper joins the network at a unique time; obviously, more than one citation event can happen at each unique time. As mentioned in the previous section, we update the topic proportions and parameters batch by batch. More specifically, we partition the data sets into mini-batches, where each mini-batch contains the citation events happening in a span of unique times. For arXiv-TH the number of unique times per batch is 100, and for arXiv-PH it is 20. The number of events in each mini-batch is about 1500.

4.2 Baseline

We choose the following four models for comparison in the experiments:

• DEM: The original DEM with 8 link features and 50 topic features. Note that the original DEM is not online, and its parameters and topic features are fixed after training.

• OEM-β: The OEM with only the online β step, where β is adaptively updated while the topic features (topic proportions) of each paper do not change over time.

• OEM-full: The full OEM with both online β and topic steps, where both the topic features and parameters are adaptively learned over time using the objective function in (2).

• OEM-appr: The approximative OEM with both online β and topic steps, where both the topic features and parameters are adaptively learned over time using the approximative objective function in (5).

4.3 Evaluation Metrics

As in [Vu et al., 2011b], we evaluate the above models with the following three metrics:

• Average held-out log-likelihood: By taking the logarithm of the likelihood L(β) in (1) for each testing citation event, we get the held-out log-likelihood. We divide the sum of the log-likelihoods over all testing events in a batch by the number of events in that batch to get the average held-out log-likelihood of that batch. A higher average held-out log-likelihood indicates better testing accuracy.

• Recall of top-K recommendation list: The recall is defined as the fraction of true citation events appearing in the top K (measured by likelihood) possible citation events, where K is the cut-point.
• Average held-out normalized rank: The "rank" of each citation event is the position of the true citation in the list of possible citations sorted in decreasing order of likelihood. The rank is then normalized by the number of possible citations. A lower rank indicates better predictive power.

4.4 Results and Analysis

As in DEM [Vu et al., 2011b], we split each data set into three parts, used for the building phase, training phase and testing phase, respectively. The building phase aims to build up the statistics of the citation networks; it is relatively long in order to mitigate the truncation effect (citation events happening before 1993 are not contained in the data sets) and avoid biases. In the training phase, we train the initial model parameters and the topic features. To fully demonstrate and compare the predictive power of the models, we use a relatively long testing phase, divided into 24 batches. Note that the statistics (link features) change dynamically in both the training and testing phases. The sizes (measured by the number of citation events) of the phases are listed in Table 2.

Table 2: Data set partition for building, training and testing.
DATA SET   BUILDING   TRAINING   TESTING
arXiv-TH   62239      1465       36328
arXiv-PH   82343      1739       41229

To further cut the time cost of OEM, we randomly choose a fraction of unique times in every batch to optimize the topic proportions. For example, when optimizing the topic proportions for paper i after the first batch comes, we randomly choose 10% (we call this fraction the citer percentage in the rest of this paper) of the citers instead of the whole set of them. This speeds up the computation. In OEM, we set the hyperparameters λ = 0.1 and citer percentage = 10% unless otherwise stated. The effect of these hyperparameters (citer percentage and λ) is detailed in later experiments.
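For concreteness, here is a minimal sketch of how the two ranking-based metrics from Section 4.3 can be computed for a single testing event, given one model score per candidate paper. The function names and data layout are our own illustration, not the paper's code:

```python
import numpy as np

def recall_at_k(scores, true_idx, k=250):
    """1.0 if the true citation is among the top-K scored candidates."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(true_idx in top_k)

def normalized_rank(scores, true_idx):
    """Position of the true citation in the descending score list,
    normalized by the number of possible citations (lower is better)."""
    order = np.argsort(scores)[::-1]
    rank = int(np.where(order == true_idx)[0][0])
    return rank / len(scores)

# Toy usage: 1000 candidate papers, true citation at index 42.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
print(recall_at_k(scores, 42), normalized_rank(scores, 42))
```

The per-batch numbers reported below would then be averages of these quantities over all events in a batch.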
The testing procedure of OEM is as follows. We first train an initial OEM using the building and training data sets; this initial OEM is actually equivalent to DEM. Then, we evaluate the predictive power on Batch 1 of the testing set (note that we do not use the data of Batch 1 during training). After testing Batch 1, we absorb it as extra training data and update OEM with the online learning algorithms. We then use the updated OEM to predict Batch 2, and so on. That is to say, to test a specific batch, we DO NOT use any data from that batch for training. Hence, the testing results truly reflect the generalization/predictive ability of OEM.

Figure 1 (a) and (b) show the average held-out log-likelihood of all the models. Because the initial OEM is equivalent to DEM, all the models have the same performance on testing Batch 1. However, as time goes on, the performance of DEM decreases dramatically while all the OEM variants prevent the performance from decreasing. For example, from Figure 1 (a), we can see that the log-likelihood of the original DEM decreases significantly over time, whereas the log-likelihood of OEM-β decreases only from -8.24 to -8.97. Our OEM-full outperforms the previous two, with log-likelihood ranging from -7.89 to -8.38, and the log-likelihood of OEM-appr decreases only from -8.24 to -8.56.

Figure 1 (c) and (d) show the recall of the top-K recommendation list with K = 250. The performance of DEM, OEM-β, and OEM-appr decreases over time, whereas OEM-full prevents the performance from decreasing. Although the performance of OEM-appr also decreases over time, it still outperforms DEM. The performance of OEM-β is as bad as that of DEM, which implies that the topic features are very informative and learning only β is not enough for this metric. Similar results can be observed for other values of K; we omit them due to space limitations.

Figure 1 (e) and (f) show the average held-out normalized rank. The performance of DEM and OEM-β does not improve over time, whereas the performance of OEM-full and OEM-appr improves as time goes on (recall that a lower rank indicates better predictive power). Once again, the bad performance of OEM-β indicates the importance of the topic features for this metric. Because the number of possible citations is larger for later batches than for earlier ones, the performance of DEM in terms of absolute rank actually decreases over time, but OEM-full prevents the absolute-rank performance from decreasing. This conforms to the results in Figure 1 (a), (b), (c), and (d).

Table 3 compares the computation cost of full OEM and approximative OEM. Although approximative OEM achieves slightly worse predictive accuracy than full OEM, it saves more than 50% of the time.

To measure how the hyperparameters, citer percentage and λ, affect the predictive performance, we use the arXiv-TH data set and compute the average log-likelihood over all testing batches for each choice of citer percentage and λ. The results are shown in Table 4 and Table 5. Table 4 indicates that 0.1 is the best value for λ. Table 5 shows that the predictive performance increases only moderately while the time cost increases significantly when the citer percentage exceeds 10%, which suggests that 10% is a good choice of citer percentage. Overall, our model is not sensitive to these hyperparameters.
To show the topic evolution of papers, we choose three sets of papers in arXiv-TH. To avoid clutter, we take the average topic proportions of each paper set and select the topics (elements of the vectors) with the highest average proportion over all 24 time periods to demonstrate the effect of topic evolution. Specifically, we use S_t = {r_1, r_2, ..., r_l} to denote the set of papers cited at unique time t (papers in the same set are thus cited by the same paper), and we define

$$\phi_t = \frac{1}{l} \sum_{i=1}^{l} \omega_{r_i}$$

to be the average topic vector of paper set S_t. We choose S_8001 and S_8005 as our examples here, shown in Figure 1 (g) and (h).

From Figure 1 (g), we can see that the proportions of Topic 7 (namely φ_8001^(7)) and Topic 46 (namely φ_8001^(46)) decrease over time, while the proportions of Topic 15 (φ_8001^(15)) and Topic 44 (φ_8001^(44)) show the opposite tendency. One explanation is that the set of papers cited at the 8001st unique time is originally on some sub-fields of physics, but as time goes on, the value of the papers is discovered by researchers in other sub-fields. Citations by papers from other sub-fields transfer some proportion from the original topics (Topic 7 and Topic 46) to the new topics (Topic 15 and Topic 44). The same thing happens in the fields of statistics and psychology (as original topics) and machine learning (as the newly arising topic). The topic evolution of the paper set at the 8005th unique time (namely S_8005) is similar to that at the 8001st unique time, as shown in Figure 1 (h).

5 Related Work

Dynamic network analysis (DNA) has been widely studied by researchers from a diversity of fields, on data such as social networks, citation networks and email networks [Leskovec et al., 2005; Kossinets and Watts, 2006; Viswanath et al., 2009]. However, most existing works [Sarkar and Moore, 2005; Hanneke et al., 2010; Fu et al., 2009; Wyatt et al., 2010; Foulds et al., 2011; Ho et al., 2011] focus either on small networks at a fine temporal granularity or on large-scale networks at a rough temporal granularity. Although DEM [Vu et al., 2011b] is able to handle large-scale networks at a fine temporal granularity, its parameters remain static, which leads to an obvious decrease of accuracy over time. Continuous-time regression models for longitudinal networks [Vu et al., 2011a] allow the parameters to be updated over time. However, they are used to model edges rather than nodes, and they do not take the topic information of papers into consideration. In contrast, our proposed OEM effectively integrates topic information into the model's building, training and testing phases.

Another line of research related to our work concerns topic models. By extending the original LDA topic model [Blei et al., 2003], many methods have been proposed to model topic evolution over time [Wang et al., 2008; 2009; Chen et al., 2012; Dubey et al., 2013]. There also exist methods that can model both network structure
0 5 10 15 20 25 −12 −11 −10 −9 −8 Paper batches Average log−likelihood DEM OEM−beta OEM−appr OEM−full (a) arXiv-TH 0 5 10 15 20 25 −11 −10 −9 −8 Paper batches Average log−likelihood DEM OEM−beta OEM−appr OEM−full (b) arXiv-PH 0 5 10 15 20 25 0.25 0.3 0.35 0.4 0.45 Paper batches Recall DEM OEM−beta OEM−appr OEM−full (c) arXiv-TH(K=250) 0 5 10 15 20 25 0.2 0.25 0.3 0.35 0.4 Paper batches Recall DEM OEM−beta OEM−appr OEM−full (d) arXiv-PH(K=250) 0 5 10 15 20 25 0 0.02 0.04 0.06 0.08 0.1 0.12 Paper batches Average normalized rank DEM OEM−beta OEM−appr OEM−full (e) arXiv-TH 0 5 10 15 20 25 0.12 0.14 0.16 0.18 0.2 0.22 Paper batches Average normalized rank DEM OEM−beta OEM−appr OEM−full (f) arXiv-PH 0 5 10 15 20 25 0 0.1 0.2 0.3 0.4 0.5 Time Topic propotions Topic 7 Topic 15 Topic 44 Topic 46 (g) Paper sets cited at the 8001st unique time 0 5 10 15 20 25 0 0.1 0.2 0.3 0.4 0.5 0.6 Time Topic propotions Topic 7 Topic 10 Topic 37 (h) Paper sets cited at the 8005th unique time Figure 1: (a) and (b) are the average held-out log-likelihood of testing citation events. (c) and (d) are the recall of top-K recommendation lists. (e) and (f) are the average held-oud normalized ranks of testing citation events. Since all models have the same initial parameters after the building and training phases, all models have the same performance on the first testing batch, which can be seen from (a) to (f). (g) and (h) are the topic evolution of sets of papers cited at the 8001st and 8005th unique time. To avoid clutter, we only show the topics with the largest proportions (top topics). Table 3: Computation time (in seconds) of OEM-full and OEM-appr with λ = 0.1. CITER PERCENTAGE 2% 5% 10% 20% 30% 50% 100% OEM-FULL 0.13 0.43 0.87 1.42 1.96 2.61 3.91 OEM-APPR 0.06 0.22 0.41 0.70 0.95 1.29 1.94 Table 4: Average held-out log-likelihood when citer percentage is 10% λ 10−4 0.01 0.1 0.5 1 2 104 LOG-LIKELIHOOD -8.61 -8.33 -8.15 -8.28 -8.33 -8.35 -8.56 Table 5: Average held-out log-likelihood when λ = 0.1 CITER PERCENTAGE 2% 5% 10% 20% 30% 50% 100% LOG-LIKELIHOOD -8.94 -8.43 -8.15 -8.10 -8.09 -8.03 -7.98 AVERAGE TIME 0.13 0.43 0.87 1.42 1.96 2.61 3.91 and node features [Kataria et al., 2011; Hu et al., 2012; Krafft et al., 2012]. Generally, it is very time-consuming to simultaneously update the topics and topic proportions for time-varying data. Instead of utilizing some existing online LDA models [Canini et al., 2009; Hoffman et al., 2010], we choose to directly adjust the topic proportions of papers. This is because online inference of LDA interacts with the text contents of the papers, which will take a lot more time to update all the LDA vectors. However in our OEM, we only need to solve small convex optimization problems to update the vectors. 6 Conclusion In this paper, an online egocentric model (OEM) is proposed for evolving citation network modeling. By adaptively learning the parameters and topic features over time, OEM has successfully overcome the problem of DEM whose predictive accuracy will decrease significantly over time. Experimental results on real-world citation networks demonstrate that OEM can achieve very promising performance in real applications. Although the experiments in this paper are only for paper citation networks, as stated in [Vu et al., 2011b], our model can be generalized to other types of networks, which will be pursued in our future work. 7 Acknowledgements This work is supported by the NSFC (No. 61100125), the 863 Program of China (No. 
6 Conclusion
In this paper, an online egocentric model (OEM) is proposed for modeling evolving citation networks. By adaptively learning the parameters and topic features over time, OEM overcomes the problem of DEM, whose predictive accuracy decreases significantly over time. Experimental results on real-world citation networks demonstrate that OEM can achieve very promising performance in real applications. Although the experiments in this paper are only on paper citation networks, as stated in [Vu et al., 2011b], our model can be generalized to other types of networks, which we will pursue in future work.

7 Acknowledgements
This work is supported by the NSFC (No. 61100125), the 863 Program of China (No. 2011AA01A202), and the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT1158, PCSIRT).
References
[Blei et al., 2003] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[Canini et al., 2009] Kevin Robert Canini, Lei Shi, and Thomas L. Griffiths. Online inference of topics with latent Dirichlet allocation. In AISTATS, 2009.
[Chen et al., 2012] Changyou Chen, Nan Ding, and Wray L. Buntine. Dependent hierarchical normalized random measures for dynamic topic modeling. In ICML, 2012.
[Dubey et al., 2013] Avinava Dubey, Ahmed Hefny, Sinead Williamson, and Eric P. Xing. A non-parametric mixture model for topic modeling over time. In SDM, 2013.
[Foulds et al., 2011] James R. Foulds, Christopher DuBois, Arthur U. Asuncion, Carter T. Butts, and Padhraic Smyth. A dynamic relational infinite feature model for longitudinal social networks. In AISTATS, 2011.
[Fu et al., 2009] Wenjie Fu, Le Song, and Eric P. Xing. Dynamic mixed membership blockmodel for evolving networks. In ICML, 2009.
[Goldenberg et al., 2009] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg, and Edoardo M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2:129–233, 2009.
[Hanneke et al., 2010] Steve Hanneke, Wenjie Fu, and Eric P. Xing. Discrete temporal models of social networks. Electronic Journal of Statistics, 4:585–605, 2010.
[Ho et al., 2011] Qirong Ho, Le Song, and Eric P. Xing. Evolving cluster mixed-membership blockmodel for time-evolving networks. In AISTATS, 2011.
[Hoffman et al., 2010] Matthew D. Hoffman, David M. Blei, and Francis R. Bach. Online learning for latent Dirichlet allocation. In NIPS, 2010.
[Hu et al., 2012] Yuheng Hu, Ajita John, Fei Wang, and Subbarao Kambhampati. ET-LDA: joint topic modeling for aligning events and their Twitter feedback. In AAAI, 2012.
[Jin et al., 2011] Yingzi Jin, Ching-Yung Lin, Yutaka Matsuo, and Mitsuru Ishizuka. Mining longitudinal network for predicting company value. In IJCAI, 2011.
[Kataria et al., 2011] Saurabh Kataria, Prasenjit Mitra, Cornelia Caragea, and C. Lee Giles. Context sensitive topic models for author influence in document networks. In IJCAI, 2011.
[Kim and Leskovec, 2012] Myunghwan Kim and Jure Leskovec. Latent multi-group membership graph model. In ICML, 2012.
[Kossinets and Watts, 2006] Gueorgi Kossinets and Duncan J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, 2006.
[Krafft et al., 2012] Peter Krafft, Juston Moore, Hanna Wallach, and Bruce Desmarais. Topic-partitioned multinetwork embeddings. In NIPS, 2012.
[Leskovec et al., 2005] Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In ACM SIGKDD, 2005.
[Li and Yeung, 2009] Wu-Jun Li and Dit-Yan Yeung. Relation regularized matrix factorization. In IJCAI, 2009.
[Li and Yeung, 2012] Wu-Jun Li and Dit-Yan Yeung. Sparse probabilistic relational projection. In AAAI, 2012.
[Li et al., 2009a] Wu-Jun Li, Dit-Yan Yeung, and Zhihua Zhang. Probabilistic relational PCA. In NIPS, 2009.
[Li et al., 2009b] Wu-Jun Li, Zhihua Zhang, and Dit-Yan Yeung. Latent Wishart processes for relational kernel learning. In AISTATS, 2009.
[Li et al., 2011] Wu-Jun Li, Dit-Yan Yeung, and Zhihua Zhang. Generalized latent factor models for social network analysis. In IJCAI, 2011.
[McAuley and Leskovec, 2012] Julian J. McAuley and Jure Leskovec. Learning to discover social circles in ego networks. In NIPS, 2012.
[Myers et al., 2012] Seth A. Myers, Chenguang Zhu, and Jure Leskovec. Information diffusion and external influence in networks. In ACM SIGKDD, 2012.
[Nori et al., 2011] Nozomi Nori, Danushka Bollegala, and Mitsuru Ishizuka. Interest prediction on multinomial, time-evolving social graph. In IJCAI, 2011.
[Richard et al., 2012] Emile Richard, Stephane Gaiffas, and Nicolas Vayatis. Link prediction in graphs with autoregressive features. In NIPS, 2012.
[Sarkar and Moore, 2005] Purnamrita Sarkar and Andrew W. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations, 7(2):31–40, 2005.
[Sarkar et al., 2012] Purnamrita Sarkar, Deepayan Chakrabarti, and Michael I. Jordan. Nonparametric link prediction in dynamic networks. In ICML, 2012.
[Snijders, 2005] Tom A. B. Snijders. Models for longitudinal network data. Models and Methods in Social Network Analysis, pages 215–247, 2005.
[Viswanath et al., 2009] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and P. Krishna Gummadi. On the evolution of user interaction in Facebook. In ACM Workshop on Online Social Networks, 2009.
[Vu et al., 2011a] Duy Vu, Arthur U. Asuncion, David Hunter, and Padhraic Smyth. Continuous-time regression models for longitudinal networks. In NIPS, 2011.
[Vu et al., 2011b] Duy Vu, Arthur U. Asuncion, David Hunter, and Padhraic Smyth. Dynamic egocentric models for citation networks. In ICML, 2011.
[Wang and Groth, 2011] Shenghui Wang and Paul T. Groth. A framework for longitudinal influence measurement between communication content and social networks. In IJCAI, 2011.
[Wang et al., 2008] Chong Wang, David M. Blei, and David Heckerman. Continuous time dynamic topic models. In UAI, 2008.
[Wang et al., 2009] Chong Wang, Bo Thiesson, Christopher Meek, and David M. Blei. Markov topic models. In AISTATS, 2009.
[Wang et al., 2010] Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. Mining advisor-advisee relationships from research publication networks. In ACM SIGKDD, 2010.
[Wasserman, 1980] Stanley Wasserman. Analyzing social networks as stochastic processes. Journal of the American Statistical Association, 75(370):280–294, 1980.
[Wyatt et al., 2010] Danny Wyatt, Tanzeem Choudhury, and Jeff Bilmes. Discovering long range properties of social networks with multi-valued time-inhomogeneous models. In AAAI, 2010.
[Zhu, 2012] Jun Zhu. Max-margin nonparametric latent feature models for link prediction. In ICML, 2012.