Online Egocentric Models for Citation Networks

Hao Wang and Wu-Jun Li
Shanghai Key Laboratory of Scalable Computing and Systems
Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
js05212@sjtu.edu.cn, liwujun@cs.sjtu.edu.cn

Abstract

With the emergence of large-scale evolving (time-varying) networks, dynamic network analysis (DNA) has become a very active research topic in recent years. Although many DNA methods have been proposed by researchers from different communities, most of them can only model snapshot data recorded at a very coarse temporal granularity. Recently, some models have been proposed for DNA that can model large-scale citation networks at a fine temporal granularity. However, they suffer from a significant decrease in accuracy over time because the learned parameters or node features are static (fixed) during the prediction process for evolving citation networks. In this paper, we propose a novel model, called online egocentric model (OEM), to learn time-varying parameters and node features for evolving citation networks. Experimental results on real-world citation networks show that OEM can not only prevent the prediction accuracy from decreasing over time but also uncover the evolution of topics in citation networks.

1 Introduction

Network analysis [Goldenberg et al., 2009; Li and Yeung, 2009; Li et al., 2009a; 2009b; Wang et al., 2010; Li et al., 2011; Li and Yeung, 2012; Zhu, 2012; McAuley and Leskovec, 2012; Kim and Leskovec, 2012; Myers et al., 2012], especially dynamic network analysis (DNA), has become increasingly important in many fields such as social science and biology. Although there has been much work on DNA, most of it focuses either on large-scale data at a very coarse temporal granularity [Fu et al., 2009; Wyatt et al., 2010; Hanneke et al., 2010; Richard et al., 2012; Sarkar et al., 2012; Jin et al., 2011; Wang and Groth, 2011; Nori et al., 2011] or on small networks at a fine temporal granularity [Wasserman, 1980; Snijders, 2005]. Recently, the dynamic egocentric model (DEM) [Vu et al., 2011b], which is based on multivariate counting processes, has been successfully proposed to model large-scale evolving citation networks at the fine temporal granularity of individual time-stamped events.

Although DEM can dynamically update the link features (statistics) of the nodes (papers), the learned parameters and topic features of DEM¹ are static (fixed) during the prediction process for evolving networks. Hence, DEM suffers from a decrease in accuracy over time, because typically both the parameters and the topic features of the papers evolve over time. For example, one of the link features reflects the in-degree (number of citations) of a paper up to some time point. As time goes on, the cumulative number of citations for existing papers becomes larger and larger. Hence, the distribution of the citations over the whole data set changes over time. As a consequence, the corresponding parameter, which typically reflects the distribution of the features, should also change over time. At first sight, it may seem confusing that the topic features of a paper can change over time, because the content of a published paper is typically static. However, the citations to an existing paper are dynamic, and it is more reasonable to combine both the citation and content information to decide the topic of a paper. For example, a paper about neural networks that was considered highly related to the topic of psychology in the 1950s may be more likely to be classified as a machine learning paper today, because more and more machine learning papers cite it. Hence, the topic features will clearly also change over time. Without the ability to adaptively learn the parameters and topic features, DEM fails to model the evolution of networks. This phenomenon of decreasing prediction accuracy over time can also be observed in the experimental results in Figure 2 of [Vu et al., 2011b].

In this paper, we propose an online extension of DEM, called online egocentric model (OEM), to capture the evolution of both topic features and model parameters. The contributions of this paper are briefly outlined as follows:

• OEM takes the evolution of both topic features and parameters into consideration and maintains high prediction accuracy regardless of how much time has elapsed.

¹ In [Vu et al., 2011b], there are two variants of DEM. One models only the link features, and the other models both the link and topic features (textual information). Unless otherwise stated, DEM in this paper refers to the variant with both link and topic features, because it achieves far better accuracy than DEM without topic features [Vu et al., 2011b] and we can always get the topic features for a paper if we want.
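The in-degree drift described in the introduction can be illustrated with a minimal Python sketch. It uses a hypothetical toy citation stream, not the data or the model of this paper: paper i arrives at time i and cites paper 0 and paper i-1. The function names build_toy_stream, indegree_at, and summarize are assumptions introduced only for this example.

```python
def build_toy_stream(num_papers):
    """Hypothetical toy stream: paper i arrives at time i and cites
    paper 0 and its immediate predecessor i-1 (deduplicated)."""
    events = []
    for i in range(1, num_papers):
        for cited in sorted({0, i - 1}):
            events.append((i, i, cited))  # (time, citing paper, cited paper)
    return events


def indegree_at(events, t):
    """Cumulative in-degree at time t of every paper that exists by time t."""
    counts = {paper: 0 for paper in range(t + 1)}
    for time, _citing, cited in events:
        if time <= t:
            counts[cited] += 1
    return counts


def summarize(counts):
    """Return (maximum in-degree, mean in-degree) over the existing papers."""
    values = list(counts.values())
    return max(values), sum(values) / len(values)


if __name__ == "__main__":
    events = build_toy_stream(10)
    for t in (4, 9):
        max_deg, mean_deg = summarize(indegree_at(events, t))
        print(f"t={t}: max in-degree={max_deg}, mean in-degree={mean_deg:.2f}")
```

In this toy stream the maximum in-degree grows from 4 to 9 and the mean from 1.4 to 1.7 between the two time points, so a weight fitted to the early distribution of the in-degree feature would later be applied to a different distribution. This is only meant to make the motivation concrete; it says nothing about how DEM or OEM estimate their parameters.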