Mini-batches: In the OEM introduced above, every time a new paper i arrives, we can join it into the network and iterate between the online β step and the online topic step until convergence. However, this is computationally expensive for large-scale citation networks. A common technique is to consider multiple events per update. Not only can this kind of mini-batch strategy save computational cost, but it also reduces noise [Hoffman et al., 2010]. Hence, in our implementation, rather than performing an update for each event, we perform one update for every q citation events. q is set to about 1500 in our experiments.

The rest of this section details the algorithms for online β learning and online topic proportion learning.

3.1 Online β Step

With ω fixed, the objective function to learn β is as follows:
\[
L(\beta) = \prod_{e=x}^{x+q-1} \frac{\exp(\beta^T s_{i_e}(t_e))}{\sum_{i=1}^{n} Y_i(t_e) \exp(\beta^T s_i(t_e))},
\]
where x is the starting event in the mini-batch of q events.

To avoid walking through all the citation events when updating the parameter β, we can use a training window to restrict the training to a small subset of the citation events. With a training window of width W_t (1 ≤ W_t ≤ q), β can be trained by optimizing:
\[
L_w(\beta) = \prod_{e=x+q-W_t}^{x+q-1} \frac{\exp(\beta^T s_{i_e}(t_e))}{\sum_{i=1}^{n} Y_i(t_e) \exp(\beta^T s_i(t_e))}.
\]
Furthermore, we can cache the link features of each node to further cut the computational time, as done in [Vu et al., 2011b].
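The text leaves the optimizer for L_w(β) unspecified; the following is a minimal sketch that maximizes the windowed objective by minimizing its negative logarithm with L-BFGS. The data layout (per-event cached feature matrices, at-risk index sets) and the function name beta_step are illustrative assumptions, not part of the original model.

```python
import numpy as np
from scipy.optimize import minimize

def beta_step(beta0, cited, feats, at_risk, W_t):
    """One online beta step on a mini-batch (illustrative sketch).

    cited[e]   : index of the node cited at event e of the mini-batch.
    feats[e]   : (n x d) matrix of feature vectors s_i(t_e) at event e
                 (assumed cached, as suggested in the text).
    at_risk[e] : indices i with Y_i(t_e) = 1, i.e. nodes at risk at event e.
    W_t        : training-window width, 1 <= W_t <= q.
    """
    window = range(len(cited) - W_t, len(cited))   # last W_t events only

    def neg_log_lik(beta):
        nll = 0.0
        for e in window:
            scores = feats[e][at_risk[e]] @ beta            # beta^T s_i(t_e)
            # -log of exp(beta^T s_{i_e}) / sum_i Y_i(t_e) exp(beta^T s_i)
            nll += np.logaddexp.reduce(scores) - feats[e][cited[e]] @ beta
        return nll

    # maximize L_w(beta) by minimizing -log L_w(beta)
    return minimize(neg_log_lik, beta0, method="L-BFGS-B").x
```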
3.2 Online Topic Step

In this section, we first formulate the full online topic step to learn the updated topic proportions ω_k. After that, we derive an approximative online topic step to speed up the optimization process.

Full Online Topic Step

It would be very time-consuming to update all the topic proportions in ω at a time, so we also design an alternating algorithm for updating ω. More specifically, each time we optimize the topic proportion for one paper, say ω_k, with all the other {ω_i | i ≠ k} fixed. Given a mini-batch of size q, if node k gets cited at citation events e_1, e_2, ..., e_p and does not get cited at events e_{p+1}, e_{p+2}, ..., e_q (note that e_2 does not necessarily happen before e_{p+2} although its subscript is smaller), the function f(ω_k) we optimize is:
\[
f(\omega_k) = -\log\Big( \prod_{i=1}^{p} \frac{\alpha_i \exp(a_i^T \omega_k)}{A_i + \alpha_i \exp(a_i^T \omega_k)} \prod_{u=p+1}^{q} \frac{C_u}{B_u + \gamma_u \exp(b_u^T \omega_k)} \Big) + \lambda \|\omega_k - \theta_k\|_2^2, \tag{3}
\]
where
\[
\alpha_i = \exp(\beta_l^T s_k^l(t_{e_i})), \quad
\gamma_u = \exp(\beta_l^T s_k^l(t_{e_u})), \quad
A_i = \sum_{j \neq k} Y_j(t_{e_i}) \exp(\beta^T s_j(t_{e_i})), \quad
B_u = \sum_{j \neq k} Y_j(t_{e_u}) \exp(\beta^T s_j(t_{e_u})), \quad
a_i = \beta_t \circ \theta_i, \quad
b_u = \beta_t \circ \theta_u.
\]
Here, β_l contains the first 8 elements of the parameter β (corresponding to the link features), β_t contains the last 50 elements of β (corresponding to the topic features), θ_i is the topic proportion of the citer at citation event e_i, s_k^l(t_{e_i}) denotes the link features (first 8 features) of node k at citation event e_i, and C_u is a constant irrelevant to ω_k.

The first and second order derivatives of (3) are as follows:
\[
\frac{\partial f}{\partial \omega_k} = -\sum_{i=1}^{p} a_i + \sum_{i=1}^{p} \frac{a_i \alpha_i \exp(a_i^T \omega_k)}{A_i + \alpha_i \exp(a_i^T \omega_k)} + \sum_{u=p+1}^{q} \frac{b_u \gamma_u \exp(b_u^T \omega_k)}{B_u + \gamma_u \exp(b_u^T \omega_k)} + 2\lambda(\omega_k - \theta_k), \tag{4}
\]
\[
\frac{\partial^2 f}{\partial \omega_k^2} = \sum_{i=1}^{p} \frac{A_i \alpha_i a_i a_i^T \exp(a_i^T \omega_k)}{(A_i + \alpha_i \exp(a_i^T \omega_k))^2} + \sum_{u=p+1}^{q} \frac{B_u \gamma_u b_u b_u^T \exp(b_u^T \omega_k)}{(B_u + \gamma_u \exp(b_u^T \omega_k))^2} + 2\lambda I,
\]
where I is an identity matrix.

It is easy to prove that the second order derivative (Hessian) is positive definite (PD). Hence, the function in (3) is convex, and any off-the-shelf convex solver can find the globally optimal solution.
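Since (3) is convex with the analytic gradient (4), one possible realization of the full online topic step is a per-node quasi-Newton solve. The sketch below assumes all per-event quantities (α_i, A_i, a_i, γ_u, B_u, b_u) have been pre-computed for the mini-batch, drops the constant Σ log C_u term, and enforces only the nonnegativity bound on ω_k; the text does not specify the solver, and the function and variable names here are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def full_topic_step(omega0, theta_k, alpha, A, a, gamma, B, b, lam):
    """Full online topic step for one node k (illustrative sketch).

    alpha, A : length-p arrays of alpha_i and A_i for events where k is cited.
    a        : (p x K) matrix whose rows are a_i = beta_t o theta_i.
    gamma, B : length-(q - p) arrays of gamma_u and B_u for the other events.
    b        : ((q - p) x K) matrix whose rows are b_u = beta_t o theta_u.
    lam      : the regularization weight lambda.
    """
    def objective(w):
        s_cited = a @ w                            # a_i^T omega_k
        s_non = b @ w                              # b_u^T omega_k
        # f(omega_k) from (3), up to the constant sum of log C_u terms
        val = -np.sum(np.log(alpha) + s_cited
                      - np.log(A + alpha * np.exp(s_cited)))
        val += np.sum(np.log(B + gamma * np.exp(s_non)))
        val += lam * np.sum((w - theta_k) ** 2)
        return val

    def gradient(w):
        rc = alpha * np.exp(a @ w)                 # alpha_i exp(a_i^T omega_k)
        rn = gamma * np.exp(b @ w)                 # gamma_u exp(b_u^T omega_k)
        g = -a.sum(axis=0)                         # first term of (4)
        g += a.T @ (rc / (A + rc))                 # second term of (4)
        g += b.T @ (rn / (B + rn))                 # third term of (4)
        g += 2.0 * lam * (w - theta_k)
        return g

    # (3) is convex, so a quasi-Newton solve reaches the global optimum;
    # only the nonnegativity bound on omega_k is enforced in this sketch.
    bounds = [(0.0, None)] * len(omega0)
    return minimize(objective, omega0, jac=gradient,
                    method="L-BFGS-B", bounds=bounds).x
```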
Approximative Online Topic Step

In (4), A_i is far larger than a_i α_i exp(a_i^T ω_k) and α_i exp(a_i^T ω_k), and p is relatively small in each batch. Similarly, B_u is far larger than b_u γ_u exp(b_u^T ω_k) and γ_u exp(b_u^T ω_k), and (q − p) is relatively small. Hence, the second and third terms in (4) are much smaller than the other two. Consequently, we can remove these two smaller terms to get the following approximative gradient:
\[
\frac{\partial f}{\partial \omega_k} \approx -\sum_{i=1}^{p} a_i + 2\lambda(\omega_k - \theta_k).
\]
Based on the above approximative gradient, we can recover the following approximative objective function of (2):
\[
\begin{aligned}
\text{minimize} \quad & -\sum_{i=1}^{p} a_i^T \omega_k + \lambda \sum_{k=1}^{n} \|\omega_k - \theta_k\|_2^2 \\
\text{subject to} \quad & \omega_k \succeq 0, \; \mathbf{1}^T \omega_k = 1.
\end{aligned} \tag{5}
\]
We call the OEM variant in (5) approximative OEM and the original OEM in (2) full OEM. In our experiments, we find that the approximative OEM achieves accuracy comparable with that of full OEM at a much lower computational cost.
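With the other topic proportions fixed, as in the alternating scheme above, the subproblem in (5) for a single node k is minimize −(Σ_i a_i)^T ω_k + λ‖ω_k − θ_k‖², subject to ω_k lying on the probability simplex; completing the square shows this is the Euclidean projection of θ_k + (1/(2λ)) Σ_i a_i onto the simplex. The sketch below uses the standard sort-based projection; the per-node decomposition and the function names are our own illustration, not prescribed in the text.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, 1^T w = 1}
    (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]                       # sort in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - tau, 0.0)

def approx_topic_step(theta_k, a, lam):
    """Approximative online topic step for one node k (illustrative sketch).

    theta_k : prior topic proportion of node k.
    a       : (p x K) matrix whose rows are a_i = beta_t o theta_i for the
              p events of the mini-batch in which node k is cited.
    lam     : the regularization weight lambda.
    """
    # completing the square in (5): project theta_k + sum_i a_i / (2*lam)
    target = theta_k + a.sum(axis=0) / (2.0 * lam)
    return project_simplex(target)
```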