正在加载图片...
).Moreover,we assume that for the optimal parameter Similarly.definingT)(t)and (8*)≤A2. 0x(T)=10x(t).we have (A.5)The error function fr at each node k is L-Lipschitz with respect to the norm-l,i.e.,for 01,f2∈Θ,we have f(0k(t),y)-f(0*,x*)≤f(p(t),yk)-f(0*,y) L T lf(01,9k)-fk(02,9k)川≤Ll01-2 a7∑I8:-p(+Llk-yl t=1 Lemma 1.Define the function Based on above lemmas,we now present the proof of Theorem 4. V(0)max{(0,s-00)-wo(s)} C∈日 Proof:We perform our proof by analyzing the random family {(t))o Given an arbitrary 6",we have Then function V()is convex and differentiable on e.More- T over,its gradient is L-Lipschitz continuous with respect to the [f((),)-f(0*,9】 nom-l‖ t=1 T川 Iv(-(o≤u-,ueo, =d∑f(ee,)-fi(9,联】 可台台 where the gradient is defined as follows 7V(u)=πw(u)-uo,元(u)=arg mir{-(u,v〉+wo(u)}. ∑Ll()-0(训+f(e(》-f(8()训 k= (25) The inequality of the above equation is resulted by the L- Note that uo=πw(O). Lipschitz condition on f&. Lemma 2.Let [g(t)be an arbitrary sequence of vectors. Let g(t)ofk((t))and use the convexity of the and consider the sequence ((t)generated by function,then we will obtain the following bound: VI I叭 +-{名+0a ∑(0()-f(0〗≤∑g(),0x()-日) k=1 k=1 =1 t I川 IvI Tw(t) 9 =∑@因,p因-8)+∑g因,0e④-p》27 =1 k=1 k=1 I For any non-decreasing positive step-sizes w(t)o.and any a∈Θc,we have +∑gx(因)-9(),8()-8) k=1 三gw-)s2l8+c For the first term in the right hand side of Equation(27). from the Lemma 2,it follows that w(t) For any8∈9cC日*,we have 三a0a0-8y≤号上a8+wo (28) 1 写w(t) t=1 ≤2 子11 +w(T)(0*) In addition,we establish the convergency of algorithm via Holder's inequality implies thatElg((s)12 two auxiliary sequences and Ellgk(t)l.]L2 since llgk(t)ll.L for any k,l,s,t. We use these two inequalities to bound(28), p(t+1)=πae(m(t+1) (26 and present the following lemma. 1 . 1 Lemma 3.With definitions of the random family {k(t). 三到6.oL.le0l≤2 uk(t)o and (t)1 in Eq.(7).(8)and (26).and the For the second term in the right hand side of Equation(27), L-Lipschitz condition of each fk,for each node k V,we 8k∈Ft-1andp(t)∈Ft-1 by assumption,so have E(gk(t),8k(t)-p(t)》≤EIg(t)川I0k(t)-p(t)川 T =E(E[llgk (t)ll F:-1]10x (t)-()ll) ∑f(ex(),y)-f8,y〗≤∑f(p),)-f(8,yk川 ≤LEI8x(t)-p(t)川 =1 t=1 +∑LI8)-p)+Llgk-加. ≤尚Ela0-4L t1 (29)8 Θ}. Moreover, we assume that for the optimal parameter θ ∗ , ϕ(θ ∗ ) ≤ A2 . (A.5) The error function fk at each node k is L-Lipschitz with respect to the norm ∥·∥, i.e., for θ1, θ2 ∈ Θ, we have |fk(θ1, yˆk ) − fk(θ2, yˆk )| ≤ L∥θ1 − θ2∥ . Lemma 1. Define the function Vω(θ) = max ζ∈Θ {⟨θ, ζ − θ0⟩ − ωϕ(ζ)}. Then function Vω(·) is convex and differentiable on Θ. More￾over, its gradient is L-Lipschitz continuous with respect to the norm ∥·∥ ∥∇Vω(u) − ∇Vω(v)∥ ≤ 1 ωτ ∥u − v∥ , ∀u, v ∈ Θ, where the gradient is defined as follows ∇Vω(u) = πω(u) − u0, πω(u) = arg min v∈Θ {− ⟨u, v⟩ + ωϕ(v)}. (25) Note that u0 = πω(0). Lemma 2. Let {g(t)}∞ t=1 be an arbitrary sequence of vectors, and consider the sequence {θ(t)}∞ t=1 generated by θ(t + 1) = arg min θ∈Θ {∑t r=1 ⟨g(r), θ⟩ + ω(t)ϕ(θ) } = πω(t) ( − ∑t r=1 g(r) ) . For any non-decreasing positive step-sizes {ω(t)}∞ t=0, and any θˆ ∈ ΘC , we have ∑ T t=1 ⟨ g(t), θ(t) − θˆ ⟩ ≤ 1 2τ ∑ T t=1 ∥g(t)∥ 2 ∗ ω(t) + ω(T)C. For any θˆ ∈ ΘC ⊂ Θ∗ , we have ∑ T t=1 ⟨ g(t), θ(t) − θˆ ⟩ ≤ 1 2τ ∑ T t=1 ∥g(t)∥ 2 ∗ ω(t) + ω(T)ϕ(θ ∗ ). In addition, we establish the convergency of algorithm via two auxiliary sequences φ(t + 1) = πω(t)(µ¯(t + 1)) (26) and present the following lemma. Lemma 3. With definitions of the random family {θk(t)}∞ t=0, {µk (t)}∞ t=0 and {φk (t)}∞ t=1 in Eq. (7), (8) and (26), and the L-Lipschitz condition of each fk, for each node k ∈ V, we have ∑ T t=1 [f(θk(t), yk ) − f(θ ∗ , y ∗ )] ≤ ∑ T t=1 [f(φ(t), yk ) − f(θ ∗ , yk )] + ∑ T t=1 [L∥θk(t) − φ(t)∥ + L∥yk − y ∗ ∥]. Similarly, defining φˆ(T) = 1 T ∑T t=1 φ(t) and θˆ k(T) = 1 T ∑T t=1 θk(t), we have f(θˆ k(t), yk ) − f(θ ∗ , x ∗ ) ≤ f(φˆ(t), yk ) − f(θ ∗ , yk ) + L ω(t)T ∑ T t=1 ∥θk(t) − φ(t)∥ + L∥yk − y ∗ ∥. Based on above lemmas, we now present the proof of Theorem 4. Proof: We perform our proof by analyzing the random family {φ(t)}∞ t=0. Given an arbitrary θ ∗ ∈ Θ, we have ∑ T t=1 [f(φ(t), yˆk ) − f(θ ∗ , yˆk )] = 1 |V| ∑ T t=1 ∑ |V| k=1 [fk(φ(t), yˆk ) − fk(θ ∗ , yˆk )] ≤ ∑ T t=1 1 |V|   ∑ |V| k=1 [L∥φ(t) − θk(t)∥ + fk(φ(t)) − fk(θk(t))]   . The inequality of the above equation is resulted by the L￾Lipschitz condition on fk. Let gk (t) ∈ ∂fk(θk(t)) and use the convexity of the function, then we will obtain the following bound: ∑ |V| k=1 [fk(θk(t)) − fk(θ ∗ )] ≤ ∑ |V| k=1 ⟨gk (t), θk(t) − θ ∗ ⟩ = ∑ |V| k=1 ⟨gˆk (t), φ(t) − θ ∗ ⟩ + ∑ |V| k=1 ⟨gˆk (t), θk(t) − φ(t)⟩ + ∑ |V| k=1 ⟨gk (t) − gˆk (t), θk(t) − θ ∗ ⟩ (27) For the first term in the right hand side of Equation (27), from the Lemma 2, it follows that 1 |V| ∑ T t=1 ⟨∑ |V| k=1 gˆk (t), φ(t) − θ ∗ ⟩ ≤ 1 2τ ∑ T t=1 1 ω(t) 1 |V| ∑ |V| k=1 gˆk (t) 2 ∗ + ω(T)ϕ(θ ∗ ). (28) Holder’s inequality implies that E[∥gˆl (t)∥∗ ∥gˆk (s)∥∗ ] ≤ L 2 and E[∥gˆk (t)∥∗ ] ≤ L 2 since ∥gˆk (t)∥∗ ≤ L for any k, l, s, t. We use these two inequalities to bound (28), E 1 |V| ∑ |V| k=1 gˆk (t) 2 ∗ ≤ 1 |V|2 ∑ |V| k,l=1 E[∥gˆk (t)∥∗ ∥gˆl (t)∥∗ ] ≤ L 2 . For the second term in the right hand side of Equation (27), θk ∈ Ft−1 and φ(t) ∈ Ft−1 by assumption, so E ⟨gˆk (t), θk(t) − φ(t)⟩ ≤ E ∥gˆk (t)∥ ∥θk(t) − φ(t)∥ =E(E[∥gˆk (t)∥ |Ft−1] ∥θk(t) − φ(t)∥) ≤LE ∥θk(t) − φ(t)∥ ≤ L ω(t)τ E ∥µ¯(t) − µk (t)∥∗ . (29)
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有