Proposition 4.3 ([Hay06]). Let $I = (V, E, Q, \Phi)$ be an MRF instance with $n = |V|$, and let $\Omega = Q^V$ be the state space. Let $H(\sigma, \tau) \triangleq |\{v \in V \mid \sigma_v \neq \tau_v\}|$ denote the Hamming distance between $\sigma \in \Omega$ and $\tau \in \Omega$. If $I$ satisfies the Dobrushin-Shlosman condition (Condition 3.1) with constant $\delta > 0$, then the one-step optimal coupling $(X_t, Y_t)_{t \ge 0}$ for Gibbs sampling (Definition 4.2) satisfies
\[
  \forall \sigma, \tau \in \Omega: \quad
  \mathbb{E}\big[ H(X_t, Y_t) \mid X_{t-1} = \sigma \wedge Y_{t-1} = \tau \big]
  \le \left( 1 - \frac{\delta}{n} \right) \cdot H(\sigma, \tau),
\]
and hence the mixing rate of Gibbs sampling on $I$ is bounded as $\tau_{\mathrm{mix}}(I, \epsilon) \le \left\lceil \frac{n}{\delta} \log \frac{n}{\epsilon} \right\rceil$.
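To see how the one-step contraction yields the stated mixing bound, here is a short derivation sketch via the standard coupling inequality (our paraphrase, not quoted from [Hay06]). Iterating the contraction from any initial pair $(X_0, Y_0)$ gives
\[
  \mathbb{E}\big[ H(X_t, Y_t) \big]
  \le \left( 1 - \frac{\delta}{n} \right)^{t} \cdot H(X_0, Y_0)
  \le n \, \mathrm{e}^{-t\delta/n},
\]
and since $\Pr[X_t \neq Y_t] \le \mathbb{E}[H(X_t, Y_t)]$, the coupling inequality bounds the total variation distance to $\mu_I$ by $n \, \mathrm{e}^{-t\delta/n}$, which is at most $\epsilon$ once $t \ge \frac{n}{\delta} \log \frac{n}{\epsilon}$.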
5. Outlines of Algorithm

Let $\theta : \mathcal{M} \to \mathbb{R}^K$ be a probabilistic inference problem that maps each MRF instance in $\mathcal{M}$ to a $K$-dimensional probability vector, and let $\mathcal{E}_\theta$ be its estimating function. Let $I = (V, E, Q, \Phi) \in \mathcal{M}$ be the current instance, where $n = |V|$. Our dynamic inference algorithm maintains a sequence of $N(n)$ independent samples $X^{(1)}, \ldots, X^{(N(n))} \in Q^V$ which are $\epsilon(n)$-close to the Gibbs distribution $\mu_I$ in total variation distance, and an $(N, \epsilon)$-estimator $\hat{\theta}(I)$ of $\theta(I)$ such that
\[
  \hat{\theta}(I) = \mathcal{E}_\theta\big( X^{(1)}, X^{(2)}, \ldots, X^{(N(n))} \big).
\]
Upon an update request that modifies $I$ to a new instance $I' = (V', E', Q, \Phi') \in \mathcal{M}$, where $n' = |V'|$, our algorithm does the following (see the sketch after this list):
• Update the sample sequence. Update $X^{(1)}, \ldots, X^{(N(n))}$ to a new sequence of $N(n')$ independent samples $Y^{(1)}, \ldots, Y^{(N(n'))} \in Q^{V'}$ that are $\epsilon(n')$-close to $\mu_{I'}$ in total variation distance, and output the difference between the two sample sequences.
• Update the estimator. Given the difference between the two sample sequences, update $\hat{\theta}(I)$ to $\hat{\theta}(I') = \mathcal{E}_\theta\big( Y^{(1)}, \ldots, Y^{(N(n'))} \big)$ by accessing the oracle in Definition 2.3.
Obviously, the updated estimator $\hat{\theta}(I')$ is an $(N, \epsilon)$-estimator for $\theta(I')$.

Our main technical contribution is an algorithm that dynamically maintains a sequence of $N(n)$ independent samples for $\mu_I$ while $I$ itself is dynamically changing. The dynamic sampling problem was recently introduced in [FVY19]. The dynamic sampling algorithm given there handles only the update of a single vertex or edge, and works only for graphical models with soft constraints. In contrast, our dynamic sampling algorithm maintains a sequence of $N(n)$ independent samples for $\mu_I$ within total variation distance $\epsilon(n)$ while the entire specification of the graphical model $I$ is subject to dynamic update (to a new $I'$ with difference $d(I, I') \le L = o(n)$). Specifically, the algorithm updates the sample sequence within expected time $O(\Delta^2 N(n) L \log^3 n + \Delta n)$. Note that the extra $O(\Delta n)$ cost is necessary just for editing the current MRF instance $I$ into $I'$, because a single update may change all the vertex and edge potentials simultaneously.
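As a minimal illustration of the two-step update flow above, the following Python sketch shows the estimator-update step for the special case where $\theta(I)$ collects the single-vertex marginals $\Pr[\sigma_v = c]$ and the estimator is their empirical frequency over the $N$ samples; all names here are our placeholders, not the paper's actual interface, and the dynamic sampler is treated as a black box that outputs the sample differences.

def update_marginal_estimator(estimator, samples, diff, num_samples):
    """Refresh empirical marginals from sample differences alone.

    estimator[(v, c)] ~ estimate of Pr[sigma_v = c];
    samples[i][v]     = current value of vertex v in sample i;
    diff              = list of (i, v, new_value) triples reported by the
                        (black-box) dynamic sampling routine.
    """
    for i, v, new_value in diff:
        old_value = samples[i][v]
        if old_value == new_value:
            continue
        # Move 1/N of empirical probability mass from old to new value,
        # so the estimator stays consistent with the updated samples.
        estimator[(v, old_value)] = estimator.get((v, old_value), 0.0) - 1.0 / num_samples
        estimator[(v, new_value)] = estimator.get((v, new_value), 0.0) + 1.0 / num_samples
        samples[i][v] = new_value
    return estimator

The point of the sketch is that the estimator is refreshed from the reported differences only, never by re-reading all $N(n')$ samples, which mirrors the oracle access of Definition 2.3.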
This incremental time cost dominates the time cost of the dynamic inference algorithm, and is efficient for maintaining $N(n)$ independent samples, especially when $N(n)$ is sufficiently large, e.g. $N(n) = \Omega(n/L)$, in which case $\Delta n / N(n) = O(\Delta L)$, so the average incremental cost for updating each sample is $O(\Delta^2 L \log^3 n + \Delta n / N(n)) = O(\Delta^2 L \log^3 n)$.

We illustrate the main idea by explaining how to maintain one sample. The idea is to represent the trace of the Markov chain generating the sample by a dynamic data structure; when the MRF instance is changed, this trace is modified into the trace of the new chain that generates a sample for the updated instance. This is achieved by a combination of efficient dynamic data structures and a coupling between the two Markov chains.

Specifically, let $(X_t)_{t=0}^{T}$ be the Gibbs sampler chain for distribution $\mu_I$. When the chain is rapidly mixing, starting from an arbitrary initial configuration $X_0 \in Q^V$, after suitably many steps $X = X_T$ is an accurate enough sample for $\mu_I$. At each step, $X_{t-1}$ and $X_t$ may differ only at a vertex $v_t$, which is picked from $V$ uniformly and independently at random. The evolution of the chain is fully captured by the initial state $X_0$ and the sequence of pairs $\langle v_t, X_t(v_t) \rangle$, from $t = 1$ to $t = T$, which is called the execution log of the chain in this paper.

Now suppose that the current instance $I$ is updated to $I'$. We construct a coupling between the original chain $(X_t)_{t=0}^{T}$ and the new chain $(Y_t)_{t=0}^{T}$ such that $(Y_t)_{t=0}^{T}$ is a faithful Gibbs sampling chain for the updated instance $I'$ given that $(X_t)_{t=0}^{T}$ is a faithful chain for $I$, and the difference between
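To make the execution-log idea concrete, here is a minimal Python sketch in our own notation (not the paper's code); conditional_sample is an assumed helper that draws $X(v)$ from its conditional distribution given the rest of the configuration.

import random

def run_gibbs_with_log(instance, vertices, X0, T, conditional_sample):
    """Run a single-site Gibbs chain for T steps, recording its execution log."""
    X = dict(X0)
    log = []                                  # execution log: pairs (v_t, X_t(v_t))
    for _ in range(T):
        v = random.choice(vertices)           # v_t uniform over V
        X[v] = conditional_sample(instance, X, v)
        log.append((v, X[v]))
    return X, log

def replay(X0, log):
    """The initial state plus the execution log fully determine the chain,
    so the final sample X_T is rebuilt without any re-sampling."""
    X = dict(X0)
    for v, value in log:
        X[v] = value
    return X

When $I$ is updated to $I'$, the algorithm does not rerun the chain from scratch; the coupling lets it keep the bulk of the old execution log and correct only the entries whose single-site conditional distributions differ between $I$ and $I'$.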