What can be sampled locally?

Yitong Yin (Nanjing University)
Joint work with: Weiming Feng (Nanjing University), Yuxin Sun (Nanjing University)
Local Computation

the LOCAL model [Linial ’87]: in t rounds, each node of the network G(V,E) can collect information up to distance t.

Locally Checkable Labeling (LCL) problems [Naor, Stockmeyer ’93]: CSPs with local constraints.
• Construct a feasible solution: vertex/edge coloring, Lovász local lemma.
• Find a local optimum: MIS (maximal independent set), MM (maximal matching).
• Approximate a global optimum: maximum matching, minimum vertex cover, minimum dominating set.

Q: “What locally definable problems are locally computable?” That is, which problems defined by local constraints can be solved in O(1) rounds, or at least in a small number of rounds?
"What can be sampled locally?" CSP with local constraints on the network: ·proper q-coloring; ●independent set; ● Sample a uniform random feasible solution: distributed algorithms (in the LOCAC model) network G(,E) Q:"What locally definable joint distributions are locally sample-able?
“What can be sampled locally?”

network G(V,E):
• CSP with local constraints on the network, e.g.:
  • proper q-coloring;
  • independent set.
• Sample a uniform random feasible solution:
  • by distributed algorithms (in the LOCAL model).

Q: “What locally definable joint distributions are locally sampleable?”
Markov Random Fields (MRF)

network G(V,E):
• Each vertex v∈V corresponds to a variable $X_v$ with finite domain $[q]$.
• Each edge e=(u,v)∈E imposes a weighted binary constraint $A_e : [q]^2 \to \mathbb{R}_{\ge 0}$.
• Each vertex v∈V imposes a weighted unary constraint $b_v : [q] \to \mathbb{R}_{\ge 0}$.
• Gibbs distribution µ: ∀σ∈[q]^V,
  $$\mu(\sigma) \propto \prod_{e=(u,v)\in E} A_e(\sigma_u, \sigma_v) \prod_{v\in V} b_v(\sigma_v).$$
• The random configuration $X \in [q]^V$ follows µ.
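To make the definition concrete, here is a minimal Python sketch (mine, not from the talk) that evaluates the unnormalized Gibbs weight of a configuration and brute-forces µ on tiny instances; the data layout (A keyed by edges, b keyed by vertices) is an illustrative assumption.

```python
import itertools

def gibbs_weight(sigma, edges, A, b):
    """Unnormalized weight: prod_e A_e(sigma_u, sigma_v) * prod_v b_v(sigma_v)."""
    w = 1.0
    for (u, v) in edges:
        w *= A[(u, v)][sigma[u]][sigma[v]]
    for v in b:
        w *= b[v][sigma[v]]
    return w

def gibbs_distribution(n, q, edges, A, b):
    """Brute-force the Gibbs distribution over [q]^V (exponential; tiny n only)."""
    weights = {s: gibbs_weight(s, edges, A, b)
               for s in itertools.product(range(q), repeat=n)}
    Z = sum(weights.values())  # the partition function
    return {s: w / Z for s, w in weights.items()}

# Example: uniform distribution over the 6 proper 3-colorings of a triangle.
q, edges = 3, [(0, 1), (1, 2), (0, 2)]
A = {e: [[int(i != j) for j in range(q)] for i in range(q)] for e in edges}
b = {v: [1.0] * q for v in range(3)}
mu = gibbs_distribution(3, q, edges, A, b)  # each proper coloring gets 1/6
```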
Markov Random Fields (MRF)

• proper q-coloring: $A_e$ is the q×q matrix with 0 on the diagonal and 1 everywhere off it, and $b_v$ is the all-ones vector, so µ is uniform over proper q-colorings.
• independent set: $A_e = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$, $b_v = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, so µ is uniform over independent sets.
• local conflict colorings [Fraigniaud, Heinrich, Kosowski ’16]: arbitrary $A_e \in \{0,1\}^{q\times q}$, $b_v \in \{0,1\}^q$.

Hammersley–Clifford theorem (fundamental theorem of random fields): MRFs are universal for conditionally independent positive distributions.
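Both examples fit the brute-force sketch above; the factor constructions below are a hedged illustration (function names are my own, and the example reuses gibbs_distribution from the previous sketch).

```python
def coloring_factors(q):
    """Proper q-coloring: A_e is 0 on the diagonal, 1 off it; b_v is all-ones."""
    A_e = [[int(i != j) for j in range(q)] for i in range(q)]
    return A_e, [1.0] * q

def independent_set_factors():
    """Independent set over domain {0,1} (1 = in the set):
    A_e = [[1,1],[1,0]] forbids two adjacent 1s; b_v is all-ones."""
    return [[1.0, 1.0], [1.0, 0.0]], [1.0, 1.0]

# Example: independent sets of a single edge {0,1}.
A_e, b_v = independent_set_factors()
mu = gibbs_distribution(2, 2, [(0, 1)], {(0, 1): A_e}, {0: b_v, 1: b_v})
# mu puts 1/3 on each of (0,0), (0,1), (1,0), and 0 on (1,1).
```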
A Motivation: Distributed Machine Learning

• Data are stored in a distributed system.
• Sampling from a probabilistic graphical model (e.g., a Markov random field) by distributed algorithms.
Glauber Dynamics

MRF: ∀σ∈[q]^V, $\mu(\sigma) \propto \prod_{e=(u,v)\in E} A_e(\sigma_u, \sigma_v) \prod_{v\in V} b_v(\sigma_v)$.

starting from an arbitrary $X_0 \in [q]^V$, the transition $X_t \to X_{t+1}$ is:
• pick a uniform random vertex v;
• resample X(v) according to the marginal distribution induced by µ at vertex v, conditioned on $X_t(N(v))$.

marginal distribution:
$$\Pr[X_v = x \mid X_{N(v)}] = \frac{b_v(x) \prod_{u\in N(v)} A_{(u,v)}(X_u, x)}{\sum_{y\in[q]} b_v(y) \prod_{u\in N(v)} A_{(u,v)}(X_u, y)}$$

stationary distribution: µ
mixing time: $\tau_{\mathrm{mix}} = \max_{X_0} \min\{\, t \mid d_{\mathrm{TV}}(X_t, \mu) \le \tfrac{1}{2e} \,\}$
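A minimal sketch of one transition (assumptions mine: adjacency lists in adj, factors laid out as in the earlier sketches, with symmetric $A_e$ so either orientation of an edge key works); random.choices performs the resampling from the conditional marginal.

```python
import random

def glauber_step(X, adj, A, b, q):
    """One Glauber update of the configuration list X (X[v] in range(q))."""
    v = random.randrange(len(X))              # pick a uniform random vertex
    weights = []
    for x in range(q):                        # Pr[X_v = x | X_N(v)], unnormalized
        w = b[v][x]
        for u in adj[v]:
            M = A.get((u, v)) or A[(v, u)]    # symmetric factors assumed
            w *= M[X[u]][x]
        weights.append(w)
    # Assumes some value has positive weight (e.g. q >= deg(v)+1 for coloring).
    X[v] = random.choices(range(q), weights=weights)[0]
    return X
```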
Mixing of Glauber Dynamics

influence matrix $\{\rho_{v,u}\}_{v,u\in V}$:
• $\rho_{v,u}$ = max discrepancy (in total variation distance) between the marginal distributions at v induced by any pair σ, τ of boundary conditions that differ only at u.

Dobrushin’s condition:
$$\|\rho\|_\infty = \max_{v\in V} \sum_{u\in V} \rho_{v,u} \le 1-\epsilon,$$
i.e., contraction of the one-step optimal coupling in the worst case w.r.t. Hamming distance.

Theorem (Dobrushin ’70; Jerrum ’95; Salas, Sokal ’97): Dobrushin’s condition implies $\tau_{\mathrm{mix}} = O(n \log n)$ for the Glauber dynamics.

for q-coloring: $q \ge (2+\epsilon)\Delta$ implies Dobrushin’s condition, where Δ = max degree.
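For intuition, the influence matrix can be brute-forced on small instances. The sketch below is mine (feasible only for small q and degree, and assuming every boundary condition leaves v at least one allowed value, e.g. q > deg(v) for coloring), using the factor layout from the earlier sketches. On the star K_{1,3} with q = 7 colors, dobrushin_norm evaluates to 0.75 < 1, consistent with the q ≥ (2+ε)Δ condition above.

```python
import itertools

def marginal_at(v, boundary, adj, A, b, q):
    """Conditional marginal of X_v given the values of N(v) (dict u -> value)."""
    w = [b[v][x] for x in range(q)]
    for u in adj[v]:
        M = A.get((u, v)) or A[(v, u)]
        for x in range(q):
            w[x] *= M[boundary[u]][x]
    Z = sum(w)
    return [wx / Z for wx in w]

def influence(v, u, adj, A, b, q):
    """rho_{v,u}: max TV distance between marginals at v over boundary pairs
    differing only at u. Zero unless u is in N(v), since the MRF marginal
    at v depends only on the neighbors."""
    if u not in adj[v]:
        return 0.0
    others = [t for t in adj[v] if t != u]
    best = 0.0
    for rest in itertools.product(range(q), repeat=len(others)):
        base = dict(zip(others, rest))
        for c1, c2 in itertools.combinations(range(q), 2):
            p1 = marginal_at(v, {**base, u: c1}, adj, A, b, q)
            p2 = marginal_at(v, {**base, u: c2}, adj, A, b, q)
            best = max(best, 0.5 * sum(abs(a - c) for a, c in zip(p1, p2)))
    return best

def dobrushin_norm(adj, A, b, q):
    """||rho||_infty = max_v sum_u rho_{v,u} (only neighbors contribute)."""
    return max(sum(influence(v, u, adj, A, b, q) for u in adj[v]) for v in adj)
```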
Parallelization

Glauber dynamics (as above): starting from an arbitrary $X_0 \in [q]^V$; at each transition $X_t \to X_{t+1}$, pick a uniform random vertex v and resample X(v) according to the marginal distribution induced by µ at vertex v, conditioned on $X_t(N(v))$.

Parallelization:
• Chromatic scheduler [folklore] [Gonzalez et al., AISTATS ’11]: vertices in the same color class are updated in parallel (see the sketch below).
• “Hogwild!” [Niu, Recht, Ré, Wright, NIPS ’11] [De Sa, Olukotun, Ré, ICML ’16]: all vertices are updated in parallel, ignoring concurrency issues.
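A sketch of the chromatic scheduler under the same assumed layout: since each color class is an independent set, the conditional marginals of its vertices do not depend on one another, so they can all be updated from one snapshot; true parallelism is only simulated by the inner loop here.

```python
import random

def chromatic_sweep(X, classes, adj, A, b, q):
    """One sweep of chromatic-scheduled Glauber dynamics.
    `classes` partitions the vertices into independent sets."""
    for cls in classes:
        snapshot = list(X)       # every update in this class reads the same
        for v in cls:            # snapshot, as it would in one parallel round
            weights = []
            for x in range(q):
                w = b[v][x]
                for u in adj[v]:
                    M = A.get((u, v)) or A[(v, u)]
                    w *= M[snapshot[u]][x]
                weights.append(w)
            X[v] = random.choices(range(q), weights=weights)[0]
    return X
```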
Warm-up: When Luby meets Glauber

starting from an arbitrary $X_0 \in [q]^V$; at each step, for each vertex v∈V:
• (Luby step) independently sample a random number $\beta_v \in [0,1]$;
• (Glauber step) if $\beta_v$ is a local maximum within its neighborhood N(v): resample X(v) according to the marginal distribution induced by µ at vertex v, conditioned on $X_t(N(v))$.

• Luby step: independently sample a random independent set (the local maxima).
• Glauber step: for the independent-set vertices, update correctly according to the current marginal distributions.
• Stationary distribution: the Gibbs distribution µ (a sketch follows below).
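A minimal sketch of one round, with the same assumed layout as before. The local maxima of the β values form an independent set, so the winning vertices read only non-winner neighbors and can safely update in parallel (simulated sequentially here).

```python
import random

def luby_glauber_round(X, adj, A, b, q):
    """One round: the Luby step picks a random independent set of 'winners';
    the Glauber step updates exactly those vertices in parallel."""
    n = len(X)
    beta = [random.random() for _ in range(n)]                 # Luby step
    winners = [v for v in range(n)
               if all(beta[v] > beta[u] for u in adj[v])]      # local maxima
    snapshot = list(X)   # winners are pairwise non-adjacent, so their updates
    for v in winners:    # read only non-winner values: no conflicts
        weights = []
        for x in range(q):
            w = b[v][x]
            for u in adj[v]:
                M = A.get((u, v)) or A[(v, u)]
                w *= M[snapshot[u]][x]
            weights.append(w)
        X[v] = random.choices(range(q), weights=weights)[0]    # Glauber step
    return X
```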