What can be sampled locally? Yitong Yin, Nanjing University. Joint work with: Weiming Feng, Yuxin Sun
Local Computation
“What can be computed locally?” [Naor, Stockmeyer, STOC’93, SICOMP’95]
the LOCAL model [Linial ’87]:
• Communications are synchronized. In each round: each node can send messages of unbounded size to all its neighbors.
• Local computations are free.
• Complexity: # of rounds to terminate in the worst case.
• In t rounds: each node can collect information up to distance t.
Local Computation
the LOCAL model [Linial ’87]: network G(V,E)
• In t rounds: each node can collect information up to distance t.
Locally Checkable Labeling (LCL) problems [Naor, Stockmeyer ’93]: CSPs with local constraints.
• Construct a feasible solution: vertex/edge coloring, Lovász local lemma
• Find a local optimum: MIS (maximal independent set), MM (maximal matching)
• Approximate a global optimum: maximum matching, minimum vertex cover, minimum dominating set
Q: “What locally definable problems are locally computable?” That is, can problems defined by local constraints be solved in O(1) rounds, or in a small number of rounds?
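The information-gathering view of the LOCAL model above can be sketched as a minimal simulation (illustrative Python, not from the talk; the `local_rounds` name and adjacency-list representation are my own). Since message sizes are unbounded and local computation is free, t rounds of flooding give each node exactly the information in its radius-t ball:

```python
def local_rounds(adj, t):
    """Simulate t rounds of the LOCAL model on a graph given as an
    adjacency list. In each synchronized round, every node sends all
    it knows to every neighbor (message size is unbounded).
    Returns, for each node, the set of node ids it has learned."""
    known = {v: {v} for v in adj}          # initially each node knows only itself
    for _ in range(t):
        new = {v: set(k) for v, k in known.items()}
        for v in adj:
            for u in adj[v]:
                new[v] |= known[u]         # receive everything neighbor u knew
        known = new
    return known
```

After t rounds, `known[v]` is exactly the set of nodes at distance at most t from v, matching the "collect information up to distance t" characterization on the slide.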
“What can be sampled locally?”
network G(V,E):
• CSP with local constraints on the network:
  • proper q-coloring;
  • independent set;
• Sample a uniform random feasible solution:
  • by distributed algorithms (in the LOCAL model)
Q: “What locally definable joint distributions are locally sample-able?”
Markov Random Fields (MRF)
network G(V,E):
• Each vertex v∈V corresponds to a variable X_v with finite domain [q].
• Each edge e=(u,v)∈E imposes a weighted binary constraint A_e : [q]² → ℝ≥0.
• Each vertex v∈V imposes a weighted unary constraint b_v : [q] → ℝ≥0.
• Gibbs distribution µ: ∀σ∈[q]^V,
  µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v)
• The random vector X ∈ [q]^V follows µ.
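For intuition, the Gibbs distribution can be evaluated by brute force on tiny instances (an illustrative Python sketch, not part of the talk; `gibbs_weight` and `gibbs_distribution` are hypothetical helper names):

```python
import itertools

def gibbs_weight(sigma, edges, A, b):
    """Unnormalized Gibbs weight of a configuration sigma (dict v -> spin):
    the product of edge factors A_e(sigma_u, sigma_v) and vertex factors
    b_v(sigma_v), exactly as in the definition of mu."""
    w = 1.0
    for (u, v) in edges:
        w *= A[(u, v)][sigma[u]][sigma[v]]
    for v in b:
        w *= b[v][sigma[v]]
    return w

def gibbs_distribution(vertices, edges, A, b, q):
    """Normalize by enumerating all q^|V| configurations.
    Only feasible for very small graphs; real sampling uses Markov chains."""
    weights = {}
    for assignment in itertools.product(range(q), repeat=len(vertices)):
        sigma = dict(zip(vertices, assignment))
        weights[assignment] = gibbs_weight(sigma, edges, A, b)
    Z = sum(weights.values())
    return {cfg: w / Z for cfg, w in weights.items()}
```

For the independent-set MRF on a single edge (hard constraint A_e = [[1,1],[1,0]], b_v = (1,1)), this yields the uniform distribution over the three independent sets ∅, {u}, {v}, with zero mass on the infeasible configuration.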
Markov Random Fields (MRF)
network G(V,E): X ∈ [q]^V follows µ
• Gibbs distribution µ: ∀σ∈[q]^V, µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v)
• proper q-coloring: A_e is the q×q matrix with 0 on the diagonal and 1 everywhere else; b_v = (1, …, 1)
• independent set: A_e = [[1, 1], [1, 0]], b_v = (1, 1)
• local conflict colorings: A_e ∈ {0,1}^{q×q}, b_v ∈ {0,1}^q [Fraigniaud, Heinrich, Kosowski FOCS’16]
A Motivation: Distributed Machine Learning • Data are stored in a distributed system. • Sampling from a probabilistic graphical model (e.g. the Markov random field) by distributed algorithms
Glauber Dynamics
G(V,E): starting from an arbitrary X₀ ∈ [q]^V, transition for X_t → X_{t+1}:
• pick a uniform random vertex v;
• resample X(v) according to the marginal distribution induced by µ at vertex v, conditioning on X_t(N(v)).
marginal distribution:
  Pr[X_v = x | X_{N(v)}] = b_v(x) ∏_{u∈N(v)} A_{(u,v)}(X_u, x) / ( Σ_{y∈[q]} b_v(y) ∏_{u∈N(v)} A_{(u,v)}(X_u, y) )
MRF: ∀σ∈[q]^V, µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) ∏_{v∈V} b_v(σ_v)
stationary distribution: µ
mixing time: τ_mix = max over X₀ of min{ t | d_TV(X_t, µ) ≤ 1/(2e) }
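One Glauber update can be sketched directly from the marginal formula above (illustrative Python, not from the talk; for simplicity it assumes every edge carries the same symmetric matrix A):

```python
import random

def glauber_step(X, adj, A, b, q, rng=random):
    """One step of Glauber dynamics: pick a uniform random vertex v and
    resample X[v] from the conditional marginal of mu at v given the
    current neighbor spins X(N(v)). Assumes one shared symmetric edge
    matrix A; b maps each vertex to its unary weight vector."""
    v = rng.choice(list(X))
    # unnormalized marginal: b_v(x) * prod_{u in N(v)} A(X_u, x)
    w = []
    for x in range(q):
        p = b[v][x]
        for u in adj[v]:
            p *= A[X[u]][x]
        w.append(p)
    Z = sum(w)
    r = rng.random() * Z
    for x in range(q):        # inverse-CDF sampling from the marginal
        r -= w[x]
        if r < 0:
            X[v] = x
            break
    return X
```

For the proper-coloring MRF (A with zero diagonal), any color conflicting with a neighbor gets weight 0, so the chain never leaves the set of proper colorings once it enters it.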
Mixing of Glauber Dynamics
influence matrix {ρ_{v,u}}_{v,u∈V}:
• ρ_{v,u}: the max discrepancy (in total variation distance) of the marginal distributions at v caused by any pair σ, τ of boundary conditions that differ only at u
Dobrushin’s condition: ‖ρ‖_∞ = max_{v∈V} Σ_{u∈V} ρ_{v,u} ≤ 1 − ε
(contraction of the one-step optimal coupling in the worst case w.r.t. Hamming distance)
Theorem (Dobrushin ’70; Salas, Sokal ’97): Dobrushin’s condition implies τ_mix = O(n log n) for Glauber dynamics.
• for q-coloring: q ≥ (2+ε)Δ implies Dobrushin’s condition, where Δ = max-degree
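A hedged sketch of why q ≥ (2+ε)Δ gives Dobrushin's condition for proper q-colorings (the standard calculation along the lines of Salas–Sokal): changing the color of a single neighbor u of v shifts the conditional marginal at v by at most 1/(q−Δ) in total variation, and ρ_{v,u} = 0 for u ∉ N(v), so

```latex
\|\rho\|_\infty \;=\; \max_{v \in V} \sum_{u \in V} \rho_{v,u}
  \;\le\; \frac{\Delta}{q - \Delta}
  \;\le\; \frac{\Delta}{(2+\varepsilon)\Delta - \Delta}
  \;=\; \frac{1}{1+\varepsilon}
  \;\le\; 1 - \varepsilon'
\qquad \text{for some } \varepsilon' = \varepsilon'(\varepsilon) > 0 .
```

This is exactly the contraction ‖ρ‖_∞ ≤ 1 − ε′ required by Dobrushin's condition, which then yields the O(n log n) mixing time above.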
Parallelization
G(V,E): Glauber dynamics: starting from an arbitrary X₀ ∈ [q]^V, transition for X_t → X_{t+1}: pick a uniform random vertex v; resample X(v) according to the marginal distribution induced by µ at vertex v, conditioning on X_t(N(v)).
Parallelization:
• Chromatic scheduler [folklore] [Gonzalez et al., AISTATS’11]: vertices in the same color class are updated in parallel.
• “Hogwild!” [Niu, Recht, Ré, Wright, NIPS’11] [De Sa, Olukotun, Ré, ICML’16]: all vertices are updated in parallel, ignoring concurrency issues.
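The chromatic scheduler can be sketched as follows (illustrative Python, not from the talk; `resample` and `chromatic_sweep` are my own names). The key point is that vertices in one color class of a proper coloring of G are pairwise non-adjacent, so updating all of them from a common snapshot is equivalent to updating them in any sequential order:

```python
import random

def resample(v, X, adj, A, b, q, rng):
    """Draw a new spin for v from its conditional marginal given X(N(v)),
    assuming one shared symmetric edge matrix A."""
    w = []
    for x in range(q):
        p = b[v][x]
        for u in adj[v]:
            p *= A[X[u]][x]
        w.append(p)
    return rng.choices(range(q), weights=w)[0]

def chromatic_sweep(X, adj, classes, A, b, q, rng):
    """One sweep of the chromatic scheduler: the classes partition V into
    independent sets; within a class every vertex reads the same snapshot,
    so the (conceptually parallel) updates do not conflict."""
    for cls in classes:
        snapshot = dict(X)
        for v in cls:               # these updates could run in parallel
            X[v] = resample(v, snapshot, adj, A, b, q, rng)
    return X
```

By contrast, "Hogwild!" drops the class-by-class schedule and lets all vertices read and write concurrently, accepting stale neighbor values.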