Sampling & Counting for Big Data
Yitong Yin (尹一通), Nanjing University (南京大学)
2019 National Conference on Theoretical Computer Science, August 3, 2019, Lanzhou University
Sampling vs Counting

[Jerrum-Valiant-Vazirani '86]: for all self-reducible problems, the following are equivalent under poly-time Turing reductions:
• approx counting: estimate vol(Ω)
• exact / approx sampling: draw X = (X1, X2, …, Xn) ∼ Ω
• approx inference: estimate Pr[Xi = ⋅ ∣ XS = σ]

[screenshot: M. R. Jerrum, L. G. Valiant, V. V. Vazirani, “Random generation of combinatorial structures from a uniform distribution”, Theoretical Computer Science, 1986]
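To make the counting-from-sampling direction concrete, here is a minimal Python sketch of the telescoping-product estimator behind the [JVV'86] equivalence, assuming μ is uniform over Ω and assuming a hypothetical conditional sampler `sampler(pinned)` supplied by self-reducibility; the names and sample sizes are illustrative, not from the talk.

```python
import math

def count_via_sampling(n_vars, sampler, samples_per_step=1000):
    """Sketch: for uniform mu over Omega, |Omega| = 1/Pr[X = sigma]
    = prod_i 1/Pr[X_i = sigma_i | X_<i = sigma_<i]; each conditional
    marginal is estimated from samples of the pinned distribution."""
    pinned = {}                 # partial assignment sigma built so far
    log_count = 0.0
    for i in range(n_vars):
        # sampler(pinned) draws a full assignment conditioned on `pinned`
        draws = [sampler(pinned)[i] for _ in range(samples_per_step)]
        v = max(set(draws), key=draws.count)        # pin a likely value
        p_hat = draws.count(v) / samples_per_step   # ≈ Pr[X_i = v | pinned]
        log_count -= math.log(p_hat)
        pinned[i] = v
    return math.exp(log_count)                      # estimate of vol(Omega)
```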
MCMC Sampling

Markov chain for sampling X = (X1, X2, …, Xn) ∼ μ

• Gibbs sampling (Glauber dynamics, heat-bath) [Glauber '63] [Geman, Geman '84]:
  pick a random vertex v;
  resample Xv ∼ μv( ⋅ ∣ XN(v));
• Metropolis-Hastings algorithm [Metropolis et al. '53] [Hastings '70]:
  pick a random vertex v; propose a random value c;
  set Xv = c w.p. ∝ μ(X′)/μ(X), where X′ agrees with X except X′v = c;
• Analysis: coupling methods [Aldous '83] [Jerrum '95] [Bubley, Dyer '97] may give O(n log n) upper bound for mixing time
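As a concrete instance of both update rules (my own illustration, not code from the talk): single-site updates for the uniform distribution over proper q-colorings, the setting where the cited coupling analyses (e.g. [Jerrum '95], for q > 2Δ) give O(n log n) mixing.

```python
import random

def glauber_coloring_step(G, X, q):
    """One heat-bath (Gibbs) update for proper q-colorings.
    G: dict vertex -> list of neighbors; X: dict vertex -> color."""
    v = random.choice(list(G))                      # pick a random vertex
    used = {X[u] for u in G[v]}                     # colors seen on N(v)
    free = [c for c in range(q) if c not in used]   # conditional support
    if free:                                        # nonempty whenever q > Δ
        X[v] = random.choice(free)                  # resample Xv ~ μv(·|X_N(v))

def metropolis_coloring_step(G, X, q):
    """One Metropolis update: propose a uniform color, accept iff proper
    (for this uniform hard-constraint μ, the ratio μ(X')/μ(X) is 0 or 1)."""
    v = random.choice(list(G))
    c = random.randrange(q)                         # propose a random value
    if all(X[u] != c for u in G[v]):                # accept w.p. min(1, μ(X')/μ(X))
        X[v] = c
```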
Computational Phase Transition

hardcore model: graph G(V,E), max-degree Δ, fugacity λ > 0
approx sample an independent set I in G w.p. ∝ λ^|I|

critical threshold: λc(Δ) = (Δ−1)^(Δ−1) / (Δ−2)^Δ

[figure: phase diagram in the (Δ, λ) plane — easy below the curve λc(Δ), hard above]

• [Weitz, STOC'06]: If λ < λc, n^O(log Δ) time.
• [Sly, FOCS'10 best paper]: If λ > λc, NP-hard even for Δ = O(1).
• [Efthymiou, Hayes, Štefankovič, Vigoda, Y., FOCS'16]: If λ < λc, O(n log n) mixing time, if Δ is large enough and there is no small cycle.

A phase transition occurs at λc.
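A minimal sketch of the threshold and the dynamics on this slide, assuming the standard heat-bath update for the hardcore model (function names are mine): conditioned on its neighborhood, an unblocked vertex is occupied with probability λ/(1+λ).

```python
import random

def lambda_c(delta):
    """Critical fugacity λc(Δ) = (Δ-1)^(Δ-1) / (Δ-2)^Δ for Δ >= 3;
    e.g. lambda_c(3) == 4.0."""
    return (delta - 1) ** (delta - 1) / (delta - 2) ** delta

def hardcore_glauber_step(G, I, lam):
    """One Glauber update for μ(I) ∝ λ^|I| over independent sets of G.
    G: dict vertex -> list of neighbors; I: set of occupied vertices."""
    v = random.choice(list(G))           # pick a uniform random vertex
    I.discard(v)                         # resample v given its neighborhood:
    blocked = any(u in I for u in G[v])  # an occupied neighbor forbids v
    if not blocked and random.random() < lam / (1 + lam):
        I.add(v)                         # occupied w.p. λ/(1+λ) if unblocked
```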
Big Data?
Sampling and Inference for Big Data

• Sampling from a joint distribution (specified by a probabilistic graphical model).
• Inferring according to a probabilistic graphical model.
• The data (probabilistic graphical model) is BIG.
• Parallel/distributed algorithms for sampling? ✓
  PTIME ⟹ Polylog(n) rounds
• For parallel/distributed computing: sampling ≡ approx counting/inference? ✓
  PTIME ⟹ Polylog(n) rounds
• Dynamic sampling algorithms? ✓
  PTIME ⟹ Polylog(n) incremental cost
Local Computation

“What can be computed locally?” [Naor, Stockmeyer, STOC'93, SICOMP'95]

the LOCAL model [Linial '87]:
• Communications are synchronized.
• In each round: unlimited local computation and communication with neighbors.
• Complexity: # of rounds to terminate in the worst case.
• In t rounds: each node can collect information up to distance t.

PLOCAL: t = polylog(n)
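The last bullet is the standard simulation fact behind the LOCAL model; here is a toy Python sketch (illustrative, not from the talk) of how t synchronized rounds let every node gather exactly its radius-t ball.

```python
def local_rounds(G, t):
    """Toy LOCAL-model simulation. G: dict vertex -> list of neighbors.
    In each synchronized round, every node sends everything it knows to
    all neighbors; local computation is unlimited and free."""
    knowledge = {v: {v} for v in G}     # initially each node knows itself
    for _ in range(t):
        received = {v: set().union(*(knowledge[u] for u in G[v])) for v in G}
        for v in G:
            knowledge[v] |= received[v]
    return knowledge                    # knowledge[v] = ball of radius t around v
```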
"What can be sampled locally?" Joint distribution defined by local constraints: ●Markov random field ●Graphical model .Sample a random solution from the joint distribution: distributed algorithms (in the LOCAC model) network G(,E) Q:"What locally definable joint distributions are locally sample-able?
“What can be sampled locally?” network G(V,E) • Joint distribution defined by local constraints: • Markov random field • Graphical model • Sample a random solution from the joint distribution: • distributed algorithms (in the LOCAL model) Q: “What locally definable joint distributions are locally sample-able?
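For concreteness, one common reading of “defined by local constraints” is a pairwise Markov random field with a factor per edge and per vertex; this generic sketch (names mine, not notation from the talk) covers the hardcore model as a special case.

```python
def mrf_weight(G, edge_factor, vertex_factor, sigma):
    """Unnormalized weight of configuration sigma under a pairwise MRF:
    mu(sigma) ∝ Π_{(u,v)∈E} A(sigma_u, sigma_v) · Π_{v∈V} b(sigma_v).
    G: dict vertex -> list of neighbors (vertex ids assumed comparable)."""
    w = 1.0
    for u in G:
        w *= vertex_factor(sigma[u])
        for v in G[u]:
            if u < v:                       # count each edge once
                w *= edge_factor(sigma[u], sigma[v])
    return w

# hardcore model with fugacity lam as a special case:
#   vertex_factor = lambda x: lam if x == 1 else 1.0
#   edge_factor   = lambda x, y: 0.0 if (x == 1 and y == 1) else 1.0
```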
MCMC Sampling

Classic MCMC sampling on G(V,E):

Markov chain Xt → Xt+1:
  pick a uniform random vertex v;
  update X(v) conditioning on X(N(v));
O(n log n) time when mixing

Parallelization:
• Chromatic scheduler [folklore] [Gonzalez et al., AISTATS'11]: Vertices in the same color class are updated in parallel.
  • O(Δ log n) mixing time (Δ is max degree)
• “Hogwild!” [Niu, Recht, Ré, Wright, NIPS'11] [De Sa, Olukotun, Ré, ICML'16]: All vertices are updated in parallel, ignoring concurrency issues.
  • Wrong distribution!
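A sketch of the chromatic scheduler idea (the parallelism is simulated sequentially per color class; `coloring` and `resample` are hypothetical inputs, with `resample(v, X)` drawing X[v] from its conditional distribution given X on N(v)). Vertices within one color class are pairwise non-adjacent, so their updates read disjoint neighborhoods and commute, which is exactly what Hogwild!-style unsynchronized updates give up.

```python
def chromatic_gibbs_sweep(X, coloring, resample):
    """One sweep of chromatic-scheduled Gibbs sampling.
    coloring: a proper coloring of G, dict vertex -> scheduler color;
    resample(v, X): a draw of X[v] conditioned on X restricted to N(v)."""
    classes = {}
    for v, c in coloring.items():
        classes.setdefault(c, []).append(v)
    for c in sorted(classes):
        # vertices in one class are non-adjacent: every conditional reads
        # a consistent snapshot, so this batch is race-free in parallel
        new = {v: resample(v, X) for v in classes[c]}
        X.update(new)
```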