The Probabilistic Method
Lecture Notes

Jiří Matoušek
Jan Vondrák

Department of Applied Mathematics
Charles University
Malostranské nám. 25
118 00 Praha 1
Czech Republic

If you find errors, please let us know! (e-mail: matousek@kam.mff.cuni.cz)

Rev. March 2008
Table of Contents

1 Preliminaries
  1.1 Probability Theory
  1.2 Useful Estimates

2 The Probabilistic Method
  2.1 Ramsey Numbers
  2.2 Hypergraph Coloring
  2.3 The Erdős–Ko–Rado Theorem
  2.4 Pairs of Sets

3 Linearity of Expectation
  3.1 Computing Expectation Using Indicators
  3.2 Hamiltonian Paths
  3.3 Splitting Graphs

4 Alterations
  4.1 Independent Sets
  4.2 High Girth and High Chromatic Number

5 The Second Moment
  5.1 Variance and the Chebyshev Inequality
  5.2 Estimating the Middle Binomial Coefficient
  5.3 Threshold Functions
  5.4 The Clique Number

6 The Lovász Local Lemma
  6.1 Statement and Proof
  6.2 Hypergraph Coloring Again
  6.3 Directed Cycles
  6.4 Ridiculous Injections
  6.5 Coloring of Real Numbers

7 Strong Concentration Around the Expectation
  7.1 Sum of Independent Uniform ±1 Variables
  7.2 Sums of Bounded Independent Random Variables
  7.3 A Lower Bound For the Binomial Distribution
  7.4 Sums of Moderately Dependent Indicator Variables
8 Concentration of Lipschitz Functions
  8.1 Concentration on Product Spaces
  8.2 Concentration of Lipschitz Functions, With a Proof
  8.3 Martingales, Azuma's Inequality, and Concentration on Permutations
  8.4 Isoperimetric Inequalities and Concentration on the Sphere

9 Concentration: Beyond the Lipschitz Condition
  9.1 Talagrand's Inequality
  9.2 The Vu–Kim Inequality
Preface

These are notes to a lecture taught by J. Matoušek at Charles University in Prague for several years. The audience consisted of students of mathematics or computer science, usually with an interest in combinatorics and/or theoretical computer science.

Generally speaking, an introductory text on the probabilistic method is rather superfluous, since at least two excellent sources are available: the beautiful thin book

    J. Spencer: Ten Lectures on the Probabilistic Method, CBMS-NSF, SIAM, Philadelphia, PA, 1987

and the more modern and more extensive, but no less readable,

    N. Alon and J. Spencer: The Probabilistic Method, J. Wiley and Sons, New York, NY, 2nd edition, 2000.

The lecture was indeed based on these. However, these books were not generally available to students in Prague, and this was the main reason for writing the present notes. For students, the notes may have another advantage too: they cover the material usually presented in the course relatively concisely. Chapters 8 and 9 go beyond the usual scope of the course and present, mostly without proofs, more recent and more advanced results on strong concentration.

Our presentation is slightly more formal in some cases and includes a brief review of the relevant notions from probability theory. This is in keeping with the Prague mathematical tradition and should be closer to the presentation the students are used to from other math courses. Teaching experience also shows that the students' proficiency in applying the notions learned in probability theory is limited, and that it is useful to demonstrate concrete applications of abstract probabilistic notions in some detail.

The techniques are usually illustrated with combinatorial examples. The notation and definitions not introduced here can be found in the book

    J. Matoušek and J. Nešetřil: Invitation to Discrete Mathematics, Oxford University Press, Oxford, 1998 (Czech version: Kapitoly z diskrétní matematiky, Nakladatelství Karolinum, 2000).

A large part of the material is taken directly from the Alon–Spencer book cited above, sometimes with a slightly different presentation. Readers wishing to pursue the subject in greater depth are certainly advised to consult that book. A more advanced source is
    S. Janson, T. Łuczak, A. Ruciński: Random Graphs, J. Wiley and Sons, New York, NY, 2000.

A very nice book on probabilistic algorithms, also including a chapter on the probabilistic method per se, is

    R. Motwani and P. Raghavan: Randomized Algorithms, Cambridge University Press, Cambridge, 1995.

Two journals in whose scope the probabilistic method occupies a central place are Random Structures & Algorithms and Combinatorics, Probability & Computing. Papers with applications of the probabilistic method are abundant and can be found in many other journals too.

A note for Czech students. Probability theory, like other mathematical disciplines, has an established basic Czech terminology, which in many cases does not coincide with a literal translation of the English terms. We have included some Czech terms in the text as footnotes, in order not to encourage the spread of phrases like "očekávaná hodnota" (a literal translation of the English "expectation") in place of the correct term, "střední hodnota".
1 Preliminaries

1.1 Probability Theory

This section summarizes the fundamental notions of probability theory and some results which we will need in the following chapters. In no way is it intended to serve as a substitute for a course in probability theory.

1.1.1 Definition. A probability space[1] is a triple (Ω, Σ, P), where Ω is a set, Σ ⊆ 2^Ω is a σ-algebra on Ω (a collection of subsets containing Ω and closed under complements, countable unions, and countable intersections), and P is a countably additive measure[2] on Σ with P[Ω] = 1. The elements of Σ are called events[3] and the elements of Ω are called elementary events. For an event A, P[A] is called the probability of A.

In this text, we will mostly consider finite probability spaces, where the set of elementary events Ω is finite and Σ = 2^Ω. Then the probability measure is determined by its values on elementary events; in other words, by specifying a function p: Ω → [0, 1] with Σ_{ω∈Ω} p(ω) = 1. The probability measure is then given by P[A] = Σ_{ω∈A} p(ω).

The basic example of a probability measure is the uniform distribution[4] on Ω, where

    P[A] = |A| / |Ω|   for all A ⊆ Ω.

Such a distribution represents the situation where any outcome of an experiment (such as rolling a die[5]) is equally likely.

1.1.2 Definition (Random graphs).[6] The probability space of random graphs G(n, p) is a finite probability space whose elementary events are all graphs on a fixed set of n vertices, and where the probability of a graph G with m edges is

    p(G) = p^m (1 − p)^{\binom{n}{2} − m}.

[1] probability space = pravděpodobnostní prostor
[2] measure = míra
[3] event = jev
[4] uniform distribution = rovnoměrné rozdělení
[5] rolling a die = hod kostkou
[6] random graph = náhodný graf
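To make the model concrete, here is a minimal Python sketch (ours, not part of the original notes) of drawing one graph from G(n, p); the function name sample_gnp and the edge-set representation are our own choices.

    import random

    def sample_gnp(n, p, seed=None):
        """Draw one graph from G(n, p): include each of the n(n-1)/2
        potential edges independently with probability p."""
        rng = random.Random(seed)
        edges = set()
        for u in range(n):
            for v in range(u + 1, n):
                if rng.random() < p:  # the biased "coin toss" for edge {u, v}
                    edges.add((u, v))
        return edges

    # For p = 1/2 every graph on the n labeled vertices is equally likely,
    # since p^m (1-p)^(n(n-1)/2 - m) = 2^(-n(n-1)/2) regardless of m.
    print(len(sample_gnp(6, 0.5, seed=1)), "edges")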
This corresponds to generating the random graph by including every potential edge independently with probability p. For p = 1/2, we toss a fair coin[7] for each pair {u, v} of vertices and connect them by an edge if the outcome is heads[8] (and leave them unconnected if it is tails[9]).

Here is an elementary fact which is used all the time:

1.1.3 Lemma. For any collection of events A_1, . . . , A_n,

    P[⋃_{i=1}^{n} A_i] ≤ Σ_{i=1}^{n} P[A_i].

Proof. For i = 1, . . . , n, we define B_i = A_i \ (A_1 ∪ A_2 ∪ . . . ∪ A_{i−1}). Then ⋃ B_i = ⋃ A_i, P[B_i] ≤ P[A_i], and the events B_1, . . . , B_n are disjoint. By additivity of the probability measure,

    P[⋃_{i=1}^{n} A_i] = P[⋃_{i=1}^{n} B_i] = Σ_{i=1}^{n} P[B_i] ≤ Σ_{i=1}^{n} P[A_i].  □

1.1.4 Definition. Events A, B are independent[10] if

    P[A ∩ B] = P[A] P[B].

More generally, events A_1, A_2, . . . , A_n are independent if for any subset of indices I ⊆ [n],

    P[⋂_{i∈I} A_i] = Π_{i∈I} P[A_i].

We use the convenient notation [n] for the set {1, 2, . . . , n}.

The independence of A_1, A_2, . . . , A_n is not equivalent to all the pairs A_i, A_j being independent. Exercise: find three events A_1, A_2, and A_3 that are pairwise independent but not mutually independent (one classical solution is verified below).

Intuitively, the property of independence means that the knowledge of whether some of the events A_1, . . . , A_n occurred does not provide any information regarding the remaining events.

1.1.5 Definition (Conditional probability). For events A and B with P[B] > 0, we define the conditional probability[11] of A, given that B occurs, as

    P[A|B] = P[A ∩ B] / P[B].

[7] toss a fair coin = hodit spravedlivou mincí
[8] heads = líc (hlava)
[9] tails = rub (orel)
[10] independent events = nezávislé jevy
[11] conditional probability = podmíněná pravděpodobnost
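For the exercise above, a classical solution takes two independent fair coin tosses and the events A_1 = "the first toss is heads", A_2 = "the second toss is heads", and A_3 = "the two tosses agree". The brute-force verification below is our own sketch, not part of the notes.

    from itertools import product

    # Uniform space of two fair coin tosses: four elementary events.
    omega = list(product("HT", repeat=2))

    def prob(event):
        return sum(1 for w in omega if event(w)) / len(omega)

    A1 = lambda w: w[0] == "H"   # first toss is heads
    A2 = lambda w: w[1] == "H"   # second toss is heads
    A3 = lambda w: w[0] == w[1]  # the two tosses agree

    both = lambda e, f: (lambda w: e(w) and f(w))

    # Every pair is independent: P[Ai and Aj] = 1/4 = P[Ai] P[Aj] ...
    for e, f in [(A1, A2), (A1, A3), (A2, A3)]:
        assert prob(both(e, f)) == prob(e) * prob(f) == 0.25
    # ... but the triple is not: P[A1 and A2 and A3] = 1/4, not 1/8.
    assert prob(both(both(A1, A2), A3)) == 0.25 != prob(A1) * prob(A2) * prob(A3)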
Note that if A and B are independent, then P[A|B] = P[A].

1.1.6 Definition (Random variable). A real random variable[12] on a probability space (Ω, Σ, P) is a function X: Ω → R that is P-measurable. (That is, for any a ∈ R, {ω ∈ Ω: X(ω) ≤ a} ∈ Σ.)

We can also consider random variables with values other than real numbers; for example, a random variable can have complex numbers or n-component vectors of real numbers as values. In such cases, a random variable is a measurable function from the probability space into the appropriate space with measure (the complex numbers or R^n in the examples just mentioned). In this text, we will mostly consider real random variables.

1.1.7 Definition. The expectation[13] of a (real) random variable X is

    E[X] = ∫_Ω X(ω) dP(ω).

Any real function on a finite probability space is a random variable. Its expectation can be expressed as

    E[X] = Σ_{ω∈Ω} p(ω) X(ω).

1.1.8 Definition (Independence of variables). Real random variables X, Y are independent if for every two measurable sets A, B ⊆ R, we have

    P[X ∈ A and Y ∈ B] = P[X ∈ A] · P[Y ∈ B].

Note the shorthand notation for the events in the previous definition: for example, P[X ∈ A] stands for P[{ω ∈ Ω: X(ω) ∈ A}].

Intuitively, the independence of X and Y means that the knowledge of the value attained by X gives us no information about Y, and vice versa. In order to check independence, one need not consider all measurable sets A and B; it is sufficient to look at A = (−∞, a] and B = (−∞, b]. That is, if

    P[X ≤ a and Y ≤ b] = P[X ≤ a] P[Y ≤ b]

for all a, b ∈ R, then X and Y are independent.

As we will check in Chapter 3, E[X + Y] = E[X] + E[Y] holds for any two random variables (provided that the expectations exist). On the other hand, E[XY] is generally different from E[X] E[Y]. But we have:

1.1.9 Lemma. If X and Y are independent random variables, then

    E[XY] = E[X] · E[Y].

[12] random variable = náhodná proměnná
[13] expectation = střední hodnota!!!
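Before proceeding to the proof, the lemma can be illustrated on two independent fair dice. The sketch below is ours (not from the notes); it uses fractions.Fraction to keep the arithmetic exact.

    from itertools import product
    from fractions import Fraction

    # Two independent fair dice on a uniform 36-point space.
    omega = list(product(range(1, 7), repeat=2))
    p = Fraction(1, len(omega))

    def expect(f):
        return sum(p * f(w) for w in omega)

    X = lambda w: w[0]  # value of the first die
    Y = lambda w: w[1]  # value of the second die

    # E[XY] = E[X] E[Y] for independent X, Y (Lemma 1.1.9):
    assert expect(lambda w: X(w) * Y(w)) == expect(X) * expect(Y) == Fraction(49, 4)

    # Without independence the product rule fails: take Y = X, so that
    # E[X^2] = 91/6, while E[X]^2 = 49/4.
    assert expect(lambda w: X(w) * X(w)) == Fraction(91, 6) != expect(X) ** 2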
Proof (for finite probability spaces). If X and Y are random variables on a finite probability space, the proof is especially simple. Let V_X, V_Y be the (finite) sets of values attained by X and by Y, respectively. By independence, we have P[X = a and Y = b] = P[X = a] P[Y = b] for any a ∈ V_X and b ∈ V_Y. We calculate

    E[XY] = Σ_{a∈V_X, b∈V_Y} ab · P[X = a and Y = b]
          = Σ_{a∈V_X, b∈V_Y} ab · P[X = a] P[Y = b]
          = (Σ_{a∈V_X} a P[X = a]) · (Σ_{b∈V_Y} b P[Y = b]) = E[X] E[Y].

For infinite probability spaces, the proof is formally a little more complicated, but the idea is the same.  □

1.2 Useful Estimates

In the probabilistic method, many problems are reduced to showing that a certain probability is below 1, or even tends to 0. In the final stage of such proofs, we often need to estimate some complicated-looking expressions. The golden rule here is to start with the roughest estimates, and to try more refined ones only if these fail. Here we describe the most frequently used estimates for basic combinatorial functions.

For the factorial function n!, we can often make do with the obvious upper bound n! ≤ n^n. More refined bounds are

    (n/e)^n ≤ n! ≤ en (n/e)^n

(where e = 2.718281828... is the base of the natural logarithm), which can be proved by induction. The well-known Stirling formula is very seldom needed in its full strength.

For the binomial coefficient \binom{n}{k}, the basic bound is \binom{n}{k} ≤ n^k, and sharper ones are

    (n/k)^k ≤ \binom{n}{k} ≤ (en/k)^k.

For all k, we also have \binom{n}{k} ≤ 2^n. Sometimes we need better estimates of the middle binomial coefficient \binom{2m}{m}; we have

    2^{2m} / (2√m) ≤ \binom{2m}{m} ≤ 2^{2m} / √(2m)

(also see Section 5.2 for a derivation of a slightly weaker lower bound).

Very often we need the inequality 1 + x ≤ e^x, valid for all real x. In particular, for bounding expressions of the form (1 − p)^m from above, with p > 0 small, one uses

    (1 − p)^m ≤ e^{−mp}

almost automatically.
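All of these bounds are easy to sanity-check numerically. The throwaway sketch below is ours, not part of the notes; it uses Python's math.comb for binomial coefficients.

    from math import comb, e, factorial, sqrt

    # Spot-check the stated bounds for a few values of n, k, and m.
    for n in (5, 10, 20):
        assert (n / e) ** n <= factorial(n) <= e * n * (n / e) ** n
        for k in (1, 2, n // 2):
            assert (n / k) ** k <= comb(n, k) <= (e * n / k) ** k
            assert comb(n, k) <= n ** k and comb(n, k) <= 2 ** n

    # Middle binomial coefficient: 4^m / (2 sqrt(m)) <= C(2m, m) <= 4^m / sqrt(2m).
    for m in (5, 10, 50):
        assert 4 ** m / (2 * sqrt(m)) <= comb(2 * m, m) <= 4 ** m / sqrt(2 * m)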
For estimating such expressions from below, which is usually more delicate, we can often use

    1 − p ≥ e^{−2p},

which is valid for 0 ≤ p ≤ 1/2.
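Together with the previous inequality, this sandwiches (1 − p)^m between two exponentials. A small numerical check (our sketch, not part of the notes):

    from math import exp

    # For 0 <= p <= 1/2, raising 1 - p >= e^(-2p) and 1 - p <= e^(-p)
    # to the m-th power gives  e^(-2pm) <= (1 - p)^m <= e^(-pm).
    for p in (0.01, 0.1, 0.3, 0.5):
        for m in (1, 10, 100):
            assert exp(-2 * p * m) <= (1 - p) ** m <= exp(-p * m)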