BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY Volume 43, Number 4, October 2006, Pages 439–561 S 0273-0979(06)01126-8 Article electronically published on August 7, 2006 EXPANDER GRAPHS AND THEIR APPLICATIONS SHLOMO HOORY, NATHAN LINIAL, AND AVI WIGDERSON An Overview A major consideration we had in writing this survey was to make it accessible to mathematicians as well as to computer scientists, since expander graphs, the protagonists of our story, come up in numerous and often surprising contexts in both fields. But, perhaps, we should start with a few words about graphs in general. They are, of course, one of the prime objects of study in Discrete Mathematics. However, graphs are among the most ubiquitous models of both natural and human-made structures. In the natural and social sciences they model relations among species, societies, companies, etc. In computer science, they represent networks of communication, data organization, computational devices as well as the flow of computation, and more. In mathematics, Cayley graphs are useful in Group Theory. Graphs carry a natural metric and are therefore useful in Geometry, and though they are “just” one-dimensional complexes, they are useful in certain parts of Topology, e.g. Knot Theory. In statistical physics, graphs can represent local connections between interacting parts of a system, as well as the dynamics of a physical process on such systems. The study of these models calls, then, for the comprehension of the significant structural properties of the relevant graphs. But are there nontrivial structural properties which are universally important? Expansion of a graph requires that it is simultaneously sparse and highly connected. Expander graphs were first de- fined by Bassalygo and Pinsker, and their existence first proved by Pinsker in the early ’70s. The property of being an expander seems significant in many of these mathematical, computational and physical contexts. It is not surprising that expanders are useful in the design and analysis of communication networks. What is less obvious is that expanders have surprising utility in other computational settings such as in the theory of error correcting codes and the theory of pseudorandomness. In mathematics, we will encounter e.g. their role in the study of metric embeddings, and in particular in work around the Baum-Connes Conjecture. Expansion is closely related to the convergence rates of Markov Chains, and so they play a key role in the study of Monte-Carlo algorithms in statistical mechanics and in a host of practical computational applications. The list of such interesting and fruitful connections goes on and on with so many applications we will not even Received by the editors April 28, 2006, and, in revised form, May 10, 2006. 2000 Mathematics Subject Classification. Primary 01-01, 68-01, 05-01, 68Q01, 94-01; Secondary 68Q15, 68Q17, 94B05, 05C25, 05C35, 05C40, 05C50, 05C80, 05C90, 60J10, 35J99, 20F05, 20F69, 20C99. Work supported in part by grants from the Israel Science Foundation and the Israel-U.S. Binational Fund. c 2006 S. Hoory, N. Linial, and A. Wigderson 439
440 SHLOMO HOORY, NATHAN LINIAL, AND AVI WIGDERSON be able to mention. This universality of expanders is becoming more evident as more connections are discovered. It transpires that expansion is a fundamental mathematical concept, well deserving to be thoroughly investigated on its own. In hindsight, one reason that expanders are so ubiquitous is that their very definition can be given in at least three languages: combinatorial/geometric, probabilistic and algebraic. Combinatorially, expanders are graphs which are highly connected; to disconnect a large part of the graph, one has to sever many edges. Equivalently, using the geometric notion of isoperimetry, every set of vertices has a (relatively) very large boundary. From the probabilistic viewpoint, one considers the natural random walk on a graph, in which we have a token on a vertex, that moves at every step to a random neighboring vertex, chosen uniformly and independently. Expanders are graphs for which this process converges to its limiting distribution as rapidly as possible. Algebraically, one can consider the Laplace operator on the graph and its spectrum. From this perspective, expanders are graphs in which the first positive eigenvalue (of their Laplace operator) is bounded away from zero. The study of expanders leads in different directions. There are structural problems: what are the best bounds on the various expansion parameters, and how do they relate to each other and to other graph invariants? There are problems concerning explicit constructions: how to efficiently generate expanders with given parameters. These are extremely important for applications. There are algorithmic problems - given a graph, test if it is an expander with given parameters. Finally, there is the problem of understanding the relation of expansion with other mathematical notions, and the application of expanders to practical and theoretical problems. In the past four decades, a great amount of research has been done on these topics, resulting in a wide-ranging body of knowledge. In this survey, we could not hope to cover even a fraction of it. We have tried to make the presentation as broad as possible, touching on the various research directions mentioned above. Even what we do cover is of course incomplete, and we try to give the relevant references for more comprehensive coverage. We have also tried to mention in each section related research which we are not covering at all and to reference some of this as well. The selection of material naturally reflects our interests and biases. It is rather diverse and can be read in different orders, according to the reader’s taste and interests. General background material on the computer science side includes the books on Computational Complexity (specifically, complexity classes) [Pap94, Sip97], on Algorithms [CLRS01] and on Randomized Algorithms [MR95], and the survey on the P versus NP problem [Wig06]. This article evolved from lecture notes for a course on expanders taught at the Hebrew University, Israel, in 2003 by Nati Linial and Avi Wigderson. We are greatly indebted to the scribes of the course notes: Ran Gilad-Bachrach, Danny Harnik, Boaz Barak, Udi Wieder, Eran Ofek, Erez Waisbard, Yael Vinner-Dekel, Yishai Beeri, David Statter, Eyal Bigman, Tamir Hazan, Elon Portugaly, Ariel Elbaz, Yuval Filmus, Michal Igell, Eyal Rozenman, Danny Gutfreund, and Yonatan Bilu. 
Also, we acknowledge that the proof that Margulis construction is an expander is taken (with slight changes) from course notes of Ravi Boppana, with Mukesh Dalal as scribe
EXPANDER GRAPHS AND THEIR APPLICATIONS 441 We are also grateful for the careful reading of this manuscript by Mark Goresky, Eyal Rozenman and Dave Xiao. Their many constructive comments significantly improved its presentation. Special thanks to Eyal Rozenman for his help with writing the section on Cayley graphs. Contents 1. The magical mystery tour 443 1.1. Three motivating problems 444 1.1.1. Hardness results for linear transformations 444 1.1.2. Construction of good error correcting codes 445 1.1.3. Deterministic error amplification for RP 446 1.2. Magical graphs 447 1.3. The three solutions 448 1.3.1. A super concentrator with O(n) edges 448 1.3.2. Construction of good error correcting codes 450 1.3.3. Deterministic error amplification for RP 451 2. Graph expansion and eigenvalues 452 2.1. Edge expansion and a combinatorial definition of expanders 452 2.2. Examples of expander graphs 453 2.3. Graph spectrum and an algebraic definition of expansion 453 2.4. The Expander Mixing Lemma 454 2.5. How big can the spectral gap be? 455 2.6. Four perspectives on expansion and how they compare 456 2.6.1. Extremal problems 456 2.6.2. Typical behavior 457 2.6.3. Explicit constructions 457 2.6.4. Algorithms 457 2.6.5. Comparisons 457 3. Random walks on expander graphs 458 3.1. Rapid mixing of walks 458 3.1.1. Convergence in the l1 and l2 norms 459 3.1.2. Convergence in entropy 460 3.2. Random walks resemble independent sampling 461 3.3. Applications 464 3.3.1. Efficient error reduction in probabilistic algorithms 464 3.3.2. Hardness of approximating maximum clique size 465 4. A geometric view of expander graphs 469 4.1. The classical isoperimetric problem 469 4.2. Graph isoperimetric problems 470 4.2.1. Example: The discrete cube 471 4.3. The Margulis construction 471 4.3.1. The discrete Laplacian 472 4.4. The Cheeger constant and inequality 473 4.5. Expansion and the spectral gap 474
442 SHLOMO HOORY, NATHAN LINIAL, AND AVI WIGDERSON 4.5.1. Large spectral gap implies high expansion 475 4.5.2. High expansion implies large spectral gap 475 4.6. Expansion of small sets 477 4.6.1. Connection with the spectral gap 477 4.6.2. Typical behavior 478 4.7. Expansion in hypergraphs? 481 5. Extremal problems on spectrum and expansion 481 5.1. The d-regular tree 482 5.1.1. The expansion of Td 482 5.1.2. The spectrum of Td 483 5.2. The Alon-Boppana lower bound 484 5.2.1. Statement of the theorem 484 5.2.2. Proof I: Counting closed walks in Td 484 5.2.3. Proof II: Using spherical functions 485 5.2.4. Extensions of the Alon-Boppana theorem 487 5.3. Ramanujan graphs 488 6. Spectrum and expansion in lifts of graphs 489 6.1. Covering maps and lifts 489 6.2. Eigenvalues - old and new 490 6.3. The universal covering tree 491 6.3.1. Irregular Ramanujan graphs? 491 6.4. Nearly-Ramanujan graphs by way of 2-lifts 492 7. The spectrum of random graphs 493 7.1. The bulk of the spectrum 493 7.2. The extreme eigenvalues 496 7.2.1. An illustration of the trace method 496 7.3. Variations on a theme 500 7.3.1. Back to the irregular case 500 7.3.2. Are most regular graphs Ramanujan? 501 7.3.3. More on random lifts 501 7.3.4. The eigenvectors 502 8. The Margulis construction 503 8.1. A detour into harmonic analysis 504 8.1.1. Characters 504 8.2. Back to the proof 505 9. The zig-zag product 508 9.1. Introduction 508 9.2. Construction of an expander family using zig-zag 509 9.3. Definition and analysis of the zig-zag product 510 9.4. Entropy analysis 512 9.5. An application to complexity theory: SL = L 512 10. Lossless conductors and expanders 514 10.1. Conductors and lossless expanders 515 10.1.1. Conductors 515
EXPANDER GRAPHS AND THEIR APPLICATIONS 443 10.1.2. Lossless expanders 517 10.2. The construction 517 10.2.1. The zig-zag product for conductors 518 10.2.2. Proof of the main theorem 519 10.2.3. Final comments 522 11. Cayley expander graphs 522 11.1. Representations of finite groups 525 11.1.1. Representations and irreducible representations 526 11.1.2. Schreier graphs 528 11.1.3. Kazhdan constant and expansion of Cayley graphs 529 11.2. The replacement product and semidirect product 531 11.3. Constructing expander families by iterated semidirect products 533 11.3.1. Cayley expanders from group rings 533 11.3.2. Cayley expanders from iterated wreath products 534 11.4. Expansion is not a group property 535 11.5. Hypercontractive inequalities in groups? 536 12. Error correcting codes 536 12.1. Definition of error correcting codes 537 12.2. Linear codes 538 12.3. Asymptotic bounds 538 12.3.1. Lower bounds on size: The Gilbert-Varshamov bound 538 12.3.2. Upper bounds: Sphere packing and linear programming 539 12.4. Codes from graphs 540 12.5. Efficient asymptotically good codes from lossless expanders 541 13. Metric embedding 543 13.1. Basic definitions 543 13.2. Finding the minimal l2 distortion 544 13.3. Distortion bounds via semidefinite duality 546 13.3.1. Embedding the cube into l2 546 13.3.2. Embedding expander graphs into l2 547 13.4. Algorithms for cut problems via embeddings 548 13.5. A glimpse into the bigger picture 551 About the authors 551 References 552 Index 560 1. The magical mystery tour We begin our discussion with three fundamental problems from three different domains. At first sight these problems seem to have very little to do with expander graphs, or even graph theory, but as we shall see, they can all be solved using expander graphs
1.1. Three motivating problems.

1.1.1. Hardness results for linear transformations. The P vs. NP problem is arguably the most important open problem in theoretical computer science. Despite its great significance and despite intensive research efforts, very little progress has been made. But interesting aspects of computational complexity can be investigated in other, more restricted contexts. For example, we may consider evaluating polynomials over a field using only the field's arithmetic, or even evaluating linear transformations using only addition and multiplication by scalars from the field. Valiant [Val76] considered the following natural problem:

Problem 1.1. Let A be an n × n matrix over the field¹ F. What is the least number of gates in a circuit that computes the linear transformation x → Ax? Each gate is specified by two field elements a and b. Such a gate receives two inputs x and y and outputs ax + by.

Aside from its profound theoretical importance, certain instances of this question have far-reaching technological significance. Consider the matrix a_{r,s} = ω^{rs} (n−1 ≥ r, s ≥ 0), where ω is a primitive n-th root of unity. The transformation x → Ax is the Discrete Fourier Transform, which is fundamental to many modern technologies involving signal processing, machine learning, etc. As observed by Cooley and Tukey [CT65], there is a circuit realizing this linear transformation (the so-called Fast Fourier Transform (FFT)) with only O(n log n) gates. Therefore the least number of gates in such a circuit is between O(n log n) and n (which are required just to input the vector x). This may seem like a small gap in our knowledge, but it is rather significant. The technological implications of a Very Fast Fourier Transform, i.e. an O(n)-sized circuit that computes the transform (should one exist), are hard to overestimate. On the other hand, it would be a great theoretical breakthrough to establish a matching lower bound of Ω(n log n), or even rule out the existence of such a circuit with only O(n) gates.

For every field F, it is fairly easy to show that for most n × n matrices A, every circuit realizing A must have Ω(n^2/log n) gates. This is based on a counting argument that compares the number of circuits with a given number of gates to the number of n × n matrices over the field. As is often the case in computational complexity, despite this abundance of computationally hard functions, we are unable to exhibit any specific, explicit linear transformation A that requires asymptotically more than O(n) gates. In an attempt to solve this problem, Valiant [Val76] conjectured that super regular transformations are "hard" in this sense.

Definition 1.2 (Super Regular Matrix). A matrix A is super regular if every square sub-matrix of A has full rank.

Valiant considered the graph layout of a circuit which realizes the linear transformation corresponding to a super regular matrix. His main observation was that this graph must be a super concentrator:

Definition 1.3 (Super Concentrator). Let G = (V, E) be a graph and let I and O be two subsets of V with n vertices each, called the input and output sets respectively. We say that G is a super concentrator if for every k and every S ⊆ I and T ⊆ O with |S| = |T| = k, there exist k vertex-disjoint paths in G from S to T.

¹This problem is interesting for finite as well as for infinite fields.
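Before returning to Valiant's argument, it may help to see the gate model of Problem 1.1 in executable form. The sketch below is our own illustration and not part of the survey: the helper names (eval_circuit, build_fft) and the concrete choice ω = e^{-2πi/n} are ours. It builds the Cooley-Tukey FFT as a list of ax + by gates, checks it against the defining sum, and counts exactly n log₂ n gates, matching the O(n log n) bound of [CT65].

```python
import cmath

# A linear circuit in the model of Problem 1.1: every gate carries two scalars
# (a, b), reads two previously computed values and outputs a*x + b*y.
# A gate is stored as a tuple (i, j, a, b) of value indices and scalars.

def eval_circuit(inputs, gates):
    vals = list(inputs)
    for i, j, a, b in gates:
        vals.append(a * vals[i] + b * vals[j])
    return vals

def build_fft(n):
    """Radix-2 Cooley-Tukey FFT as a gate list; n must be a power of two.
    Returns (gates, out) where out[r] indexes the r-th DFT coordinate."""
    gates = []

    def new_gate(i, j, a, b):
        gates.append((i, j, a, b))
        return n + len(gates) - 1          # index of the value just computed

    def rec(idx):                          # DFT of the values stored at idx
        m = len(idx)
        if m == 1:
            return idx
        even, odd = rec(idx[0::2]), rec(idx[1::2])
        out = [0] * m
        for k in range(m // 2):
            w = cmath.exp(-2j * cmath.pi * k / m)
            out[k]          = new_gate(even[k], odd[k], 1,  w)   # E_k + w*O_k
            out[k + m // 2] = new_gate(even[k], odd[k], 1, -w)   # E_k - w*O_k
        return out

    return gates, rec(list(range(n)))

# Sanity check against the defining sum y_r = sum_s x_s * w^(r*s), w = e^(-2*pi*i/n).
n = 8
x = [complex(t + 1) for t in range(n)]
gates, out = build_fft(n)
vals = eval_circuit(x, gates)
direct = [sum(x[s] * cmath.exp(-2j * cmath.pi * r * s / n) for s in range(n))
          for r in range(n)]
assert all(abs(vals[out[r]] - direct[r]) < 1e-9 for r in range(n))
print(len(gates), "gates for n =", n)      # n*log2(n) = 24 gates, i.e. O(n log n)
```

A generic n × n matrix fits the same model with roughly n^2 gates (each output is a linear combination of the n inputs); the open question discussed above is whether some explicit matrix forces asymptotically more than O(n) of them.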
It is a simple exercise to show that indeed the underlying graph of any circuit for a super regular matrix is a super concentrator (with inputs and outputs retaining their meaning in both). Valiant conjectured that any super concentrator must have ≫ n edges. That would have implied that any circuit which computes a super regular matrix must have ≫ n gates. However, Valiant himself disproved the conjecture and presented super concentrators with O(n) edges. As you may have guessed, this is where expanders come into the picture.

We note that this construction can actually be used to give a super regular matrix that has a linear sized circuit, which seems to put this approach to rest. This is not quite so, and Valiant's ideas were later realized, as follows: If we consider circuits with more than two inputs per gate but where the circuit's depth is restricted, then superlinear lower bounds for the number of edges in depth-limited super concentrators were proven [DDPW83]. Subsequently the desired superlinear lower bounds for computing the associated linear transformations in the bounded-depth circuit model were derived [Lok01, RS03].

Even though this approach did not yield strong lower bounds on circuit sizes, these attempts have brought forward the importance of sparse super concentrators in network theory and other areas. Valiant's idea has eventually had a major impact on the field.

We now skip to a totally different problem.

1.1.2. Construction of good error correcting codes. One of the most fundamental problems in communication is noise. Suppose that Alice has a message of k bits which she would like to deliver to Bob over some (noisy) communication channel. The problem is that noise in the channel may corrupt the message, so that Bob receives a message that differs from the one sent by Alice.

In his ground-breaking paper "A Mathematical Theory of Communication" [Sha48], Claude Shannon laid the foundations for Information Theory and the mathematical theory of communication. The problem of communicating over a noisy channel (which in the form below was suggested by Hamming [H50]) occupies a central part of this theory.

Problem 1.4 (communication over noisy channel). Alice and Bob communicate over a noisy channel. A fraction p of the bits sent through the channel may be altered. What is the smallest number of bits that Alice can send, assuming she wants to communicate an arbitrary k-bit message, so that Bob should be able to unambiguously recover the original message?

To solve this problem, Shannon suggested creating a dictionary (or code) C ⊆ {0,1}^n of size |C| = 2^k and using a bijective mapping ("an encoding") φ : {0,1}^k → C. To send a message x ∈ {0,1}^k, Alice transmits the n-bit encoded message φ(x) ∈ C. It is assumed that Bob receives a string y ∈ {0,1}^n that is a corrupted version of the message actually sent, φ(x) ∈ C. Bob finds the codeword z ∈ C that is closest to y (the metric used is the Hamming distance: d_H(u, v) is the number of coordinates i where u_i ≠ v_i). He concludes that the message actually sent was φ^{-1}(z). If the distance between every two words in C is greater than 2pn, it is guaranteed that indeed z = φ(x), and Bob correctly infers the message sent by Alice.

The problem of communicating over a noisy channel is thus reduced to the problem of finding a good dictionary: namely, a set C of n-bit strings of largest possible cardinality, subject to the condition that every two strings in C are at a large Hamming distance.
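Shannon's scheme is short enough to spell out in code. The following toy sketch is our own illustration, not taken from the survey: the 3-fold repetition code and the helper names (hamming, decode) are ours. It encodes 2-bit messages into 6-bit codewords, flips one bit in transit, and recovers the message by choosing the codeword nearest in Hamming distance.

```python
from itertools import combinations

def hamming(u, v):
    # d_H(u, v): the number of coordinates where the two words differ.
    return sum(a != b for a, b in zip(u, v))

def decode(y, codewords):
    # Bob's rule: pick the codeword z closest to the received word y.
    return min(codewords, key=lambda c: hamming(c, y))

# Toy dictionary C: encode each 2-bit message by repeating it three times,
# so k = 2, n = 6 and |C| = 2^k = 4.
encode = {m: m * 3 for m in ("00", "01", "10", "11")}
C = list(encode.values())

# Every two codewords are at distance >= 3, which exceeds 2pn when at most
# one of the six bits is flipped, so a single error is always corrected.
assert min(hamming(c1, c2) for c1, c2 in combinations(C, 2)) == 3

sent = encode["01"]                  # "010101"
received = "010111"                  # the channel flipped one bit
z = decode(received, C)
assert z == sent                     # Bob recovers phi("01")
print("decoded message:", [m for m, c in encode.items() if c == z][0])
```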
Definition 1.5 (the rate and distance of a dictionary). Let C ⊆ {0,1}^n be a dictionary. Its rate and (normalized) distance are defined by:
\[
R = \frac{\log |C|}{n}, \qquad \delta = \frac{\min_{c_1 \neq c_2 \in C} d_H(c_1, c_2)}{n}.
\]

As we saw before, the distance of a dictionary controls its power to overcome noise. A code's rate measures its efficiency in channel utilization. At this point we can refine the problem and ask:

Problem 1.6 (refined communication problem). Is it possible to design arbitrarily large dictionaries {C_k} of size |C_k| = 2^k, with R(C_k) ≥ R_0 and δ(C_k) ≥ δ_0 for some absolute constants R_0, δ_0 > 0? Moreover, can we make these codes explicit and efficiently encodable and decodable?

This problem and its relatives (optimizing the code's parameters and the algorithms' efficiency, in this and other error models and communication settings) are the subject of Coding Theory, a rich and active field initiated by Shannon's work (see e.g. [MS77a, MS77b] and [vL99] for the general theory and Sudan's notes [Sud00] for complexity-theoretic aspects of the field). It took over 20 years of research until even the basic Problem 1.6 was resolved, but below we present a simple solution to this problem using expander graphs. However, before we do that, let us present our third motivating problem.

1.1.3. Deterministic error amplification for RP. The field of probabilistic algorithms burst into existence within Theoretical Computer Science with the fast primality tests of Rabin [Rab80] and of Solovay and Strassen [SS77]. Given a k-bit integer x and a string r of k random bits, these algorithms efficiently compute a boolean-valued function f(x, r) with the following property. If x is prime, then f(x, r) = 1 for all choices of r. Otherwise, if x is composite, f(x, r) = 1 with probability smaller than 1/2 over a randomly chosen r. If f = 1, the algorithm declares x a prime, and otherwise declares it to be composite. It never fails on primes, and for every composite x its probability of failure is at most 1/2.

The error bound 1/2 may not be very satisfactory, and one would like to reduce it to some desired level. A very simple way to reduce our failure probability is to apply the same algorithm repeatedly with new randomly chosen r's. Repeating it (say) d times will reduce the probability of error to below 2^{-d}. On the other hand, the running time and the number of random bits used increase by a factor of d. Is there a way to reduce the error "deterministically" without using more random bits, or at least using fewer random bits than the obvious procedure above? We will see several answers to this question in these notes, and this section contains an initial advance on the problem. The importance of minimizing the number of random bits may not be evident, but we can assure the reader that it is a basic theoretical problem and, moreover, that getting your hands on good random bits is a nontrivial practical problem.

The above-mentioned primality testing algorithms belong to the class RP of Randomized Polynomial-Time algorithms. It is in this general setting that we discuss our problem.
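The 2^{-d} claim for repeated trials is easy to check empirically. The sketch below is our own toy model and not a real primality test: a one-sided test that wrongly accepts a "bad" input with probability 1/2 is repeated d times with fresh random bits, and the input is accepted only if every run accepts.

```python
import random

def one_sided_test(x_is_good, r):
    # Toy stand-in for f(x, r): never errs on good inputs, errs with
    # probability 1/2 (over the random bits r) on bad inputs.
    return 1 if x_is_good or r < 0.5 else 0

def amplified_test(x_is_good, d, rng):
    # Run the test d times with fresh randomness; declare the input good
    # only if every run accepts.  The error on bad inputs drops to 2^(-d),
    # at the cost of d times as many random bits.
    return all(one_sided_test(x_is_good, rng.random()) for _ in range(d))

rng = random.Random(0)
trials = 100_000
for d in (1, 3, 5, 8):
    errors = sum(amplified_test(False, d, rng) for _ in range(trials))
    print(f"d={d}: observed error {errors / trials:.4f}  vs  2^-{d} = {2**-d:.4f}")
```

Problem 1.7 below asks whether a comparable error reduction can be achieved while spending far fewer fresh random bits; the magical graphs of Section 1.2 will provide one answer.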
Let {0,1}^* denote the set of all finite binary strings. Then a language L ⊆ {0,1}^* is in the class RP if there exists a randomized algorithm A with a polynomial (in |x|) running time such that if x ∈ L, then A(x, r) = 1 (with certainty), whereas if x ∉ L, the probability of A(x, r) = 1 is smaller than 1/16. (The definition remains unchanged with any constant < 1 that we choose. The constant 1/16 was chosen for notational convenience.) Note again that r is a uniformly chosen random string of k bits, with k polynomially dependent on the length |x| of the input x. In this case we say that L ⊆ {0,1}^* has a (1-sided error) randomized polynomial time membership algorithm.

Problem 1.7 (Saving Random Bits). Assume that L ⊆ {0,1}^* has a (1-sided error) randomized polynomial time membership algorithm. How many random bits are needed in order to reduce the probability of error to be ≤ ε? (Note that we seek a bound that should apply to every input.)

1.2. Magical graphs. In the previous section we presented three seemingly unrelated problems. We now introduce a new object, a "Magical Graph", that will enable us to solve all of them. This object exhibits an "expansion" property (a "combinatorial isoperimetric inequality") tailored to fit our three applications.

Definition 1.8 (Magical Graph). Let G = (L, R, E) be a bipartite graph. The vertex set consists of L and R, two disjoint subsets, henceforth the left and right vertex sets. We say that G is an (n, m; d)-magical graph if |L| = n, |R| = m, every left vertex has d neighbors, and the following two properties hold (where Γ(S) denotes the set of neighbors of a set S in G):

(1) |Γ(S)| ≥ (5d/8) · |S| for every S ⊆ L with |S| ≤ n/(10d);
(2) |Γ(S)| ≥ |S| for every S ⊆ L with n/(10d) < |S| ≤ n/2.

As observed by Pinsker [Pin73] (for other but related expansion properties), such graphs exist. The proof is by a probabilistic argument, and it implies that, in fact, most graphs are magical.
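To get a feel for Definition 1.8 and for the claim that most graphs are magical, here is a small brute-force experiment. It is entirely our own illustration: the parameters are far below the d ≥ 32, n ≥ n0 regime of Lemma 1.9 below and are chosen only so that exhaustive checking is fast. Each left vertex picks d random right neighbors, exactly as in the probabilistic argument, and every subset S of the left side with |S| ≤ n/2 is checked against properties (1) and (2).

```python
import random
from itertools import combinations

def random_bipartite(n, m, d, rng):
    # Each left vertex independently picks a set of d random right neighbors.
    return [frozenset(rng.sample(range(m), d)) for _ in range(n)]

def is_magical(nbrs, n, m, d):
    # Brute-force check of properties (1) and (2) of Definition 1.8 over all
    # nonempty subsets S of the left side with |S| <= n/2.
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            gamma = len(frozenset().union(*(nbrs[v] for v in S)))
            if size <= n / (10 * d) and gamma < (5 * d / 8) * size:
                return False                       # property (1) violated
            if n / (10 * d) < size <= n / 2 and gamma < size:
                return False                       # property (2) violated
    return True

# Tiny parameters: n = 12, m = 3n/4 = 9, d = 4.  Since n/(10d) < 1 here,
# property (1) is vacuous and only property (2) is actually tested.
rng = random.Random(1)
n, m, d, samples = 12, 9, 4, 200
magical = sum(is_magical(random_bipartite(n, m, d, rng), n, m, d)
              for _ in range(samples))
print(f"{magical}/{samples} random samples were (n, m; d)-magical")
```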
Lemma 1.9. There exists a constant n0 such that for every d ≥ 32, n ≥ n0 and m ≥ 3n/4 there exists an (n, m; d)-magical graph.

Proof. Let G be a random bipartite graph with n vertices on the left and m vertices on the right, where each left vertex connects to a randomly chosen set of d vertices on the right. We claim that with high probability G is a magical graph. We start by proving that the first property holds with high probability.

Let S ⊆ L have cardinality s = |S| ≤ n/(10d), and let T ⊆ R have cardinality t = |T| < 5ds/8. Let X_{S,T} be an indicator random variable for the event that all the edges from S go to T. It is clear that if Σ_{S,T} X_{S,T} = 0, where the sum is over all choices of S and T as above, then the first property holds. The probability of the event X_{S,T} is (t/m)^{sd}, and therefore, using a union bound and the inequality $\binom{n}{k} \le (ne/k)^k$, we get that:
\[
\Pr\Big[\sum_{S,T} X_{S,T} > 0\Big] \;\le\; \sum_{S,T} \Pr[X_{S,T} = 1] \;=\; \sum_{S,T} \Big(\frac{t}{m}\Big)^{sd}
\;\le\; \sum_{s=1}^{n/10d} \binom{n}{s} \binom{m}{5ds/8} \Big(\frac{5ds}{8m}\Big)^{sd}
\;\le\; \sum_{s=1}^{n/10d} \bigg[\frac{ne}{s} \cdot \Big(\frac{8me}{5ds}\Big)^{5d/8} \cdot \Big(\frac{5ds}{8m}\Big)^{d}\bigg]^{s} \;<\; \frac{1}{10}.
\]
The last inequality follows by noting that, for every s in the range of summation, the quantity in the square brackets is smaller than 1/20, so the sum is dominated by a convergent geometric series.

The second property is handled similarly. Let S ⊆ L have cardinality s = |S| with n/(10d) < s ≤ n/2, and for T ⊆ R with |T| = s let Y_{S,T} be the indicator random variable for the event that all the edges from S go to T. If |Γ(S)| < |S| for some such S, then Y_{S,T} = 1 for some pair (S, T), so it suffices to show that the probability that Σ_{S,T} Y_{S,T} > 0 is small:
\[
\Pr\Big[\sum_{S,T} Y_{S,T} > 0\Big] \;\le\; \sum_{S,T} \Pr[Y_{S,T} = 1]
\;\le\; \sum_{s=n/10d}^{n/2} \binom{n}{s} \binom{m}{s} \Big(\frac{s}{m}\Big)^{sd}
\;\le\; \sum_{s=n/10d}^{n/2} \bigg[\frac{ne}{s} \cdot \frac{me}{s} \cdot \Big(\frac{s}{m}\Big)^{d}\bigg]^{s} \;<\; \frac{1}{10}.
\]
As before, the last inequality follows by noting that for all s the quantity in the square brackets is bounded by 10^{-4}. Therefore, most graphs are (n, m; d)-magical graphs.

We now turn to the solution of the three problems presented above. Note that Lemma 1.9 is existential, whereas we need explicit constructions of magical graphs to resolve our three problems. The issue of explicit constructions is an important aspect of this field and of this article, but at present we show how to solve these problems using the existence of magical graphs as a "black box".

1.3. The three solutions.

1.3.1. A super concentrator with O(n) edges. We will see how magical graphs allow us to construct super concentrators. These graphs exhibit incredibly high connectivity despite the fact that they have only O(n) edges. There is a long and still ongoing search for super concentrators with n input and output vertices and Kn edges, with K as small as possible. This "sport" has motivated quite a few important advances in this area. The current "world record" holders are Alon and Capalbo [AC04].

If G is an (n, 3n/4; d)-magical graph, then |Γ(S)| ≥ |S| for every S ⊆ L with |S| ≤ n/2. By Hall's marriage theorem (e.g., [Die97, Theorem 2.1.2]), for every S ⊆ L of size |S| ≤ n/2 there is a perfect matching from S to Γ(S). We use this fact to recursively construct a super concentrator C with n vertices on each side. For n below n0, simply observe that a complete bipartite graph is a super concentrator with n^2 edges. For n ≥ n0 we construct a super concentrator C with n inputs and outputs, using three building blocks: (i) Two copies G1 = (L1, R1, E1) and G2 = (L2, R2, E2)