Lecture 7. Decision Tree Complexity and Fourier analysis

1. Decision Tree Complexity

Recall that in the first lecture, we introduced the decision tree as a model for computing a function. Basically, we make a sequence of queries of the form "x_i = ?" and finally output the answer f(x). The (deterministic) decision tree complexity, or query complexity, of a function f is the minimum number of queries a decision tree needs to make in order to compute f. It's interesting that some functions have sublinear query complexity; namely, one can get to know f(x) long before learning x itself. We've seen a couple of such examples. Let's see one more.

Example: Sink problem. We are given a directed graph on n vertices, and we wish to know whether the graph has a "sink" vertex, one with in-degree n−1 and out-degree 0. (Namely, everyone else points to it, but it doesn't point to anyone.) What we can ask is, for any ordered pair (i, j), whether there is an edge from i to j. What's the minimum number of queries we need? (It turns out that O(n) queries suffice, even though there are n(n−1) variables; see the sketch below.)

We can also consider randomized decision trees, which can make random choices during the computation. Two types of correctness requirements are used:
⚫ Monte Carlo: on any input, the algorithm finishes in at most q queries and is correct with probability 1 − ε.
⚫ Las Vegas: on any input, the algorithm is correct with probability 1, and the expected number of queries is at most q.
The randomized query complexity is thus the minimum number q of queries needed. Denote by R_ε(f) and R(f) the randomized query complexity of f in the Monte Carlo and Las Vegas models, respectively. As in the communication complexity case, for any distribution μ on the input set {0,1}^n, we can define the μ-distributional query complexity D_ε^μ(f) as the minimum number of queries made by any deterministic query algorithm whose error probability, averaged over x ← μ, is at most ε. Also define D^μ(f) as the minimum expected (over x ← μ) number of queries made by any deterministic query algorithm that computes f correctly on all inputs. By Yao's minimax principle, the following holds.

Theorem 1.1. R_ε(f) = max_μ D_ε^μ(f) and R(f) = max_μ D^μ(f).
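Here is a minimal Python sketch (not part of the original notes) of the classic elimination idea for Sink. The oracle edge(i, j) is a hypothetical stand-in for one query "is there an edge from i to j?"; the algorithm makes at most 3(n−1) queries.

```python
def has_sink(n, edge):
    # Elimination phase: if edge(c, v) exists, c has positive out-degree,
    # so c is not a sink; otherwise v is missing the incoming edge (c, v),
    # so v is not a sink. Either way, one of the two is eliminated.
    candidate = 0
    for v in range(1, n):
        if edge(candidate, v):
            candidate = v
    # Verification phase: the surviving candidate must receive an edge
    # from every other vertex and point to none of them.
    for v in range(n):
        if v != candidate and (edge(candidate, v) or not edge(v, candidate)):
            return False
    return True

# Example usage with an adjacency-matrix oracle; vertex 2 is a sink.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(has_sink(3, lambda i, j: bool(A[i][j])))   # True
```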
While some functions enjoy small decision tree complexity, many don't. As always in complexity theory, we hope to understand which functions have small decision tree complexity.

An important class of functions is those with symmetries. For example, consider
⚫ AND(x) = x_1 ∧ … ∧ x_n.
⚫ OR(x) = x_1 ∨ … ∨ x_n.
⚫ Th_k(x) = 1 if x_1 + … + x_n ≥ k, and Th_k(x) = 0 otherwise.
These functions are "very symmetric": they depend only on the number of 1's among x_1, …, x_n---it's not important which variables are 1's.

Some other functions are "less symmetric". For example, consider graph properties---functions on graphs that don't depend on the labeling of the vertices. That is, the input to the function has N = (n choose 2) variables, indicating whether each pair (i, j) is an edge of the graph. The function depends only on the graph structure, not on the names of the vertices. Formally, f is invariant under vertex permutations: if we view the input G as an n × n adjacency matrix A_G (which is symmetric with all diagonal entries 0) and simultaneously apply a permutation to the rows and to the columns, then f(G) remains unchanged. Examples include functions we are familiar with, such as
⚫ Hamiltonian Cycle: Does the graph have a Hamiltonian cycle?
⚫ k-Clique: Does the graph have a clique of size k?
⚫ Connectivity: Is the graph connected?
⚫ Bipartiteness: Is the graph bipartite?
⚫ Matching: Does the graph have a perfect matching?
⚫ …

Much attention has been devoted to the effect of symmetry on decision tree complexity. Aanderaa once conjectured that all graph properties have decision tree complexity (n choose 2), namely that one has to ask about all variables. Functions f on N variables with D(f) = N are called evasive. It was soon found by Rosenberg that this is not correct. Consider the Sink problem at the beginning of this lecture. (Well, technically speaking, Sink is a directed graph property. But there is also an analog called Scorpion, which is indeed a graph property, and its decision tree complexity is O(n).) So Rosenberg added one more condition, monotonicity, which leads to the most famous conjecture in decision tree complexity:

Aanderaa-Rosenberg Conjecture: All monotone graph properties are evasive.
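As a tiny concrete illustration of evasiveness, the following brute-force Python sketch (not from the notes) computes the exact decision tree depth of the monotone graph property "contains a triangle" on n = 4 vertices, via the minimax recursion D(f) = 0 if f is constant, and D(f) = min_i (1 + max_b D(f | x_i = b)) otherwise. It confirms D(f) = N = 6, i.e., the property is evasive.

```python
from functools import lru_cache
from itertools import combinations, product

n = 4
pairs = list(combinations(range(n), 2))       # the N = 6 edge variables
N = len(pairs)

def has_triangle(x):                          # x: tuple of N bits
    edge = dict(zip(pairs, x))
    return any(edge[(a, b)] and edge[(a, c)] and edge[(b, c)]
               for a, b, c in combinations(range(n), 3))

@lru_cache(maxsize=None)
def depth(assignment):                        # entries: 0, 1, or None (= unqueried)
    free = [i for i, v in enumerate(assignment) if v is None]
    values = set()
    for bits in product((0, 1), repeat=len(free)):
        x = list(assignment)
        for i, b in zip(free, bits):
            x[i] = b
        values.add(has_triangle(tuple(x)))
    if len(values) == 1:                      # the function is already determined
        return 0
    return min(1 + max(depth(assignment[:i] + (b,) + assignment[i + 1:])
                       for b in (0, 1))
               for i in free)

print(depth((None,) * N))                     # prints 6: evasive
```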
While this conjecture has not been fully resolved, it is known that such functions need Ω(N) queries. It can also be proved that D(f) = N whenever the input size N is a prime power [RV76]. See [LY94] for a survey of evasiveness.

However, things become much more complicated when it comes to randomized query complexity. Karp modified the conjecture to the following.

Aanderaa-Karp-Rosenberg Conjecture: All monotone graph properties f on N variables have R(f) = Ω(N).

This conjecture is still wide open despite many attacks by first-rate researchers. It is considered one of the most notorious conjectures in theoretical computer science. The best lower bound we have is R(f) = Ω̃(N^{2/3}). (Notation: the tilde hides a logarithmic factor.) There are two proofs of this fact. One is by graph packing, initiated by Yao [Yao87] and improved by King [Kin88], Hajnal [Haj91], and Chakrabarti-Khot [CK01]. The other approach is by Fourier analysis. Both are very beautiful, but the second one is simpler, and it is the one we'll present.

First we'll need some basic knowledge of Fourier analysis on Boolean functions.

2. Fourier analysis

There are two common ways to represent Boolean values: {0,1} or {+1,−1}. To emphasize the difference, we will use x and f for the {0,1} representation, and y and g for the {+1,−1} one. For each s ∈ {0,1}^n, define χ_s by χ_s(x) = (−1)^{s·x} (equivalently, writing S = {i : s_i = 1}, χ_s(y) = ∏_{i∈S} y_i). The following fact is easy to verify.

Fact 2.1. {χ_s : s ∈ {0,1}^n} is an orthonormal basis of the space {h : {0,1}^n → ℝ} under the inner product ⟨h, h′⟩ = E_x[h(x)h′(x)].

By the above fact, any real function h on {0,1}^n can be expanded in this basis. We can write it as h = ∑_s ĥ(s)χ_s, where the coefficients ĥ(s) are called the Fourier coefficients. The transform from h to ĥ is the Fourier transform. A basic property of the transform is that it preserves the ℓ2-norm.

Fact 2.2. (Parseval's formula) E_x[h(x)²] = ∑_s ĥ(s)².

Thus for a Boolean function g with range {+1,−1}, we have ∑_s ĝ(s)² = 1. So {ĝ(s)²} can be viewed as a probability distribution over {0,1}^n.
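To make the definitions concrete, here is a short brute-force sketch in Python (not from the notes) that computes all Fourier coefficients of a small Boolean function and checks Parseval's formula; the majority function is just an arbitrary example.

```python
from itertools import product

def fourier(g, n):
    """All coefficients g_hat(s) = E_x[g(x) * (-1)^(s . x)] for s in {0,1}^n."""
    xs = list(product((0, 1), repeat=n))
    return {s: sum(g(x) * (-1) ** sum(si * xi for si, xi in zip(s, x))
                   for x in xs) / 2 ** n
            for s in product((0, 1), repeat=n)}

n = 3
maj = lambda x: 1 if sum(x) >= 2 else -1      # majority, with range {+1,-1}
coeffs = fourier(maj, n)

# Parseval: sum_s g_hat(s)^2 = E_x[g(x)^2] = 1 for a {+1,-1}-valued function,
# so {g_hat(s)^2} is indeed a probability distribution over {0,1}^n.
assert abs(sum(v * v for v in coeffs.values()) - 1.0) < 1e-9
```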
An important concept is that of the influence of variables. Suppose we pick the input x at random by letting each x_i = 1 independently with probability p. Denote by μ_p this probability distribution of x. For a Boolean function h, define

Inf_i(h, p) = Pr_{x←μ_p}[h(x) ≠ h(x^i)],

where x^i is the string obtained from x by flipping the i-th variable. Intuitively, the influence of variable i is the probability that it matters for the function when the other input variables are drawn at random. When p = 1/2, we omit it and just write Inf_i(h).

Let e_i be the n-bit string in which all bits except the i-th one are 0. We usually write ĝ(i) = ĝ(e_i).

Fact 2.3. If a Boolean function g is monotone, then Inf_i(g) = ĝ(i).

Proof. Write y = y_{-i} ∘ y_i to single out the i-th coordinate. For monotone g,

g(y_{-i} ∘ 1_i) ≠ g(y_{-i} ∘ (−1)_i) ⟺ g(y_{-i} ∘ 1_i) = 1 and g(y_{-i} ∘ (−1)_i) = −1.

Therefore,

ĝ(i) = E_y[g(y) y_i]
= (1/2) (E_{y_{-i}}[g(y_{-i} ∘ 1_i)] − E_{y_{-i}}[g(y_{-i} ∘ (−1)_i)])    // condition on y_{-i} and average over y_i = ±1
= Pr_{y_{-i}}[g(y_{-i} ∘ 1_i) ≠ g(y_{-i} ∘ (−1)_i)]    // by monotonicity, each difference is 2 or 0
= Inf_i(g). □

We finally need a fact about the symmetry of influences.

Fact 2.4. For graph properties f, all the influences Inf_{(i,j)}(f, p) are equal.

Proof. For any two edge variables e and e′, there is a permutation π over the vertices mapping e to e′. (More precisely, if e = (i, j) and e′ = (i′, j′), there is a permutation π over the vertices mapping i to i′ and j to j′.) Then

Inf_e(f, p) = Pr_{x←μ_p}[f(x) ≠ f(x^e)]
= Pr_{x←μ_p}[f(π(x)) ≠ f(π(x^e))]    // f is invariant under π
= Pr_{x←μ_p}[f(π(x)) ≠ f(π(x)^{e′})]    // flipping x_e and then acting by π = acting by π and then flipping x_{e′}
= Pr_{x←μ_p}[f(x) ≠ f(x^{e′})]    // π(x) is distributed according to μ_p when x is
= Inf_{e′}(f, p). □
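Fact 2.3 is easy to verify numerically. The following Python sketch (not from the notes) checks it for the monotone majority function on {+1,−1}^3, where ĝ(e_i) = E_y[g(y) y_i]; both sides come out to 1/2 for each variable.

```python
from itertools import product

pts = list(product((1, -1), repeat=3))
maj = lambda y: 1 if sum(y) > 0 else -1       # monotone in each +/-1 coordinate

def influence(i):
    """Inf_i(maj): Pr over uniform y that flipping coordinate i changes maj(y)."""
    return sum(maj(y) != maj(y[:i] + (-y[i],) + y[i + 1:])
               for y in pts) / len(pts)

def degree1_coeff(i):
    """maj_hat(e_i) = E_y[maj(y) * y_i]."""
    return sum(maj(y) * y[i] for y in pts) / len(pts)

for i in range(3):
    assert influence(i) == degree1_coeff(i) == 0.5
```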
3. Aanderaa-Karp-Rosenberg Conjecture

Now we can prove the theorem.

Theorem 3.1. R(f) = Ω(N^{2/3}) for all monotone graph properties f.

The proof below is so short that it fits in one page. (You'd appreciate it even more if you had ever tried to read the original graph-packing proofs…) The lower bound is obtained by combining two inequalities:

R(f) ≥ Var[f] / (pq · max_i Inf_i(f, p))    (1)

and

R(f) ≥ pq · Inf(f, p)²,    (2)

where q = 1 − p, Var[f] is the variance of f(x) over x ← μ_p, and Inf(f, p) = ∑_i Inf_i(f, p) is the total influence. Let us first see why they imply the theorem. By Fact 2.4, all N influences of a graph property are equal, so max_i Inf_i(f, p) = Inf(f, p)/N. Multiplying the square of (1) by (2) then gives R(f)³ ≥ Var[f]²N²/(pq) ≥ Var[f]²N². Choosing p to be the critical probability, at which Pr_{x←μ_p}[f(x) = 1] = 1/2 (such a p exists since f is monotone and nonconstant), we get Var[f] = 1/4 and hence R(f) = Ω(N^{2/3}).

3.1 Eq. (1)

The original proof was given in [OSSS05]. Here we present a simplified proof given in [JZ11]. Take a best deterministic algorithm A making k queries with μ_p-distributional error at most δ; we'll construct another algorithm A′ making k − 1 queries with μ_p-distributional error at most δ + 2pq · max_i Inf_i(f, p). Repeating this k times, we get an algorithm that makes no queries and has error at most δ + 2kpq · max_i Inf_i(f, p). But an algorithm that makes no queries can do nothing but guess the more likely value of f(x), and such a guess errs with probability Θ(Var[f]); rearranging gives (1) up to a constant factor.

A′ is very simple. Suppose A's first query is x_i. Then A′ is the same as A except that A′ doesn't make this query. Instead, A′ draws x_i′ ← μ_p as a guess of x_i. With probability p² + q² = 1 − 2pq, x_i′ = x_i: great, A′ guessed correctly and behaves exactly as A. With probability 2pq, x_i′ ≠ x_i: bad. But how bad is it? Even then, A′ errs beyond A's error only if replacing x_i by the flipped value changes f, which happens with probability Inf_i(f, p) over the remaining coordinates; so the newly introduced error is at most 2pq · Inf_i(f, p), by the definition of influence. Thus we are done. (The above algorithm is randomized; an averaging argument gives a deterministic one. A toy verification of this step appears below.) □
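The query-elimination step can be checked exactly on a toy example. The Python sketch below (not from the notes) takes f = OR of two bits, where A queries both bits with zero error, and lets A′ guess x_1 from μ_p; the resulting error is exactly 2pq · Inf_1(f, p).

```python
from itertools import product

f = lambda x: x[0] | x[1]                         # OR of two bits
p = 0.3
q = 1 - p
pr = lambda bit: p if bit else q                  # Pr[x_i = bit] under mu_p

# Inf_1(f, p): probability that flipping x_1 changes f, over x ~ mu_p.
inf1 = sum(pr(x1) * pr(x2) for x1, x2 in product((0, 1), repeat=2)
           if f((x1, x2)) != f((1 - x1, x2)))

# A' replaces the query to x_1 by an independent guess g ~ mu_p and
# outputs f(g, x_2); its error is Pr[f(g, x_2) != f(x_1, x_2)].
err = sum(pr(x1) * pr(x2) * pr(g) for x1, x2, g in product((0, 1), repeat=3)
          if f((g, x2)) != f((x1, x2)))

assert abs(err - 2 * p * q * inf1) < 1e-12        # extra error = 2pq * Inf_1
```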
3.2 Eq. (2)

For simplicity, we prove the special case p = 1/2; the general case is similar. By Yao's minimax principle, it is enough to show that for any deterministic query algorithm computing g (written in the {+1,−1} representation), the expected number of queries on a uniformly random input is at least Inf(g)² = (∑_i ĝ(i))²; the equality here is Fact 2.3, using that g is monotone. Note that any decision tree partitions {+1,−1}^n into monochromatic subcubes {C}, one per leaf: each leaf fixes the queried coordinates and leaves the others free. A key (and very simple) observation is that the following two random processes give the same distribution on pairs (y, C): 1) draw y ∈ {+1,−1}^n uniformly at random, and let C be the cube containing y; 2) draw C with probability 2^{dim(C)−n}, and pick y ∈ C uniformly at random. In the following, the subscripts y and C refer to these marginal distributions. For a cube C, let C_i = y_i if coordinate i is fixed in C (this value is the same for all y ∈ C), and C_i = 0 otherwise. Then ĝ(i) = E_y[g(y)y_i] = E_C[g(C)C_i], since a free coordinate is uniform ±1 within its cube. Now

|∑_i ĝ(i)| = |E_C[g(C) ∑_i C_i]|
≤ E_C[|∑_i C_i|]    // since g(C) = ±1
≤ (E_C[(∑_i C_i)²])^{1/2}    // Cauchy-Schwarz: a variance is always nonnegative
= (E_C[∑_i C_i² + ∑_{i≠j} C_i C_j])^{1/2}
= (E_C[∑_i C_i²])^{1/2}    // E_C[C_i C_j] = 0 for i ≠ j: conditioned on the path up to the later of the two queries, the newly queried bit is uniform
= (∑_i Pr_y[y_i is queried])^{1/2}    // C_i² = 1 iff i is queried on the path to C
= (E_y[number of queries on y])^{1/2}
≤ √R(g).

Squaring gives R(g) ≥ Inf(g)², which is (2) for p = 1/2, up to the constant factor pq = 1/4. □
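The two identities driving this proof are easy to verify on a small example. The Python sketch below (not from the notes) uses the natural decision tree for majority on {+1,−1}^3 (query y_1, y_2; query y_3 only on a tie) and checks that ĝ(i) = E_C[g(C)C_i] for each i, and that (∑_i ĝ(i))² is at most the expected number of queries.

```python
from itertools import product

pts = list(product((1, -1), repeat=3))
g = lambda y: 1 if sum(y) > 0 else -1

def run_tree(y):
    """Natural MAJ3 tree: query y1, y2; query y3 only if they disagree.
    Returns (output, set of queried coordinates) = (g(C), fixed coords of C)."""
    if y[0] == y[1]:
        return y[0], {0, 1}
    return y[2], {0, 1, 2}

for i in range(3):
    ghat = sum(g(y) * y[i] for y in pts) / len(pts)          # g_hat(i)
    # E_C[g(C) C_i], sampling C as the leaf cube of a uniform y;
    # C_i = y_i if i was queried (fixed in C), and 0 otherwise.
    ec = sum(out * (y[i] if i in queried else 0)
             for y in pts for out, queried in [run_tree(y)]) / len(pts)
    assert ghat == ec == 0.5

total_influence = 3 * 0.5                                    # Fact 2.3: sum_i g_hat(i)
avg_queries = sum(len(run_tree(y)[1]) for y in pts) / len(pts)
assert total_influence ** 2 <= avg_queries                   # 2.25 <= 2.5
```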
References

[CK01] Amit Chakrabarti and Subhash Khot. Improved lower bounds on the randomized complexity of graph properties. In Proceedings of the 28th International Colloquium on Automata, Languages and Programming, pages 285-296, 2001.
[Haj91] Péter Hajnal. An Ω(n^{4/3}) lower bound on the randomized complexity of graph properties. Combinatorica, 11:131-143, 1991.
[JZ11] Rahul Jain and Shengyu Zhang. The influence lower bound via query elimination. Theory of Computing, 7:147-153, 2011.
[Kin88] Valerie King. Lower bounds on the complexity of graph properties. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 468-476, 1988.
[LY94] László Lovász and Neal Young. Lecture notes on evasiveness of graph properties. Technical Report, Princeton University, 1994. Available at http://www.uni-paderborn.de/fachbereich/AG/agmadh/WWW/english/scripts.html.
[OSSS05] Ryan O'Donnell, Michael Saks, Oded Schramm, and Rocco A. Servedio. Every decision tree has an influential variable. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 31-39, 2005.
[RV76] Ronald L. Rivest and Jean Vuillemin. On recognizing graph properties from adjacency matrices. Theoretical Computer Science, 3(3):371-384, 1976.
[Yao87] Andrew C. Yao. Lower bounds to randomized algorithms for graph properties. In Proceedings of the 28th Annual IEEE Symposium on Foundations of Computer Science, pages 393-400, 1987.