Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity Yitong Yin* Department of Computer Science,Yale University yitong.yin@yale.edu. Abstract.We study the nondeterministic cell-probe complexity of static data structures.We introduce cell-probe proofs (CPP),a proof system for the cell-probe model,which describes verifications instead of compu- tations in the cell-probe model.We present a combinatorial characteri- zation of CPP.With this novel tool,we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures: We show that there exist data structure problems which have super- constant nondeterministic cell-probe complexity.In particular,we show that for the exact nearest neighbor search(NNS)problem or the partial match problem in high dimensional Hamming space,there does not exist a static data structure with Poly(n)cells,each of which contains n(1)bits,such that the nondeterministic cell-probe complexity is O(1),where n is the number of points in the data set for the NNS or partial match problem. For the polynomial evaluation problem,if single-cell nondeterminis- tic probes are sufficient,then either the size of a single cell is close to the size of the whole polynomial,or the total size of the data structure is close to that of a naive data structure that stores results for all possible queries. 1 Introduction We study the problem of nondeterministic cell-probe complexity of static data structures. Given a set Y of data instances,and a set X of possible queries,a data structure problem can be abstractly defined as a function f mapping each pair consisting of a query r E X and a data instance y E Y to an answer.One of the most well-studied examples of data structure problems is the "membership query":X=[m]is a data universe,Y=(m),and f(,y)=1 if ey and f(,y)=0 if otherwise. There are some other important examples of data structure problems: Exact nearest neighbor search (NNS):given a metric space U,let X =U and Y =()and for every zEX and yEy,f(z,y)is defined as the closest point to x in y according to the metric. Supported by a Kempner Foundation Fellowship and NSF grant CNS-0435201
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity Yitong Yin? Department of Computer Science, Yale University yitong.yin@yale.edu. Abstract. We study the nondeterministic cell-probe complexity of static data structures. We introduce cell-probe proofs (CPP), a proof system for the cell-probe model, which describes verifications instead of computations in the cell-probe model. We present a combinatorial characterization of CPP. With this novel tool, we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures: – We show that there exist data structure problems which have superconstant nondeterministic cell-probe complexity. In particular, we show that for the exact nearest neighbor search (NNS) problem or the partial match problem in high dimensional Hamming space, there does not exist a static data structure with Poly(n) cells, each of which contains n o(1) bits, such that the nondeterministic cell-probe complexity is O(1), where n is the number of points in the data set for the NNS or partial match problem. – For the polynomial evaluation problem, if single-cell nondeterministic probes are sufficient, then either the size of a single cell is close to the size of the whole polynomial, or the total size of the data structure is close to that of a naive data structure that stores results for all possible queries. 1 Introduction We study the problem of nondeterministic cell-probe complexity of static data structures. Given a set Y of data instances, and a set X of possible queries, a data structure problem can be abstractly defined as a function f mapping each pair consisting of a query x ∈ X and a data instance y ∈ Y to an answer. One of the most well-studied examples of data structure problems is the “membership query”: X = [m] is a data universe, Y = [m] n , and f(x, y) = 1 if x ∈ y and f(x, y) = 0 if otherwise. There are some other important examples of data structure problems: Exact nearest neighbor search (NNS): given a metric space U, let X = U and Y = U n , and for every x ∈ X and y ∈ Y , f(x, y) is defined as the closest point to x in y according to the metric. ? Supported by a Kempner Foundation Fellowship and NSF grant CNS-0435201
Yitong Yin Partial match:=(01,)4,Y=((0)),and f(,y)e11}such that for every z∈X and y∈Y,f(x,y)=1 if and only if there exists z∈y having either xi =zi or i=*for every i. Polynomial evaluation:X=2k is a finite field,Y=2kd is the set of all (d-1)-degree polynomials over the finite field 2,and f(x,y)returns the value of y(r). A classic computational model for static data structures is the cell-probe model [11].For each data instance y,a table of cells is constructed to store y. This table is called a static data structure for some problem f.Upon a query an all-powerful algorithm tries to compute f(r,y),based on adaptive random access (probes)to the cells. The cell-probe model is a clean and general model for static data structures and serves as a great tool for the study of lower bounds.Previous research on static data structures in the cell-probe model has focused on the complexity of adaptive cell-probes.In this work.we focus on the complexity of nondeterministic cell-probes and the tradeoff between the number of probes needed and with space.We speculate that it is an important problem because:(1)in considering the complexity of data structures,nondeterminism is a very natural extension to the cell-probe model;(2)instead of adaptive computations,nondeterministic cell-probes capture the question of verification,which is a natural and important aspect of data structures. Although nondeterministic cell-probe complexity is an important problem, there are few general tools and techniques for studying it,especially for the case of static data structures.In fact,because of the great generality of the cell- probe model,even for deterministic cell-probe complexity,super-constant lower bounds for static data structures are rare.Nondeterminism grants the cell-probe model extra power and makes non-trivial lower bounds even rarer.For many standard examples of data structure problems,such as membership query,it is easy to construct a data structure that has standard space usage and constant nondeterministic cell-probe complexity. It is thus worth asking whether there exists any data structure problem such that in data structures with feasible sizes (polynomial in the size of data set), the nondeterministic cell-probe complexity is super-constant.More importantly, it calls for a general technique to prove lower bounds on the nondeterministic cell-probe complexity of static data structures. 1.1 Our contribution In this paper,we initiate the study of nondeterministic cell-probe complexity for static data structures.As a first step,we characterize the power of a single- cell nondeterministic probe.Although at first glance this may seem like a very restricted case,by applying a trivial parameter reduction,we show that the case of a single-cell probe is actually a canonical case for all nondeterministic cell- probe mechanisms,and is thus sufficient to prove super-constant lower bounds for general nondeterministic cell-probe mechanisms
2 Yitong Yin Partial match: X = {0, 1, ∗}d , Y = {0,1} d n , and f(x, y) ∈ {0, 1} such that for every x ∈ X and y ∈ Y , f(x, y) = 1 if and only if there exists z ∈ y having either xi = zi or xi = ∗ for every i. Polynomial evaluation: X = 2k is a finite field, Y = 2kd is the set of all (d − 1)-degree polynomials over the finite field 2k , and f(x, y) returns the value of y(x). A classic computational model for static data structures is the cell-probe model [11]. For each data instance y, a table of cells is constructed to store y. This table is called a static data structure for some problem f. Upon a query x, an all-powerful algorithm tries to compute f(x, y), based on adaptive random access (probes) to the cells. The cell-probe model is a clean and general model for static data structures and serves as a great tool for the study of lower bounds. Previous research on static data structures in the cell-probe model has focused on the complexity of adaptive cell-probes. In this work, we focus on the complexity of nondeterministic cell-probes and the tradeoff between the number of probes needed and with space. We speculate that it is an important problem because: (1) in considering the complexity of data structures, nondeterminism is a very natural extension to the cell-probe model; (2) instead of adaptive computations, nondeterministic cell-probes capture the question of verification, which is a natural and important aspect of data structures. Although nondeterministic cell-probe complexity is an important problem, there are few general tools and techniques for studying it, especially for the case of static data structures. In fact, because of the great generality of the cellprobe model, even for deterministic cell-probe complexity, super-constant lower bounds for static data structures are rare. Nondeterminism grants the cell-probe model extra power and makes non-trivial lower bounds even rarer. For many standard examples of data structure problems, such as membership query, it is easy to construct a data structure that has standard space usage and constant nondeterministic cell-probe complexity. It is thus worth asking whether there exists any data structure problem such that in data structures with feasible sizes (polynomial in the size of data set), the nondeterministic cell-probe complexity is super-constant. More importantly, it calls for a general technique to prove lower bounds on the nondeterministic cell-probe complexity of static data structures. 1.1 Our contribution In this paper, we initiate the study of nondeterministic cell-probe complexity for static data structures. As a first step, we characterize the power of a singlecell nondeterministic probe. Although at first glance this may seem like a very restricted case, by applying a trivial parameter reduction, we show that the case of a single-cell probe is actually a canonical case for all nondeterministic cellprobe mechanisms, and is thus sufficient to prove super-constant lower bounds for general nondeterministic cell-probe mechanisms
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 3 We introduce cell-probe proofs.a proof system in the cell-probe model.This notion of proofs corresponds to considering verifications instead of computations in the cell-probe model.Unlike the fully adaptive computations in the traditional cell-probe model,the formulation of cell-probe proofs shows a combinatorial simplicity.We introduce a combinatorial structure that fully characterizes which problems have single-cell proofs,and general cell-probe proofs are reduced to this case. With these novel tools,we show following lower bounds on nondeterministic cell-probe complexity: We show that there exist static data structure problems with super-constant nondeterministic cell-probe complexity.In particular,we show that for the exact nearest neighbor search(NNS)problem or partial match problem in high dimensional Hamming space,there does not exist a static data structure with Poly(n)cells,each of which contains n()bits,such that the nonde- terministic cell-probe complexity is O(1),where n is the number of points in the data set for the NNS or partial match problem. -For the polynomial evaluation problem,if for a static data structure,the single-cell nondeterministic probes are sufficient to answer queries,then ei- ther the size of the single cell is close to the size of the whole polynomial,or the total size of the data structure is close to that of the naive data structure that stores results for all possible queries. 1.2 Related work To the best of our knowledge,there is no general technique for proving lower bounds for nondeterministic cell-probe complexity of static data structures.Nor do there exist any non-trivial lower bounds for this question.Previous work on static data structures in the cell-probe model have focused on the complexity of adaptive cell-probes.The most important tool for proving such lower bounds is asymmetric communication complexity as introduced by Miltersen et al.in [10]. In [6],Fredman and Saks introduce the chronogram method.This powerful technique is specialized for proving the query /update trade-off for dynamic data structures,especially for the problems which are hard only in the dynamic case. It is worth noting that the chronogram method can prove nondeterministic lower bounds for certain dynamic data structure problems.This is formally addressed by Husfeldt and Rauhe in [7],and recently by Demaine and Patrascu in [5]. However,as pointed in [7],this is only a by-product of the nondeterministic nature of chronogram method and can only yield amortized query/update trade- offs for dynamic data structure problems with a certain property.Because of the unique structure of the chronogram method,this technique can not be utilized to prove lower bounds for static data structures. 2 Cell-probe proofs A static data structure problem or just data structure problem,is rep resented as a boolean function f:Xx Y10,1}.For the purposes of proving
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 3 We introduce cell-probe proofs, a proof system in the cell-probe model. This notion of proofs corresponds to considering verifications instead of computations in the cell-probe model. Unlike the fully adaptive computations in the traditional cell-probe model, the formulation of cell-probe proofs shows a combinatorial simplicity. We introduce a combinatorial structure that fully characterizes which problems have single-cell proofs, and general cell-probe proofs are reduced to this case. With these novel tools, we show following lower bounds on nondeterministic cell-probe complexity: – We show that there exist static data structure problems with super-constant nondeterministic cell-probe complexity. In particular, we show that for the exact nearest neighbor search (NNS) problem or partial match problem in high dimensional Hamming space, there does not exist a static data structure with Poly(n) cells, each of which contains n o(1) bits, such that the nondeterministic cell-probe complexity is O(1), where n is the number of points in the data set for the NNS or partial match problem. – For the polynomial evaluation problem, if for a static data structure, the single-cell nondeterministic probes are sufficient to answer queries, then either the size of the single cell is close to the size of the whole polynomial, or the total size of the data structure is close to that of the naive data structure that stores results for all possible queries. 1.2 Related work To the best of our knowledge, there is no general technique for proving lower bounds for nondeterministic cell-probe complexity of static data structures. Nor do there exist any non-trivial lower bounds for this question. Previous work on static data structures in the cell-probe model have focused on the complexity of adaptive cell-probes. The most important tool for proving such lower bounds is asymmetric communication complexity as introduced by Miltersen et al. in [10]. In [6], Fredman and Saks introduce the chronogram method. This powerful technique is specialized for proving the query/update trade-off for dynamic data structures, especially for the problems which are hard only in the dynamic case. It is worth noting that the chronogram method can prove nondeterministic lower bounds for certain dynamic data structure problems. This is formally addressed by Husfeldt and Rauhe in [7], and recently by Demaine and P˘atra¸scu in [5]. However, as pointed in [7], this is only a by-product of the nondeterministic nature of chronogram method and can only yield amortized query/update tradeoffs for dynamic data structure problems with a certain property. Because of the unique structure of the chronogram method, this technique can not be utilized to prove lower bounds for static data structures. 2 Cell-probe proofs A static data structure problem or just data structure problem, is represented as a boolean function f : X × Y → {0, 1}. For the purposes of proving
Yitong Yin lower bounds,we only consider decision problems.We refer to each y E Y as data and each rE X as a query.For each pair of r and y,f(,y)specifies the result of the query r to the data structure that represents the data y. In the cell-probe model (c.f.[11,6]),the data instance y is preprocessed and stored in cells,and for each query z,the value of f(r,y)is decided by adaptive probes to the cells.Formally,a cell-probe scheme consists of a table structure and a query algorithm.The table structure T:Y×I一{0,l}6 specifies a table T:I{0,1]for each data instance y,which maps indices of cells to their contents.Given a query r,the query algorithm makes a sequence of probes i1,i2,...to the cells,where ik depends on x and all previous cell probes (i1,T(i)),(i2,T(i2)),...,(ik-1,T(ik-1)).The value of f(r,y)is decided at last based on the collected information. In this work,we focus on nondeterministic cell-probes.Given a query z to a data instance y,a set of t cells i1,i2,...,it are probed nondeterministi- cally,such that the value of f(,y)is decided based on the probed information (i1,Ty(i1)》,(2,Ty(i2)》,,(it,Ty(it)》. In order to formally characterize nondeterministic cell-probes for data struc- tures,we introduce a new concept,cell-probe proofs,which formalizes the notion of proofs and verifications in the cell-probe model.For a specific data structure problem f,a cell-probe proof system(CPP)may be defined for f as described below. We can think of a cell-probe proof system as a game played between an honest verifier and an untrusted prover.Both of them have unlimited computa- tional power.Given an instance of data,a table of cells is honestly constructed according to the rules known to both prover and verifier.Both the prover and the verifier know the query,but only the prover can observe the whole table and thus knows the data.The prover tries to convince the verifier about the result of the query to the data by revealing certain cells.After observing the revealed cells,the verifier either decides the correct answer,or rejects the proof,but can not be tricked by the prover into returning a wrong answer. Formally,a cell-probe proof system (CPP)consists of three parts: -A table structure T:Y×I→{0,l}b.For any data,a table T:I→{0,1b is a mapping from indices of cells to their contents. -A prover P.For every x and y,Pry CI is a set of cells.We refer to Pry as a proof and {(i,T(i))|iE Pry}as a certificate. -A verifier v,which maps the queries with the certificates to the answers (0,1,L).Given an instance of data y,for any query z,both of the following conditions hold: (Completeness)3Py≤I:v(a,{i,Tg(i)》li∈Pxy})=f(x,y),and (Soundnes)P'cI:e,{k,T,(》lieP')={ 了f(x,) An (s,b,t)-CPP is a CPP such that for every x and y:(1)the table has s cells,i.e.I=s;(2)each cell contains b bits;(3)each proof consists of t cell probes,i.e.Prul =t
4 Yitong Yin lower bounds, we only consider decision problems. We refer to each y ∈ Y as data and each x ∈ X as a query. For each pair of x and y, f(x, y) specifies the result of the query x to the data structure that represents the data y. In the cell-probe model (c.f. [11, 6]), the data instance y is preprocessed and stored in cells, and for each query x, the value of f(x, y) is decided by adaptive probes to the cells. Formally, a cell-probe scheme consists of a table structure and a query algorithm. The table structure T : Y × I → {0, 1} b specifies a table Ty : I → {0, 1} b for each data instance y, which maps indices of cells to their contents. Given a query x, the query algorithm makes a sequence of probes i1, i2, . . . to the cells, where ik depends on x and all previous cell probes hi1, Ty(i1)i,hi2, Ty(i2)i, . . . ,hik−1, Ty(ik−1)i. The value of f(x, y) is decided at last based on the collected information. In this work, we focus on nondeterministic cell-probes. Given a query x to a data instance y, a set of t cells i1, i2, . . . , it are probed nondeterministically, such that the value of f(x, y) is decided based on the probed information hi1, Ty(i1)i,hi2, Ty(i2)i, . . . ,hit, Ty(it)i. In order to formally characterize nondeterministic cell-probes for data structures, we introduce a new concept, cell-probe proofs, which formalizes the notion of proofs and verifications in the cell-probe model. For a specific data structure problem f, a cell-probe proof system (CPP) may be defined for f as described below. We can think of a cell-probe proof system as a game played between an honest verifier and an untrusted prover. Both of them have unlimited computational power. Given an instance of data, a table of cells is honestly constructed according to the rules known to both prover and verifier. Both the prover and the verifier know the query, but only the prover can observe the whole table and thus knows the data. The prover tries to convince the verifier about the result of the query to the data by revealing certain cells. After observing the revealed cells, the verifier either decides the correct answer, or rejects the proof, but can not be tricked by the prover into returning a wrong answer. Formally, a cell-probe proof system (CPP) consists of three parts: – A table structure T : Y ×I → {0, 1} b . For any data y, a table Ty : I → {0, 1} b is a mapping from indices of cells to their contents. – A prover P. For every x and y, Pxy ⊆ I is a set of cells. We refer to Pxy as a proof and {hi, Ty(i)i | i ∈ Pxy} as a certificate. – A verifier v, which maps the queries with the certificates to the answers {0, 1, ⊥}. Given an instance of data y, for any query x, both of the following conditions hold: (Completeness) ∃Pxy ⊆ I : v(x, {hi, Ty(i)i | i ∈ Pxy}) = f(x, y), and (Soundness) ∀P 0 ⊆ I : v(x, {hi, Ty(i)i | i ∈ P 0}) = f(x, y) ⊥ . An (s, b, t)-CPP is a CPP such that for every x and y: (1) the table has s cells, i.e. |I| = s; (2) each cell contains b bits; (3) each proof consists of t cell probes, i.e. |Pxy| = t
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 5 Erample:For the membership problem[11],wherex=[m]and Y=(),and f(x,y)=1 if and only if ry,a naive construction shows a 2-cell proof:with a sorted table storing y,if x Ey,the proof is the cell that contains z,if y,the proof consists of two consecutive cells which are the predecessor and successor of z.The same CPP also works for predecessor search[2]. The notion of cell-probe proofs captures the necessary information to answer queries,and characterizes the nondeterministic probes in the cell-probe model. It is natural to see that for a cell-probe scheme,for any query,the cells probed by an adaptive algorithm contain a cell-probe proof.This can be seen as a data structure counterpart of P C NP. It is important to note that although a data structure problem is nothing but a boolean function,CPP is very different from the certificate complexity of boolean functions [4.In CPP,the prover and the verifier communicate with each other via a table structure,which distinguishes CPP from standard certificate complexity.For any data structure problem,the table structure can always store the results for all queries,making one cell-probe sufficient to prove the result, which is generally impossible in the model of certificate complexity. Unlike adaptive cell-probes,CPP has a static nature,which is convenient for reductions.As stated by the following lemma.any CPP can be trivially reduced to 1-cell proofs. Lemma 1 (reduction lemma).For any data structure problem f,if there erists an (s,b,t)-CPP,then there erists an (st,bt,1)-CPP. Proof.Just store every t-tuple of cells in the (s,b,t)-CPP as a new cell in the (s,bt,1)-CPP. 0 3 Characterization of CPPs We now introduce a combinatorial characterization of CPP.Given a set system Fs2r,for any y∈Y,we let F(y)={F∈F|y∈F}.For convenience,for a partition P of Y,we abuse this notation and let P(y)denote the set FP that y∈F Definition 1.We say a set system FC2 is an s x k-partition of y,if F is a union of s number of partitions of Y,where the cardinality of each partition is at most k. This particular notion of partitions of Y fully captures the structure of cell- probe proofs.In this extended abstract,we only provide the characterization of 1-cell proofs.As shown by Lemma 1,this is a canonical case for cell-probe proofs. Theorem 1.There is an(s,b,l)-CPP for f:X×Y→{0,l},if and only if there exists an s x 2b-partition F of Y,such that for every xE X and every y∈Y,there is an F∈F(y)that f(x,F)l=l
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 5 Example: For the membership problem[11], where X = [m] and Y = [m] n , and f(x, y) = 1 if and only if x ∈ y, a naive construction shows a 2-cell proof: with a sorted table storing y, if x ∈ y, the proof is the cell that contains x, if x 6∈ y, the proof consists of two consecutive cells which are the predecessor and successor of x. The same CPP also works for predecessor search[2]. The notion of cell-probe proofs captures the necessary information to answer queries, and characterizes the nondeterministic probes in the cell-probe model. It is natural to see that for a cell-probe scheme, for any query, the cells probed by an adaptive algorithm contain a cell-probe proof. This can be seen as a data structure counterpart of P ⊆ NP. It is important to note that although a data structure problem is nothing but a boolean function, CPP is very different from the certificate complexity of boolean functions [4]. In CPP, the prover and the verifier communicate with each other via a table structure, which distinguishes CPP from standard certificate complexity. For any data structure problem, the table structure can always store the results for all queries, making one cell-probe sufficient to prove the result, which is generally impossible in the model of certificate complexity. Unlike adaptive cell-probes, CPP has a static nature, which is convenient for reductions. As stated by the following lemma, any CPP can be trivially reduced to 1-cell proofs. Lemma 1 (reduction lemma). For any data structure problem f, if there exists an (s, b, t)-CPP, then there exists an (s t , bt, 1)-CPP. Proof. Just store every t-tuple of cells in the (s, b, t)-CPP as a new cell in the (s t , bt, 1)-CPP. ut 3 Characterization of CPPs We now introduce a combinatorial characterization of CPP. Given a set system F ⊆ 2 Y , for any y ∈ Y , we let F(y) = {F ∈ F | y ∈ F}. For convenience, for a partition P of Y , we abuse this notation and let P(y) denote the set F ∈ P that y ∈ F. Definition 1. We say a set system F ⊆ 2 Y is an s × k-partition of Y , if F is a union of s number of partitions of Y , where the cardinality of each partition is at most k. This particular notion of partitions of Y fully captures the structure of cellprobe proofs. In this extended abstract, we only provide the characterization of 1-cell proofs. As shown by Lemma 1, this is a canonical case for cell-probe proofs. Theorem 1. There is an (s, b, 1)-CPP for f : X × Y → {0, 1}, if and only if there exists an s × 2 b -partition F of Y , such that for every x ∈ X and every y ∈ Y , there is an F ∈ F(y) that |f(x, F)| = 1
6 Yitong Yin Proof..(→)Given a table structure T:Y×I→{0,lb,define the map of the table structure as a s x 25 matrix M such that Mij=yEY Tu(i)=j}, i.e.Mi;is the set of such data set y that the content of the i's cell of the table storing y is j.It is clear that each row i of M is a partition of Y with at most 25 partition sets,because each data set y has one and only one value of T(i), and there are at most 25 possible values for a cell,therefore the matrix M is an s×2b-partition F of Y,where each Mij is an F∈F. If there is an (s,b,1)-CPP of f,due to the completeness of CPP,for every xE X and every y EY,there exists a cell i that (i,j)becomes the certificate where j=T(i),and due to the soundness of CPP,there must not be any other y'EY such that T(i)=j and f(r,y')f(r,y).Note that by definition of M, Mij contains all y'such that Ty(i)=j,thus f(,Mij)=1. ()Assuming that F is an s x 25-partition of Y such that for every r and every y there is an FF(y)that If(x,F)|=1,we rewrite F in the form of an sx 25 matrix M that Mij is the FE F which is indexed as the jth partition set in the ith partition.We can define our table structure T:Y×I→{0,lb in the way that Tu(i)is assigned with the unique j that yE Mii.Because each row of M is a partition of y,such T is well-defined. For every z∈X and every y∈Y,there is an F∈F(y)that f(x,F)l=l, i.e.there is an Mij3y that If(,Mij)=1,then we use (i,j)as the certificate. Since for every x and y,there exists such i,the corresponding CPP is complete, and since f(x,)is constant on such Mij,the CPP is also sound. ▣ Let Yo yy f(z,y)=0}and yf=yEy f(r,y)=1).An alternative characterization is that there is a (s,b,1)-CPP for a problem f X×Y→{0,l},if and only if there exists an s×2b-partition F of Y,such that Yo,YiJrex is contained by the union-closure of F.Note that this statement is equivalent to the statement in Theorem 1,so we state it without proof.With this formulation,we get some intuition about 1-cell proofs.that is.a problem f:Xx Y{0,1}has simple proofs,if and only if there exists some set system FC2 with a simple structure,such that the complexity of F matches the complexity of the problem. 4 Nearest neighbor search We consider the decision version of nearest neighbor search,A-near neighbor (A-NN),in a high dimensional Hamming cube (0,1]d.Here X ={0,1]d,Y ()and f()1}answers whether there exists a point in y within distance A from the r.As in [3,1],we assume that d=w(logn)nn(1)to make the problem non-trivial. We prove that with the above setting,there does not exist a(Poly(n),n(1),1)- CPP for the A-NN problem,thus due to Lemma 1,a super-constant lower bound holds for the problem.To show this,we show the same lower bound for the par- tial match problem[9,8],which is an instantiation of the A-NN problem as shown in[3
6 Yitong Yin Proof. (=⇒) Given a table structure T : Y × I → {0, 1} b , define the map of the table structure as a s × 2 b matrix M such that Mij = {y ∈ Y | Ty(i) = j}, i.e. Mij is the set of such data set y that the content of the i’s cell of the table storing y is j. It is clear that each row i of M is a partition of Y with at most 2 b partition sets, because each data set y has one and only one value of Ty(i), and there are at most 2b possible values for a cell, therefore the matrix M is an s × 2 b -partition F of Y , where each Mij is an F ∈ F. If there is an (s, b, 1)-CPP of f, due to the completeness of CPP, for every x ∈ X and every y ∈ Y , there exists a cell i that hi, ji becomes the certificate where j = Ty(i), and due to the soundness of CPP, there must not be any other y 0 ∈ Y such that Ty0 (i) = j and f(x, y0 ) 6= f(x, y). Note that by definition of M, Mij contains all y 0 such that Ty0 (i) = j, thus |f(x, Mij )| = 1. (⇐=) Assuming that F is an s × 2 b -partition of Y such that for every x and every y there is an F ∈ F(y) that |f(x, F)| = 1, we rewrite F in the form of an s × 2 b matrix M that Mij is the F ∈ F which is indexed as the jth partition set in the ith partition. We can define our table structure T : Y × I → {0, 1} b in the way that Ty(i) is assigned with the unique j that y ∈ Mij . Because each row of M is a partition of Y , such T is well-defined. For every x ∈ X and every y ∈ Y , there is an F ∈ F(y) that |f(x, F)| = 1, i.e. there is an Mij 3 y that |f(x, Mij )| = 1, then we use hi, ji as the certificate. Since for every x and y, there exists such i, the corresponding CPP is complete, and since f(x, ·) is constant on such Mij , the CPP is also sound. ut Let Y x 0 = {y ∈ Y | f(x, y) = 0} and Y x 1 = {y ∈ Y | f(x, y) = 1}. An alternative characterization is that there is a (s, b, 1)-CPP for a problem f : X × Y → {0, 1}, if and only if there exists an s × 2 b -partition F of Y , such that {Y x 0 , Y x 1 }x∈X is contained by the union-closure of F. Note that this statement is equivalent to the statement in Theorem 1, so we state it without proof. With this formulation, we get some intuition about 1-cell proofs, that is, a problem f : X × Y → {0, 1} has simple proofs, if and only if there exists some set system F ⊆ 2 Y with a simple structure, such that the complexity of F matches the complexity of the problem. 4 Nearest neighbor search We consider the decision version of nearest neighbor search, λ-near neighbor (λ-NN), in a high dimensional Hamming cube {0, 1} d . Here X = {0, 1} d , Y = {0,1} d n and f(x, y) ∈ {0, 1} answers whether there exists a point in y within distance λ from the x. As in [3, 1], we assume that d = ω(log n) ∩ n o(1) to make the problem non-trivial. We prove that with the above setting, there does not exist a (Poly(n), no(1) , 1)- CPP for the λ-NN problem, thus due to Lemma 1, a super-constant lower bound holds for the problem. To show this, we show the same lower bound for the partial match problem[9, 8], which is an instantiation of the λ-NN problem as shown in [3]
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity The partial match problem is defined as follow:The domain is a Hamming cube {0,1d,where d=w(logn)nn(1),and each data instance y is a set of n points from the domain,i.e.Y().The set of queries is=1 Given a data instancey∈(o,1 and a query∈{0,l,*}4,fz,g)=1if and only if there is a zEy such that z matches x except for the bits assigned with “*” Theorem 2.There is no (s,b,1)-CPP for the partial match problem,if s= Poly(n)and b=no(1). Proof.We denote the problem as f.From the characterization of(s,b,1)-CPP given in Theorem 1,it is sufficient to show that for any s x 2 partition F of y, there exist x∈X and y∈Y such that for all F∈F(y),lf(z,F)l=2.We prove this with the probabilistic method.With some distribution of x and y,we show that for any s x 25 partition F of Y,Pr[VF EF(y),If(,F)=2]>0. For the rest of the proof,we assume that y is uniformly selected from Y,and x is generated by uniformly choosing r 2logn bits and fixing each of them uniformly and independently at random with 0 or 1,and setting the other bits to“*”. We then prove two supporting lemmas.Recall that for a partition P of Y, P(y)denotes the set F∈P that y∈F. Lemma 2.For any partition P ofY,if P(1-2-)2d for k=logn,then Pr[f(x,A)=0≤n-w()
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 7 The partial match problem is defined as follow: The domain is a Hamming cube {0, 1} d , where d = ω(log n) ∩ n o(1), and each data instance y is a set of n points from the domain, i.e. Y = {0,1} d n . The set of queries is X = {0, 1, ∗}d . Given a data instance y ∈ {0,1} d n and a query x ∈ {0, 1, ∗}d , f(x, y) = 1 if and only if there is a z ∈ y such that z matches x except for the bits assigned with “∗”. Theorem 2. There is no (s, b, 1)-CPP for the partial match problem, if s = Poly(n) and b = n o(1) . Proof. We denote the problem as f. From the characterization of (s, b, 1)-CPP given in Theorem 1, it is sufficient to show that for any s × 2 b partition F of Y , there exist x ∈ X and y ∈ Y such that for all F ∈ F(y), |f(x, F)| = 2. We prove this with the probabilistic method. With some distribution of x and y, we show that for any s × 2 b partition F of Y , Pr[∀F ∈ F(y), |f(x, F)| = 2] > 0. For the rest of the proof, we assume that y is uniformly selected from Y , and x is generated by uniformly choosing r = 2 log n bits and fixing each of them uniformly and independently at random with 0 or 1, and setting the other bits to “∗”. We then prove two supporting lemmas. Recall that for a partition P of Y , P(y) denotes the set F ∈ P that y ∈ F. Lemma 2. For any partition P of Y , if |P| ≤ 2 b , where b = n o(1), then Pr y |P(y)| ≤ 2 d n . 2 n Ω(1) ≤ n −ω(1) . Proof. We let P = {F1, F2, . . . , Fk} where k ≤ 2 b , and let pi = |Fi |/|Y |. Because P is a partition of Y , we know that P i pi = 1. We define a random variable Z = |P(y)|/|Y |. Since y is picked uniformly at random from Y , it holds that Z = pi with probability pi . Since there are at most 2b different P(y), by union bound, Pr y |P(y)| ≤ 2 d n . 2 n Ω(1) ≤ 2 b · Pr h Z = pi where pi ≤ 2 −n Ω(1) i = 2b−n Ω(1) = n −ω(1) . ut For simplicity, we generalize the notation of f to arbitrary point set A ⊆ {0, 1} d , where f(x, A) is conventionally defined to indicate whether there is a z ∈ A that matches x Lemma 3. For any A ⊆ {0, 1} d , if |A| > (1 − 2 −k )2d for k = 1 2 log n, then Pr x [f(x, A) = 0] ≤ n −ω(1)
Yitong Yin Proof.We let B=f0,1d\A be the complement of A in the d-dimensional cube. Note that Bl(()2).According to Lemma 2,for any partition Pof Y with≤2,the probability that(gl≤(1-2行2)= (/2nis at most n() We let Ay=UP(y)=UveP().Note that Ay (0,1)4,and f(,P(y))= (0}implies that f(,A)=0.For such P(y)that P(y)>(()2),by the Pigeonhole Principle,it holds that Au>(1-2-)2d.Due to Lemma 3, f(,Ay)=0 with prohibitively small probability.Putting these together,it holds
8 Yitong Yin Proof. We let B = {0, 1} d \A be the complement of A in the d-dimensional cube. Note that |B| (1−2 −k )2d n . According to Lemma 2, for any partition P of Y with |P| ≤ 2 b , the probability that |P(y)| ≤ (1−2 −k )2d n = 2 d n /2 n Ω(1) is at most n −ω(1) . We let Ay = S P(y) = S y0∈P(y) y 0 . Note that Ay ⊆ {0, 1} d , and f(x,P(y)) = {0} implies that f(x, Ay) = 0. For such P(y) that |P(y)| > (1−2 −k )2d n , by the Pigeonhole Principle, it holds that |Ay| ≥ (1 − 2 −k )2d . Due to Lemma 3, f(x, Ay) = 0 with prohibitively small probability. Putting these together, it holds
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity for an arbitrary partition P of y with P(-】 ≤naa+Prfz,A,)=0l4,l>(1-2-*)2 ≤nw(1) Combining with (1),we have that gF∈F,fe,F)={01≤sn-=oL). Therefore,for an arbitrary sx 26 partition F of y,it holds that gFeF),lf(z,F1=2≥1-PrF∈F(,f,F)={1H I.y -Pr日F∈F(y),f(x,F)={0] z.y ≥1-o(1) It follows that for any s x 26 partition F of y,where s=Poly(n)and b=n(1) there exist z∈X and y∈Y such that for every F∈F(y),it holds that If(,F)=2.By Theorem 1,there is no (s,b,1)-CPP for f with the above range of s and b. ▣ In [3],it is shown that the partial match problem can be reduced to the A-NN problem.Because the reduction only involves mapping between instances of problems,the existence of an (s,b,1)-CPP for A-NN implies the existence of a CPP for partial match with essentially the same parameters.The following corollary is implied. Corollary 1.There does not erist a (Poly(n),n(1),1)-CPP for the nearest neighbor search problem with n points in d-dimensional Hamming space where d =w(logn)nno(1). Due to Lemma 1,the following super-constant lower bound on the nondeter- ministic cell-probe complexity holds. Corollary 2.There does not erist a (Poly(n),n01),O(1))-CPP for the near- est neighbor search problem or the partial match problem with n points in d- dimensional Hamming space where d=w(logn)n(1). 5 Polynomial evaluation Let 2k be a finite field.Let Y=2kd be the set of all polynomials of degree <(d-1)over the finite field 2k.Throughout this section,we assume that d≤2k
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 9 for an arbitrary partition P of Y with |P| ≤ 2 b that Pr x,y [f(x,P(y)) = {0}] ≤ Pr y |P(y)| ≤ (1 − 2 −k )2d n + Pr x,y f(x,P(y)) = {0} |P(y)| > (1 − 2 −k )2d n ≤ n −ω(1) + Pr x h f(x, Ay) = 0 |Ay| > (1 − 2 −k )2d i ≤ n −ω(1) . Combining with (1), we have that Pr x,y [∃F ∈ F(y), f(x, F) = {0}] ≤ s · n −ω(1) = o(1). Therefore, for an arbitrary s × 2 b partition F of Y , it holds that Pr x,y [∀F ∈ F(y), |f(x, F)| = 2] ≥ 1 − Pr x,y [∃F ∈ F(y), f(x, F) = {1}] − Pr x,y [∃F ∈ F(y), f(x, F) = {0}] ≥ 1 − o(1). It follows that for any s × 2 b partition F of Y , where s = Poly(n) and b = n o(1) , there exist x ∈ X and y ∈ Y such that for every F ∈ F(y), it holds that |f(x, F)| = 2. By Theorem 1, there is no (s, b, 1)-CPP for f with the above range of s and b. ut In [3], it is shown that the partial match problem can be reduced to the λ-NN problem. Because the reduction only involves mapping between instances of problems, the existence of an (s, b, 1)-CPP for λ-NN implies the existence of a CPP for partial match with essentially the same parameters. The following corollary is implied. Corollary 1. There does not exist a (Poly(n), no(1) , 1)-CPP for the nearest neighbor search problem with n points in d-dimensional Hamming space where d = ω(log n) ∩ n o(1) . Due to Lemma 1, the following super-constant lower bound on the nondeterministic cell-probe complexity holds. Corollary 2. There does not exist a (Poly(n), no(1), O(1))-CPP for the nearest neighbor search problem or the partial match problem with n points in ddimensional Hamming space where d = ω(log n) ∩ n o(1) . 5 Polynomial evaluation Let 2k be a finite field. Let Y = 2kd be the set of all polynomials of degree ≤ (d − 1) over the finite field 2k . Throughout this section, we assume that d ≤ 2 k
10 Yitong Yin Let X=22k be the set of all pairs of elements of the finite field 2%.A decision version of the polynomial evaluation problem f is defined as:for every query(x,z)∈X and every data instance g∈Y,f(x,z),g)=lifg(z)=zand f((,z),g)=0 otherwise.Intuitively,a polynomial g is preprocessed and stored as a data structure,so that for each query (x,z),the data structure answers whether g(r)=z. There are two naive upper bounds for one-cell proofs: 1.A (1,kd,1)-CPP:store the whole polynomial in a single cell,and on each query,one probe reveals the whole polynomial; 2.A(2,k,1)-CPP:each cell corresponds to an input x,and the cell stores the value of g(),thus on each query (z),one probe to the cell corresponding to x answers whether g(x)=z. We are going to prove that the above naive upper bounds are essentially optimal for single-probe proofs.We show that for any (s,b,1)-CPP,either b is close to large enough to store a whole polynomial as in(1),or the total storage size s.b is exactly as large as in(2). We first prove two lemmas.For any subset P C Y,let r(P)=Ifr E2k Vg1,92 E P,g()=g2(x)},which represents the number of such assignments of x that all polynomials in P yield the same outcome.It is trivial to see that for |P≤1,T(P)=2k Lemma 4.If P>1,it holds that r(P)≤d-logIP k Proof.We write r(P)briefly as T.Let z1,x2,...,be such that all polynomials in P yield the same outcomes.We arbitrarily pick other r+1,+2,...,d.For any two different polynomials g1,92E P,it can never hold that g(ri)=g2(i) for all i=T+1,T+2,...,d,since if otherwise,g=g2 by interpolation.Recall that g is a polynomial over the finite field 2k,thus for an arbitrary gE P and an arbitrary x,there are at most 2 possible values for g().Therefore,due to Pigeonhole Principle,in order to guarantee that no two polynomials in P agree on all+2.....d,it must hold that k(d)Pl i.e.(P)sd-log el ▣ Lemma 5.Given a partition P of Y,let g be a uniformly random polynomial in Y.ET(P(g))}represents the expected number of the input rs such that all polynomials in the partition block P(g)yield the same outcome,where the expec- tation is taken over random g.For any partition P of Y such that P<2b and b<k(d-1)-logk,it holds that b E{r(P(g}≤
10 Yitong Yin Let X = 22k be the set of all pairs of elements of the finite field 2k . A decision version of the polynomial evaluation problem f is defined as: for every query (x, z) ∈ X and every data instance g ∈ Y , f((x, z), g) = 1 if g(x) = z and f((x, z), g) = 0 otherwise. Intuitively, a polynomial g is preprocessed and stored as a data structure, so that for each query (x, z), the data structure answers whether g(x) = z. There are two naive upper bounds for one-cell proofs: 1. A (1, kd, 1)-CPP: store the whole polynomial in a single cell, and on each query, one probe reveals the whole polynomial; 2. A (2k , k, 1)-CPP: each cell corresponds to an input x, and the cell stores the value of g(x), thus on each query (x, z), one probe to the cell corresponding to x answers whether g(x) = z. We are going to prove that the above naive upper bounds are essentially optimal for single-probe proofs. We show that for any (s, b, 1)-CPP, either b is close to large enough to store a whole polynomial as in (1), or the total storage size s · b is exactly as large as in (2). We first prove two lemmas. For any subset P ⊆ Y , let τ (P) = |{x ∈ 2 k | ∀g1, g2 ∈ P, g1(x) = g2(x)}|, which represents the number of such assignments of x that all polynomials in P yield the same outcome. It is trivial to see that for |P| ≤ 1, τ (P) = 2k . Lemma 4. If |P| > 1, it holds that τ (P) ≤ d − log |P| k . Proof. We write τ (P) briefly as τ . Let x1, x2, . . . , xτ be such that all polynomials in P yield the same outcomes. We arbitrarily pick other xτ+1, xτ+2, . . . , xd. For any two different polynomials g1, g2 ∈ P, it can never hold that g1(xi) = g2(xi) for all i = τ + 1, τ + 2, . . . , d, since if otherwise, g1 ≡ g2 by interpolation. Recall that g is a polynomial over the finite field 2k , thus for an arbitrary g ∈ P and an arbitrary x, there are at most 2k possible values for g(x). Therefore, due to Pigeonhole Principle, in order to guarantee that no two polynomials in P agree on all xτ+1, xτ+2, . . . , xd, it must hold that 2k(d−τ) ≥ |P|, i.e. τ (P) ≤ d − log |P | k . ut Lemma 5. Given a partition P of Y , let g be a uniformly random polynomial in Y . E{τ (P(g))} represents the expected number of the input xs such that all polynomials in the partition block P(g) yield the same outcome, where the expectation is taken over random g. For any partition P of Y such that |P| ≤ 2 b and b < k(d − 1) − log k, it holds that E{τ (P(g))} ≤ b k