Cell-Probe Proofs YITONG YIN Nanjing University We study the nondeterministic cell-probe complexity of static data structures.We introduce cell. probe proofs (CPP),a proof system for the cell-probe model,which describes verification instead of computation in the cell-probe model.We present a combinatorial characterization of CPP.With this novel tool,we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures. -There exists a data structure problem with high nondeterministic cell-probe complexity. -For the exact nearest neighbor search(NNS)problem or the partial match problem in high di- mensional Hamming space,for any data structure with Poly(n)cells,each of which contains O(nc)bits where C 1,the nondeterministic cell-probe complexity is at least (log(d/logn)). where d is the dimension and n is the number of points in the data set. -For the polynomial evaluation problem of d-degree polynomial over finite field of size 2k where ds 2,for any data structure with s cells,each of which contains b bits,the nondeterministic cell-probe complexity is at least min ((d-1). 10g8 Categories and Subject Descriptors:E.1 [Datal:Data Structures;F.1.2 [Computation by Abstract Devices]:Modes of Computation General Terms:Theory Additional Key Words and Phrases:Cell-probe model,nondeterminism ACM Reference Format: Yin,Y.2010.Cell-probe proofs.ACM Trans.Comput.Theory,2,1,Article 1(November 2010), 17 pages..D0I-10.1145/1867719.1867720.http/doi.acm.org/10.1145/1867719.1867720. 1.INTRODUCTION We study the problem of the nondeterministic cell-probe complexity of static data structures. This work was supported by the Program for New Century Excellent Talents at University(NCET) and by the National Science Foundation of China under grant 60721002. A preliminary version of this article appeared in the Proceedings of the 35th International Collo. quium on Automata,Languages and Programming,pp.72-83. Author's address:Y.Yin,Nanjing University,Department of Computer Science and Technology,22 Hankou Road,Nanjing,Jiangsu,China 210093;email:yinyt@nju.edu.cn. Permission to make digital or hard copies part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation.Copyrights for components of this work owned by others than ACM must be honored.Abstracting with credit is permitted.To copy otherwise,to republish,to post on servers, to redistribute to lists,or to use any component of this work in other works requires prior specific permission and/or a fee.Permissions may be requested from Publications Dept.,ACM,Inc.,2 Penn Plaza,Suite 701,New York,NY 10121-0701 USA,fax +1(212)869-0481,or permissions@acm.org ©2010ACM1942-3454/2010/11-ART1$10.00D0L:10.1145/1867719.1867720. http:/doi.acm.org/10.1145/1867719.1867720 ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
1 Cell-Probe Proofs YITONG YIN Nanjing University We study the nondeterministic cell-probe complexity of static data structures. We introduce cellprobe proofs (CPP), a proof system for the cell-probe model, which describes verification instead of computation in the cell-probe model. We present a combinatorial characterization of CPP. With this novel tool, we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures. —There exists a data structure problem with high nondeterministic cell-probe complexity. —For the exact nearest neighbor search (NNS) problem or the partial match problem in high dimensional Hamming space, for any data structure with Poly(n) cells, each of which contains O nC bits where C < 1, the nondeterministic cell-probe complexity is at least log(d/ log n) , where d is the dimension and n is the number of points in the data set. —For the polynomial evaluation problem of d-degree polynomial over finite field of size 2k where d ≤ 2k, for any data structure with s cells, each of which contains b bits, the nondeterministic cell-probe complexity is at least min k b (d − 1), k−log(d−1) log s . Categories and Subject Descriptors: E.1 [Data]: Data Structures; F.1.2 [Computation by Abstract Devices]: Modes of Computation General Terms: Theory Additional Key Words and Phrases: Cell-probe model, nondeterminism ACM Reference Format: Yin, Y. 2010. Cell-probe proofs. ACM Trans. Comput. Theory, 2, 1, Article 1 (November 2010), 17 pages. DOI = 10.1145/1867719.1867720. http://doi.acm.org/10.1145/1867719.1867720. 1. INTRODUCTION We study the problem of the nondeterministic cell-probe complexity of static data structures. This work was supported by the Program for New Century Excellent Talents at University (NCET) and by the National Science Foundation of China under grant 60721002. A preliminary version of this article appeared in the Proceedings of the 35th International Colloquium on Automata, Languages and Programming, pp. 72–83. Author’s address: Y. Yin, Nanjing University, Department of Computer Science and Technology, 22 Hankou Road, Nanjing, Jiangsu, China 210093; email: yinyt@nju.edu.cn. Permission to make digital or hard copies part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org. c 2010 ACM 1942-3454/2010/11-ART1 $10.00 DOI: 10.1145/1867719.1867720. http://doi.acm.org/10.1145/1867719.1867720. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
1:2 .Y.Yin Given a set Y of data instances and a set X of possible queries,a data struc- ture problem can be abstractly defined as a function f mapping each pair con- sisting ofa queryx X and a data instance y e Y to an answer.One of the most well-studied examples of data structure problems is the "membership query": X=[m]is a data universe,Y=(m),and f(x,y)=1ifxey and f(x.y)=o if otherwise. There are some other important examples of data structure problems: Exact nearest neighbor search (NNS)[Barkol and Rabani 2002;Borodin et al.1999;Indyk et al.2004]:given a metric space U,let X =U and Y=(U),and for every xX and y Y,f(x,y)is defined as the closest point to x in y according to the metric. Partial match [Borodin et al.1999;Jayram et al.2004;Patrascu 2008]: =(0.1,Y =(10 )and f(x.y)e(0,1)such that for every xx and y e Y,f(x,y)=1 if and only if there exists z ey having either xi=z or xi=*for every i. Polynomial evaluation [Miltersen 1995]:X =2k is a finite field,Y =2kd is the set of all (d-1)-degree polynomials over the finite field 2,and f(x,y) returns the value of y(x). A classic computational model for static data structures is the cell-probe model Yao [1981].For each data instance y,a table of cells is constructed to store y. This table is called a static data structure for some problem f.Upon a query x, an all-powerful algorithm tries to compute f(x,y),based on adaptive random access(probes)to the cells. The cell-probe model is a clean and general model for static data structures, and serves as a great tool for the study of lower bounds.Previous research on static data structures in the cell-probe model has focused on the complexity of adaptive cell-probes.In this work,we focus on the complexity of nondeter- ministic cell-probes and the tradeoff between the number of probes needed and space.We speculate that it is an important problem with following motivations: (1)In considering the complexity of data structures,nondeterminism is a very natural extension to the cell-probe model.Instead of adaptive computa- tions,nondeterministic cell-probes capture the notion of verification,which is a natural and important aspect of data structures. (2)A classic tool for analyzing the cell-probe model is communication complex- ity [Kushilevitz and Nisan 1997;Miltersen et al.1998],for which the com- putations in a data structure are viewed as communications between two adaptive players.This observation makes computations in data structures analyzable by granting the data structure extra power(i.e.,an adaptive table).By the same light,assuming nondeterminism provides us a differ- ent way of proving lower bounds:instead of having a table that can "talk," it assumes a probing algorithm that can "guess". ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
1: 2 · Y. Yin Given a set Y of data instances and a set X of possible queries, a data structure problem can be abstractly defined as a function f mapping each pair consisting of a query x ∈ X and a data instance y ∈ Y to an answer. One of the most well-studied examples of data structure problems is the “membership query”: X = [m] is a data universe, Y = [m] n , and f(x, y) = 1 if x ∈ y and f(x, y) = 0 if otherwise. There are some other important examples of data structure problems: Exact nearest neighbor search (NNS) [Barkol and Rabani 2002; Borodin et al. 1999; Indyk et al. 2004]: given a metric space U, let X = U and Y = U n , and for every x ∈ X and y ∈ Y, f(x, y) is defined as the closest point to x in y according to the metric. Partial match [Borodin et al. 1999; Jayram et al. 2004; Patras ˘ ¸cu 2008]: X = {0, 1, ∗}d, Y = {0,1}d n , and f(x, y) ∈ {0, 1} such that for every x ∈ X and y ∈ Y, f(x, y) = 1 if and only if there exists z ∈ y having either xi = zi or xi = ∗ for every i. Polynomial evaluation [Miltersen 1995]: X = 2k is a finite field, Y = 2kd is the set of all (d− 1)-degree polynomials over the finite field 2k, and f(x, y) returns the value of y(x). A classic computational model for static data structures is the cell-probe model Yao [1981]. For each data instance y, a table of cells is constructed to store y. This table is called a static data structure for some problem f. Upon a query x, an all-powerful algorithm tries to compute f(x, y), based on adaptive random access (probes) to the cells. The cell-probe model is a clean and general model for static data structures, and serves as a great tool for the study of lower bounds. Previous research on static data structures in the cell-probe model has focused on the complexity of adaptive cell-probes. In this work, we focus on the complexity of nondeterministic cell-probes and the tradeoff between the number of probes needed and space. We speculate that it is an important problem with following motivations: (1) In considering the complexity of data structures, nondeterminism is a very natural extension to the cell-probe model. Instead of adaptive computations, nondeterministic cell-probes capture the notion of verification, which is a natural and important aspect of data structures. (2) A classic tool for analyzing the cell-probe model is communication complexity [Kushilevitz and Nisan 1997; Miltersen et al. 1998], for which the computations in a data structure are viewed as communications between two adaptive players. This observation makes computations in data structures analyzable by granting the data structure extra power (i.e., an adaptive table). By the same light, assuming nondeterminism provides us a different way of proving lower bounds: instead of having a table that can “talk,” it assumes a probing algorithm that can “guess”. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
Cell-Probe Proofs ·1:3 Although nondeterministic cell-probe complexity is an important problem,the optimal nondeterministic bounds for static data structure problems are still un- known.For many naturally defined data structure problems,nondeterminism trivializes the problem in the sense that a constant number of nondeterministic cell-probes are sufficient to answer the queries.For the problems that remain nontrivial after allowing nondeterminism,no lower bounds are known for the nondeterministic cell-probe complexity. It is thus worth asking whether there exists any data structure problem such that in data structures with feasible sizes(polynomial in the size of data set),the nondeterministic cell-probe complexity is relatively high.More impor- tantly,it calls for a general technique to prove lower bounds on the nondeter- ministic cell-probe complexity of static data structures. 1.1 Our Contribution In this article.we initiate the study of nondeterministic cell-probe complexity for static data structures. We introduce cell-probe proofs,a proof system in the cell-probe model.This notion of proof corresponds to considering verification instead of computation in the cell-probe model.Unlike the fully adaptive computation in the tradi- tional cell-probe model,the formulation of cell-probe proofs shows a combinato- rial simplicity.We introduce a combinatorial structure that fully characterizes which problems have cell-probe proofs with specified parameters. With these novel tools,we show the following lower bounds on nondetermin- istic cell-probe complexity. -There exists a data structure problem with high nondeterministic cell-probe complexity.This result can be seen as a nondeterministic parallel to the existential lower bound for deterministic cell-probe complexity in Miltersen [1999],which is proved by counting.As mentioned in Miltersen [2008],the nondeterministic lower bound cannot be easily proved by the similar count- ing argument -For the exact nearest neighbor search(NNS)problem or the partial match problem in high dimensional Hamming space,for any data structure with Poly(n)cells,each of which contains O(nc)bits where C 1,the nonde- terministic cell-probe complexity is at least (log(d/logn)),where d is the dimension and n is the number of points in the data set.The highest known deterministic cell-probe lower bound for the problems for polynomial space is Q(d/logn)(see Barkol and Rabani [2002]and Patrascu [2008]).These lower bounds are proved by communication complexity techniques. -For the polynomial evaluation problem of d-degree polynomial over a fi- nite field of size 2 where d is high,for any data structure with s cells, each of which contains 6 bits,the nondeterministic cell-probe complex- ity is at least min 1).) This bound is nearly equal to the ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
Cell-Probe Proofs · 1: 3 Although nondeterministic cell-probe complexity is an important problem, the optimal nondeterministic bounds for static data structure problems are still unknown. For many naturally defined data structure problems, nondeterminism trivializes the problem in the sense that a constant number of nondeterministic cell-probes are sufficient to answer the queries. For the problems that remain nontrivial after allowing nondeterminism, no lower bounds are known for the nondeterministic cell-probe complexity. It is thus worth asking whether there exists any data structure problem such that in data structures with feasible sizes (polynomial in the size of data set), the nondeterministic cell-probe complexity is relatively high. More importantly, it calls for a general technique to prove lower bounds on the nondeterministic cell-probe complexity of static data structures. 1.1 Our Contribution In this article, we initiate the study of nondeterministic cell-probe complexity for static data structures. We introduce cell-probe proofs, a proof system in the cell-probe model. This notion of proof corresponds to considering verification instead of computation in the cell-probe model. Unlike the fully adaptive computation in the traditional cell-probe model, the formulation of cell-probe proofs shows a combinatorial simplicity. We introduce a combinatorial structure that fully characterizes which problems have cell-probe proofs with specified parameters. With these novel tools, we show the following lower bounds on nondeterministic cell-probe complexity. —There exists a data structure problem with high nondeterministic cell-probe complexity. This result can be seen as a nondeterministic parallel to the existential lower bound for deterministic cell-probe complexity in Miltersen [1999], which is proved by counting. As mentioned in Miltersen [2008], the nondeterministic lower bound cannot be easily proved by the similar counting argument. —For the exact nearest neighbor search (NNS) problem or the partial match problem in high dimensional Hamming space, for any data structure with Poly(n) cells, each of which contains O(nC) bits where C < 1, the nondeterministic cell-probe complexity is at least log d/ log n , where d is the dimension and n is the number of points in the data set. The highest known deterministic cell-probe lower bound for the problems for polynomial space is (d/ logn) (see Barkol and Rabani [2002] and Patras ˘ ¸cu [2008]). These lower bounds are proved by communication complexity techniques. —For the polynomial evaluation problem of d-degree polynomial over a fi- nite field of size 2k where d is high, for any data structure with s cells, each of which contains b bits, the nondeterministic cell-probe complexity is at least min k b (d − 1), k−log(d−1) log s . This bound is nearly equal to the ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
1:4.Y.Yin (mind+1deterministic lower bound [Miltersen 9nwhich it is assumed that 6 =k 1.2 Related Work To the best of our knowledge,there is no general technique for proving lower bounds for nondeterministic cell-probe complexity of static data structures.Nor do there exist any nontrivial lower bounds for this question.Previous work on static data structures in the cell-probe model have focused on the complexity of adaptive cell-probes.The most important tool for proving such lower bounds is asymmetric communication complexity as introduced by Miltersen et al.[1998]. Fredman and Saks [1989],introduce the chronogram method.This power- ful technique is specialized for proving the query/update tradeoff for dynamic data structures,especially for the problems that are hard only in the dynamic case.It is worth noting that the chronogram method can prove nondetermin- istic lower bounds for certain dynamic data structure problems.This is for- mally addressed by Husfeldt and Rauhe in [1998],and recently by Patrascu and Demaine [2006].However,as pointed in Husfeldt and Rauhe [1998],this is only a byproduct of the nondeterministic nature of chronogram method,and can only yield amortized query/update tradeoffs for dynamic data structure problems with a certain property.Due to the unique structure of the chrono- gram method,this technique cannot be utilized to prove lower bounds for static data structures. 2.CELL-PROBE PROOFS A static data structure problem is represented as a boolean function f:XxY> (0,1).For the purposes of proving lower bounds,it is sufficient to consider only the decision problems.We refer to each y e y as data and each x e X as a query.For each pair ofx and y,f(x,y)specifies the result of the query x to the data structure that represents the data y. In the cell-probe model(c.f.,Fredman and Saks [1989];Yao [1981]),the data instance y is preprocessed and stored in cells,and for each query x,the value of f(x,y)is decided by adaptive probes to the cells.Formally,a cell-probe scheme consists of a table structure and a query algorithm.The table structure T: YxI>(0,1)5 specifies a table T:I>(0,1)5 for each data instance y,which maps indices of cells to their contents.Given a query x,the query algorithm makes a sequence of probes i,i2,...to the cells,where is depends on x,and all previous cell probes (i,T,(i1),(i2,T,(i2),...,(i1,T(i-1)).The value of f(x,y)is decided at last based on the collected information. In this work,we focus on nondeterministic cell-probes.Given a query x to a data instance y,a set oft cells i1,i2,...,i are probed nondeterministically,such that the value of f(x,y)can be uniquely decided on the query input x and the probed information. ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
1: 4 · Y. Yin min d + 1, k−log d log s deterministic lower bound [Miltersen 1995], in which it is assumed that b = k. 1.2 Related Work To the best of our knowledge, there is no general technique for proving lower bounds for nondeterministic cell-probe complexity of static data structures. Nor do there exist any nontrivial lower bounds for this question. Previous work on static data structures in the cell-probe model have focused on the complexity of adaptive cell-probes. The most important tool for proving such lower bounds is asymmetric communication complexity as introduced by Miltersen et al. [1998]. Fredman and Saks [1989], introduce the chronogram method. This powerful technique is specialized for proving the query/update tradeoff for dynamic data structures, especially for the problems that are hard only in the dynamic case. It is worth noting that the chronogram method can prove nondeterministic lower bounds for certain dynamic data structure problems. This is formally addressed by Husfeldt and Rauhe in [1998], and recently by Patras ˘ ¸cu and Demaine [2006]. However, as pointed in Husfeldt and Rauhe [1998], this is only a byproduct of the nondeterministic nature of chronogram method, and can only yield amortized query/update tradeoffs for dynamic data structure problems with a certain property. Due to the unique structure of the chronogram method, this technique cannot be utilized to prove lower bounds for static data structures. 2. CELL-PROBE PROOFS A static data structure problem is represented as a boolean function f : X ×Y → {0, 1}. For the purposes of proving lower bounds, it is sufficient to consider only the decision problems. We refer to each y ∈ Y as data and each x ∈ X as a query. For each pair of x and y, f(x, y) specifies the result of the query x to the data structure that represents the data y. In the cell-probe model (c.f., Fredman and Saks [1989]; Yao [1981]), the data instance y is preprocessed and stored in cells, and for each query x, the value of f(x, y) is decided by adaptive probes to the cells. Formally, a cell-probe scheme consists of a table structure and a query algorithm. The table structure T : Y × I → {0, 1}b specifies a table Ty : I → {0, 1}b for each data instance y, which maps indices of cells to their contents. Given a query x, the query algorithm makes a sequence of probes i1,i2,... to the cells, where ik depends on x, and all previous cell probes i1, Ty(i1) , i2, Ty(i2) ,..., ik−1, Ty(ik−1) . The value of f(x, y) is decided at last based on the collected information. In this work, we focus on nondeterministic cell-probes. Given a query x to a data instance y, a set of t cells i1,i2,...,it are probed nondeterministically, such that the value of f(x, y) can be uniquely decided on the query input x and the probed information. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010
Cell-Probe Proofs 1:5 Formally,a nondeterministic cell-probing algorithm is a function A which maps the query input x and the probed cells to the result value or L.such that for any query x and data y, (1)there exists a set of t cells in,i2,...,it,such that A(x,i,Ty(i》,(i2,T(i2),,,T()0=fx,y月 (2)for any set of t cells i,i2,...,it,it holds that A(x,1,Ty(i》,(i2,T,(i2)》,,(,T()》≠fx,y In order to formally characterize nondeterministic cell-probes for data struc- tures,we introduce a new concept,cell-probe proofs,which formalizes the notion of proof and verification in the cell-probe model.For a specific data structure problem f,a cell-probe proof system(CPP)may be defined for f as described in the following. We can think of a cell-probe proof system as a game played between an honest verifier and an untrusted prover.Both of them have unlimited com- putational power.Given an instance of data,a table of cells is honestly con- structed according to the rules known to both prover and verifier.Both the prover and the verifier know the query,but only the prover can observe the whole table and thus knows the data.The prover tries to convince the veri- fier about the result of the query to the data by revealing certain cells.After observing the revealed cells,the verifier either decides the correct answer,or rejects the proof,but cannot be tricked by the prover into returning a wrong answer. Formally,a cell-probe proof system(CPP)consists of three parts: -A table structure T:Y×I→{0,lb.For any data y,a table Ty:I→{0,lh is a mapping from indices of cells to their contents. -A prover P.For every x and y,Pxy I is a set of cells.We refer to Pxy as a proof and i,Ty(i)lie Pxy as a certificate. -A verifier v,which maps the queries with the certificates to the answers (0,1,L).Given an instance of data y,for any query x,both of the following conditions hold: (Completeness)Py I D(x,[i,T(i))li Pxy)=f(x,y),and (Soundness)P∈I:D(x,{iT,()Ii∈P})e{fx,y),⊥} An(s,b,t)-CPP is a CPP such that for every x and y:(1)the table has s cells, that is,I=s;(2)each cell contains b bits;and (3)each proof consists of t cell probes,that is,IPyl=t. Example.For the membership problem [Yao 1981],where X=[ml and Y ()and f)=1 if and only ify,a naive construction shows there is a 2-cell proof With a sorted table storing y,if x e y,the proof is the cell that contains x;ifx y,the proof consists of two consecutive cells which are the predecessor and successor of x.The same CPP also works for predecessor search Beame and Fich 2002]. ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
Cell-Probe Proofs · 1: 5 Formally, a nondeterministic cell-probing algorithm is a function A which maps the query input x and the probed cells to the result value or ⊥, such that for any query x and data y, (1) there exists a set of t cells i1,i2,...,it, such that A x, i1, Ty(i1) , i2, Ty(i2) ,..., it, Ty(it) = f(x, y); (2) for any set of t cells i1,i2,...,it, it holds that A x, i1, Ty(i1) , i2, Ty(i2) ,..., it, Ty(it) = f(x, y). In order to formally characterize nondeterministic cell-probes for data structures, we introduce a new concept, cell-probe proofs, which formalizes the notion of proof and verification in the cell-probe model. For a specific data structure problem f, a cell-probe proof system (CPP) may be defined for f as described in the following. We can think of a cell-probe proof system as a game played between an honest verifier and an untrusted prover. Both of them have unlimited computational power. Given an instance of data, a table of cells is honestly constructed according to the rules known to both prover and verifier. Both the prover and the verifier know the query, but only the prover can observe the whole table and thus knows the data. The prover tries to convince the veri- fier about the result of the query to the data by revealing certain cells. After observing the revealed cells, the verifier either decides the correct answer, or rejects the proof, but cannot be tricked by the prover into returning a wrong answer. Formally, a cell-probe proof system (CPP) consists of three parts: —A table structure T : Y × I → {0, 1}b . For any data y, a table Ty : I → {0, 1}b is a mapping from indices of cells to their contents. —A prover P. For every x and y, Pxy ⊆ I is a set of cells. We refer to Pxy as a proof and i, Ty(i) | i ∈ Pxy as a certificate. —A verifier v, which maps the queries with the certificates to the answers {0, 1, ⊥}. Given an instance of data y, for any query x, both of the following conditions hold: (Completeness) ∃Pxy ⊆ I : v x, i, Ty(i) | i ∈ Pxy = f(x, y), and (Soundness) ∀P ⊆ I : v x, i, Ty(i) | i ∈ P ∈ f(x, y), ⊥ . An (s, b,t)-CPP is a CPP such that for every x and y: (1) the table has s cells, that is, |I| = s; (2) each cell contains b bits; and (3) each proof consists of t cell probes, that is, |Pxy| = t. Example. For the membership problem [Yao 1981], where X = [m] and Y = [m] n , and f(x, y) = 1 if and only if x ∈ y, a naive construction shows there is a 2-cell proof. With a sorted table storing y, if x ∈ y, the proof is the cell that contains x; if x ∈ y, the proof consists of two consecutive cells which are the predecessor and successor of x. The same CPP also works for predecessor search [Beame and Fich 2002]. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
1:6·Y.Yin The fact that this simple example also works for predecessor has an impor- tant implication.Combining with the super-constant lower bound for predeces- sor search [Patrascu and Thorup 2006],this gives an example where one can separate deterministic and nondeterministic complexity. We remark that cell-probe proofs characterize the nondeterministic probes in the cell-probe model.Specifically,the verifier o is just a nondeterministic cell-probing algorithm.It is natural to see that for a cell-probe scheme,for any query,the cells probed by an adaptive algorithm contain a cell-probe proof. This can be seen as a data structure counterpart of P NP. It is important to note that although a data structure problem is nothing but a boolean function,CPP is very different from the certificate complexity of boolean functions [Buhrman and de Wolf 2002].In CPP,the prover and the verifier communicate with each other via a table structure,which distinguishes CPP from standard certificate complexity.For any data structure problem,the table structure can always store the results for all queries,making one cell- probe sufficient to prove the result,which is generally impossible in the model of certificate complexity. We should also be specific that the model of cell-probe proofs prohibits ran- domization.The nondeterministic model for randomized data structures is a possible future direction. Unlike adaptive cell-probes,CPP has a static nature,which is convenient for reductions.As stated by the following lemma,any CPP can be trivially reduced to 1-cell proofs. LEMMA 2.1 REDUCTION LEMMA.For any data structure problem f,if there exists an (s,b,t)-CPP,then there exists an (st,bt,1)-CPP. PROOF.Just store every t-tuple of cells in the(s,b,t)-CPP as a new cell in the (sf,bt,1)-CPP. ▣ 3.CHARACTERIZATION OF CPPS We now introduce a combinatorial characterization of CPP.Given a set system FC2Y,for any yY,we let F(y)=(FEFIy e F).For convenience,for a partition P of y,we abuse this notation and let P(y)denote the set FeP that y∈F. Definition 3.1.We say a set system FC2Y is an s x k-partition of y if F is a union ofs many partitions of y,where the cardinality of each partition is at most k. This particular notion of partitions of y fully captures the structure of cell- probe proofs.The following theorem provides a full characterization of cell- probe proofs. THEOREM3.2.There is an(s,b,t)-CPP for f:X×Y→{0,1},if and only if there exists an s x 26-partition F of Y,such that for every x e X and every yeY,there exist F1,...,FeF(y)such that f(x,F)=1. PROOF.First,we prove the“only if”part. ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
1: 6 · Y. Yin The fact that this simple example also works for predecessor has an important implication. Combining with the super-constant lower bound for predecessor search [Patras ˘ ¸cu and Thorup 2006], this gives an example where one can separate deterministic and nondeterministic complexity. We remark that cell-probe proofs characterize the nondeterministic probes in the cell-probe model. Specifically, the verifier v is just a nondeterministic cell-probing algorithm. It is natural to see that for a cell-probe scheme, for any query, the cells probed by an adaptive algorithm contain a cell-probe proof. This can be seen as a data structure counterpart of P ⊆ NP. It is important to note that although a data structure problem is nothing but a boolean function, CPP is very different from the certificate complexity of boolean functions [Buhrman and de Wolf 2002]. In CPP, the prover and the verifier communicate with each other via a table structure, which distinguishes CPP from standard certificate complexity. For any data structure problem, the table structure can always store the results for all queries, making one cellprobe sufficient to prove the result, which is generally impossible in the model of certificate complexity. We should also be specific that the model of cell-probe proofs prohibits randomization. The nondeterministic model for randomized data structures is a possible future direction. Unlike adaptive cell-probes, CPP has a static nature, which is convenient for reductions. As stated by the following lemma, any CPP can be trivially reduced to 1-cell proofs. LEMMA 2.1 REDUCTION LEMMA. For any data structure problem f, if there exists an (s, b,t)-CPP, then there exists an (st , b t, 1)-CPP. PROOF. Just store every t-tuple of cells in the (s, b,t)-CPP as a new cell in the (st , b t, 1)-CPP. 3. CHARACTERIZATION OF CPPS We now introduce a combinatorial characterization of CPP. Given a set system F ⊆ 2Y , for any y ∈ Y, we let F(y) = {F ∈ F | y ∈ F}. For convenience, for a partition P of Y, we abuse this notation and let P(y) denote the set F ∈ P that y ∈ F. Definition 3.1. We say a set system F ⊆ 2Y is an s × k-partition of Y if F is a union of s many partitions of Y, where the cardinality of each partition is at most k. This particular notion of partitions of Y fully captures the structure of cellprobe proofs. The following theorem provides a full characterization of cellprobe proofs. THEOREM 3.2. There is an (s, b,t)-CPP for f : X × Y → {0, 1}, if and only if there exists an s × 2b -partition F of Y, such that for every x ∈ X and every y ∈ Y, there exist F1,..., Ft ∈ F(y) such that f(x,∩t i=1Fi) = 1. PROOF. First, we prove the “only if ” part. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010
Cell-Probe Proofs ·1:7 LetT:Y×s→{0,l]6 be the table structure in the(s,b,t)-CPpP. Define the map of the table structure as an s x 25 matrix M such that Mii=ly e Y I T(i)=il,that is,Mii is the set of all such data instances y that the content of the i-th cell of the table is j,assuming that the data instance is y.It is easy to verify that (Miil is an s x 25-partition of Y. We then show that for every xe X and y e Y,there must exist i,i2,...,it such that f(x,M(i,j))=1,where j=T(i).To the contrary,we assume that for some x,there exists such y that for all i1,i2,...,i,there always exists y'eM(i,T())such that f(x,y)f(x,y).According to the definition of M,it implies that for any i,i2,...,it,there exists a y that shares the same certificate [i,T,(i)Ik=1,2,...,t}with y,but f(x,y)f(x,y). Due to the completeness of CPP,the verifier maps one of the certificates i,T(i)k=1,2,...,th of y to f(x,y),but in this case the adversary can choose y'as the real data instance,contradicting the soundness of the CPP. We then deal with the "if"part. Let F be an s x 25-partition of Y such that for every x and every y there is an Fe F(y)that f(x,F)=1.We rewrite F as an s x 25 matrix M such that Mii is indexed as the ith partition set in the ith partition in F,that is, (Mijlij=F and for any i,(Miilj is a partition of y. We can define the table structure T:Y x [s](0,1]5 as follows:T(i)is assigned with the unique jthat ye Mij. The verifier is defined as follows:for any xX and any certificate {(i,Ik=1,2,...,t),if f(x,)is constant onM(is,j),then the verifier re- turns that value,otherwise it returns"L".It is easy to verify that both the com- pleteness and soundness of CPP are satisfied. ▣ For all CPPs,one-cell proofs are the simplest nontrivial proofs.As we ex- plained in the last section,the case of a one-cell proof is also very important because all CPPs can be reduced to it.We apply the above theorem to the one-cell case in the following corollary. COROLLARY3.3.There is an(sb,1)-CPP for f:X×Y→{0,1},if and only if there exists an sx2b-partition F of Y,such that for every x e X and every yY,there is an FeF(y)that f(x,F)=1. Let Yo=(y eY I f(x,y)=0)and Yi=(y EY I f(x,y)=1).An alternative char- acterization is that there is a(s,b,l)-CPP for a problem f:X×Y→{0,l, if and only if there exists an s x 25-partition F of y such that (Yo,Yilxex is contained by the union-closure of F.It can easily be noted that this character- ization is equivalent to the one in Corollary 3.3.With this formulation,we get some intuition about 1-cell proofs;that is,a problem f:Xx Y>(0,1)has simple proofs,if and only if there exists some set system F2 with a simple structure such that the complexity ofF matches the complexity of the problem. 4.AN EXISTENTIAL LOWER BOUND In this section we prove an existential lower bound for cell-probe proofs by showing that uniformly random data structure problem has no efficient CPP ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
Cell-Probe Proofs · 1: 7 Let T : Y × [s] → {0, 1}b be the table structure in the (s, b,t)-CPP. Define the map of the table structure as an s × 2b matrix M such that Mij = {y ∈ Y | Ty(i) = j}, that is, Mij is the set of all such data instances y that the content of the i-th cell of the table is j, assuming that the data instance is y. It is easy to verify that {Mij} is an s × 2b -partition of Y. We then show that for every x ∈ X and y ∈ Y, there must exist i1,i2,...,it such that f x,∩t k=1M (ik, jk) = 1, where jk = Ty(ik). To the contrary, we assume that for some x, there exists such y that for all i1,i2,...,it, there always exists y ∈ ∩t k=1M ik, Ty(ik) such that f(x, y) = f(x, y ). According to the definition of M, it implies that for any i1,i2,...,it, there exists a y that shares the same certificate ik, Ty(ik) | k = 1, 2,...,t with y, but f(x, y ) = f(x, y). Due to the completeness of CPP, the verifier maps one of the certificates ik, Ty(ik) | k = 1, 2,...,t of y to f(x, y), but in this case the adversary can choose y as the real data instance, contradicting the soundness of the CPP. We then deal with the “if ” part. Let F be an s × 2b -partition of Y such that for every x and every y there is an F ∈ F(y) that f(x, F) = 1. We rewrite F as an s × 2b matrix M such that Mij is indexed as the jth partition set in the ith partition in F, that is, {Mij}i, j = F and for any i, {Mij}j is a partition of Y. We can define the table structure T : Y × [s] → {0, 1} b as follows: Ty(i) is assigned with the unique j that y ∈ Mij. The verifier is defined as follows: for any x ∈ X and any certificate { ik, jk | k = 1, 2,...,t}, if f(x,·) is constant on ∩t k=1M(ik, jk), then the verifier returns that value, otherwise it returns “⊥”. It is easy to verify that both the completeness and soundness of CPP are satisfied. For all CPPs, one-cell proofs are the simplest nontrivial proofs. As we explained in the last section, the case of a one-cell proof is also very important because all CPPs can be reduced to it. We apply the above theorem to the one-cell case in the following corollary. COROLLARY 3.3. There is an (s, b, 1)-CPP for f : X × Y → {0, 1}, if and only if there exists an s × 2b -partition F of Y, such that for every x ∈ X and every y ∈ Y, there is an F ∈ F(y) that | f(x, F)| = 1. Let Yx 0 = {y ∈ Y | f(x, y)=0} and Yx 1 = {y ∈ Y | f(x, y)=1}. An alternative characterization is that there is a (s, b, 1)-CPP for a problem f : X × Y → {0, 1}, if and only if there exists an s × 2b -partition F of Y such that {Yx 0,Yx 1}x∈X is contained by the union-closure of F. It can easily be noted that this characterization is equivalent to the one in Corollary 3.3. With this formulation, we get some intuition about 1-cell proofs; that is, a problem f : X × Y → {0, 1} has simple proofs, if and only if there exists some set system F ⊆ 2Y with a simple structure such that the complexity of F matches the complexity of the problem. 4. AN EXISTENTIAL LOWER BOUND In this section we prove an existential lower bound for cell-probe proofs by showing that uniformly random data structure problem has no efficient CPP ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
1:8.Y.Yin with high probability,which means that for most data structure problems, there does not exist efficient nondeterministic data structures.Like the ex- istential lower bound for deterministic data structures in Miltersen [1999],our lower bound is in the form of bit-probe complexity,that is,each cell contains only one bit. We first introduce some notations. -Given a set family F,we denote that AilA1,A2.....A. We call IF)the t-intersection closure of F. Similarly,we define the union closure of a set system F as U'F= UAIF),which is a set of all the unions of the members of F. -For a fixed a set Y of data instances,we define the parameter x(s,t)as the smallest integer that for every s x 2-partition F of Y,x(s,t)I(F). The value x(s,t)depends on s,t,and the size of Y. The following lemma shows a relation between thethe parameter x(s,t)and the existence of hard problems. LEMMA4.l.Letf:X×Y→{0,l)be a data structure problem where X= (0,1ym is the set of queries and Y =(0,1y"is the set of data instances.The following statements hold: (1)With probability at least 1-(x(s,t)2-2)222,there does not exists an (s,1,t)-CPP for uniformly random f:XxY(0,1). (2)If x(st)<),then there exists a problem f such that there does not exist an (s,1,t)-CPP for f. PROOF.We first prove 1.Then 2 follows naturally. Let F be an s×2-partition of y and h:y→{0,l}be a uniformly random 2-coloring of Y.We evaluate the probability that for ally e Y,there exists some A I(F)such that y e A and h is constant over A.Let Q(h)be a predicate defined on h:Y→{0,l}such that Q(h)is true iffVy∈Y,3A∈I(F)y),lh(AI= 1.The probability is Pr[Q(h)]. If Q(h)holds,it must hold that there exist Bo,B1(F)such that (Bo,B1)is a bipartition of Y.In fact,given an h:Y(0,1)such that Vy e Y, Ay I(F)(y),h(Ay)=1,we can construct such Bo and Bi as Bo=Ay,and B1=Ay. y:h(y=0 y:h(y)=1 It is easy to check that (Bo,B1)CUI(F)is a bipartition of Y and h(B)=(b} forb∈{0,1. ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
1: 8 · Y. Yin with high probability, which means that for most data structure problems, there does not exist efficient nondeterministic data structures. Like the existential lower bound for deterministic data structures in Miltersen [1999], our lower bound is in the form of bit-probe complexity, that is, each cell contains only one bit. We first introduce some notations. —Given a set family F, we denote that It(F) = t i=1 Ai | A1, A2,..., At ∈ F . We call It(F) the t-intersection closure of F. —Similarly, we define the union closure of a set system F as ∗ F = A∈G A | G ⊆ F , which is a set of all the unions of the members of F. —For a fixed a set Y of data instances, we define the parameter χ(s,t) as the smallest integer that for every s × 2-partition F of Y, χ(s,t) ≥ 1 2 ∗ It(F) . The value χ(s,t) depends on s, t, and the size of Y. The following lemma shows a relation between the the parameter χ(s,t) and the existence of hard problems. LEMMA 4.1. Let f : X × Y → {0, 1} be a data structure problem where X = {0, 1}m is the set of queries and Y = {0, 1}n is the set of data instances. The following statements hold: (1) With probability at least 1 − χ(s,t)2−2n 2m 2s2n , there does not exists an (s, 1,t)-CPP for uniformly random f : X × Y → {0, 1}. (2) If χ(s,t) < 22n(1−s/2m) , then there exists a problem f such that there does not exist an (s, 1,t)-CPP for f. PROOF. We first prove 1. Then 2 follows naturally. Let F be an s × 2-partition of Y and h : Y → {0, 1} be a uniformly random 2-coloring of Y. We evaluate the probability that for all y ∈ Y, there exists some A ∈ It(F) such that y ∈ A and h is constant over A. Let Q(h) be a predicate defined on h : Y → {0, 1} such that Q(h) is true iff ∀y ∈ Y, ∃A ∈ It(F)(y),|h(A)| = 1. The probability is Prh[Q(h)]. If Q(h) holds, it must hold that there exist B0, B1 ∈ ∗ It(F) such that {B0, B1} is a bipartition of Y. In fact, given an h : Y → {0, 1} such that ∀y ∈ Y, ∃Ay ∈ It(F)(y), h(Ay) = 1, we can construct such B0 and B1 as B0 = y:h(y)=0 Ay, and B1 = y:h(y)=1 Ay. It is easy to check that {B0, B1} ⊂ ∗ It(F) is a bipartition of Y and h(Bb ) = {b} for b ∈ {0, 1}. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
Cel-Probe Proofs·1:9 Therefore,each h with the desirable property Q(h)corresponds to a distinct pair of members in IF).According to the definition of x(s,t),the total number of the h that Q(h)holds is upper bounded by x(s,t).There are totally 22"differenth:Y→{0,ll.Therefore, ≤x(s,t02-2 Let f be a uniformly random function f:X×Y→{0,l,it holds that X PF[xeX,Qfx,川s(RrQh) ≤(a.t02-22. There are at most 2s2 different s x 2-partitions of Y.By union bound,for uni- formly random f:X×Y→{0,l,with probability at most(x(s,t02-2)222 there exists an s x 2-partition F of Y such that for allxe X and yY,there exists A∈I(F)y)such that|f(x,A)川=1. Due to Theorem 3.2,it holds that the probability that there exists an (s,1,t)-CPP for a uniformly random f:(0,1)m x (0,1)"(0,1]is at most (x(s,t)2-2)2252.Statement 1 is proved. If(),then ((s))1.With positive probability, a uniformly random f has no(s,1,t)-CPP.There must exist a data structure problem without(s,1,t)-CPP.Statement 2 holds. The following lemma gives an estimation of the parameter x(s,t). LEMMA 4.2.If ≤2-) then it holds that y(s,t)<22"(1-s12m) PROOF.For any s x 2-partition F of Y,I(F)is a l x k-partition of y, where l≤(食andk≤2,thus|I(F)≤(图2.Note that the cardi- nality of the union closure of a set system is upper bounded by the car- dinality of its power set.Therefore,for any s x 2-partition F of Y,it holds that(F)s 2-1+()2<22"d-/2"),which means that x(s.)< 22™(1-s/2m) ▣ Combining the above two lemmas,we have the following theorem. THEOREM4.3.If(月2≤2m(1-品),there exists a problem f:{0,1m× (0,1)"(0,1)such that there does not exist (s,1,t)-CPP for f. Applying the above theorem to specific s,t,m,and n,we can see that there exists nondeterministically hard problem for some usual settings of these parameters. -s=2m-1 and t<m;so the size of the data structure is one bit less than the naive solution of storing answers to all queries,and the size of the proof is less thanof the size of a raw data instance. ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
Cell-Probe Proofs · 1: 9 Therefore, each h with the desirable property Q(h) corresponds to a distinct pair of members in ∗ It(F). According to the definition of χ(s,t), the total number of the h that Q(h) holds is upper bounded by χ(s,t). There are totally 22n different h : Y → {0, 1}. Therefore, Pr h [Q(h)] ≤ 1 2 ∗ It(F) /22n ≤ χ(s,t)2−2n . Let f be a uniformly random function f : X × Y → {0, 1}, it holds that Pr f ∀x ∈ X, Q( f(x,·)) ≤ Pr h [Q(h)]|X| ≤ χ(s,t)2−2n 2m . There are at most 2s2n different s × 2-partitions of Y. By union bound, for uniformly random f : X × Y → {0, 1}, with probability at most χ(s,t)2−2n 2m 2s2n , there exists an s × 2-partition F of Y such that for all x ∈ X and y ∈ Y, there exists A ∈ It(F)(y) such that f(x, A) = 1. Due to Theorem 3.2, it holds that the probability that there exists an (s, 1,t)-CPP for a uniformly random f : {0, 1}m × {0, 1}n → {0, 1} is at most (χ(s,t)2−2n ) 2m 2s2n . Statement 1 is proved. If χ(s,t) < 22n(1−s/2m) , then χ(s,t)2−2n 2m 2s2n < 1. With positive probability, a uniformly random f has no (s, 1,t)-CPP. There must exist a data structure problem without (s, 1,t)-CPP. Statement 2 holds. The following lemma gives an estimation of the parameter χ(s,t). LEMMA 4.2. If s t 2t ≤ 2n 1 − s 2m , then it holds that χ(s,t) < 22n(1−s/2m) . PROOF. For any s × 2-partition F of Y, It(F) is a l × k-partition of Y, where l ≤ s t and k ≤ 2t , thus It(F) ≤ s t 2t . Note that the cardinality of the union closure of a set system is upper bounded by the cardinality of its power set. Therefore, for any s × 2-partition F of Y, it holds that 1 2 ∗ It(F) ≤ 2−1+( s t)2t < 22n(1−s/2m) , which means that χ(s,t) < 22n(1−s/2m) . Combining the above two lemmas, we have the following theorem. THEOREM 4.3. If s t 2t ≤ 2n 1 − s 2m , there exists a problem f : {0, 1}m × {0, 1}n → {0, 1} such that there does not exist (s, 1,t)-CPP for f. Applying the above theorem to specific s, t, m, and n, we can see that there exists nondeterministically hard problem for some usual settings of these parameters. —s = 2m − 1 and t < n m; so the size of the data structure is one bit less than the na¨ıve solution of storing answers to all queries, and the size of the proof is less than 1 m of the size of a raw data instance. ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.
1:10 ·Y.Yin -The typical case for practically efficient data structures:m=@(logn)no(n), s=Poly(n),andt=o(o) 5.NEAREST NEIGHBOR SEARCH We consider the decision version of nearest neighbor search,4-near neighbor(- NN),in a high dimensional Hamming cube (0,1)4.Here,=(0,1)4,Y=(0.1), and f(x,y)E(0,1)answer whether there exists a point in y within distance from the x.As in Borodin et al.[1999]and Barkol and Rabani [2002],we assume that d=@(logn)nn(1)to make the problem nontrivial. We prove that with the above setting,there does not exist a (s,b,t)-CPP for the -NN problem,if s =Poly(n),b nl-for some positive constant a and t o(log(d/logn)).To prove this,we prove that the lower bound holds for the partial match problem [Indyk et al.2004;Jayram et al.2004; Patrascu 2008,which is an instantiation of the A-NN problem as shown in Borodin et al.[1999]. The partial match problem is defined as follows:The domain is a Hamming cube (0,1]d,where d =o(logn)nno(1,and each data instance y is a set of n points from the domain,that is,Y=().The set of queries is=(0.1, Given a data instance y()and a queryxe(01,),f(x,y)=1if and only if there is a z e y such that z matches x,except for the bits assigned with“*”. THEOREM 5.1.There is no (s,b,1)-CPP for the partial match problem if s0. For the rest of the proof,we assume that y is uniformly selected from Y and that x is generated by uniformly choosing r=1+logn bits and fixing each of them uniformly and independently at random with 0 or 1,and setting the other bits to“*”. We then prove two supporting lemmas.Recall that for a partition P of y, Py)denotes the set F∈P that y∈F. LEMMA 5.2.For any partition P of Y,if Ps 25,then for k logn it holds that [po≤(-] exp (6 In2-n2-*) PROOF.We let P=(F1,F2,...,Fe),whereks 25,and let pi=Fl/Yl.Be- cause P is a partition of Y,we know that pi=1.We define a random variable ACM Transactions on Computation Theory,Vol.2,No.1,Article 1,Pub.date:November 2010
1: 10 · Y. Yin —The typical case for practically efficient data structures: m = ω(log n) ∩ o(n), s = Poly(n), and t = o n log n . 5. NEAREST NEIGHBOR SEARCH We consider the decision version of nearest neighbor search, λ-near neighbor (λ- NN), in a high dimensional Hamming cube {0, 1} d. Here, X = {0, 1} d, Y = {0,1}d n , and f(x, y) ∈ {0, 1} answer whether there exists a point in y within distance λ from the x. As in Borodin et al. [1999] and Barkol and Rabani [2002], we assume that d = ω(log n) ∩ no(1) to make the problem nontrivial. We prove that with the above setting, there does not exist a (s, b,t)-CPP for the λ-NN problem, if s = Poly(n), b 0. For the rest of the proof, we assume that y is uniformly selected from Y and that x is generated by uniformly choosing r = 1 + log n bits and fixing each of them uniformly and independently at random with 0 or 1, and setting the other bits to “∗”. We then prove two supporting lemmas. Recall that for a partition P of Y, P(y) denotes the set F ∈ P that y ∈ F. LEMMA 5.2. For any partition P of Y, if |P| ≤ 2b , then for k < log n it holds that Pr y P(y) ≤ 1 − 2−k 2d n ≤ exp b ln 2 − n2−k . PROOF. We let P = {F1, F2,..., Fk}, where k ≤ 2b , and let pi = |Fi|/|Y|. Because P is a partition of Y, we know that i pi = 1. We define a random variable ACM Transactions on Computation Theory, Vol. 2, No. 1, Article 1, Pub. date: November 2010.