Lecture 4. Multiparty Communication Complexity

1. Models

In this class we study communication complexity models in which the input variables are distributed to k > 2 parties. The two most studied ways of distributing the input variables are:

⚫ Number in Hand (NiH): The input is (x_1, …, x_k) where x_i ∈ X_i for some set X_i, and x_i is given to Player i.

⚫ Number on the Forehead (NoF): The input is (x_1, …, x_k) where x_i ∈ X_i for some set X_i, and x_{−i} = (x_1, …, x_{i−1}, x_{i+1}, …, x_k) is given to Player i. That is, Player i sees every input except her own, as if x_i were written on her forehead.

Typical communication models include:

⚫ Blackboard: There is a common blackboard for the players to write on, visible to everyone. The communication complexity is the total number of bits written on the blackboard.

⚫ Message passing: There is a channel between every pair of players, and the communication complexity is the total number of bits transmitted during the whole protocol.

⚫ One-way: Player 1 sends a message to Player 2, who then sends a message to Player 3, and so on. The last player outputs a value.

⚫ SMP: One can restrict the allowed channels even further: each of the k players sends one message to a Referee, who then announces the result.

The NiH model has drawn much attention in recent years because of its connection to data streaming algorithms; NiH with SMP is intimately connected to data sketching algorithms. Earlier studies of multiparty communication complexity focused more on the NoF model with a blackboard because of its connection to circuit complexity. (We will come to this connection later in the class.)

Throughout today's lecture, we assume that there is a blackboard. It is easy to see that some of the results hold for the other communication models as well.
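To make the two input-distribution conventions concrete, here is a minimal Python sketch of what Player i sees in each model. (Illustrative only; the function names are ours.)

```python
def nih_view(x, i):
    """Number in Hand: Player i (1-indexed) sees only her own input x_i."""
    return x[i - 1]

def nof_view(x, i):
    """Number on the Forehead: Player i sees every input except x_i,
    as if x_i were written on her own forehead."""
    return x[:i - 1] + x[i:]

x = ("a", "b", "c")          # an input (x_1, x_2, x_3) for k = 3 players
print(nih_view(x, 2))        # 'b'        -- Player 2's hand
print(nof_view(x, 2))        # ('a', 'c') -- everything but x_2
```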
2. Number in Hand, with a blackboard

2.1 From rectangle to tensor

The counterpart of a rectangle in the NiH case is a tensor. A tensor in X = X_1 × … × X_k is a set S = S_1 × … × S_k where S_i ⊆ X_i. The following is a straightforward generalization of the rectangle property.

Proposition 2.1. Any c-bit deterministic protocol for a function f in the NiH model partitions the set X into at most 2^c monochromatic tensors.

We sometimes consider partial functions, which are functions whose domain is not the entire X but only a subset of it. Equivalently, we can imagine that there is a promise that the input comes from a fixed subset of X; for inputs that fall outside the subset, we don't care. A protocol is correct for a partial function if it outputs the correct value on all inputs coming from the subset; it is allowed to output anything on other inputs. For partial functions, a tensor is monochromatic if it doesn't contain both 1-inputs and 0-inputs. Indeed, as long as a protocol reaches a monochromatic tensor, it can output a value that is right for all inputs on which the function is defined.

2.2 Disjointness

Denote the input by (x_1, …, x_k), where each x_i = x_{i1}…x_{in} is an n-bit string, viewed as a subset of [n]. Recall that Disj_n in the two-party case is defined, in our current notation, by Disj_n(x_1, x_2) = 1 iff x_1 and x_2 are disjoint, i.e., there is no j s.t. x_{1j} = x_{2j} = 1. When generalizing to the k-party case, different requirements can be imposed: we can ask whether there is a j s.t. x_{1j} = … = x_{kj} = 1 (a common element), or only whether the sets x_1, …, x_k fail to be pairwise disjoint. Let's consider the partial function

    Disj_{n,k}(x_1, …, x_k) = 1 if x_1, …, x_k are pairwise disjoint,
                            = 0 if ∃j s.t. x_{1j} = … = x_{kj} = 1.

Theorem 2.2. D(Disj_{n,k}) = Ω(n/k).

Proof. We show the following two properties.

1. There are (k+1)^n 1-inputs. A 1-input is essentially a partition of [n] into k+1 parts, with the i-th part given to Player i and the last part left unassigned.

2. Any monochromatic tensor without a 0-input contains at most k^n 1-inputs. Fix such a tensor S = S_1 × … × S_k. For any element j ∈ [n], there is a player i such that no input in S_i contains j; otherwise we could pick, for each player, an input containing j, and their combination would be a 0-input inside S. Hence in any 1-input of S, each element j has at most k choices: it belongs to one of the remaining k−1 players' sets, or to no set at all. The number of 1-inputs in the tensor is therefore at most k^n.

Then, there should be at least

    (k+1)^n / k^n = (1 + 1/k)^n ≈ e^{n/k}

monochromatic tensors. By Proposition 2.1, any c-bit protocol thus satisfies 2^c ≥ (1 + 1/k)^n, i.e., c = Ω(n/k), and the theorem follows. □
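For small parameters, the counting claim in step 1 can be checked by brute force. The following Python sanity check (our own illustration, not part of the proof) enumerates all k-tuples of subsets of [n] as bitmasks and confirms the count (k+1)^n.

```python
from itertools import product

def count_pairwise_disjoint(n, k):
    """Count k-tuples of pairwise disjoint subsets of [n] by brute force,
    representing each subset as an n-bit mask."""
    count = 0
    for sets in product(range(2 ** n), repeat=k):
        if all(a & b == 0 for idx, a in enumerate(sets)
                          for b in sets[idx + 1:]):
            count += 1
    return count

for n, k in [(1, 2), (2, 2), (2, 3), (3, 3)]:
    assert count_pairwise_disjoint(n, k) == (k + 1) ** n
print("verified: the number of 1-inputs is (k+1)^n")
```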
2.3 Computing frequency moments in a data stream

Suppose that there is a stream of data x = (x_1, …, x_m) ∈ [n]^m. We hope to compute some statistical quantity of the data using a small amount of space, ideally O(log n + log m) bits. The requirement is that we see the data pass by only once: we can process x_1 in whatever way we like, but then it's gone and never comes back; then we see x_2 and process it, and so on. Since being formally introduced in [AMS99], the model has developed rapidly over the last decade or so.

To see how clever techniques can sometimes achieve low space complexity, let's first consider some interesting puzzles.

Example. Suppose m = n−1, and all x_i's are distinct. Then there is exactly one number in [n] that is missing from the data, and we want to find it. What's the lowest space your algorithm can achieve?

Example. What if m = n−2 and we want to find the two missing numbers?
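A standard solution (a sketch below; spoiler for the puzzles) needs only O(log(nm)) bits: a running sum identifies one missing number, and adding a running sum of squares handles two. The helper names are ours.

```python
import math

def one_missing(stream, n):
    """Find the unique missing number in [n]: keep only a running sum,
    an O(log(nm))-bit counter, and subtract it from 1 + 2 + ... + n."""
    return n * (n + 1) // 2 - sum(stream)

def two_missing(stream, n):
    """Find both missing numbers a < b from running sums of x and x^2:
    the two sums reveal a + b and a^2 + b^2, and a quadratic does the rest."""
    s1 = n * (n + 1) // 2 - sum(stream)                               # a + b
    s2 = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in stream)  # a^2 + b^2
    diff = math.isqrt(2 * s2 - s1 * s1)   # (a-b)^2 = 2(a^2+b^2) - (a+b)^2
    return (s1 - diff) // 2, (s1 + diff) // 2

print(one_missing([1, 2, 4, 5], n=5))   # 3
print(two_missing([2, 5, 1], n=5))      # (3, 4)
```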
From the above examples, you can see that streaming algorithms can be crafty, which also raises the issue of proving lower bounds. Communication complexity is a powerful tool for proving space lower bounds for streaming algorithms. Here we illustrate the main idea using one classic example.

In the data x = (x_1, …, x_m) ∈ [n]^m, suppose the number j appears r_j times. Define the d-th frequency moment of x to be

    f_d(x) = r_1^d + ⋯ + r_n^d.

In particular, f_0(x) is the number of distinct elements in x, and f_1(x) = m. For d ≥ 2, f_d(x) gives useful statistical information about the stream.

Theorem 2.3. For d ≥ 3, any deterministic streaming algorithm computing f_d needs Ω(n^{1−2/d}) bits of space.

Proof. The idea is, as always, to design a protocol that simulates an algorithm. Suppose that there is an algorithm A computing f_d using only c bits of space. Let k = (n+1)^{1/d}. We want to solve the Disj_{n,k} problem on input (y_1, …, y_k); note that each y_i is a subset of [n]. Player 1 feeds the elements of y_1 to A as a stream of data items and runs A. When A finishes reading these elements, Player 1 passes her c bits of memory to Player 2, who takes over and continues to run A on the elements of y_2, and so on. Since A always uses c bits of space, the communication is at most (k−1)c bits. Since the items come to A as a stream, a one-way protocol suffices for the simulation. (Since the blackboard model is stronger, we can of course also simulate this with a blackboard protocol.)

Now we claim that computing the d-th moment of the concatenated stream x = y_1…y_k solves the Disj_{n,k} problem. Indeed,

⚫ If y_1, …, y_k are pairwise disjoint, then every element appears at most once, so f_d(x) ≤ n.
⚫ If there is an element common to y_1, …, y_k, then f_d(x) ≥ k^d = n+1.

Therefore,

    c(n+1)^{1/d} = ck ≥ D(Disj_{n,k}) = Ω(n/k) = Ω(n/(n+1)^{1/d}),

which gives c = Ω(n^{1−2/d}). □
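The reduction is easy to phrase in code. Below is a toy Python sketch (our own illustration): the streaming algorithm is modeled as an object whose internal state gets handed from player to player. The exact-counting ExactFd stands in for A and is of course not space-efficient; it only illustrates the interface of the simulation.

```python
from collections import Counter

class ExactFd:
    """Stands in for the c-bit streaming algorithm A. This toy version
    stores exact counts, so it is NOT space-efficient; it only
    illustrates the state-passing interface of the reduction."""
    def __init__(self, d):
        self.d = d
        self.counts = Counter()   # in a real A this is the c-bit memory

    def process(self, item):
        self.counts[item] += 1

    def output(self):
        return sum(r ** self.d for r in self.counts.values())

def disjointness_via_fd(sets, n, d):
    """Players 1..k feed their sets y_i to A in turn, conceptually handing
    A's state to the next player; the last player outputs 1 iff f_d(x) <= n.
    Correct under the promise of Disj_{n,k}, provided k^d >= n+1."""
    A = ExactFd(d)
    for y in sets:            # Player i's turn; one message = A's state
        for item in y:
            A.process(item)
    return 1 if A.output() <= n else 0

# k = 3 players, n = 7, d = 3 (illustrative parameters; 3^3 >= 8 holds)
print(disjointness_via_fd([{1, 2}, {3, 4}, {5}], n=7, d=3))   # 1: disjoint
print(disjointness_via_fd([{1, 2}, {1, 4}, {1}], n=7, d=3))   # 0: 1 is common
```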
3. Number on the Forehead

We are back to total functions, i.e., the domain of f is the whole X = X_1 × … × X_k.

3.1 Cylinder intersection

Let's first see what the analog of monochromatic tensors is in the NoF model. A subset C of X = X_1 × … × X_k is a cylinder in the i-th dimension if C = S_{−i} × X_i for some set S_{−i} ⊆ X_{−i}; namely, membership in C doesn't depend on the i-th coordinate. A subset C of X is a cylinder intersection if C = ∩_{i=1}^k C_i, where each C_i is a cylinder in dimension i. The following theorem is the analog of the monochromatic rectangle/tensor decompositions in the previously studied models.

Theorem 3.1. Any c-bit deterministic protocol for a function f in the NoF model partitions the set X into at most 2^c monochromatic cylinder intersections.

3.2 Discrepancy and binary cube bound

We've seen discrepancy in two-party randomized communication complexity. We can surely extend it to the multiparty setting as well. For the sake of simplicity, let's consider the uniform case, i.e., discrepancy with respect to the uniform distribution over X. For any Boolean function f: X → {+1,−1} and any subset S of X, the discrepancy is

    disc(f, S) = (1/|X|) · |∑_{x∈S} f(x)|.

Let's then define

    disc(f) = max_S disc(f, S),

where the maximum is taken over all cylinder intersections S.

Theorem 3.2. D(f) ≥ log_2 (1/disc(f)).

Proof. For any monochromatic cylinder intersection C, we have disc(f, C) = |C|/|X|. Therefore

    |C| = disc(f, C)·|X| ≤ disc(f)·|X|,

and thus we need at least 1/disc(f) monochromatic cylinder intersections to cover the entire X. Applying Theorem 3.1 gives the claimed bound. □

The discrepancy bound is not easy to use, since one needs to argue about all cylinder intersections. An easier bound is the binary cube bound. For any a_i, b_i ∈ X_i, define a binary cube D = {a_1, b_1} × … × {a_k, b_k}. Note that a_i and b_i may be equal, so D is a multiset (of size 2^k). Define f(D) = ∏_{x∈D} f(x). Choose the a_i's and b_i's uniformly at random from X_i, and define B(f) = E[f(D)]. The following theorem was first given by Chung [Chu90]; see [Raz00] for a simplified proof.

Theorem 3.3. B(f) ≥ disc(f)^{2^k}, and thus D(f) ≥ (1/2^k) · log_2 (1/B(f)).

3.3 A specific function: GIP

The Generalized Inner Product (GIP) function is defined as follows:

    GIP_{n,k}(x) = ⊕_{i=1}^n ∧_{j=1}^k x_{ij}.

Namely, the input x is an n × k Boolean matrix, Player j gets all but the j-th column, and the function is the parity of the number of all-1 rows.

Theorem 3.4. D(GIP_{n,k}) = Ω(n · 4^{−k}).

Proof. Since the binary cube bound works for the {+1,−1} range, let's first change f = GIP_{n,k} to g = (−1)^f; note that this change doesn't affect the communication complexity. We'll show B(g) = (1 − 2^{1−k})^n ≈ e^{−n/2^{k−1}}, which, combined with Theorem 3.3, gives the claimed bound. We need to compute B(g) = E[g(D)], where D = {a_1, b_1} × … × {a_k, b_k}, each column a_j, b_j is drawn uniformly from X_j = {0,1}^n, and a_{ij} denotes the i-th bit of a_j. Then

    g(D) = ∏_{x∈D} g(x)
         = ∏_{x∈D} (−1)^{∑_i x_{i1}⋯x_{ik}}
         = ∏_{i=1}^n ∏_{x∈D} (−1)^{x_{i1}⋯x_{ik}}
         = ∏_{i=1}^n (−1)^{∑_{x∈D} x_{i1}⋯x_{ik}}
         = ∏_{i=1}^n (−1)^{(a_{i1}+b_{i1})⋯(a_{ik}+b_{ik})}
         = ∏_{i=1}^n (−1)^{1[a_{ij}≠b_{ij}, ∀j]},

where we used the indicator function 1[·]: in the exponent only the parity matters, and the product (a_{i1}+b_{i1})⋯(a_{ik}+b_{ik}) is odd iff a_{ij} ≠ b_{ij} for every j. Now we take the expectation over the random choice of D; consider picking the a_j's first, then the b_j's. For each row i, conditioned on the a's, the events a_{ij} ≠ b_{ij} (j = 1, …, k) are independent and each has probability 1/2, so the indicator equals 1 with probability 2^{−k}; moreover, distinct rows are independent. Hence

    E[g(D)] = E[∏_{i=1}^n (−1)^{1[a_{ij}≠b_{ij}, ∀j]}] = ∏_{i=1}^n E[(−1)^{1[a_{ij}≠b_{ij}, ∀j]}] = (1 − 2·2^{−k})^n = (1 − 2^{1−k})^n,

as desired. Plugging into Theorem 3.3, D(GIP_{n,k}) ≥ (1/2^k) · log_2 (1/B(g)) = (1/2^k) · Ω(n/2^{k−1}) = Ω(n · 4^{−k}). □
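For small parameters one can check the formula B(g) = (1 − 2^{1−k})^n empirically by sampling random binary cubes. The following Monte Carlo sketch (our own illustration) does exactly that.

```python
import random
from itertools import product

def g(cols, n, k):
    """The ±1 version of GIP: (-1)^(parity of all-1 rows).
    cols[j][i] is the (i, j) entry of the n-by-k input matrix."""
    rows_all_one = sum(all(cols[j][i] for j in range(k)) for i in range(n))
    return (-1) ** (rows_all_one % 2)

def sample_cube_value(n, k):
    """Pick random columns a_j, b_j in {0,1}^n and return g(D): the product
    of g over the 2^k matrices obtained by choosing a_j or b_j per column."""
    a = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]
    b = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]
    val = 1
    for choice in product((0, 1), repeat=k):
        cols = [a[j] if choice[j] == 0 else b[j] for j in range(k)]
        val *= g(cols, n, k)
    return val

n, k, trials = 4, 3, 100_000
est = sum(sample_cube_value(n, k) for _ in range(trials)) / trials
print(est, "vs", (1 - 2 ** (1 - k)) ** n)   # both close to 0.3164...
```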
The bound deteriorates exponentially with k, but this is inevitable in view of the following fact, proven by Grolmusz [Gro94]. (We'll omit the proof here.)

Fact. D(GIP_{n,k}) = O(kn/2^k).

References

[AMS99] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58(1), 137–147, 1999.

[Chu90] F. R. K. Chung. Quasi-random classes of hypergraphs. Random Structures & Algorithms 1, 363–382, 1990.

[Raz00] R. Raz. The BNS-Chung criterion for multi-party communication complexity. Computational Complexity 9, 113–122, 2000.

[Gro94] V. Grolmusz. The BNS lower bound for multi-party protocols is nearly optimal. Information and Computation 112(1), 51–54, 1994.