clMSc5706TopicsinTheoreticalcomputerScience Week 3:Streaming and Sketching Instructor:Shengyu Zhang 1
Instructor: Shengyu Zhang 1
Map Motivations and model Problem 1:Missing numbers Problem 2:Count-Min sketch Lower bounds 0( Communication complexity 2
Map ◼ Motivations and model ◼ Problem 1: Missing numbers ◼ Problem 2: Count-Min sketch ◼ Lower bounds ❑ Communication complexity 2
Motivations Big mass of data. Data comes as a stream. Cannot see future data Relatively small space."sketch" Cannot store past data Need to process each item fast. Quick update time. Examples:Phone calls,Internet packets, satellite pictures,.. 3
Motivations ◼ Big mass of data. ◼ Data comes as a stream. ❑ Cannot see future data. ◼ Relatively small space. “sketch” ❑ Cannot store past data ◼ Need to process each item fast. ❑ Quick update time. ◼ Examples: Phone calls, Internet packets, satellite pictures, … 3
Problem 1:Missing numbers A set of numbers S={1,2,...,n} n-1 of them come in a stream x1,x2,...,one number is missing. 3,25,6,19,1,10,… Task:identify which one is missing. ▣Using small space
Problem 1: Missing numbers ◼ A set of numbers 𝑆 = {1,2, … , 𝑛} ◼ 𝑛 − 1 of them come in a stream 𝑥1, 𝑥2, …, 𝑥𝑛−1; one number is missing. ◼ Task: identify which one is missing. ❑ Using small space. 4 3, 25, 6, 19,1,10,…
A simple algorithm Maintain the sum of the input numbers. sum =0 ■fori=1ton-1 sumsum xi return n(n+1 2 .-sum 5
A simple algorithm ◼ Maintain the sum of the input numbers. ◼ 𝑠𝑢𝑚 = 0 ◼ for 𝑖 = 1 to 𝑛 − 1 𝑠𝑢𝑚 = 𝑠𝑢𝑚 + 𝑥𝑖 ◼ return 𝑛 𝑛+1 2 − 𝑠𝑢𝑚 5
Space complexity sum is at mostduring the algorithm. 2 Thus it takes at most log(lg bits to write it down. Space complexity:0(1og2 n). Much smaller than storing the whole stream, which takes at least 0(nlogn). 6
Space complexity ◼ 𝑠𝑢𝑚 is at most 𝑛 𝑛+1 2 during the algorithm. ◼ Thus it takes at most log2 𝑛 𝑛+1 2 = 𝑂 log2 𝑛 bits to write it down. ◼ Space complexity:𝑂 log2 𝑛 . ◼ Much smaller than storing the whole stream, which takes at least 𝑂(𝑛 log 𝑛). 6
More complicated Now the task gets harder. n-2 of them come in a stream x1,x2,...xn-2,two numbers are missing. 3,25,6,19,1,10,… Task:identify which two are missing. ▣Using small space
More complicated ◼ Now the task gets harder. ◼ 𝑛 − 2 of them come in a stream 𝑥1, 𝑥2, …, 𝑥𝑛−2, two numbers are missing. ◼ Task: identify which two are missing. ❑ Using small space. 7 3, 25, 6, 19,1,10,…
First try Maintain the sum and product of the input numbers. sum =0;product =1 ■fori=1ton-2 sumsum xi product=product·xi n(n+1-sum,b=n!/product 2 solve equations x+y=a,xy=b ▣return(x,y) 8
First try ◼ Maintain the sum and product of the input numbers. ◼ 𝑠𝑢𝑚 = 0; 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 = 1 ◼ for 𝑖 = 1 to 𝑛 − 2 𝑠𝑢𝑚 = 𝑠𝑢𝑚 + 𝑥𝑖 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 = 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 ⋅ 𝑥𝑖 ◼ 𝑎 = 𝑛 𝑛+1 2 − 𝑠𝑢𝑚, 𝑏 = 𝑛!/𝑝𝑟𝑜𝑑𝑢𝑐𝑡 ◼ solve equations 𝑥 + 𝑦 = 𝑎, 𝑥 ⋅ 𝑦 = 𝑏 ◼ return (𝑥, 𝑦) 8
Problem and solution Issue:product is at least (n-2)! Thus even writing down the number needs log2(n-2)!=Θ(nlog n)bits. Too much compared to 0(logn)before. How to do?
Problem and solution ◼ Issue: 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 is at least 𝑛 − 2 ! ◼ Thus even writing down the number needs log2 𝑛 − 2 ! = Θ(𝑛 log 𝑛) bits. ❑ Too much compared to 𝑂(log𝑛) before. ◼ How to do? 9
Improvement Note that we don't need to maintain product We can maintain anything,as long as finally we can reconstruct the solution from the stored results. One summary that is much smaller than product:sum of squares. "Recall::12+22+…+n2= n(n+1)(2n+1) 6 10
Improvement ◼ Note that we don’t need to maintain product. ◼ We can maintain anything, as long as finally we can reconstruct the solution from the stored results. ◼ One summary that is much smaller than product: sum of squares. ◼ Recall: 1 2 + 2 2 + ⋯ + 𝑛 2 = 𝑛 𝑛+1 2𝑛+1 6 10