Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 8 Prof charles e. leiserson
Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 8 Prof. Charles E. Leiserson
A weakness of hashing Problem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket An adversary can pick all keys from kEU: h(k)=i for some slot i IDEA: Choose the hash function at random independently of the keys Even if an adversary can see your code, he or she cannot find a bad set of keys since he or she doesn ' t know exactly which hash function will be chosen o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.2
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.2 A weakness of hashing Problem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket. IDEA: Choose the hash function at random, independently of the keys. • Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn’t know exactly which hash function will be chosen. • An adversary can pick all keys from {k ∈ U : h(k) = i} for some slot i
Universal hashing Definition. let u a universe of keys, and let be a finite collection of hash functions each mapping ( to (0,1,., m-1. We say 升 is universal if for allx,y∈ where x≠ we have{h∈升:h(x)=h(y)}=1H|m That is the chance of a collision th: h(x)=ho)) between x and y is 1/m, if we choose h randomly from H 升 o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.3
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.3 Universal hashing Definition. Let U be a universe of keys, and let H be a finite collection of hash functions, each mapping U to {0, 1, …, m–1}. We say H is universal if for all x, y ∈ U, where x ≠ y, we have |{h ∈ H : h(x) = h(y)}| = |H|/m. That is, the chance of a collision between x and y is 1/m if we choose h randomly from H. {h : h(x) = h(y)} H |H| m
Universality is good Theorem. let h be a hash function chosen uniformly ) at random from a universal set H of hash functions. Suppose h is used to hash n arbitrary keys into the m slots of a table T. Then, for a given key x, we have E#collisions with x <n/m o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8. 4
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.4 Universality is good Theorem. Let h be a hash function chosen (uniformly) at random from a universal set H of hash functions. Suppose h is used to hash n arbitrary keys into the m slots of a table T. Then, for a given key x, we have E[#collisions with x] < n/m
Proof of theorem Proof. Let Cr be the random variable denoting the total number of collisions of keys in Twith x and let 1 if h(x)=h(v) 0 otherwise Note:Ecn]=1 /m and c=∑ Xy y∈7-{x} o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.5
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.5 Proof of theorem Proof. Let Cx be the random variable denoting the total number of collisions of keys in T with x, and let cxy = 1 if h(x) = h(y), 0 otherwise. Note: E[cxy] = 1/m and ∑ ∈ − = y T {x} x xy C c
Proof (continued) ECx]=E∑c Take expectation y∈7-{x} of both sides o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.6
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.6 Proof (continued) = ∑∈ −{ } [ ] y T x x xy E C E c • Take expectation of both sides
Proof (continued) ECx=E|∑cy·. Take expectation y∈7-{x} of both sides E Xy ]· Linearity of y∈7-{x} expectation o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.7
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.7 Proof (continued) ∑ ∑ ∈ − ∈ − = = { } { } [ ] [ ] y T x xy y T x x xy E c E C E c • Linearity of expectation. • Take expectation of both sides
Proof (continued) ECx]=E∑c Take expectation y∈7-{x} of both sides ∑Ecx1]. Linearity of y∈7-{x} expectation ∑1/m E XI y∈7-{x} o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 l8.8
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.8 Proof (continued) ∑ ∑ ∑ ∈ − ∈ − ∈ − = = = { } { } { } 1/ [ ] [ ] y T x y T x xy y T x x xy m E c E C E c • E[cxy] = 1/m. • Linearity of expectation. • Take expectation of both sides
Proof (continued) ECx]=E∑ Take expectation y∈7-{x} of both sides ∑Ecy]· Linearity of ∈T-{x} expectation ∑1/m E XI =1/m y∈7-{x} gebra o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.9
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.9 Proof (continued) m n m E c E C E c y T x y T x xy y T x x xy 1 1/ [ ] [ ] { } { } { } − = = = = ∑ ∑ ∑ ∈ − ∈ − ∈ − • Take expectation of both sides. • Linearity of expectation. • E[cxy] = 1/m. . • Algebra
Constructing a set of universal hash functions Let m be prime. Decompose key k into r+ 1 digits, each with value in the set (0, 1,..., m-1) That is,letk=(ko,k12…,k, Where0≤k<m Randomized strategy: Pick a=(ao, a,.,a, where each a; is chosen randomly from 0, 1,.,m-1) Define ha(k)=2a, k; mod m. DO ot product. modulo m i=0 How big is={hn}?升|=m+ REMEMBER THIS! o 2001 by Charles E Leiserson Introduction to Algorithms Day 12 L8.10
© 2001 by Charles E. Leiserson Introduction to Algorithms Day 12 L8.10 REMEMBER THIS! Constructing a set of universal hash functions Let m be prime. Decompose key k into r + 1 digits, each with value in the set {0, 1, …, m–1}. That is, let k = 〈k0, k1, …, kr〉, where 0 ≤ ki < m. Randomized strategy: Pick a = 〈a0, a1, …, ar〉 where each ai is chosen randomly from {0, 1, …, m–1}. h k a k m r i a ( ) i i mod 0 ∑ = Define = . How big is H = {ha}? |H| = mr + 1. Dot product, modulo m