Data Structures and Algorithm
Xiaoqing Zheng
zhengxq@fudan.edu.cn

Dictionary problem
Dictionary T holding n records. Each record x has a key, key[x], and other fields containing satellite data.
Operations on T:
● INSERT(T, x)
● DELETE(T, x)
● SEARCH(T, k)
How should the data structure T be organized?

Assumptions
Assumptions:
● The set of keys is K ⊆ U = {1, 2, …, u − 1}
● Keys are distinct
What can we do?

Direct access table
Create a table T[0 … u − 1]:
T[k] = x if k ∈ K and key[x] = k,
T[k] = NIL otherwise.
Benefit:
– Each operation takes constant time.
Drawbacks:
– The range of keys can be large:
  ● 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys),
  ● character strings (even larger!).

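A direct access table can be sketched in a few lines of Python. This is only an illustration: the class name DirectAccessTable and the use of None for NIL are assumptions, not part of the slides.

```python
class DirectAccessTable:
    """Direct access table over the keys 0 .. u-1 (illustrative sketch)."""

    def __init__(self, u):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * u

    def insert(self, key, record):
        # The key itself is the index, so this takes constant time.
        self.slots[key] = record

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        # Returns the stored record, or None if the key is absent.
        return self.slots[key]


table = DirectAccessTable(u=16)
table.insert(7, "record with key 7")
found = table.search(7)
missing = table.search(3)
table.delete(7)
after_delete = table.search(7)
```

The constructor allocates u slots no matter how few keys are stored, which is the drawback named on the slide: for 64-bit keys the list could not be allocated at all.
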
Hash functions
Solution: Use a hash function h to map the universe U of all keys into {0, 1, …, m − 1}.
[Figure: h maps the keys k₁, …, k₅ of K ⊆ U to slots 0 … m − 1 of table T; here h(k₂) = h(k₅).]
When a record to be inserted maps to an already occupied slot in T, a collision occurs.

Collision resolution by chaining
Records in the same slot are linked into a list.
[Figure: slot i of T points to the list 49 → 86 → 52 /]
h(49) = h(86) = h(52) = i

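Chaining can be sketched as follows. Several choices here are assumptions made for illustration and do not come from the slides: the class name ChainedHashTable, the hash h(k) = k mod m, and the use of Python lists in place of linked lists.

```python
class ChainedHashTable:
    """Hash table with collisions resolved by chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Assumed hash function (division method); the slides do not fix one.
        return key % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # keys stay distinct: overwrite
                return
        chain.append((key, value))

    def search(self, key):
        # Scan only the chain of the slot the key hashes to.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


t = ChainedHashTable(m=8)
for key in (10, 18, 26):          # all three keys hash to slot 2 when m = 8
    t.insert(key, f"record {key}")
chain_keys = [k for k, _ in t.slots[2]]
hit = t.search(18)
removed = t.delete(18)
miss = t.search(18)
```

The keys 10, 18 and 26 were chosen because they collide under the assumed hash with m = 8; they are not the keys shown on the slide.
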
Hash functions
Designing good hash functions is quite nontrivial. For now, we assume they exist. Namely, we assume simple uniform hashing:
– Each key k ∈ K is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.

Analysis of chaining
Let n be the number of keys in the table, and let m be the number of slots.
Define the load factor of T to be α = n/m = average number of keys per slot.
The number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in x's list.

Search cost
Expected time to search for a record with a given key = Θ(1 + α), where
– the 1 accounts for applying the hash function and accessing the slot,
– the α accounts for searching the list.
Expected search time = Θ(1) if α = O(1), or equivalently, if n = O(m).

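The 1 + α cost can be checked numerically for a search that misses. The sketch below rests on stated assumptions: keys are thrown into slots uniformly at random (an idealized stand-in for simple uniform hashing), the probed slot is equally likely to be any slot, and each search is charged 1 step for hashing plus one step per element in the chain. The function name and the values of n and m are made up for the example.

```python
import random


def average_unsuccessful_cost(n, m, seed=0):
    """Average cost of a search that misses, over all m slots."""
    rng = random.Random(seed)
    chain_lengths = [0] * m
    for _ in range(n):
        # Idealized uniform hashing: each key lands in a random slot.
        chain_lengths[rng.randrange(m)] += 1
    # A miss in slot j costs 1 (hash and access the slot) plus the
    # length of the whole chain, which must be scanned to the end.
    costs = [1 + length for length in chain_lengths]
    return sum(costs) / m


n, m = 1000, 250
alpha = n / m                      # load factor
avg_cost = average_unsuccessful_cost(n, m)
```

Because the chain lengths always sum to n, the average comes out to exactly 1 + n/m = 1 + α whatever the random placement is. Individual chains can still be much longer than α.
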
Analysis of successful search
Let xᵢ denote the ith element inserted into the table, for i = 1, 2, …, n, and let kᵢ = key[xᵢ].
For keys kᵢ and kⱼ, we define the indicator variable Xᵢⱼ = I{h(kᵢ) = h(kⱼ)}.
Under the assumption of simple uniform hashing, we have, for i ≠ j:
Pr{h(kᵢ) = h(kⱼ)} = m · (1/m · 1/m) = 1/m
(there are m slots in which the two keys can meet, and each is hit by both keys with probability 1/m · 1/m).
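The value 1/m can be confirmed by enumerating every pair of slots two distinct keys could hash to. This is a small sanity check, assuming the m² pairs are equally likely, as simple uniform hashing implies; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def collision_probability(m):
    """Exact Pr{h(k_i) = h(k_j)} for two distinct keys and m slots."""
    # Each pair (slot of k_i, slot of k_j) is assumed equally likely.
    outcomes = list(product(range(m), repeat=2))
    collisions = sum(1 for a, b in outcomes if a == b)
    return Fraction(collisions, len(outcomes))


p = collision_probability(8)
```

For m = 8 there are 64 pairs, of which 8 are collisions, so p is the fraction 1/8.
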