General Idea Purpose:Design a kind of tables that can perform insertions,deletions,and finds in constant average time. In this chapter,we discuss the Hash table ADT. The ideal hash table data structure is merely an array of some fixed size,containing the keys. Typically,a key is a string with an associated value (for instance,salary information)
General Idea ◼ Purpose: Design a kind of tables that can perform insertions, deletions, and finds in constant average time. ◼ In this chapter, we discuss the Hash table ADT. ◼ The ideal hash table data structure is merely an array of some fixed size, containing the keys. ◼ Typically, a key is a string with an associated value (for instance, salary information)
General Idea The implementation of hash tab/es is frequently called hashing. We will refer to the table size as Tab/eSize. The common convention is to have the table run from 0 to Tab/eSize-1
General Idea ◼ The implementation of hash tables is frequently called hashing. ◼ We will refer to the table size as TableSize. ◼ The common convention is to have the table run from 0 to TableSize-1
General Idea Each key is mapped into some number in the range 0 to Tab/eSize-1 and placed in the appropriate cell. ■ The mapping is called a hash function,which ideally should be simple to compute and should ensure that any two distinct keys get different cells. ■ Since there are a finite number of cells and a virtually inexhaustible supply of keys,this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells
General Idea ◼ Each key is mapped into some number in the range 0 to TableSize-1 and placed in the appropriate cell. ◼ The mapping is called a hash function, which ideally should be simple to compute and should ensure that any two distinct keys get different cells. ◼ Since there are a finite number of cells and a virtually inexhaustible supply of keys, this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells
General Idea 0 1 John 25000 2 3 Phil 45000 4 5 6 Dave 15000 7Mary35000 ■ The only remaining problems deal with choosing a function,deciding what to do when two keys hash to the same value (this is known as a collision),and deciding on the table size
General Idea ◼ The only remaining problems deal with choosing a function, deciding what to do when two keys hash to the same value (this is known as a collision), and deciding on the table size. 0 1 John 25000 2 3 Phil 45000 4 5 6 Dave 15000 7 Mary 35000
Hash Function If the input keys are integers,when simply returning (Key mod Tab/eSize)is generally a reasonable strategy,unless Key happens to have some undesirable properties. In this case,the choice of hash function needs to be carefully considered. For instance,if the table size is 10 and the keys all end in zero,then the standard hash function is a bad choice
Hash Function ◼ If the input keys are integers, when simply returning (Key mod TableSize) is generally a reasonable strategy, unless Key happens to have some undesirable properties. ◼ In this case, the choice of hash function needs to be carefully considered. ◼ For instance, if the table size is 10 and the keys all end in zero, then the standard hash function is a bad choice
Hash Function To avoid situations like the one above,it is usually a good idea to ensure that the table size is prime. When the input keys are random integers,then this function is not only very simple to compute but also distributes the keys evenly
Hash Function ◼ To avoid situations like the one above, it is usually a good idea to ensure that the table size is prime. ◼ When the input keys are random integers, then this function is not only very simple to compute but also distributes the keys evenly
Hash Function Usually,the keys are strings;in this case,the hash function needs to be chosen carefully. One option is to add up the ASCII values of the characters in the string. int Hash(char *Key,int TableSize) unsigned int HashVal=0; while(*Key!=O') HashVal+=*Key++; return HashVal%TableSize;
Hash Function ◼ Usually, the keys are strings; in this case, the hash function needs to be chosen carefully. ◼ One option is to add up the ASCII values of the characters in the string. int Hash(char *Key, int TableSize) { unsigned int HashVal=0; while (*Key!=‘\0’) HashVal+=*Key++; return HashVal%TableSize; }
Hash Function However,if the table size is large,the function does not distribute the keys well. For instance,suppose that 7ab/eSize=10007 (10007 is a prime number),and all keys are eight or fewer characters long. ■ Since a charhas an integer value that is always at most 127,the hash function can only assume values between 0 and 1016,which is 127*8. This is clearly not an equitable distribution!
Hash Function ◼ However, if the table size is large, the function does not distribute the keys well. ◼ For instance, suppose that TableSize=10007 (10007 is a prime number), and all keys are eight or fewer characters long. ◼ Since a char has an integer value that is always at most 127, the hash function can only assume values between 0 and 1016, which is 127*8. ◼ This is clearly not an equitable distribution!
Hash Function int Hash(char *Key,int TableSize) return (Key[0]+27*Key[1]+729*Key[2])TableSize; This hash function assumes that Key has at least two characters plus the NULL terminator. The value 27 represents the number of letters in the English alphabet,plus the blank,and 729 is 272. This function examines only the first three characters, but if these are random and the table size is 10,007, as before,then we would expect a reasonably equitable distribution
Hash Function ◼ This hash function assumes that Key has at least two characters plus the NULL terminator. ◼ The value 27 represents the number of letters in the English alphabet, plus the blank, and 729 is 272 . ◼ This function examines only the first three characters, but if these are random and the table size is 10,007, as before, then we would expect a reasonably equitable distribution. int Hash(char *Key, int TableSize) { return (Key[0]+27*Key[1]+729*Key[2]) % TableSize; }
Hash Function Unfortunately,English is not random. Although there are 263=17,576 possible combinations of three characters (ignoring blanks),a check of a reasonably large on-line dictionary reveals that the number of different combinations is actually only 2,851. Even if non of these combinations collide,only 28 percent of the table can actually be hashed to
Hash Function ◼ Unfortunately, English is not random. ◼ Although there are 263=17,576 possible combinations of three characters (ignoring blanks), a check of a reasonably large on-line dictionary reveals that the number of different combinations is actually only 2,851. ◼ Even if non of these combinations collide, only 28 percent of the table can actually be hashed to