Before going into Hashing techniques, let us understand certain key terms that are used in hashing.
A hash table is an array that stores pointers to data mapping to a given hashed key.
nullif no existing key has hash value equal to the index for the entry.
O(1)under reasonable assumptions.
On the previous slide, we introduced the notion of hashing, mapping a piece of data such as a string to some kind of a representative integer value. We can then create a map by using this hash as an index into an array of key/value pairs. Such a structure is generally called a hash table or, Hashmap in Java, or dictionary in Python, or unordered_map in C++ ( Sorry C users, you will have to implement your own hashmap ). We saw that using the string length to create the hash, and indexing a simple array, could work in some restricted cases, but is no good generally: for example, we have the problem of collisions (several keys with the same length) and wasted space if a few keys are vastly larger than the majority.
Now, we can solve the problem of collisions by having an array of (references to) linked lists rather than simply an array of keys/values. Each little list is generally called a bucket.
Then, we can solve the problem of having an array that is too large simply by taking the hash code modulo a certain array size3. So for example, if the array were 32 positions in size (going from 0-31 ), rather than storing a key/value pair in the list at position 33, we store it at position
(33 mod 32) = 1. (In simple terms, we “wrap round” when we reach the end of the array.) So we end up with a structure something like this:
Each node in the linked lists stores a pairing of a key with a value. Now, to look for the mapping for, say, Ireland, we first compute this key’s hash code (in this case, the string length, 7). Then we start traversing the linked list at position 7 in the table. We traverse each node in the list, comparing the key stored in that node with Ireland. When we find a match, we return the value from the pair stored in that node (Dublin). In our example here, we find it on the second comparison. So although we have to do some comparisons, if the list at a given position in the table is fairly short, we’ll still reduce significantly the amount of work we need to do to find a given key / value mapping.
The structure we have just illustrated is essentially the one used by Java’s hash maps and hash sets. However, we generally wouldn’t want to use the string length as the hash code.
A hash function is a function or algorithm that is used to generate the encrypted or shortened value to any given key. The resultant of hash function is termed as a hash value or simply hash.
Properties of good hash function:
Types of Hash Functions:
Index Mapping Method: The most trivial form of hashing technique is called “Index Mapping (or Trivial Hashing)”. Here:
h(key) = key % table_length=>
h(key) = key MODULO table_length
pof certain number, say
rsuch that if table_length = rp, then
plowest-order bits of key. We need to ensure that hash function we design depends on all the bits of the key unless we are sure that all low-order p-bit patterns are equally likely.
table_lengthis prime and if
ris the number of possible character codes on a system such that
r % table_length = 1, then
h(key) = sum of binary representation of characters in key % table_length.
Mid square method:
h(key) = h(3101) = 162(by taking middle 3 digit from 9616201). Place the record in 162nd index of the hash table.
Digit folding method:
h(key) = h(12345678) = 123 + 456 + 78 = 657
In this method, the hash function implementation is as follows:
keyby a real number
cthat lies in the range
0 < c < 1and fractional part of their product is extracted.
table_size. The floor of the result is considered as the final hash. It is represented as
Here, the function floor(x) returns the integer part of the real number x, and fractional(x) yields the fractional part of that number by performing
h(key) = floor (table_size * (key * (c mod 1))) or h(key) = floor (table_size * fractional (key * c))
fractional(x) = x – floor(x)
The main advantage of this method is that the value of table size (table_size) doesnt matter. Typically, this value is chosen as a power of 2 since this can be easily implemented on most computing systems.
α = number of occupied slots/total slots
α = 500/1000 = 0.5