Hash Table Collision Resolution Techniques

An Open Addressed hash table is a very straightforward structure: it uses a base array but items are not added sequentially, instead a hash function is used to quickly find an element where the item will be stored. (The same hash function is used to retrieve the item later.)

In the likely event that two keys hash to the same bucket, we say that a collision has occurred. There are several ways to resolve the collision.

Recurring Themes

Loop: If a collision occurs, we will need to keep trying until we get a successful hash value. This kind of "try it again" repetition lends itself nicely to a loop. We also use an iterator (the ubiquitous 'i') to count passes through the loop.
Modulo: Note that with all of the above collision-resolution techniques, if you get to the end of the table, you will need to "wrap around" to the beginning again (index 0) using the blessed modulo operator.

Linear Probing

If the destination element is full, just try the next one. (This is the approach used in today's lab #2.) The code for it could look something like this:

i = 0; // counter used for probing
do {
	idx = (hash(key) + i) % size;
	i++;
} while (is_occupied(ary[idx]));
ary[idx] = new_item;

The problem with this approach is that it tends to lead to clustering where most keys are grouped together in one part of the table and the rest of it is empty.

Quadratic Probing

In the event of a collision we make larger probes, jumping by squares (first jump by 1, then 4, then 9, then 16, etc.). The code for it might look like this:

i = 0; // counter used for probing
do {
	idx = (hash(key) + i * i) % size; // note that 'i' is now squared
	i++;
} while (is_occupied(ary[idx]));
ary[idx] = new_item;

This is an improvement over linear probing because keys do not get clustered in a single area, although more groups of smaller clusters tend to be produced.

Double Hashing

If using the first hash function produces a collision, a second hash function kicks in to try to produce a different value.

i = 0;
do {
	idx = (hash1(key) + i * hash2(key)) % size;
	i++;
} while (is_occupied(ary[idx]));
ary[idx] = new_item;

As with quadratic probing, we also multiply the value returned by the second hash function by the iterator ('i') that causes it to initially be discarded (because it's multiplied by zero), and afterward make bigger jumps, thereby distributing the keys more evenly throughout the hash table.