Suppose we want to store contact detail(name, phone number). What type of data structure would you use?

The first obvious choice will array.

If you are using an array, then searching will be of O(n) complexity.

Then you might say, let’s go for the binary search, then its complexity will be log(n).

Yes, searching efficient increased by binary search, but now for binary searching, we need to maintain the data in the sorted fashion, which increases the complexity of insertion and modifications.

So what next?

If you go for balanced BST, it guarantees all operation to be of complexity O(Logn) time.

Is there any further improvement possible?

You may think of the **Direct Access Table**, ie.

use the phone number as an index like

array[984456236]=’Anil’

So complexity will be O(1) for all operations.

But it creates one problem.

Yes space complexity, now we need a large array.

If the phone number size is n, then we need O(m*10^{n}) space, where m is the size of a pointer to record.

What next?

Hashing is a solution to all this, with this you will get O(1) time search time on an average and in the worst case O(n).

Hashing

Hashing converts a given data to a smaller and fixed size data.

These smaller data are used as indexes in the hash table.

Example:

Let say we have hash function h, with hash table size 10

Hashing

H(9812236542) =>118

H(8754623548) =>564

Indexing

118 % 10 => 8

564 % 10 => 4

Normally hashing give unique values, but indexes can generate duplicate values.

So data will be stored at the location in 8 and 4 in the hash table.

What if multiple data generates the same index?

This is called as the collision.

To avoid the collision, we have various technique. Like Chaining Or Linear Probing.

Chaining:

If we have duplicate values, we can make each cell of the hash table to point to a link list; all data are stored in that link list.

Hash Map

It is data structure to store (key, value) pair.