Friday, 28 November 2014

Internals of HashMap :-

A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:
  • If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).
  • If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.


HashMap use 2 data structure:
  • Hash
    Use hash value to group elements into slots, control by hash() method of HashMap,
  • linked list (singly)
    Each slot is a singly linked list, their key has the same hash value,
    the slot index is control by indexFor() method of HashMap,
Find value:
First find the slot by hash value, then loop each element in the slot until found or end,
Add value:
First find the slot by hash value,
then try find the value:
* if found, then replace the value,
* if not found, then add a new one to begining of the slot,
capacity
Capacity is slot size, as element count increase, capacity is larger but liner to element count, and finally equals to size (Integer.MAX_VALUE),
linked list length:
As element count increase, length is liner to a small constant value, and finally equals to 1,
speed:
put / get, has O(1) speed, because slot is access via index, and linked list length is very small,
space:
The slot size increase as element count increase,
but it's empty element are null, so not much space is taking,
resize:
When resize capacity, it also need to do rehash, this might take.



The bucket is the linked list effectively . Its not a LinkedList as in a java.util.LinkedList - It's a separate (simpler) implementation just for the map .

So we traverse through linked list , comparing keys in each entries using keys.equals() until it return true.  Then the corresponding entry object Value is returned .



how hashmap works internally in java

1 comment: