How HashMap works internally

How Hashmap works in Java

HashMap works on the principle of Hashing . To understand Hashing , we should understand the three terms first i.e Hash Function , Hash Value and Bucket .

What is Hash Function , Hash Value and Bucket ?

hashCode() function which returns an integer value is the Hash function. The important point to note that , this method is present in Object class ( Mother of all class ) .

This is the code for the hash function(also known as hashCode method) in Object Class :

public native int hashCode();

The most important point to note from the above line : hashCode method return int value .
So the Hash value is the int value returned by the hash function .

So summarize the terms in the diagram below :

What is bucket ?
A bucket is used to store key value pairs . A bucket can have multiple key-value pairs . In hash map, bucket used simple linked list to store objects .

After understanding the terms we are ready to move next step , How hash map works in java or How get() works internally in java .

Code inside Java Api (HashMap class internal implementation) for HashMap get(Obejct key) method

1.  Public  V get(Object key)
   {
2.     if (key ==null)
3.     //Some code
    
4.     int hash = hash(key.hashCode());
    
5.     // if key found in hash table then  return value
6.     //    else return null
   }

Hash map works on the principle of hashing

HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of Entry object . So thinking that Only value is stored in the bucket is not correct and will not give a good impression on the interviewer .

* Whenever we call get( Key k ) method on the HashMap object . First it checks that whether key is null or not . Note that there can only be one null key in HashMap .

If key is null , then Null keys always map to hash 0, thus index 0.

If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode() ,so after key.hashCode() returns hashValue , line 4 looks like

4. int hash = hash(hashValue)

, and now ,it applies returned hashValue into its own hashing function .

We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is ,It defends against poor quality hash functions.

Now step 4 final hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex) .

Interviewer: What if when two different keys have the same hashcode ?

Solution, equals() method comes to rescue.Here candidate gets puzzled. Since bucket is one and we have two objects with the same hashcode .Candidate usually forgets that bucket is a simple linked list.

The bucket is the linked list effectively . Its not a LinkedList as in a java.util.LinkedList - It's a separate (simpler) implementation just for the map .

So we traverse through linked list , comparing keys in each entries using keys.equals() until it return true. Then the corresponding entry object Value is returned .

Ref: http://javahungry.blogspot.com/2013/08/hashing-how-hash-map-works-in-java-or.html

-----------------------------------------

What is Hashing?
Hashing in its simplest form, is a way to assigning a unique code for any variable/object after applying any formula/algorithm on its properties. A true Hashing function must follow this rule:

Hash function should return the same hash code each and every time, when function is applied on same or equal objects. In other words, two equal objects must produce same hash code consistently.

Note: All objects in java inherit a default implementation of hashCode() function defined in Object class. This function produce hash code by typically converting the internal address of the object into an integer, thus producing different hash codes for all different objects.

HashMap is an array of Entry objects:
Consider HashMap as just an array of objects.

Have a look what this Object is:

view plainprint?
static class Entry<K,V> implements Map.Entry<K,V> {  
        final K key;  
        V value;  
        Entry<K,V> next;  
        final int hash;  
...   
}  

Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.

Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null.
When you create HashMap with default constructor

view plainprint?
HashMap hashMap = new HashMap();  

Array is gets created with size 16 and default 0.75 load balance.

Adding a new key-value pair

Calculate hashcode for the key
Calculate position hash % (arrayLength-1)) where element should be placed(bucket number)
If you try to add a value with a key which has already been saved in HashMap, then value gets overwritten.
Otherwise element is added to the bucket. If bucket has already at least one element - a new one is gets added and placed in the first position in the bucket. Its next field refers to the old element.

Deletion:

Calculate hashcode for the given key
Calculate bucket number (hash % (arrayLength-1))
Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find correct Entry. If desired element is not found - return null

What put() method actually does:
Before going into put() method’s implementation, it is very important to learn that instances of Entry class are stored in an array.HashMap class defines this variable as:

view plainprint?
/** 
     * The table, resized as necessary. Length MUST Always be a power of two. 
     */  
    transient Entry[] table;  

Now look at code implementation of put() method:

view plainprint?
/** 
  * Associates the specified value with the specified key in this map. If the 
  * map previously contained a mapping for the key, the old value is 
  * replaced. 
  * 
  * @param key 
  *            key with which the specified value is to be associated 
  * @param value 
  *            value to be associated with the specified key 
  * @return the previous value associated with <tt>key</tt>, or <tt>null</tt> 
  *         if there was no mapping for <tt>key</tt>. (A <tt>null</tt> return 
  *         can also indicate that the map previously associated 
  *         <tt>null</tt> with <tt>key</tt>.) 
  */  
 public V put(K key, V value) {  
  if (key == null)  
   return putForNullKey(value);  
  int hash = hash(key.hashCode());  
  int i = indexFor(hash, table.length);  
  for (Entry<k , V> e = table[i]; e != null; e = e.next) {  
   Object k;  
   if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {  
    V oldValue = e.value;  
    e.value = value;  
    e.recordAccess(this);  
    return oldValue;  
   }  
  }  
  
  modCount++;  
  addEntry(hash, key, value, i);  
  return null;  
 }  

Lets note down the steps one by one:

Step1- First of all, key object is checked for null. If key is null, value is stored in table[0] position. Because hash code for null is always 0.

Step2- Then on next step, a hash value is calculated using key’s hash code by calling its hashCode() method. This hash value is used to calculate index in array for storing Entry object. JDK designers well assumed that there might be some poorly writtenhashCode() functions that can return very high or low hash code value. To solve this issue, they introduced another hash() function, and passed the object’s hash code to this hash() function to bring hash value in range of array index size.

Step3- Now indexFor(hash, table.length) function is called to calculate exact index position for storing the Entry object.

Step4- Here comes the main part. Now, as we know that two unequal objects can have same hash code value, how two different objects will be stored in same array location [called bucket].

Answer is LinkedList. If you remember, Entry class had an attribute “next”. This attribute always points to next object in chain. This is exactly the behavior of LinkedList.

So, in case of collision, Entry objects are stored in LinkedList form. When an Entry object needs to be stored in particular index, HashMap checks whether there is already an entry?? If there is no entry already present, Entry object is stored in this location.

If there is already an object sitting on calculated index, its next attribute is checked. If it is null, and current Entry object becomes next node in LinkedList. If next variable is not null, procedure is followed until next is evaluated as null.

What if we add the another value object with same key as entered before. Logically, it should replace the old value. How it is done? Well, after determining the index position of Entry object, while iterating over LinkedList on calculated index, HashMap calls equals method on key object for each Entry object. All these Entry objects in LinkedList will have similar hash code but equals() method will test for true equality. If key.equals(k) will be true then both keys are treated as same key object. This will cause the replacing of value object inside Entry object only.

In this way, HashMap ensure the uniqueness of keys.

How get() methods works internally
Now we have got the idea, how key-value pairs are stored in HashMap. Next big question is : what happens when an object is passed in get method of HashMap? How the value object is determined?

Answer we already should know that the way key uniqueness is determined in put() method , same logic is applied in get() method also. The moment HashMap identify exact match for the key object passed as argument, it simply returns the value object stored in current Entry object.

If no match is found, get() method returns null.

Let have a look at code:

view plainprint?
/** 
  * Returns the value to which the specified key is mapped, or {@code null} 
  * if this map contains no mapping for the key. 
  * 
  * <p> 
  * More formally, if this map contains a mapping from a key {@code k} to a 
  * value {@code v} such that {@code (key==null ? k==null : 
  * key.equals(k))}, then this method returns {@code v}; otherwise it returns 
  * {@code null}. (There can be at most one such mapping.) 
  * 
  * </p><p> 
  * A return value of {@code null} does not <i>necessarily</i> indicate that 
  * the map contains no mapping for the key; it's also possible that the map 
  * explicitly maps the key to {@code null}. The {@link #containsKey 
  * containsKey} operation may be used to distinguish these two cases. 
  * 
  * @see #put(Object, Object) 
  */  
 public V get(Object key) {  
  if (key == null)  
   return getForNullKey();  
  int hash = hash(key.hashCode());  
  for (Entry<k , V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {  
   Object k;  
   if (e.hash == hash && ((k = e.key) == key || key.equals(k)))  
    return e.value;  
  }  
  return null;  
 }  

Ref: http://www.dineshonjava.com/2013/06/how-does-java-hashmap-work-internally.html#.VqukNLJ9601

Search This Blog

Techie Madness

How HashMap works internally - Java

Comments

Post a Comment

Popular posts from this blog

Sql server - stored procedures vs functions

EJB - Stateful vs Stateless

Inversion of Control vs Dependency Injection