Tech cheat sheets - Maps

Also sometimes called an associative array, symbol table or dictionary - Maps are collections of key-value pairs and are probably one of the other most common data structures you might come across day-to-day.

HashMap

The most commonly used Map implementation in Java is probably the HashMap. The HashMap makes use of the equals() and hashCode() method on Java's Object API.

The basic premise is that the HashMap has a collection of "buckets", each which can hold several objects. When an object is added to a HashMap the hashCode() method on the key is used to select the bucket to use, the object is then added in that bucket.  For retrieval, its the same process - hashCode() is used to determine the bucket, then the entries in the bucket are inspected and the equals() method is used to determine the match.  In Java's HashMap, the buket is essentially implemented as a linked list (not a LinkedList - but Entry<k,v> has a pointer to the next entry)

The obvious implication of this, is that the performance is dependent largely on the design of a good equals() and hashCode() method.  For example, if you designed a hashCode() method that always returned the constant 1 (which would be legal, as the Java contract is that if two Objects are equal() then they must have the same hashCode(), but if two Objects have the same hashCode() they do not need to be equals()) - then it would mean all entries would be put in a single bucket.


The hashCode() method

Designing a good hash code implementation is very important - for performance (see this stackoverflow discussion on the performance impact of large HashMaps with poor hashCode implementations), but also if your hash code is erroneous then your HashMaps might just not work and you may insert objects in your Map and never be able to retrieve them (if hashCode() doesn't return consistent values for example, it could be placed in a bucket, then when trying to retrieve it generates a different hashCode() so looks in a different bucket).

If you know the complete key set, and it fits in to the Integer range (hashCode() returns an int) then you could design a perfect hashing algorithm that allows every unique key to have its own bucket, so guarantees O(1) time for insert/retrieve. However, in practice this is also quite unlikely, so ideally want to design for as even a spread across buckets as possible.


Performance

Due to the dependency on the implementation of the objects used as keys, and the data set, the worst case vs best case performance is varying.

Search/insert/remove - All these operations suffer the same problem - in best/average case performance these can be done in constant O(1) - However, the worst case (all elements in one bucket) the performance drops to linear O(n)

In practice, HashMaps are usually more efficient than search trees and other look ups, which is why they are very commonly used.

0 comments: