HashMap

Hash function

  • map larger to smaller values
  • should be O(1) or O(len) in case of string
  • should uniformly distribute keys into hash table
  • Technically constant (for example 1) is a hash function, but it is a bad example yielding multiple collisions
  • examples
    • Integer: h(key) = key % m
    • String: h(key) = weighted sum = (s[0] * p^0 + s[1] * p^1 + s[2] * p^2 + ...) % m
      • p is prime number
    • Object: h(key) = weighted sum of elements
  • index calculation for n = hashTable size:
    • hash % n
    • (n - 1) & hash (HashMap Java implementation)

Capacity

  • number of buckets in HashMap
  • default initial capacity is 16 (must be power of 2) in Java
  • So even if the HashMap is empty, its hashTable will have 16 null elements
  • default max capacity is 1 << 30 or 2^30

Load factor

  • measure that decides when to increase the capacity of the Map
  • default load factor is 0.75f (75%) in Java

Threshold

  • approximately the product of current capacity and load factor
  • default is 16 * 0.75f = 12
  • we can say capacity will increase from 16 to 32 when size of HashMap > 12

Rehashing

  • aka Resizing
  • process of re-calculating the hash code of already stored entries.
  • when the number of entries (total nodes/key-value-pairs) in the hash table exceeds the threshold, the Map is rehashed so that it has approximately twice the number of buckets as before.
  • it is done to maintain a low load factor and low complexity for all the operations.

Collision

  • when a hash function returns the same bucket location for two different keys
  • Collision resolution:
    • Open Hashing aka Separate Chaining
    • Closed hashing aka Open addressing - Linear Probing - just find the next available empty slot - Quadratic Probing - Double Hashing
  • Open Chaining is used in HashMap in Java
  • In Java 8+, if there are too many collisions in a bucket, then the linked list converts to a balanced tree. Prior to Java 8 collision was handled using linked list only.

Implementation

  • Each Node is represented as:
// Map.Entry class from JDK 17
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
  • It has amortized O(1) complexity, rehashing has O(N) complexity but it does not happen too frequently
  • put(key, value):
    • Calculate key.hashCode()
    • project it to index: idx = key.hashCode() % N, N = table size
    • if hashTable[idx] == null:
      • create a node and save
    • else:
      • we need to traverse and check if the key already exists
      • But all nodes have same hash, hence we need to use equals() for comparison
      • traverse linked list and check if there is node with key.equals(node.key)
      • if node exists:
        • replace value
      • else:
        • add to end of linked list
  • get(key):
    • Calculate key.hashCode()
    • project it to index: idx = key.hashCode() % N, N = table size
    • if hashTable[idx] == null:
      • return null
    • else:
      • Iterate on hashTable[idx] and check if there is any node with key.equals(node.key)
      • if node exists:
        • return node.value
      • else:
        • return null

Iteration and entrySet() ??

Can we have null as key??

Implement Custom Class as Key in HashMap

  • override both equals() and hashCode()
  • The key must be immutable, else, if the contents inside the key changes then mapping inside HashMap will become inconsistent
@Getter
public class Coordinate {
    private final int x;
    private final int y;
    private int hashCode;
 
    public Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
        this.hashCode = Objects.hash(x, y);
    }
 
    // perform logical equality
    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (o == null || getClass() != o.getClass())
            return false;
        Coordinate that = (Coordinate) o;
        return x == that.x && y == that.y;
    }
 
    // return hash value
    @Override
    public int hashCode() {
        return this.hashCode;
    }
}
 
// Aliter using lombok
@RequiredArgsConstructor
@Getter
@EqualsAndHashCode
public class Coordinate {
    private final int x;
    private final int y;
}
  • Use
Map<Coordinate, Color> pixels = new HashMap<>();
Coordinate coord = new Coordinate(1, 2); // x=1, y=2
pixels.put(coord, Color.CYAN);
// read the color
Color color = pixels.get(new Coordinate(1, 2));
 
// Bad example: modifiable hash value
coord.setX(3); // x=3, y=2
// returns null
Color color = pixels.get(coord); 

Is it mandatary to override equals() and hashCode method?

HashMap vs ConcurrentHashMap

  • ConcurrentHashMap doesn’t allow null for keys and values
  • ConcurrentHashMap is thread safe
  • ConcurrentHashMap is slow