Java Collections - The Internal Structure of HashMap

First, let's look at the inheritance hierarchy around the Map interface.

Notes

Map is the top-level interface. The abstract class AbstractMap implements Map, and TreeMap, HashMap, and ConcurrentHashMap all extend AbstractMap, each providing different behavior. ConcurrentHashMap additionally implements the ConcurrentMap interface, which itself extends Map and adds some extra operations (as the name suggests, ones related to concurrency).

Overview

Next we walk through the HashMap source code to understand its internal structure. The main topics are:

  1. The Map interface
  2. Map.Entry
  3. HashMap internal structure
  4. The get operation
  5. The put operation
  6. resize
  7. The hash spreading ("perturbation") function

The Map Interface

First, what exactly is Map? Map is an interface. The API documentation defines it as:

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

In other words: an object holding key-value pairs. A map cannot contain duplicate keys, and each key maps to at most one value.
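
A quick demonstration of the "no duplicate keys" rule (the class and variable names here are just for illustration): putting the same key twice does not add a second entry, it replaces the old value, and put returns that old value.

import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("count", 1);
        // Putting the same key again replaces the value and returns the old one.
        Integer old = map.put("count", 2);
        System.out.println(old);              // 1
        System.out.println(map.get("count")); // 2
        System.out.println(map.size());       // 1 -- still a single mapping
    }
}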

Here are the main methods of the Map interface:

public interface Map<K,V> {
    // Query Operations

    int size();
    boolean isEmpty();
    boolean containsKey(Object key);
    boolean containsValue(Object value);
    V get(Object key);


    // Modification Operations

    V put(K key, V value);
    V remove(Object key);


    // Bulk Operations

    void putAll(Map<? extends K, ? extends V> m);
    void clear();


    // Views

    Set<K> keySet();
    Collection<V> values();
    Set<Map.Entry<K, V>> entrySet();

    interface Entry<K,V> {

        K getKey();

        V getValue();

        V setValue(V value);

        boolean equals(Object o);

        int hashCode();

        ...
    }

    // Comparison and hashing

    boolean equals(Object o);
    int hashCode();


    // Defaultable methods

    ...
}


The comments are self-explanatory: the interface groups its methods into Query Operations, Modification Operations, Bulk Operations, and Views, plus some default methods introduced in Java 8 (not shown here). The view methods expose the map's contents: keySet() returns a Set of all keys, values() returns a Collection of all values, and entrySet() returns a Set of all key-value pairs.
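
A minimal sketch of the three views (the map contents and names are illustrative only); each view is backed by the map, so iterating it reflects the map's current state.

import java.util.HashMap;
import java.util.Map;

public class ViewsDemo {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("alice", 90);
        scores.put("bob", 80);

        // keySet(): a Set of all keys
        for (String k : scores.keySet())
            System.out.println(k);

        // values(): a Collection of all values
        for (Integer v : scores.values())
            System.out.println(v);

        // entrySet(): a Set of all key-value pairs
        for (Map.Entry<String, Integer> e : scores.entrySet())
            System.out.println(e.getKey() + "=" + e.getValue());
    }
}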

Map.Entry

The Map interface declares a nested interface, Entry<K, V>. It is an important one: this is what we usually mean by a "key-value pair".

The methods it provides are simple:

interface Entry<K,V> {

    K getKey();

    V getValue();

    V setValue(V value);

    boolean equals(Object o);

    int hashCode();

    ...
}

Get the key, get the value, set the value, plus the equals and hashCode methods.
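
Entries obtained from entrySet() are live: calling setValue on one writes through to the backing map. A small sketch (the map contents are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class EntryDemo {
    public static void main(String[] args) {
        Map<String, Integer> prices = new HashMap<>();
        prices.put("apple", 3);
        prices.put("pear", 5);

        // setValue on an entry updates the underlying map in place.
        for (Map.Entry<String, Integer> e : prices.entrySet())
            e.setValue(e.getValue() * 2);

        System.out.println(prices); // {apple=6, pear=10} (iteration order is not guaranteed)
    }
}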

HashMap Internal Structure

Definition

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

HashMap extends AbstractMap and implements the Map interface.

Here is the declaration of AbstractMap:

public abstract class AbstractMap<K,V> implements Map<K,V> {

AbstractMap is an abstract class that also implements the Map interface.

This looks odd: if AbstractMap already implements Map, why does HashMap declare implements Map<K, V> again?

After some digging, the common explanation is that the redundant declaration is simply a leftover by the author; HashMap has no real need to implement Map<K, V> again. The same question is raised in the link below.

https://stackoverflow.com/questions/2165204/why-does-linkedhashsete-extend-hashsete-and-implement-sete

 

Now let's look at the main fields defined in HashMap:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    ...

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

    ...

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

    ...

}

The Javadoc comments from the source are kept above; reading them is usually enough to understand what each field does.

DEFAULT_INITIAL_CAPACITY defines the initial capacity: a map created with the no-argument constructor gets a default table size of 1 << 4 (16).

DEFAULT_LOAD_FACTOR is the default load factor, 0.75. It matters a lot: resizing (covered later) is triggered once the number of entries exceeds capacity * load factor (16 * 0.75 = 12 for the default table).

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;

    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

HashMap provides four constructors, which let you set the initial capacity and the load factor. In most cases there is no need to change them; a poorly chosen value can hurt performance.
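
Note that the two-argument constructor does not store initialCapacity directly: it runs it through tableSizeFor, which rounds it up to the next power of two and parks the result in threshold until the table is allocated. The sketch below copies the JDK 8 bit-smearing logic so the rounding can be seen on its own (the class name is just for illustration):

public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same rounding as HashMap.tableSizeFor in JDK 8: smear the highest set bit
    // downwards, then add one to reach the next power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16 -- new HashMap<>(10) gets a table of 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}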

 

MAXIMUM_CAPACITY is the maximum capacity, 1 << 30. Shifting 1 left by 30 bits gives the binary value 0100 0000 0000 0000 0000 0000 0000 0000, i.e. 2^30, which is 1073741824 in decimal. The comment says it MUST be a power of two; shifting one more bit (1 << 31) would hit the sign bit and produce a negative number.
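
You can verify the shift arithmetic directly (just a throwaway sketch):

public class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(1 << 30);                         // 1073741824, i.e. 2^30
        System.out.println(Integer.toBinaryString(1 << 30)); // a 1 followed by 30 zeros
        System.out.println(1 << 31);                         // -2147483648 -- the shift reached the sign bit
    }
}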

TREEIFY_THRESHOLD, UNTREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY control when bins are converted to and from red-black trees, which comes up later.

 

Structure

Next, notice two declarations: static class Node<K,V> implements Map.Entry<K,V> and transient Node<K,V>[] table. These two are the essence of HashMap: it is basically an array of Node buckets, where Node is a concrete implementation of Map.Entry and each bucket can hold a chain of nodes.

class Node<K,V> 

It defines four fields: the hash value, the generic key, the generic value, and a reference to the next Node in the chain.

Node<K,V>[] table

The table, initialized on first use, and resized as necessary. When allocated, length is always a power of two.

The table is initialized on first use and resized when necessary (when the number of entries exceeds the threshold, capacity * load factor). After allocation or resizing, the array length is always a power of two. (Why it must be a power of two, rather than any other value, is explained below.)
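
The reason for the power-of-two requirement is already visible here: when n is a power of two, (n - 1) is a mask of all the low bits, so (n - 1) & hash is a cheap, always-non-negative replacement for a modulo. A small sketch (the sample hash values are arbitrary):

public class IndexMaskDemo {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of two
        int[] hashes = {42, 12345, -7, Integer.MIN_VALUE};
        for (int h : hashes) {
            // (n - 1) & h keeps only the low bits, which equals a positive modulo
            // when n is a power of two -- even for negative hash values.
            System.out.println(((n - 1) & h) + " == " + Math.floorMod(h, n));
        }
    }
}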

Internal structure diagram (image sourced from the web, not reproduced here).

This is roughly what the map looks like. When an object is put into the map, its hash is used to compute the slot in the array where it will be stored (via the spreading function, explained later). If that slot is empty, the entry is placed there directly; if another entry already occupies it, the existing entry's next pointer is linked to the new one, forming a linked list. Once a list grows beyond a certain length, it is converted into a red-black tree (a Java 8 change made to improve lookup performance). So a HashMap is essentially the combination of an array, linked lists, and red-black trees.
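
To see the chaining in action: "Aa" and "BB" are a classic pair of strings with the same hashCode (2112), so they always land in the same bucket, and equals() keeps them apart inside that bucket (the class name is illustrative):

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112 -- same hash, same bucket

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        // Both keys live in one bucket as a chain, but lookups still tell them apart.
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}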

 

Basic Operations

Get

HashMap's get method is as follows:

transient Node<K,V>[] table;

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && 
        (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // always check first node
        if (first.hash == hash && 
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash && 
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

First, the key's hash is computed: its hashCode is XORed with its own high 16 bits. This is the spreading ("perturbation") step, which reduces the chance that different keys land in the same slot; it is covered in detail in another post. The slot to read is then computed as first = tab[(n - 1) & hash]. If that slot is empty, null is returned. Otherwise the first node is checked by comparing the hash and then the key (with == or equals); if it does not match, the rest of the bucket is searched, either by walking the linked list or, if the bucket has been converted to a red-black tree, by a tree lookup. The matching node is returned, or null if nothing matches.
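
A small sketch of the two steps for an arbitrary key (the hash method below is copied from the source above; everything else is illustrative):

public class HashSpreadDemo {
    // Same spreading step as HashMap.hash(): fold the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "hello";
        int n = 16; // with a small table, only the low 4 bits pick the bucket
        int raw = key.hashCode();
        int spread = hash(key);
        System.out.println("raw    : " + Integer.toBinaryString(raw));
        System.out.println("spread : " + Integer.toBinaryString(spread));
        System.out.println("bucket : " + ((n - 1) & spread));
    }
}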

 

Put

HashMap's put method is as follows:

transient Node<K,V>[] table;

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

First, if the table is null or empty, resize() is called (resize is described below), which means the table is only allocated on the first put. The key's slot is then computed from the spread hash. If the slot is empty, a new node is created (newNode) and placed there. If the slot is occupied, the first node is compared against the incoming key. If it does not match and the node is a TreeNode, the entry is inserted through the red-black tree; otherwise the linked list is walked. If the whole list is traversed without finding an equal key, a new node is appended to the end of the list (and the bin may be treeified once the chain reaches TREEIFY_THRESHOLD). If an equal key is found, its value is replaced (unless onlyIfAbsent is set and the old value is non-null) and the old value is returned. For a genuinely new entry, modCount is incremented and, if the new size exceeds the threshold, the table is resized.
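
From the caller's point of view, the return value of putVal is what put hands back: the previous value for the key, or null if there was none. putIfAbsent goes through the same putVal with onlyIfAbsent set to true. A quick sketch:

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("k", 1));         // null -- no previous mapping
        System.out.println(map.put("k", 2));         // 1    -- old value returned, new value stored

        // onlyIfAbsent = true: the existing value is kept and returned unchanged.
        System.out.println(map.putIfAbsent("k", 3)); // 2
        System.out.println(map.get("k"));            // 2
    }
}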

 

Resize

The resize method is as follows:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

The old capacity determines whether the table needs to grow. When it does, the new capacity is double the old one (newCap = oldCap << 1), and the threshold is doubled as well (newThr = oldThr << 1: if the old threshold was 16, shifting left one bit gives 32, and another resize would give 64). Once the new size is known, an empty array of that size is allocated and the old entries are copied over: the old array is scanned, and each entry's new index is recomputed from its cached hash and the new length. A lone node is placed directly; a linked list is split into a "low" and a "high" list based on (e.hash & oldCap), so each entry either stays at its old index or moves to old index + oldCap; a tree bin is split by the corresponding TreeNode.split method.
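
The split test (e.hash & oldCap) simply checks the one extra bit that the doubled mask exposes, which is why an entry either stays at its old index or moves exactly oldCap slots up. A numeric sketch (the hash values are arbitrary, chosen so they all share bucket 5 in a table of 16):

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53}; // all land in bucket 5 while the table length is 16

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // (h & oldCap) tests the newly exposed bit: 0 means "stay", otherwise the
            // entry moves to oldIndex + oldCap -- the same result as h & (newCap - 1).
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(h + ": " + oldIndex + " -> " + newIndex
                    + " (direct: " + (h & (newCap - 1)) + ")");
        }
    }
}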

 

The Spreading ("Perturbation") Function

I cover it in detail in another post.
