Source Code Reading: HashMap

Date: 2019-11-24

Tags: source code reading, hashmap


title: Source Code Reading - HashMap
date: 2018-8-16 0:34:42
tags: HashMap, SourceCode, Java, source code reading
categories: Java

Preface

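Resizing is an expensive operation, so when using a HashMap it is best to specify a suitable initial capacity whenever you can.
HashMap is not thread-safe; in concurrent environments, use ConcurrentHashMap instead.
The red-black trees introduced in JDK 8 improve performance when there are many hash collisions.
I honestly ran out of energy for HashMap's red-black tree code, so this article does not analyze the internal red-black tree.
This article reflects only my personal understanding, not an official or authoritative source, so please don't take it on blind faith.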
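The advice to pre-size a HashMap can be made concrete. A resize happens once size exceeds capacity * loadFactor, so holding n entries without growing the table needs an initial capacity of at least n / 0.75. The sketch below assumes the default load factor of 0.75; initialCapacityFor is a hypothetical helper, not a JDK method.

```java
import java.util.HashMap;
import java.util.Map;

class PreSizing {
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // same default as HashMap

    // Hypothetical helper: smallest initial capacity that holds expectedSize
    // entries without growing the table, assuming the default load factor
    static int initialCapacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize < 0: " + expectedSize);
        }
        // A resize happens when size > capacity * loadFactor,
        // so we need capacity >= expectedSize / loadFactor
        return (int) Math.ceil(expectedSize / (double) DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        int expected = 100;
        Map<String, Integer> map = new HashMap<>(initialCapacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i); // the table is allocated once and never grown here
        }
        System.out.println(initialCapacityFor(expected) + " " + map.size()); // 134 100
    }
}
```

HashMap rounds the requested capacity up to a power of two, so a request for 134 becomes a 256-slot table with threshold 192. On JDK 19 and later, HashMap.newHashMap(int numMappings) performs this calculation for you.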

Data structure

HashMap uses an array plus linked lists (or red-black trees) as its overall structure.

Node structure

In the usual case, nodes are built with the following class, which forms a singly linked list:

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        ...
    }

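A feature added in JDK 8: when a bucket's list gets too long, it is converted into a red-black tree. Specifically, treeifyBin is called when a node is appended to a list that already holds TREEIFY_THRESHOLD (8) nodes, so that the list would exceed 8 nodes; treeifyBin then builds a tree only if the table capacity is at least MIN_TREEIFY_CAPACITY (64), and otherwise resizes the table instead. Readers who want background on red-black trees can read this article: 一步一步数据结构-红黑树 (Step-by-Step Data Structures: Red-Black Trees).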

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
    }

Overall structure

Image source: Meituan Tech Blog, "Java 8系列之重新认识HashMap" (Java 8 Series: Rethinking HashMap)

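In essence, a HashMap in the usual case is a one-dimensional array plus singly linked lists. Personally, I see the array as acting more like a table of pointers: when a user calls get, HashMap first uses the hash to find the array slot, follows that slot to the corresponding linked list, and then walks the list to find the node whose key matches exactly.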
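To make the "array + singly linked list" picture concrete, here is a toy map that follows the same lookup path: hash the key, pick the array slot with (n - 1) & hash, then walk the list comparing the cached hash and the key. It is only an illustration under simplifying assumptions (ToyChainedMap is a made-up name): the table has a fixed 16 slots, there is no resizing or treeification, and new nodes go to the head of the list for brevity, whereas JDK 8 appends at the tail.

```java
import java.util.Objects;

// Toy "array + singly linked list" map: fixed 16 slots, no resize, no red-black trees
class ToyChainedMap<K, V> {
    // Same shape as HashMap.Node: cached hash, key, value, next pointer
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // length is a power of two

    // Same spreading function as HashMap.hash(Object)
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;                     // 1. find the array slot
        for (Node<K, V> e = table[i]; e != null; e = e.next) { // 2. walk the list
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;                               // same key: overwrite
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]);    // new key: head insertion (toy only)
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) { // hash first, then equals
                return e.value;
            }
        }
        return null;
    }
}
```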

Data operations

Now that we have a rough idea of what a HashMap looks like, we can walk through its basic operations to understand how it works.

Hash and index calculation

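HashMap uses the hash of the key to locate the corresponding array slot, so we first need to understand how the hash is computed. Because it relies on the key object's hashCode() method, take special care when using custom objects as keys.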
Operators involved:
>>> unsigned right shift operator
^ bitwise XOR operator

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
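The point of h ^ (h >>> 16) is that a small table only looks at the low bits of the hash when computing the index (n - 1) & hash, so XOR-ing the high 16 bits into the low 16 bits lets the high bits influence the bucket. In the sketch below, the two hashCode values are made up for illustration (they differ only in their high bits): without spreading they collide in a 16-slot table, with spreading they do not.

```java
class HashSpread {
    // Same expression as HashMap.hash(Object), applied to an int hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two)
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000; // made-up hashCode values that differ
        int h2 = 0x20000; // only above bit 15
        int n = 16;
        // Without spreading both land in slot 0
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 0 0
        // With spreading the high bits reach the low bits: slots 1 and 2
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 2
        // Small hashCodes such as "0".hashCode() == 48 are unchanged, since h >>> 16 == 0
        System.out.println(spread("0".hashCode()));                                  // 48
    }
}
```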

The index is computed as (n - 1) & hash. Quoting the Meituan Tech Blog (length in the quote is the same as n):

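This method is very clever: it obtains the storage slot of an object with h & (table.length - 1), and the length of HashMap's underlying array is always a power of two, which is a speed optimization in HashMap. When length is always 2^n, h & (length - 1) is equivalent to taking the value modulo length, i.e. h % length, but & is more efficient than %.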
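The equivalence in the quote holds for non-negative h. Java's % keeps the sign of the dividend, so for a negative hash h % n can be negative, while (n - 1) & h is always a valid index in [0, n). A quick check, assuming n is a power of two, compares the mask with Math.floorMod, which computes the mathematical modulo (the class and method names are made up):

```java
class MaskVsMod {
    // True if (n - 1) & h == Math.floorMod(h, n) for every h in [from, to];
    // n must be a power of two, and `to` must stay below Integer.MAX_VALUE
    static boolean maskMatchesFloorMod(int n, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (((n - 1) & h) != Math.floorMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 16;
        int h = -7;
        // & gives a valid index, % keeps the sign of h, floorMod matches &
        System.out.println(((n - 1) & h) + " " + (h % n) + " " + Math.floorMod(h, n)); // 9 -7 9
        // For a non-negative hash, & and % agree
        System.out.println(((n - 1) & 1568) + " " + (1568 % n));                       // 0 0
        System.out.println(maskMatchesFloorMod(n, -100_000, 100_000));                 // true
    }
}
```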

Insert (put)


public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Lazy initialization: if the table has not been allocated yet (or has length 0), allocate it via resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Note: the previous section showed how hash is computed; it becomes an array index via (n - 1) & hash
        if ((p = tab[i = (n - 1) & hash]) == null)
            // The bucket at this index is empty: insert directly
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // Check the head node of the bucket's list first. Note the condition (k = p.key) == key || (key != null && key.equals(k)):
            // since the key can be any object, == alone is not always enough to decide equality
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                // If the bucket has already been converted to a red-black tree, delegate to the tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // The head node is not the target: walk the list
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // No node with the same key: append the new node at the tail
                        p.next = newNode(hash, key, value, null);
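                        // When the list reaches the threshold, convert the bin to a red-black tree
                        // (treeifyBin resizes instead while the table is smaller than MIN_TREEIFY_CAPACITY)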
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                // Same key found: overwrite the value (i.e. an update), unless onlyIfAbsent is true and the old value is non-null
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // If the number of entries now exceeds the threshold, resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict); // Callback hook: a no-op in HashMap, overridden by LinkedHashMap (e.g. to evict the eldest entry)
        return null;
    }
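The treeify condition binCount >= TREEIFY_THRESHOLD - 1 is easy to misread, so the sketch below replays it for a bin that already holds a given number of nodes, none of which has the new key. It is a simplified model, not HashMap code: it skips the key-match exit and the MIN_TREEIFY_CAPACITY check inside treeifyBin, and the class name is made up.

```java
class TreeifyTrigger {
    static final int TREEIFY_THRESHOLD = 8; // same value as in JDK 8 HashMap

    // Simplified model of putVal's list loop: the bin already holds `existing` nodes
    // and none of them has the new key, so the new node is appended at the tail.
    static boolean wouldCallTreeifyBin(int existing) {
        if (existing == 0) {
            return false; // empty bin: tab[i] = newNode(...), the loop never runs
        }
        // p starts at the head with binCount == 0 and advances once per iteration,
        // so p reaches the tail (where p.next == null) when binCount == existing - 1
        int binCountAtTail = existing - 1;
        return binCountAtTail >= TREEIFY_THRESHOLD - 1;
    }

    public static void main(String[] args) {
        for (int existing = 6; existing <= 9; existing++) {
            System.out.println(existing + " existing -> " + (existing + 1)
                    + " nodes after put, treeifyBin called: " + wouldCallTreeifyBin(existing));
        }
    }
}
```

Under these assumptions, treeifyBin is first called when a 9th node is appended to a list that already has 8.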

Delete (remove)

public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }

    final Node<K,V> removeNode(int hash, Object key, Object value, boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        // Guard against a null table (i.e. an empty HashMap) and an empty bucket at the index;
        // (n - 1) & hash computes the index
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            // Look for the target node:
            // first check the head node of the bucket's list
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
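            // This if guards a boundary case (my guess is that it mainly exists for the red-black tree case),
            // since for a singly linked list the boundary could be handled by the loop condition: a do{}while loop is used here, and switching to while would change when the condition is checked
            // All of this is my own speculation, quite debatable, and not an official explanation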
            else if ((e = p.next) != null) {
                if (p instanceof TreeNode)
                    // Tree bin: let the red-black tree's own method locate the node
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            // Node found: delete it. The condition excludes the not-found case and, when matchValue is true, also requires the value to match
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    // Tree bin: handled internally by the tree
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p)
                    // The head node of the list is the target, so node == p
                    tab[index] = node.next;
                else
                    // Ordinary singly linked list unlink
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node); // Callback hook: a no-op in HashMap, overridden by LinkedHashMap to unlink the node from its linked list
                return node;
            }
        }
        return null;
    }
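The matchValue parameter of removeNode is what distinguishes the two public removal methods: in the JDK 8 source, remove(key) passes matchValue = false, while remove(key, value) passes true and therefore only removes the entry when the value matches too. A small demonstration using only the public Map API (the class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

class RemoveMatchValue {
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        // remove(key, value): matchValue = true, stored value 1 != 2, so nothing is removed
        boolean removedWrongValue = map.remove("a", 2);
        boolean stillPresent = map.containsKey("a");
        // remove(key, value): matchValue = true, values match, so the entry is removed
        boolean removedRightValue = map.remove("a", 1);
        // remove(key): matchValue = false, but the key is already gone, so null
        Integer plainRemove = map.remove("a");
        return removedWrongValue + " " + stillPresent + " " + removedRightValue + " " + plainRemove;
    }

    public static void main(String[] args) {
        System.out.println(run()); // false true true null
    }
}
```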

Lookup (get)

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // The usual three guards against null/empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Check the head node of the list
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // The list has more than just the head node
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    // Tree bin: handled internally by the tree
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Walk the list
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
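Since put and get both match a node by comparing the cached hash first and then == / equals, a custom key must keep equals and hashCode consistent and must not change them while it sits in the map. The sketch uses a hypothetical MutableKey class whose equals and hashCode depend on a mutable field; overriding equals without hashCode leads to a similar lookup failure, because equal keys then usually have different hashes.

```java
import java.util.HashMap;
import java.util.Map;

class CustomKeyDemo {
    // Hypothetical key class: equals and hashCode are both based on the mutable field x
    static final class MutableKey {
        int x;

        MutableKey(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).x == x;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    static String run() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "one");

        // A different but equal instance works: same hash, then equals() matches
        String byEqualKey = map.get(new MutableKey(1));

        // Mutating the key changes its hashCode; the node still caches the old hash,
        // so the e.hash == hash check in getNode never matches
        key.x = 2;
        String afterMutation = map.get(key);
        // Looking up with the old value finds the node's hash, but equals() now fails
        String byOldValue = map.get(new MutableKey(1));
        return byEqualKey + " " + afterMutation + " " + byOldValue + " " + map.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // one null null 1
    }
}
```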

Summary

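In JDK 8, every operation that needs to examine a bucket's list first checks the head node, then checks whether any further nodes exist; tree bins are delegated to the red-black tree's own methods, while singly linked lists are traversed with a do{}while loop. Comparing with JDK 1.7 shows how the code changed: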

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

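In JDK 7, with no red-black tree optimization, every bucket is a singly linked list, so traversal is a plain for loop with the non-null check on the next node folded into the loop condition. I'll venture the guess that this change was driven by the red-black trees: since instanceof may carry some performance cost, checking the head node first means the instanceof check is skipped entirely whenever the head is the target node, which may improve performance somewhat (debatable).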

Automatic resizing

Creating a HashMap

// Custom initial capacity and load factor
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    // Only the initial capacity is specified; uses the default load factor 0.75
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

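    // Uses the default load factor; with no initial capacity given, the default capacity of 16 is used (the table itself is allocated on the first put)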
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    // Copies an existing map into a new HashMap, using the default load factor 0.75
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

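Among the constructors above, pay attention to the ones that take an initial capacity: both end up executing this.threshold = tableSizeFor(initialCapacity);. This converts the capacity requested by the user into a power of two; for example, tableSizeFor(11) returns 16 and tableSizeFor(17) returns 32. The result is kept in threshold until the table is first allocated, at which point resize() uses it as the capacity. Its implementation is: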

/** * Returns a power of two size for the given target capacity. */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
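To experiment with tableSizeFor outside the JDK, the method body can be copied as-is into a small class; the constant MAXIMUM_CAPACITY is reproduced from HashMap, and the class name is made up.

```java
class TableSizeFor {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same cap as HashMap

    // Same logic as JDK 8 HashMap.tableSizeFor
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // so that an exact power of two maps to itself
        n |= n >>> 1;     // after these five steps every bit below the highest
        n |= n >>> 2;     // set bit of n is also 1 (1 + 2 + 4 + 8 + 16 = 31 bits)
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 11, 16, 17, 1 << 30, Integer.MAX_VALUE};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```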

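Another author has written an explanation of this bit trick. Building on it: a Java int is 32 bits and its maximum value, 2147483647 (2^31 - 1), consists of 31 one-bits, while the five shifts cover 1 + 2 + 4 + 8 + 16 = 31 bit positions. So for any legal int input, every bit below the highest set bit of n ends up as 1, and n + 1 is the next power of two; subtracting 1 from cap first ensures that a capacity that is already a power of two (such as 16) maps to itself. Note that static final int MAXIMUM_CAPACITY = 1 << 30; caps the result. In practice we usually don't specify an initial capacity or load factor and just use the no-arg constructor (which creates a HashMap with initial capacity 16 and load factor 0.75), so it is easy to reach the threshold (capacity * load factor) and trigger an automatic resize. Since a resize creates a new array and moves the entries from the old array into it, let's first look at how an entry's position is determined.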

Index calculation in HashMap

From the put source above, the index is clearly computed by (n - 1) & hash, where n is the current table length. & is bitwise AND; here is a quick simulation for hashes 48 and 1568:

  48 -> 0000 0000 0000 0000 0000 0000 0011 0000
  15 -> 0000 0000 0000 0000 0000 0000 0000 1111
&
        0000 0000 0000 0000 0000 0000 0000 0000 -> 0

1568 -> 0000 0000 0000 0000 0000 0110 0010 0000
  15 -> 0000 0000 0000 0000 0000 0000 0000 1111
&
        0000 0000 0000 0000 0000 0000 0000 0000 -> 0
Tips: 48 is the hash of the string "0", and 1568 is the hash of the string "11".
Since & only yields 1 for 1 & 1, the value of n - 1 limits which bits take part in the result.
In this example only the last four bits matter, because all higher bits of 15 are 0.
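The hash values and indexes in the simulation above can be reproduced in a few lines. The helpers hash and index below simply restate HashMap's hash(Object) and (n - 1) & hash under made-up names; "1" and "12" are included because the resize example later relies on them sharing a slot as well.

```java
class IndexDemo {
    // Restates HashMap.hash(Object)
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Restates the index formula (n - 1) & hash; n must be a power of two
    static int index(Object key, int n) {
        return (n - 1) & hash(key);
    }

    // 32-character binary string, zero-padded on the left
    static String bits(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (String key : new String[] {"0", "11", "1", "12"}) {
            System.out.println("\"" + key + "\" hash=" + hash(key) + " bits=" + bits(hash(key))
                    + " index(16)=" + index(key, 16));
        }
    }
}
```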

Resizing a HashMap

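Resizing is triggered by ++size > threshold, so I used the following code to trigger a resize in a simple way, while making sure at least one singly linked list holds more than one node.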

HashMap<String, Integer> test = new HashMap<String, Integer>();
        for (int i = 0; i < 13; i++) {
            // With the default capacity 16, "0" and "11" map to the same index, 0
            // "1" and "12" map to the same index, 1
            test.put(String.valueOf(i), i);
        }
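The trigger ++size > threshold can also be simulated without a real HashMap. The sketch below models only the bookkeeping of the no-arg constructor (capacity 16, load factor 0.75, threshold 12 after the first allocation), assumes every put adds a new key and that capacities stay well below MAXIMUM_CAPACITY, and uses a made-up class name.

```java
import java.util.StringJoiner;

class ResizeTrigger {
    // Bookkeeping-only model of a HashMap created with the no-arg constructor.
    // Returns "putNumber->newCapacity" for every resize after the initial allocation.
    static String resizePoints(int puts) {
        StringJoiner out = new StringJoiner(" ");
        int capacity = 16;                  // DEFAULT_INITIAL_CAPACITY
        int threshold = (int) (16 * 0.75f); // 12, as resize() computes for the defaults
        int size = 0;
        for (int put = 1; put <= puts; put++) {
            if (++size > threshold) {       // same check as at the end of putVal
                capacity <<= 1;             // newCap = oldCap << 1
                threshold <<= 1;            // newThr = oldThr << 1
                out.add(put + "->" + capacity);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(resizePoints(13));  // 13->32
        System.out.println(resizePoints(100)); // 13->32 25->64 49->128 97->256
    }
}
```

For the 13 puts in the test code, the model predicts a single resize on the 13th put, i.e. when "12" is inserted.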

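In the figure, the array already has 11 occupied slots. Strictly speaking, size counts entries rather than occupied slots: because "0"/"11" and "1"/"12" share slots, the 12 entries "0" through "11" occupy 11 slots, so size is already 12 when "12" is put. ++size then gives 13 > threshold (12), which triggers a resize. The capacity doubles (newCap = oldCap << 1), and so does the threshold (newThr = oldThr << 1; // double threshold). Because the capacity grows, n - 1 in the index formula (n - 1) & hash has one more 1 bit than before, so node positions must be recomputed. And from tableSizeFor we know that practically every HashMap capacity is a power of two, so the process looks like this (using an initial capacity of 16 as the example):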

Key   hash                 hash in binary                            & 15   & 31
"0"   48                   0000 0000 0000 0000 0000 0000 0011 0000   0      16
"11"  1568                 0000 0000 0000 0000 0000 0110 0010 0000   0      0
--------------------------------------------------------------------------------
n - 1 = 15 (not a hash!)   0000 0000 0000 0000 0000 0000 0000 1111
2n - 1 = 31                0000 0000 0000 0000 0000 0000 0001 1111

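It is easy to see that whether an element has to move depends on whether the newly added bit is 1 or 0, so a single & with n (the old capacity) is enough to tell: 0 means it keeps its index, 1 means it moves by n positions. Here is the source first: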

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
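The move rule used in resize() can be checked mechanically: for a power-of-two oldCap, recomputing the index directly with hash & (2 * oldCap - 1) always agrees with "keep j if (hash & oldCap) == 0, otherwise use j + oldCap". The sketch assumes oldCap < 1 << 30 so that doubling does not overflow; the class and method names are hypothetical.

```java
class SplitRule {
    // JDK 8 style: old index j stays if the new bit is 0, otherwise moves to j + oldCap
    static int newIndexBySplit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    // Direct recomputation with the new mask (newCap - 1)
    static int newIndexDirect(int hash, int oldCap) {
        return hash & ((oldCap << 1) - 1);
    }

    // Checks that both methods agree for every hash in [from, to]
    static boolean agree(int oldCap, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (newIndexBySplit(h, oldCap) != newIndexDirect(h, oldCap)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(newIndexBySplit(48, 16));      // 16: "0" moves from 0 to 0 + 16
        System.out.println(newIndexBySplit(1568, 16));    // 0: "11" stays at 0
        System.out.println(agree(16, -100_000, 100_000)); // true
    }
}
```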

I'll pick out two parts for a closer look:

if (e.next == null)
        // The bucket holds only its head node,
        // so its new index is computed directly
        newTab[e.hash & (newCap - 1)] = e;

if ((e.hash & oldCap) == 0) {
        // Keep the original index j
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e;
        loTail = e;
    }
    else {
        // Move by the old capacity, to index j + oldCap
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e;
        hiTail = e;
    }

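Both snippets show that, whether the bucket's list has a single node or several, the new position is decided by an & operation, which is why the Meituan article on HashMap contains this passage: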

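This design is indeed very clever: it saves the time of recomputing hash values, and because whether the new bit is 0 or 1 can be considered random, resizing spreads previously colliding nodes evenly across the new buckets. This is an optimization newly added in JDK 1.8. Note one difference: in JDK 1.7, when rehashing migrates an old list to the new table, elements that end up at the same index in the new table are reversed, but as the figure above shows, JDK 1.8 does not reverse them. Interested readers can study the JDK 1.8 resize source code, which is very well written.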
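Here is the lo/hi split from resize() pulled out into a standalone sketch and applied to a bare list of hashes; the Node class is a simplified stand-in for HashMap.Node, and the other names are made up. Because both sub-lists are built by appending at the tail, nodes keep their relative order, which matches the quote's point that JDK 1.8 does not reverse lists.

```java
import java.util.ArrayList;
import java.util.List;

class LoHiSplit {
    // Simplified stand-in for HashMap.Node: only the hash and the next pointer
    static final class Node {
        final int hash;
        Node next;

        Node(int hash) {
            this.hash = hash;
        }
    }

    // Splits a non-empty bucket list the way resize() does; returns {loHead, hiHead}
    static Node[] split(Node e, int oldCap) {
        Node loHead = null, loTail = null;
        Node hiHead = null, hiTail = null;
        Node next;
        do {
            next = e.next;
            if ((e.hash & oldCap) == 0) {           // new bit is 0: stays at index j
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                                // new bit is 1: moves to j + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
        } while ((e = next) != null);
        if (loTail != null) loTail.next = null;     // cut links left over from the old list
        if (hiTail != null) hiTail.next = null;
        return new Node[] {loHead, hiHead};
    }

    // Builds a list in the given order
    static Node of(int... hashes) {
        Node head = null, tail = null;
        for (int h : hashes) {
            Node n = new Node(h);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    static List<Integer> hashes(Node n) {
        List<Integer> out = new ArrayList<>();
        for (; n != null; n = n.next) out.add(n.hash);
        return out;
    }

    public static void main(String[] args) {
        Node[] parts = split(of(0, 16, 32, 48, 64), 16);
        System.out.println(hashes(parts[0]) + " " + hashes(parts[1])); // [0, 32, 64] [16, 48]
    }
}
```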

For comparison, here is the JDK 7 code that migrates entries during resizing:

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

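Looking at the source, JDK 7 also avoids recomputing hash values (the stored e.hash is reused unless rehash is true). The reversed order comes from head insertion: transfer pushes each node onto the head of its new bucket (e.next = newTable[i]; newTable[i] = e;), and JDK 7's put inserts at the head as well. So my guess is that JDK 8 does not simply use newTab[e.hash & (newCap - 1)] = e; when a bucket has more than one node because, with tail insertion, every node that shares an index would require walking the new list (linked lists are fast at insertion and deletion but slow at lookup; arrays are fast at lookup but somewhat slower at insertion and deletion). The second snippet builds both lists in a single pass and clearly saves a traversal of the singly linked list.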
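The ordering difference between the two versions can be shown in isolation. The sketch assumes all nodes of an old bucket land in the same new bucket and compares JDK 7-style head insertion (as in transfer) with JDK 8-style tail insertion; Entry is a simplified stand-in, not the JDK class, and the method names are made up.

```java
class InsertionOrder {
    // Simplified stand-in for a bucket entry
    static final class Entry {
        final String key;
        Entry next;

        Entry(String key) {
            this.key = key;
        }
    }

    // JDK 7 transfer style: each node is pushed onto the head of the new bucket
    static Entry headInsertAll(Entry e) {
        Entry newHead = null;  // plays the role of newTable[i]
        while (e != null) {
            Entry next = e.next;
            e.next = newHead;  // e.next = newTable[i];
            newHead = e;       // newTable[i] = e;
            e = next;
        }
        return newHead;
    }

    // JDK 8 resize style: each node is appended at the tail of the new bucket
    static Entry tailInsertAll(Entry e) {
        Entry head = null, tail = null;
        while (e != null) {
            Entry next = e.next;
            e.next = null;
            if (tail == null) head = e; else tail.next = e;
            tail = e;
            e = next;
        }
        return head;
    }

    static Entry of(String... keys) {
        Entry head = null, tail = null;
        for (String k : keys) {
            Entry n = new Entry(k);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    static String keys(Entry e) {
        StringBuilder sb = new StringBuilder();
        for (; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keys(headInsertAll(of("a", "b", "c")))); // c,b,a
        System.out.println(keys(tailInsertAll(of("a", "b", "c")))); // a,b,c
    }
}
```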