HashMap源码分析

时间 2019-12-11

标签 hashmap 源码分析繁體版

原文原文链接

HashMap源码分析

1.看源码看着看着就看不懂了，怎么办？

2.HashMap猜测

集合：java

ArrayList 数组node

LinkedList 链表数组

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

猜测HashMap底层的数据结构安全

数据结构：数据结构

集合、线性结构、树形结构、图多线程

那么由上面的数据结构猜测HashMap的结构是Key和value组成，多是数组和链表组成app

3.源码验证

那么HashMap存储的单元是函数

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;   // key
    V value;      // value
    Node<K,V> next;  //下一个

数组的表示怎么表示呢？源码分析

String[]、Integer[]this

transient Node<K,V>[] table;

那么数组的大小呢

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

初始化大小为16，为何使用位运算？1.快，2.MUST be a power of two.

若是16的初始值用完了，怎么办？

那么须要扩容，扩容的条件是使用量达到了百分之75.

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

使用一个变量去记录这个数组使用了多少了

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold; //阈值，也就是数组长度乘以0.75

链表的长度也要有限制

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

若是链表长度超过了8就采用红黑树(平衡二叉树)来存储（jdk1.8）

以上是基于已有的知识猜测的，下面就去源码验证

经常使用的put方法

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

里面的Hash函数，是一个Object自带的方法获取到的是一个整形，可是这个hashcode容易重复，会致使不一样的元素存储到同一个数组下标位置，还有一个问题，咱们的数组只有16

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);  // hash值是32位，
}

hash函数就是将高16为和低16为进行异或运算，获得的一个结果。

是为了让node落点分布均匀，减小碰撞的几率，若是碰撞高了，就会致使数组下标的链表很长

a. 0000000000000000 0000000000000000 32位的hashcode

b. 0000000000000111 1110000000000000

c. 0000000000000000 1110000000000000

若是不作低16位和高16位的与运算（hashcode就是一个32位的int型），那么b和c将会落到同一个节点上，也就是说若是全部低16位相同的都会落到相同的节点，这样显然不合适，会致使单个节点的链表变得很长。这里可能会疑惑若是个人数组不在是16而是一个很大的数字，那么高16位不就是能够参与运算了吗？确实是这样的，可是源码中是这样定义了数组的最大值

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

也就是说不可能超过32位，还有个缘由是，咱们使用hashmap时大部分时间也不会存储那么大的数据，通常也就是低几位的数字就搞定了。

如今的问题是hash函数的值太大，咱们的数组并无那么大，怎样让他落到咱们的数组里面呢？

1.使用Hash%16取末，必定不会超过16

2. n表示数组长度，tab表示数组，hash表示Hash值

(p = tab[i = (n - 1) & hash])

0111110111 & 001111 ----》15到0

这里就能看出MUST be a power of two.的意图了，由于若是不是2的n次幂，那么就会致使数据不能均匀分布到数组上。

例如

a. 0111110011 &

b. 0111110111 &

01011 不是2的n次幂,从右到左的第三位为0了，无论a、b的第三位为何，进行&计算后都是0，就会致使不均匀，由于2的n次幂是标准，不能有偏向性。

初始化数组

n = (tab = resize()).length;

这里的resize（）方法作数组初始化

// zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;     //16
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);  //12

使用默认的初始值（16,12）初始化数组

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

开始放入数据，判断放入数据点的位置是否为空

if ((p = tab[i = (n - 1) & hash]) == null)

为空就直接放进去
tab[i] = newNode(hash, key, value, null);

不为空，就有三种状况

1.key存在了，就覆盖，并返回旧值

if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

2.是链表，由代码能够看出，新的数据是添加到链表的末尾的（jdk1.8）

for (int binCount = 0; ; ++binCount) {
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash); //超过了8就变红黑树
        break;
    }
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

3.是红黑树

e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

判断数据是否超过了阈值

if (++size > threshold)  // threshold=12
    resize();     // 扩容

因此由上面能够看出resize()既能够初始化大小也能够扩容。

如何扩容的？

1.扩展空间

if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) { // 超过了阈值
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && // 扩大一倍
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold  // 阈值也扩大一倍
}

2.把旧数据均匀分布在新的空间上

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {// 遍历旧数组的每一个节点
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null; // 原位置设置为null
            if (e.next == null) // 没有子节点
                newTab[e.hash & (newCap - 1)] = e; // 使用hashcode & n-1 计算新的位置
            else if (e instanceof TreeNode)// 红黑树的处理
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            else { // preserve order  // 链表
                Node<K,V> loHead = null, loTail = null;
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                do {
                    next = e.next; // 当前节点
                    if ((e.hash & oldCap) == 0) { // 不须要移动
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    else { // 须要移动的
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}

(e.hash & oldCap) == 0) // 省去了e.hash & n-1

16 10000 第五位确定为1

110010101100111 &

？0000 若是第五位是0，那么上面的第五位必定是0

011111 这是新的32位的数组，作&运算后，第五位必定是0,那么这个元素在16和32大小的数组里面都是落在相同位置。

就像这样一个特殊数字

a. 110010101101111 &

b. 01111 15

c. 10000 16

d. 011111 32

a&b == a&d = 1111 也就说明这个节点是不须要移动的

得出新的节点的Hash的值，第五位是0，这样就能判断出16位的和32位的与运算出来的结果是同样的，就不须要移动数据。

若是不等于0就须要移动

if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;// 将该数据移动到本身如今的位置+就数组的长度的位置处
}

这里补充一句，因为jdk1.7和jdk1.8的实现不少不一样，在jdk7中HashMap在多线程环境下出现的死循环已经被上面介绍的扩容机制规避了。若是是多线程环境仍是应该使用HashTable、ConcurrentHashMap、Synchronized Map这些线程安全的类。