Java Collections Source Code Analysis (4): HashMap

Date: 2019-11-17


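1. Introduction to HashMap

1.1 HashMap overview

HashMap is a hash-table-based implementation of the Map interface that stores key-value mappings <key, value>. The class makes no guarantee about the order of its mappings. Provided the hash function spreads the elements properly across the buckets, the basic operations (get and put) run in constant time.

The API documentation defines it as follows: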

// 1. In short: HashMap supports every optional Map operation and permits a null key and null values. It is roughly equivalent to Hashtable, except that it is unsynchronized (not thread-safe) and permits nulls. It guarantees nothing about the order of its elements, not even that the order stays the same over time.
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

// If the hash function spreads the elements properly across the buckets, get and put run in constant time. A bucket here is an array slot together with the linked list stored at that slot. Iterating over a collection view takes time proportional to the capacity (number of buckets) plus the size (number of key-value mappings), so do not set the initial capacity too high or the load factor too low when iteration performance matters.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

// Two parameters affect performance: the initial capacity and the load factor. The capacity is the number of buckets, that is, the length of the array; the initial capacity is just the capacity at creation time. The load factor says how full the table may get before it grows. Once the number of entries exceeds load factor * current capacity, the table is rehashed (its internal structures are rebuilt) to roughly twice as many buckets.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

// The default load factor (0.75) is a good trade-off between time and space. Higher values save space but make lookups slower, which shows up in most operations, including get and put. Choose the initial capacity with the expected number of entries and the load factor in mind so that rehashing happens as rarely as possible: if the initial capacity exceeds the maximum number of entries divided by the load factor, no rehash ever occurs.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

// When many mappings will be stored, creating the map with a sufficiently large initial capacity stores them more efficiently than letting it rehash automatically as the table grows.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.
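The sizing advice in the documentation can be turned into a small helper. The sketch below is only an illustration: the class PresizedMapDemo and the method initialCapacityFor are hypothetical names, not part of the JDK. It computes the smallest capacity whose threshold covers the expected number of entries and passes it to the HashMap(int) constructor.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMapDemo {
    /** Smallest capacity c such that c * loadFactor >= expectedEntries (hypothetical helper). */
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 100;
        int capacity = initialCapacityFor(expected, 0.75f);
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("requested capacity = " + capacity + ", size = " + map.size());
    }
}
```

With 100 expected entries and the default load factor, the helper asks for a capacity of 134, so the 100 insertions should not trigger a rehash.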


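1.2 HashMap data structure and storage principle before JDK 1.8

1) Array plus linked lists (separate chaining)

First, what is a "linked-list hash"? It is an array combined with linked lists: each array slot holds the head of a list. This is the layout shown in the HashMap storage diagram.

2) Data structure and storage principle of HashMap

HashMap is built on exactly this structure. How does it use the structure to store and retrieve data? There are two steps.

Step 1: HashMap has an inner class Entry with four fields. Storing a value requires a key and a value, and when they are put into the map they are first wrapped in an object of this Entry class.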

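    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;      // the map key
        V value;          // the value associated with the key
        Entry<K,V> next;  // next entry in the same bucket's linked list
        int hash;         // hash computed from the key's hashCode
        // constructor and methods omitted
    }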

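Physical model of an Entry:

Step 2: Once the Entry object is built, it is placed into the array. How it is placed is the essence of HashMap.

Roughly, the hash stored in the Entry determines which array slot the object goes to. If that slot already holds other elements, the new element is stored in the slot's linked list.

3) How an element is stored

The key and value are wrapped in an Entry, and the Entry's hash is computed from the key. The hash and the array length (length) together determine the array index of the Entry.

Before JDK 1.8 a new Entry is always inserted at the head of the bucket's list.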
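The storage process described above can be sketched as a tiny chained hash map. This is a teaching sketch under stated assumptions, not the JDK code: ChainedMapSketch is a hypothetical class, the table has a fixed length of 16 and never resizes, and the key's hashCode is used without any further mixing.

```java
import java.util.Objects;

final class ChainedMapSketch<K, V> {
    /** One node of a bucket's singly linked list. */
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // fixed length, no resize
    private int size;

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    /** Stores the mapping and returns the previous value, or null if the key was absent. */
    V put(K key, V value) {
        int h = hash(key);
        int i = h & (table.length - 1); // bucket index from hash and array length
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(h, key, value, table[i]); // head insertion, as before JDK 1.8
        size++;
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Entry<K, V> e = table[h & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainedMapSketch<Integer, String> map = new ChainedMapSketch<>();
        map.put(1, "one");
        map.put(17, "seventeen"); // 17 & 15 == 1, so it shares bucket 1 with key 1
        System.out.println(map.get(1) + " " + map.get(17) + " size=" + map.size());
    }
}
```

Keys 1 and 17 collide in bucket 1, and both remain retrievable because the lookup walks the list and compares keys with equals.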

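1.3 HashMap data structure since JDK 1.8

The figure shows the HashMap data structure (array + linked lists + red-black trees). A bucket may hold a linked list or a red-black tree; red-black trees were introduced to speed up lookups in buckets with many collisions.

1.4 HashMap properties

Two parameters of a HashMap instance affect its performance.

Initial capacity: the number of buckets in the hash table when it is created.

Load factor: a measure of how full the hash table may get before its capacity is automatically increased.

When the number of entries exceeds current capacity * load factor (in effect the usable capacity of the HashMap), the table is rehashed and grows to twice the number of buckets.

In Java the default initial capacity is 16 and the default load factor is 0.75.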

  static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
  static final float DEFAULT_LOAD_FACTOR = 0.75f;

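1) loadFactor

Definition: the load factor measures how full the HashMap is allowed to get. Its default value is 0.75f. The current load of a map is size / capacity (number of entries divided by number of buckets), not the number of occupied buckets divided by capacity.

The load factor controls how densely the array is populated. The closer it is to 1, the more entries the array holds before it grows, so the table is denser and the linked lists get longer. The closer it is to 0, the sparser the table, to the point where most slots hold at most one element.

Why not simply set loadFactor to 1 so that more data fits? Because of how a lookup works: the key's hashCode selects an array slot, and if that slot holds many elements, the list must be walked, comparing keys with equals one by one until the value is found. That makes lookups expensive. Ideally every slot holds a single element, so the value is found immediately.

Then why not make loadFactor very small? Because the table becomes too sparse and wastes a lot of space.

That is why HashMap uses 0.75 as the default, and in general there is no need to change it.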

   static final float DEFAULT_LOAD_FACTOR = 0.75f;

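2) Buckets

Looking at the HashMap storage diagram, picture a bucket sitting at every array position. Each bucket holds a linked list, and the list can contain many elements (entries). Putting an element into the map means putting it into one of these buckets.

3) capacity

capacity is the length of the array, which is also the number of buckets in the HashMap. The default is 16.

Each resize doubles the capacity (16, 32, 64, ...), so the capacity is always a power of two.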

   static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

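4) size

size is the number of key-value mappings actually stored in the HashMap instance.

5) threshold

threshold = capacity * loadFactor. It is the yardstick for deciding whether the array has to grow.

The exact condition depends on the JDK version. In JDK 1.7, put grows the array only when size >= threshold and the target array slot is already occupied.

In JDK 1.8, putVal calls resize() as soon as the size after an insertion exceeds the threshold (++size > threshold), regardless of which bucket received the element.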

   int threshold;
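The relationship between capacity, load factor and threshold is simple arithmetic. The sketch below assumes the JDK 1.8 rule (grow when the size exceeds the threshold) and a default-constructed map; ThresholdDemo and its methods are hypothetical names used only for illustration.

```java
final class ThresholdDemo {
    /** threshold = capacity * loadFactor, truncated to an int. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /**
     * Capacity of a default table (16 buckets, load factor 0.75) after `entries`
     * distinct keys have been inserted, doubling whenever size exceeds the threshold.
     */
    static int capacityAfter(int entries) {
        int capacity = 16;
        while (capacity < (1 << 30) && entries > threshold(capacity, 0.75f)) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, 0.75f));
        }
        System.out.println("capacity after 13 entries: " + capacityAfter(13));
    }
}
```

For the default settings this gives thresholds of 12, 24, 48 and 96, so the 13th distinct key is the one that doubles a default table from 16 to 32 buckets.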

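The following HashMap data structure diagram ties these properties together:

2. HashMap source code analysis (1)

2.1 Class hierarchy and inheritance

1) Inheritance

HashMap extends AbstractMap, a skeleton class that reduces the work needed to implement the Map interface.

2) Implemented interfaces

Map<K,V>: AbstractMap already implements this interface, so declaring it again is redundant. Every collection class does the same, and it is harmless.

Cloneable: allows clone() to be used. HashMap makes a shallow copy: the new map gets its own table, but the keys and values themselves are not cloned, so mutating a key or value object through one map is visible through the other.

Serializable: allows the map to be serialized, so a HashMap object can be saved (for example to local storage) and restored later.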
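The shallow-copy behaviour of clone() is easy to check with a map whose values are mutable lists. The class ShallowCloneDemo and its method name are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

final class ShallowCloneDemo {
    /** Clones a map, edits the clone, and returns the original for inspection. */
    static HashMap<String, List<Integer>> originalAfterEditingClone() {
        HashMap<String, List<Integer>> original = new HashMap<>();
        List<Integer> shared = new ArrayList<>();
        shared.add(1);
        original.put("a", shared);

        @SuppressWarnings("unchecked")
        HashMap<String, List<Integer>> copy =
                (HashMap<String, List<Integer>>) original.clone();

        copy.put("b", new ArrayList<>()); // structural change: only the copy sees it
        copy.get("a").add(2);             // mutates the shared list: both maps see it
        return original;
    }

    public static void main(String[] args) {
        HashMap<String, List<Integer>> original = originalAfterEditingClone();
        System.out.println(original.containsKey("b")); // false
        System.out.println(original.get("a"));         // [1, 2]
    }
}
```

Adding a key to the clone leaves the original untouched, while appending to the shared list shows up in both maps.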

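2.2 Fields of the HashMap class

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // Serialization version UID
    private static final long serialVersionUID = 362498820763181265L;
    // Default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // A bucket (bin) is converted to a red-black tree when it holds more nodes than this
    static final int TREEIFY_THRESHOLD = 8;
    // During a resize, a tree bin with at most this many nodes is converted back to a linked list
    static final int UNTREEIFY_THRESHOLD = 6;
    // Minimum table length for converting a bin to a red-black tree
    static final int MIN_TREEIFY_CAPACITY = 64;
    // The bucket array; its length is always a power of two
    transient Node<K,V>[] table;
    // Cached entry-set view of the mappings
    transient Set<Map.Entry<K,V>> entrySet;
    // Number of key-value mappings; not the same as the array length
    transient int size;
    // Count of structural modifications to the map
    transient int modCount;
    // Resize threshold (capacity * load factor); the table grows once size exceeds it
    int threshold;
    // Load factor
    final float loadFactor;
}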
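How the three tree-related constants interact can be summarised as a small decision function. This is a simplified reading of the JDK 1.8 insertion path (putVal together with treeifyBin, the latter not listed in this article), not a copy of it; TreeifyDecisionSketch, Action and afterInsert are hypothetical names.

```java
final class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY_BIN }

    /**
     * Decides what happens to a linked-list bin that holds nodesInBin nodes
     * right after an insertion, in a table of tableLength buckets.
     */
    static Action afterInsert(int nodesInBin, int tableLength) {
        if (nodesInBin <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;      // list is still short enough
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE;   // small table: grow it instead of building a tree
        }
        return Action.TREEIFY_BIN;        // large table: convert the bin to a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 64));  // KEEP_LIST
        System.out.println(afterInsert(9, 16));  // RESIZE_TABLE
        System.out.println(afterInsert(9, 64));  // TREEIFY_BIN
    }
}
```

The point of MIN_TREEIFY_CAPACITY is that a long chain in a small table is more likely a sign of too few buckets than of genuinely bad hash codes, so growing the table is tried first.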


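2.3 HashMap constructors

There are four constructors. None of them allocates the table array: in JDK 1.8 table stays null until the first insertion, when resize() creates it.

A constructor does at most two things: it initializes the load factor and, when an initial capacity is given, it records that capacity (rounded up to a power of two) in threshold. resize() later uses the recorded value as the length of the first array. Note that threshold only records the value at this point.

1) HashMap()

// DEFAULT_INITIAL_CAPACITY = 16, DEFAULT_LOAD_FACTOR = 0.75
// Initial capacity: the initial length of the array; threshold stays 0 here, so the first resize() falls back to the default of 16
// Load factor: how densely the array may be filled
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields keep their default values
    }

2) HashMap(int)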

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

3) HashMap(int, float)

public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity must not be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                            initialCapacity);
    // Cap the initial capacity at MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be positive and must not be NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                            loadFactor);
    // Store the load factor
    this.loadFactor = loadFactor;
    // Record the capacity, rounded up to a power of two, in threshold
    this.threshold = tableSizeFor(initialCapacity);
}
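The constructor relies on tableSizeFor to round the requested capacity up to a power of two. The article does not list that method, so the sketch below reproduces the bit-smearing approach used in JDK 1.8 as an illustration inside a hypothetical class TableSizeForSketch; other JDK versions may implement the rounding differently.

```java
final class TableSizeForSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Returns the smallest power of two that is >= cap, clamped to [1, MAXIMUM_CAPACITY]. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;      // copy the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 10, 16, 17, 1000};
        for (int cap : requested) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

After the shifts, every bit below the highest set bit of cap - 1 is 1, so adding 1 yields the next power of two; for example 10 becomes 16 and 17 becomes 32.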

4) HashMap(Map<? extends K, ? extends V> m)

public HashMap(Map<? extends K, ? extends V> m) {
    // Use the default load factor
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    // Copy every mapping of m into this HashMap
    putMapEntries(m, false);
}

putMapEntries(Map<? extends K, ? extends V> m, boolean evict) stores all mappings of m in this HashMap instance.

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Has the table been allocated yet?
        if (table == null) { // pre-size
            // Not yet: derive the required capacity from s, the number of mappings in m
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                    (int)ft : MAXIMUM_CAPACITY);
            // If t exceeds the current threshold, record the rounded-up capacity in threshold
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // Table exists and m has more mappings than the threshold: resize up front
        else if (s > threshold)
            resize();
        // Insert every mapping of m
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

3. HashMap source code analysis (2)

Here we look at the source of the most commonly used methods.

3.1 The put method

1) put(K key, V value)

   public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

2) putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Table not allocated yet or of length 0: allocate it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash selects the bucket; if it is empty, the new node goes directly into the array slot
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The bucket already holds at least one node
    else {
        Node<K,V> e; K k;
        // The first node of the bucket (the array element) has the same hash and an equal key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
                // Remember the matching node in e
                e = p;
        // The first node does not match and the bucket is a red-black tree
        else if (p instanceof TreeNode)
            // Insert into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // The bucket is a linked list
        else {
            // Walk the list; append at the tail if the key is not found
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list
                if ((e = p.next) == null) {
                    // Append the new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // The list has reached the treeify threshold: hand the bin to treeifyBin
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Leave the loop
                    break;
                }
                // Does this node have the same hash and an equal key?
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // Yes: leave the loop with e pointing at the existing node
                    break;
                // Advance; together with e = p.next above this walks the list
                p = e;
            }
        }
        // A node with the same hash and key already exists in the bucket
        if (e != null) {
            // Remember the old value
            V oldValue = e.value;
            // Replace it unless onlyIfAbsent is set and the old value is non-null
            if (!onlyIfAbsent || oldValue == null)
                // Overwrite the old value with the new one
                e.value = value;
            // Callback after access
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // Structural modification
    ++modCount;
    // Resize once the new size exceeds the threshold
    if (++size > threshold)
        resize();
    // Callback after insertion
    afterNodeInsertion(evict);
    return null;
}

HashMap does not expose putVal to users. It exposes put, and put inserts the element by calling putVal.
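Two small computations in put and putVal deserve a closer look: hash(key), which the listing calls but does not show, and the index expression (n - 1) & hash. The sketch below follows the JDK 1.8 definition of hash(key) as far as I know it; the class BucketIndexSketch and the method names spread and indexFor are hypothetical.

```java
final class BucketIndexSketch {
    /** XORs the high 16 bits of hashCode into the low 16 bits (null keys hash to 0). */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // "a".hashCode() is 97, so the index is 97 & 15 = 1.
        System.out.println(indexFor(spread("a"), n));

        // Hash codes that differ only in their high bits collide without spreading...
        System.out.println(indexFor(0x10000, n) + " " + indexFor(0x20000, n));
        // ...but land in different buckets once the high bits are mixed in.
        System.out.println(indexFor(spread(0x10000), n) + " " + indexFor(spread(0x20000), n));
    }
}
```

Because the table length is a power of two, (n - 1) & hash keeps only the low bits of the hash, which is why the high bits are folded in first.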

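3) putAll()

putAll copies all mappings of another map into this one; it relies on the putMapEntries method shown in section 2.3.

3.2 The get method

1) get(Object key)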

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

2) getNode(int hash, Object key)

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is allocated, non-empty, and the bucket selected by the hash is not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node of the bucket (the array element) matches
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The bucket holds more than one node
        if ((e = first.next) != null) {
            // The bucket is a red-black tree
            if (first instanceof TreeNode)
                // Search the tree
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise search the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

HashMap does not expose getNode to users. It exposes get, and get retrieves the element by calling getNode.
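The return values of put and get follow directly from the listings: put returns the old value (or null), and get returns null both for a missing key and for a key mapped to null. A short usage sketch, with the hypothetical class GetPutSemanticsDemo, makes the consequences visible:

```java
import java.util.HashMap;
import java.util.Map;

final class GetPutSemanticsDemo {
    /** Builds a small map: "k" is overwritten once, "nothing" is mapped to null. */
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        map.put("k", 2);
        map.put("nothing", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();

        // get cannot tell a null value from a missing key; containsKey can.
        System.out.println(map.get("nothing") + " " + map.get("missing"));
        System.out.println(map.containsKey("nothing") + " " + map.containsKey("missing"));

        // put returns the value that was replaced.
        System.out.println(map.put("k", 5));

        // putIfAbsent keeps an existing non-null value...
        System.out.println(map.putIfAbsent("k", 3) + " -> " + map.get("k"));
        // ...but replaces a null value, matching (!onlyIfAbsent || oldValue == null).
        System.out.println(map.putIfAbsent("nothing", 9) + " -> " + map.get("nothing"));
    }
}
```

When null values are possible, containsKey is the reliable way to check for presence.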

3.3 The resize method

final Node<K,V>[] resize() {
    // Remember the current table
    Node<K,V>[] oldTab = table;
    // Current capacity (0 if the table has not been allocated)
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Current threshold
    int oldThr = threshold;
    int newCap, newThr = 0;
    // The table already exists
    if (oldCap > 0) {
        // Already at maximum capacity: cannot grow any further
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Set the threshold to Integer.MAX_VALUE and keep the old table
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity with a left shift
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
            oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Double the threshold as well
            newThr = oldThr << 1; // double threshold
    }
    // No table yet, but an initial capacity was recorded in threshold
    else if (oldThr > 0)
        newCap = oldThr;
    // oldCap == 0 and oldThr == 0: use the defaults (for example, after new HashMap() the first insertion calls resize() and takes this branch)
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // New threshold not computed yet: derive it from the new capacity
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Allocate the new table
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Move the nodes of the old table, if there was one
    if (oldTab != null) {
        // Redistribute every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Split the bucket's list in two by (e.hash & oldCap): 0 goes to the "lo" list, otherwise to the "hi" list
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The "lo" list stays at index j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The "hi" list moves to index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}


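Resizing redistributes the entries: resize() walks every element in the table and moves it to its slot in the new array, which is expensive. When writing programs, avoid unnecessary resizes where possible.

The layout of the elements before and after resize is as follows:

The figure covers only the elements of the bucket at array index 2 and where they end up after the resize; the other buckets are redistributed in the same way.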
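The lo/hi split in resize() rests on one observation: when the capacity doubles, an element either keeps its index j or moves to j + oldCap, depending on a single bit of its hash. The sketch below checks this for the bucket at index 2; ResizeSplitSketch and newIndex are hypothetical names, and the hash values are chosen by hand for illustration.

```java
final class ResizeSplitSketch {
    /** New index after doubling: unchanged if the oldCap bit of the hash is 0, else shifted by oldCap. */
    static int newIndex(int hash, int oldIndex, int oldCap) {
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {2, 18, 34, 50}; // all fall into bucket 2 of a 16-bucket table
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int viaSplit = newIndex(h, oldIndex, oldCap);
            int viaMask = h & (newCap - 1); // what a full re-index would compute
            System.out.println("hash=" + h + " old=" + oldIndex
                    + " new=" + viaSplit + " (mask gives " + viaMask + ")");
        }
    }
}
```

Hashes 2 and 34 stay in bucket 2 while 18 and 50 move to bucket 18, and in each case the split rule agrees with recomputing hash & (newCap - 1).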

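4. Summary

4.1 About array resizing

The putVal source shows that size is incremented for every newly inserted key, and that the table is resized once size exceeds threshold. Suppose capacity is 32 and loadFactor is 0.75, so threshold = 32 * 0.75 = 24.

Now suppose 25 elements have been inserted and all of them fall into the same bucket, stored as a red-black tree, so the other 31 buckets are empty. (This is a thought experiment: in JDK 1.8 a table shorter than MIN_TREEIFY_CAPACITY is resized rather than treeified.) With 31 empty buckets a resize may look unnecessary,

but it is still performed, because the current capacity may be unsuitable for the data. As seen earlier, a resize visits every element, which is costly; on the other hand, after a resize the elements tend to be spread more evenly across the buckets,

which speeds up access. So the advice to avoid resizing amounts to saying that the cost of traversing and moving all elements outweighs the benefit of the more even distribution it produces.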
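The scenario above, many keys ending up in one bucket, can be reproduced with a key type whose hashCode is constant. The sketch assumes nothing beyond the public HashMap API; SingleBucketDemo and CollidingKey are hypothetical classes, and implementing Comparable is optional (it only gives tree bins a way to order the keys).

```java
import java.util.HashMap;
import java.util.Map;

final class SingleBucketDemo {
    /** Key whose hashCode is constant, so every instance lands in the same bucket. */
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;

        CollidingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }

        @Override
        public int compareTo(CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    /** Inserts n colliding keys, mapping each key to its own id. */
    static Map<CollidingKey, Integer> fill(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>(32);
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = fill(25);
        System.out.println("size = " + map.size());
        System.out.println("value for key 24 = " + map.get(new CollidingKey(24)));
    }
}
```

All 25 entries remain retrievable because lookups fall back to equals inside the bucket; what suffers with such keys is lookup speed, and no resize can spread keys whose hash codes are identical.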

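4.2 Summary

1) Before JDK 1.8, HashMap is an array of linked lists (separate chaining); from JDK 1.8 on, it is an array plus linked lists plus red-black trees.

2) As the source shows, HashMap is a collection that retrieves a value quickly from its key, because internally it locates the value by hashing.

References: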

　　http://www.cnblogs.com/whgk/p/6091316.html

　　http://www.cnblogs.com/leesf456/p/5242233.html