Java Container Framework Analysis (3): HashMap Source Code Analysis

Date: 2019-11-30


Overview

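Before analyzing the HashMap source code, let's look at where HashMap sits among data structures. Common data structures can be classified by logical structure and by storage structure as follows:
[Figure: classification of common data structures; surviving label: array]

First, look at the class comment in the source: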

Hash table based implementation of the Map interface. This
implementation provides all of the optional map operations, and permits
null values and the null key. (The HashMap
class is roughly equivalent to Hashtable, except that it is
unsynchronized and permits nulls.) This class makes no guarantees as to
the order of the map; in particular, it does not guarantee that the order
will remain constant over time.
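In other words, HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional Map operations and permits both null keys and null values. Apart from being unsynchronized and permitting nulls, it is roughly equivalent to Hashtable. HashMap makes no guarantee about the order of the map; in particular, it does not guarantee that the order stays the same over time.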

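As the comment shows, HashMap is not thread-safe and permits null keys and null values. HashMap is also neither a linked list nor an array in the traditional sense; in essence it is an array of linked lists.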
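The null-handling difference mentioned in the comment can be checked with a few lines. This is a minimal sketch against the standard java.util classes; the class name NullHandlingDemo and its method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

final class NullHandlingDemo {
    /** Returns true if HashMap stores both a null key and a null value. */
    static boolean hashMapAcceptsNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");
        map.put("key", null);
        return map.containsKey(null)
                && "value for null key".equals(map.get(null))
                && map.containsKey("key")
                && map.get("key") == null;
    }

    /** Returns true if Hashtable rejects a null key with a NullPointerException. */
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "value");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the reliable way to tell the two cases apart.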

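We analyzed ArrayList and LinkedList earlier, and each has its strengths and weaknesses. In practice, though, we want both lookups and modifications to be efficient, and neither of them meets that expectation, which is why a data structure like HashMap exists. Next, look at HashMap's inheritance hierarchy: HashMap extends AbstractMap and implements Map, Cloneable, and Serializable.

The hierarchy is straightforward and needs little comment, so let's move on to HashMap's internal structure.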

Main Text

Member Variables

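    // Default initial capacity; must be a power of two (4 in this implementation)
    static final int DEFAULT_INITIAL_CAPACITY = 4;
    // Maximum capacity; must be a power of two no greater than 1 << 30
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor, used when the constructor does not specify one
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // An empty HashMapEntry array; table refers to it until the table is inflated
    static final HashMapEntry<?,?>[] EMPTY_TABLE = {};
    // The HashMapEntry array (the bucket table); its length must be a power of two
    transient HashMapEntry<K,V>[] table = (HashMapEntry<K,V>[]) EMPTY_TABLE;
    // Number of key-value mappings in the map
    transient int size;
    // Threshold: the table is resized once size reaches this value
    int threshold;
    // Load factor of the hash table
    final float loadFactor = DEFAULT_LOAD_FACTOR;
    // Number of times the map has been structurally modified
    transient int modCount;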

Next, look at the inner entry class:

static class HashMapEntry<K,V> implements Map.Entry<K,V> {
        final K key;// the key
        V value;// the value
        HashMapEntry<K,V> next;// reference to the next HashMapEntry in the same bucket
        int hash;// hash of the key

        /**
         * Creates new entry.
         */
        HashMapEntry(int h, K k, V v, HashMapEntry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        // ... remaining methods omitted
    }

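The member variables include a HashMapEntry array, and each HashMapEntry holds a next reference, which means every array element is the head of a linked list. That makes the structure easy to understand: HashMap is backed by an array whose elements are linked lists, which is what people usually call an array of linked lists.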

Constructors

Construct a HashMap with the default array capacity and the default load factor:

public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

Construct a HashMap with a custom initial capacity and the default load factor:

public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

Construct a HashMap from an existing Map:

public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1, DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);
        putAllForCreate(m);
    }

Construct a HashMap with a custom initial capacity and load factor:

public HashMap(int initialCapacity, float loadFactor) {
    // Validate the initial capacity
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        } else if (initialCapacity < DEFAULT_INITIAL_CAPACITY) {
            initialCapacity = DEFAULT_INITIAL_CAPACITY;
        }
    // Validate the load factor
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +loadFactor);
        threshold = initialCapacity;// threshold temporarily holds the initial capacity; inflateTable() recomputes it when the table is allocated
        init();// empty implementation
    }

void init() { }

Storing Elements

public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);// first insertion: allocate the table
        }
        // A null key is handled separately
        if (key == null)
            return putForNullKey(value);
        int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
        int i = indexFor(hash, table.length);
        for (HashMapEntry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);// add a new entry
        return null;
    }

The entry's position in the array is computed from the hash value and the table length:

static int indexFor(int h, int length) {
        return h & (length-1);
    }
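The mask in indexFor only behaves like a modulo when the length is a power of two, which is why the table length is constrained that way. The following sketch compares the mask with Math.floorMod on a few sample hashes; the class name IndexForDemo, the helper maskMatchesModulo, and the sample values are illustrative and not part of the HashMap source.

```java
final class IndexForDemo {
    /** Same bit trick as indexFor in the listing above. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Checks whether the mask agrees with a non-negative modulo for the given length. */
    static boolean maskMatchesModulo(int length) {
        int[] samples = {0, 1, 7, 8, 12345, -1, -98765, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }
}
```

For a length of 16 the two computations agree on every sample, including negative hashes. For a length such as 10 they diverge, and some bucket indexes can never be produced by the mask.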

Allocating the table for the first time:

private void inflateTable(int toSize) {
        // If toSize is not a power of two, round it up to the next power of two
        int capacity = roundUpToPowerOf2(toSize);
        // Compute the threshold
        float thresholdFloat = capacity * loadFactor;
        if (thresholdFloat > MAXIMUM_CAPACITY + 1) {
            thresholdFloat = MAXIMUM_CAPACITY + 1;
        }
        // Reassign threshold
        threshold = (int) thresholdFloat;
        // Allocate the table with the computed capacity
        table = new HashMapEntry[capacity];
    }
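The article does not show roundUpToPowerOf2, so the version below is only one plausible way to write it and not the library's actual code. It also reproduces the threshold arithmetic from inflateTable so the numbers can be checked by hand; the class name InflateSketch and the helper thresholdFor are assumptions made for this example.

```java
final class InflateSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Hypothetical helper: smallest power of two >= number, capped at MAXIMUM_CAPACITY. */
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        if (number <= 1) {
            return 1;
        }
        // highestOneBit(number - 1) is the largest power of two below number;
        // doubling it gives the smallest power of two that is >= number.
        return Integer.highestOneBit(number - 1) << 1;
    }

    /** Mirrors the threshold computation in inflateTable: capacity * loadFactor, capped. */
    static int thresholdFor(int capacity, float loadFactor) {
        float thresholdFloat = capacity * loadFactor;
        if (thresholdFloat > MAXIMUM_CAPACITY + 1) {
            thresholdFloat = MAXIMUM_CAPACITY + 1;
        }
        return (int) thresholdFloat;
    }
}
```

With the default load factor of 0.75, a requested size of 5 becomes a capacity of 8 and a threshold of 6.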

Handling a null key:

private V putForNullKey(V value) {
        for (HashMapEntry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);// add a new entry; a null key always uses hash 0 and therefore bucket 0
        return null;
    }

Whether the key is null or not, the put path always ends up calling addEntry, so let's analyze addEntry next:

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // Resize: double the table length
            resize(2 * table.length);
            // Recompute the key's hash
            hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;
            // Recompute the bucket index, i.e. the element's position in the new array, from the hash and the table length
            bucketIndex = indexFor(hash, table.length);
        }
        // Create a new entry
        createEntry(hash, key, value, bucketIndex);
    }

Resizing HashMap

void resize(int newCapacity) {
        HashMapEntry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // Create a new table array; addEntry passes twice the old length
        HashMapEntry[] newTable = new HashMapEntry[newCapacity];
        // Move every entry from the old array into the new one
        transfer(newTable);
        // Make the new array the table
        table = newTable;
        // Recompute the threshold
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
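The transfer method is not shown in the article. Assuming it re-indexes each entry with indexFor against the new length, doubling the table adds exactly one bit to the mask, so an entry either keeps its old index or moves up by the old length. The sketch below checks only that arithmetic; the class name ResizeSplitDemo and the method staysOrShifts are illustrative.

```java
final class ResizeSplitDemo {
    /** Same computation as indexFor in the article. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /**
     * Returns true if, after the table doubles, hash h lands either at its
     * old index or at old index + oldLength. oldLength must be a power of two.
     */
    static boolean staysOrShifts(int h, int oldLength) {
        int oldIndex = indexFor(h, oldLength);
        int newIndex = indexFor(h, oldLength * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldLength;
    }
}
```

For example, hash 13 sits in bucket 1 of a table of length 4 and in bucket 5 (1 + 4) after the table grows to 8. This is also why addEntry recomputes bucketIndex after calling resize.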

Computing the element's index:

static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }

Creating a new entry:

void createEntry(int hash, K key, V value, int bucketIndex) {
        // Fetch the current head of bucket bucketIndex (null if the bucket is empty)
        HashMapEntry<K,V> e = table[bucketIndex];
        // Install the new entry as the bucket's head; on a hash collision its next reference points to the previous head
        table[bucketIndex] = new HashMapEntry<>(hash, key, value, e);
        // Increment size
        size++;
    }

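At this point the put method is fairly clear. The key's hash is computed first, then a bitwise AND of the hash with the table length minus one gives the element's index, and the element is added to the array at that index. If a hash collision occurs, the existing entry is not removed: the newest entry is placed at that position and its next reference points to the previous entry. The original post explains this with a figure.

[Figure: a HashMap entry array of capacity 4 with chained entries]

I drew the figure according to the principle described above, with an Entry array of capacity 4 (a power of two). Every Entry has a next reference. When there is a hash collision, the newly added entry's next points to the previous entry; otherwise it is null.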
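The put flow summarized above can be condensed into a small, self-contained class. This is a teaching sketch and not the HashMap source: it uses a fixed table of length 4, never resizes, and uses hashCode() directly instead of the supplemental hash function. The names TinyChainedMap, Node, and bucketKeys are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed capacity of 4 (a power of two); this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[4];
    private int size;

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode(); // a null key goes to bucket 0
    }

    V put(K key, V value) {
        int h = hash(key);
        int i = h & (table.length - 1);
        // An existing key only has its value replaced.
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // A new key becomes the head of the bucket; its next is the old head.
        table[i] = new Node<>(h, key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[h & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    /** Keys of one bucket, listed from the head of the chain to its tail. */
    List<K> bucketKeys(int index) {
        List<K> keys = new ArrayList<>();
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    int size() {
        return size;
    }
}
```

Inserting the Integer keys 1, 5, and 9 puts all three into bucket 1, and the chain reads 9, 5, 1 from the head, because every new entry is linked in front of the previous one.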

Reading Elements

public V get(Object key) {
    // A null key is handled separately
        if (key == null)
            return getForNullKey();
    // Otherwise look up the entry with getEntry and return its value
      Entry<K,V> entry = getEntry(key);
      return null == entry ? null : entry.getValue();
    }

getForNullKey

private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        // Walk the chain in bucket 0, where the null key is stored, and return its value
        for (HashMapEntry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

getEntry

final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        // Compute the hash first
        int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
        // Use the hash to find the bucket index, then walk that bucket's chain
        for (HashMapEntry<K,V> e = table[indexFor(hash, table.length)]; e != null;e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
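    }

This follows basically the same pattern as put. Note that for a non-null key, every key on the bucket's chain may have to be examined, because the chain may contain collisions: different keys can end up in the same bucket and can even have the same hash value. That is why a match requires both the hash and the key to agree.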
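A concrete collision makes the double check easier to see. The strings "Aa" and "BB" have the same hashCode (2112) but are not equal, so a lookup cannot rely on the hash alone. The sketch uses the standard java.util.HashMap; the class name CollisionDemo and its methods are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    /** "Aa" and "BB" share a hashCode although they are different strings. */
    static boolean sameHashDifferentKeys() {
        return "Aa".hashCode() == "BB".hashCode() && !"Aa".equals("BB");
    }

    /** Stores both colliding keys; equals() keeps them apart inside the bucket. */
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }
}
```

Both keys remain retrievable with their own values, which only works because the lookup compares the key with equals after the hashes match.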

Removing Elements

public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.getValue());
    }

removeEntryForKey

final Entry<K,V> removeEntryForKey(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
        int i = indexFor(hash, table.length);
        // Head entry of the bucket that the key maps to
        HashMapEntry<K,V> prev = table[i];
        // Start the traversal at the head
        HashMapEntry<K,V> e = prev;
        // Walk the whole chain in this bucket
        while (e != null) {
            HashMapEntry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                // e is the head of the chain, so the table slot must point at its successor
                    table[i] = next;
                else
                // e is further down the chain: link its predecessor directly to its successor
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            // Remember the current entry as prev
            prev = e;
            // Move on to the next entry
            e = next;
        }

        return e;
    }
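The unlink step is ordinary singly-linked-list removal. The sketch below isolates it on a bare chain of string keys. It tracks the predecessor with a null prev rather than the prev == e trick used in the listing above, which is a stylistic substitution and not what the source does. ChainRemoveDemo, Node, and keys are names invented for the example.

```java
final class ChainRemoveDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Removes the first node whose key equals target and returns the (possibly new) head. */
    static Node remove(Node head, String target) {
        Node prev = null;
        for (Node e = head; e != null; prev = e, e = e.next) {
            if (e.key.equals(target)) {
                if (prev == null) {
                    return e.next;   // removing the head: the bucket slot must change
                }
                prev.next = e.next;  // removing an inner node: bypass it
                return head;
            }
        }
        return head;                 // target not found, chain unchanged
    }

    /** Comma-separated keys from head to tail, for inspection. */
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.key);
        }
        return sb.toString();
    }
}
```

Only the head case changes what the bucket slot refers to, which corresponds to the table[i] = next branch in removeEntryForKey.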

Checking for a Key

public boolean containsKey(Object key) {
  // Like get, this delegates to getEntry
        return getEntry(key) != null;
    }

Checking for a Value

public boolean containsValue(Object value) {
   // A null value is handled separately
        if (value == null)
            return containsNullValue();
        HashMapEntry[] tab = table;
        // Iterate over the whole table array
        for (int i = 0; i < tab.length ; i++)
        // Walk the entire chain in each bucket
            for (HashMapEntry e = tab[i] ; e != null ; e = e.next)
                if (value.equals(e.value))
                    return true;
        return false;
    }

private boolean containsNullValue() {
    // Same traversal as the non-null case, comparing against null with ==
        HashMapEntry[] tab = table;
        for (int i = 0; i < tab.length ; i++)
            for (HashMapEntry e = tab[i] ; e != null ; e = e.next)
                if (e.value == null)
                    return true;
        return false;
    }

Clearing the Table

clear() calls Arrays.fill to null out every bucket:

public void clear() {
        modCount++;
        Arrays.fill(table, null);
        size = 0;
    }

Iteration

Iterating over keys

public Set<K> keySet() {
        Set<K> ks = keySet;
        return (ks != null ? ks : (keySet = new KeySet()));
    }

Iterating over values

public Collection<V> values() {
        Collection<V> vs = values;
        return (vs != null ? vs : (values = new Values()));
    }
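Both keySet() and values() return view objects backed by the map rather than copies, which is why the listings above can cache a single instance. The sketch below demonstrates that behavior with the standard java.util.HashMap; the class name ViewDemo and its methods are illustrative.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class ViewDemo {
    /** Removes a key through the keySet view and returns the map's size afterwards. */
    static int removeThroughKeySet() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.keySet().remove("a"); // the removal writes through to the map
        return map.size();
    }

    /** Obtains the values view first, then adds a mapping; the view reflects it. */
    static boolean viewSeesLaterPut() {
        Map<String, Integer> map = new HashMap<>();
        Collection<Integer> values = map.values();
        map.put("x", 42);
        return values.contains(42);
    }
}
```

Code that needs a stable snapshot should copy the view, for example with new ArrayList<>(map.keySet()).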

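Iterating over entries

// In the JDK 7-based source this private helper is named entrySet0(); the public entrySet() simply delegates to it
private Set<Map.Entry<K,V>> entrySet0() {
        Set<Map.Entry<K,V>> es = entrySet;
        // Null on the first call, in which case a new EntrySet is created and cached
        return es != null ? es : (entrySet = new EntrySet());
    }

All of these traversals ultimately go through the iterator method of the returned collection views, so let's look at the iterator implementations:

The key view's Iterator is KeyIterator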

Iterator<K> newKeyIterator()   {
        return new KeyIterator();
    }

The value view's Iterator is ValueIterator

Iterator<V> newValueIterator()   {
        return new ValueIterator();
    }

       private final class ValueIterator extends HashIterator<V> {
        public V next() {
            return nextEntry().getValue();
        }
    }

The entry view's Iterator is EntryIterator

Iterator<Map.Entry<K,V>> newEntryIterator()   {
        return new EntryIterator();
    }

    private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
        public Map.Entry<K,V> next() {
            return nextEntry();
        }
    }

All three iterators extend HashIterator and only override the next method, so we only need to study HashIterator.

HashIterator

private abstract class HashIterator<E> implements Iterator<E> {
        HashMapEntry<K,V> next;        // next entry to return
        int expectedModCount;   // fail-fast: an exception is thrown when this differs from modCount
        int index;              // current slot (bucket index)
        HashMapEntry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                HashMapEntry[] t = table;
                // Scan the table until a non-empty bucket is found, leaving next at the first entry
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }
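     // Return the next entry
        final Entry<K,V> nextEntry() {
        // If modCount differs from expectedModCount, the map was structurally modified outside this iterator (by another thread, or by the same thread during iteration)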
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            HashMapEntry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();
            if ((next = e.next) == null) {
                HashMapEntry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }
    // Remove the entry most recently returned by this iterator
        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }
    }

One point deserves attention:

Fail-fast

Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw {@code ConcurrentModificationException} on a best-effort basis.
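Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Put differently, fail-fast is a best-effort debugging aid and not a synchronization mechanism. In the presence of unsynchronized concurrent modification no hard guarantee is possible, so a program must never rely on ConcurrentModificationException being thrown.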
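The modCount check does not need a second thread to trigger. The sketch below modifies a map from the same thread while iterating, once through the map itself and once through the iterator. It uses the standard java.util.HashMap, and FailFastDemo with its method names is made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Removes through the map while iterating; returns true if the iterator objected. */
    static boolean removeViaMapThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change the iterator does not know about
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Removes odd values through the iterator itself; returns the remaining size. */
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove(); // resynchronizes expectedModCount with modCount
            }
        }
        return map.size();
    }
}
```

The first method fails on the second call to next, because map.remove bumped modCount after the iterator had captured it. The second method works because HashIterator.remove assigns expectedModCount = modCount after delegating to removeEntryForKey.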

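Summary

HashMap grows automatically; each resize doubles the table length.
HashMap stores its data in no particular order; an entry whose key is null is placed in the first bucket (index 0).
A HashMap key may be null; a non-null key type needs to override hashCode and equals consistently.
HashMap is not thread-safe. To get thread safety, use Collections.synchronizedMap() or ConcurrentHashMap; Hashtable is not recommended because it is less efficient.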
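As a rough illustration of the last point, the sketch below runs several threads that increment one shared counter through Map.merge. It is meant to be called with a ConcurrentHashMap or a Collections.synchronizedMap wrapper, both of which perform merge atomically; passing a plain HashMap would be a data race with an unpredictable result. The class name ThreadSafeCounterDemo, the method countWith, and the key "hits" are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ThreadSafeCounterDemo {
    /** Starts `threads` workers that each add 1 to the same key `perThread` times. */
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum); // atomic on both thread-safe maps
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 1000));
        System.out.println(countWith(Collections.synchronizedMap(new HashMap<>()), 4, 1000));
    }
}
```

The synchronized wrapper guards every call with a single lock, while ConcurrentHashMap allows more concurrency between threads, so the latter is usually the better fit under contention.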