[Source Code Analysis] HashMap (Part 1)

Date: 2020-02-17

Tags: source code analysis, hashmap


I spent a few hours writing up a summary of what I have learned.


1. HashMap

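HashMap is a hash-table-based implementation of the Map interface. It stores data as key-value pairs (both key and value may be null). In JDK 1.7 the underlying structure was an array plus linked lists; JDK 1.8 changed it to an array plus linked lists plus red-black trees.
HashMap implements the Map interface, and its static nested class Node implements Entry, the nested interface of Map.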

The static nested class Node is a singly linked list node, matching HashMap's separate-chaining storage. It implements the methods getKey, getValue, setValue, hashCode, and equals.
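To make the shape of such a node concrete, here is a minimal sketch. NodeSketch is a hypothetical, simplified class written for illustration; it is not the JDK's Node, although it follows the same idea of an entry that carries a next pointer.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for HashMap.Node (illustration only).
class NodeSketch<K, V> implements Map.Entry<K, V> {
    final int hash;        // cached hash of the key
    final K key;
    V value;
    NodeSketch<K, V> next; // next node in the same bin (singly linked)

    NodeSketch(int hash, K key, V value, NodeSketch<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    @Override public K getKey() { return key; }
    @Override public V getValue() { return value; }

    // Replaces the value and returns the previous one.
    @Override public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }

    @Override public int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    @Override public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof Map.Entry)) return false;
        Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
        return Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue());
    }

    public static void main(String[] args) {
        NodeSketch<String, Integer> tail = new NodeSketch<>(1, "b", 2, null);
        NodeSketch<String, Integer> head = new NodeSketch<>(0, "a", 1, tail);
        // Walk the chain the way a bin is traversed.
        for (NodeSketch<String, Integer> n = head; n != null; n = n.next) {
            System.out.println(n.getKey() + "=" + n.getValue());
        }
    }
}
```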

2. Array + linked list + red-black tree

Why does HashMap use this implementation?

1. Capacity and load factor

First, two HashMap concepts: capacity and load factor.
The default initial capacity of HashMap is 16 and the default load factor is 0.75f.
Resize threshold = 16 * 0.75 = 12

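The load factor describes how full the container is allowed to get; once the number of elements exceeds that level (the threshold), the map resizes automatically.
Why is the default load factor 0.75 rather than, say, 0.8 or 0.76?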

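A larger load factor allows the table to fill up more, so space utilization is higher, but the linked lists grow longer, lookups get slower, and hash collisions become more likely.
A smaller load factor allows the table to fill up less, so space utilization is lower and memory is wasted, but the linked lists stay short, hash collisions are less likely, and lookups are fast.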

So 0.75f is the official compromise between time cost and space cost.
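The arithmetic above can be sketched in a few lines. ThresholdSketch and its threshold method are hypothetical helpers written for this note, not JDK API; the only real API used is the HashMap(int, float) constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class (not part of the JDK) that mirrors
// the "threshold = capacity * load factor" rule described above.
class ThresholdSketch {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12, the default settings
        System.out.println(threshold(32, 0.75f)); // 24, after the capacity doubles

        // Passing the defaults explicitly through the real constructor.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        map.put("a", 1);
        System.out.println(map.size()); // 1
    }
}
```

A higher load factor raises the threshold for the same capacity, which is the space-for-time trade described above.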

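transient Node<K,V>[] table; // the bucket array that holds the entries
transient int size;          // number of key-value mappings stored
transient int modCount;      // number of times the map has been modified
int threshold;               // resize threshold = capacity * load factor
final float loadFactor;      // load factor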

The transient keyword marks a field as excluded from default serialization.

2. Hash collisions

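When HashMap stores a key-value pair, it derives a hash value from the key's hashCode to decide the storage position. Keys with the same hashCode get the same hash value. As more objects are stored, different objects may end up with the same hash value; this is a hash collision. HashMap resolves hash collisions with linked lists.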
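A small example of a collision, using the well-known pair "Aa" and "BB", whose String hashCode values are both 2112. CollisionSketch and sameHashCode are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: two different keys with the same hashCode.
class CollisionSketch {
    static boolean sameHashCode(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(sameHashCode("Aa", "BB")); // true
        System.out.println("Aa".equals("BB"));        // false

        // Both keys land in the same bin, yet both mappings are kept,
        // because equals() still tells the keys apart.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size()); // 1 2 2
    }
}
```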

3. Why convert to a red-black tree

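By default each bin in HashMap starts out as a linked list. When a list grows too long, lookups become slow: under the same conditions a linked-list lookup is O(n), while a tree lookup is O(log(n)). Converting the bin improves lookup from O(n) to O(log(n)).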

When does the conversion happen?

First, look at the code:

/*
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     */

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

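TREEIFY_THRESHOLD = 8, defined in HashMap, is the threshold at which a linked list is converted to a tree.
Because TreeNodes take about twice the space of regular Nodes, a bin is converted to a red-black tree only once it holds enough nodes, that is, when its length reaches 8. (In the JDK 8 source, treeifyBin resizes the table instead of treeifying while the table length is below MIN_TREEIFY_CAPACITY = 64.)
When a tree bin shrinks to 6 nodes (UNTREEIFY_THRESHOLD, checked when a bin is split during a resize), it is converted back to a linked list.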

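Why is the threshold 8 rather than some other value?
When hashCode values are well distributed, entries spread evenly across the HashMap, a list almost never reaches length 8, and the red-black tree structure is rarely needed.
However, hashCode values are not always well distributed, and the JDK cannot stop users from implementing poorly distributed hashCode methods, which can lead to an uneven distribution of data.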

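According to probability theory (the Poisson distribution, as in the JDK source comment quoted above),
with ideal hashCodes the probability of a list reaching length 8 is 0.00000006, which practically never happens, so the threshold is 8.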
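The table in the quoted source comment can be reproduced with the Poisson formula P(k) = exp(-lambda) * lambda^k / k!. The sketch below assumes lambda = 0.5, the parameter the JDK's implementation notes give for the default load factor of 0.75; that value is not in the excerpt above, so treat it as an assumption. PoissonSketch is a hypothetical class name.

```java
// Hypothetical helper that recomputes the bin-size probabilities.
class PoissonSketch {
    // Poisson probability of exactly k events with mean lambda.
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // assumed parameter, see the note above
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(lambda, k));
        }
    }
}
```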

3. Inside the putVal method

When the statement map.put("lankeren","1069941886") executes, the method actually called is

public V put(K key, V value) {
       return putVal(hash(key), key, value, false, true);
     }

Following the call, we step into putVal().
First, the hash value is computed from the key's hashCode:

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

This uses the ^ (XOR) operator: the high 16 bits of the hashCode are XORed into its low 16 bits.
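To see why the XOR step matters, the sketch below reuses the expression from hash() together with the index computation (n - 1) & hash that appears in putVal. HashSpreadSketch, spread, and indexFor are hypothetical names chosen for this illustration.

```java
// Hypothetical demo of hash spreading and bucket indexing.
class HashSpreadSketch {
    // Same expression as the hash(Object) method quoted above.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n is a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        // An Integer's hashCode is its own value; these two differ only in high bits.
        int a = 0x10000;
        int b = 0x20000;

        // Without spreading, both keys fall into bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16)); // 0 0

        // With spreading, the high bits influence the index.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```

Since the table index only looks at the low bits, folding the high bits down gives keys that differ only in their upper bits a chance to land in different buckets.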

/**
     *  map.put()  
     **/
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
                       
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0) 
            // If the table array has not been initialized yet, initialize it
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)  // fetch the head node of this bin
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // The key being inserted already has a mapping at the head node
                // Assign the current node to e so its value can be updated below
                e = p;
                
            else if (p instanceof TreeNode)
                // The bin is a tree: insert through the tree node logic
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // The head node did not match and the bin is not a tree
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        // A mapping for this key already exists
                        break;
                        // break, then update the old value with the new one below
                        
                    p = e;
                }
            }
            
           if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
           }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        // Callback hook for subclasses such as LinkedHashMap; a no-op in HashMap
        afterNodeInsertion(evict); 
        return null;
    }

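Approach:

When data is added to the hash table:
1. If the table array has not been initialized yet, initialize it (default length 16)
2. Check whether the slot for this hash value (the table index) already holds a mapping
    1. If it does, enter the else branch
    2. If it does not, store the new node as the head of that bin
3. The bin is not empty, so check whether the key to insert already has a mapping
    1. Compare with the head node
    2. Check whether the head is a tree node (if so, insert through the tree)
    3. Otherwise compare with the nodes after the head, appending at the tail if none matches
4. If the key already exists, update the old value with the new one and return
5. Otherwise increment the modification count
6. Check whether the size exceeds the threshold
    1. Yes --> resize
    2. No  --> nothing to do

Return value: null if the key had no previous mapping; if an existing value was replaced, the old value.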
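The return values can be checked with a short example. PutReturnSketch and demo are hypothetical names; put and putIfAbsent are the real Map methods (in the JDK 8 source, putIfAbsent calls putVal with onlyIfAbsent set to true).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of what put() and putIfAbsent() return.
class PutReturnSketch {
    static String[] demo() {
        Map<String, String> map = new HashMap<>();
        String first = map.put("lankeren", "1069941886");      // no previous mapping
        String second = map.put("lankeren", "new-value");      // replaces the old value
        String third = map.putIfAbsent("lankeren", "ignored"); // keeps the existing value
        return new String[] { first, second, third, map.get("lankeren") };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // null
        System.out.println(r[1]); // 1069941886
        System.out.println(r[2]); // new-value
        System.out.println(r[3]); // new-value
    }
}
```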