Why Is HashMap's Load Factor 0.75?

Date: 2020-02-02

Some background first

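By default, HashMap creates an array of length 16, that is, 16 hash buckets. When a key-value pair is added, the hash of the key is computed first (h = hash), and then h & (length-1) yields an array index. That index is the bucket where the key-value pair is stored.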
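The index calculation can be sketched as follows. This is a minimal illustration, not the JDK code itself: the class name BucketIndexDemo and the method names spread and indexFor are chosen here for the example. The spreading step (XOR with the upper 16 bits) mirrors what HashMap does in Java 8 and later, and the mask only works because the table length is a power of two.

```java
class BucketIndexDemo {
    // Mixes the upper 16 bits into the lower 16, as HashMap does in Java 8+.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Maps a hash to a bucket index. Assumes length is a power of two,
    // so (length - 1) is a mask of low-order one bits.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // default table length
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            int h = spread(key);
            System.out.println(key + " -> bucket " + indexFor(h, length));
        }
    }
}
```

Because the length is a power of two, the mask gives the same result as a non-negative modulo, but with a single bitwise operation.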

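When there is a lot of data, however, different keys can produce hashes that map to the same index, so they end up in the same position. This is a hash collision. When a collision occurs, the entries at that position are stored as a linked list. An array occupies contiguous memory, so its elements can be read, modified, or deleted directly by index, which is very efficient. A linked list is not contiguous and cannot be indexed, so every operation has to traverse it from the head. That is slow, especially when the list is long. To keep the lists from growing long, the array has to be resized dynamically.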
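A small sketch of the two ways keys can collide: identical hash codes, and different hash codes that share the same low-order bits. The class CollisionDemo and the helper bucketOf are names invented for this example; the helper assumes the Java 8+ spreading step and a power-of-two table length.

```java
class CollisionDemo {
    // Bucket index for a non-null key, assuming a power-of-two table length.
    static int bucketOf(Object key, int length) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (length - 1);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same hashCode, so they share a bucket
        // at every table length.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());

        // 1 and 17 have different hash codes but the same low 4 bits,
        // so they share a bucket while the length is 16.
        System.out.println(bucketOf(1, 16) + " " + bucketOf(17, 16));

        // After the table doubles to 32, the two keys land in different buckets.
        System.out.println(bucketOf(1, 32) + " " + bucketOf(17, 32));
    }
}
```

The last two lines show why resizing helps: doubling the length exposes one more hash bit, which splits some of the chains that had formed.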

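Resizing means allocating a new array and migrating the existing data into it, so frequent resizing costs time and lowers efficiency. If the table were resized when only half full, space utilization would be poor. If it waited until nearly full, the probability of hash collisions would rise. So when should resizing start?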

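To balance space utilization against hash collisions (efficiency), HashMap defines a load factor (loadFactor) and a resize threshold (threshold = DEFAULT_INITIAL_CAPACITY * loadFactor). With the defaults, the threshold is 16 * 0.75 = 12: once the map holds more than 12 entries, the array is resized to twice its previous length.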
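The threshold arithmetic can be sketched like this. The class ThresholdDemo and its threshold method are illustrative names, not part of HashMap; the sketch only reproduces the capacity * loadFactor calculation across a few doublings.

```java
class ThresholdDemo {
    // Resize threshold for a given table capacity and load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // HashMap's default load factor
        // Each resize doubles the capacity, and the threshold doubles with it.
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println("capacity " + capacity
                    + " -> resize once size exceeds " + threshold(capacity, loadFactor));
        }
    }
}
```

A load factor of 0.5 would give a threshold of 8 for the default capacity (more wasted space), and 1.0 would give 16 (more collisions before a resize); 0.75 sits between the two.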

Why is the load factor 0.75?

First, look at a comment from the source code:

     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

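Roughly speaking, the comment says that in the ideal case, with random hash codes, the number of nodes per bucket follows a Poisson distribution, and it gives a table of bucket sizes and their probabilities. The table shows that by the time a bucket holds 8 elements the probability is already tiny, about 0.00000006 (roughly 6 in 100 million), and anything longer is less than 1 in ten million. In other words, with 0.75 as the load factor, the linked list at a collision position almost never grows to 8 elements.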
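The table in the comment can be reproduced directly from the formula it quotes, exp(-0.5) * pow(0.5, k) / factorial(k). The sketch below does that; the class name PoissonTable and the method poisson are names picked for this example, and the parameter 0.5 is taken from the comment rather than derived here.

```java
class PoissonTable {
    // Poisson probability mass: P(k) = exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // parameter quoted in the HashMap source comment
        // Print the probability of a bucket holding exactly k nodes, k = 0..8.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(lambda, k));
        }
    }
}
```

Rounded to eight decimal places, the values match the rows in the comment, which is a quick way to confirm how the table was derived.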
