JDK1.8 的 HashMap 源码之注意事项

时间 2019-11-16

原文原文链接

文章目录

英语渣靠着翻译插件，大概翻译的，不免有错误之处，注意甄别；java

链表变树

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. Most methods try to use normal bins, but relay to TreeNode methods when applicable (simply by checking instanceof a node). Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checking for existence of tree bins may be delayed in the course of table methods.

HashMap 一般表现为一张哈希表，每一个桶里面放一个元素，当有多个元素的时候，变为链表，这时候每一个桶里面都是放链表；可是当链表的长度达到一个临界的时候，链表转换为树，每一个树的结构就像 TreeMap 同样，这时候，每一个桶里面就是一个树形结构；node

大多数方法只是用普通的桶，即里面只是链表；可是在合适的时候，链表会被转为树，好比检查每一个节点的时候；由于这时候转换为树，不但支持原来链表的遍历和使用，同时还能得到更快的查找；web

可是因为大多数时候，每一个桶都没有被过分填充，即里面都是链表，还达不到转换为树的条件，所以 HashMap 的方法可能会延迟检查桶里面究竟是链表仍是树形结构；编程

树形结构与Comparable，性能极致与下降

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable", type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

当桶里面的结构是树形结构的时候，一般状况下是按照 HashCode 来算下标位置的；可是若是要插入的元素实现了 Comparable 接口，则使用接口的 compareTo 方法，计算排序的位置；app

树形结构（HashMap中的树是红黑树）保证了在元素具备 不一样的哈希码或者能够排序 的状况下，插入元素复杂度在最坏的状况下是 O log(n) ，所以，将桶中的链表在必定状况下转成树是值得的;dom

所以，若是恶意的将 hashCode 方法的返回值故意分布在一块儿，好比返回同一个哈希码，或者实现了 Comparable 接口，可是 compareTo 永远返回 0，这时候 HashCdoe 的性能会降低；ide

若是这两种方法都不适用（不一样的哈希码或者能够排序），与不采起预防措施相比，咱们可能会浪费大约两倍的时间和空间。但目前所知的惟一案例源于糟糕的用户编程实践，这些实践已经很是缓慢，以致于没有什么区别；svg

链表与树之间转换的阈值

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).

由于树节点的大小大约是普通节点的两倍，因此咱们只在桶中包含足够的节点（TREEIFY_THRESHOLD = 8）时才进行链表转换成树的操做；性能

当树由于删除节点变得很小的时候，会再次转换回链表；this

若是 HashCode 方法设计的很好的话，哈希冲突下降，链表的长度基本就不会很长，树是不多用到的；

理想状况下，随机的哈希码，遵循 Poisson （泊松）分布；

下面还有一些，粗略看了下，没有什么震撼的信息，就不在翻译了；

看源码，英文注释，也很费时间啊