Why do hash maps in Java 8 use a binary tree instead of a linked list?

Q:    I recently learned that in Java 8, hash maps use a binary tree instead of a linked list, with the hash code used as the branching factor. I understand that under heavy collisions this reduces lookup from O(n) to O(log n). My question is what good this really does, since the amortized time complexity is still O(1). You might see a significant time difference if you forced every entry into the same bucket by giving all keys the same hash code, but no one in their right mind would do that.

A binary tree also uses more space than a singly linked list, since it stores both left and right nodes. Why increase the space complexity when there is no improvement in time complexity except for some spurious test cases?


A:    This is mostly a security-related change. In normal situations many collisions are rare, but if hash keys arrive from an untrusted source (e.g. HTTP header names received from the client), it is possible, and not very hard, to craft input whose keys all share the same hash code. If you then perform many lookups, you may experience denial of service. Quite a lot of code in the wild appears to be vulnerable to this kind of attack, so it was decided to fix this on the Java side.

For more information, refer to JEP 180.
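To make the attack described in this answer concrete, here is a minimal sketch of how colliding String keys can be generated. It relies on the fact that "Aa" and "BB" have the same String.hashCode(); the class and method names (CollidingKeys, collidingKeys) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CollidingKeys {
    // "Aa" and "BB" share a hash code, and String.hashCode() is a
    // polynomial in the characters, so every same-length concatenation
    // of these two blocks ends up with the same hash code as well.
    public static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(10); // 2^10 = 1024 distinct keys
        long distinctHashes = keys.stream().map(String::hashCode).distinct().count();
        System.out.println(keys.size() + " keys, " + distinctHashes + " distinct hash code(s)");
    }
}
```

All 1024 keys land in one bucket of a HashMap, which is exactly the situation the change is meant to survive.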


 

PS (from the referenced article):

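When the hash function was designed, the table length n was already a power of two, and the bucket index is computed as follows (using the bitwise & operation rather than the % remainder):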

(n - 1) & hash

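The designers considered this approach prone to collisions. Why? Consider the case where n - 1 is 15 (binary 1111, i.e. 0x0F): only the low 4 bits of the hash actually take part in the index, so collisions come easily.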
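A small sketch of the masking step, assuming a table of length 16. The class name IndexMask and the sample hash values are chosen only for illustration; they show that hashes differing only above the low 4 bits share a bucket.

```java
public class IndexMask {
    // Bucket index for a table whose length n is a power of two.
    public static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // These hashes differ only in bits above the low 4 bits.
        int[] hashes = {0x0005, 0x0015, 0x10005, 0x7FFF0005};
        for (int h : hashes) {
            System.out.println("0x" + Integer.toHexString(h) + " -> bucket " + indexFor(h, n));
        }
        // For non-negative hashes the mask gives the same result as the remainder.
        System.out.println(indexFor(12345, n) == 12345 % n);
    }
}
```

Every sample hash maps to bucket 5, because the mask discards everything but the low 4 bits.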

所以,设计者想了一个顾全大局的方法(综合考虑了速度、做用、质量),就是把高16bit和低16bit异或了一下。设计者还解释到由于如今大多数的hashCode的分布已经很不错了,就算是发生了碰撞也用O(logn)的tree去作了。仅仅异或一下,既减小了系统的开销,也不会形成的由于高位没有参与下标的计算(table长度比较小时),从而引发的碰撞。

What happens if collisions are still frequent? The author's comments say that trees are used to handle them ("we use trees to handle large sets of collisions in bins"), and JEP 180 describes the problem:

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

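As mentioned earlier, retrieving an element from a HashMap basically takes two steps:

  1. First, hash the key's hashCode() and determine the bucket index;
  2. If the key of the bucket's first node is not the one we want, search the chain using key.equals().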
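The two steps above can be sketched with a toy chained hash table. This is not the JDK implementation; ChainedLookup, its Entry type, and the fixed capacity are hypothetical and exist only to show where each step happens.

```java
import java.util.LinkedList;
import java.util.List;

public class ChainedLookup {
    // Hypothetical entry type, for illustration only.
    static final class Entry {
        final String key;
        final int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final List<Entry>[] table;

    @SuppressWarnings("unchecked")
    public ChainedLookup(int capacityPowerOfTwo) {
        table = new List[capacityPowerOfTwo];
    }

    // Step 1: hash the key and mask it down to a bucket index.
    private int indexFor(Object key) {
        int h = key.hashCode();
        return (table.length - 1) & (h ^ (h >>> 16));
    }

    public void put(String key, int value) {
        int i = indexFor(key);
        if (table[i] == null) table[i] = new LinkedList<>();
        table[i].removeIf(e -> e.key.equals(key)); // replace an existing mapping
        table[i].add(new Entry(key, value));
    }

    public Integer get(String key) {
        List<Entry> bucket = table[indexFor(key)]; // step 1: locate the bucket
        if (bucket == null) return null;
        for (Entry e : bucket) {                   // step 2: walk the chain with equals()
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedLookup map = new ChainedLookup(16);
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so it joins the same chain
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.get("missing"));
    }
}
```

The loop in get is the O(n) part: with many colliding keys, every lookup may scan the whole chain.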

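Before Java 8, collisions were resolved with a linked list, so when collisions occurred, the two steps of a get cost O(1) + O(n). With severe collisions n grows large, and the O(n) scan clearly slows lookups down.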

所以在Java 8中,利用红黑树替换链表,这样复杂度就变成了O(1)+O(logn)了,这样在n很大的时候,可以比较理想的解决这个问题,在Java 8:HashMap的性能提高一文中有性能测试的结果

 

JEP 180: Handle Frequent HashMap Collisions with Balanced Trees

Author: Mike Duigou
Owner: Brent Christian
Type: Feature
Scope: Implementation
Status: Closed / Delivered
Release: 8
Component: core-libs
Discussion: core dash libs dash dev at openjdk dot java dot net
Effort: M
Duration: M
Reviewed by: Alan Bateman
Endorsed by: Brian Goetz
Created: 2013/02/08 20:00
Updated: 2017/06/14 18:44
Issue: 8046170

Summary

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

Motivation

Earlier work in this area in JDK 8, namely the alternative string-hashing implementation, improved collision performance for string-valued keys only, and it did so at the cost of adding a new (private) field to every String instance.

The changes proposed here will improve collision performance for any key type that implements Comparable. The alternative string-hashing mechanism, including the private hash32 field added to the String class, can then be removed.
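As an illustration of a key type that implements Comparable, the sketch below puts keys with a deliberately constant hashCode() into a java.util.HashMap and counts compareTo calls during one lookup. BadKey, build, and compareCalls are hypothetical names. On a Java 8 or later runtime the count is expected to stay small relative to the number of entries, but the exact figure depends on the JDK, so none is claimed here.

```java
import java.util.HashMap;
import java.util.Map;

public class ComparableCollisions {
    // Counts compareTo calls so the work done by a lookup can be observed.
    static int compareCalls = 0;

    // Hypothetical key type: every instance deliberately shares one hash code.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            compareCalls++;
            return Integer.compare(id, other.id);
        }
    }

    // Builds a map in which all n keys collide.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(10_000);
        compareCalls = 0;
        Integer found = map.get(new BadKey(9_999));
        System.out.println("found=" + found + ", compareTo calls during get=" + compareCalls);
    }
}
```

The map stays correct either way; only the cost of each lookup changes.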

Description

The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).
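The threshold idea can be sketched with a toy bucket that starts as a list and converts itself to a balanced tree once it grows too large. This is not the JDK code: AdaptiveBucket is a hypothetical class, java.util.TreeMap (a red-black tree) stands in for the balanced tree, and the cutoff of 8 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class AdaptiveBucket<K extends Comparable<K>, V> {
    // Assumed cutoff for this sketch; the bucket converts once it holds more entries.
    static final int TREEIFY_THRESHOLD = 8;

    private List<K> listKeys = new ArrayList<>();
    private List<V> listValues = new ArrayList<>();
    private TreeMap<K, V> tree; // non-null once the bucket has been converted

    public void put(K key, V value) {
        if (tree != null) {            // tree mode: O(log n) insert
            tree.put(key, value);
            return;
        }
        int i = listKeys.indexOf(key); // list mode: O(n) scan using equals()
        if (i >= 0) {
            listValues.set(i, value);
            return;
        }
        listKeys.add(key);
        listValues.add(value);
        if (listKeys.size() > TREEIFY_THRESHOLD) {
            treeify();
        }
    }

    public V get(K key) {
        if (tree != null) {
            return tree.get(key);      // O(log n) using compareTo()
        }
        int i = listKeys.indexOf(key); // O(n)
        return i >= 0 ? listValues.get(i) : null;
    }

    public boolean isTree() {
        return tree != null;
    }

    // Move every entry from the list into the tree and drop the list.
    private void treeify() {
        tree = new TreeMap<>();
        for (int i = 0; i < listKeys.size(); i++) {
            tree.put(listKeys.get(i), listValues.get(i));
        }
        listKeys = null;
        listValues = null;
    }
}
```

The sketch assumes compareTo is consistent with equals for the key type, since the list mode uses one and the tree mode the other.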

This technique has already been implemented in the latest version of the java.util.concurrent.ConcurrentHashMap class, which is also slated for inclusion in JDK 8 as part of JEP 155. Portions of that code will be re-used to implement the same idea in the HashMap and LinkedHashMap classes. Only the implementations will be changed; no interfaces or specifications will be modified. Some user-visible behaviors, such as iteration order, will change within the bounds of their current specifications.

We will not implement this technique in the legacy Hashtable class. That class has been part of the platform since Java 1.0, and some legacy code that uses it is known to depend upon iteration order. Hashtable will be reverted to its state prior to the introduction of the alternative string-hashing implementation, and will maintain its historical iteration order.

We also will not implement this technique in WeakHashMap. An attempt was made, but the complexity of having to account for weak keys resulted in an unacceptable drop in microbenchmark performance. WeakHashMap will also be reverted to its prior state.

There is no need to implement this technique in the IdentityHashMap class. It uses System.identityHashCode() to generate hash codes, so collisions are generally rare.

Testing

  • Run Map tests from Doug Lea's JSR 166 CVS workspace (includes a couple microbenchmarks)
  • Run performance tests of standard workloads
  • Possibly develop new microbenchmarks

Risks and Assumptions

This change will introduce some overhead for the addition and management of the balanced trees; we expect that overhead to be negligible.

This change will likely result in a change to the iteration order of the HashMap class. The HashMap specification explicitly makes no guarantee about iteration order. The iteration order of the LinkedHashMap class will be maintained.
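The difference in ordering guarantees can be seen in a short sketch. The class name IterationOrder, the helper keysInIterationOrder, and the sample keys are chosen for illustration only. No output is shown for the HashMap line because its order is unspecified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterationOrder {
    // Returns the keys in whatever order the map iterates them.
    public static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] keys = {"delta", "alpha", "charlie", "bravo"};
        Map<String, Integer> hashed = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            hashed.put(keys[i], i);
            linked.put(keys[i], i);
        }
        // HashMap: order is unspecified and may differ between JDK versions.
        System.out.println("HashMap:       " + keysInIterationOrder(hashed));
        // LinkedHashMap: iterates in insertion order.
        System.out.println("LinkedHashMap: " + keysInIterationOrder(linked));
    }
}
```

Code that needs a stable order should use LinkedHashMap (or a sorted map) rather than rely on whatever order HashMap happens to produce.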
