/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
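In a recent interview I was asked: what would be wrong with taking the key's memory address as a long and computing it modulo (%) the length of the table?
So I did some digging.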
first = tab[(n - 1) & hash]
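First, when HashMap computes a key's position in the table, it takes the bitwise AND of the hash and the table length minus 1. It does not use the modulo (%) operation.
If the table length is 8, then n = 8 (1000) and n-1 = 7 (111). Whatever the hash value is, the result of the AND is always between 000 and 111, that is 0-7, which is exactly the range of valid table indexes.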
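To make the comparison with % concrete, here is a minimal sketch. The class and method names (MaskVsMod, maskIndex, modIndex) are made up for this illustration and are not part of HashMap. It assumes n is a power of two, as it always is for HashMap's table. Under that assumption the mask gives the same result as Math.floorMod, while the plain % operator can return a negative value for a negative hash.

```java
class MaskVsMod {
    // Index computed the way HashMap does it; only valid when n is a power of two.
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    // Index computed as a non-negative remainder.
    static int modIndex(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int n = 8; // table length from the example above
        int[] hashes = {0, 7, 8, 13, 1_000_003, -1, Integer.MIN_VALUE};
        for (int h : hashes) {
            // h % n is printed too, to show that it can be negative and so is not a usable index.
            System.out.println(h + " -> mask " + maskIndex(h, n)
                    + ", floorMod " + modIndex(h, n)
                    + ", % " + (h % n));
        }
    }
}
```

The mask is a single AND instruction and never produces a negative index, which is one reason a power-of-two length is convenient here.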
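The comment says: Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide.
In other words: the table length is always a power of two, so if a set of hash values differ from each other only in the high bits above the mask (111....1111), they will always collide once they are ANDed with (n-1).
Put in one sentence: only the low bits of the hash, as many as there are 1 bits in (n-1), take part in choosing the bucket; the high bits have no effect at all.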
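A small sketch of that effect, with no bit spreading applied. The hash values below are made up for the demonstration: they share the low 4 bits and differ only above the mask. HighBitsIgnored and rawIndex are hypothetical names.

```java
class HighBitsIgnored {
    // Bucket index as in tab[(n - 1) & hash], applied to the raw hash value.
    static int rawIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // mask is 1111
        // Made-up hashes: identical low 4 bits (0x5), different bits above the mask.
        int[] hashes = {0x00000005, 0x00010005, 0x00020005, 0x7FFF0005};
        for (int h : hashes) {
            System.out.println(Integer.toHexString(h) + " -> bucket " + rawIndex(h, n));
        }
    }
}
```

Every one of these values lands in bucket 5, however different the high bits are.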
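The JDK authors' solution is (h = key.hashCode()) ^ (h >>> 16). The Javadoc says so right at the start: spreads (XORs) higher bits of hash to lower.
The influence of the high bits is propagated down to the low bits, so that both the high and the low bits take part in the AND with (n-1).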
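The Javadoc names Float keys holding consecutive whole numbers as a known bad case. The following sketch tries it out under some assumptions of mine: the table length of 256 and the keys 1.0f to 8.0f are arbitrary choices, and SpreadFloatKeys, spread and distinctBuckets are illustrative names. Only the XOR-shift expression comes from HashMap.hash().

```java
import java.util.HashSet;
import java.util.Set;

class SpreadFloatKeys {
    // Same transform as HashMap.hash(), applied to an already computed hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many distinct buckets the keys 1.0f..count occupy in a table of length n.
    static int distinctBuckets(int count, int n, boolean useSpread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 1; i <= count; i++) {
            int h = Float.valueOf((float) i).hashCode();
            int hash = useSpread ? spread(h) : h;
            buckets.add((n - 1) & hash);
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        int n = 256; // arbitrary power-of-two table length chosen for this demo
        System.out.println("without spreading: " + distinctBuckets(8, n, false) + " bucket(s)");
        System.out.println("with spreading:    " + distinctBuckets(8, n, true) + " bucket(s)");
    }
}
```

For small whole numbers the bit pattern of a float has all of its low 16 bits set to zero, so without spreading every key falls into bucket 0. With spreading the keys are distributed over several buckets; collisions are reduced, not eliminated.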
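As we all know, an int is 32 bits wide. hash >>> 16 shifts the value right by 16 bits and fills the left side with zeros. As a result the low 16 bits are discarded, the original high 16 bits become the new low 16 bits, and the new high 16 bits are all 0.
0 XOR 0 is 0 and 1 XOR 0 is 1, so a bit XORed with 0 is unchanged. The final result of hash ^ (hash >>> 16) is therefore: the high 16 bits stay as they were, and the low 16 bits become the XOR of the original low 16 bits and the original high 16 bits.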
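Both claims can be checked bit by bit. This is a minimal sketch; the sample value 0x12345678 is arbitrary and the helper names (SpreadBits, spread, bits) are mine.

```java
class SpreadBits {
    // Same expression as in HashMap.hash(), applied to an int.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Left-pads the binary form of v to 32 digits so the two halves line up.
    static String bits(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int h = 0x12345678; // arbitrary sample value
        int s = spread(h);
        System.out.println("h         = " + bits(h));
        System.out.println("h >>> 16  = " + bits(h >>> 16));
        System.out.println("spread(h) = " + bits(s));
        // The high half is untouched because it is XORed with the zeros shifted in.
        System.out.println("high half unchanged:  " + ((s >>> 16) == (h >>> 16)));
        // The low half is the XOR of the original low and high halves.
        System.out.println("low half = low ^ high: "
                + ((s & 0xFFFF) == ((h & 0xFFFF) ^ (h >>> 16))));
    }
}
```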
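If the binary representation of (n-1) is 16 bits wide, then n = 2^16 = 65536. So as long as the capacity of the HashMap is no larger than 65536, every bit used to choose the bucket comes from the low 16 bits, in which the high and low halves of the original hash have been mixed.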
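A quick sketch of what this means at the boundary, and what changes above it. The sample hash 0x00ABCDEF and the names below are assumptions made for the illustration. For n = 65536 the index is exactly the mixed low half; for a larger table such as n = 2^20, the index bits above bit 15 are taken straight from the original hash, since the high half is not changed by the transform.

```java
class IndexBitsByCapacity {
    // Same expression as in HashMap.hash(), applied to an int.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index as in tab[(n - 1) & hash].
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h = 0x00ABCDEF;  // arbitrary sample hashCode
        int small = 1 << 16; // 65536
        int large = 1 << 20;

        // n = 65536: the index equals (low 16 bits) ^ (high 16 bits).
        System.out.println(index(spread(h), small) == ((h & 0xFFFF) ^ (h >>> 16)));

        // n = 2^20: index bits 16..19 are the original bits 16..19 of h.
        System.out.println((index(spread(h), large) >>> 16) == ((h >>> 16) & 0xF));
    }
}
```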