Exploring the String Source Code

Date: 2020-08-02

Tags: string, source code


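Motivation: I suddenly wondered why, in a HashMap whose keys are strings, putting an equal key always overwrites the existing entry. So I looked at the String source code and found: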

private final char value[];

private int hash; // Default to 0

The hashCode method:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The equals method:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

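Clearly, both hashCode and equals work from the chars stored in the char[] array, so two strings with the same characters are equal and have the same hash. But why does hashCode use

h = 31 * h + val[i];

and why was the number 31 chosen? That caught my interest.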
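To make the recurrence concrete, here is a minimal sketch that recomputes the same polynomial by hand and compares it with the built-in String.hashCode(). The class name PolyHashDemo and the helper polyHash are made up for this illustration and are not part of the JDK.

```java
class PolyHashDemo {
    // Hypothetical helper: applies h = 31 * h + c to each char, starting from 0.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        // "abc": (97 * 31 + 98) * 31 + 99 = 96354
        System.out.println(polyHash("abc"));
        // The hand-written version agrees with the built-in one.
        System.out.println(polyHash("abc") == "abc".hashCode());
    }
}
```

Because the result depends only on the characters, two distinct String objects with the same content always produce the same hash.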

Below is an answer I found on Zhihu:

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
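Both claims in the quote can be checked with a small experiment. The sketch below is only an illustration: the class MultiplierDemo and its helpers are hypothetical names, and the helper reuses the polynomial from hashCode with a configurable multiplier.

```java
class MultiplierDemo {
    // Hypothetical helper: the same polynomial hash as String.hashCode,
    // but with a configurable multiplier.
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // 31 * i and (i << 5) - i agree for every int, even when they overflow,
    // because both are computed modulo 2^32.
    static boolean shiftIdentityHolds(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        System.out.println(shiftIdentityHolds(12345));
        System.out.println(shiftIdentityHolds(Integer.MAX_VALUE));

        // With the even multiplier 32, the first of 8 chars is multiplied by
        // 32^7 = 2^35, which is 0 in 32-bit arithmetic: it is shifted out.
        System.out.println(hash("Xabcdefg", 32) == hash("Yabcdefg", 32)); // true
        // With 31 the first char still influences the result.
        System.out.println(hash("Xabcdefg", 31) == hash("Yabcdefg", 31)); // false
    }
}
```

The multiplier 32 is used here as the simplest even example; any even multiplier gradually pushes the early characters out of the 32-bit result in the same way.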

As Goodrich and Tamassia point out, If you take over 50,000 English words (formed as the union of the word lists provided in two variants of Unix), using the constants 31, 33, 37, 39, and 41 will produce less than 7 collisions in each case. Knowing this, it should come as no surprise that many Java implementations choose one of these constants. Coincidentally, I was in the middle of reading the section "polynomial hash codes" when I saw this question.

But why can Java assign a string directly, as in s = "abcd"? Is it the same as operator overloading in C++?

The answer is no:

At the language level, Java does not support operator overloading; that much is certain.

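The "=", "+", and "+=" operators on String look like operator overloading, but they are not. Assigning a literal with "=" is an ordinary reference assignment, and for "+" and "+=" the Java compiler simply gives String special treatment.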

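For example:
String s = "a";
s += "b";
The compiler translates this into the following (javac before JDK 9; newer versions emit an invokedynamic call that produces the same result):
String s = "a";
s = (new StringBuilder()).append(s).append("b").toString();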
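The equivalence of the two forms can be checked directly. This is a minimal sketch with made-up names (ConcatDemo, withOperator, withBuilder); it compares only the results, not the bytecode the compiler actually emits.

```java
class ConcatDemo {
    // Uses the += operator, which the compiler handles specially for String.
    static String withOperator() {
        String s = "a";
        s += "b";
        return s;
    }

    // The hand-written StringBuilder form described in the text.
    static String withBuilder() {
        String s = "a";
        s = new StringBuilder().append(s).append("b").toString();
        return s;
    }

    public static void main(String[] args) {
        System.out.println(withOperator());                       // ab
        System.out.println(withOperator().equals(withBuilder())); // true
    }
}
```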

HashSet: inherited from AbstractSet

public int hashCode() {
    int h = 0;
    Iterator<E> i = iterator();
    while (i.hasNext()) {
        E obj = i.next();
        if (obj != null)
            h += obj.hashCode();
    }
    return h;
}
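Since the set's hash is just the sum of its elements' hashes, it does not depend on insertion order. The sketch below illustrates this; SetHashDemo and setOf are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetHashDemo {
    // Hypothetical helper: builds a HashSet from the given items, in order.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Set<String> ab = setOf("a", "b");
        Set<String> ba = setOf("b", "a");

        // "a".hashCode() is 97 and "b".hashCode() is 98, so the sum is 195.
        System.out.println(ab.hashCode());
        // Addition is commutative, so insertion order does not matter.
        System.out.println(ab.hashCode() == ba.hashCode());
    }
}
```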

Integer:

public int hashCode() {
    return hashCode(this.value);
}

public static int hashCode(int value) {
    return value;
}

Double:

public int hashCode() {
    return hashCode(this.value);
}

public static int hashCode(double value) {
    long bits = doubleToLongBits(value);
    return (int)(bits ^ (bits >>> 32));
}
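The two implementations can be tried out as follows. In this sketch, BoxedHashDemo and doubleHash are made-up names; doubleHash repeats the folding step by hand so it can be compared with the JDK's Double.hashCode(double).

```java
class BoxedHashDemo {
    // Hypothetical helper: XORs the high 32 bits of the IEEE 754 bit pattern
    // into the low 32 bits, then truncates to int.
    static int doubleHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // An Integer's hash is simply its value.
        System.out.println(Integer.hashCode(42)); // 42

        // 1.0 has the bit pattern 0x3FF0000000000000, which folds to
        // 0x3FF00000 = 1072693248.
        System.out.println(doubleHash(1.0));
        System.out.println(doubleHash(1.0) == Double.hashCode(1.0));
    }
}
```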

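>>: signed (arithmetic) right shift. The high bits are filled with 0 for a positive number and with 1 for a negative number.

>>>: unsigned (logical) right shift. The high bits are always filled with 0, whether the number is positive or negative.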
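The difference only shows up for negative numbers, as this small sketch illustrates (ShiftDemo and its two helpers are hypothetical names):

```java
class ShiftDemo {
    static int signedShift(int x, int n) {
        return x >> n;  // copies the sign bit into the vacated high bits
    }

    static int unsignedShift(int x, int n) {
        return x >>> n; // always fills the vacated high bits with 0
    }

    public static void main(String[] args) {
        System.out.println(signedShift(8, 1));    // 4
        System.out.println(unsignedShift(8, 1));  // 4
        System.out.println(signedShift(-8, 1));   // -4
        // -8 is 0xFFFFFFF8; shifting in a 0 gives 0x7FFFFFFC.
        System.out.println(unsignedShift(-8, 1)); // 2147483644
    }
}
```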

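Below are some notes on hashCode:

"Hash" means mapping an input of arbitrary length, through a hash algorithm, to an output of fixed length; that output is the hash value. A few key conclusions about hash values:

1. If the hash table contains a record equal to the original input K, then that record must be at storage location f(K).

2. Different keys may map to the same hash address after hashing; this is called a collision.

3. If two hash values differ (under the same hash algorithm), then the original inputs that produced them must differ.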
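A well-known collision for String.hashCode makes point 2 concrete. The sketch below is illustrative only; CollisionDemo and collide are made-up names.

```java
class CollisionDemo {
    // True when two different strings share the same hash value.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa": 65 * 31 + 97 = 2112, and "BB": 66 * 31 + 66 = 2112.
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println(collide("Aa", "BB")); // true
    }
}
```

The converse direction is point 3: whenever two hash values differ, the strings that produced them cannot be equal.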

HashCode

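Next, what HashCode is, summarized in a few key points:

1. HashCode exists mainly to make lookups fast; it is used to determine where an object is stored in a hash-based structure.

2. If two objects are equal according to equals, their HashCodes must also be the same.

3. If an object's equals method is overridden, its hashCode method should be overridden too, so that point 2 still holds.

4. If two objects have the same HashCode, that does not mean the objects are equal; it only means they are stored at the same location in the hash-based structure.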
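These points explain the original question about HashMap. The following sketch is a minimal illustration under assumed names (HashContractDemo, Point, and the three helper methods are all hypothetical): equal keys overwrite each other, while keys that merely share a hash do not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashContractDemo {
    // Hypothetical value class that overrides equals and hashCode together.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Two distinct String objects with equal content: the second put overwrites.
    static int valueAfterEqualStringKeys() {
        Map<String, Integer> map = new HashMap<>();
        map.put(new String("key"), 1);
        map.put(new String("key"), 2);
        return map.size() == 1 ? map.get("key") : -1;
    }

    // "Aa" and "BB" share a hash but are not equal, so both entries survive.
    static int sizeAfterCollidingKeys() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map.size();
    }

    // Equal Points hash alike, so the map treats them as the same key.
    static int sizeAfterEqualPoints() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "first");
        map.put(new Point(1, 2), "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(valueAfterEqualStringKeys()); // 2
        System.out.println(sizeAfterCollidingKeys());    // 2
        System.out.println(sizeAfterEqualPoints());      // 1
    }
}
```

If Point overrode equals but kept the default hashCode, the two equal points would usually land in different buckets and the map would keep both, which is why point 3 matters.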