Why does the Alibaba Java Development Manual require integer wrapper objects to be compared with equals?

Date: 2019-11-10


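While reading the Alibaba Java Development Manual, I came across a rule on comparing the values of integer wrapper objects: all such comparisons must use the equals method, because objects created by Integer var = ? with values between -128 and 127 are reused from a cache.

This rule deserves attention, and the question behind it comes up frequently in Java interviews.

It also raises a few further questions:

Without reading the Alibaba Java Development Manual, how would you find out that Integer var = ? caches assignments between -128 and 127?
Why is this particular range cached?
How can you study and analyze similar problems?

Analyzing the Integer cache

Start with the sample code below and think about what it prints: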

public class IntegerTest {
    public static void main(String[] args) {
        Integer a = 100, b = 100, c = 666, d = 666;
        System.out.println(a == b);
        System.out.println(c == d);
    }
}

Running the code gives the answer: the program prints true and then false.
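For contrast, here is a minimal sketch of the comparison the manual asks for. The class name IntegerEqualsDemo and the helper sameValue are made up for illustration; only Integer.equals and Integer.intValue come from the JDK.

```java
public class IntegerEqualsDemo {
    // Compares two boxed integers by value rather than by reference.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        Integer a = 100, b = 100, c = 666, d = 666;
        System.out.println(sameValue(a, b));                 // true
        System.out.println(sameValue(c, d));                 // true
        // Unboxing first also compares values, not references.
        System.out.println(c.intValue() == d.intValue());    // true
    }
}
```

With equals, the result no longer depends on whether the value happens to fall inside the cached range.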

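Why is that the answer?

Going by the description in the Alibaba Java Development Manual, many people would answer "because values between -128 and 127 are cached" and stop there.

But why is this range cached? Can the range be changed? Do the other wrapper types have similar caches?

Let's work through these questions.

The source code approach

First, we can analyze the problem through the source code.

A variable declared as Integer var = ? constructs its Integer object through java.lang.Integer#valueOf(int).

How do we know that valueOf() is called?

One way is to set a breakpoint in that method: when the program runs, execution stops there.

Here is the source of java.lang.Integer#valueOf(int):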

/**
 * Returns an {@code Integer} instance representing the specified
 * {@code int} value.  If a new {@code Integer} instance is not
 * required, this method should generally be used in preference to
 * the constructor {@link #Integer(int)}, as this method is likely
 * to yield significantly better space and time performance by
 * caching frequently requested values.
 *
 * This method will always cache values in the range -128 to 127,
 * inclusive, and may cache other values outside of this range.
 *
 * @param  i an {@code int} value.
 * @return an {@code Integer} instance representing {@code i}.
 * @since  1.5
 */
public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}

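The source shows that when Integer.valueOf(int) creates an integer object and the argument is greater than or equal to the cache minimum (IntegerCache.low) and less than or equal to the cache maximum (IntegerCache.high), the object is taken directly from the cache array (java.lang.Integer.IntegerCache#cache); otherwise a new Integer is created. JDK 9 deprecated the Integer constructors outright and recommends valueOf() instead, so that the cache is used and performance improves.

So what are the minimum and maximum of the cache?

According to the Javadoc above, the range that is always cached runs from -128 to 127.

Why cache this range of integer objects?

The Javadoc explains: when a new Integer instance is not required, caching frequently requested values (the objects in the cache range are constructed ahead of time) is likely to save both space and time.

This leads to an important takeaway:

To reduce memory usage and improve performance, cache frequently used objects ahead of time and fetch them from the cache when needed.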
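The same idea can be sketched for an ordinary class. This is only an illustration: the Celsius class, its LOW and HIGH bounds and the chosen range are hypothetical and do not come from the JDK; the factory method simply mirrors the shape of Integer.valueOf(int).

```java
public final class Celsius {
    // Hypothetical range of "frequently requested" values.
    private static final int LOW = -50;
    private static final int HIGH = 50;
    private static final Celsius[] CACHE = new Celsius[HIGH - LOW + 1];

    static {
        // Build every cached instance once, when the class is initialized.
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new Celsius(LOW + i);
        }
    }

    private final int degrees;

    private Celsius(int degrees) {
        this.degrees = degrees;
    }

    // Returns a shared instance inside the range, a fresh one outside it.
    public static Celsius valueOf(int degrees) {
        if (degrees >= LOW && degrees <= HIGH) {
            return CACHE[degrees - LOW];
        }
        return new Celsius(degrees);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Celsius && ((Celsius) other).degrees == degrees;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(degrees);
    }

    public static void main(String[] args) {
        System.out.println(Celsius.valueOf(20) == Celsius.valueOf(20));      // true
        System.out.println(Celsius.valueOf(99) == Celsius.valueOf(99));      // false
        System.out.println(Celsius.valueOf(99).equals(Celsius.valueOf(99))); // true
    }
}
```

As with Integer, callers of such a class should still compare with equals, since identity only holds inside the cached range.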

That brings up the next question: can the Integer cache range be changed?

The source and Javadoc above cannot answer this, so let's look at the source of java.lang.Integer.IntegerCache:

/**
 * Cache to support the object identity semantics of autoboxing for values between
 * -128 and 127 (inclusive) as required by JLS.
 *
 * The cache is initialized on first usage.  The size of the cache
 * may be controlled by the {@code -XX:AutoBoxCacheMax=<size>} option.
 * During VM initialization, java.lang.Integer.IntegerCache.high property
 * may be set and saved in the private system properties in the
 * sun.misc.VM class.
 */

private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];
    static {
            // high value may be configured by property
            int h = 127;
            String integerCacheHighPropValue =
                sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
           // other code omitted
    }
      // other code omitted
}

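The IntegerCache code and comments show that the minimum is fixed at -128, while the maximum is not fixed: it can be set with the JVM option -XX:AutoBoxCacheMax=<size> or with -Djava.lang.Integer.IntegerCache.high=<value>, and it defaults to 127 when neither is given.

So either option can raise the cache maximum to 666 or more.

With such a change, the sample prints true and true.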
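One way to observe the effect of the option is a small probe like the one below. The class name CacheProbe and the helper isCached are made up for this sketch; it relies only on Integer.valueOf.

```java
public class CacheProbe {
    // True when two independent valueOf calls hand back the same object,
    // which is what happens for values served from the cache.
    static boolean isCached(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println("-128 cached: " + isCached(-128)); // always cached
        System.out.println("127 cached:  " + isCached(127));  // always cached
        // The next result depends on the JVM options used to start the program.
        System.out.println("666 cached:  " + isCached(666));
    }
}
```

Assuming a HotSpot JVM, running it once as java CacheProbe and once as java -XX:AutoBoxCacheMax=666 CacheProbe should show the last line change.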

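At this point, has your understanding of the problem shifted from where it started?

The comment also explains why this range is cached:

It lets autoboxing reuse these objects, which is also a requirement of the JLS.

See the Boxing Conversion section of the JLS: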

If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.

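In other words: when an int value between -128 and 127 (inclusive), the boolean value true or false, or a char value between '\u0000' and '\u007f' (inclusive) is boxed into two objects a and b, a == b always holds, so == does tell you whether their values are equal.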
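The guarantee can be tried out for all three kinds of value. The class and method names below are invented for this sketch; each helper boxes the same primitive twice and compares the references.

```java
public class BoxingIdentityDemo {
    static boolean sameBoxedInt(int p) {
        Integer a = p, b = p;      // two boxing conversions of p
        return a == b;
    }

    static boolean sameBoxedChar(char p) {
        Character a = p, b = p;
        return a == b;
    }

    static boolean sameBoxedBoolean(boolean p) {
        Boolean a = p, b = p;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInt(127));          // true
        System.out.println(sameBoxedChar('a'));         // true, 'a' is below U+007F
        System.out.println(sameBoxedBoolean(false));    // true
        // Outside the guaranteed range the result is not specified,
        // so no expected output is given here.
        System.out.println(sameBoxedInt(128));
    }
}
```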

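The decompilation approach

Is a variable declared as Integer var = ? really constructed through java.lang.Integer#valueOf(int)? We can hardly guess at N candidate methods and then check each one with a breakpoint.

And when a similar problem comes up and nobody tells us which method is called under the hood, what then?

Questions like this can be answered by decompiling the compiled class file.

First compile the source: javac IntegerTest.java

Then disassemble the class: javap -c IntegerTest

To learn how javap is used, run javap -help for a usage summary (many command-line tools support -help or --help).

Decompiling produces the following: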

Compiled from "IntegerTest.java"
public class com.wupx.demo.IntegerTest {
  public com.wupx.demo.IntegerTest();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: bipush        100
       2: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       5: astore_1
       6: bipush        100
       8: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      11: astore_2
      12: sipush        666
      15: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      18: astore_3
      19: sipush        666
      22: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      25: astore        4
      27: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      30: aload_1
      31: aload_2
      32: if_acmpne     39
      35: iconst_1
      36: goto          40
      39: iconst_0
      40: invokevirtual #4                  // Method java/io/PrintStream.println:(Z)V
      43: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      46: aload_3
      47: aload         4
      49: if_acmpne     56
      52: iconst_1
      53: goto          57
      56: iconst_0
      57: invokevirtual #4                  // Method java/io/PrintStream.println:(Z)V
      60: return
}

This makes it plain to see that all four variables declared as Integer var = ? are indeed constructed through java.lang.Integer#valueOf(int).
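Written out by hand, the boxing that the compiler inserts looks like the explicit call below. The class and method names are hypothetical; the point is only that the implicit and explicit forms go through the same factory method and therefore return the same cached object for small values.

```java
public class ExplicitBoxing {
    // What the programmer writes: the compiler inserts the boxing call.
    static Integer boxedImplicitly(int value) {
        Integer boxed = value;
        return boxed;
    }

    // What the bytecode does: an explicit call to Integer.valueOf(int).
    static Integer boxedExplicitly(int value) {
        return Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // Both paths return the same cached object for 100.
        System.out.println(boxedImplicitly(100) == boxedExplicitly(100));      // true
        // For 666 the references may differ, but the values are equal.
        System.out.println(boxedImplicitly(666).equals(boxedExplicitly(666))); // true
    }
}
```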

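Next comes a detailed walk-through of the compiled code; feel free to skip it if it is hard to follow.

The reading below follows Chapter 6, on the Java Virtual Machine instruction set, of The Java Virtual Machine Specification, Java SE 8 Edition (JVMS from here on), together with Appendix B, the table of virtual machine bytecode instructions, of 《深入理解 Java 虚拟机》:

The instruction at offset 0 is bipush 100, which pushes the single-byte integer constant 100 onto the top of the operand stack.

The instruction at offset 2 is invokestatic #2 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;, which invokes a static method, namely java.lang.Integer#valueOf(int).

The instruction at offset 5 is astore_1, which pops an object reference from the operand stack and stores it in local variable slot 1.

The instructions at offsets 6 to 25 work the same way.

The instruction at offset 30 is aload_1, which loads the object reference in local variable slot 1 (that is, a) and pushes it onto the stack.

The instruction at offset 31 is aload_2, which loads the object reference in local variable slot 2 (that is, b) and pushes it onto the stack.

The instruction at offset 32 is if_acmpne, a conditional branch; the a after if_ means that it compares object references.

The two reference-comparison instructions behave as follows:

if_acmpeq compares the two reference values on top of the stack and jumps if they are equal
if_acmpne compares the two reference values on top of the stack and jumps if they are not equal

Because of the Integer cache, a and b refer to the same object, so the condition does not hold (if it did, execution would jump to offset 39) and execution continues at offset 35.

The instruction at offset 35 is iconst_1, which pushes the constant 1 onto the stack (in the Java Virtual Machine, boolean values are computed as int, with true represented by 1; see JVMS 2.11.1, Types and the Java Virtual Machine).

The goto at offset 36 then jumps to offset 40.

The instruction at offset 40 is invokevirtual #4 // Method java/io/PrintStream.println:(Z)V.

Its parameter descriptor is Z and its return descriptor is V.

According to JVMS 4.3.2, Field Descriptors, the FieldType character Z denotes the boolean type, whose value is true or false.
According to JVMS 4.3.3, Method Descriptors, the return descriptor V means the method returns void.

So the call resolves to java.io.PrintStream#println(boolean), which prints the constant on top of the stack, that is, true.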
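The descriptor strings seen in the javap output can also be produced from Java itself, which is a handy way to check a reading of them. The sketch below uses java.lang.invoke.MethodType from the JDK; the class name DescriptorDemo and the helper descriptor are invented for illustration.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Builds the JVM method descriptor for a return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // PrintStream.println(boolean)
        System.out.println(descriptor(void.class, boolean.class));  // (Z)V
        // Integer.valueOf(int)
        System.out.println(descriptor(Integer.class, int.class));   // (I)Ljava/lang/Integer;
        // Long.valueOf(long)
        System.out.println(descriptor(Long.class, long.class));     // (J)Ljava/lang/Long;
    }
}
```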

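Execution then runs the instructions at offsets 43 to 57, which compare c and d and print false.

Finally the instruction at offset 60, return, ends the program.

Some readers may find decompiled code off-putting or intimidating, which is perfectly normal.

When analyzing a problem, it is enough to follow the core logic; don't get caught up in the details and lose sight of the point.

Practice makes it familiar: after working through more examples, reaching for javap to analyze a similar problem becomes second nature.

To go deeper into Java decompilation, studying alongside the official JVMS, or its Chinese edition 《Java 虚拟机规范》, is strongly recommended.

Analyzing the Long cache

One goal of learning is to carry what you learn over to related cases, so let's study Long in the same way and see how the two compare.

Source code analysis

Similarly, here is the source of java.lang.Long#valueOf(long):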

/**
 * Returns a {@code Long} instance representing the specified
 * {@code long} value.
 * If a new {@code Long} instance is not required, this method
 * should generally be used in preference to the constructor
 * {@link #Long(long)}, as this method is likely to yield
 * significantly better space and time performance by caching
 * frequently requested values.
 *
 * Note that unlike the {@linkplain Integer#valueOf(int)
 * corresponding method} in the {@code Integer} class, this method
 * is <em>not</em> required to cache values within a particular
 * range.
 *
 * @param  l a long value.
 * @return a {@code Long} instance representing {@code l}.
 * @since  1.5
 */
public static Long valueOf(long l) {
    final int offset = 128;
    if (l >= -128 && l <= 127) { // will cache
        return LongCache.cache[(int)l + offset];
    }
    return new Long(l);
}

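This method is written much like Integer.valueOf(int).

Long uses a cache as well: when Long.valueOf(long) constructs a Long object, values in [-128, 127] are taken directly from the cache array.

The Javadoc likewise says that the purpose of caching is better performance.

But the Javadoc also carries this note: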

Note that unlike the {@linkplain Integer#valueOf(int) corresponding method} in the {@code Integer} class, this method is not required to cache values within a particular range.

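That is, unlike Integer.valueOf(int), this method is not required to cache values within any particular range.

This is also why the range check in the source above is commented // will cache (compare it with the comments on the Integer cache earlier).

So although a cache is used here, it is presumably not something the JLS requires.

How is the Long cache built?

Here is how the cache array is constructed: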

private static class LongCache {
    private LongCache(){}

    static final Long cache[] = new Long[-(-128) + 127 + 1];

    static {
        for(int i = 0; i < cache.length; i++)
            cache[i] = new Long(i - 128);
    }
}

As the code shows, the cache array is filled in a static initializer block.

Decompilation

Again we write a small sample:

public class LongTest {

    public static void main(String[] args) {
        Long a = -128L, b = -128L, c = 666L, d = 666L;
        System.out.println(a == b);
        System.out.println(c == d);
    }
}

Compile the source: javac LongTest.java

Disassemble the compiled class file: javap -c LongTest

This produces the following:

Compiled from "LongTest.java"
public class com.wupx.demo.LongTest {
  public com.wupx.demo.LongTest();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc2_w        #2                  // long -128l
       3: invokestatic  #4                  // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
       6: astore_1
       7: ldc2_w        #2                  // long -128l
      10: invokestatic  #4                  // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
      13: astore_2
      14: ldc2_w        #5                  // long 666l
      17: invokestatic  #4                  // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
      20: astore_3
      21: ldc2_w        #5                  // long 666l
      24: invokestatic  #4                  // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
      27: astore        4
      29: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
      32: aload_1
      33: aload_2
      34: if_acmpne     41
      37: iconst_1
      38: goto          42
      41: iconst_0
      42: invokevirtual #8                  // Method java/io/PrintStream.println:(Z)V
      45: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
      48: aload_3
      49: aload         4
      51: if_acmpne     58
      54: iconst_1
      55: goto          59
      58: iconst_0
      59: invokevirtual #8                  // Method java/io/PrintStream.println:(Z)V
      62: return
}

The output confirms that Long var = ? is indeed constructed through java.lang.Long#valueOf(long).
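Because each literal is boxed by the valueOf of its own wrapper class, comparing a Long with equals has a pitfall of its own: an int literal boxes to Integer, and Long.equals returns false for any argument that is not a Long. A small sketch, with made-up class and method names:

```java
public class LongEqualsDemo {
    // 666L is boxed by Long.valueOf(long), so the types match.
    static boolean equalsLongLiteral(Long value) {
        return value.equals(666L);
    }

    // 666 is boxed by Integer.valueOf(int); Long.equals rejects an Integer.
    static boolean equalsIntLiteral(Long value) {
        return value.equals(666);
    }

    public static void main(String[] args) {
        Long c = 666L;
        System.out.println(equalsLongLiteral(c)); // true
        System.out.println(equalsIntLiteral(c));  // false
    }
}
```

So when following the equals rule, the literal or variable on the other side should have the same wrapper type.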

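In fact, every wrapper type except Float and Double uses a cache. For these six wrapper classes, a direct assignment calls the static factory method valueOf() of the corresponding class.

The cache ranges of the wrapper classes are as follows:

Boolean: defined as static final constants; valueOf() simply returns one of the two
Byte: value range -128 to 127, all cached
Short: value range -32768 to 32767, cache range -128 to 127
Character: value range 0 to 65535, cache range 0 to 127
Long: value range -2^63 to 2^63-1, cache range -128 to 127
Integer: value range -2^31 to 2^31-1, cache range -128 to 127; it is the only wrapper class whose cache range can be changed: adding -XX:AutoBoxCacheMax=6666 to the VM options sets the largest cached value to 6666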
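The ranges in this list can be checked with a few identity comparisons. This is a sketch with invented class and method names; the Long result reflects the LongCache implementation discussed earlier rather than a requirement stated in the JDK 8 Javadoc.

```java
public class WrapperCacheDemo {
    static boolean byteCached(byte v) {
        return Byte.valueOf(v) == Byte.valueOf(v);
    }

    static boolean shortCached(short v) {
        return Short.valueOf(v) == Short.valueOf(v);
    }

    static boolean charCached(char v) {
        return Character.valueOf(v) == Character.valueOf(v);
    }

    static boolean longCached(long v) {
        return Long.valueOf(v) == Long.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE); // true
        System.out.println(byteCached((byte) 127));                // true
        System.out.println(shortCached((short) 127));              // true
        System.out.println(charCached((char) 127));                // true
        System.out.println(longCached(127L));                      // true with the LongCache shown earlier
        // Outside the cache range the result is implementation dependent.
        System.out.println(shortCached((short) 128));
    }
}
```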

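As for choosing between wrapper classes and primitive types, the recommendation is:

All POJO class fields must use wrapper types
Return values and parameters of RPC methods must use wrapper types
All local variables should use primitive types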
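A wrapper-typed POJO field can be null, so comparing two such fields needs to be null-safe as well as value-based. The sketch below assumes a hypothetical Order POJO with an Integer quantity field, where null stands for "not provided" (that interpretation is an assumption, not something the list above states); java.util.Objects.equals is the JDK method that handles both concerns.

```java
import java.util.Objects;

public class OrderDemo {
    // Hypothetical POJO: the field uses the wrapper type, so it may be null.
    static class Order {
        private Integer quantity;

        Integer getQuantity() {
            return quantity;
        }

        void setQuantity(Integer quantity) {
            this.quantity = quantity;
        }
    }

    // Value comparison that tolerates null on either side.
    static boolean sameQuantity(Order x, Order y) {
        return Objects.equals(x.getQuantity(), y.getQuantity());
    }

    public static void main(String[] args) {
        Order first = new Order();
        Order second = new Order();
        first.setQuantity(666);
        second.setQuantity(666);
        System.out.println(sameQuantity(first, second)); // true

        second.setQuantity(null);
        System.out.println(sameQuantity(first, second)); // false
    }
}
```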

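Summary

This article briefly introduced the Alibaba Java Development Manual rule that the values of integer wrapper objects must be compared with equals, and then analyzed the purpose and implementation of the Integer and Long caches in depth by reading the source, consulting the JLS and JVMS, and decompiling.

The aim is to give readers a wider range of techniques for learning and for analyzing problems, and, by thinking about why the cache exists, to take away something more general and fundamental.

It also covered the cache ranges of the other wrapper types and the recommended uses of wrapper classes versus primitive types.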

References
《Java开发手册》华山版 (Alibaba Java Development Manual, Huashan edition)

《码出高效：Java开发手册》

《深入理解Java虚拟机》