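Java's primitive types (int, double, char) are not objects. But because a lot of Java code needs to work with Objects, Java provides a wrapper class for every primitive type (Integer, Double, Character). Thanks to autoboxing, you can write code like this:

    Character boxed = 'a';
    char unboxed = boxed;

The compiler automatically converts it to:

    Character boxed = Character.valueOf('a');
    char unboxed = boxed.charValue();

However, the Java virtual machine can't always see through this process, so avoiding unnecessary boxing is key to good performance. That is also why specialized types such as OptionalInt and IntStream exist. In this post, I'll outline one reason the JVM has a hard time eliminating autoboxing.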
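As a small illustration of the specialized types mentioned above, here is a minimal sketch that computes the same sum through a boxed Stream<Integer> and through IntStream. The class and method names (BoxedVsPrimitiveSum, boxedSum, primitiveSum) are made up for this example, and it only shows where boxing enters the picture; it makes no claim about how large the difference is on any particular JVM.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical example class, not part of the benchmark project.
class BoxedVsPrimitiveSum {
    // Sums 1..n through Stream<Integer>: every element is an Integer object.
    static int boxedSum(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .reduce(0, Integer::sum);
    }

    // Sums 1..n through IntStream: the elements stay primitive ints.
    static int primitiveSum(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    public static void main(String[] args) {
        System.out.println(boxedSum(100));     // 5050
        System.out.println(primitiveSum(100)); // 5050
    }
}
```

Both methods return the same value; the only difference is whether the intermediate elements are objects or primitives.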
As an example, say we want to compute the edit distance (Levenshtein distance) between any kind of data that can be viewed as a sequence:
    import java.util.List;
    import java.util.function.Function;

    public class Levenshtein<T, U> {
        private final Function<T, List<U>> asList;

        public Levenshtein(Function<T, List<U>> asList) {
            this.asList = asList;
        }

        public int distance(T a, T b) {
            // Wagner-Fischer algorithm, with two active rows

            List<U> aList = asList.apply(a);
            List<U> bList = asList.apply(b);

            int bSize = bList.size();
            int[] row0 = new int[bSize + 1];
            int[] row1 = new int[bSize + 1];

            for (int i = 0; i <= bSize; ++i) {
                row0[i] = i;
            }

            // One pass per element of a; the two sizes may differ
            for (int i = 0; i < aList.size(); ++i) {
                U ua = aList.get(i);
                row1[0] = row0[0] + 1;

                for (int j = 0; j < bSize; ++j) {
                    U ub = bList.get(j);
                    int subCost = row0[j] + (ua.equals(ub) ? 0 : 1);
                    int delCost = row0[j + 1] + 1;
                    int insCost = row1[j] + 1;
                    row1[j + 1] = Math.min(subCost, Math.min(delCost, insCost));
                }

                int[] temp = row0;
                row0 = row1;
                row1 = temp;
            }

            return row0[bSize];
        }
    }
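As long as two objects can be viewed as a List<U>, this class can compute the edit distance between them. To compute the distance between Strings, we need a way to view a String as a List:

    import java.util.AbstractList;

    public class StringAsList extends AbstractList<Character> {
        private final String str;

        public StringAsList(String str) {
            this.str = str;
        }

        @Override
        public Character get(int index) {
            return str.charAt(index); // Autoboxing!
        }

        @Override
        public int size() {
            return str.length();
        }
    }

    ...

    Levenshtein<String, Character> lev = new Levenshtein<>(StringAsList::new);
    lev.distance("autoboxing is fast", "autoboxing is slow"); // 4

Because of the way Java generics are implemented, there can be no List<char>, so we have to settle for List<Character> and boxing. (Note: this restriction may be lifted in Java 10.)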
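For comparison, here is a minimal sketch of what a primitive-only version of the same two-row algorithm could look like if we gave up the generic interface and worked on CharSequence directly. The class name CharLevenshtein is hypothetical and this is not the approach the article takes; it only illustrates that boxing comes from the generic List<Character> view, not from the algorithm itself.

```java
// Hypothetical, non-generic variant: reads chars directly, so nothing is boxed.
class CharLevenshtein {
    static int distance(CharSequence a, CharSequence b) {
        int aSize = a.length();
        int bSize = b.length();
        int[] row0 = new int[bSize + 1];
        int[] row1 = new int[bSize + 1];

        // Distance from the empty prefix of a to each prefix of b
        for (int j = 0; j <= bSize; ++j) {
            row0[j] = j;
        }

        for (int i = 0; i < aSize; ++i) {
            char ca = a.charAt(i);
            row1[0] = row0[0] + 1;

            for (int j = 0; j < bSize; ++j) {
                // Primitive comparison instead of Character.equals()
                int subCost = row0[j] + (ca == b.charAt(j) ? 0 : 1);
                int delCost = row0[j + 1] + 1;
                int insCost = row1[j] + 1;
                row1[j + 1] = Math.min(subCost, Math.min(delCost, insCost));
            }

            int[] temp = row0;
            row0 = row1;
            row1 = temp;
        }

        return row0[bSize];
    }

    public static void main(String[] args) {
        System.out.println(distance("autoboxing is fast", "autoboxing is slow")); // 4
    }
}
```

The trade-off is that this version only works for character sequences, whereas the generic class handles anything that can be viewed as a List.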
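To test the performance of the distance() method, we need a benchmark. Microbenchmarks are hard to get right in Java, but fortunately OpenJDK provides JMH (the Java Microbenchmark Harness), which takes care of most of the hard problems for us. If you're interested, I recommend reading the documentation and the samples; they're fascinating. Here is the benchmark:

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    public class MyBenchmark {
        private Levenshtein<String, Character> lev = new Levenshtein<>(StringAsList::new);

        @Benchmark
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.NANOSECONDS)
        public int timeLevenshtein() {
            return lev.distance("autoboxing is fast", "autoboxing is slow");
        }
    }

(The method returns its result so that JMH can make the value look used, which keeps dead code elimination from distorting the measurement.)

Here are the results: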
    $ java -jar target/benchmarks.jar -f 1 -wi 8 -i 8
    # JMH 1.10.2 (released 3 days ago)
    # VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
    # VM options:
    # Warmup: 8 iterations, 1 s each
    # Measurement: 8 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: com.tavianator.boxperf.MyBenchmark.timeLevenshtein

    # Run progress: 0.00% complete, ETA 00:00:16
    # Fork: 1 of 1
    # Warmup Iteration   1: 1517.495 ns/op
    # Warmup Iteration   2: 1503.096 ns/op
    # Warmup Iteration   3: 1402.069 ns/op
    # Warmup Iteration   4: 1480.584 ns/op
    # Warmup Iteration   5: 1385.345 ns/op
    # Warmup Iteration   6: 1474.657 ns/op
    # Warmup Iteration   7: 1436.749 ns/op
    # Warmup Iteration   8: 1463.526 ns/op
    Iteration   1: 1446.033 ns/op
    Iteration   2: 1420.199 ns/op
    Iteration   3: 1383.017 ns/op
    Iteration   4: 1443.775 ns/op
    Iteration   5: 1393.142 ns/op
    Iteration   6: 1393.313 ns/op
    Iteration   7: 1459.974 ns/op
    Iteration   8: 1456.233 ns/op

    Result "timeLevenshtein":
      1424.461 ±(99.9%) 59.574 ns/op [Average]
      (min, avg, max) = (1383.017, 1424.461, 1459.974), stdev = 31.158
      CI (99.9%): [1364.887, 1484.034] (assumes normal distribution)

    # Run complete. Total time: 00:00:16

    Benchmark                    Mode  Cnt     Score    Error  Units
    MyBenchmark.timeLevenshtein  avgt    8  1424.461 ± 59.574  ns/op
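To see what is happening on the hot path, JMH integrates with the Linux tool perf, which lets us look at the JIT-compiled assembly of the hottest code regions. (To view the assembly you need the hsdis plugin. I provide it on the AUR, so Arch users can install it directly.) Adding -prof perfasm to the JMH command line shows the result: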
    $ java -jar target/benchmarks.jar -f 1 -wi 8 -i 8 -prof perfasm
    ...
    cmp    $0x7f,%eax
    jg     0x00007fde989a6148  ;*if_icmpgt
                               ; - java.lang.Character::valueOf@3 (line 4570)
                               ; - com.tavianator.boxperf.StringAsList::get@8 (line 14)
                               ; - com.tavianator.boxperf.StringAsList::get@2 (line 5)
                               ; - com.tavianator.boxperf.Levenshtein::distance@121 (line 32)
    cmp    $0x80,%eax
    jae    0x00007fde989a6103  ;*aaload
                               ; - java.lang.Character::valueOf@10 (line 4571)
                               ; - com.tavianator.boxperf.StringAsList::get@8 (line 14)
                               ; - com.tavianator.boxperf.StringAsList::get@2 (line 5)
                               ; - com.tavianator.boxperf.Levenshtein::distance@121 (line 32)
    ...
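There is a lot of output, but the excerpt above is enough to show that the boxing has not been optimized away. Why the comparisons against 0x7f and 0x80? The answer lies in the source of Character.valueOf():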
    private static class CharacterCache {
        private CharacterCache(){}

        static final Character cache[] = new Character[127 + 1];

        static {
            for (int i = 0; i < cache.length; i++)
                cache[i] = new Character((char)i);
        }
    }

    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int)c];
        }
        return new Character(c);
    }
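As you can see, the Java Language Specification requires the Character objects for char values 0 through 127 to be kept in a cache, and when the argument of Character.valueOf() falls in that range, the cached object is returned directly. The comparison against 0x7f is the c <= 127 test, and the comparison against 0x80 is the bounds check on the 128-element cache array. The goal is to reduce allocations and garbage collection, but in my opinion this is a premature optimization. It also gets in the way of other optimizations. The JVM cannot determine that Character.valueOf(c).charValue() == c, because it does not know the contents of the cache. So it fetches a Character object from the cache and reads its value back, only to end up with exactly the same value as c.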
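The caching behaviour can be observed from plain Java. The sketch below uses a hypothetical CharacterCacheDemo class with two helpers of my own naming: sameInstance() compares references to show that two valueOf() calls in the cached range return the same object, and roundTrips() checks the identity Character.valueOf(c).charValue() == c that the JIT cannot prove for itself. For values above 127 the API documentation only says valueOf() "may" cache, so no expected output is given for that case.

```java
// Hypothetical demo class; only standard java.lang.Character calls are used.
class CharacterCacheDemo {
    // True if two valueOf() calls for c return the very same object.
    static boolean sameInstance(char c) {
        Character first = Character.valueOf(c);
        Character second = Character.valueOf(c);
        return first == second; // reference comparison, deliberately not equals()
    }

    // The identity that always holds, but that the JIT cannot see
    // through the cache array lookup.
    static boolean roundTrips(char c) {
        return Character.valueOf(c).charValue() == c;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance('a'));      // true: 'a' is in the mandatory cache range
        System.out.println(sameInstance('\u4e2d')); // implementation-dependent: outside 0..127
        System.out.println(roundTrips('\u4e2d'));   // true
    }
}
```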
The fix is simple:

    @@ -11,7 +11,7 @@ public class StringAsList extends AbstractList<Character> {
     
         @Override
         public Character get(int index) {
    -        return str.charAt(index); // Autoboxing!
    +        return new Character(str.charAt(index));
         }
     
         @Override

Replacing autoboxing with explicit boxing avoids the call to Character.valueOf(), which leaves code that the JVM can easily understand:
    private final char value;

    public Character(char value) {
        this.value = value;
    }

    public char charValue() {
        return value;
    }
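Even though we have added an allocation to the code, the JVM sees right through it and simply reads the char directly out of the String. The performance improvement is noticeable: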
    $ java -jar target/benchmarks.jar -f 1 -wi 8 -i 8
    ...
    # Run complete. Total time: 00:00:16

    Benchmark                    Mode  Cnt     Score    Error  Units
    MyBenchmark.timeLevenshtein  avgt    8  1221.151 ± 58.878  ns/op
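That is a 14% improvement (1424.461 ns/op down to 1221.151 ns/op). Running with -prof perfasm confirms that the improved version takes the char values straight from the String and compares them in registers: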
    movzwl 0x10(%rsi,%rdx,2),%r11d  ;*caload
                                    ; - java.lang.String::charAt@27 (line 648)
                                    ; - com.tavianator.boxperf.StringAsList::get@9 (line 14)
                                    ; - com.tavianator.boxperf.StringAsList::get@2 (line 5)
                                    ; - com.tavianator.boxperf.Levenshtein::distance@121 (line 32)
    cmp    %r11d,%r10d
    je     0x00007faa8d404792       ;*if_icmpne
                                    ; - java.lang.Character::equals@18 (line 4621)
                                    ; - com.tavianator.boxperf.Levenshtein::distance@137 (line 33)
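Boxing is a weak spot for HotSpot, and I hope it keeps getting better. It should take more advantage of the semantics of the boxed types to eliminate boxing, so that workarounds like the one above become unnecessary.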