优化技巧:提早if判断帮助CPU分支预测

摘要: 在stackoverflow上有一个很是有名的问题:为何处理有序数组要比非有序数组快?,可见分支预测对代码运行效率有很是大的影响。要提升代码执行效率,一个重要的原则就是尽可能避免CPU把流水线清空,那么提升分支预测的成功率就很是重要。java

分支预测

在stackoverflow上有一个很是有名的问题:为何处理有序数组要比非有序数组快?,可见分支预测对代码运行效率有很是大的影响。git

现代CPU都支持分支预测(branch prediction)和指令流水线(instruction pipeline),这两个结合能够极大提升CPU效率。对于像简单的if跳转,CPU是能够比较好地作分支预测的。可是对于switch跳转,CPU则没有太多的办法。switch本质上是据索引,从地址数组里取地址再跳转。github

要提升代码执行效率,一个重要的原则就是尽可能避免CPU把流水线清空,那么提升分支预测的成功率就很是重要。api

那么对于代码里,若是某个switch分支几率很高,是否能够考虑代码层面帮CPU把判断提早,来提升代码执行效率呢?数组

Dubbo里ChannelEventRunnable的switch判断

ChannelEventRunnable里有一个switch来判断channel state,而后作对应的逻辑:查看dom

一个channel创建起来以后,超过99.9%状况它的state都是ChannelState.RECEIVED,那么能够考虑把这个判断提早。性能

benchmark验证

下面经过jmh来验证下:spa

public class TestBenchMarks {
public enum ChannelState {
    CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT
}

@State(Scope.Benchmark)
public static class ExecutionPlan {
    @Param({ "1000000" })
    public int size;
    public ChannelState[] states = null;

    @Setup
    public void setUp() {
        ChannelState[] values = ChannelState.values();
        states = new ChannelState[size];
        Random random = new Random(new Date().getTime());
        for (int i = 0; i < size; i++) {
            int nextInt = random.nextInt(1000000);
            if (nextInt > 100) {
                states[i] = ChannelState.RECEIVED;
            } else {
                states[i] = values[nextInt % values.length];
            }
        }
    }
}

@Fork(value = 5)
@Benchmark
@BenchmarkMode(Mode.Throughput)
public void benchSiwtch(ExecutionPlan plan, Blackhole bh) {
    int result = 0;
    for (int i = 0; i < plan.size; ++i) {
        switch (plan.states[i]) {
        case CONNECTED:
            result += ChannelState.CONNECTED.ordinal();
            break;
        case DISCONNECTED:
            result += ChannelState.DISCONNECTED.ordinal();
            break;
        case SENT:
            result += ChannelState.SENT.ordinal();
            break;
        case RECEIVED:
            result += ChannelState.RECEIVED.ordinal();
            break;
        case CAUGHT:
            result += ChannelState.CAUGHT.ordinal();
            break;
        }
    }
    bh.consume(result);
}

@Fork(value = 5)
@Benchmark
@BenchmarkMode(Mode.Throughput)
public void benchIfAndSwitch(ExecutionPlan plan, Blackhole bh) {
    int result = 0;
    for (int i = 0; i < plan.size; ++i) {
        ChannelState state = plan.states[i];
        if (state == ChannelState.RECEIVED) {
            result += ChannelState.RECEIVED.ordinal();
        } else {
            switch (state) {
            case CONNECTED:
                result += ChannelState.CONNECTED.ordinal();
                break;
            case SENT:
                result += ChannelState.SENT.ordinal();
                break;
            case DISCONNECTED:
                result += ChannelState.DISCONNECTED.ordinal();
                break;
            case CAUGHT:
                result += ChannelState.CAUGHT.ordinal();
                break;
            }
        }
    }
    bh.consume(result);
}

}code

  • benchSiwtch里是纯switch判断
  • benchIfAndSwitch 里用一个if提早判断state是否ChannelState.RECEIVED

benchmark结果是:索引

Result "io.github.hengyunabc.jmh.TestBenchMarks.benchSiwtch":
  576.745 ±(99.9%) 6.806 ops/s [Average]
  (min, avg, max) = (490.348, 576.745, 618.360), stdev = 20.066
  CI (99.9%): 569.939, 583.550

Run complete. Total time: 00:06:48

Benchmark (size) Mode Cnt Score Error Units
TestBenchMarks.benchIfAndSwitch 1000000 thrpt 100 1535.867 ± 61.212 ops/s
TestBenchMarks.benchSiwtch 1000000 thrpt 100 576.745 ± 6.806 ops/s

能够看到提早if判断的确提升了代码效率,这种技巧能够放在性能要求严格的地方。
Benchmark代码:https://github.com/hengyunabc/jmh-demo

总结

  • switch对于CPU来讲难以作分支预测
  • 某些switch条件若是几率比较高,能够考虑单独提早if判断,充分利用CPU的分支预测机制

原文连接

相关文章
相关标签/搜索