Cao Gong Talks Spring Boot Source Code (26): Learning Bytecode Was So Painful That I Wrote a Tiny Bytecode Execution Engine

Date: 2020-06-14


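Foreword

Overview

The last two or three installments touched on ASM. The hard part of ASM has never been ASM itself, though; it is reading bytecode. Take this simple class as an example: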

public class CheckAndSet {
    private int f;

    public void checkAndSetF(int f) {
        if (f >= 0) {
            this.f = f;
        } else {
            throw new IllegalArgumentException();
        }
    }

    public boolean checkAndSetF1(int f) {
        boolean a = true;
        boolean b = f >= 0;
        return b;
    }
}

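Suppose we want to generate this class with ASM. How would we write it? The ASM ByteCode Outline plugin mentioned in the previous installment helps, but without understanding bytecode there are still plenty of pitfalls, the kind you do not climb out of quickly. If bytecode keeps getting in the way, the only option is to learn it.

Decompiling the class above with javap -v CheckAndSet.class produces the following bytecode for the checkAndSetF1 method: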

public boolean checkAndSetF1(int);
    descriptor: (I)Z
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=4, args_size=2
         0: iconst_1
         1: istore_2
         2: iload_1
         3: iflt          10
         6: iconst_1
         7: goto          11
        10: iconst_0
        11: istore_3
        12: iload_3
        13: ireturn

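Does this bytecode make you scratch your head? How do we find out what each instruction means? By reading the documentation, of course:

The Java Virtual Machine Specification (PDF)

or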

https://docs.oracle.com/javase/specs/jvms/se10/html/jvms-4.html#jvms-4.1

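With the PDF, you can search from the back: its final pages list every bytecode instruction.

Further up in the document, each instruction is described in detail.

Armed with this document, I started working through the bytecode by hand, line by line: what the stack and the local variable table look like before an instruction executes, and what they look like afterwards. It was painful, much like drawing one of those stack-and-locals diagrams you find online for every single instruction.

My approach was even cruder: no diagrams at all, just notes in Notepad++ recording the local variable table and the operand stack after each step. That was far too slow, and after a while I would lose track of where I was.

Then it occurred to me that a program could do this for me. All it takes is executing the bytecode one instruction at a time while maintaining a list of local variables and a stack, and doing whatever each instruction says: read a local variable, push onto the stack, pop off the stack. The documentation is detailed, so it is just a matter of following it.

So I got to work.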

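Demo

The finished tool shows each bytecode instruction as it executes, together with the state of the local variable table and the operand stack afterwards.
For example, take the following method: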

public void checkAndSetF(int f) {
        if (f >= 0) {
            this.f = f;
        } else {
            throw new IllegalArgumentException();
        }
    }

Bytecode:

public void checkAndSetF(int);
    descriptor: (I)V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=2, args_size=2
         0: iload_1
         1: iflt          12
         4: aload_0
         5: iload_1
         6: putfield      #2                  // Field f:I
         9: goto          20
        12: new           #3                  // class java/lang/IllegalArgumentException
        15: dup
        16: invokespecial #4                  // Method java/lang/IllegalArgumentException."<init>":()V
        19: athrow
        20: return

Execution result: [screenshot of the step-by-step log]

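General approach and implementation

Compile the target class; I use the CheckAndSet.class from above as the example.
Run javap -v CheckAndSet.class > a.txt; a.txt is read later to obtain each method's instructions.
Write the bytecode execution engine, which executes the instructions one at a time.

Decompiling a class with javap -v gives us its bytecode. Two parts of the output matter most:

The instructions of each method, which is what we need most. I will pick one instruction from this method as an example: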

public void checkAndSetF(int);
    descriptor: (I)V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=2, args_size=2
         0: iload_1
         1: iflt          12
         4: aload_0
         5: iload_1
         6: putfield      #2                  // Field f:I
         9: goto          20
        12: new           #3                  // class java/lang/IllegalArgumentException
        15: dup
        16: invokespecial #4                  // Method java/lang/IllegalArgumentException."<init>":()V
        19: athrow
        20: return

Take the line 6: putfield #2 // Field f:I. The actual instruction is only this part:

6: putfield      #2

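The rest, // Field f:I, is a comment that javap adds for us; it does not exist in the class file. So how should we read

6: putfield      #2

and what on earth does #2 mean? Don't panic. That brings us to the other important part: the constant pool.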
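Splitting such a line into offset, opcode, operand and comment can be sketched with a regular expression. This is only an illustration: the class name InstructionLineParser and the exact pattern are my own assumptions, not the parser used in the project.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InstructionLineParser {
    // Assumed line shape: offset ':' opcode [operand] ['//' comment]
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+):\\s+(\\w+)(?:\\s+([^/\\s][^/]*?))?\\s*(?://\\s*(.*))?$");

    /**
     * Returns {offset, opcode, operand, comment}. Parts that are absent are null;
     * a line that is not an instruction yields null.
     */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[]{m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                parse("         6: putfield      #2                  // Field f:I")));
        System.out.println(Arrays.toString(parse("        15: dup")));
    }
}
```

Only the first three parts come from the class file; the fourth is javap's comment and is kept purely for display.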

Constant pool

Constant pool:
   #1 = Methodref          #6.#26         // java/lang/Object."<init>":()V
   #2 = Fieldref           #5.#27         // com/yn/sample/CheckAndSet.f:I
   #3 = Class              #28            // java/lang/IllegalArgumentException
   ...
   #5 = Class              #29            // com/yn/sample/CheckAndSet
   ...
   #27 = NameAndType        #7:#8          // f:I

The #2 from earlier refers to this entry:

#2 = Fieldref           #5.#27         // com/yn/sample/CheckAndSet.f:I

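Here, // com/yn/sample/CheckAndSet.f:I is again a comment; only #5.#27 actually exists in the class file.

Either way, we now know what #2 means: the field f of CheckAndSet.
With these two parts in hand, we can pretty much get started.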
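Following #2 through the constant pool by hand looks like this: Fieldref #5.#27 leads to Class #5 and NameAndType #27, which in turn lead to Utf8 entries. The sketch below does the same lookup over a map. The class ConstantPoolResolver is hypothetical, and the three Utf8 entries (#7, #8, #29) are assumptions inferred from the javap comments, since those lines are elided in the listing above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantPoolResolver {
    /** Resolves a Fieldref id such as "#2" to "owner.name:descriptor". */
    public static String resolveFieldref(Map<String, String> pool, String fieldrefId) {
        String[] classAndNameType = pool.get(fieldrefId).split("\\.");    // "#5", "#27"
        String className = pool.get(pool.get(classAndNameType[0]));        // Class -> Utf8
        String[] nameAndType = pool.get(classAndNameType[1]).split(":");  // "#7", "#8"
        return className + "." + pool.get(nameAndType[0]) + ":" + pool.get(nameAndType[1]);
    }

    /** The entries needed to resolve #2 in CheckAndSet. */
    public static Map<String, String> samplePool() {
        Map<String, String> pool = new HashMap<>();
        pool.put("#2", "#5.#27");   // Fieldref, as printed by javap
        pool.put("#5", "#29");      // Class, as printed by javap
        pool.put("#27", "#7:#8");   // NameAndType, as printed by javap
        // Assumed Utf8 entries: not shown in the listing, inferred from the comments
        pool.put("#7", "f");
        pool.put("#8", "I");
        pool.put("#29", "com/yn/sample/CheckAndSet");
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(resolveFieldref(samplePool(), "#2"));
    }
}
```

The resolved string is the same text javap prints as the comment next to #2, which is a handy way to check such a resolver.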

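Executing a single instruction

For example, to execute:

6: putfield      #2

Use #2 to look up the field the instruction operates on (via reflection), then pop two things off the stack: first the value to assign, then the target object. The instruction can then be executed like this: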

Field field;
...

/*
 * Pop from the operand stack, in this order:
 * value, then objectref
 */
Object value = context.getOperandStack().removeLast();
Object target = context.getOperandStack().removeLast();
try {
    field.set(target, value);
} catch (IllegalAccessException e) {
    throw new RuntimeException(e);
}
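To see the same steps run end to end, here is a self-contained sketch. PutFieldSketch and its nested Sample class are stand-ins I made up for illustration; the field lookup by name and the setAccessible call are my assumptions about how the private field would be reached.

```java
import java.lang.reflect.Field;
import java.util.ArrayDeque;
import java.util.Deque;

public class PutFieldSketch {
    /** Stand-in for CheckAndSet: one private int field. */
    public static class Sample {
        private int f;

        public int getF() {
            return f;
        }
    }

    /** Emulates putfield: pops value, then objectref, and assigns the field reflectively. */
    public static void putField(Deque<Object> operandStack, Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            field.setAccessible(true); // the field is private
            Object value = operandStack.removeLast();
            Object target = operandStack.removeLast();
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Pushes what aload_0 and iload_1 would push, then runs putfield. */
    public static int demo(int argument) {
        Sample sample = new Sample();
        Deque<Object> stack = new ArrayDeque<>();
        stack.addLast(sample);   // aload_0: objectref
        stack.addLast(argument); // iload_1: value
        putField(stack, Sample.class, "f");
        return sample.getF();
    }

    public static void main(String[] args) {
        System.out.println(demo(12));
    }
}
```

The pop order matters: the value was pushed last, so it comes off first.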

Core engine logic and controlling the order of execution

At first, I simply iterated over a method's instructions in order:

public boolean checkAndSetF1(int);

descriptor: (I)Z
flags: ACC_PUBLIC
Code:
  stack=1, locals=4, args_size=2
     0: iconst_1
     1: istore_2
     2: iload_1
     3: iflt          10
     6: iconst_1
     7: goto          11
    10: iconst_0
    11: istore_3
    12: iload_3
    13: ireturn

That is, executing 0, 1, 2 ... 13 in sequence. But that is a bug, because I had overlooked jump instructions like these:

3: iflt          10
	 ...
     7: goto          11

So I changed it: the instructions now form a linked list, and each instruction keeps a reference to the next one.

@Data
public class MethodInstructionVO {
    /**
     * Sequence number (the offset printed by javap)
     */
    private String sequenceNumber;

    /**
     * Opcode
     */
    private String opcode;

    /**
     * Description of the opcode
     */
    private String opCodeDesc;

    /**
     * Operand
     */
    private String operand;

    /**
     * Comment describing the operand
     */
    private String comment;

    /**
     * The next instruction when executing sequentially. For example, given this javap output:
     *          0: iconst_1
     *          1: istore_2
     *          2: iload_1
     *          3: iflt          10
     *          6: iconst_1
     *          7: goto          11
     * the nextInstruction of 0: iconst_1 points to the instruction at offset 1.
     */
    @JSONField(serialize = false)
    MethodInstructionVO nextInstruction;
}

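The last field above is the one that points to the next instruction. By default it simply points to the instruction that follows, for example:

stack=1, locals=4, args_size=2
     0: iconst_1     -- next points to 1
     1: istore_2     -- next points to 2
     2: iload_1      -- next points to 3; the last instruction's next is null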
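A minimal sketch of that linking step, plus the offset index that jump instructions need later, might look as follows. InstructionLinker and its trimmed-down Instruction class are hypothetical stand-ins for MethodInstructionVO and the project's own helper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InstructionLinker {
    /** Trimmed-down stand-in for MethodInstructionVO. */
    public static class Instruction {
        public final String offset;
        public final String opcode;
        public Instruction next; // fall-through successor; stays null for the last one

        public Instruction(String offset, String opcode) {
            this.offset = offset;
            this.opcode = opcode;
        }
    }

    /** Points each instruction's next at the instruction that follows it in the list. */
    public static void link(List<Instruction> instructions) {
        for (int i = 0; i + 1 < instructions.size(); i++) {
            instructions.get(i).next = instructions.get(i + 1);
        }
    }

    /** Indexes instructions by offset so that a jump target can be looked up directly. */
    public static Map<String, Instruction> indexByOffset(List<Instruction> instructions) {
        Map<String, Instruction> byOffset = new HashMap<>();
        for (Instruction instruction : instructions) {
            byOffset.put(instruction.offset, instruction);
        }
        return byOffset;
    }

    /** The first four instructions of checkAndSetF1, already linked. */
    public static List<Instruction> sample() {
        List<Instruction> list = new ArrayList<>();
        list.add(new Instruction("0", "iconst_1"));
        list.add(new Instruction("1", "istore_2"));
        list.add(new Instruction("2", "iload_1"));
        list.add(new Instruction("3", "iflt"));
        link(list);
        return list;
    }
}
```

Sequential execution follows next, while a jump ignores next and goes through the offset index instead.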

The core execution loop looks roughly like this:

// 1.
MethodInstructionVO currentInstruction = instructionVOList.get(0);

while (true) {
    // 2.
    ExecutorByOpCode executorByOpCode = executorByOpCodeMap.get(currentInstruction.getOpcode());
    if (executorByOpCode == null) {
        // No handler for this opcode: fail here instead of with a NullPointerException below
        log.info("currentInstruction:{}", currentInstruction);
        throw new UnsupportedOperationException("no executor for opcode: " + currentInstruction.getOpcode());
    }
    // 3.
    InstructionExecutionContext context = new InstructionExecutionContext();
    context.setTarget(target);
    context.setConstantPoolItems(constantPoolItems);
    context.setLocalVariables(localVariables);
    context.setOperandStack(operandStack);
    String desc = OpCodeEnum.getDescByNameIgnoreCase(currentInstruction.getOpcode());
    currentInstruction.setOpCodeDesc(desc);
    context.setInstructionVO(currentInstruction);

    /*
     * 4. Execute the instruction. A non-null result means execution must not simply
     * fall through to the next instruction: it is a jump, a return, or an exception.
     */
    InstructionExecutionResult instructionExecutionResult =
            executorByOpCode.execute(context);
    log.info("after {},\noperand stack:{},\nlocal variables:{}",
            JSONObject.toJSONString(currentInstruction, SerializerFeature.PrettyFormat),
            operandStack, localVariables);

    // 5.
    if (instructionExecutionResult == null) {
        currentInstruction = currentInstruction.getNextInstruction();
        if (currentInstruction == null) {
            System.out.println("execute over---------------");
            break;
        }
        continue;
    } else if (instructionExecutionResult.isReturnInstruction()) {
        // 6.
        return instructionExecutionResult.getResult();
    } else if (instructionExecutionResult.isExceptional()) {
        // 7.
        log.info("method execute over,throw exception:{}", instructionExecutionResult.getResult());
        throw (Throwable) instructionExecutionResult.getResult();
    }
    // 8.
    String sequenceNum = instructionExecutionResult.getInstructionSequenceNum();
    currentInstruction = instructionVOHashMap.get(sequenceNum);
    log.info("will skip to {}", currentInstruction);
}

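At 1, the first instruction is taken as the starting point.
At 2, the handler for the instruction is looked up, e.g. the handler for iconst_1.
At 3, the context passed to the handler is built; it holds the current instruction, the operand stack, the local variable table, the constant pool, and so on.
At 4, the execute method of the handler from step 2 is called with the context from step 3, and the result is stored in the local variable instructionExecutionResult.
At 5, a null result means no jump is needed, so the current instruction's next becomes the current instruction:

if (instructionExecutionResult == null) {
    currentInstruction = currentInstruction.getNextInstruction();

At 6, if the result is non-null and the instruction was a return, the result is returned directly.
At 7, if the result is non-null and an exception was thrown, the exception is rethrown.
At 8, if the result is non-null for any other reason, e.g. a goto, the handler has stored the jump target in the instructionSequenceNum field of instructionExecutionResult; that instruction is looked up and assigned to currentInstruction.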
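The whole loop can be condensed into a toy interpreter that is just large enough to run checkAndSetF1. This is a sketch under simplifying assumptions, not the project's engine: MiniInterpreter is a made-up class, a switch stands in for the handler classes, and an Integer jump target stands in for InstructionExecutionResult.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniInterpreter {
    /** Interprets the bytecode of checkAndSetF1(int) for the given argument. */
    public static boolean runCheckAndSetF1(int f) {
        // offset -> {opcode, operand}, in javap order
        Map<Integer, String[]> code = new LinkedHashMap<>();
        code.put(0, new String[]{"iconst_1", null});
        code.put(1, new String[]{"istore_2", null});
        code.put(2, new String[]{"iload_1", null});
        code.put(3, new String[]{"iflt", "10"});
        code.put(6, new String[]{"iconst_1", null});
        code.put(7, new String[]{"goto", "11"});
        code.put(10, new String[]{"iconst_0", null});
        code.put(11, new String[]{"istore_3", null});
        code.put(12, new String[]{"iload_3", null});
        code.put(13, new String[]{"ireturn", null});
        List<Integer> offsets = new ArrayList<>(code.keySet());

        Object[] locals = new Object[4]; // locals=4
        locals[0] = "this";              // slot 0: placeholder for the receiver
        locals[1] = f;                   // slot 1: the int argument
        Deque<Object> stack = new ArrayDeque<>();

        int index = 0;
        while (true) {
            String[] ins = code.get(offsets.get(index));
            Integer jumpTo = null; // stays null when execution falls through
            switch (ins[0]) {
                case "iconst_0": stack.addLast(0); break;
                case "iconst_1": stack.addLast(1); break;
                case "iload_1": stack.addLast(locals[1]); break;
                case "iload_3": stack.addLast(locals[3]); break;
                case "istore_2": locals[2] = stack.removeLast(); break;
                case "istore_3": locals[3] = stack.removeLast(); break;
                case "iflt":
                    if ((Integer) stack.removeLast() < 0) {
                        jumpTo = Integer.valueOf(ins[1]);
                    }
                    break;
                case "goto": jumpTo = Integer.valueOf(ins[1]); break;
                case "ireturn": return (Integer) stack.removeLast() != 0;
                default: throw new IllegalStateException("unsupported opcode: " + ins[0]);
            }
            // Fall through to the next instruction unless the handler asked for a jump
            index = (jumpTo == null) ? index + 1 : offsets.indexOf(jumpTo);
        }
    }

    public static void main(String[] args) {
        System.out.println(runCheckAndSetF1(5));
        System.out.println(runCheckAndSetF1(-1));
    }
}
```

With a non-negative argument the iflt falls through and goto skips offset 10; with a negative argument iflt jumps straight to offset 10. Those are exactly the two paths that plain sequential iteration gets wrong.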

Finding the handler for a bytecode instruction

I defined a common handler interface:

public interface ExecutorByOpCode {
    String getOpCode();

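    /**
     * @param context the execution context of the current instruction
     * @return if a jump is needed, a result carrying the offset of the instruction to jump to; otherwise null
     */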
    InstructionExecutionResult execute(InstructionExecutionContext context);
}

I then wrote a batch of implementation classes, one per instruction.

Take one of the simplest, iconst_0, as an example:

@Component
public class ExecutorForIConst0 extends BaseExecutorForIConstN implements ExecutorByOpCode{

    @Override
    public String getOpCode() {
        return OpCodeEnum.iconst_0.name();
    }

    @Override
    public InstructionExecutionResult execute(InstructionExecutionContext context) {
        super.execute(context, 0);
        return null;
    }
}

public class BaseExecutorForIConstN {
	// 1 
    public void execute(InstructionExecutionContext context,Integer counter) {
        context.getOperandStack().addLast(counter);
    }
}

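At 1, the constant (0 for iconst_0) is pushed onto the operand stack.

Every bytecode handler is annotated with @Component, and all of them are injected into the engine class: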

@Component
@Slf4j
public class MethodExecutionEngine implements InitializingBean {
    ClassInfo classInfo;

    // 1
    @Autowired
    private List<ExecutorByOpCode> executorByOpCodes;

    private Map<String, ExecutorByOpCode> executorByOpCodeMap = new HashMap<>();

    // 2
    @Override
    public void afterPropertiesSet() throws Exception {
        if (executorByOpCodes != null) {
            for (ExecutorByOpCode executorByOpCode : executorByOpCodes) {
                executorByOpCodeMap.put(executorByOpCode.getOpCode().toLowerCase(), executorByOpCode);
            }
        }
    }

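At 1, all handlers are injected.
At 2, the handlers are put into a map: the key is the bytecode instruction, the value is the handler itself.
From then on, the engine can find the right handler from the instruction's opcode.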
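The registry idea does not depend on Spring. Below is a plain-Java sketch of the same lookup. The Executor interface here is a simplified, hypothetical stand-in for ExecutorByOpCode that works on the operand stack directly, and the handler list is built by hand instead of being injected.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExecutorRegistrySketch {
    /** Simplified stand-in for ExecutorByOpCode. */
    public interface Executor {
        String getOpCode();

        void execute(Deque<Object> operandStack);
    }

    /** Builds the handler for iconst_n, which pushes the int constant n. */
    public static Executor iconst(final int n) {
        return new Executor() {
            @Override
            public String getOpCode() {
                return "ICONST_" + n;
            }

            @Override
            public void execute(Deque<Object> operandStack) {
                operandStack.addLast(n);
            }
        };
    }

    /** Same idea as afterPropertiesSet(): index every handler by its lower-cased opcode. */
    public static Map<String, Executor> buildRegistry(List<Executor> executors) {
        Map<String, Executor> registry = new HashMap<>();
        for (Executor executor : executors) {
            registry.put(executor.getOpCode().toLowerCase(), executor);
        }
        return registry;
    }

    /** Looks up a handler by opcode, runs it on an empty stack, and returns the stack top. */
    public static Object runOn(String opcode) {
        Map<String, Executor> registry = buildRegistry(Arrays.asList(iconst(0), iconst(1)));
        Deque<Object> stack = new ArrayDeque<>();
        registry.get(opcode).execute(stack);
        return stack.peekLast();
    }

    public static void main(String[] args) {
        System.out.println(runOn("iconst_1"));
    }
}
```

Lower-casing the key at registration time is what lets the engine look handlers up with the opcode text exactly as javap prints it.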

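Reading every line of the file and calling back a visitor interface (visitor pattern)

This is ordinary file reading, written fairly casually: the file is read into a list of lines.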

String filepath = "F:\\ownprojects\\all-simple-demo-in-work\\class-bytecode-analyse-engine\\target\\classes\\com\\yn\\sample\\a.txt";
        JavapClassFileParser javapClassFileParser = context.getBean(JavapClassFileParser.class);
        ClassInfo classInfo = javapClassFileParser.parse(filepath);

Inside the parse method, the code looks like this:

// 1	
		lines = FileReaderUtil.readFile2Lines(filePath);
        if (CollectionUtils.isEmpty(lines)) {
            return null;
        }
		
		// 2
        ClassMethodCodeVisitor classMethodCodeVisitor = null;
        for (int i = 0; i < lines.size(); i++) {
            String currentLine = lines.get(i);
            if (i == 0) {
              ...

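At 1, the file is read and all of its lines are obtained.

At 2, every line is visited. This part is a bit messy. For example, when the current line contains "Constant pool:", the parse state is switched to "constant pool started":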

/**
 * When the line contains "Constant pool:", what follows is a run of constants:
 * Constant pool:
 *    #1 = Methodref          #6.#25         //  java/lang/Object."<init>":()V
 *    #2 = Fieldref           #5.#26         //  com/yn/sample/CheckAndSet.f:I
 * Switch to the "constant pool started" state
 */
if (currentLine.contains("Constant pool:")) {
    classConstantPoolInfoVisitor.visitConstantPoolStarted();
    state = ParseStateEnum.CONSTANT_POOL_STARTED.state;
    continue;
}

On the next iteration, the branch for the "constant pool started" state is taken:

if (state == ParseStateEnum.CONSTANT_POOL_STARTED.state) {
  // 1.
  ConstantPoolItem item = ParseEngineHelper.parseConstantPoolItem(currentLine);
  if (item == null) {
	// 2.
    classConstantPoolInfoVisitor.visitConstantPoolEnd();
    state = ParseStateEnum.METHOD_INFO_STARTED.state;
    continue;
  } else {
    // 3
    classConstantPoolInfoVisitor.visitConstantPoolItem(item);
    continue;
  }
}

At 1, the current line is expected to look like this:

#1 = Methodref #6.#26 // java/lang/Object."<init>":()V

A regular expression parses the line into the following structure:

public class ConstantPoolItem {
    /**
     * For example:
     * #1
     */
    private String id;

    /**
     * For example:
     * Methodref
     */
    private ConstantPoolItemTypeEnum constantPoolItemTypeEnum;

    /**
     * For example:
     * #6.#25
     */
    private String value;

    /**
     * Comment for the value. The value is usually a reference to other constant
     * pool ids, so javap prints the resolved constant here for readability.
     */
    private String comment;
}

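At 2, if the parsed constant pool item is null, the constant pool has ended, and the parse state is switched to "method parsing started".
At 3, if a constant pool item was parsed, the visitor interface is called back with it.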
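The regular expression itself is not shown above, so here is one possible version as a sketch. ConstantPoolLineParser and its pattern are my own assumptions; in particular, the pattern treats the first // as the start of javap's comment, which would mis-split a Utf8 value that itself contains //.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConstantPoolLineParser {
    // Assumed line shape: '#'id '=' type [value] ['//' comment]
    private static final Pattern ITEM = Pattern.compile(
            "^\\s*(#\\d+)\\s*=\\s*(\\w+)\\s*(.*?)\\s*(?://\\s*(.*))?$");

    /**
     * Returns {id, type, value, comment}; comment is null when absent.
     * Returns null for a line that is not a constant pool entry, which is the
     * signal used above to detect the end of the pool.
     */
    public static String[] parse(String line) {
        Matcher m = ITEM.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[]{m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(
                "   #2 = Fieldref           #5.#27         // com/yn/sample/CheckAndSet.f:I")));
        System.out.println(Arrays.toString(parse("{")));
    }
}
```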

Throughout parsing, our visitor interfaces are called back again and again. For example:

package com.yn.sample.visitor;

import com.yn.sample.domain.ConstantPoolItem;

import java.util.ArrayList;

public interface ClassConstantPoolInfoVisitor {
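    /**
     * Constant pool parsing has started
     */
    void visitConstantPoolStarted();

    /**
     * Called back for every constant pool item that is parsed
     * @param constantPoolItem the item that was just parsed
     */
    void visitConstantPoolItem(ConstantPoolItem constantPoolItem);

    /**
     * Constant pool parsing has ended
     */
    void visitConstantPoolEnd();

    /**
     * Returns the final list of constant pool items
     * @return the collected constant pool items
     */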
    ArrayList<ConstantPoolItem> getConstantPoolItemList();
}
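How the line loop and the visitor fit together can be shown with a stripped-down sketch. Everything in it is hypothetical: the Visitor interface passes raw lines instead of ConstantPoolItem objects, a boolean replaces ParseStateEnum, and a line starting with # is assumed to be a pool entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConstantPoolScanSketch {
    /** Simplified visitor: items are passed as raw, trimmed lines. */
    public interface Visitor {
        void visitStarted();

        void visitItem(String rawLine);

        void visitEnd();
    }

    /** Walks the lines once and fires callbacks while inside the constant pool section. */
    public static void scan(List<String> lines, Visitor visitor) {
        boolean inPool = false;
        for (String line : lines) {
            if (!inPool && line.contains("Constant pool:")) {
                inPool = true;
                visitor.visitStarted();
            } else if (inPool && line.trim().startsWith("#")) {
                visitor.visitItem(line.trim());
            } else if (inPool) {
                // The first line that is not a pool entry ends the section
                visitor.visitEnd();
                return;
            }
        }
        if (inPool) {
            visitor.visitEnd(); // the file ended while still inside the pool
        }
    }

    /** Runs scan with a visitor that records every callback it receives. */
    public static List<String> collectEvents(List<String> lines) {
        final List<String> events = new ArrayList<>();
        scan(lines, new Visitor() {
            @Override
            public void visitStarted() {
                events.add("start");
            }

            @Override
            public void visitItem(String rawLine) {
                events.add("item " + rawLine.split("\\s+")[0]);
            }

            @Override
            public void visitEnd() {
                events.add("end");
            }
        });
        return events;
    }

    public static void main(String[] args) {
        System.out.println(collectEvents(Arrays.asList(
                "Constant pool:", "   #1 = Methodref          #6.#26", "{")));
    }
}
```

Keeping the parsing loop ignorant of what happens to each item is the point of the visitor: the same loop can feed a collector like this one or anything else.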

Overall flow

Read the file and obtain the bytecode

package com.yn.sample;


@Component
@ComponentScan("com.yn.sample")
public class BootStrap {
    public static void main(String[] args) throws Throwable {
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(BootStrap.class);
        /**
         * Parse the file
         */
        String filepath = "F:\\ownprojects\\all-simple-demo-in-work\\class-bytecode-analyse-engine\\target\\classes\\com\\yn\\sample\\a.txt";
        JavapClassFileParser javapClassFileParser = context.getBean(JavapClassFileParser.class);
        ClassInfo classInfo = javapClassFileParser.parse(filepath);

    }
}

Once read, the bytecode is held in classInfo.

Invoke checkAndSetF(int) on an instance of CheckAndSet with the argument 12, i.e. call the following method:

public void checkAndSetF(int f) {
        if (f >= 0) {
            this.f = f;
        } else {
            throw new IllegalArgumentException();
        }
    }

Build the local variable list and the operand stack

private Object doExecute(Object target, MethodInfo methodInfo,
                         List<ConstantPoolItem> constantPoolItems, List<Object> arguments) throws Throwable {
    List<MethodInstructionVO> instructionVOList = methodInfo.getInstructionVOList();
    /**
     * Fill in the next field, turning the instruction list into a linked list
     */
    assemblyInstructionList2LinkedList(instructionVOList);

    /**
     * Local variable table, sized according to what was parsed from javap:
     *     Code:
     *       stack=1, locals=4, args_size=2
     * and used to create the local variable list
     */
    Integer localVariablesSize = methodInfo.getMethodCodeStackSizeAndLocalVariablesTableSize().getLocalVariablesSize();
    List<Object> localVariables = constructLocalVariableList(target, arguments, localVariablesSize);

    /**
     * Build the instruction map used later by jump instructions
     * key: the instruction's sequenceNum
     * value: the instruction
     */
    HashMap<String, MethodInstructionVO> instructionVOHashMap = new HashMap<>();
    for (MethodInstructionVO vo : instructionVOList) {
        instructionVOHashMap.put(vo.getSequenceNumber(), vo);
    }


    return null;
}
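The body of constructLocalVariableList is not shown, so the following is only a guess at its shape, under the assumptions that the method is an instance method and every argument occupies a single slot (long and double, which take two slots, are ignored). The names LocalVariablesSketch and buildLocals are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LocalVariablesSketch {
    /**
     * Builds the local variable list for an instance method:
     * slot 0 holds the receiver, the arguments follow from slot 1,
     * and the remaining slots up to localsSize start out as null.
     */
    public static List<Object> buildLocals(Object target, List<Object> arguments, int localsSize) {
        List<Object> locals = new ArrayList<>(Collections.<Object>nCopies(localsSize, null));
        locals.set(0, target);
        for (int i = 0; i < arguments.size(); i++) {
            locals.set(i + 1, arguments.get(i));
        }
        return locals;
    }

    public static void main(String[] args) {
        // checkAndSetF1 declares locals=4: this, f, a, b
        System.out.println(buildLocals("this", Arrays.<Object>asList(12), 4));
    }
}
```

This matches what the bytecode expects: aload_0 reads the receiver from slot 0 and iload_1 reads the int argument from slot 1.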

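Call the execution engine to interpret the bytecode instruction by instruction

This part was already covered above.

Summary

The source code is at: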

https://gitee.com/ckl111/class-bytecode-analyse-engine

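Not yet implemented:

Methods calling other methods: only a single method can be executed for now, and the stack of method frames is still to be done.
Many of the other instructions.

For now the engine can only execute the methods of the sample class; I will add more bytecode instructions as I run into them.

I will write more when I have time. If you are interested, try writing one yourself.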