Writing a Compiler from Scratch (6): Syntax Analysis with a Table-Driven Parser

Date: 2019-11-09

The complete code for this project is in C2j-Compiler.

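Preface

The previous article finished building the finite-state automaton and collecting enough information to decide when to reduce. The next task is to build the parse table from that automaton and then implement parsing driven by that table.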

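Reduce information

Before the parse table can be built, one task remains: describing the reduce information that tells the automaton whether it should perform a reduce.

The reduce information is computed inside each ProductionsStateNode. Iterate over the productions in the node; if the dot "." sits at the end of a production, the node can derive reduce information from that production and its lookahead set.

The reduce information is represented as a map. The key is a symbol on which a reduce may happen, that is, a symbol in the lookahead set, and the value is the number of the production to reduce by.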

// Collects this state's reduce actions: lookahead symbol -> production number.
public HashMap<Integer, Integer> makeReduce() {
    HashMap<Integer, Integer> map = new HashMap<>();
    reduce(map, this.productions);
    reduce(map, this.mergedProduction);

    return map;
}

// A production whose dot is at the end can be reduced on every symbol in its lookahead set.
private void reduce(HashMap<Integer, Integer> map, ArrayList<Production> productions) {
    for (Production production : productions) {
        if (production.canBeReduce()) {
            for (Integer symbol : production.getLookAheadSet()) {
                map.put(symbol, production.getProductionNum());
            }
        }
    }
}
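To see the rule in isolation, here is a minimal, self-contained sketch. The `Item` class, its fields, and the symbol numbers are hypothetical stand-ins made up for illustration; they are not the project's `Production` class. It only shows that a completed item contributes one map entry per lookahead symbol, while an incomplete item contributes nothing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReduceInfoSketch {
    // Hypothetical stand-in for an LR item: a production plus a dot position.
    static class Item {
        final int productionNum;
        final int rightSize;            // number of symbols on the right-hand side
        final int dotPos;               // number of symbols already left of the dot
        final List<Integer> lookAhead;  // symbols on which a reduce is allowed

        Item(int productionNum, int rightSize, int dotPos, Integer... lookAhead) {
            this.productionNum = productionNum;
            this.rightSize = rightSize;
            this.dotPos = dotPos;
            this.lookAhead = Arrays.asList(lookAhead);
        }

        // The dot is at the end when every right-hand symbol has been seen.
        boolean canBeReduced() {
            return dotPos == rightSize;
        }
    }

    // Builds lookahead symbol -> production number for all completed items.
    static Map<Integer, Integer> makeReduce(List<Item> items) {
        Map<Integer, Integer> map = new HashMap<>();
        for (Item item : items) {
            if (item.canBeReduced()) {
                for (Integer symbol : item.lookAhead) {
                    map.put(symbol, item.productionNum);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        final int PLUS = 1, EOI = 2;  // made-up symbol numbers
        List<Item> state = Arrays.asList(
            new Item(1, 3, 3, PLUS, EOI),  // E -> E + T .   (dot at the end)
            new Item(3, 3, 1, PLUS));      // T -> T . * F   (dot in the middle)
        System.out.println(makeReduce(state));
    }
}
```

Only the first item ends up in the map, once for each of its two lookahead symbols.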

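Building the parse table

The parse table is built mainly in the StateNodeManager class. You can ignore the logic of loadTable and storageTableToFile for now; that part only stores the table so that it can be reused across runs.

The main logic starts at the while loop, which iterates over every state node. For each node it first takes the transitions and their target nodes from the transition map, then copies each transition symbol (which is really the ordinal of the Token enum defined at the start) together with the target node's number into another map. Next it fetches the reduce information and, for each symbol from the lookahead set, writes the value -(number of the production to reduce by). The value is negative so that a reduce can be told apart from a shift.

So the data structure HashMap<Integer, HashMap<Integer, Integer>> serves as the parse table: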

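The first Integer is the number of the current node
The second Integer is the input symbol
The third Integer is the action: greater than 0 means shift to the state with that number, less than 0 means reduce by the production whose number is the negated value, and the parse method treats 0 as accept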
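The encoding can be read back with a few comparisons. The sketch below is only an illustration: the class name, the `describe` helper, and the state and symbol numbers are made up, and treating 0 as accept follows the parse method shown later in this article.

```java
import java.util.HashMap;

class ActionEncodingSketch {
    // Decodes one table cell into a human-readable action.
    static String describe(HashMap<Integer, HashMap<Integer, Integer>> table,
                           int state, int symbol) {
        HashMap<Integer, Integer> row = table.get(state);
        Integer action = (row == null) ? null : row.get(symbol);
        if (action == null) {
            return "error";                               // no entry: input rejected
        }
        if (action > 0) {
            return "shift to state " + action;            // positive: target state
        }
        if (action < 0) {
            return "reduce by production " + (-action);   // negative: production number
        }
        return "accept";                                  // zero
    }

    public static void main(String[] args) {
        HashMap<Integer, HashMap<Integer, Integer>> table = new HashMap<>();
        HashMap<Integer, Integer> row = new HashMap<>();
        row.put(7, 3);   // on symbol 7, shift to state 3
        row.put(8, -2);  // on symbol 8, reduce by production 2
        row.put(9, 0);   // on symbol 9, accept
        table.put(1, row);

        for (int symbol = 6; symbol <= 9; symbol++) {
            System.out.println("symbol " + symbol + ": " + describe(table, 1, symbol));
        }
    }
}
```

One consequence of this encoding is that a shift can never target state 0 and a reduce by production 0 is indistinguishable from accept, so both numbers have to be reserved for the start state and the start production.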

public HashMap<Integer, HashMap<Integer, Integer>> getLrStateTable() {
    // Reuse a previously stored table if one exists.
    File table = new File("lrStateTable.sb");
    if (table.exists()) {
        return loadTable();
    }

    Iterator it;
    if (isTransitionTableCompressed) {
        it = compressedStateList.iterator();
    } else {
        it = stateList.iterator();
    }

    while (it.hasNext()) {
        ProductionsStateNode state = (ProductionsStateNode) it.next();
        HashMap<Integer, ProductionsStateNode> map = transitionMap.get(state);
        HashMap<Integer, Integer> jump = new HashMap<>();

        // Shift entries: input symbol -> number of the target state.
        if (map != null) {
            for (Map.Entry<Integer, ProductionsStateNode> item : map.entrySet()) {
                jump.put(item.getKey(), item.getValue().stateNum);
            }
        }

        // Reduce entries: lookahead symbol -> negated production number.
        HashMap<Integer, Integer> reduceMap = state.makeReduce();
        if (reduceMap.size() > 0) {
            for (Map.Entry<Integer, Integer> item : reduceMap.entrySet()) {
                jump.put(item.getKey(), -(item.getValue()));
            }
        }

        lrStateTable.put(state.stateNum, jump);
    }

    storageTableToFile(lrStateTable);

    return lrStateTable;
}
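The article does not show loadTable and storageTableToFile. As a rough idea of what such a pair could look like, here is a sketch that uses standard Java serialization. This is an assumption for illustration, not the project's actual implementation; the class and method names (`TableCacheSketch`, `store`, `load`, `roundTrip`) are made up.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

class TableCacheSketch {
    // Writes the whole table to a file; HashMap and Integer are both Serializable.
    static void store(HashMap<Integer, HashMap<Integer, Integer>> table, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a table back; the cast is unchecked because generics are erased at runtime.
    @SuppressWarnings("unchecked")
    static HashMap<Integer, HashMap<Integer, Integer>> load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (HashMap<Integer, HashMap<Integer, Integer>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stores the table in a temporary file and loads it again.
    static HashMap<Integer, HashMap<Integer, Integer>> roundTrip(
            HashMap<Integer, HashMap<Integer, Integer>> table) {
        try {
            File file = File.createTempFile("lrStateTable", ".sb");
            file.deleteOnExit();
            store(table, file);
            return load(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<Integer, HashMap<Integer, Integer>> table = new HashMap<>();
        HashMap<Integer, Integer> row = new HashMap<>();
        row.put(1, 3);
        row.put(2, -1);
        table.put(0, row);
        System.out.println(table.equals(roundTrip(table)));
    }
}
```

Whatever the storage format, a cached table is only valid for the grammar it was built from. Because getLrStateTable returns the stored table whenever the file exists, the file has to be deleted after the grammar changes.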

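Table-driven parsing

The parsing process lives mainly in the LRStateTableParser class and is started by the parse method.

As described in the second article, we need an input stack and a state stack; nothing else is needed for now. During initialization, the start node is pushed onto the state stack, the current input symbol is set to EXT_DEF_LIST, and the parse table is fetched.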

public LRStateTableParser(Lexer lexer) {
    this.lexer = lexer;
    statusStack.push(0);
    valueStack.push(null);
    lexer.advance();
    lexerInput = Token.EXT_DEF_LIST.ordinal();
    lrStateTable = StateNodeManager.getInstance().getLrStateTable();
}

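Parsing steps:

Look up the next action for the current node and the current input symbol: action > 0 is a shift, action < 0 is a reduce
If action > 0, this is a shift
1. Push the state to shift to and the input symbol onto their respective stacks
2. If the current symbol is a terminal, the next symbol can be read from the lexer right away
3. If it is a non-terminal, the parser has simply used it to jump to the next state. One point needs attention here: the non-terminal must still be pushed onto the input stack alongside that next state, so that a later reduce pops the correct symbols
If action < 0, this is a reduce
1. Get the corresponding production
2. Pop the state nodes that correspond to the production's right-hand side off the stack
3. Make the symbol produced by the reduce the current input and push it onto the input stack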

public void parse() {
    while (true) {
        // Look up the action for the current state and the current input symbol.
        Integer action = getAction(statusStack.peek(), lexerInput);

        // No table entry: the input cannot be parsed.
        if (action == null) {
            ConsoleDebugColor.outlnPurple("Shift for input: " + Token.values()[lexerInput].toString());
            System.err.println("The input is denied");
            return;
        }

        if (action > 0) {
            // Shift: push the target state and the input symbol.
            statusStack.push(action);
            text = lexer.text;

            // if (lexerInput == Token.RELOP.ordinal()) {
            //     relOperatorText = text;
            // }

            parseStack.push(lexerInput);

            if (Token.isTerminal(lexerInput)) {
                ConsoleDebugColor.outlnPurple("Shift for input: " + Token.values()[lexerInput].toString() + "   text: " + text);

                // Object obj = takeActionForShift(lexerInput);

                // A terminal was consumed, so read the next token.
                lexer.advance();
                lexerInput = lexer.lookAhead;
                // valueStack.push(obj);
            } else {
                // A non-terminal consumes no token; continue with the lexer's current lookahead.
                lexerInput = lexer.lookAhead;
            }
        } else {
            // Action 0 means the input is accepted.
            if (action == 0) {
                ConsoleDebugColor.outlnPurple("The input can be accepted");
                return;
            }

            // Reduce: the production number is the negated action.
            int reduceProduction = -action;
            Production product = ProductionManager.getInstance().getProductionByIndex(reduceProduction);
            ConsoleDebugColor.outlnPurple("reduce by product: ");
            product.debugPrint();

            // takeActionForReduce(reduceProduction);

            // Pop one symbol and one state per right-hand-side symbol.
            int rightSize = product.getRight().size();
            while (rightSize > 0) {
                parseStack.pop();
                // valueStack.pop();
                statusStack.pop();
                rightSize--;
            }

            // The left-hand side becomes the next input symbol.
            lexerInput = product.getLeft();
            parseStack.push(lexerInput);
            // valueStack.push(attributeForParentNode);
        }
    }
}

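private Integer getAction(Integer currentState, Integer currentInput) {
    HashMap<Integer, Integer> jump = lrStateTable.get(currentState);
    // A state without a row has no action; parse() treats null as a rejected input.
    return jump == null ? null : jump.get(currentInput);
}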
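The same loop can be tried end to end on a grammar small enough to build the table by hand. The sketch below is a toy under stated assumptions: the grammar (0: S -> E, 1: E -> E + n, 2: E -> n), the symbol numbers, the hand-written table, and the class `ToyTableParser` are all made up for illustration and are unrelated to the C grammar used by the project. It keeps only the state stack, and it uses the same encoding: positive for shift, negative for reduce, 0 for accept.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ToyTableParser {
    // Made-up symbol numbers; terminals are numbered below the non-terminals.
    static final int N = 0, PLUS = 1, EOI = 2, E = 3, S = 4;

    // Toy grammar: 0: S -> E   1: E -> E + n   2: E -> n
    static final int[] LEFT = {S, E, E};
    static final int[] RIGHT_SIZE = {1, 3, 1};

    static final Map<Integer, Map<Integer, Integer>> TABLE = new HashMap<>();

    // Adds one table row given as (symbol, action) pairs.
    static void row(int state, int... pairs) {
        Map<Integer, Integer> r = new HashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            r.put(pairs[i], pairs[i + 1]);
        }
        TABLE.put(state, r);
    }

    static {
        row(0, N, 2, E, 1);          // S -> .E    E -> .E + n    E -> .n
        row(1, PLUS, 3, EOI, 0);     // S -> E.    E -> E. + n    (0 = accept)
        row(2, PLUS, -2, EOI, -2);   // E -> n.
        row(3, N, 4);                // E -> E + .n
        row(4, PLUS, -1, EOI, -1);   // E -> E + n.
    }

    static boolean isTerminal(int symbol) {
        return symbol <= EOI;
    }

    // Returns the token at pos, or EOI once the tokens are used up.
    static int lookAhead(int[] tokens, int pos) {
        return pos < tokens.length ? tokens[pos] : EOI;
    }

    static boolean parse(int... tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        states.push(0);
        int pos = 0;
        int input = lookAhead(tokens, pos);

        while (true) {
            Integer action = TABLE.get(states.peek()).get(input);
            if (action == null) {
                return false;                      // no entry: reject
            }
            if (action == 0) {
                return true;                       // accept
            }
            if (action > 0) {
                states.push(action);               // shift, or goto on a non-terminal
                if (isTerminal(input)) {
                    pos++;                         // only terminals consume a token
                }
                input = lookAhead(tokens, pos);
            } else {
                int production = -action;          // reduce
                for (int i = 0; i < RIGHT_SIZE[production]; i++) {
                    states.pop();
                }
                input = LEFT[production];          // the left-hand side is the next input
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(N, PLUS, N, EOI));  // n + n
        System.out.println(parse(N, PLUS, EOI));     // n +
    }
}
```

For the input n + n the state stack evolves as 0, 0 2, 0, 0 1, 0 1 3, 0 1 3 4, 0, 0 1, and the parser then accepts on the end-of-input symbol. The input n + is rejected because state 3 has no entry for end of input.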

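Ambiguous grammars

This completes syntax analysis; semantic analysis comes next. Before that, one more point: the finite-state automaton we have built handles LALR(1) grammars. LALR(1) is already quite powerful, but there are still grammars it cannot handle. If the given productions are not LALR(1), this automaton cannot parse them correctly. The grammar given in the earlier articles, however, is LALR(1).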
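In table terms, a grammar that is not LALR(1) shows up as two different actions competing for the same cell. In getLrStateTable above, jump.put lets a reduce entry replace a shift entry for the same symbol without any warning. A check along the following lines could report such cases instead. This is a sketch, not part of the project; `ConflictCheckSketch`, `findConflicts`, and the state and symbol numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConflictCheckSketch {
    // shifts:  input symbol -> target state (positive numbers)
    // reduces: pairs of {lookahead symbol, production number}
    static List<String> findConflicts(Map<Integer, Integer> shifts, int[][] reduces) {
        Map<Integer, Integer> row = new HashMap<>(shifts);
        List<String> conflicts = new ArrayList<>();
        for (int[] reduce : reduces) {
            int symbol = reduce[0];
            int production = reduce[1];
            // putIfAbsent returns the action already stored for this symbol, if any.
            Integer existing = row.putIfAbsent(symbol, -production);
            if (existing != null && existing != -production) {
                String kind = existing > 0 ? "shift/reduce" : "reduce/reduce";
                conflicts.add(kind + " conflict on symbol " + symbol);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        final int PLUS = 1, EOI = 2;  // made-up symbol numbers

        // Ambiguous grammar E -> E + E | n. One state contains both
        //   E -> E + E .   (reduce on + and on end of input)
        //   E -> E . + E   (shift on +)
        Map<Integer, Integer> shifts = new HashMap<>();
        shifts.put(PLUS, 5);
        int[][] reduces = {{PLUS, 1}, {EOI, 1}};

        System.out.println(findConflicts(shifts, reduces));
    }
}
```

The grammar E -> E + E | n is ambiguous because n + n + n has two parse trees, so no LALR(1) table can exist for it without a rule that resolves the clash on +.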

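Summary

This article mainly covered:

Building the parse table from the finite-state automaton and the reduce information
Implementing table-driven parsing with that parse table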