Writing a Compiler from Scratch (4): Syntax Analysis, Building the Finite State Automaton

Date: 2019-11-09

The complete code for the project is in C2j-Compiler.

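The previous post described the basic data structures used to build the automaton. With those in place, we can now actually build the finite state automaton.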

Let's walk through the process with a smaller grammar:

s -> e
e -> e + t
e -> t
t -> t * f
t -> f
f -> ( e )
f -> NUM

Initialization

State 0 is the initial state of the automaton. It contains the start production of the grammar, that is, production number 0:

0: s -> . e

The dot here is the dotPos field of the Production class described earlier.
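To make the dot concrete, here is a minimal Java sketch of a production with a dot position. It needs Java 16 or later for records. The record name Item and its methods are hypothetical stand-ins for Production, dotPos, getDotSymbol and dotForward. Symbols are plain strings rather than Token ordinals. A null dot symbol marks "dot at the end", which is roughly the role Token.UNKNOWN_TOKEN appears to play in the partition code later in this post.

```java
import java.util.ArrayList;
import java.util.List;

class ItemDemo {
    // Hypothetical stand-in for Production: left side, right side and the dot position.
    record Item(String lhs, List<String> rhs, int dotPos) {

        // Symbol right after the dot, or null when the dot is at the end.
        String dotSymbol() {
            return dotPos < rhs.size() ? rhs.get(dotPos) : null;
        }

        // A new item with the dot moved one symbol to the right; this item is unchanged.
        Item dotForward() {
            return new Item(lhs, rhs, dotPos + 1);
        }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>(rhs);
            parts.add(dotPos, ".");
            return lhs + " -> " + String.join(" ", parts);
        }
    }

    public static void main(String[] args) {
        Item start = new Item("s", List.of("e"), 0);
        System.out.println(start);              // s -> . e
        System.out.println(start.dotSymbol());  // e
        System.out.println(start.dotForward()); // s -> e .
    }
}
```

In this sketch dotForward returns a new item instead of mutating the old one. As a result, a state's items stay untouched when successor states are derived from them.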

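The method responsible for this step is in the StateNodeManager class. It first checks whether a parse table (the file lrStateTable.sb) has already been built in the current directory. If so, there is no need to build it again.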

productionManager.buildFirstSets(); can be skipped for now; it is covered later.

ProductionsStateNode is the class that describes a state node:

public static int stateNumCount = 0;
/** Automaton state node number */
public int stateNum;
/** production of state node */
public ArrayList<Production> productions;

Next, the productions of the start symbol are used to create the first state node. This completes the initialization step:

public void buildTransitionStateMachine() {
    File table = new File("lrStateTable.sb");
    if (table.exists()) {
        return;
    }
    ProductionManager productionManager = ProductionManager.getInstance();
    productionManager.buildFirstSets();
    ProductionsStateNode state = getStateNode(productionManager.getProduction(Token.PROGRAM.ordinal()));

    state.buildTransition();

    debugPrintStateMap();
}

Computing the closure of the start production

Remember the . from before, i.e. dotPos in Production? This is where it comes into play: the closure is computed with the help of this dot.

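The closure works on the symbol to the right of the dot. If that symbol is a non-terminal, there must be productions with that non-terminal on the left of ->, and all of them are added: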

s -> . e
e -> . e + t
e -> . t

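This step is repeated on every newly added production. It stops once every production of each non-terminal that appears right after a dot has been brought in. For state 0 this also pulls in t -> . t * f, t -> . f, f -> . ( e ) and f -> . NUM. This is what the makeClosure method in ProductionsStateNode does.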

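The main logic is to push all of the node's productions onto a stack. It then keeps applying the closure step until the stack is empty. closureSet holds each node's productions after the closure: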

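private void makeClosure() {
    Stack<Production> productionStack = new Stack<Production>();
    for (Production production : productions) {
        // The node's own productions belong to the closure too; the contains check
        // keeps this harmless if closureSet was already seeded elsewhere.
        if (!closureSet.contains(production)) {
            closureSet.add(production);
        }
        productionStack.push(production);
    }

    while (!productionStack.empty()) {
        Production production = productionStack.pop();

        // A terminal after the dot has no productions to pull in.
        if (Token.isTerminal(production.getDotSymbol())) {
            ConsoleDebugColor.outlnPurple("Symbol after dot is not non-terminal, ignore and process next item");
            continue;
        }

        // Add every production of the non-terminal after the dot that is not yet in the closure,
        // and push it so its own dot symbol gets expanded as well.
        int symbol = production.getDotSymbol();
        ArrayList<Production> closures = productionManager.getProduction(symbol);
        for (int i = 0; closures != null && i < closures.size(); i++) {
            if (!closureSet.contains(closures.get(i))) {
                closureSet.add(closures.get(i));
                productionStack.push(closures.get(i));
            }
        }
    }
}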
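The same stack-based closure can be run on the example grammar as a standalone sketch. This is not the project's code. The grammar is a hypothetical map from each non-terminal to its right-hand sides. A symbol counts as a non-terminal exactly when it has an entry in that map. The kernel items are put into the closure set before expanding, so the result is the complete item set of state 0.

```java
import java.util.*;

class ClosureDemo {
    // Hypothetical item type: a production with a dot position.
    record Item(String lhs, List<String> rhs, int dot) {
        String dotSymbol() { return dot < rhs.size() ? rhs.get(dot) : null; }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>(rhs);
            parts.add(dot, ".");
            return lhs + " -> " + String.join(" ", parts);
        }
    }

    // The example grammar: non-terminal -> list of right-hand sides.
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "s", List.of(List.of("e")),
        "e", List.of(List.of("e", "+", "t"), List.of("t")),
        "t", List.of(List.of("t", "*", "f"), List.of("f")),
        "f", List.of(List.of("(", "e", ")"), List.of("NUM")));

    static Set<Item> closure(Collection<Item> kernel) {
        Set<Item> closureSet = new LinkedHashSet<>(kernel); // kernel items belong to the closure
        Deque<Item> stack = new ArrayDeque<>(kernel);
        while (!stack.isEmpty()) {
            String symbol = stack.pop().dotSymbol();
            // Terminal, or dot at the end: nothing to expand.
            List<List<String>> bodies = symbol == null ? null : GRAMMAR.get(symbol);
            if (bodies == null) {
                continue;
            }
            for (List<String> body : bodies) {
                Item added = new Item(symbol, body, 0);
                if (closureSet.add(added)) { // push only items seen for the first time
                    stack.push(added);
                }
            }
        }
        return closureSet;
    }

    public static void main(String[] args) {
        closure(List.of(new Item("s", List.of("e"), 0))).forEach(System.out::println);
    }
}
```

Running it prints the seven items of state 0. These are the three listed earlier plus t -> . t * f, t -> . f, f -> . ( e ) and f -> . NUM.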

Partitioning the closure

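Productions that have the same symbol (terminal or non-terminal) to the right of the dot go into the same partition. For example,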

s -> . e
e -> . e + t

form one partition. Finally, the dot in every production of a partition is moved one position to the right, forming a new state node:

s -> e .
e -> e . + t

Partitioning is done in the partition method of ProductionsStateNode.

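The logic is simple. Iterate over the current closureSet, skipping productions whose dot symbol is Token.UNKNOWN_TOKEN. If no partition exists yet for the symbol to the right of the dot, create one with that symbol as the key and a production list as the value. Then add the production to that list if it is not already there.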

private void partition() {
    ConsoleDebugColor.outlnPurple("==== state begin make partition ====");

    for (Production production : closureSet) {
        int symbol = production.getDotSymbol();
        if (symbol == Token.UNKNOWN_TOKEN.ordinal()) {
            continue;
        }

        ArrayList<Production> productionList = partition.get(symbol);
        if (productionList == null) {
            productionList = new ArrayList<>();
            partition.put(production.getDotSymbol(), productionList);
        }

        if (!productionList.contains(production)) {
            productionList.add(production);
        }
    }

    debugPrintPartition();
    ConsoleDebugColor.outlnPurple("==== make partition end ====");
}
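Here is a standalone sketch of the same grouping. It uses hypothetical string symbols, and a null dot symbol stands in for Token.UNKNOWN_TOKEN. The input is the closure of state 0, written out by hand. A LinkedHashMap keeps the partitions in the order their symbols are first seen, which keeps the debug output stable. This post does not show whether the project's partition map is ordered.

```java
import java.util.*;

class PartitionDemo {
    record Item(String lhs, List<String> rhs, int dot) {
        String dotSymbol() { return dot < rhs.size() ? rhs.get(dot) : null; }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>(rhs);
            parts.add(dot, ".");
            return lhs + " -> " + String.join(" ", parts);
        }
    }

    // Groups items by the symbol after the dot; items with the dot at the end are skipped.
    static Map<String, List<Item>> partition(Collection<Item> closureSet) {
        Map<String, List<Item>> partition = new LinkedHashMap<>();
        for (Item item : closureSet) {
            String symbol = item.dotSymbol();
            if (symbol == null) {
                continue;
            }
            List<Item> group = partition.computeIfAbsent(symbol, k -> new ArrayList<>());
            if (!group.contains(item)) {
                group.add(item);
            }
        }
        return partition;
    }

    // The closure of state 0 for the example grammar, written out by hand.
    static List<Item> state0Closure() {
        return List.of(
            new Item("s", List.of("e"), 0),
            new Item("e", List.of("e", "+", "t"), 0),
            new Item("e", List.of("t"), 0),
            new Item("t", List.of("t", "*", "f"), 0),
            new Item("t", List.of("f"), 0),
            new Item("f", List.of("(", "e", ")"), 0),
            new Item("f", List.of("NUM"), 0));
    }

    public static void main(String[] args) {
        partition(state0Closure()).forEach((symbol, group) -> System.out.println(symbol + ": " + group));
    }
}
```

The first line printed is e: [s -> . e, e -> . e + t], which is the partition used as the example above.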

Building transitions for every partition

The symbol to the left of the dot in a new node determines which input leads into that node.

For example, if the symbol to the left of the dot is t, then when the machine is in state 0 and the input is t, it jumps to state 1.

If the symbol to the left of the dot is e, then when the machine is in state 0 and the input symbol is e, it jumps to state 2:
0 – e -> 2

This is implemented in the makeTransition method of ProductionsStateNode.

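The main logic iterates over all partitions, and each partition becomes a new node. The partition's key is its transition symbol, i.e. the symbol that was to the right of the dot in the original productions. The method then creates the new node and records the transition between the two nodes.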

private void makeTransition() {
    for (Map.Entry<Integer, ArrayList<Production>> entry : partition.entrySet()) {
        ProductionsStateNode nextState = makeNextStateNode(entry.getKey());

        transition.put(entry.getKey(), nextState);

        stateNodeManager.addTransition(this, nextState, entry.getKey());
    }

    debugPrintTransition();

    extendFollowingTransition();
}

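makeNextStateNode is also simple. It takes the partition's production list and moves the dot in each production one position to the right. It then passes the result to stateNodeManager.getStateNode, which returns the corresponding node.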

private ProductionsStateNode makeNextStateNode(int left) {
    ArrayList<Production> productions = partition.get(left);
    ArrayList<Production> newProductions = new ArrayList<>();

    for (int i = 0; i < productions.size(); i++) {
        Production production = productions.get(i);
        newProductions.add(production.dotForward());
    }

    return stateNodeManager.getStateNode(newProductions);
}
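The makeTransition/makeNextStateNode pair can be sketched on its own. For two of state 0's partitions, the sketch below computes the kernel of the state that each symbol leads to. It leaves out the step of looking up or creating a node for that kernel (getStateNode in the project). The names Item and nextKernels are hypothetical.

```java
import java.util.*;

class NextStateDemo {
    record Item(String lhs, List<String> rhs, int dot) {
        Item dotForward() { return new Item(lhs, rhs, dot + 1); }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>(rhs);
            parts.add(dot, ".");
            return lhs + " -> " + String.join(" ", parts);
        }
    }

    // For each partition, advance the dot in every item; the result is the
    // kernel of the state reached on that partition's symbol.
    static Map<String, List<Item>> nextKernels(Map<String, List<Item>> partition) {
        Map<String, List<Item>> next = new LinkedHashMap<>();
        for (Map.Entry<String, List<Item>> entry : partition.entrySet()) {
            List<Item> kernel = new ArrayList<>();
            for (Item item : entry.getValue()) {
                kernel.add(item.dotForward());
            }
            next.put(entry.getKey(), kernel);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<Item>> partition = new LinkedHashMap<>();
        partition.put("e", List.of(new Item("s", List.of("e"), 0), new Item("e", List.of("e", "+", "t"), 0)));
        partition.put("t", List.of(new Item("e", List.of("t"), 0), new Item("t", List.of("t", "*", "f"), 0)));
        // Prints "0 -e-> [s -> e ., e -> e . + t]" and "0 -t-> [e -> t ., t -> t . * f]"
        nextKernels(partition).forEach((symbol, kernel) -> System.out.println("0 -" + symbol + "-> " + kernel));
    }
}
```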

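stateNodeManager has come up several times already. It is an instance of StateNodeManager, which manages and allocates nodes and makes sure each distinct node exists only once. Node compression and the final construction of the parse table also happen there, but that comes later.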

The two StateNodeManager methods used above:

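transitionMap works as a transition table. Its key is the source node and its value is a map. In that inner map, the key is the transition symbol (an input terminal or non-terminal) and the value is the target node.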

public void addTransition(ProductionsStateNode from, ProductionsStateNode to, int on) {
    HashMap<Integer, ProductionsStateNode> map = transitionMap.get(from);
    if (map == null) {
        map = new HashMap<>();
    }

    map.put(on, to);
    transitionMap.put(from, map);
}
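The nested-map transition table can be written more compactly with Map.computeIfAbsent, which creates the inner map on first use. In the original, putting the map back when it already exists is harmless but redundant. This sketch uses int state numbers and String symbols instead of ProductionsStateNode and int tokens. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TransitionTableDemo {
    // from-state -> (input symbol -> to-state)
    private final Map<Integer, Map<String, Integer>> transitionMap = new HashMap<>();

    // Records a transition from 'from' to 'to' on input 'on';
    // the inner map is created the first time 'from' is seen.
    void addTransition(int from, int to, String on) {
        transitionMap.computeIfAbsent(from, k -> new HashMap<>()).put(on, to);
    }

    // Target state for (from, on), or null if no such transition exists.
    Integer next(int from, String on) {
        Map<String, Integer> row = transitionMap.get(from);
        return row == null ? null : row.get(on);
    }

    public static void main(String[] args) {
        TransitionTableDemo table = new TransitionTableDemo();
        table.addTransition(0, 2, "e"); // 0 -e-> 2, as in the text
        table.addTransition(0, 1, "t"); // 0 -t-> 1
        System.out.println(table.next(0, "e")); // 2
        System.out.println(table.next(0, "+")); // null
    }
}
```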

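getStateNode first checks whether the node has already been created; every created node is added to stateList. If the node is new, it is added and returned. Otherwise the existing node is returned.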

public ProductionsStateNode getStateNode(ArrayList<Production> productions) {
    ProductionsStateNode node = new ProductionsStateNode(productions);

    if (!stateList.contains(node)) {
        stateList.add(node);
        ProductionsStateNode.increaseStateNum();
        return node;
    }

    for (ProductionsStateNode sn : stateList) {
        if (sn.equals(node)) {
            node = sn;
        }
    }

    return node;
}
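getStateNode scans stateList linearly and relies on ProductionsStateNode.equals. A hypothetical alternative is to key a HashMap by the node's item set, so each lookup is a hash lookup instead of a scan. The sketch assumes two nodes are equal exactly when they contain the same items, in any order. This post does not show whether the project's equals ignores order. The names StateRegistryDemo and Item are illustrative only.

```java
import java.util.*;

class StateRegistryDemo {
    // Hypothetical item type; records give value-based equals/hashCode for free.
    record Item(String lhs, List<String> rhs, int dot) {}

    private final Map<Set<Item>, Integer> stateIds = new HashMap<>();
    private final List<Set<Item>> stateList = new ArrayList<>();

    // Returns the number of the state with these items, creating it only if it is new.
    int getStateNode(Collection<Item> items) {
        Set<Item> key = Set.copyOf(items); // order-insensitive, immutable key
        Integer existing = stateIds.get(key);
        if (existing != null) {
            return existing;
        }
        int stateNum = stateList.size();
        stateList.add(key);
        stateIds.put(key, stateNum);
        return stateNum;
    }

    int stateCount() {
        return stateList.size();
    }

    public static void main(String[] args) {
        StateRegistryDemo manager = new StateRegistryDemo();
        Item a = new Item("s", List.of("e"), 1);
        Item b = new Item("e", List.of("e", "+", "t"), 1);
        System.out.println(manager.getStateNode(List.of(a, b))); // 0 (new)
        System.out.println(manager.getStateNode(List.of(b, a))); // 0 (same items, reused)
        System.out.println(manager.getStateNode(List.of(new Item("e", List.of("t"), 1)))); // 1 (new)
        System.out.println(manager.stateCount()); // 2
    }
}
```

Using an immutable set as the key matters. If a key were mutated after insertion, its hash code would change and the map could no longer find it.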

Repeating the construction for every new node

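At this point only the first round of new nodes is done. Construction is truly finished only once every node has built its own transitions. That is the purpose of extendFollowingTransition, which makeTransition calls: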

private void extendFollowingTransition() {
    for (Map.Entry<Integer, ProductionsStateNode> entry : transition.entrySet()) {
        ProductionsStateNode state = entry.getValue();
        if (!state.isTransitionDone()) {
            state.buildTransition();
        }
    }
}

Summary

The four steps for building the finite state automaton:

makeClosure
partition
makeTransition
Finally, repeat these steps until every node has been built

At this point, all four steps of buildTransition have been covered:

public void buildTransition() {
    if (transitionDone) {
        return;
    }
    transitionDone = true;

    makeClosure();
    partition();
    makeTransition();
}

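With that, the automaton is built, and the parse table would be the next thing to create. However, this automaton still has some problems, which the next post will address.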
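Putting the steps together, here is a self-contained sketch of the whole construction for the example grammar. It mirrors the structure above: a transitionDone flag, then closure, partition and transitions, then recursion into successor states. The types are hypothetical simplifications. Symbols are strings, and states are deduplicated by the set of their kernel items. State numbers depend on iteration order and need not match the numbering used earlier in this post.

```java
import java.util.*;

class Lr0AutomatonDemo {
    record Item(String lhs, List<String> rhs, int dot) {
        String dotSymbol() { return dot < rhs.size() ? rhs.get(dot) : null; }
        Item dotForward() { return new Item(lhs, rhs, dot + 1); }
    }

    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "s", List.of(List.of("e")),
        "e", List.of(List.of("e", "+", "t"), List.of("t")),
        "t", List.of(List.of("t", "*", "f"), List.of("f")),
        "f", List.of(List.of("(", "e", ")"), List.of("NUM")));

    static final Map<Set<Item>, State> stateByKernel = new HashMap<>();
    static final List<State> stateList = new ArrayList<>();

    static final class State {
        final int stateNum;
        final List<Item> kernel;
        final Map<String, State> transition = new LinkedHashMap<>();
        boolean transitionDone = false;

        State(int stateNum, List<Item> kernel) {
            this.stateNum = stateNum;
            this.kernel = kernel;
        }

        void buildTransition() {
            if (transitionDone) return; // stops endless recursion on cyclic transitions
            transitionDone = true;

            // makeClosure: kernel items plus everything reachable through the dot symbols
            Set<Item> closureSet = new LinkedHashSet<>(kernel);
            Deque<Item> stack = new ArrayDeque<>(kernel);
            while (!stack.isEmpty()) {
                String symbol = stack.pop().dotSymbol();
                List<List<String>> bodies = symbol == null ? null : GRAMMAR.get(symbol);
                if (bodies == null) continue; // terminal or dot at the end
                for (List<String> body : bodies) {
                    Item added = new Item(symbol, body, 0);
                    if (closureSet.add(added)) stack.push(added);
                }
            }

            // partition: group items by the symbol after the dot
            Map<String, List<Item>> partition = new LinkedHashMap<>();
            for (Item item : closureSet) {
                String symbol = item.dotSymbol();
                if (symbol != null) partition.computeIfAbsent(symbol, k -> new ArrayList<>()).add(item);
            }

            // makeTransition: advance the dot and look up or create the target state
            for (Map.Entry<String, List<Item>> entry : partition.entrySet()) {
                List<Item> next = new ArrayList<>();
                for (Item item : entry.getValue()) next.add(item.dotForward());
                transition.put(entry.getKey(), getStateNode(next));
            }

            // extendFollowingTransition: build every successor that is not done yet
            for (State next : transition.values()) {
                if (!next.transitionDone) next.buildTransition();
            }
        }
    }

    static State getStateNode(List<Item> kernel) {
        return stateByKernel.computeIfAbsent(Set.copyOf(kernel), k -> {
            State s = new State(stateList.size(), kernel);
            stateList.add(s);
            return s;
        });
    }

    // Clears previous results and builds the automaton from s -> . e.
    static State build() {
        stateByKernel.clear();
        stateList.clear();
        State start = getStateNode(List.of(new Item("s", List.of("e"), 0)));
        start.buildTransition();
        return start;
    }

    public static void main(String[] args) {
        build();
        for (State s : stateList) {
            for (Map.Entry<String, State> e : s.transition.entrySet()) {
                System.out.println(s.stateNum + " -" + e.getKey() + "-> " + e.getValue().stateNum);
            }
        }
        System.out.println("states: " + stateList.size());
    }
}
```

For this grammar the sketch ends up with 12 states. That is the usual number of LR(0) item sets for this classic expression grammar.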

Also, my GitHub blog: https://dejavudwh.cn/