Let's Build a Simple Interpreter (Part 3)

Date: 2020-03-04


Translated from: https://ruslanspivak.com/lsbasi-part3/
(with the author's permission)

I woke up this morning and thought to myself: "Why do we find it so difficult to learn a new skill?"

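I don't think it is simply a lack of hard work. I think one of the reasons might be that we spend a lot of time and effort acquiring knowledge by reading and watching, and not enough time turning that knowledge into a skill by practicing it. Take swimming, for example. You can spend a lot of time reading hundreds of books about swimming, talk for hours with experienced swimmers and coaches, and watch every training video available, and you will still sink like a rock the first time you jump into the pool.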

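How well we think we know our subject does not really matter. What matters is putting that knowledge into practice, because that is how it turns into a skill. To help you with the practice, I put exercises into Part 1 and Part 2, and I promise to add more exercises in today's article and in future articles :)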

Okay, let's get started!

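So far, you have learned how to interpret arithmetic expressions that add or subtract two integers, such as "7 + 3" or "12 - 9". Today I will talk about how to parse (recognize) and interpret arithmetic expressions that contain any number of plus or minus operators, such as "7 - 3 + 2 - 1".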

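We can represent the arithmetic expressions handled in this article with the following syntax diagram:

[Figure: syntax diagram for expr, showing a term followed by zero or more repetitions of a plus or minus sign and another term]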

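What is a syntax diagram? A syntax diagram is a graphical representation of a programming language's syntax rules. Basically, a syntax diagram shows visually which statements the language allows and which it does not.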

Syntax diagrams are very easy to read: just follow the paths indicated by the arrows. Some paths indicate choices, and some indicate loops.

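Let's read the syntax diagram above: a term is optionally followed by a plus or minus sign and another term, which in turn is optionally followed by a plus or minus sign and another term, and so on. You might wonder what a term is. In this article, a term is just an integer.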

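Syntax diagrams serve two main purposes:

1. They graphically represent the specification (grammar) of a programming language.
2. They can be used to help write a parser: we can map a diagram to code by following simple rules.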

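You have learned that the process of recognizing a phrase in a stream of tokens is called parsing, and the part of an interpreter or compiler that does this job is called a parser. Parsing is also called syntax analysis, and a parser is accordingly also called a syntax analyzer.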
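To make "a stream of tokens" concrete, here is a minimal sketch of what the lexer hands to the parser for one input. The `tokenize` helper and the `TOKEN_RE` pattern are hypothetical names used only for illustration; the article's own lexer produces `Token` objects one at a time through `get_next_token` instead.

```python
import re

# Hypothetical helper, not part of the article's code: it splits the
# whole input into (type, value) pairs in one pass.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([+-]))")


def tokenize(text):
    tokens = []
    pos = 0
    text = text.rstrip()  # trailing whitespace carries no token
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # Any character that is not a digit, '+', '-' or whitespace
            raise Exception('Invalid syntax')
        number, operator = match.groups()
        if number is not None:
            tokens.append(('INTEGER', int(number)))
        elif operator == '+':
            tokens.append(('PLUS', '+'))
        else:
            tokens.append(('MINUS', '-'))
        pos = match.end()
    tokens.append(('EOF', None))
    return tokens


print(tokenize("7 - 3 + 2 - 1"))
```

The parser never looks at the raw characters; it only asks whether this sequence of token types forms a phrase the grammar allows.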

According to the syntax diagram above, all of the following arithmetic expressions are valid:

3
3 + 4
7-3 + 2-1
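As a quick cross-check, the same rule (a term, then zero or more "plus or minus, term" pairs) can be written as a regular expression. This is only a sketch for experimenting with the diagram, not the technique the article uses; `EXPR_RE` and `follows_diagram` are names made up for this example.

```python
import re

# term ((PLUS | MINUS) term)*, where a term is a non-negative integer.
# Whitespace is allowed around every token.
EXPR_RE = re.compile(r"\s*\d+(\s*[+-]\s*\d+)*\s*")


def follows_diagram(text):
    """Return True if the whole string matches the diagram."""
    return EXPR_RE.fullmatch(text) is not None


for sample in ["3", "3 + 4", "7-3 + 2-1", "3 +"]:
    print(repr(sample), follows_diagram(sample))
```

The first three strings are accepted, while "3 +" is rejected because no term follows the plus sign.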

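Because the syntax rules for arithmetic expressions are very similar across programming languages, we can use a Python shell to "test" the syntax diagram. Start your Python shell and see for yourself: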

>>> 3
3
>>> 3 + 4
7
>>> 7 - 3 + 2 - 1
5

No surprises here; the results are just as we expected.

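Note that the expression "3 +" is not a valid arithmetic expression, because according to the syntax diagram a plus sign must be followed by a term (an integer); otherwise it is a syntax error. Try it in the Python shell again: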

>>> 3 +
  File "<stdin>", line 1
    3 +
      ^
SyntaxError: invalid syntax
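If you would rather run this check from a script than type into the shell, Python's built-in `compile` function raises the same `SyntaxError`. The helper name `is_valid_python_expression` is an assumption for this sketch. Keep in mind that Python's grammar accepts far more than our diagram does (for example "3 * 4"), so this only tests Python's rules, not ours.

```python
def is_valid_python_expression(source):
    """Return True if Python can parse `source` as a single expression."""
    try:
        # mode "eval" means: parse exactly one expression, do not run it
        compile(source, "<string>", "eval")
    except SyntaxError:
        return False
    return True


for sample in ["3", "3 + 4", "7 - 3 + 2 - 1", "3 +"]:
    print(repr(sample), is_valid_python_expression(sample))
```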

Being able to test with the Python shell is great, but we would rather map the syntax diagram above to code and test with our own interpreter.

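You know from the previous articles (Part 1 and Part 2) that the expr function implements both the parser and the interpreter. The parser only recognizes the structure, making sure it corresponds to some specification, and once the parser has successfully recognized (parsed) the expression, the interpreter actually evaluates it.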

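The following snippet shows the parser code that corresponds to the diagram. The rectangular box in the syntax diagram becomes the term function, which parses an integer, and the expr function simply follows the flow of the syntax diagram: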

def term(self):
    self.eat(INTEGER)

def expr(self):
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            self.term()

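You can see that expr first calls the term function. Then expr has a while loop that can execute zero or more times. Inside the loop, the parser makes a choice based on the token (whether it is a plus sign or a minus sign). The code above does indeed follow the flow of the syntax diagram for arithmetic expressions.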
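The snippet above depends on the rest of the Interpreter class, so it cannot be run by itself. Here is a minimal standalone sketch of the same recognition logic that you can try right away. It assumes the lexer has already run and produced a plain list of token types ending in EOF; the `Recognizer` class and the `recognizes` helper are hypothetical names for this illustration.

```python
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Recognizer(object):
    """Parser only: checks the structure and computes nothing."""

    def __init__(self, token_types):
        # Assumption: the list was produced by a lexer and ends with EOF.
        self.token_types = token_types
        self.pos = 0

    @property
    def current(self):
        return self.token_types[self.pos]

    def eat(self, token_type):
        # Move past the current token only if it has the expected type
        if self.current != token_type:
            raise Exception('Invalid syntax')
        self.pos += 1

    def term(self):
        self.eat(INTEGER)

    def expr(self):
        self.term()
        # The loop in the diagram: zero or more (PLUS | MINUS) term
        while self.current in (PLUS, MINUS):
            self.eat(self.current)
            self.term()


def recognizes(token_types):
    try:
        Recognizer(token_types).expr()
    except Exception:
        return False
    return True


print(recognizes([INTEGER, MINUS, INTEGER, PLUS, INTEGER, EOF]))  # True
print(recognizes([INTEGER, PLUS, EOF]))                           # False
```

Notice that no arithmetic happens anywhere: the recognizer only answers whether the token sequence follows the diagram.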

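The parser itself does not interpret anything: if it recognizes the expression it stays silent, and if it cannot, it raises a syntax error.
Let's modify the expr function and add the interpreter code: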

def term(self):
    """Return an INTEGER token value"""
    token = self.current_token
    self.eat(INTEGER)
    return token.value

def expr(self):
    """Parser / Interpreter """
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    result = self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            result = result + self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            result = result - self.term()

    return result

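Because the interpreter needs to evaluate the expression, the term function was modified to return an integer value, and the expr function was modified to perform addition and subtraction at the appropriate places and return the result of the interpretation.
Even though the code is very simple, I recommend that you spend some time studying it.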
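One way to study it is to watch how result changes on each pass through the loop. The sketch below replays only the accumulation step, with the tokens already split into a first term and a list of (operator, term) pairs; `evaluate_with_trace` is a hypothetical helper, not part of the calculator.

```python
def evaluate_with_trace(first, pairs):
    """first: the initial term; pairs: the (operator, term) tuples after it."""
    result = first
    trace = [result]
    for operator, value in pairs:
        if operator == '+':
            result = result + value
        elif operator == '-':
            result = result - value
        else:
            raise Exception('Invalid syntax')
        trace.append(result)  # record the running result after each step
    return result, trace


# "7 - 3 + 2 - 1"
result, trace = evaluate_with_trace(7, [('-', 3), ('+', 2), ('-', 1)])
print(trace)   # [7, 4, 6, 5]
print(result)  # 5
```

Each operator is applied to the running result as soon as its right-hand term has been read, so the expression is evaluated strictly from left to right.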

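Now let's look at the complete interpreter code.

Here is the source code for the new version of the calculator, which can handle valid arithmetic expressions containing integers and any number of addition and subtraction operators: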

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5 + 3", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    ##########################################################
    # Lexer code                                             #
    ##########################################################
    def error(self):
        raise Exception('Invalid syntax')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    ##########################################################
    # Parser / Interpreter code                              #
    ##########################################################
    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def term(self):
        """Return an INTEGER token value."""
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter."""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        result = self.term()
        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # To run under Python3 replace 'raw_input' call
            # with 'input'
            text = raw_input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

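Save the code above to a file named calc3.py, or download it directly from GitHub. It can handle the arithmetic expressions derived from the syntax diagram shown earlier.

Here is a sample session that I ran on my laptop: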

$ python calc3.py
calc> 3
3
calc> 7 - 4
3
calc> 10 + 5
15
calc> 7 - 3 + 2 - 1
5
calc> 10 + 1 + 2 - 3 + 4 + 6 - 15
5
calc> 3 +
Traceback (most recent call last):
  File "calc3.py", line 147, in <module>
    main()
  File "calc3.py", line 142, in main
    result = interpreter.expr()
  File "calc3.py", line 123, in expr
    result = result + self.term()
  File "calc3.py", line 110, in term
    self.eat(INTEGER)
  File "calc3.py", line 105, in eat
    self.error()
  File "calc3.py", line 45, in error
    raise Exception('Invalid syntax')
Exception: Invalid syntax
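The session ends with a traceback because main does not catch the exception raised by error. If you would like the prompt to survive a bad input, one option is to catch the exception for each line. The sketch below is an assumption about how you might do that, not part of the original calculator: `run_line` is a hypothetical helper, and `stand_in_expr` is a small stand-in that plays the role of `Interpreter(text).expr()` so that the example runs without the rest of the file.

```python
import re


def stand_in_expr(text):
    """Stand-in for Interpreter(text).expr(), for this example only."""
    if re.fullmatch(r"\s*\d+(\s*[+-]\s*\d+)*\s*", text) is None:
        raise Exception('Invalid syntax')
    total = 0
    # Each match is one integer with its optional sign, e.g. "- 3"
    for signed_number in re.findall(r"[+-]?\s*\d+", text):
        total += int("".join(signed_number.split()))
    return total


def run_line(evaluate, text):
    """Return what the prompt should print for one line of input."""
    try:
        return str(evaluate(text))
    except Exception as exc:
        # Report the problem and let the caller keep reading input
        return 'Error: {}'.format(exc)


print(run_line(stand_in_expr, "7 - 3 + 2 - 1"))  # 5
print(run_line(stand_in_expr, "3 +"))            # Error: Invalid syntax
```

In calc3.py the same idea would mean wrapping the call to interpreter.expr() inside main in a try/except block.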

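Remember the exercises I mentioned at the beginning of the article? Now it is time to do them:

1. Draw a syntax diagram for arithmetic expressions that contain only multiplication and division, for example "7 * 4 / 2 * 3".
2. Modify the source code of the calculator to interpret arithmetic expressions that contain only multiplication and division, for example "7 * 4 / 2 * 3".
3. Write an interpreter from scratch that handles arithmetic expressions such as "7 - 3 + 2 - 1". Use any programming language you are comfortable with, and write it off the top of your head without looking at the examples. Think about the components involved: a lexer that takes the input and converts it into a stream of tokens; a parser that takes the stream of tokens provided by the lexer and tries to recognize a structure in it; and an interpreter that generates the result after the parser has successfully parsed (recognized) a valid arithmetic expression. Put those pieces together, and spend some time turning the knowledge you have acquired into a working interpreter for arithmetic expressions.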

Finally, check your understanding:

1. What is a syntax diagram?
2. What is syntax analysis?
3. What is a syntax analyzer?

Hey, look! You read all the way to the end. Thanks for hanging out here today, and don't forget to do the exercises :) Stay tuned.