A Simple Parser Primer [Part 3]: A Parser for Digits, Letters and Brackets

We will design a parser for strings like the following:

str1 = "ab2(c3(d)2(ef)"

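Each number means that the bracketed content right after it is printed that many times:

For example, 3(d) prints ddd, and 2(ef) prints efef.

When brackets are nested, the inner bracket is expanded first, and the result is then repeated by the outer number:

For example, with 2(c3(d)) we first repeat d three times to get 2(cddd), then repeat the expanded content twice: cdddcddd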
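To make the expansion rule concrete, here is a minimal sketch that expands such a string directly with a stack. It is not the tokenizer-plus-tree approach this article builds; the function name `expand` is hypothetical, and the sketch assumes lowercase letters, non-negative integer counts, and balanced brackets.

```python
def expand(text):
    """Expand count(group) patterns, e.g. '3(d)' -> 'ddd' (hypothetical helper)."""
    stack = []    # one (prefix, count) pair per bracket that is still open
    current = ''  # text built so far at the current nesting level
    count = 0     # repeat count being read, one digit at a time
    for ch in text:
        if ch.isdigit():
            count = count * 10 + int(ch)
        elif ch == '(':
            # Remember what came before the group and how often to repeat it.
            stack.append((current, count))
            current, count = '', 0
        elif ch == ')':
            # Close the group: repeat it and glue it onto the saved prefix.
            prefix, times = stack.pop()
            current = prefix + current * times
        else:
            current += ch
    return current


print(expand('3(d)'))
print(expand('2(c3(d))'))
```

This is handy as a reference when checking what a given input should expand to. Input with an unclosed bracket is outside what this sketch handles.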

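Digits, letters and brackets are different kinds of element, so we define a set of element categories and build a tokenizer that splits the input first. The idea is the same as in A Simple Parser Primer [Part 2]: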

from enum import Enum

class Mark1(Enum):  # token categories
    En = 0
    LeftBracket = 1
    RightBracket = 2
    Numb = 3

We build a lookup dictionary so that each incoming character can be classified quickly:

class Parser2:

    def __init__(self):
        words = 'abcdefghijklmnopqrstuvwxyz'
        # ------ 12345678901234567890123456
        nums = '0123456789'
        self.search_dict = {}  # build the lookup dictionary
        for c in words:
            self.search_dict[c] = Mark1.En
        for n in nums:
            self.search_dict[n] = Mark1.Numb
        self.search_dict['('] = Mark1.LeftBracket
        self.search_dict[')'] = Mark1.RightBracket

        self.pos = 0
        self.length = None
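As a small standalone illustration of the lookup idea, the sketch below builds the same kind of character-to-category table and classifies a short string. The names `Category` and `build_lookup` are hypothetical stand-ins, not part of the article's class.

```python
import string
from enum import Enum


class Category(Enum):  # hypothetical stand-in for Mark1
    LETTER = 0
    LEFT_BRACKET = 1
    RIGHT_BRACKET = 2
    DIGIT = 3


def build_lookup():
    """Map every supported character to its category."""
    lookup = {c: Category.LETTER for c in string.ascii_lowercase}
    lookup.update({d: Category.DIGIT for d in string.digits})
    lookup['('] = Category.LEFT_BRACKET
    lookup[')'] = Category.RIGHT_BRACKET
    return lookup


lookup = build_lookup()
# Classify each character of a short input with one dictionary lookup each.
categories = [lookup[ch].name for ch in '2(ef)']
print(categories)
```

Because classification is a single dictionary lookup per character, it takes constant time on average regardless of how many characters are supported.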

We build a tokenizer in the same way as in A Simple Parser Primer [Part 2]:

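# -------------------- This method belongs to class Parser2 --------------------
    def str_to_ast(self, str_in):
        # type: (str) -> list
        word_list = []
        if not str_in:
            return word_list
        brackets = (Mark1.LeftBracket, Mark1.RightBracket)
        self.pos = 0  # start index of the token currently being built
        last_state = self.search_dict.get(str_in[0], Mark1.En)
        for i, c in enumerate(str_in):
            curr_state = self.search_dict.get(c, Mark1.En)
            # Cut when the category changes; a bracket is always a token of its own,
            # so that adjacent brackets such as '))' are not merged.
            if i > self.pos and (curr_state != last_state or curr_state in brackets):
                word_list.append((str_in[self.pos:i], last_state))
                self.pos = i
            last_state = curr_state
        word_list.append((str_in[self.pos:], last_state))  # flush the final token
        self.pos = 0  # iter_find reuses pos as an index into word_list
        print(word_list)
        print(list(map(lambda x: x[0], word_list)))
        return word_list

Let's call it and look at the result. The string is split into tokens, and each token is labelled with its category:

[('ab', <Mark1.En: 0>), ('2', <Mark1.Numb: 3>), ('(', <Mark1.LeftBracket: 1>), ('c', <Mark1.En: 0>), ('3', <Mark1.Numb: 3>), ('(', <Mark1.LeftBracket: 1>), ('d', <Mark1.En: 0>), (')', <Mark1.RightBracket: 2>), ('2', <Mark1.Numb: 3>), ('(', <Mark1.LeftBracket: 1>), ('ef', <Mark1.En: 0>), (')', <Mark1.RightBracket: 2>)]

['ab', '2', '(', 'c', '3', '(', 'd', ')', '2', '(', 'ef', ')']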
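For comparison, the same kind of token list can be produced with a regular expression. This is an alternative sketch, not the state-based tokenizer used in this article; `TOKEN_RE` and `tokenize` are hypothetical names, and the pattern assumes lowercase letters only.

```python
import re

# A run of letters, a run of digits, or exactly one bracket.
TOKEN_RE = re.compile(r'[a-z]+|[0-9]+|[()]')


def tokenize(text):
    """Split text into letter runs, digit runs and single brackets."""
    # Note: findall silently skips any character the pattern does not match.
    return TOKEN_RE.findall(text)


print(tokenize('ab2(c3(d)2(ef)'))
print(tokenize('2(c3(d))'))
```

Since the bracket alternative matches exactly one character, closing brackets that sit next to each other, as in `2(c3(d))`, come out as two separate `)` tokens. This version returns only the token text; the category would have to be recovered separately if the later steps need it.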

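Next we build the syntax tree recursively. The tree here is a little different from before: we use a dictionary to represent a case like 2(c3(d)), with the repeat count as the key and the bracketed content as the value: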

{2: ['c', {3: 'd'}]}

Our recursive parser ends up looking like this:

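# ---------------------- This method lives in class Parser2 ------------------
    @staticmethod
    def iter_find(word_list, p2_in):
        # type: (list, Parser2) -> Union[list, dict, str]
        # p2_in.pos is the index of the next token in word_list;
        # it must be 0 when iter_find is first called.
        save_list = []
        while p2_in.pos < len(word_list):
            curr_word = word_list[p2_in.pos]
            if curr_word[1] == Mark1.Numb:
                # Skip the number and the '(' assumed to follow it, then parse the group.
                p2_in.pos += 2
                tmp_dict = {int(curr_word[0]): Parser2.iter_find(word_list, p2_in)}
                save_list.append(tmp_dict)
            elif curr_word[1] == Mark1.RightBracket:
                # End of the current group: consume ')' and return to the caller.
                p2_in.pos += 1
                break
            else:
                save_list.append(curr_word[0])
                p2_in.pos += 1

        # A group with a single element is returned unwrapped.
        if len(save_list) == 1:
            return save_list[0]
        else:
            return save_list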
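The recursion can also be written without a shared position counter, by having each call return both the subtree and the index where it stopped. The sketch below is a hypothetical variant under that assumption: `build_tree` is not part of the article's class, and it works on plain token strings rather than (text, category) pairs.

```python
def build_tree(tokens, pos=0):
    """Return (tree, next_pos) for a flat list of token strings."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok.isdigit():
            # Skip the number and the '(' assumed to follow, then parse the group.
            child, pos = build_tree(tokens, pos + 2)
            items.append({int(tok): child})
        elif tok == ')':
            pos += 1  # consume the bracket and close this group
            break
        else:
            items.append(tok)
            pos += 1
    # Same convention as above: a single item is returned unwrapped.
    tree = items[0] if len(items) == 1 else items
    return tree, pos


tokens = ['2', '(', 'c', '3', '(', 'd', ')', ')']
tree, next_pos = build_tree(tokens)
print(tree)
```

Returning the position makes each call self-contained, so the same token list can be parsed again without resetting any state on an object.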

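We get the following tree. Note that the outer 2( in str1 is never closed; the loop simply stops at the end of the token list, which implicitly closes any group that is still open: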

['ab', {2: ['c', {3: 'd'}, {2: 'ef'}]}]

So how do we print it? With recursion again:

    @staticmethod
    def iter_print(ast_in):
        # Render the tree back to a string: lists are concatenated, dicts are repeated.
        if isinstance(ast_in, list):
            tmp_str = ''
            for e in ast_in:
                tmp_str += Parser2.iter_print(e)
            return tmp_str
        elif isinstance(ast_in, dict):
            nums = list(ast_in.keys())[0]  # each dict holds one {count: subtree} pair
            return nums * Parser2.iter_print(ast_in[nums])
        elif isinstance(ast_in, str):
            return ast_in
        else:
            return ''

The final printed result is:

abcdddefefcdddefef
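As a quick standalone check of the rendering step, the sketch below walks the tree shown earlier and rebuilds the string. The function `render` is a hypothetical, condensed variant of the recursive printer: it uses `str.join` and, unlike the version above, raises on unexpected node types instead of returning an empty string.

```python
def render(node):
    """Turn a tree of strings, lists and {count: subtree} dicts into text."""
    if isinstance(node, str):
        return node
    if isinstance(node, dict):
        # Repeat each subtree by its count (each dict is expected to hold one pair).
        return ''.join(render(child) * count for count, child in node.items())
    if isinstance(node, list):
        return ''.join(render(child) for child in node)
    raise TypeError('unexpected node type: %r' % type(node))


tree = ['ab', {2: ['c', {3: 'd'}, {2: 'ef'}]}]
result = render(tree)
print(result)
```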