We will design a parser for the following string:
str1 = "ab2(c3(d)2(ef)"
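Each time a number appears, the content of the parentheses that follow it is printed that many times:
for example, 3(d) prints ddd, and 2(ef) prints efef.
When parentheses are nested, the inner group is expanded first, and the result is then repeated by the outer number:
for example, for 2(c3(d)) we first print d three times, giving 2(cddd), and then repeat the contents twice: cdddcddd
Digits, letters, and parentheses are different kinds of elements, so we define a set of element categories and build a tokenizer to split the string into tokens first. The approach is the same as in 简易Parser入门【二】: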
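As a reference for the expected behaviour, the expansion rule above can be sketched with an explicit stack. This is not the parser built in the rest of this post; the function name expand is hypothetical, and the sketch assumes balanced parentheses and that every number is immediately followed by an opening parenthesis.

```python
def expand(text):
    """Expand count(group) patterns, e.g. '2(c3(d))' -> 'cdddcddd'.

    Assumes balanced parentheses; an unmatched ')' raises IndexError.
    """
    stack = []    # one (prefix, repeat count) pair per open parenthesis
    current = ''  # text built so far at the current nesting depth
    count = ''    # digits of the pending repeat count
    for ch in text:
        if ch.isdigit():
            count += ch
        elif ch == '(':
            # Assumption: a '(' with no number in front repeats once.
            stack.append((current, int(count) if count else 1))
            current, count = '', ''
        elif ch == ')':
            # Close the group: repeat it and glue it onto the saved prefix.
            prefix, times = stack.pop()
            current = prefix + current * times
        else:
            current += ch
    return current


print(expand('3(d)'))      # ddd
print(expand('2(ef)'))     # efef
print(expand('2(c3(d))'))  # cdddcddd
```

Inner groups are closed before outer ones, so the nested case is expanded from the inside out, exactly as described above.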
from enum import Enum


class Mark1(Enum):  # Define the token categories
    En = 0
    LeftBracket = 1
    RightBracket = 2
    Numb = 3

We build a lookup dictionary, search_dict, so that every incoming character can be classified quickly:

class Parser2:
    def __init__(self):
        words = 'abcdefghijklmnopqrstuvwxyz'
        # ------ 12345678901234567890123456
        nums = '0123456789'
        self.search_dict = {}
        # Build the lookup dictionary: character -> category
        for c in words:
            self.search_dict[c] = Mark1.En
        for n in nums:
            self.search_dict[n] = Mark1.Numb
        self.search_dict['('] = Mark1.LeftBracket
        self.search_dict[')'] = Mark1.RightBracket
        self.pos = 0
        self.length = None

We build the tokenizer in the same way as in 简易Parser入门【二】:
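    # -------------------- This function is part of class Parser2 ----------------------
    def str_to_ast(self, str_in):
        # type: (str) -> list
        word_list = []
        if not str_in:
            return word_list
        start = 0  # index where the current token begins
        last_state = self.search_dict.get(str_in[0], Mark1.En)
        brackets = (Mark1.LeftBracket, Mark1.RightBracket)
        for i, c in enumerate(str_in):
            curr_state = self.search_dict.get(c, Mark1.En)
            # Start a new token when the category changes. Brackets are always
            # single-character tokens, so '))' is not merged into one token.
            if i > start and (curr_state != last_state or curr_state in brackets):
                word_list.append((str_in[start:i], last_state))
                start = i
            last_state = curr_state
        # Flush the final token, which the loop above never appends.
        word_list.append((str_in[start:], last_state))
        print(word_list)
        print([w[0] for w in word_list])
        return word_list

Calling it on str1 shows that the string has been split into tokens, each labelled with its category:

[('ab', <Mark1.En: 0>), ('2', <Mark1.Numb: 3>), ('(', <Mark1.LeftBracket: 1>), ('c', <Mark1.En: 0>), ('3', <Mark1.Numb: 3>), ('(', <Mark1.LeftBracket: 1>), ('d', <Mark1.En: 0>), (')', <Mark1.RightBracket: 2>), ('2', <Mark1.Numb: 3>), ('(', <Mark1.LeftBracket: 1>), ('ef', <Mark1.En: 0>), (')', <Mark1.RightBracket: 2>)]
['ab', '2', '(', 'c', '3', '(', 'd', ')', '2', '(', 'ef', ')']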
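For comparison, the same token strings can be produced with a regular expression. This is an alternative sketch rather than the tokenizer used in this post; the names TOKEN_RE and tokenize are hypothetical, and it returns only the token text, without the Mark1 category. Every parenthesis, including a trailing one, becomes its own token.

```python
import re

# One alternative per category: a run of lowercase letters, a run of digits,
# or a single parenthesis.
TOKEN_RE = re.compile(r'[a-z]+|[0-9]+|[()]')


def tokenize(text):
    """Split text into letter runs, digit runs, and single parentheses.

    Note: characters matching none of the alternatives are skipped silently
    by findall, so validate the input separately if that matters.
    """
    return TOKEN_RE.findall(text)


print(tokenize('ab2(c3(d)2(ef)'))
# ['ab', '2', '(', 'c', '3', '(', 'd', ')', '2', '(', 'ef', ')']
```

The hand-written state comparison in the post generalizes more easily to custom categories, while the regex version is shorter when the categories are simple character classes.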
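Next we build the syntax tree recursively. The tree here is a little different from before: we use a dictionary, keyed by the repeat count, to represent a case such as 2(c3(d)):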
{2: ['c', {3: 'd'}]}
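The final recursive parser looks like this:

    # ---------------------- This method is part of class Parser2 ------------------
    @staticmethod
    def iter_find(word_list, p2_in):
        # type: (list, Parser2) -> Union[list, dict, str]
        # p2_in.pos is the shared cursor into word_list; it must be 0 on the first call.
        save_list = []
        while p2_in.pos < len(word_list):
            curr_word = word_list[p2_in.pos]
            if curr_word[1] == Mark1.Numb:
                # Skip the number and the '(' that is assumed to follow it,
                # then parse the group recursively.
                p2_in.pos += 2
                tmp_dict = {int(curr_word[0]): Parser2.iter_find(word_list, p2_in)}
                save_list.append(tmp_dict)
            elif curr_word[1] == Mark1.RightBracket:
                # End of the current group: consume ')' and return to the caller.
                p2_in.pos += 1
                break
            else:
                save_list.append(curr_word[0])
                p2_in.pos += 1
        # A group with a single element is returned unwrapped.
        if len(save_list) == 1:
            return save_list[0]
        else:
            return save_list

We get the following tree structure: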
['ab', {2: ['c', {3: 'd'}, {2: 'ef'}]}]
How do we print it? Again with recursion:

    @staticmethod
    def iter_print(ast_in):
        if isinstance(ast_in, list):
            # A list is a sequence of nodes: concatenate their expansions.
            tmp_str = ''
            for e in ast_in:
                tmp_str += Parser2.iter_print(e)
            return tmp_str
        elif isinstance(ast_in, dict):
            # A dict holds one entry: {repeat count: sub-tree}.
            nums = list(ast_in.keys())[0]
            return nums * Parser2.iter_print(ast_in[nums])
        elif isinstance(ast_in, str):
            return ast_in
        else:
            return ''

The final printed result is:
abcdddefefcdddefef
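The whole pipeline can also be sketched without a shared cursor on the parser object, by having the recursive function return the next position along with the nodes it parsed. This is an alternative under stated assumptions, not the code above: the names parse and render are hypothetical, the input is a list of token strings with balanced parentheses, every number is followed by an opening parenthesis, and groups are always kept as lists rather than unwrapped when they hold a single element.

```python
def parse(tokens, pos=0):
    """Parse tokens[pos:] up to the matching ')'; return (nodes, next_pos)."""
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok.isdigit():
            # Assumption: tokens[pos + 1] is '(' so the group starts at pos + 2.
            inner, pos = parse(tokens, pos + 2)
            nodes.append({int(tok): inner})
        elif tok == ')':
            # End of this group: report the position just past the ')'.
            return nodes, pos + 1
        else:
            nodes.append(tok)
            pos += 1
    return nodes, pos


def render(node):
    """Expand a tree made of strings, lists, and {count: sub-tree} dicts."""
    if isinstance(node, list):
        return ''.join(render(n) for n in node)
    if isinstance(node, dict):
        times, inner = next(iter(node.items()))
        return times * render(inner)
    return node


tokens = ['ab', '2', '(', 'c', '3', '(', 'd', ')', '2', '(', 'ef', ')', ')']
tree, _ = parse(tokens)
print(tree)          # ['ab', {2: ['c', {3: ['d']}, {2: ['ef']}]}]
print(render(tree))  # abcdddefefcdddefef
```

Returning the position makes each call independent of external state, so the same function can be reused on several token lists without resetting anything.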