Writing a Compiler by Hand: The Lexical Analyzer (Part 1)

Date: 2019-11-09


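To write a compiler, the first thing to know is what a compiler is, and I assume anyone reading this post already does. As I see it, a compiler is a program that lets the computer understand code. It defines a set of rules (the grammar of the programming language), and as long as people talk to the computer according to those rules (that is, write programs), the computer can work out what we want it to do.

A compiler consists of several modules, or phases: lexical analysis, syntax analysis, intermediate code generation, and so on. I admit I'm hazy on the full list, but "everything starts with lexical analysis" (my own saying) is certainly true. So we begin with the first step, lexical analysis. It is one of the simpler modules in a compiler and also the most fundamental one. Its job is to cut the input program text into individual words and symbols for the later modules to use.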
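To make "cutting text into words and symbols" concrete, here is a rough sketch of the idea. It is not the lexer built in this post: split_tokens is a hypothetical helper that simply groups letters, digits and underscores into words and treats every other non-space character as a one-character symbol.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of lex.cpp: splits text into raw token strings.
std::vector<std::string> split_tokens(const std::string& text)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) {
            ++i;                                   // whitespace only separates tokens
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;                 // a run of letters, digits, underscores
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_')) {
                ++i;
            }
            tokens.push_back(text.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, text[i]));  // any other character stands alone
            ++i;
        }
    }
    return tokens;
}
```

With this sketch, split_tokens("int a = 3;") yields the five strings int, a, =, 3 and ;. The real lexer below goes further and also records what kind of token each string is.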

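Enough theory. My goal for now is to write one function that takes a file path, splits the code in that file into words and symbols (each one is called a token), and returns a linked list holding all the tokens. The function is declared as:

TokenList_tag LexExcute(string sourcefilepath); // TokenList_tag is the list head pointer, pointing to the head node; sourcefilepath is the path of the source file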

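I put the lexer in the file lex.cpp. Without further ado, here are the first few lines, which instantly made my confidence soar. The four using declarations can be replaced by the single line using namespace std;

#include<iostream>
#include<string>
#include<fstream>
using std::cout;
using std::string;
using std::endl;
using std::ifstream;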

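Strictly speaking, this is the point to define the token, but I don't yet know what it should look like, so I'll put that off and deal with reading the file first. Define a string variable (filepath) to hold the file path and a file stream (in) to read the file's contents, then read one line at a time from the stream into the string variable line_str. The approach is now clear: take characters out of line_str one by one and analyze them.

ifstream  in;        // file stream
string    filepath;  // file path
string    line_str;  // the line most recently read
int       line_len;  // length of that line
int       line_pos;  // current position while processing the line character by character
int       line_num;  // current line number in the source file; may be useful later for reporting syntax errors
char      ch;        // current character
string    token_str; // text of the token being built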

To avoid mistakes, we define an initialization function init() and put all variable initialization in it, as follows:

void init()
{
    cout<<"func enter: init"<<endl;
    line_len=0;
    line_pos=0;
    line_num=0;
    token_str="";
    ch='\0';
    //end_of_file=false;
    //end_of_lex=false;
    //tokenlist=new TokenNode_tag;
    //tokenlist->token_str="#";
    //tokenlist->next=NULL;
    //listtail=tokenlist;
    in.open(filepath.c_str());
    cout<<"func end: init"<<endl;
}

Before this function runs, filepath has already been assigned outside it. I was too lazy to pass a parameter, so I just added one line before the call:

filepath=sourcefilepath; // sourcefilepath is the parameter passed in

Next, we define the token data structure. For now I can think of only two fields: token_str and token_type. token_str holds the token's text, and token_type records what kind of token it is, such as identifier, keyword, or number. Before defining the token, define TokenType (an enum) as follows:

enum TokenType
{
    // order must not change: the keyword entries must stay in the same order as the keywords[] array
    KEYWORDS_INT,
    KEYWORDS_DOUBLE,
    KEYWORDS_FLOAT,
    KEYWORDS_IF,
    KEYWORDS_ELSE,
    KEYWORDS_ELSIF,
    KEYWORDS_WHILE,
    KEYWORDS_FOR,

    IDENTIFY,
    OP_EQUAL,  // ==
    OP_ASSIGN, // =
    OP_LP,     // (
    OP_RP,     // )
    OP_ADD,    // +
    OP_SUB,    // -
    OP_MUL,    // *
    OP_DIV,    // /
    OP_SEMI,   // ;
    NUM_FLOAT,
    NUM_INT
};

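The comments in TokenType should make it clear enough, so I won't say more. Next is the definition of TokenNode:

typedef struct TokenNode_tag
{
    string    token_str;
    TokenType token_type;
    struct    TokenNode_tag   *next; // link to the next node of the list
}*TokenList_tag,TokenNode_tag;

Some may wonder why the names end in tag. First, the name itself doesn't matter (as long as it isn't a keyword). Second, the tag suffix separates internal use from external use: internal means the functions inside lex.cpp, external means the other modules that will call this later (at that point these structures will be given the new name TokenNode).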
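How nodes like this chain into a list can be sketched as follows. This is only an illustration with hypothetical names (Node, List, make_list, append, count_tokens, destroy); it assumes a sentinel head node marked "#" plus a tail pointer, which is the same arrangement the commented-out lines in init() hint at.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for TokenNode_tag, reduced to the fields needed here.
struct Node {
    std::string text;
    Node* next;
};

// The list keeps a sentinel head node plus a tail pointer for O(1) appends.
struct List {
    Node* head;
    Node* tail;
};

List make_list()
{
    Node* sentinel = new Node{"#", nullptr};  // "#" marks the sentinel; it holds no token
    return List{sentinel, sentinel};
}

void append(List& list, const std::string& text)
{
    Node* node = new Node{text, nullptr};
    list.tail->next = node;  // link behind the current last node
    list.tail = node;        // the new node is now the last one
}

std::size_t count_tokens(const List& list)
{
    std::size_t n = 0;
    for (const Node* p = list.head->next; p != nullptr; p = p->next) {
        ++n;                 // the sentinel itself is skipped
    }
    return n;
}

void destroy(List& list)
{
    Node* p = list.head;
    while (p != nullptr) {
        Node* next = p->next;
        delete p;
        p = next;
    }
    list.head = list.tail = nullptr;
}
```

The sentinel means append never has to check for an empty list: tail always points at a real node, even before the first token arrives.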

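Now everything is ready except the most important function:

TokenList_tag LexExcute(string sourcefilepath)
{
    cout<<"func enter: main"<<endl;
    filepath=sourcefilepath;
    init();
    // the key code goes here: one big while loop
    cout<<"func end: main"<<endl;
    return NULL; // placeholder; the finished version returns the token list
}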

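This is the main function. It works by repeatedly taking characters from line_str and using each character to decide whether a word or symbol is complete; for example, a space means the current string has ended (loosely speaking). I know there is automaton theory behind this, but I can't explain it well, so I'll go straight to my own implementation:

I divide the lexer's operation into several stages, each recorded as a state. For example, when the big while loop first runs, the lexer is in the initial state (STATUS_NON). In this state, if the character ch is a digit (0-9), the lexer switches to the number state (STATUS_NUM). In that state, if ch is a digit the state stays the same; if ch is '.' (a decimal point) the number is floating-point and the state becomes the float state (STATUS_FLOAT); if ch is whitespace or any other character, the number has ended, the state returns to the initial state, and the next round begins.

I may think that was clear while readers still struggle, so here is a diagram:

[Figure: hand-drawn state transition diagram (image not preserved)]

Well, you never know how badly you draw until you draw. Setting that aside, it's fairly clear, right? Er, never mind. This should already show the outline of the legendary automaton. I won't insist that it makes sense yet; the complete code is at the bottom, and once you can read the code you'll be fine.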
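The number-handling transitions described above can be sketched in isolation. This is a simplified illustration, not the code from lex.cpp: State, NumberScan and scan_number are hypothetical names, and the function works on a plain string with an explicit position instead of the global ch and line_pos.

```cpp
#include <cstddef>
#include <string>

enum State { S_NON, S_NUM, S_FLOAT };  // mirrors STATUS_NON / STATUS_NUM / STATUS_FLOAT

struct NumberScan {
    std::string lexeme;  // the characters that belong to the number
    bool is_float;       // true if a '.' was consumed
};

// Scans a number starting at text[pos]. On return, pos is the index of the
// first character that is not part of the number.
NumberScan scan_number(const std::string& text, std::size_t& pos)
{
    NumberScan result{"", false};
    State state = S_NON;
    while (pos < text.size()) {
        char ch = text[pos];
        bool digit = (ch >= '0' && ch <= '9');
        if (state == S_NON) {
            if (!digit) break;          // not a number at all
            state = S_NUM;              // first digit: NON -> NUM
        } else if (state == S_NUM) {
            if (ch == '.') state = S_FLOAT;  // decimal point: NUM -> FLOAT
            else if (!digit) break;          // anything else ends the integer
        } else {                        // S_FLOAT
            if (!digit) break;          // anything but a digit ends the float
        }
        result.lexeme += ch;
        ++pos;
    }
    result.is_float = (state == S_FLOAT);
    return result;
}
```

Scanning "3.14+x" from position 0 returns the lexeme 3.14 flagged as a float and leaves the position on the +, while "42;" gives the integer 42 and stops at the semicolon.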

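Before writing the main function (LexExcute), there is some necessary groundwork:

bool      end_of_file; // initially false; set to true when the file has been read to the end
bool      end_of_lex;  // set to true when the file has ended and the current line is fully analyzed (end of lexing)
TokenList_tag tokenlist; // token list produced by the lexer (head pointer of the list returned by the main function)
TokenNode_tag*listtail;  // tail pointer of the list, used for appending nodes
void getline();   // read one line from the file
void getch();     // get one character
bool ischar();    // is ch in a-z or A-Z
bool isnum();     // is ch in 0-9
void concat();    // append ch to token_str
void backwords(); // step back one character
void tokenlist_insert(TokenNode_tag*); // insert a node into the list
void tokenlist_visit(); // traverse the list; only for checking the result, not needed for the lexer itself
int iskeywords(string ); // check whether a string is a keyword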
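The reason a "step back" function is needed is that some tokens can only be recognized by reading one character too many. Below is a minimal sketch of that idea for = versus ==, assuming a hypothetical Cursor type and scan_assign_or_equal function rather than the global getch() and backwords() declared above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical cursor over one line of text.
struct Cursor {
    std::string line;
    std::size_t pos;

    // Returns the current character ('\0' past the end) and always advances,
    // so that one retract() exactly undoes one get().
    char get()
    {
        char c = pos < line.size() ? line[pos] : '\0';
        ++pos;
        return c;
    }

    // Gives the most recently read character back to the input.
    void retract()
    {
        if (pos > 0) --pos;
    }
};

// Assumes the cursor is positioned on a '='. Returns "=" or "==".
std::string scan_assign_or_equal(Cursor& cur)
{
    std::string lexeme;
    lexeme += cur.get();      // the first '='
    char next = cur.get();    // look one character ahead
    if (next == '=') {
        lexeme += next;       // it was "=="
    } else {
        cur.retract();        // not ours: hand the character back
    }
    return lexeme;
}
```

For the input "=1" the function returns = and leaves the cursor on the 1, so the next token starts from the right place; for "==1" it returns == and the cursor is also left on the 1.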

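Here are the keyword array and the automaton state definitions, followed by the main function; the helper function implementations are in the complete code at the end:

string keywords[]=
{
    // order must not change: each index must equal the matching KEYWORDS_* value in TokenType
    "int",
    "double",
    "float",
    "if",
    "else",
    "elsif",
    "while",
    "for"
};

enum LexStatus
{
    STATUS_NON,
    STATUS_NUM,
    STATUS_STR,
    STATUS_FLOAT,
    STATUS_ASSIGN
};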
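Why the order "must not change" is easier to see in a cut-down example. The sketch below uses hypothetical names (Kind, kKeywords, classify_word) and only three keywords; it relies on the keyword table and the first enum values being in the same order, so that a table index can be cast straight to the enum.

```cpp
#include <string>

// The first three enum values line up with the entries of kKeywords.
enum Kind { KW_INT, KW_DOUBLE, KW_IF, KIND_IDENTIFIER };

const char* const kKeywords[] = { "int", "double", "if" };

// Returns the keyword kind if word is a keyword, otherwise KIND_IDENTIFIER.
Kind classify_word(const std::string& word)
{
    const int count = sizeof(kKeywords) / sizeof(kKeywords[0]);
    for (int i = 0; i < count; ++i) {
        if (word == kKeywords[i]) {
            return static_cast<Kind>(i);  // index i doubles as the enum value
        }
    }
    return KIND_IDENTIFIER;
}
```

If "double" and "if" were swapped in the table but not in the enum, classify_word("if") would report KW_DOUBLE, which is exactly the mistake the order comment warns about.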

TokenList_tag LexExcute(string sourcefilepath)
{
    cout<<"func enter: main"<<endl;
    filepath=sourcefilepath;
    init();
    getline();
    LexStatus lexstatus=STATUS_NON;
    TokenNode_tag *tokennode;
    while(!end_of_lex)
    {
        getch();
        if(lexstatus==STATUS_NON)// initial state
        {
            cout<<"LexStatus: STATUS_NON"<<endl;
            if(ischar()||ch=='_')
            {
                lexstatus=STATUS_STR;
                cout<<"LexStatus: STATUS_NON -> STATUS_STR"<<endl;
                concat();
            }
            else if(isnum())
            {
                lexstatus=STATUS_NUM;
                cout<<"LexStatus: STATUS_NON -> STATUS_NUM"<<endl;
                concat();
            }
            else if(ch=='(')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_LP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==')')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_RP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==' ')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='\n')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='=')
            {
                concat();
                cout<<"LexStatus: STATUS_NON -> STATUS_ASSIGN"<<endl;
                lexstatus=STATUS_ASSIGN;
                // token_str keeps the "=" so STATUS_ASSIGN can emit "=" or "=="
            }
            else if(ch=='+')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ADD;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='-')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SUB;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='*')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_MUL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='/')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_DIV;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==';')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SEMI;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
        }
        else if(lexstatus==STATUS_ASSIGN)
        {
            if(ch=='=')// ==
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_EQUAL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else// =
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ASSIGN;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();// ch belongs to the next token
            }
        }
        else if(lexstatus==STATUS_FLOAT)
        {
            if(isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_FLOAT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_FLOAT -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_NUM)
        {
            if(isnum())
            {
                concat();
            }
            else if(ch=='.')
            {
                concat();
                lexstatus=STATUS_FLOAT;
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_INT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NUM -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_STR)
        {
            if(ischar()||ch=='_'||isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                int check=iskeywords(token_str);// keyword index, or -1 for an identifier
                if(check!=-1)tokennode->token_type=(TokenType)check;
                else tokennode->token_type=IDENTIFY;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_STR -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
    }

    cout<<"func end: main"<<endl;
    tokenlist_visit();
    return tokenlist;
}

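Complete code (cut together from several files, so it should run... I think). It feels very long; an expert could probably do it in far fewer lines. I'm a little embarrassed.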

#include<iostream>
#include<string>
#include<fstream>
using std::cout;
using std::string;
using std::endl;
using std::ifstream;
string keywords[]=
{
    // order must not change: each index must equal the matching KEYWORDS_* value in TokenType
    "int",
    "double",
    "float",
    "if",
    "else",
    "elsif",
    "while",
    "for"
};

enum LexStatus
{
    STATUS_NON,
    STATUS_NUM,
    STATUS_STR,
    STATUS_FLOAT,
    STATUS_ASSIGN
};
enum TokenType
{
    // order must not change: the keyword entries must stay in the same order as keywords[]
    KEYWORDS_INT,
    KEYWORDS_DOUBLE,
    KEYWORDS_FLOAT,
    KEYWORDS_IF,
    KEYWORDS_ELSE,
    KEYWORDS_ELSIF,
    KEYWORDS_WHILE,
    KEYWORDS_FOR,

    IDENTIFY,
    OP_EQUAL,  // ==
    OP_ASSIGN, // =
    OP_LP,     // (
    OP_RP,     // )
    OP_ADD,    // +
    OP_SUB,    // -
    OP_MUL,    // *
    OP_DIV,    // /
    OP_SEMI,   // ;
    NUM_FLOAT,
    NUM_INT
};
typedef struct TokenNode_tag
{
    string    token_str;
    TokenType token_type;
    struct    TokenNode_tag   *next;
}*TokenList_tag,TokenNode_tag;

ifstream  in;
string    filepath;
string    line_str;
int       line_len;
int       line_pos;
int       line_num;
char      ch;
string    token_str;

bool      end_of_file;
bool      end_of_lex;
TokenList_tag tokenlist; // token list produced by the lexer
TokenNode_tag*listtail;
void init();
void getline();
void getch();
bool ischar();
bool isnum();
void concat();
void backwords();
void tokenlist_insert(TokenNode_tag*);
void tokenlist_visit();
int iskeywords(string );
TokenList_tag LexExcute(string sourcefilepath)
{
    cout<<"func enter: main"<<endl;
    filepath=sourcefilepath;
    init();
    getline();
    LexStatus lexstatus=STATUS_NON;
    TokenNode_tag *tokennode;
    while(!end_of_lex)
    {
        getch();
        if(lexstatus==STATUS_NON)// initial state
        {
            cout<<"LexStatus: STATUS_NON"<<endl;
            if(ischar()||ch=='_')
            {
                lexstatus=STATUS_STR;
                cout<<"LexStatus: STATUS_NON -> STATUS_STR"<<endl;
                concat();
            }
            else if(isnum())
            {
                lexstatus=STATUS_NUM;
                cout<<"LexStatus: STATUS_NON -> STATUS_NUM"<<endl;
                concat();
            }
            else if(ch=='(')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_LP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==')')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_RP;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==' ')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='\n')
            {
                cout<<"do nothing"<<endl;
            }
            else if(ch=='=')
            {
                concat();
                cout<<"LexStatus: STATUS_NON -> STATUS_ASSIGN"<<endl;
                lexstatus=STATUS_ASSIGN;
                // token_str keeps the "=" so STATUS_ASSIGN can emit "=" or "=="
            }
            else if(ch=='+')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ADD;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='-')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SUB;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='*')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_MUL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch=='/')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_DIV;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else if(ch==';')
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_SEMI;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NON -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
        }
        else if(lexstatus==STATUS_ASSIGN)
        {
            if(ch=='=')// ==
            {
                concat();
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_EQUAL;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
            }
            else// =
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=OP_ASSIGN;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_ASSIGN -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();// ch belongs to the next token
            }
        }
        else if(lexstatus==STATUS_FLOAT)
        {
            if(isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_FLOAT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_FLOAT -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_NUM)
        {
            if(isnum())
            {
                concat();
            }
            else if(ch=='.')
            {
                concat();
                lexstatus=STATUS_FLOAT;
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                tokennode->token_type=NUM_INT;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_NUM -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
        else if(lexstatus==STATUS_STR)
        {
            if(ischar()||ch=='_'||isnum())
            {
                concat();
            }
            else
            {
                tokennode=new TokenNode_tag;
                tokennode->token_str=token_str;
                int check=iskeywords(token_str);// keyword index, or -1 for an identifier
                if(check!=-1)tokennode->token_type=(TokenType)check;
                else tokennode->token_type=IDENTIFY;
                tokennode->next=NULL;
                tokenlist_insert(tokennode);
                cout<<"LexStatus: STATUS_STR -> STATUS_NON"<<endl;
                lexstatus=STATUS_NON;
                token_str="";
                backwords();
            }
        }
    }

    cout<<"func end: main"<<endl;
    tokenlist_visit();
    return tokenlist;
}//main
// Returns the index of str in keywords[], or -1 if it is not a keyword.
int iskeywords(string str)
{
    int i=0;
    int len=sizeof(keywords)/sizeof(string);
    cout<<"length of keyword[]: "<<len<<endl;
    while(i<len)
    {
        if(str==keywords[i])
        {
            cout<<str<<" is keywords"<<endl;
            return i;
        }
        ++i;
    }
    cout<<str<<" isn't keywords"<<endl;
    return -1;
}
// Prints every token; the head node is a sentinel and is skipped.
void tokenlist_visit()
{
    TokenNode_tag *tn;
    tn=tokenlist;
    while(tn->next!=NULL)
    {
        cout<<tn->next->token_str<<"   token_type"<<tn->next->token_type<<endl;
        tn=tn->next;
    }
}
// Appends a node at the tail of the list.
void tokenlist_insert(TokenNode_tag* token)
{
    cout<<"func enter: tokenlist_insert"<<endl;
    listtail->next=token;
    listtail=listtail->next;
    cout<<"func end: tokenlist_insert"<<endl;
}
bool ischar()
{
    cout<<"func enter: ischar  ->ch:"<<ch<<endl;
    if((ch>='a'&&ch<='z')||(ch>='A'&&ch<='Z'))
    {
        cout<<"func end: ischar ->"<<ch<<" is char"<<endl;
        return true;
    }
    else
    {
        cout<<"func end: ischar ->"<<ch<<" isn't char"<<endl;
        return false;
    }
}
bool isnum()
{
    cout<<"func enter: isnum ->ch:"<<ch<<endl;
    if(ch>='0'&&ch<='9')
    {
        cout<<"func end: isnum ->"<<ch<<" is num"<<endl;
        return true;
    }
    else
    {
        cout<<"func end: isnum ->"<<ch<<" isn't num"<<endl;
        return false;
    }
}
/**
Appends ch to token_str
*/
void concat()
{
    cout<<"func enter: concat ->token:"<<token_str<<" ch:"<<ch<<endl;
    token_str+=ch;
    cout<<"func end: concat ->token:"<<token_str<<endl;
}
// Steps back one character so the next getch() reads it again.
void backwords()
{
    cout<<"func enter: backwords -> pos:"<<line_pos<<endl;
    if(line_pos>0)line_pos--;
    else cout<<"this is first ch in this line,can not backwords!";
    cout<<"func end: backwords -> pos:"<<line_pos<<endl;
}
void getch()
{
    cout<<"func enter: getch"<<endl;
    if(line_pos<line_len)// take one character from the current line
    {
        ch=line_str[line_pos++];
        cout<<"ch: "<<ch<<endl;
    }
    else// this line is finished
    {
        cout<<"end of line"<<endl;
        if(end_of_file)
        {
            cout<<"file over!!!"<<endl;
            end_of_lex=true;
            ch='\0';// end marker
        }
        else// file not finished yet: fetch a new line
        {
            cout<<"new line"<<endl;
            getline();
            getch();
        }
    }
    cout<<"func end: getch"<<endl;
}
void init()
{
    cout<<"func enter: init"<<endl;
    line_len=0;
    line_pos=0;
    line_num=0;
    token_str="";
    ch='\0';
    end_of_file=false;
    end_of_lex=false;
    tokenlist=new TokenNode_tag;// sentinel head node
    tokenlist->token_str="#";
    tokenlist->next=NULL;
    listtail=tokenlist;
    in.open(filepath.c_str());
    cout<<"func end: init"<<endl;
}

void getline()
{
    cout<<"func enter: getline"<<endl;
    if(std::getline(in,line_str))// only treat the line as read if the read succeeded
    {
        line_num++;
        line_str+='\n';// every line ends with '\n', so a token never runs across two lines
        line_len=line_str.length();
        cout<<"line"<<line_num<<": "<<line_str<<endl;
        cout<<"length: "<<line_len<<endl;
        line_pos=0;
    }
    else
    {
        cout<<"end of file"<<endl;
        end_of_file=true;
    }
    cout<<"func end: getline"<<endl;
}


Corrections are welcome, and so are comments!