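In this section we start with a simple lexical scanner that recognizes the four arithmetic operators and integer values. What it does is also simple: it reads a file we give it, identifies the tokens in that file, and prints them.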
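This simple scanner supports only five lexical elements: the four arithmetic operators +, -, * and /, and decimal integer literals.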
We need to define each token in advance, using an enumeration:
// defs.h
// Tokens
enum {
  T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT
};
Once a token has been scanned, it is stored in the structure below. When the token is T_INTLIT (an integer literal), the intvalue field holds the integer value we scanned:
// defs.h
// Token structure
struct token {
  int token;
  int intvalue;
};
Now suppose we have a file whose contents are a single arithmetic expression:
2 + 34 * 5 - 8 / 3
Our goal is to read each valid token in it and print it, like this:
Token intlit, value 2
Token +
Token intlit, value 34
Token *
Token intlit, value 5
Token -
Token intlit, value 8
Token /
Token intlit, value 3
Now that we have seen the final goal, let's work through the functions we need step by step.
// Get the next character from the input file.
static int next(void) {
  int c;
  if (Putback) {                // Use the character put
    c = Putback;                // back if there is one
    Putback = 0;
    return c;
  }
  c = fgetc(Infile);            // Read from input file
  if ('\n' == c)
    Line++;                     // Increment line count
  return c;
}
// Skip past input that we don't need to deal with,
// i.e. whitespace, newlines. Return the first
// character we do need to deal with.
static int skip(void) {
  int c;
  c = next();
  while (' ' == c || '\t' == c || '\n' == c || '\r' == c || '\f' == c) {
    c = next();
  }
  return (c);
}
// Return the position of character c
// in string s, or -1 if c not found
static int chrpos(char *s, int c) {
  char *p;
  p = strchr(s, c);
  return (p ? p - s : -1);
}

// Scan and return an integer literal
// value from the input file.
static int scanint(int c) {
  int k, val = 0;
  // Convert each character into an int value
  while ((k = chrpos("0123456789", c)) >= 0) {
    val = val * 10 + k;
    c = next();
  }
  // We hit a non-integer character, put it back.
  putback(c);
  return val;
}
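scanint() calls putback(), which is not shown in this section. The sketch below is one way that one-character pushback could work, assuming putback() simply stores the character in Putback for next() to return. To keep it self-contained it reads from an in-memory string instead of Infile; Input, Pos and set_input() are hypothetical names introduced only for this illustration, and the digit test uses a plain range check rather than chrpos().

```c
#include <stdio.h>
#include <stddef.h>

// Hypothetical in-memory input, standing in for Infile.
const char *Input = "";
size_t Pos = 0;
int Putback = 0;          // One-character pushback slot, 0 means empty

// Point the reader at a new string and clear the pushback slot.
void set_input(const char *s) {
  Input = s;
  Pos = 0;
  Putback = 0;
}

// Put back an unwanted character so that
// the following call to next() returns it.
void putback(int c) {
  Putback = c;
}

// Return the pushed-back character if there is one,
// otherwise the next character of Input, or EOF at the end.
int next(void) {
  int c;
  if (Putback) {
    c = Putback;
    Putback = 0;
    return c;
  }
  if (Input[Pos] == '\0')
    return EOF;
  return (unsigned char) Input[Pos++];
}

// Accumulate decimal digits starting with c, then push back
// the first non-digit character so the caller still sees it.
int scanint(int c) {
  int val = 0;
  while (c >= '0' && c <= '9') {
    val = val * 10 + (c - '0');
    c = next();
  }
  putback(c);
  return val;
}
```

With the input set to "34*5", scanint(next()) consumes both digits and the '*', but the '*' is pushed back, so the following next() still returns it.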
So now we can read characters while skipping whitespace, and if we read one character too far we can put it back. We can now write our first lexical scanner:
int scan(struct token *t) {
  int c;
  // Skip whitespace
  c = skip();
  // Determine the token based on
  // the input character
  switch (c) {
  case EOF:
    return (0);
  case '+':
    t->token = T_PLUS;
    break;
  case '-':
    t->token = T_MINUS;
    break;
  case '*':
    t->token = T_STAR;
    break;
  case '/':
    t->token = T_SLASH;
    break;
  default:
    // If it's a digit, scan the
    // literal integer value in
    if (isdigit(c)) {
      t->intvalue = scanint(c);
      t->token = T_INTLIT;
      break;
    }
    printf("Unrecognised character %c on line %d\n", c, Line);
    exit(1);
  }
  // We found a token
  return (1);
}
We can now read tokens and return them to the caller.
The main() function opens a file and then scans it for tokens:
int main(int argc, char *argv[]) {
  ...
  init();
  ...
  Infile = fopen(argv[1], "r");
  ...
  scanfile();
  exit(0);
}
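The elided parts of main() are not reproduced here. Separately from them, the fopen() call can fail or be handed a missing argument, so a small checked wrapper may be useful. The following is only a sketch: open_input() and its messages are hypothetical and not part of the original code.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

// Hypothetical helper: open the file named by path for reading.
// Returns NULL and prints a message to stderr on failure,
// including the case where no path was supplied.
FILE *open_input(const char *path) {
  FILE *f;
  if (path == NULL) {
    fprintf(stderr, "Usage: scanner infile\n");
    return NULL;
  }
  f = fopen(path, "r");
  if (f == NULL)
    fprintf(stderr, "Unable to open %s: %s\n", path, strerror(errno));
  return f;
}
```

A caller could pass argv[1] when argc is at least 2 and NULL otherwise, then exit with a non-zero status if the result is NULL.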
scanfile() loops while there is a new token and prints the details of each one:
// List of printable tokens
char *tokstr[] = { "+", "-", "*", "/", "intlit" };

// Loop scanning in all the tokens in the input file.
// Print out details of each token found.
static void scanfile() {
  struct token T;
  while (scan(&T)) {
    printf("Token %s", tokstr[T.token]);
    if (T.token == T_INTLIT)
      printf(", value %d", T.intvalue);
    printf("\n");
  }
}
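To check the scanner's behaviour without going through a file, the same token rules can be applied to an in-memory string. The sketch below is a compact variant, not the code above: it walks a string pointer instead of using next() and putback(), it uses isspace(), which skips slightly more characters than skip() does, and it returns -1 on an unrecognised character instead of exiting. scan_string() is a hypothetical name.

```c
#include <ctype.h>

// Same token set and structure as defs.h
enum { T_PLUS, T_MINUS, T_STAR, T_SLASH, T_INTLIT };

struct token {
  int token;
  int intvalue;
};

// Hypothetical helper: tokenize the string src into out[], writing
// at most max tokens. Returns the number of tokens written, or -1
// on an unrecognised character or when out[] is too small.
int scan_string(const char *src, struct token *out, int max) {
  int n = 0;
  while (*src != '\0') {
    unsigned char c = (unsigned char) *src;
    if (isspace(c)) {           // Skip whitespace between tokens
      src++;
      continue;
    }
    if (n >= max)               // No room left for another token
      return -1;
    if (isdigit(c)) {
      // Accumulate the digits of an integer literal
      int val = 0;
      while (isdigit((unsigned char) *src)) {
        val = val * 10 + (*src - '0');
        src++;
      }
      out[n].token = T_INTLIT;
      out[n].intvalue = val;
    } else {
      switch (c) {
      case '+': out[n].token = T_PLUS;  break;
      case '-': out[n].token = T_MINUS; break;
      case '*': out[n].token = T_STAR;  break;
      case '/': out[n].token = T_SLASH; break;
      default:  return -1;      // Unrecognised character
      }
      out[n].intvalue = 0;
      src++;
    }
    n++;
  }
  return n;
}
```

For the expression 2 + 34 * 5 - 8 / 3 this yields nine tokens, alternating integer literals and operators in the same order as the expected output shown earlier.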
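That is all for this section. In the next part, we will build a parser to interpret the syntax of our input files, then compute and print the final value for each file.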