(cons '(贰 . 输入/输出) 《为本身写本-Guile-书》)

时间 2019-12-05

标签 cons 输入输出为本身写本-Guile-书繁體版

原文原文链接

(car 《为本身写本-Guile-书》)segmentfault

这个世界上，有不少人讨厌 Lisp 语言。Guile 是 Sheme 的一种方言，然后者是 Lisp 的一种方言，所以 Guile 天然也会被这些人讨厌。在他们给出的本身之因此讨厌 Lisp 的众多理由中，第一个理由是他们所阅读的一些 Lisp 教材，读了一半，结果连文件读写这么简单的程序都写不出来（见 http://c2.com/cgi/wiki?WhyWeHateLisp）。为了避免让他们说『看！又有人写了本 Lisp 书，即便你读了一半，依然写不出能够读写文件的程序』，所以我不顾这本书刚刚进行到第二章这一悲催的现实，毅然的开始讲述如何用 Guile 编写文件读写方面一些基本的程序——真的猛士勇于直面文件的读写，勇于修改变量的值。希望他们不会抱怨：『有人写了本 Lisp 书，从第二章就开始讲文件读写，太变态了！』函数

端口

Guile 将串行的输入/输出抽象为端口（Port）的读写操做。端口能够关联到文件、终端以及字符串，也就是说，经过操做端口，可以以统一的形式读写文件、终端以及字符串等串行对象。工具

Guile 交互解释器之因此可以读取你的输入的表达式，并将求值结果在屏幕上显示出来，全拜 current-input-port 与 current-output-port 端口所赐，这两个端口分别关联了系统的标准输入与标准输出设备（一般是那个黑乎乎的控制台或终端界面）。这两个端口是默认存在的，因此从 current-input-port 中读取数据，或者向 current-output-port 写入数据，不须要指定它们。例如：ui

(define c (read-char))  ;; 变量 c 的值是交互输入的字符
(write-char c)          ;; 将 c 的值在屏幕上显示出来

上述代码等价于：编码

(define a (read-char (current-input-port)))
(write-char a (current-output-port))

current-input-port 与 current-output-port 本质上分别是能够返回默认的输入与输出端口的函数。spa

在 C 语言中，stdin 与 stdout 分别与 Guile 的 current-input-port 与 current-output-port 所返回的端口相对应，并且也有相似于 read-char 与 write-char 这样的函数。例如：命令行

int c = getchar();
putchar(c);

C 语言也容许向指定的 stdin 与 stdout 进行数据的读写，只是读写数据的函数前面带有 f 前缀，例如：code

int c = fgetc(stdin);
fputc(c, stdout);

getchar，putchar，fgetc，fputc 均为 C 标准库提供的函数，显然它们在命名上缺少一致性，这是先辈的罪。对象

字符串端口

将端口与一个字符串相关联，而后用 read-char 与 write-char 的方式来读写这个字符串，是否是很方便？我以为这比其余语言单独为字符串创建一套读写机制要简单干净一些。utf-8

open-input-string 函数能够将一个输入端口与字符串创建关联。例如，在 Guile 交互解释环境中，输入如下『指令』：

> (define in (open-input-string "hello"))

in 即是一个输入端口，它关联着一个字符串 "hello world"。因为这是一个输入端口，因此能够用 read-char 从中读取字符：

> (read-char in)
#\h
> (read-char in)
#\e
> (read-char in)
#\l
> (read-char in)
#\l
> (read-char in)
#\o
> (read-char in)
#<eof>

open-output-string 能够将输出端口关联到字符串：

> (define out (open-output-string))

用 write-char 向 out 写入几个字符：

> (write-char #\h out)
> (write-char #\e out)
> (write-char #\l out)
> (write-char #\l out)
> (write-char #\o out)

也能够用 display 来写：

> (display "world" out)

要想获取 out 里存储的字符串信息，能够用 get-output-string 函数：

> (get-output-string out)
"hello world"

若是决定再也不使用这些端口，可以使用 close-port 将它们关闭：

> (close-input-port in)
#t
> (close-output-port out)
#t
> (and (port-closed? in) (port-closed? out))
#t

文件端口

文件端口，就是与文件相关联的端口。打开文件端口的函数是 open-input-file 与 open-output-file。用于操做字符串端口的那些函数，对于文件端口一样适用。

下面的 Guile 代码表示，打开文本文件 foo.txt，而后读取它的前两个字符，最后关闭文件：

(define file (open-input-file "foo.txt" #:encoding "utf-8"))
(display (read-char file))
(display (read-char file))file
(close-input-port file)

与之等效的 C 代码以下：

FILE *file = fopen("foo.txt", "r");
printf("%c", fgetc(file));
printf("%c", fgetc(file));
fclose(file);

值得注意的是，Guile 函数 open-input-file 可以为其读取的文件指定编码，而 C 标准库函数 fopen 则没有此功能。

当 read-char 函数读至文件末尾时，Guile 提供了 eof-object? 函数来判断它返回的字符是否为文件结束符，即：

(eof-object? (read-char file))

与该谓词等效的 C 代码以下：

feof(fgetc(file));

与文本文件的读取过程相对应，对于文件的写入过程，Guile 提供了 open-output-file 以及 write-char 函数，其用法示例以下：

(define file (open-output-file "bar.txt" #:encoding "utf-8"))
(write-char #\测 file)
(write-char #\试 file)
(close-output-port file)

等效的 C 代码以下：

FILE *file = fopen("bar.txt", "w");
fputc('c', file);
fputc('s', file);
fclose(file);

因为 C 标准库函数 fputc 不支持 UTF-8 编码的字符，因此我只能用 c 与 s 来代替。

guile-wc

Linux 系统中有一个命令行工具 wc，能够用它统计文本文件的行数、单词数、字符数等信息。例如，对于下面这份文本文件　foo.txt：

用 C 写用 C 写程序，会以为本身在摆弄一台小马达。
用 Guile 写程序，则以为本身拿了根小树枝唆使一只毛毛虫。

应用下面这三条命令：

$ wc -l foo.txt
$ wc -w foo.txt
$ wc -m foo.txt

可分别获得如下输出结果：

2 foo.txt
8 foo.txt
69 foo.txt

根据 wc 统计的信息，能够称 foo.txt 的内容由 2 行文本构成——它包含 8 个单词，共 69 个字符。我相信 wc，因此我不打算数一遍。

如今，我要作的是，用 Guile 写一个名为 guile-wc.scm 脚本，让它去作上述 wc 所作的事，看看它们的结果是否一致。

;; guile-wc.scm
(define (get-file-name args)
  (cond ((null? (cdr args)) (car args))
        (else (get-file-name (cdr args)))))
(define (arg-parser args opt)
  (cond ((null? args) #f)
        ((string=? (car args) opt) #t)
        (else (arg-parser (cdr args) opt))))
(define (guile-wc args file)
  (define (lwm-count l w m)
    (let ((char (read-char file)))
      (cond ((eof-object? char) `(,l ,w ,m))
            ((char=? char #\newline) (lwm-count (+ l 1) (+ w 1) (+ m 1)))
            ((char=? char #\space) (lwm-count l (+ w 1) (+ m 1)))
            (else (lwm-count l w (+ m 1))))))
  (let ((lwm (lwm-count 0 0 0)))
    (cond ((arg-parser args "-l") (car lwm))
          ((arg-parser args "-w") (cadr lwm))
          ((arg-parser args "-m") (caddr lwm))
          (else lwm))))
(define args (command-line))
(define file (open-input-file (get-file-name args) #:encoding "utf-8"))
(display (guile-wc args file)) (newline)
(close-input-port file)

按如下次序执行 guile-wc.scm 脚本：

$ guile guile-wc.scm -l foo.txt
$ guile guile-wc.scm -w foo.txt
$ guile guile-wc.scm -m foo.txt

可分别获得如下输出结果：

2
8
69

这些结果与上述的 wc 的输出结果相同。

在 guile-wc.scm 脚本中，get-file-name 与 arg-parser 函数的定义均来自上一章，而且对后者进行了大幅删减——由于 guile-wc.scm 不须要处理带参数值的选项。

guile-wc 函数可改写为如下形式：

(define (lwm-count file l w m)
  (let ((char (read-char file)))
    (cond ((eof-object? char) `(,l ,w ,m))
          ((char=? char #\newline) (lwm-count file (+ l 1) (+ w 1) (+ m 1)))
          ((char=? char #\space) (lwm-count file l (+ w 1) (+ m 1)))
          (else (lwm-count file l w (+ m 1))))))
(define (guile-wc args file)
  (let ((lwm (lwm-count file 0 0 0)))
    (cond ((arg-parser args "-l") (car lwm))
          ((arg-parser args "-w") (cadr lwm))
          ((arg-parser args "-m") (caddr lwm))
          (else lwm))))

原版的 guile-wc 函数的那种写法只是想代表，Guile 容许函数的嵌套定义，即在一个函数的定义中定义另外一个函数。若是 lwm-count 函数只会应用于 guile-wc 函数的内部，那么将其定义嵌入 guile-wc 函数，这种方式是合理且值得提倡的，由于它能够直接访问外围环境中的变量，例如 file。

若是使用赋值运算，lwm-count 函数的参数能够省略：

(define (guile-wc args file)
  (let ((l 0) (w 0) (m 0))
    (define (lwm-count)
      (let ((char (read-char file)))
        (cond ((eof-object? char) `(,l ,w ,m))
              ((char=? char #\newline) (begin (set! l (+ l 1))
                                              (set! w (+ w 1))
                                              (set! m (+ m 1))
                                              (lwm-count)))
              ((char=? char #\space) (begin (set! w (+ w 1))
                                            (set! m (+ m 1))
                                            (lwm-count)))
              (else (begin (set! m (+ m 1)) (lwm-count))))))
    (let ((lwm (lwm-count)))
      (cond ((arg-parser args "-l") (car lwm))
            ((arg-parser args "-w") (cadr lwm))
            ((arg-parser args "-m") (caddr lwm))
            (else lwm)))))

其中，像 (set! w (+ w 1)) 这样的表达式，相似于 C 语言中的 w = w + 1。

反引号与逗号

须要注意，lwm-count 函数的返回结果是一个列表，即：

((eof-object? char) `(,l ,w ,m))

这个表达式中，引号以及逗号，不用不行。若是不用反引号，Guile 解释器，会认为 (,l ,w ,m) 是在应用一个名为 ,l 的函数，它的参数为 ,w 与 ,m。

引号是个语法糖，它其实是 quasiquote 函数。例如：

`(,l ,w ,m)

实质上是 (quasiquote (,l ,w ,m))。

若是不用逗号，那么 Guile 解释器会认为 (quasiquote (l w m)) 中的 l，w 以及 m 都是符号。例如：

> (symbol? `(l w m))
#f
> (symbol? (car `(l w m)))
#t

所谓符号，可简单的将其理解为 Guile 的变量名与函数名。

Guile 的变量，本质上是将一个符号绑定到一个值：

(define 符号 值)

Guile 的函数，本质上是将一个符号绑定到一个匿名的计算过程：

(define 函数
  (lambda (形参) <计算过程>))

(quasiquote (,l ,w ,m)) 中的逗号，是迫使 Guile 将列表中的符号 l，w 以及 m 做为表达式进行求值。

逗号也是一个语法糖，它其实是 unquote 函数。,w 其实是 (unquote w)。

Guile 中还有一个引号，一般状况下能够用它引用符号或列表，可是当列表中某些元素须要 unquote 时，须要用反引号。换句话说，引号的威力太大，它能够将列表中的一切东西拍扁为符号，例如 '(,l ,w ,m)，列表 (,l ,w ,m) 会被它拍的原型毕露：

((unquote l) (unquote w) (unquote m))

而反引号容许列表中的某些符号从引号中逃逸出来。

挑战

假设有一种被称为 zero 文档的文本格式，其扩展名为 .zero。例以下面这份 hello-world.zero 文档：

\starttext
下面咱们用 C 语言写一个 Hello World 程序：

@ hello.c 文件 #
#include <stdioi.h>

int main(void) {
        # 在屏幕上打印 "Hello world!" 字符串 @;
        return 0;
}
@

可以使用 C 标准库提供的 \type{printf} 函数在终端屏幕上显示文本，即：

@ 在屏幕上打印 "Hello world!" 字符串 #
print("Hello World!\n");
@

编译这个程序的命令为：

\starttyping
$ gcc hello.c -o hello
\stoptyping
\stoptext

如今，问题来了。我想对这种格式的文档进行区域划分，划分规则是，不管是 @ ... # 格式的文本仍是 @ 符号独占一行的文本，划分位置均在 @ 符号以前。上面的示例文档，按照这种划分规则，可将其划分为四个区域（间隔线仅做示意用）：

\starttext
下面咱们用 C 语言写一个 Hello World 程序：

----------------------------------------------------------------
@ hello.c 文件 #
#include <stdioi.h>

int main(void) {
        # 在屏幕上打印 "Hello world!" 字符串 @;
        return 0;
}
----------------------------------------------------------------
@

可以使用 C 标准库提供的 \type{printf} 函数在终端屏幕上显示文本，即：

@ 在屏幕上打印 "Hello world!" 字符串 #
print("Hello World!\n");
-----------------------------------------------------------------
@

编译这个程序的命令为：

\starttyping
$ gcc hello.c -o hello
\stoptyping
\stoptext

下面的代码可做为参考答案，是我做为 Guile 的初学者，用了一个下午的时间写出来的。

(define (zero-doc-split file blocks cache)
  (define (error-exit)
    (begin (display "Error: ") (display (get-output-string cache)) (newline) (exit)))
  (define (all-chars-before-@-are-spaces? text)
    (cond ((null? text) #t)
          (else (let ((first (car text)))
                  (cond ((char=? first #\newline) #t)
                        ((char=? first #\space)
                         (all-chars-before-@-are-spaces? (cdr text)))
                        (else #f))))))
  (define (@-alone?)
    (cond ((not (all-chars-before-@-are-spaces?
                 (reverse (string->list (get-output-string (car blocks)))))) #f)
          (else (let ((next-char (read-char file)))
                  (cond ((eof-object? next-char) #f)
                        (else
                         (begin
                           (write-char next-char cache)
                           (cond ((char=? next-char #\newline) #t)
                                 ((char=? next-char #\space) (@-alone?))
                                 (else #f)))))))))
  (define (@-lead-lines?)
    (define (all-chars-after-#-are-spaces?)
      (let ((next-char (read-char file)))
        (begin (write-char next-char cache)
               (cond ((eof-object? next-char) #f)
                     ((char=? next-char #\newline) #t)
                     ((char=? next-char #\space) (all-chars-after-#-are-spaces?))
                     (else #f)))))
    (cond ((not (all-chars-before-@-are-spaces?
                 (reverse (string->list (get-output-string (car blocks)))))) #f)
          (else (let ((next-char (read-char file)))
                  (cond ((eof-object? next-char) #f)
                        (else (begin
                                (write-char next-char cache)
                                (cond ((char=? next-char #\@) (error-exit))
                                      ((char=? next-char #\#)
                                       (cond ((all-chars-after-#-are-spaces?) #t)
                                             (else (error-exit))))
                                      (else (@-lead-lines?))))))))))
  (let ((char (read-char file)))
    (cond ((eof-object? char) blocks)
          ((char=? char #\@)
           (begin (write-char char cache)
                  (cond ((or (@-alone?) (@-lead-lines?))
                         (begin
                           (set! blocks (cons cache blocks))
                           (set! cache (open-output-string))
                           (zero-doc-split file blocks cache)))
                        (else (begin
                                (display (get-output-string cache) (car blocks))
                                (close-output-port cache)
                                (set! cache (open-output-string))
                                (zero-doc-split file blocks cache))))))
          (else (begin
                  (write-char char (car blocks))
                  (zero-doc-split file blocks cache))))))
(define (display-zero-blocks blocks)
  (cond ((null? blocks) #nil)
        (else (begin 
                (display (get-output-string (car blocks)))
                (cond ((null? (cdr blocks)) (newline))
                      (else  (display "----")
                             (newline)))
                (display-zero-blocks (cdr blocks))))))
(setlocale LC_ALL "")
(define (get-file-name args)
  (cond ((null? (cdr args)) (car args))
        (else (get-file-name (cdr args)))))
(define file (open-input-file (get-file-name (command-line)) #:encoding "utf-8"))
(define blocks (cons (open-output-string) #nil))
(define cache (open-output-string))
(display-zero-blocks (reverse (zero-doc-split file blocks cache)))
(close-input-port file)

在 zero-doc-split 函数的实现中，使用了两个个此前未涉及到的函数，string->list，reverse，这里给出它们的用法示例：

> (string->list "hello world!")
(#\h #\e #\l #\l #\o #\space #\w #\o #\r #\l #\d #\!)
> (reverse (string->list "hello world!"))
(#\! #\d #\l #\r #\o #\w #\space #\o #\l #\l #\e #\h)

利用这两章所学的知识，本身动手实现这两函数也不难，例如：

(define (my-reverse list)
    (define (my-reverse-iter list new-list)
      (cond ((null? list) new-list)
            (else (my-reverse-iter (cdr list) (cons (car list) new-list)))))
    (my-reverse-iter list #nil))
(define (string-to-list s)
  (let ((port (open-input-string s)))
    (define (string-to-list-iter list)
      (let ((char (read-char port)))
        (cond ((eof-object? char) (begin
                                    (close-input-port port)
                                    (my-reverse list)))
              (else (string-to-list-iter (cons char list))))))
    (string-to-list-iter #nil)))

(string-to-list "hello world!")
(my-reverse (string-to-list "hello world!"))

在 zero-doc-split 函数的实现中，还用到了函数 setlocale。对于中文用户而言，若是想让 Guile 程序在终端中显示中文字符，须要：

(setlocale LC_ALL "")

这样作的用意是，对系统 Locale 不做任何假设，这样 Guile 程序的 Locale 就会因系统中的 Locale 环境变量的值而异。

(cdr 《为本身写本-Guile-书》)