newLISP You Can Do It --- Strings

Date: 2019-11-12

#############################################################################
# Name:         newLISP You Can Do It --- Streams
# Author:       黄登(winger)
# Project:      http://code.google.com/p/newlisp-you-can-do
# Gtalk:        free.winger@gmail.com
# Gtalk-Group: zen0code@appspot.com
# Blog:         http://my.opera.com/freewinger/blog/
# QQ-Group:     31138659
# The great way is simple -- newLISP
#
# Copyright 2012 黄登(winger) All rights reserved.
# Permission is granted to copy, distribute and/or
# modify this document under the terms of the GNU Free Documentation License,
# Version 1.2 or any later version published by the Free Software Foundation;
# with no Invariant Sections, no Front-Cover Texts,and no Back-Cover Texts.
#############################################################################

        Freedom cannot be bought with money, but it can be sold for money.
                                                                --- Lu Xun

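    In practice, most interaction between people and computers goes through
strings, so much so that nearly all input data is treated as a string.
    If lists are heaven and earth, then strings are surely the rivers that run
between them.

I. Strings in newLISP code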

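    newLISP's string handling is undeniably powerful. Every handy blade you
could want is laid out for you, and each one is an indispensable weapon for the
stay-at-home hacker.

    Commercial over, back to business. ~_~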

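    There are three ways to write a string in newLISP:

    Enclosed in double quotes ; fewer keystrokes, and escapes such as "\n" work
    (set 's "this is a string")

    Enclosed in braces ; no escape sequences are processed
    (set 's {this is a string})

    Enclosed in [text] tags ; like braces, and can also hold more than 2048 bytes
    (set 's [text]this is a string[/text])

    Strings written with the first two forms cannot exceed 2048 bytes.
    Many people wonder: with the second form available, why keep the first?
    Let's test the following code: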

> {\{}

ERR: string token too long : "\\{}"

> "\""
"\""

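    See? Braces ignore every escape sequence: inside them a backslash has no
special effect. That is why {\{} fails (the inner brace is not escaped, so the
braces are unbalanced and the reader never finds the end of the string), while
double quotes let you escape the delimiter itself.
    Now suppose you print a string: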

> (print {\n road to freedom})
\n road to freedom"\\n road to freedom"
> (print "\n road to freedom")

road to freedom"\n road to freedom"

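    The escape sequence inside the braces had no effect: there was no line
break at all. Of the three forms, only the first lets you use its own
delimiter, the double quote, freely inside the string (braces may contain
braces only when they are balanced).

    The second form, braces, is the one I strongly recommend. Why? It is
convenient: there is no need to put an extra backslash in front of every
backslash, which is especially handy when writing regular expressions.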

> (println "\t45")
        45
"\t45"
> (println "\\t45")
\t45
"\\t45"
> (println {\t45})
\t45
"\\t45"

> (regex "\\d" "a9b6c4")
("9" 1 1)

> (regex {\d} "a9b6c4")
("9" 1 1)

    Double-quoted strings support the following escape sequences:

character   description
\"          for a double quote inside a quoted string
\n          for a line-feed character (ASCII 10)
\r          for a return character (ASCII 13)
\t          for a TAB character (ASCII 9)
\nnn        for a three-digit ASCII number (nnn format between 000 and 255)
\xnn        for a two-digit-hex ASCII number (xnn format between x00 and xff)

(set 's "this is a string \n with two lines")
(println s)

this is a string
with two lines

(println "\110\101\119\076\073\083\080") ; decimal ASCII codes
newLISP

(println "\x6e\x65\x77\x4c\x49\x53\x50") ; hexadecimal ASCII codes
newLISP

    What if you had to go the other way and turn a string into the numeric
escape forms shown above? How would you do it?
    Hint: use format and unpack.
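
    One possible answer to the exercise is sketched below. The helper names
to-decimal-escapes and to-hex-escapes are made up for this illustration, and
the sketch assumes plain ASCII input. It uses char on each exploded character
rather than unpack, which is enough for single-byte characters.

```newlisp
; Hypothetical helpers (not part of newLISP or of the original article):
; rewrite every character of an ASCII string as a numeric escape sequence.
(define (to-decimal-escapes str)
  ; "\\%03d" is a backslash followed by a zero-padded 3-digit decimal number
  (join (map (fn (c) (format "\\%03d" (char c))) (explode str))))

(define (to-hex-escapes str)
  ; "\\x%02x" is a backslash, an x, and a 2-digit lowercase hex number
  (join (map (fn (c) (format "\\x%02x" (char c))) (explode str))))

(println (to-decimal-escapes "newLISP")) ; \110\101\119\076\073\083\080
(println (to-hex-escapes "newLISP"))     ; \x6e\x65\x77\x4c\x49\x53\x50
```

    The printed text can be pasted back between double quotes to get the
original string again.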

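    The third form, [text] ... [/text], is normally used for very long strings
(more than 2048 bytes), such as web pages. newLISP also switches to this form
automatically when it outputs a long string.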

(set 'novel (read-file {my-latest-novel.txt}))
;->
[text]
It was a dark and "stormy" night...
...
The End.
[/text]

    Use length to get the length of a string:

(length novel)
;-> 575196

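    newLISP handles strings millions of characters long efficiently.
    length counts bytes. To count the characters of a Unicode string you need
utf8len, which is only available in the UTF-8 version of newLISP: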

(utf8len (char 955))
;-> 1
(length (char 955))
;-> 2
> (utf8len "个")
4
> (length "个")
2

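    cmd.exe runs into many problems with non-ASCII characters, and they are
almost impossible to work around (the last two results above, where utf8len
is larger than length, look like such a case). Non-Win32 consoles do not have
this problem.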

II. Making strings

    There are N ways to make a string. Strings here, strings there, strings
everywhere...
    To build a string one character at a time, use char:

(char 33)
;-> "!"

> (char "a")
97

> (char 0x61)
"a"

> (char 97)
"a"

    char works on a single character: it converts a character to its number,
and a number to its character.

(join (map char (sequence (char "a") (char "z"))))
;-> "abcdefghijklmnopqrstuvwxyz"

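    char obtains the ASCII codes of "a" and "z", sequence produces the run of
numbers between them, and map applies char to each number to get the matching
character. Finally join merges the whole list into a single string.

    join also accepts a second argument, used as a separator.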

(join (map char (sequence (char "a") (char "z"))) "-")
;-> "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

    Like join, append can concatenate strings. (Most list functions also work
on strings.)

(append "con" "cat" "e" "nation")
;-> "concatenation"

    To build a list we use list; to build a string we use string.
    string converts all of its arguments to text and combines them into one
string.

(define x 42)
(string {the value of } 'x { is } x)
;-> "the value of x is 42"

    For finer control over string output use format, which we will meet
shortly.
    dup repeats a string:

> (dup "帅锅" 5)
"帅锅帅锅帅锅帅锅帅锅"

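    date returns a string with the current date and time. Given a number of
seconds since the epoch, it formats that moment instead (in local time).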

> (date)
"Mon May 14 15:50:34 2012"

> (date 1234567890)
"Sat Feb 14 07:31:30 2009"

III. String surgery

    "Surgery" sounds scary, but it simply means changing a string permanently.

    Many functions operate on strings, and some of them are destructive (in
the manual these functions are marked with a !).

(set 't "a hypothetical one-dimensional subatomic particle")
(reverse t)
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"
t
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"

    As mentioned before, to use these functions without destroying the
original data, use copy.

(reverse (copy t))
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"
t
;-> "a hypothetical one-dimensional subatomic particle"

    The first reverse above changed t permanently. The case-conversion
functions below, however, do not change the original string.

(set 't "a hypothetical one-dimensional subatomic particle")
(upper-case t)
;-> "A HYPOTHETICAL ONE-DIMENSIONAL SUBATOMIC PARTICLE"
(lower-case t)
;-> "a hypothetical one-dimensional subatomic particle"
(title-case t)
;-> "A hypothetical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

IV. Substrings

    To extract part of a string, use the following:

(set 't "a hypothetical one-dimensional subatomic particle")
(first t)
;-> "a"
(rest t)
;-> " hypothetical one-dimensional subatomic particle"
(last t)
;-> "e"
(t 2)
;-> "h"

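    You will notice how similar this is to the list operations from the
previous chapter. In newLISP most list functions also work on strings,
including the various selection functions.

1: String slices

    slice extracts a new string from an existing one.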

(set 't "a hypothetical one-dimensional subatomic particle")
(slice t 15 13) ; start at index 15 and take 13 characters
;-> "one-dimension"
(slice t -8 8) ; start 8 characters from the end and take 8 characters
;-> "particle"
(slice t 2 -9) ; start at index 2 and stop before the last 9 characters
;-> "hypothetical one-dimensional subatomic"
(slice "schwarzwalderkirschtorte" 19 -1) ; same idea: leave off the last character
;-> "tort"

    Of course, implicit slicing works on strings too.

(15 13 t)
;-> "one-dimension"
(0 14 t)
;-> "a hypothetical"

    The substrings extracted above are all contiguous. To pick out scattered
characters, use select:

(set 't "a hypothetical one-dimensional subatomic particle")
(select t 3 5 24 48 21 10 44 8)
;-> "yosemite"
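(select t (sequence 1 49 12)) ; starting at index 1, take every 12th character
;-> " lime"

> (help select)
syntax: (select <string> <list-selection>)
syntax: (select <string> [<int-index_i> ... ])

     <list-selection> is a list of the positions of the characters to extract.

2: Changing the ends of strings

    chop and trim cut the ends off a string. Neither of them is destructive:
both return a new string and leave the original untouched.
    Snip, snip, snip...

    chop removes characters from the end of a string: one by default, or as
many as you ask for.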

(chop t) ; by default, the last character
;-> "a hypothetical one-dimensional subatomic particl"
(chop t 9) ; remove the last 9 characters
;-> "a hypothetical one-dimensional subatomic"

    trim removes the specified character from both ends of a string.

(set 's " centred ")
(trim s) ; defaults to removing spaces
;-> "centred"

(set 's "------centred------")
(trim s "-")
;-> "centred"

(set 's "------centred********")
(trim s "-" "*") ; the leading and trailing characters can be given separately
;-> "centred"
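
    Because chop and trim return new strings, the variable has to be
reassigned if the shortened value should be kept. A minimal sketch (the
variable s and its contents are chosen only for illustration):

```newlisp
(set 's "  padded  ")
(println "[" (trim s) "]")    ; [padded]      a trimmed copy
(println "[" s "]")           ; [  padded  ]  s itself is unchanged
(set 's (trim s))             ; reassign to keep the trimmed value
(println "[" (chop s 3) "]")  ; [pad]         a copy without the last 3 characters
(println "[" s "]")           ; [padded]
```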

3: push and pop work on strings too

    push inserts an element at a given position in a string; pop does the
opposite and removes one.
    If no position is given, the first position of the string is used.

(set 't "some ")
(push "this is " t)
(push "text " t -1)
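;-> t is now "this is some text "

    pop returns the element that was removed. push, in current newLISP
versions, returns the modified string, so when you push onto a very large
string at the console you may want to wrap the call in silent to suppress the
echoed result.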

>(help pop)
syntax: (pop <str> [<int-index> [<int-length>]])

    The optional <int-length> sets how many characters to pop.

(set 'version-string (string (sys-info -2)))
; eg: version-string is "10402"
(set 'dev-version (pop version-string -2 2)) ; always two digits
; dev-version is "02", version-string is now "104"
(set 'point-version (pop version-string -1)) ; always one digit
; point-version is "4", version-string is now "10"
(set 'version version-string) ; one or two digits (99?)
(println version "." point-version "." dev-version " on " ostype)
10.4.02 on Win32
"Win32"

    ostype holds the operating system type.

V. Modifying strings

    There are two ways to modify a string: by giving an exact position, or by
giving the content to look for.

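1: Using index numbers in strings

    Long ago there were nth-set and set-nth, but with all the setting and
being set, their usage and return values were confusing, and they have
disappeared from current versions. Instead we can use implicit indexing to
reach the element at a given position.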

> (set 'str "thinking newLISP !")
"thinking newLISP !"
> (setf (str 0) "I t")
"I t"
> str
"I thinking newLISP !"

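2: Changing substrings

    Often you cannot know the exact index of the characters you want to
change, or finding it out would cost too much.
    In that case use replace to replace every part of the string that matches
what you are looking for...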

> (help replace)
syntax: (replace <str-key> <str-data> <exp-replacement>)
syntax: (replace <str-pattern> <str-data> <exp-replacement> <int-regex-option>)

(replace old-string source-string replacement)
So:
(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" t "theor") ; replace every hypoth in the string with theor
;-> "a theoretical one-dimensional subatomic particle"

    replace is destructive. If you do not want to change the original string,
use copy or string:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" (string t) "theor")
;-> "a theoretical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

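3: Replacing string content with regular expressions

    If you have browsed the manual, you will have noticed that many syntax
lines include an optional parameter, <int-regex-option>. This number selects
the regular expression options; to see what each number means, search the
manual for "PCRE name". The most common values are 0 (the default behaviour,
case-sensitive) and 1 (case-insensitive).

    newLISP uses Perl-compatible Regular Expressions (PCRE). Besides replace,
the functions directory, find, find-all, parse, search, starts-with and
ends-with all accept regular expressions.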

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {h.*?l(?# h followed by l but not too greedy)} t {} 0)
;-> "a  one-dimensional subatomic particle" ; two spaces remain where the word was
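
    The difference between the two most common option values can be seen in
a small sketch. The sample string src is made up for this illustration, and
copy protects it from the destructive replace.

```newlisp
(set 'src "LISP, Lisp and lisp")

; option 0: plain PCRE matching, case-sensitive
(println (replace "lisp" (copy src) "scheme" 0)) ; LISP, Lisp and scheme

; option 1: PCRE_CASELESS, case-insensitive
(println (replace "lisp" (copy src) "scheme" 1)) ; scheme, scheme and scheme

(println src) ; LISP, Lisp and lisp (untouched, thanks to copy)
```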

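    When writing a regular expression you can use double quotes or braces;
the difference between the two was covered earlier. Personally I still
recommend braces...

(set 'str "\s")
(replace str "this is a phrase" "|" 0) ; does not search for \s (whitespace)
;-> thi| i| a phra|e ; only the letter s was replaced

(set 'str "\\s")
(replace str "this is a phrase" "|" 0)
;-> this|is|a|phrase ; replaced successfully!

(set 'str {\s})
(replace str "this is a phrase" "|" 0)
;-> this|is|a|phrase ; better!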

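VI: System variables: $0, $1 ...

    Every function that uses regular expressions binds the match results to
the system variables $0, $1 ... $15. You can use them directly, or refer to
them through the $ function.
    If you are new to regular expressions, look for a PCRE tutorial, and do
not worry if the code below leaves you dizzy. There is also the manual, Code
Patterns, and failing that a search engine; more than one road leads to
newLISP.
    My view has always been that good enough is enough, so if this is hard to
follow, skip ahead. With use it will come naturally. Skill comes from
diligence and is lost through idleness.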

(set 'quotation {"I cannot explain." She spoke in a low, eager voice,
with a curious lisp in her utterance. "But for God's sake do what I ask you. Go
back
and never set foot upon the moor again."})

(replace {(.*?),.*?curious\s*(l.*p\W)(.*?)(moor)(.*)}
    quotation
    (println { $1 } $1 { $2 } $2 { $3 } $3 { $4 } $4 { $5 } $5)
    4) ; the wrapped quotation contains \n line breaks, so option 4 sets
       ; PCRE_DOTALL, which lets . match newlines as well

$1 "I cannot explain." She spoke in a low $2 lisp $3 in her utterance. "But for God's sake do what I ask you. Go
back
and never set foot upon the $4 moor $5 again."

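    Each parenthesized group's match is bound to a system variable, $1
through $5, while $0 holds the part of the string that matches the whole
regular expression. A mouthful, isn't it? Just look at the code.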

(set 'str "http://newlisp.org:80")
(find "http://(.*):(.*)" str 0) → 0

$0 → "http://newlisp.org:80"
$1 → "newlisp.org"
$2 → "80"
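
    As a further sketch of the same mechanism, the groups of a match can be
picked up right after a successful regex call. The input line and the pattern
are invented for this example.

```newlisp
(set 'line "timeout = 30")

; regex returns nil when nothing matches, so it can guard the body of when
(when (regex {(\w+)\s*=\s*(\d+)} line)
  (println "key:   " $1)          ; key:   timeout
  (println "value: " (int $2))    ; value: 30
  (println "via $: " ($ 1)))      ; via $: timeout
```

    ($ 1) and $1 refer to the same value; the function form is handy when the
group number is computed, for example inside a loop.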

1. The replacement expression

> (help replace)
syntax: (replace <str-key> <str-data> <exp-replacement>)
syntax: (replace <str-pattern> <str-data> <exp-replacement> <int-regex-option>)

    <exp-replacement> is the replacement part: any text that matches can be
replaced with the value of this expression. The expression is unrestricted;
it can even be an operation that changes nothing.

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {t[h]|t[aeiou]} t (println $0) 0)
th
ti
to
ti
t
;-> "a hypothetical one-dimensional subatomic particle"

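    The purpose of this replace expression is to print every pair of
characters in the string that starts with t and ends with h or a vowel.
<exp-replacement> is (println $0), which does two jobs. First, it prints the
matched text, which some people call a "side effect". Second, its return
value, $0, replaces the matched text in the original string; since the two
are identical, the string appears unchanged.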

(replace "a|e|c" "This is a sentence" (upper-case $0) 0)
;-> "This is A sEntEnCE"

    The code below uses a more elaborate <exp-replacement>.

(set 't "a hypothetical one-dimensional subatomic particle")
(set 'counter 0)
(replace "o" t
    (begin
        (inc 'counter)
        (println {replacing "} $0 {" number } counter)
        (string counter)) ; the replacement must be a string; this is the
                          ; value returned by <exp-replacement>
    0)
replacing "o" number 1
replacing "o" number 2
replacing "o" number 3
replacing "o" number 4
"a hyp1thetical 2ne-dimensi3nal subat4mic particle"

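    begin groups several expressions into one and evaluates them in order;
the value of the last expression becomes the value of the whole group.
    Now let's look at a practical use of replace.
    Suppose there is a text file, "zhuzhu.txt", with the following contents: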

1 a = 15
2 another_variable = "strings"
4 x2 = "another string"
5 c = 25
3x=9

    Now we want to turn it into the following form, to make it look nicer.

10 a                   = 15
20 another_variable    = "strings"
30 x2                  = "another string"
40 c                   = 25
50 x                   = 9

    Save the code below as ft.lsp, then run: newlisp ft.lsp zhuzhu.txt

(set 'file (open ((main-args) 2) "read"))
;(set 'file (open "ni.txt" "read"))
(set 'counter 0)
(while (read-line file)
    (set 'temp
        (replace {^(\d*)(\s*)(.*)} ; rewrite the leading number
            (current-line)
            (string (inc 'counter 10) " " $3 )
            0))
    (println
        (replace {(\S*)(\s*)(=)(\s*)(.*)} ; pick out the useful parts
            temp
            (string $1 (dup " " (- 20 (length $1))) $3 " " $5)
            0)))

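    The while loop reads the file one line at a time, and (current-line)
returns the line just read. The first replace rebuilds the leading number:
{^(\d*)(\s*)(.*)} splits the source line into the leading digits, the
whitespace that follows, and the rest. Then
(string (inc 'counter 10) " " $3 ) discards the first two parts and combines
the third with the counter value into a new string. counter grows by 10 for
every line processed. The result is assigned to the temporary variable temp.
    The second replace splits temp into 5 parts: {(\S*)(\s*)(=)(\s*)(.*)}.
    \S stands for any character that \s does not match.
    From these it takes $1, $3 and $5 and builds a new string:
    (string $1 (dup " " (- 20 (length $1))) $3 " " $5)
    For alignment, the column holding $1, up to $3 (the equals sign), is fixed
at 20 characters; if $1 is shorter than 20 bytes, dup supplies the spaces
needed to fill the gap.

    Regular expressions aren't very easy for the newcomer, but they're very
powerful, particularly with newLISP's replace function, so they're worth
learning. A little regular practice goes a long way.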

VII. Testing and comparing strings

    All sorts of tests can be applied to strings. The comparison operators
compare two strings character by character, in order.

(> {Higgs Boson} {Higgs boson}) ; nil ; B sorts before b
(> {Higgs Boson} {Higgs}) ; true
(< {dollar} {euro}) ; true
(> {newLISP} {LISP}) ; true
(= {fred} {Fred}) ; nil ; f and F are different
(= {fred} {fred}) ; true

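    The comparison starts at the first character and continues until the
result is decided.
    Comparing several strings is no problem either. Thanks to newLISP's
flexible argument handling, you do not have to write the iteration yourself.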

(< "a" "c" "d" "f" "h")
;-> true

    What if only one argument is supplied?
    newLISP provides a default for you. A number is compared with 0, and a
string is compared with the empty string ""...

(> 1) ; true - assumes > 0
(> "fred") ; true - assumes > ""

    The following functions make it easy to analyse a string and extract
specific content from it:
    member, regex, find-all, starts-with, ends-with.

(starts-with "newLISP" "new")
;-> true
(ends-with "newLISP" "LISP")
;-> true

    They also accept a regular expression option (usually 0 or 1).

(starts-with {newLISP} {[a-z][aeiou](?#lc followed by lc vowel)} 0)
;-> true
(ends-with {newLISP} {[aeiou][A-Z](?# lc vowel followed by UCase)} 0)
;-> nil

    0 is PCRE's default, case-sensitive matching; 1 makes it case-insensitive.
    find, find-all, member and regex look through the whole string.
    find returns the position of the first match.

(set 't "a hypothetical one-dimensional subatomic particle")
(find "atom" t)
;-> 34
(find "l" t)
;-> 13
(find "L" t)
;-> nil ; case-sensitive

    member tests whether one string is part of another. If it is, member
returns the substring together with everything after it.

(member "rest" "a good restaurant")
;-> "restaurant"

    Both find and member accept a regular expression option.

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})

(find "lisp" quotation) ; without regex
;-> 69 ; at position 69, where the l is

(find {i} quotation 0) ; with regex
;-> 15 ; at position 15

(find {s} quotation 1) ; case-insensitive
;-> 20 ; at position 20

(println "character "
    (find {(l.*?p)} quotation 0) ": " $0) ; an l followed, as soon as possible, by a p
;-> character 13: lain." She sp

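    A reminder: when entering a multi-line statement at the console, first
press Enter on an empty line and then paste the whole statement, or put
[cmd] and [/cmd] on separate lines before and after the statement.

    find-all works like find, but instead of returning only the first match
it returns all matching substrings as a list. On strings it uses regular
expressions by default, so there is no need to pass the regex option
explicitly.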

> (help find-all)
syntax: (find-all <str-regex-pattern> <str-text> [<exp> [<int-regex-option>]])

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})

(find-all "[aeiou]{2,}" quotation $0) ; runs of two or more vowels
;-> ("ai" "ea" "oi" "iou" "ou" "oo" "oo" "ai")
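
    The third argument of find-all does not have to be $0 itself; any
expression that uses the match will do. A small sketch with invented sample
text, converting every number found into an integer:

```newlisp
(set 'text "3 apples, 12 pears and 150 plums")

; (int $0) is evaluated once per match and its value is collected
(set 'numbers (find-all {\d+} text (int $0)))

(println numbers)            ; (3 12 150)
(println (apply + numbers))  ; 165
```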

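    find-all returns the matching text. If you also want positions and
lengths, use regex.
    regex returns the text, start position and length of the whole match and
of each of its groups. At first sight it looks a little complicated.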

(set 'quotation
{She spoke in a low, eager voice, with a curious lisp in her utterance.})

(println (regex {(.*)(l.*)(l.*p)(.*)} quotation 0))
;-->
("She spoke in a low, eager voice, with a curious lisp in her utterance." 0 70
 "She spoke in a " 0 15
 "low, eager voice, with a curious " 15 33
 "lisp" 48 4
 " in her utterance." 52 18)

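    The first thing returned is the string that matches the whole regular
expression. It is also the longest: it starts at 0 and is 70 bytes long. Next
comes the match of the first parenthesized group, starting at position 0 and
15 bytes long. Then the second group, starting at position 15 and 33 bytes
long, and so on.

    All of these groups are also stored in the system variables.

(for (x 1 4)
    (println {$} x ": " ($ x)))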
$1: She spoke in a
$2: low, eager voice, with a curious
$3: lisp
$4: in her utterance.
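
    The offsets and lengths in the regex result can be fed straight into
slice. A minimal sketch with an invented input string (on ASCII text byte
offsets and character positions coincide):

```newlisp
(set 'text "order #4521 shipped")
(set 'm (regex {#(\d+)} text))

(println m)                          ; ("#4521" 6 5 "4521" 7 4)
; m holds: whole match, offset, length, then group 1, offset, length
(println (slice text (m 4) (m 5)))   ; 4521
```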

VIII. Strings to lists

    First let's look at the "renowned" explode, which blows a string up into
pieces of a given size and returns all the pieces as a list.

(set 't "a hypothetical one-dimensional subatomic particle")
(explode t)

;-> ("a" " " "h" "y" "p" "o" "t" "h" "e" "t" "i" "c" "a" "l"
" " "o" "n" "e" "-" "d" "i" "m" "e" "n" "s" "i" "o" "n" "a"
"l" " " "s" "u" "b" "a" "t" "o" "m" "i" "c" " " "p" "a" "r"
"t" "i" "c" "l" "e")

> (help explode)
syntax: (explode <str> [<int-chunk> [<bool>]])
syntax: (explode <list> [<int-chunk> [<bool>]])

(explode (replace " " t "") 5)
;-> ("ahypo" "theti" "calon" "e-dim" "ensio" "nalsu" "batom" "icpar"
"ticle")

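    int-chunk is the chunk size, and bool decides whether to drop a final
chunk that is shorter than int-chunk.
    You have the axe that splits the heavens; I have the stone that mends
them.
    join does the opposite of explode: it assembles a list whose elements are
all strings into one new string.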

>(help join)
syntax: (join list-of-strings [str-joint [bool-trail-joint]])

(set 'lst '("this" "is" "a" "sentence"))

(join lst " ") → "this is a sentence"

(join (map string (slice (now) 0 3)) "-") → "2012-5-16" ; numbers are converted to strings first

(join (explode "keep it together")) → "keep it together"

(join '("A" "B" "C") "-")         → "A-B-C"
(join '("A" "B" "C") "-" true)    → "A-B-C-"

    find-all can also split a string.

(find-all ".{3}" t) ; regular expressions by default: groups of 3 characters
;-> ("a h" "ypo" "the" "tic" "al " "one" "-di" "men" "sio"
"nal" " su" "bat" "omi" "c p" "art" "icl")

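IX. Parsing strings

    The next function will move you to tears.
    If you often have to process large amounts of text, parse is a real
treasure.
    It takes the pain out of data analysis. (newLISP also has many dedicated
statistics functions built in.)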

> (help parse)
syntax: (parse <str-data> [<str-break> [<int-option>]])

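    parse splits a string at every <str-break>. The <str-break> occurrences
themselves are eaten, and the remaining pieces are returned as a list of
substrings.

(parse t) ; no separator given: newLISP's own tokenizing rules apply
;-> ("a" "hypothetical" "one-dimensional" "subatomic" "particle")

    <str-break> can be a single separator character or a longer string.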

(set 'pathname {/System/Library/Fonts/Courier.dfont})
(parse pathname {/})
;-> ("" "System" "Library" "Fonts" "Courier.dfont")

(set 't {spamspamspamspamspamspamspamspam})
;-> "spamspamspamspamspamspamspamspam"
(parse t {am}) ; break on "am"
;-> ("sp" "sp" "sp" "sp" "sp" "sp" "sp" "sp" "")

    We can use filter to drop the empty strings from the resulting list.

(filter (fn (s) (not (empty? s))) (parse pathname {/}))
;-> ("System" "Library" "Fonts" "Courier.dfont")
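
    The same clean-up can be written a little shorter with clean, which
removes every element for which the predicate is true. This is an alternative
sketch, not the form used in the original text.

```newlisp
(set 'pathname {/System/Library/Fonts/Courier.dfont})

; clean is the inverse of filter: it drops the elements that satisfy empty?
(println (clean empty? (parse pathname {/})))
; ("System" "Library" "Fonts" "Courier.dfont")
```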

    Stripping HTML tags:

(set 'html (read-file "/Users/Sites/index.html"))
(println (parse html {<.*?>} 4)) ; option 4: dot matches newline

    newLISP also provides a dedicated XML parsing tool: xml-parse. A whole
chapter is devoted to it later.

    When no <str-break> is given, parse tokenizes according to newLISP's
internal parsing rules, and the algorithm differs from the one used when a
separator is specified.

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t)
;-> ("Eats" "," "shoots" "," "and" "leaves") ; she's gone!

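    Because no delimiter was given, everything after the ";" was treated as
a comment.
    To make parse split the data by your own rules, you must supply an
explicit delimiter or a regular expression.

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})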
(parse t " ")
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

    Or:

(parse t "\\s" 0) ; {\s} matches a whitespace character
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

    Another way to split a string is find-all.

(set 'a "1212374192387562311")
(println (find-all {\d{3}|\d{2}$|\d$} a))
;-> ("121" "237" "419" "238" "756" "231" "1")

; or, alternatively

(explode a 3)
;-> ("121" "237" "419" "238" "756" "231" "1")

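    parse eats what the pattern matches (the delimiters), while find-all
keeps what the pattern matches.

(find-all {\w+} t ) ; \w matches a letter, digit or underscore: [0-9a-zA-Z_]
;-> ("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")

(parse t {\w+} 0 ) ; the words are treated as delimiters and eaten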
;-> ("" ", " ", " " " " ; " " " " " " " " " "")

(parse t {[^\w]+} 0 )
;->("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")

(append '("") (find-all {[^\w]+} t ) '(""))
;-> ("" ", " ", " " " " ; " " " " " " " " " "")

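X. Other string functions

    search looks through a file for a matching string and returns the
position of the first match. By default the file pointer is then moved to the
start of the match; when <bool-flag> is true it is moved to the end of the
match instead. The next search continues from the current position of the
file pointer.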

> (help search )
syntax: (search <int-file> <str-search> [<bool-flag> [<int-options>]])

(set 'f (open {/private/var/log/system.log} {read}))
(search f {kernel})
(seek f (- (seek f) 64)) ; rewind file pointer
(dotimes (n 3)
(println (read-line f)))
(close f)

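    The code above searches the system log for a string containing kernel,
moves the file pointer back 64 bytes from where it was found, and then reads
and prints three lines of the log.
    For more string functions, search the manual for
"String and conversion functions".

XI. Formatting strings

    Like other languages, newLISP offers an elegant way to format output (the
format function).
    Suppose we need to print the following: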
folder: Library
file: mach

    We need the following string templates:

"folder: %s" ; or
" file: %s"

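    Give format a text template, followed in order by all the arguments the
template needs.

(format "folder: %s" f) ; or
(format " file: %s" f)

> (help format)
syntax: (format <str-format> [<exp-data-1> <exp-data-2> ... ])
syntax: (format <str-format> <list-data>)

    <str-format> is the template string, and there is only one. The arguments
after it are the data for the format specifiers in the template. Take
(format " file: %s" f): here f is a string, so the template must contain a
%s; if f were an integer, the template would need a %d. Eleven format types
are currently supported.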

format description
s    text string
c    character (value 1 - 255)
d    decimal (32-bit)
u    unsigned decimal (32-bit)
x    hexadecimal lowercase
X    hexadecimal uppercase
o    octal (32-bits) (not supported on all compilers)
f    floating point
e    scientific floating point
E    scientific floating point
g    general floating point

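    The types must match, or an error is raised. % works like an escape
character: its position in the template marks where the corresponding value
will appear in the string.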

(set 'f "OneLisp")
(format "folder: %s" f)
;-->"folder: OneLisp"

(format "%s folder: " f)
"OneLisp folder: "

(format "%d" "abc")
;-->ERR: data type and format don't match in function format : "abc"

    The code below uses the directory function to print all files and
folders in the current directory.

(dolist (f (directory))
    (if (directory? f)
        (println (format "folder: %s" f))
        (println (format " file: %s" f))))

; output

folder: .
folder: ..
folder: api
 file: cd.dll
 file: cmd-lisp.bat
folder: code
 file: CodePatterns-cn.html
 file: CodePatterns-CN.html.bak
 file: CodePatterns.html
 file: COPYING
 file: demo-stdin.lsp
 file: drag.bat
folder: examples
 file: freetype6.dll
 file: gs.bat
folder: guiserver
 file: guiserver-keyword.txt
...

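    The template string in format also allows finer control over the output.

"%w.pf"

    f is one of the type flags listed earlier and is required.
    w is the width the value occupies in the output.
    p is the precision of the value in the output.
    w may be preceded by a minus sign (left-align), a plus sign (always show
the sign) or 0 (fill the empty positions with zeros). The default is
right-aligned.
    Zero-padding only takes effect when the value is right-aligned.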
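
    Width and precision are easiest to see with floats and aligned columns.
The rows below are invented sample data; the sketch only relies on the format
flags just described.

```newlisp
(println (format "%8.2f|" 3.14159))  ; width 8, precision 2, right-aligned
(println (format "%-10s|" "abc"))    ; width 10, left-aligned

; a small table: name left-aligned in 12 columns, count in 4, price in 8
(dolist (row '(("apple" 3 0.5) ("watermelon" 12 2.25)))
  (println (format "%-12s %4d %8.2f" row)))
```

    Passing the whole row as a list uses the second syntax of format, so the
values do not have to be spelled out one by one.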

>(format "Result = %05d" 2)
"Result = 00002"

> (format "Result = %+05d" 2)
"Result = +0002"
> (format "Result = %+05d" -2)
"Result = -0002"
> (format "Result = %-05d" -2)
"Result = -2   "
> (format "Result = %05d" -2)
"Result = -0002"

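    Now a slightly more complex example: print every character from 32 to
0x01a0 (416), together with its decimal, hexadecimal and binary values.
    format cannot output binary, so a small conversion function was written
for the purpose. Nowadays the built-in bits does the binary conversion.

(define (binary x , results)
  (until (<= x 0)
    (push (string (% x 2)) results) ; % gives the remainder: the next binary digit
    (set 'x (/ x 2))) ; halve x and repeat
  results)

(for (x 32 0x01a0)
  (println (char x) ; first turn the number into a character with char
    (format "%4d\t%4x\t%10s" ; decimal \t hexadecimal \t binary string
            (list x x (join (binary x))))))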

x 120     78       1111000
y 121     79       1111001
z 122     7a       1111010
{ 123     7b       1111011
| 124     7c       1111100
} 125     7d       1111101
~ 126     7e       1111110
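
    With the built-in bits mentioned above, the hand-written binary function
is no longer needed. A shortened sketch of the same loop, limited to the
range 120 to 126 to keep the output small:

```newlisp
; bits returns the binary digits of an integer as a string
(println (bits 120)) ; 1111000

(for (x 120 126)
  (println (char x) " "
    (format "%4d\t%4x\t%10s" x x (bits x))))
```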

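XII. Strings that make newLISP think

    Why this title? Heh, there is a fun example at the end. You could even
write a scrambled-code generator and see what you get.

    The last two functions of this chapter are eval and eval-string.
    Both exist to execute newLISP code.
    As long as the code you supply is valid, they return its result.

    eval takes an expression:

(set 'expr '(+ 1 2))
(eval expr)
;-> 3

    eval-string takes only a string: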

(set 'expr "(+ 1 2)")
(eval-string expr)
;-> 3
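
    The two functions can be compared side by side in a small sketch. The
variable names are illustrative; the third argument of eval-string is the
expression returned when the string cannot be evaluated, as in the
(eval-string ... MAIN nil) call used later in this chapter.

```newlisp
(set 'op '+)
(set 'expr (list op 1 2))          ; the list (+ 1 2), built as data
(println (eval expr))              ; 3

(println (eval-string "(* 6 7)"))  ; 42

; an unbalanced expression: instead of stopping with an error,
; eval-string returns the value of its third argument
(println (eval-string "(* 6 7" MAIN "could not evaluate"))
```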

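    With these two functions you can execute any newLISP code. They are
implicitly at work behind every expression we evaluate. Never forget that
they are there, or your head will turn to mush. Whenever symbols, macros, or
the way expressions are evaluated confuse you, come back and reread this
paragraph and things will fall into place.
    Why is eval important? Because it gives you the choice: you can run the
code you need, when and where you need it. You will feel this most strongly
when working with macros.

    Here is a very entertaining piece of code. It keeps reshuffling a list,
calls eval-string on each arrangement, and stops only when one of the
resulting expressions actually gets evaluated.

(set 'code '(")" "set" "'valid" "true" "("))
(set 'valid nil)
(until valid
    (set 'code (randomize code)) ; shuffle the code list with randomize
    (println (join code " "))
    (eval-string (join code " ") MAIN nil))

; output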

) true 'valid ( set
'valid ) ( set true
true set ( 'valid )
'valid true ( set )
'valid ( true set )
) true ( set 'valid
) ( set 'valid true
'valid ) set true (
...
true set ) ( 'valid
true ( 'valid ) set
true 'valid ( set )
true ) 'valid ( set
( set 'valid true )
true

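That about covers the basics of newLISP. What comes next goes a little
deeper: contexts and macros.
In newLISP, though, these are clean and simple in how they look, how they are
used, and how they work.
Good Luck !!!

A syntax-colored version can be downloaded from
http://code.google.com/p/newlisp-you-can-do and viewed with scite4newlisp.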

2012-05-14 - 2012-05-17 15:10:29