Shell Script Regular Expressions: The Three Musketeers, Part 1 (grep, egrep)

Date: 2019-11-06


Regular Expressions in Shell Scripts

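I. The first of the three regular expression musketeers: grep

1. Before learning regular expressions, we take an unused configuration file as a test file for practice.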

[root@localhost ~]# vim chen.txt

#version=DEVEL
# System authorization information
auth --enableshadow --passalgo=sha512
# Use CDROM installation media
cdrom
thethethe
THE
THEASDHAS
# Use graphical install
graphical
# Run the Setup Agent on first boot
firstboot --enable
ignoredisk --only-use=sda
wood
wd
wod
woooooooood
124153
3234
342222222
faasd11
2
ZASASDNA
short
shirt

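2. Searching for specific characters

"-vn" inverts the selection (-v) and shows line numbers (-n): to find the lines that do not contain "the", use the "-vn" options of grep.
"-n" displays line numbers.
"-i" makes the match case-insensitive.
After the command runs, the matching characters are displayed in red (when grep color output is enabled).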

[root@localhost ~]# grep -n 'the' chen.txt
6:thethethe
11:# Run the Setup Agent on first boot
[root@localhost ~]# grep -in 'the' chen.txt
6:thethethe
7:THE
8:THEASDHAS
11:# Run the Setup Agent on first boot
[root@localhost ~]# grep -vn 'the' chen.txt
1:#version=DEVEL
2:# System authorization information
3:auth --enableshadow --passalgo=sha512
4:# Use CDROM installation media
5:cdrom
7:THE
8:THEASDHAS
9:# Use graphical install
10:graphical
12:firstboot --enable
13:ignoredisk --only-use=sda
14:wood
15:wd
16:wod
17:woooooooood
18:124153
19:3234
20:342222222
21:faasd11
22:2
23:ZASASDNA
24:short
25:shirt
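The three options can be compared side by side on a much smaller file. The following is a minimal sketch: the temporary file and the variable names are made up for illustration and are not part of chen.txt.

```shell
# Hypothetical three-line sample file, created only for this sketch.
sample=$(mktemp)
printf '%s\n' 'thethethe' 'THE' 'wood' > "$sample"

with_the=$(grep -n 'the' "$sample")    # case-sensitive match, with line numbers
any_case=$(grep -in 'the' "$sample")   # -i also matches "THE"
without=$(grep -vn 'the' "$sample")    # -v keeps only the non-matching lines

printf '%s\n' "$with_the" '---' "$any_case" '---' "$without"
rm -f "$sample"
```

Note that "-v" is still case-sensitive here, so the line "THE" counts as not containing "the".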

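3. Using brackets "[ ]" to match a character set
To find both "shirt" and "short", note that both strings contain "sh" and "rt". The following command finds both at once. No matter how many characters are inside "[]", the set stands for exactly one character, so "[io]" matches either "i" or "o".

[root@localhost ~]# grep -n 'sh[io]rt' chen.txt  # matches "short" or "shirt" via the character set [io]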
24:short
25:shirt
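A quick way to confirm that a bracket set stands for exactly one character is to add a line where two set members appear in a row. This is a sketch on a hypothetical sample file; the names are illustrative only.

```shell
# Hypothetical sample: "shiort" has both i and o between "sh" and "rt".
sample=$(mktemp)
printf '%s\n' 'short' 'shirt' 'shart' 'shiort' > "$sample"

# [io] consumes a single character, so "shiort" and "shart" do not match.
set_match=$(grep -n 'sh[io]rt' "$sample")

printf '%s\n' "$set_match"
rm -f "$sample"
```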

To find lines containing the repeated character "oo", run the following command.

[root@localhost ~]# grep -n 'oo' chen.txt 
11:# Run the Setup Agent on first boot
12:firstboot --enable
14:wood
17:woooooooood

To find "oo" that is not preceded by "w", use the negated character set "[^]". For example, "grep -n '[^w]oo' test.txt" searches test.txt for "oo" not preceded by "w".

[root@localhost ~]# grep -n '[^w]oo' chen.txt  # match "oo" that is not preceded by "w"
11:# Run the Setup Agent on first boot
12:firstboot --enable
17:woooooooood

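The results above show that strings such as "woood" and "wooooood" also match, even though they contain "w". The highlighted (matching) characters explain why: in "woood" the highlighted part is "ooo", because the "o" in front of the final "oo" is a character other than "w" and so satisfies the rule. "woooooood" matches for the same reason.
If "oo" must not be preceded by a lowercase letter, use "grep -n '[^a-z]oo' test.txt", where "a-z" denotes the lowercase letters; uppercase letters are written "A-Z".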

[root@localhost ~]# grep -n '[^a-z]oo' chen.txt 
19:Foofddd
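To see which characters a negated set actually matches without relying on terminal colors, grep's "-o" option prints only the matched part of each line. The sketch below uses a hypothetical sample file, and LC_ALL=C is set as an assumption so that the range "a-z" covers only ASCII lowercase letters.

```shell
# Hypothetical sample file for negated character sets.
sample=$(mktemp)
printf '%s\n' 'wood' 'woood' 'Food' > "$sample"

# Lines where "oo" is preceded by some character other than "w".
not_w_lines=$(LC_ALL=C grep -n '[^w]oo' "$sample")
# -o prints just the matched text: "ooo" in woood, "Foo" in Food.
not_w_parts=$(LC_ALL=C grep -o '[^w]oo' "$sample")
# Lines where "oo" is preceded by a character that is not a lowercase letter.
not_lower=$(LC_ALL=C grep -n '[^a-z]oo' "$sample")

printf '%s\n' "$not_w_lines" '---' "$not_w_parts" '---' "$not_lower"
rm -f "$sample"
```

"wood" never matches '[^w]oo' because the only character in front of its "oo" is "w".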

Lines containing digits can be found with "grep -n '[0-9]' test.txt".

[root@localhost ~]# grep -n '[0-9]' chen.txt
3:auth --enableshadow --passalgo=sha512
20:124153
21:3234
22:342222222
23:faasd11
24:2

Matching the start of a line "^" and the end of a line "$"

[root@localhost ~]# grep -n '^the' chen.txt
6:thethethe

Lines that begin with a lowercase letter can be filtered with the "^[a-z]" rule:

[root@localhost ~]# grep -n '^[a-z]' chen.txt
3:auth --enableshadow --passalgo=sha512
5:cdrom
6:thethethe
10:graphical
12:firstboot --enable
13:ignoredisk --only-use=sda
14:wood
15:wd
16:wod
17:woooooooood
18:dfsjdjoooooof
23:faasd11
26:short
27:shirt

To find lines that begin with an uppercase letter:

[root@localhost ~]# grep -n '^[A-Z]' chen.txt
7:THE
8:THEASDHAS
19:Foofddd
25:ZASASDNA

To find lines that do not begin with a letter, use the "^[^a-zA-Z]" rule.

[root@localhost ~]# grep -n '^[^a-zA-Z]' chen.txt
1:#version=DEVEL
2:# System authorization information
4:# Use CDROM installation media
9:# Use graphical install
11:# Run the Setup Agent on first boot
20:124153
21:3234
22:342222222
24:2
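The four start-of-line patterns above can be checked together on a small file. This is a sketch with a hypothetical sample; LC_ALL=C is an assumption made so that the letter ranges behave as plain ASCII ranges.

```shell
# Hypothetical sample file with one line of each kind.
sample=$(mktemp)
printf '%s\n' '#version=DEVEL' 'cdrom' 'THE' '124153' 'the end' > "$sample"

starts_the=$(LC_ALL=C grep -n '^the' "$sample")          # begins with "the"
starts_lower=$(LC_ALL=C grep -n '^[a-z]' "$sample")      # begins with a lowercase letter
starts_upper=$(LC_ALL=C grep -n '^[A-Z]' "$sample")      # begins with an uppercase letter
starts_other=$(LC_ALL=C grep -n '^[^a-zA-Z]' "$sample")  # begins with a non-letter

printf '%s\n' "$starts_the" '---' "$starts_lower" '---' "$starts_upper" '---' "$starts_other"
rm -f "$sample"
```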

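The "^" symbol works differently inside and outside the "[]" set: inside "[]" it negates the set, while outside "[]" it anchors the match to the start of the line. Conversely, to find lines that end with a particular character, use the "$" anchor. For example, the following command finds lines ending with a period (.). Because the period is itself a metacharacter in regular expressions (covered below), the escape character "\" is needed here to turn it into an ordinary character.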

[root@localhost ~]# grep -n '\.$' chen.txt
5:cdrom.
6:thethethe.
9:# Use graphical install.
10:graphical.
11:# Run the Setup Agent on first boot.

To find blank lines, run "grep -n '^$' chen.txt".
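The difference between an escaped and an unescaped period at the end of a line, and the blank-line pattern, can be seen in a short sketch. The sample file and variable names are hypothetical.

```shell
# Hypothetical sample: one line ending in ".", one plain line, one blank line, one with an inner ".".
sample=$(mktemp)
printf '%s\n' 'cdrom.' 'graphical' '' 'a.b' > "$sample"

dot_end=$(grep -n '\.$' "$sample")    # literal period at end of line
blank=$(grep -n '^$' "$sample")       # start immediately followed by end: a blank line
unescaped=$(grep -c '.$' "$sample")   # any character at end of line: every non-blank line

printf '%s\n' "$dot_end" "$blank" "$unescaped"
rm -f "$sample"
```

Without the backslash, '.$' counts all three non-blank lines, which is why the escape is needed.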

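Matching any single character "." and repeated characters "*"
In regular expressions the period (.) is also a metacharacter, standing for any single character. For example, the following command finds strings of the form "w??d": four characters in total, starting with w and ending with d.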

[root@localhost ~]# grep -n 'w..d' chen.txt
14:wood

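In the result above, the string "wood" matches the rule "w..d". To find oo, ooo, ooooo and so on, use the asterisk (*) metacharacter. Note that "*" repeats the preceding single character zero or more times. "o*" therefore means zero "o" characters (the empty string) or one or more of them; because the empty string is allowed, "grep -n 'o*' test.txt" prints every line of the file. With "oo*", the first o must be present and the second o may occur zero or more times, so anything containing o, oo, ooo and so on matches. Likewise, to find strings containing at least two o's, run "grep -n 'ooo*' test.txt".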

[root@localhost ~]# grep -n 'ooo*' chen.txt
11:# Run the Setup Agent on first boot.
12:firstboot --enable
14:wood
17:woooooooood
18:dfsjdjoooooof
19:Foofddd
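Counting matching lines with "grep -c" makes the effect of each extra "o" in front of the "*" visible. This is a sketch on a hypothetical four-line file.

```shell
# Hypothetical sample with zero, one, two and three consecutive o's.
sample=$(mktemp)
printf '%s\n' 'wd' 'wod' 'wood' 'woood' > "$sample"

star_all=$(grep -c 'o*' "$sample")     # zero or more o: the empty match succeeds on every line
one_plus=$(grep -c 'oo*' "$sample")    # at least one o
two_plus=$(grep -c 'ooo*' "$sample")   # at least two consecutive o's

printf '%s\n' "$star_all" "$one_plus" "$two_plus"
rm -f "$sample"
```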

To find strings that start with w, end with d, and have at least one o in between, run the following command.

[root@localhost ~]# grep -n 'woo*d' chen.txt
14:wood
16:wod
17:woooooooood

To find strings that start with w and end with d, where the characters in between are optional (any characters, or none):

[root@localhost ~]# grep -n 'w.*d' chen.txt
14:wood
15:wd
16:wod
17:woooooooood

To find lines containing any digits:

[root@localhost ~]# grep -n '[0-9][0-9]*' chen.txt
3:auth --enableshadow --passalgo=sha512
20:124153
21:3234
22:342222222
23:faasd11
24:2
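Because "[0-9][0-9]*" matches a whole run of digits, combining it with "-o" extracts the numbers themselves rather than the lines. The sketch below also repeats the "w.*d" pattern; the sample file is hypothetical.

```shell
# Hypothetical sample mixing digit runs and w...d strings.
sample=$(mktemp)
printf '%s\n' 'passalgo=sha512' 'faasd11' 'wd' 'wood' 'cdrom' > "$sample"

# One digit followed by zero or more digits; -o prints each matched run.
digit_runs=$(grep -o '[0-9][0-9]*' "$sample")
# w, then any characters (possibly none), then d.
w_any_d=$(grep -n 'w.*d' "$sample")

printf '%s\n' "$digit_runs" '---' "$w_any_d"
rm -f "$sample"
```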

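Matching a bounded number of repetitions "{}"
"." and "*" can express zero to unlimited repetitions. To restrict the repetition to a range, for example three to five consecutive o's, use the basic regular expression interval operator "{}". In basic regular expressions the braces must be escaped as "\{" and "\}" to act as the interval operator; unescaped, grep treats them as ordinary characters.

To find lines with two or more consecutive o's: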

[root@localhost ~]# grep -n 'o\{2\}' chen.txt
11:# Run the Setup Agent on first boot.
12:firstboot --enable
14:wood
17:woooooooood
18:dfsjdjoooooof
19:Foofddd

To find strings that start with w, end with d, and have 2 to 5 o's in between:

[root@localhost ~]# grep -n 'wo\{2,5\}d' chen.txt
14:wood

To find strings that start with w, end with d, and have 2 or more o's in between:

[root@localhost ~]# grep -n 'wo\{2,\}d' chen.txt
14:wood
17:woooooooood
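The interval forms can be compared on one small file. This sketch uses a hypothetical sample; the last command shows the same range written without backslashes under "grep -E", which is the extended-syntax spelling of the operator.

```shell
# Hypothetical sample with 1, 2, 3 and 9 o's between w and d.
sample=$(mktemp)
printf '%s\n' 'wod' 'wood' 'woood' 'woooooooood' > "$sample"

exactly_two=$(grep -n 'wo\{2\}d' "$sample")    # exactly 2 o's
two_to_five=$(grep -n 'wo\{2,5\}d' "$sample")  # 2 to 5 o's
two_or_more=$(grep -n 'wo\{2,\}d' "$sample")   # 2 or more o's
ere_form=$(grep -En 'wo{2,5}d' "$sample")      # same range in extended syntax

printf '%s\n' "$exactly_two" '---' "$two_to_five" '---' "$two_or_more"
rm -f "$sample"
```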

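II. Extended regular expressions

Extended regular expressions offer a wider set of operators and can shorten a command. For example, to show every line of a file except blank lines and lines starting with "#" (commonly used to view the effective settings of a configuration file), basic regular expressions need "grep -v '^$' test.txt | grep -v '^#'", which uses a pipe to search twice. With extended regular expressions this becomes "egrep -v '^$|^#' test.txt", where the pipe symbol inside the single quotes means "or".
By default grep interprets patterns as basic regular expressions; for extended regular expressions use egrep (equivalent to "grep -E") or awk. awk is covered in a later section, so egrep is used here. egrep is used in much the same way as grep. It searches files for a pattern: any string or symbol in one file, or strings across one or more files. A pattern can be a single character, a string, a word, or a sentence.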
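The two ways of stripping comments and blank lines can be checked against each other. This is a sketch on a hypothetical configuration file; "grep -E" is used in place of egrep, since the two accept the same patterns.

```shell
# Hypothetical configuration file with comments and a blank line.
conf=$(mktemp)
printf '%s\n' '# comment' '' 'cdrom' '#version=DEVEL' 'graphical' > "$conf"

# Basic syntax: two passes joined by a shell pipe.
two_pass=$(grep -v '^$' "$conf" | grep -v '^#')
# Extended syntax: one pass, with | inside the pattern meaning "or".
one_pass=$(grep -Ev '^$|^#' "$conf")

printf '%s\n' "$two_pass" '---' "$one_pass"
rm -f "$conf"
```

Both variables end up holding only the two effective lines, "cdrom" and "graphical".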
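The common extended regular expression metacharacters are the following:

"+" example: "egrep -n 'wo+d' test.txt" finds strings such as "wood", "woood", and "woooooood". "+" repeats the preceding character one or more times.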

[root@localhost ~]# egrep -n 'wo+d' chen.txt
14:wood
16:wod
17:woooooooood

"?" example: "egrep -n 'bes?t' test.txt" finds the two strings "bet" and "best". "?" matches zero or one of the preceding character.

[root@localhost ~]# egrep -n 'bes?t' chen.txt
11:best
12:bet

"|" example: "egrep -n 'of|is|on' test.txt" finds the strings "of", "is", or "on".

[root@localhost ~]# egrep -n 'of|is|on' chen.txt
1:#version=DEVEL
2:# System authorization information
4:# Use CDROM installation media
13:# Run the Setup Agent on first boot.
15:ignoredisk --only-use=sda
20:dfsjdjoooooof
21:Foofddd

"()" example: "egrep -n 't(a|e)st' test.txt". The words "tast" and "test" share "t" and "st", so "a" and "e" are listed inside "()" separated by "|" to find either "tast" or "test".

[root@localhost ~]# egrep -n 't(a|e)st' chen.txt
12:test
13:tast

"()+" example: "egrep -n 'A(xyz)+C' test.txt" finds strings that begin with "A", end with "C", and have one or more "xyz" in between.

[root@localhost ~]# egrep -n 'A(xyz)+C' chen.txt
14:AxyzxyzxyzC
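The extended operators above can be exercised together on one small file. This is a sketch with a hypothetical sample and illustrative variable names; it uses "grep -E", which accepts the same patterns as egrep.

```shell
# Hypothetical sample covering each extended operator.
sample=$(mktemp)
printf '%s\n' 'wd' 'wod' 'wood' 'bet' 'best' 'test' 'tast' 'AC' 'AxyzxyzC' > "$sample"

plus=$(grep -En 'wo+d' "$sample")        # + : one or more o, so "wd" is excluded
optional=$(grep -En 'bes?t' "$sample")   # ? : the s may be absent
either=$(grep -En 't(a|e)st' "$sample")  # (a|e) : a or e between t and st
group=$(grep -En 'A(xyz)+C' "$sample")   # (xyz)+ : one or more "xyz", so "AC" is excluded
alt_count=$(grep -Ec 'wd|AC' "$sample")  # | : lines containing "wd" or "AC"

printf '%s\n' "$plus" "$optional" "$either" "$group" "$alt_count"
rm -f "$sample"
```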