4.3 Linux Text Processing Tools

Date: 2019-11-17

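Interpreted languages: the source program is translated and executed one statement at a time, by an interpreter
Compiled languages: the entire source program is translated in advance, by a compiler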

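        bash programming:
            Instruction: a command that can run on the OS
                Translation: the process of locating the corresponding command on the current OS and submitting it to the kernel for execution.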

Program control statements:
sequential execution, conditional execution, loop execution

Linux text processing tools:
Text search: find the lines in a file that meet specific conditions

globbing: wildcard metacharacters (expanded by the shell against file names)
*: p*d
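
To make the globbing note concrete, here is a minimal sketch. The scratch directory and the file names in it are hypothetical and only serve the illustration; the point is that the shell, not grep, expands p*d, and it does so against file names rather than file contents.

```shell
# Scratch directory with hypothetical file names (not from the original notes).
tmpdir=$(mktemp -d)
touch "$tmpdir/pad" "$tmpdir/pd" "$tmpdir/pound" "$tmpdir/dip"

# The shell expands p*d to the matching file names before echo runs:
# "p", then any run of characters (possibly empty), then "d".
glob_matches=$(cd "$tmpdir" && echo p*d)
echo "$glob_matches"   # pad pd pound

rm -rf "$tmpdir"
```

Note that in a glob, * stands for any string on its own, whereas in a regular expression * only repeats the character before it.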

/etc/passwd: root

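Text search tools: grep, egrep, fgrep

grep: Global search REgular expression and Print out the line.

A text search tool: it searches the target files line by line against a user-specified text pattern (the search condition) and prints the lines that match.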

Syntax:
grep [option]... 'PATTERN' FILE...

--color=auto: highlight the matched text in color
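
The syntax can be tried safely on a scratch file instead of the real /etc/passwd. This is only a sketch: the three sample lines are made up in the style of /etc/passwd, and the variable names are arbitrary.

```shell
# Hypothetical sample data in the style of /etc/passwd.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'operator:x:11:0:operator:/root:/sbin/nologin' > "$sample"

# grep [option]... 'PATTERN' FILE...
# With --color=auto, matches are highlighted only when output goes to a terminal,
# so captured or piped output stays free of color codes.
basic_matches=$(grep --color=auto 'root' "$sample")
echo "$basic_matches"

rm -f "$sample"
```

Both the root line and the operator line are printed, because grep selects every line that contains the pattern anywhere.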

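        Regular expression:
            a pattern written with a class of characters, many of which do not stand for their literal meaning but express control or wildcard functions instead;
                Metacharacter: a character that does not stand for its literal meaning and is used for additional functional description
Reference book: 正则表达式权威指南

Regular expressions are interpreted by a regular expression engine (the component that understands regular expressions; each program implements its own)

            Basic regular expressions (BRE): grep
            Extended regular expressions (ERE): egrep, grep -E
            fgrep: fixed-string (fast) matching, equivalent to grep -F; does not support regular expressions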
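
The difference between the three flavors shows up when the same pattern is fed to each. The sketch below assumes GNU grep; the three input strings are invented for the illustration.

```shell
# Three hypothetical input lines.
input=$(printf '%s\n' 'a+b' 'aab' 'ab')

# BRE: a bare + is an ordinary character, so only the literal text a+b matches.
bre=$(printf '%s\n' "$input" | grep 'a+b')

# ERE: + means "one or more of the preceding character".
ere=$(printf '%s\n' "$input" | grep -E 'a+b')

# Fixed string: no character is special at all.
fixed=$(printf '%s\n' "$input" | grep -F 'a+b')

echo "BRE:   $bre"
echo "ERE:   $ere"
echo "fixed: $fixed"
```

With ERE the line a+b is not selected, because there the a is followed by a literal + rather than directly by b.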

        Metacharacters in basic regular expressions:
            Character matching:
                .: matches any single character
[root@linux_basic ~]# alias grep='grep --color=auto'
[root@linux_basic ~]# grep "r..t" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
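                []: matches any single character within the specified range
                    [0-9], [[:digit:]]
                    [a-z], [[:lower:]]
                    [A-Z], [[:upper:]]
                    [[:space:]] whitespace characters
                    [[:punct:]] punctuation
                    [[:alpha:]] all letters
                    [[:alnum:]] letters and digits
                [^]: matches any single character outside the specified range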
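
A short sketch of bracket expressions, including the negated form, run against a handful of made-up input lines (the strings and variable names are assumptions for the example):

```shell
# Hypothetical input lines.
input=$(printf '%s\n' 'abc' 'ABC' '123' 'a b' 'x,y')

# Lines containing at least one digit.
digits=$(printf '%s\n' "$input" | grep '[[:digit:]]')

# Lines containing at least one uppercase letter.
uppers=$(printf '%s\n' "$input" | grep '[[:upper:]]')

# Negation: lines containing at least one character that is NOT a letter or digit.
non_alnum=$(printf '%s\n' "$input" | grep '[^[:alnum:]]')

echo "$digits"      # 123
echo "$uppers"      # ABC
echo "$non_alnum"   # "a b" and "x,y"
```

The class names such as [:digit:] are only valid inside an enclosing bracket pair, which is why they are written [[:digit:]].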
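            Quantifier metacharacters: specify how many times the preceding character may occur
                *: any number of times, including zero
                    Example: x*y
                        xxy, xyy, y
                \?: 0 or 1 time; the preceding character is optional
        [root@linux_basic ~]# grep "r\?oot" /etc/passwd
                    Example: x\?y
                        xy, y, ay (the pattern need not cover the whole string; a line is also shown when only a substring matches)
                \{m\}: exactly m times
        [root@linux_basic ~]# grep "o\{2\}" /etc/passwd
                    Example: x\{2\}y
                        test strings: xy, xxy, y, xxxxy, xyy (xxy and xxxxy match)
                \{m,n\}: at least m times, at most n times
        [root@linux_basic ~]# grep "o\{0,3\}" /etc/passwd
                    Example: x\{2,5\}y
                        test strings: xy, y, xxy (only xxy matches)
                \{m,\}: at least m times
        [root@linux_basic ~]# grep "o\{2,\}" /etc/passwd
                \{0,n\}: at most n times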
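
The quantifier examples can be checked directly. The sketch below assumes GNU grep and reuses the test strings from the notes; -o is used to show exactly which part of a line was matched.

```shell
# Test strings taken from the examples above.
input=$(printf '%s\n' 'xy' 'xxy' 'y' 'xxxxy' 'xyy')

# x\{2\}y selects every line that CONTAINS "xxy", not only lines equal to it.
exact_lines=$(printf '%s\n' "$input" | grep 'x\{2\}y')

# -o prints only the matched part: exactly two x characters, then y.
exact_part=$(printf '%s\n' 'xxxxy' | grep -o 'x\{2\}y')

# x\{2,\}y takes two or more x characters, so the whole run is matched.
atleast_part=$(printf '%s\n' 'xxxxy' | grep -o 'x\{2,\}y')

echo "$exact_lines"    # xxy and xxxxy
echo "$exact_part"     # xxy
echo "$atleast_part"   # xxxxy
```

This also illustrates the substring remark in the notes: xxxxy is selected by x\{2\}y because the last three characters match.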

.*: any characters, any number of times
[root@linux_basic ~]# grep "r.*t" /etc/passwd

                    Works in greedy mode: matches as much as possible
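
Greedy matching is easiest to see with -o, which prints only the matched text. The input line below is the root entry shown earlier in these notes; the negated bracket expression in the second command is a common workaround, not something the notes prescribe.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Greedy: starts at the first r and runs to the LAST t on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'r.*t')

# Restricting what may sit between r and t keeps each match inside one field.
per_field=$(printf '%s\n' "$line" | grep -o 'r[^:]*t')

echo "$greedy"      # root:x:0:0:root:/root
echo "$per_field"   # root, three times, one per line
```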
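            Position anchors:
                ^: start-of-line anchor (the caret)
                    written at the far left of the pattern
                $: end-of-line anchor
                    written at the far right of the pattern
                ^$: empty line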
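
A minimal sketch of the three anchors on invented input. It uses -c, a standard grep option not covered in these notes, which prints the number of selected lines instead of the lines themselves.

```shell
# Hypothetical input; the second line is empty.
input=$(printf '%s\n' 'root at start' '' 'ends with root' 'the root is in the middle')

at_start=$(printf '%s\n' "$input" | grep '^root')
at_end=$(printf '%s\n' "$input" | grep 'root$')

# ^$ matches a line with nothing between its start and its end.
empty_count=$(printf '%s\n' "$input" | grep -c '^$')

echo "$at_start"      # root at start
echo "$at_end"        # ends with root
echo "$empty_count"   # 1
```

A line holding only spaces or tabs is not matched by ^$; for that case ^[[:space:]]*$ is the usual pattern.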

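                A word is a run of consecutive characters that contains no special characters:
                \<: start of word, placed on the left side of the word; also \b
                    \<char
          [root@linux_basic ~]# grep "\<r..t" /etc/passwd
                \>: end of word, placed on the right side of the word; also \b
                    char\>
         [root@linux_basic ~]# grep "l..e\>" /etc/passwd

           \<xxxxx\>: anchors a whole word exactly. With alternation, group the alternatives so that both anchors apply to every branch:
               ifconfig | grep -E "\<([0-9]|[1-9][0-9])\>"    whole words that are a one- or two-digit number
               ifconfig | grep -E "\<[0-9]|[1-9][0-9]\>"      without the group, the pattern splits into a left half (\<[0-9]) and a right half ([1-9][0-9]\>)
          [root@linux_basic ~]# grep "\<root\>" /etc/passwd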
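
The effect of the missing group can be reproduced without ifconfig. This sketch assumes GNU grep (\< and \> are GNU extensions) and uses three made-up numbers as input.

```shell
# Hypothetical input: a one-digit, a two-digit and a three-digit number.
input=$(printf '%s\n' '7' '42' '192')

# Grouped: both anchors apply to each alternative, so only whole words
# of one or two digits match.
grouped=$(printf '%s\n' "$input" | grep -E '\<([0-9]|[1-9][0-9])\>')

# Ungrouped: "a digit at the start of a word" OR "two digits at the end of a word",
# so 192 also matches through its leading 1.
ungrouped=$(printf '%s\n' "$input" | grep -E '\<[0-9]|[1-9][0-9]\>')

echo "$grouped"     # 7 and 42
echo "$ungrouped"   # 7, 42 and 192
```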

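            Grouping:
                \(\)
                    Example: \(ab\)*    (the parentheses are escaped with \)
                    The content matched by the pattern inside a group is remembered by the regular expression engine and can be referenced later

             [root@linux_basic ~]# grep "\(root\)\{1\}" /etc/passwd
                Backreferences:
                    Example: \(ab\(x\)y\).*\(mn\)   with nested parentheses, each left parenthesis pairs with its nearest matching right parenthesis
                        Groups are numbered by their left parentheses, counted from left to right, each together with its matching right parenthesis
                        \(a\(b\(c\)\)mn\(x\)\).*\1

                \#: refers to the content matched by the #-th group, not to the pattern itself
                    Example:
                        \(ab\?c\).*\1   (b occurs 0 or 1 time)

                            abcmnaaa
                            abcmnabc
                            abcmnac
                            acxyac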
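
Running the backreference example against the four test strings shows that \1 repeats the matched text, not the pattern. The sketch assumes GNU grep, where \? is available in basic regular expressions.

```shell
# The four test strings from the example above.
input=$(printf '%s\n' 'abcmnaaa' 'abcmnabc' 'abcmnac' 'acxyac')

# \1 must repeat exactly what group 1 matched ("abc" or "ac").
backref=$(printf '%s\n' "$input" | grep '\(ab\?c\).*\1')

echo "$backref"   # abcmnabc and acxyac
```

abcmnac is not selected: group 1 matched abc there, and the later ac would satisfy the pattern ab\?c but is not the same text.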

        grep options:
            -v: invert the selection (print the lines that do not match)
-v, --invert-match
      Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
            -o: print only the matched strings, not the lines that contain them
-o, --only-matching
      Print only the matched (non-empty) parts of a matching line, with each such part on a separate
      output line.
            -i: ignore-case, ignore letter case
-i, --ignore-case
      Ignore case distinctions in both the PATTERN and the input files. (-i is specified by POSIX.)
            -E: use extended regular expressions
-E, --extended-regexp
      Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by
      POSIX.)
            -A #: print each matching line and the # lines after it
-A NUM, --after-context=NUM
    Print NUM lines of trailing context after matching lines. Places a line containing a group
    separator (--) between contiguous groups of matches. With the -o or --only-matching option,
    this has no effect and a warning is given.
[root@linux_basic ~]#grep -A 1 "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
--
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
            -B #: print each matching line and the # lines before it
-B NUM, --before-context=NUM
      Print NUM lines of leading context before matching lines. Places a line containing a group
      separator (--) between contiguous groups of matches. With the -o or --only-matching option,
      this has no effect and a warning is given.
[root@linux_basic ~]#grep -B 2 "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
--
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
            -C #: print each matching line and the # lines before and after it
-C NUM, -NUM, --context=NUM
      Print NUM lines of output context. Places a line containing a group separator (--) between
      contiguous groups of matches. With the -o or --only-matching option, this has no effect and a
      warning is given.
[root@linux_basic ~]#grep -C 2 "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
--
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
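
The options that have no sample session above can be compared side by side on a tiny input. The three lines are hypothetical, and -A behaves as shown on GNU grep.

```shell
# Hypothetical three-line input.
input=$(printf '%s\n' 'Root:x:0' 'bin:x:1' 'root:x:2')

ignore_case=$(printf '%s\n' "$input" | grep -i 'root')   # Root:x:0 and root:x:2
inverted=$(printf '%s\n' "$input" | grep -v 'root')      # Root:x:0 and bin:x:1
only_match=$(printf '%s\n' "$input" | grep -o 'root')    # root
after_ctx=$(printf '%s\n' "$input" | grep -A 1 'Root')   # Root:x:0 and bin:x:1

printf '%s\n' "-i:" "$ignore_case" "-v:" "$inverted" "-o:" "$only_match" "-A 1:" "$after_ctx"
```

Without -i the match is case sensitive, which is why -v 'root' still keeps the Root line.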

        Exercises:
            1. Show the lines in /proc/meminfo that begin with an uppercase or lowercase S;
            # grep -i '^s' /proc/meminfo
            # grep '^[Ss]' /proc/meminfo
            # grep -E '^(S|s)' /proc/meminfo

            2. Show the users in /etc/passwd whose default shell is not /sbin/nologin;
            # grep -v "/sbin/nologin$" /etc/passwd | cut -d: -f1

            3. Show the users in /etc/passwd whose default shell is /bin/bash;
                Further: show only the user with the largest ID among those results;
            # grep "/bin/bash$" /etc/passwd | sort -t: -k3 -n | tail -1 | cut -d: -f1
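
The pipeline for exercise 3 can be dry-run on scratch data before pointing it at the real file. The four entries below are invented; only the pipeline itself comes from the notes.

```shell
# Hypothetical passwd-style entries; alice has the largest UID among bash users.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1001:1001::/home/alice:/bin/bash' \
  'bob:x:500:500::/home/bob:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' > "$sample"

# Keep bash users, sort numerically on field 3 (the UID),
# take the last line, and print only the user name.
top_user=$(grep "/bin/bash$" "$sample" | sort -t: -k3 -n | tail -1 | cut -d: -f1)
echo "$top_user"   # alice

rm -f "$sample"
```

Writing the sort key as -k3,3n would limit the comparison to the UID field alone; for this data both forms give the same order.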

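            4. Find the one- or two-digit numbers in /etc/passwd;
            # grep "\<[0-9][0-9]\?\>" /etc/passwd
            # grep "\<[0-9]\{1,2\}\>" /etc/passwd

            5. Show the lines in /boot/grub/grub.conf that begin with at least one whitespace character;
            # grep "^[[:space:]]\{1,\}" /boot/grub/grub.conf

            6. Show the lines in /etc/rc.d/rc.sysinit that begin with #, followed by at least one whitespace character and then at least one non-whitespace character;
            # grep "^#[[:space:]]\{1,\}[^[:space:]]\{1,\}" /etc/rc.d/rc.sysinit

            7. Find the lines in the output of netstat -tan that end with 'LISTEN';
            # netstat -tan | grep "LISTEN[[:space:]]*$"

            8. Add the users bash, testbash, basher, and nologin (with /sbin/nologin as its shell), then find the users on the current system whose user name is the same as their default shell;
            # grep "^\([[:alnum:]]\{1,\}\):.*\1$" /etc/passwd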
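
Exercise 8 can be verified without creating real accounts by feeding the pattern a scratch file. The UIDs and home directories below are assumptions made up for the test; the pattern is the one from the exercise.

```shell
# Hypothetical entries for the four users named in the exercise.
sample=$(mktemp)
printf '%s\n' \
  'bash:x:501:501::/home/bash:/bin/bash' \
  'testbash:x:502:502::/home/testbash:/bin/bash' \
  'basher:x:503:503::/home/basher:/bin/bash' \
  'nologin:x:504:504::/home/nologin:/sbin/nologin' > "$sample"

# Group 1 captures the user name (everything before the first colon);
# \1 then requires the line to end with that same text.
same_name=$(grep "^\([[:alnum:]]\{1,\}\):.*\1$" "$sample" | cut -d: -f1)
echo "$same_name"   # bash and nologin

rm -f "$sample"
```

As written, the pattern only checks that the line ends with the user name, so a hypothetical user named sh with shell /bin/bash would also be reported. Inserting a slash before the backreference (.*/\1$) is one way to tighten it.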