awk Use Cases - JavaShuo

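awk is not just a simple command. It is a programming language that is well suited to processing text data. awk processes text one record at a time (by default a record is a line): it walks through every record of the input and processes each one in turn.
awk syntax: awk 'Pattern {Action}' filename
How awk works: awk reads a record and assigns it to the built-in variable $0. The record is split into fields by the field separator, and each field is stored in a numbered variable starting at $1 (NF holds the number of fields). The built-in variable FS sets the field separator. Its default value, a single space, means that fields are separated by runs of spaces and tabs, with leading and trailing blanks ignored; you can set a custom separator with -F. For each record, awk tests the given Pattern: if it matches, the Action runs, otherwise the Action is skipped. Pattern and Action are both optional, but at least one must be present. With no Pattern, the Action runs for every input line; with no Action, awk prints the matching lines (the default action is {print}).
$0 represents one record of the text, i.e. one line.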
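A minimal sketch of the rules just described, using printf to generate input instead of a file. The sample text and the function names (fields_demo, pattern_only, action_only) are made up for illustration.

```shell
# Default FS (a single space): fields are split on runs of blanks,
# and leading blanks are ignored, so NF is 3 here.
fields_demo() {
  printf '  alpha \t beta   gamma\n' | awk '{ print NF ":" $1 ":" $3 }'
}

# Pattern with no action: awk prints every matching record.
pattern_only() {
  printf 'one\ntwo\nthree\n' | awk '/t/'
}

# Action with no pattern: the action runs for every record.
action_only() {
  printf 'one\ntwo\n' | awk '{ print NR, $0 }'
}

fields_demo    # prints 3:alpha:gamma
pattern_only   # prints two, three
action_only    # prints 1 one, 2 two
```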

[root@linux01 ~]#  head -n5 /etc/passwd| awk '{print $0}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

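Extracting some of the columns: awk splits on whitespace by default; to use a different separator, set it with the "-F" option (here a colon).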

[root@linux01 ~]#  head -n5 /etc/passwd| awk -F: '{print $1,$3}'
root 0
bin 1
daemon 2
adm 3
lp 4
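In the command above, the comma in print $1,$3 inserts the output field separator OFS, which defaults to a single space. A minimal sketch of changing it, and of giving -F a regular expression; the sample records and function names are illustrative only.

```shell
# Sample records in /etc/passwd layout (made-up data, not read from the system).
sample() { printf 'root:x:0:0\nbin:x:1:1\n'; }

# Comma in print -> OFS, a single space by default.
cols_default() { sample | awk -F: '{ print $1, $3 }'; }

# Setting OFS changes only the output separator.
cols_csv() { sample | awk -F: -v OFS=, '{ print $1, $3 }'; }

# A multi-character -F value is treated as a regular expression:
# here either ":" or "/" separates fields.
cols_regex() { printf 'a:b/c\n' | awk -F'[:/]' '{ print $3 }'; }

cols_default
cols_csv
cols_regex
```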

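Adding header and footer text: BEGIN and END are awk keywords. The BEGIN block runs before awk processes any input, and the END block runs after all input has been processed.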

[root@linux01 ~]#  head -n5 /etc/passwd| awk -F: 'BEGIN {print "=====begin====="} {print $1,$3} END {print "=====end====="}'
=====begin=====
root 0
bin 1
daemon 2
adm 3
lp 4
=====end=====
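Because NR still holds the number of records read when the END block runs, END is a natural place for a summary line. A minimal sketch with made-up passwd-style records; the function name report is hypothetical.

```shell
report() {
  printf 'root:x:0:0\nbin:x:1:1\ndaemon:x:2:2\n' |
    awk -F: 'BEGIN { print "user uid" }   # runs once, before any input
             { print $1, $3 }             # runs for every record
             END { print NR " records" }  # NR = total records read'
}

report
```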

awk can do arithmetic: $3+$4 is the sum of the third and fourth colon-separated fields.

[root@linux01 ~]#  head -n5 /etc/passwd| awk -F: 'BEGIN {print "=====begin====="} {print $1,$2,$3+$4} END {print "=====end====="}'
=====begin=====
root x 0
bin x 2
daemon x 4
adm x 7
lp x 11
=====end=====
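The same arithmetic can be accumulated across records and reported in END, for example to total or average a column. A minimal sketch, assuming the five sample records mirror the head -n5 output above (UIDs 0 to 4); sum_uid is a hypothetical name.

```shell
sum_uid() {
  printf 'root:x:0:0\nbin:x:1:1\ndaemon:x:2:2\nadm:x:3:4\nlp:x:4:7\n' |
    awk -F: '{ total += $3 }   # uninitialized variables start at 0
             END { if (NR > 0) print "sum:", total, "avg:", total / NR }'
}

sum_uid   # prints sum: 10 avg: 2
```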

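Filtering the output: ~ means "matches the regular expression", so only records whose first field matches /lp/ are processed.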

[root@linux01 ~]#  head -n5 /etc/passwd| awk -F: 'BEGIN {print "=====begin====="} $1~/lp/ {print $1,$2,$3+$4} END {print "=====end====="}'
=====begin=====
lp x 11
=====end=====
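Note that /lp/ is a substring match: it would also match a user named, say, lpadmin. A minimal sketch of anchoring the regex and of the negated operator !~; the user list (including lpadmin) is made up for illustration.

```shell
users() { printf 'lp:x:4\nlpadmin:x:9\nroot:x:0\n'; }

loose()   { users | awk -F: '$1 ~ /lp/ { print $1 }'; }    # substring match
exact()   { users | awk -F: '$1 ~ /^lp$/ { print $1 }'; }  # whole-field match
exclude() { users | awk -F: '$1 !~ /lp/ { print $1 }'; }   # records that do NOT match

loose     # prints lp, lpadmin
exact     # prints lp
exclude   # prints root
```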

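Showing odd or even lines: 'NR%2==0 {next}' acts as a Pattern-Action pair, where NR is the line number of the current record. When NR is even (NR%2==0 is true), the next action runs; next skips the remaining actions and moves straight on to the next line, so only the odd-numbered lines are printed. The second command reverses the test and prints the even-numbered lines.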

[root@linux01 ~]# cat -n file1.txt    
     1	test 1
     2	one two three proce
     3	test 2 
     4	test test file
[root@linux01 ~]# awk 'NR%2==0 {next} {print NR,$0}' file1.txt 
1 test 1
3 test 2 
[root@linux01 ~]# awk 'NR%2!=0 {next} {print NR,$0}' file1.txt 
2 one two three proce
4 test test file
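Since a pattern with no action prints the matching line, the same selection can be written more compactly: a non-zero value of NR % 2 counts as true. A minimal sketch using seq for input; the function names are hypothetical.

```shell
odd_lines()  { seq 5 | awk 'NR % 2'; }        # remainder 1 is true -> odd lines
even_lines() { seq 5 | awk 'NR % 2 == 0'; }   # even lines

odd_lines    # prints 1, 3, 5
even_lines   # prints 2, 4
```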

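Merging a fixed number of lines: the first Pattern-Action pair appends the first two lines of each 3-line group to the variable T (and next skips the rest of the program). On the third line, the second action prints everything stored so far together with the current line, then clears T before the next group is processed. (Each output line starts with a space because T is empty when the first line of a group is appended.)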

[root@linux01 ~]# cat -n file1.txt 
     1	test 1
     2	one two three proce
     3	test 2 
     4	test test file
     5	wsr239wfgrte
     6	fw9efs0art4
     7	4tesd8w40-sdgd
     8	43q8wsfjwrskdpoht
     9	rear8ut9spgoog
    10	45678
    11	s34otjvsa;
    12	4su02q9jdgopro[
[root@linux01 ~]# awk 'NR%3!=0 {T=(T" "$0);next} {print T,$0;T=""}' file1.txt 
 test 1 one two three proce test 2 
 test test file wsr239wfgrte fw9efs0art4
 4tesd8w40-sdgd 43q8wsfjwrskdpoht rear8ut9spgoog
 45678 s34otjvsa; 4su02q9jdgopro[

There is a small catch in the example above: the file's line count happens to be a multiple of 3. What if it isn't?

[root@linux01 ~]# cat -n file1.txt 
     1	test 1
     2	one two three proce
     3	test 2 
     4	test test file
     5	wsr239wfgrte
     6	fw9efs0art4
     7	4tesd8w40-sdgd
     8	43q8wsfjwrskdpoht
     9	rear8ut9spgoog
    10	45678
    11	s34otjvsa;
[root@linux01 ~]# awk 'NR%3!=0 {T=(T" "$0);next} {print T,$0;T=""}' file1.txt     -->lines 10 and 11 are missing
 test 1 one two three proce test 2 
 test test file wsr239wfgrte fw9efs0art4
 4tesd8w40-sdgd 43q8wsfjwrskdpoht rear8ut9spgoog

[root@linux01 ~]# awk 'NR%3!=0 {T=(T" "$0);next} {print T,$0;T=""} END {print T}' file1.txt     -->improved command: when the line count is not a multiple of 3, the last incomplete group is still held in T, so printing T in the END block solves the problem
 test 1 one two three proce test 2 
 test test file wsr239wfgrte fw9efs0art4
 4tesd8w40-sdgd 43q8wsfjwrskdpoht rear8ut9spgoog
 45678 s34otjvsa;
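The improved command still has two rough edges: every output line starts with a space, and when the line count is a multiple of 3 the END block prints an empty line (T is "" at that point). A minimal sketch of a variant that takes the group size as a parameter and avoids both; merge_n, n and c are hypothetical names.

```shell
# Join every n input lines (read from stdin) with single spaces.
# c counts lines in the current group; a short final group is still printed.
merge_n() {
  awk -v n="$1" '
    { T = (c++ == 0) ? $0 : T " " $0 }   # first line of a group starts T fresh
    c == n { print T; T = ""; c = 0 }    # group complete: print and reset
    END { if (c > 0) print T }           # leftover partial group, if any'
}

printf 'a\nb\nc\nd\ne\n' | merge_n 3   # prints "a b c" then "d e"
```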

Merging a variable number of lines: the key is to use a regular expression to find the line that starts each group.

[root@linux01 ~]# cat -n file1.txt 
     1	test 1
     2	one two three proce
     3	test 2
     4	abc bca file
     5	wsr239wfgrte
     6	fw9efs0art4
     7	4tesd8w40-sdgd
     8	test 3
     9	43q8wsfjwrskdpoht
    10	rear8ut9spgoog
    11	45678
    12	s34otjvsa;
    13	test 4
[root@linux01 ~]# awk 'BEGIN {T=""} /test/ {print T;T=$0;next} {T=T" "$0} END {print T}' file1.txt 

test 1 one two three proce
test 2 abc bca file wsr239wfgrte fw9efs0art4 4tesd8w40-sdgd
test 3 43q8wsfjwrskdpoht rear8ut9spgoog 45678 s34otjvsa;
test 4
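The output above begins with an empty line, because the first /test/ match prints T while it is still empty. A minimal sketch that skips the empty flush and joins without a leading space; it assumes group headers start with "test" (hence the stricter /^test/), and merge_on_header is a hypothetical name.

```shell
# Start a new group at every line beginning with "test"; join the rest with spaces.
merge_on_header() {
  awk '
    /^test/ { if (T != "") print T; T = $0; next }  # flush previous group, if any
    { T = (T == "") ? $0 : T " " $0 }               # append to current group
    END { if (T != "") print T }'
}

printf 'test 1\none\ntest 2\nabc\ndef\ntest 3\n' | merge_on_header
```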

Merging all lines into one

[root@linux01 ~]# cat -n file1.txt 
     1	test 1
     2	one two three proce
     3	test 2
     4	abc bca file
     5	wsr239wfgrte
     6	fw9efs0art4
     7	4tesd8w40-sdgd
     8	test 3
     9	43q8wsfjwrskdpoht
    10	rear8ut9spgoog
    11	45678
    12	s34otjvsa;
    13	test 4
[root@linux01 ~]# awk '{T=T" "$0} END {print T}' file1.txt 
 test 1 one two three proce test 2 abc bca file wsr239wfgrte fw9efs0art4 4tesd8w40-sdgd test 3 43q8wsfjwrskdpoht rear8ut9spgoog 45678 s34otjvsa; test 4
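As with the fixed-size merge, the joined line above starts with a space. A minimal sketch that uses the first line as the starting value instead; join_all is a hypothetical name.

```shell
# Join all input lines into one, separated by single spaces, with no leading space.
join_all() {
  awk '{ T = (NR == 1) ? $0 : T " " $0 } END { print T }'
}

printf 'a\nb\nc\n' | join_all   # prints "a b c"
```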

** Printing the contents of multiple files

[root@linux01 ~]# cat new.txt 
root:x:0:0:root:/root:/bin/bash
this is a file
good ha 
feng file
[root@linux01 ~]# cat test.txt 
/0/,/3/p
sfwefov
w342re0k
fewfklfs;ar
sfd9wo4sd
[root@linux01 ~]# awk '{print $0}' new.txt test.txt 
root:x:0:0:root:/root:/bin/bash
this is a file
good ha 
feng file
/0/,/3/p
sfwefov
w342re0k
fewfklfs;ar
sfd9wo4sd
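When several files are printed back to back, it can help to mark where each one begins. A minimal sketch using FNR == 1 (true on the first record of each file) together with FILENAME; the temporary files and the function name cat_with_headers are made up for illustration. An empty file gets no header, since it has no first record.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'three\n' > "$tmp/b.txt"

# Print a header before the first record of each file, then every record.
cat_with_headers() {
  (cd "$tmp" && awk 'FNR == 1 { print "==> " FILENAME " <==" } { print }' a.txt b.txt)
}

cat_with_headers
```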

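** Printing the first line of the first file and the second line of the second file. NR is the number of records read so far across all input files: it increases by 1 for every record, however many files there are. FNR is the number of records read from the current file: it also increases by 1 per record, but it starts over when awk moves on to a new file. So NR==FNR means awk is processing the first file, and NR>FNR means it is processing the second (or a later) file. FILENAME holds the name of the file currently being processed.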

[root@linux01 ~]# cat new.txt 
root:x:0:0:root:/root:/bin/bash
this is a file
good ha 
feng file
[root@linux01 ~]# cat test.txt 
/0/,/3/p
sfwefov
w342re0k
fewfklfs;ar
sfd9wo4sd
[root@linux01 ~]# awk 'NR==FNR&&FNR==1 {print FILENAME,$0} NR>FNR&&FNR==2 {print FILENAME,$0}' new.txt test.txt 
new.txt root:x:0:0:root:/root:/bin/bash
test.txt sfwefov
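To see why NR==FNR identifies the first file, it helps to print both counters for every record. A minimal sketch with two made-up temporary files; show_counters is a hypothetical name.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'three\n' > "$tmp/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each new file.
show_counters() {
  (cd "$tmp" && awk '{ print FILENAME, NR, FNR }' a.txt b.txt)
}

show_counters   # a.txt 1 1 / a.txt 2 2 / b.txt 3 1
```

One caveat of the idiom: if the first file is empty, NR and FNR stay equal while the second file is read, so NR==FNR no longer means "first file".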

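** The approach above is limited when more than two files are involved. We can change tack and use the built-in variable ARGIND (a gawk extension, not available in every awk), which holds the index in ARGV of the file currently being processed.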

[root@linux01 ~]# cat new.txt 
root:x:0:0:root:/root:/bin/bash
this is a file
good ha 
feng file
[root@linux01 ~]# cat test.txt 
/0/,/3/p
sfwefov
w342re0k
fewfklfs;ar
sfd9wo4sd
[root@linux01 ~]# cat file1.txt 
test 1
one two three proce
test 2
abc bca file
wsr239wfgrte
fw9efs0art4
4tesd8w40-sdgd
test 3
43q8wsfjwrskdpoht
rear8ut9spgoog
45678
s34otjvsa;
test 4
[root@linux01 ~]# awk 'ARGIND==1&&FNR==1 {print FILENAME,$0} ARGIND==2&&FNR==2 {print FILENAME,$0} ARGIND==3&&FNR==3 {print FILENAME,$0}' new.txt test.txt file1.txt 
new.txt root:x:0:0:root:/root:/bin/bash
test.txt sfwefov
file1.txt test 2
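Because ARGIND exists only in gawk, a portable alternative is to count files yourself: FNR == 1 fires once at the start of each non-empty file. A minimal sketch with made-up files; idx and nth_line_of_nth_file are hypothetical names. Unlike ARGIND, which is an index into ARGV, this counter skips empty files.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'x\ny\n' > "$tmp/b.txt"
printf 'p\nq\nr\n' > "$tmp/c.txt"

# Print line 1 of file 1, line 2 of file 2, line 3 of file 3, ... in any POSIX awk.
nth_line_of_nth_file() {
  (cd "$tmp" && awk 'FNR == 1 { idx++ } FNR == idx { print FILENAME, $0 }' a.txt b.txt c.txt)
}

nth_line_of_nth_file   # a.txt one / b.txt y / c.txt r
```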

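** The example above can also be written array-style: ARGV is the array of command-line arguments, with ARGV[1] holding the first file name.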

[root@linux01 ~]# cat new.txt 
root:x:0:0:root:/root:/bin/bash
this is a file
good ha 
feng file
[root@linux01 ~]# cat test.txt 
/0/,/3/p
sfwefov
w342re0k
fewfklfs;ar
sfd9wo4sd
[root@linux01 ~]# cat file1.txt 
test 1
one two three proce
test 2
abc bca file
wsr239wfgrte
fw9efs0art4
4tesd8w40-sdgd
test 3
43q8wsfjwrskdpoht
rear8ut9spgoog
45678
s34otjvsa;
test 4
[root@linux01 ~]# awk 'FILENAME==ARGV[1]&&FNR==1 {print FILENAME,$0} FILENAME==ARGV[2]&&FNR==2 {print FILENAME,$0} FILENAME==ARGV[3]&&FNR==3 {print FILENAME,$0}' new.txt test.txt file1.txt 
new.txt root:x:0:0:root:/root:/bin/bash
test.txt sfwefov
file1.txt test 2
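The contents of ARGV can be inspected in a BEGIN block; ARGC is the number of entries. A minimal sketch; the file names are only strings here, because a program consisting solely of BEGIN does not read its input files. list_args is a hypothetical name.

```shell
# List the file arguments as awk sees them (ARGV[0] is the program name and is skipped).
list_args() {
  awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' new.txt test.txt file1.txt
}

list_args   # 1 new.txt / 2 test.txt / 3 file1.txt
```

Comparing FILENAME with ARGV[n] breaks down if the same file is passed twice, since both arguments produce the same FILENAME.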

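** Suppose we want to replace the x password field in /etc/passwd with the encrypted field from /etc/shadow. FS is the input field separator and OFS the output field separator. While reading the shadow file, awk stores the encrypted password field in an array indexed by user name; while reading the passwd file, it assigns that value to the x field (the second field) and prints the modified line. Reading /etc/shadow requires root privileges.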

[root@linux01 ~]# awk 'BEGIN {OFS=FS=":"} NR==FNR {a[$1]=$2} NR>FNR {$2=a[$1];print}' /etc/shadow /etc/passwd
root:$6$CYXcNKe6hd7jzBgx$cA8WUgq/mYEymotuD0YAJRBkYtgc5nc8MFfPokHjC0LrfmHGtSIx3zE0YS5ML2Dc2YaG8Kl7khssL0faik7AS.:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:/sbin/nologin
daemon:*:2:2:daemon:/sbin:/sbin/nologin
adm:*:3:4:adm:/var/adm:/sbin/nologin
lp:*:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:0:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:/sbin/nologin
operator:*:11:0:operator:/root:/sbin/nologin
games:*:12:100:games:/usr/games:/sbin/nologin
ftp:*:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:*:99:99:Nobody:/:/sbin/nologin
systemd-bus-proxy:!!:999:997:systemd Bus Proxy:/:/sbin/nologin
systemd-network:!!:192:192:systemd Network Management:/:/sbin/nologin
dbus:!!:81:81:System message bus:/:/sbin/nologin
polkitd:!!:998:996:User for polkitd:/:/sbin/nologin
tss:!!:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
postfix:!!:89:89::/var/spool/postfix:/sbin/nologin
sshd:!!:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
chrony:!!:997:995::/var/lib/chrony:/sbin/nologin
jenkins:!!:996:994:Jenkins Automation Server:/var/lib/jenkins:/bin/false
gitlab-www:!!:995:992::/var/opt/gitlab/nginx:/bin/false
git:!!:994:991::/var/opt/gitlab:/bin/sh
gitlab-redis:!!:993:990::/var/opt/gitlab/redis:/bin/false
gitlab-psql:!!:992:989::/var/opt/gitlab/postgresql:/bin/sh
gitlab-prometheus:!!:991:988::/var/opt/gitlab/prometheus:/bin/sh
virftp:!!:1000:1000::/home/virftp:/sbin/nologin
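The same two-file lookup can be tried without touching the real system files. A minimal sketch on made-up shadow- and passwd-style files (the HASH_ values are placeholders); it also adds a ($1 in pw) guard, an assumption not in the original command, so that a user missing from the shadow file keeps its x instead of getting an empty field. merge_pw and pw are hypothetical names.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'root:HASH_R:19000::::::\nbob:HASH_B:19000::::::\n' > "$tmp/shadow"
printf 'root:x:0:0:root:/root:/bin/bash\nbob:x:1000:1000::/home/bob:/bin/sh\neve:x:1001:1001::/home/eve:/bin/sh\n' > "$tmp/passwd"

merge_pw() {
  awk 'BEGIN { OFS = FS = ":" }
       NR == FNR { pw[$1] = $2; next }   # first file: remember hash per user
       ($1 in pw) { $2 = pw[$1] }        # second file: replace x if known
       { print }' "$tmp/shadow" "$tmp/passwd"
}

merge_pw
```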

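** Using getline on the current input. You might think getline can also print odd or even lines, but this is not recommended because it has a flaw: the {getline;print} form only works when the total number of lines is even. In awk, when getline is used without "<" or "|", it operates on the current input: it reads the next line, assigns it to $0, and updates NF, NR and FNR.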

[root@linux01 ~]# seq 10|awk '{getline;print}'
2
4
6
8
10
[root@linux01 ~]# seq 10|awk '{print;getline}'
1
3
5
7
9
[root@linux01 ~]# seq 10|awk '{print;getline;print}'
1
2
3
4
5
6
7
8
9
10
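[root@linux01 ~]# seq 11|awk '{getline;print}'    -->with an odd number of lines the output is wrong: at line 11 getline hits the end of input and returns 0 without changing $0, so 11 is printed too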
2
4
6
8
10
11
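Checking getline's return value removes the flaw: print only when a next line was actually read. A minimal sketch; even_lines is a hypothetical name.

```shell
# Print even-numbered lines of seq N, whether N is odd or even.
even_lines() {
  seq "$1" | awk '{ if ((getline) > 0) print }'   # 0 at end of input -> no print
}

even_lines 11   # prints 2 4 6 8 10 (one per line), no trailing 11
```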

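** When getline is used with the redirection "<" (getline < file) or with a pipe (cmd | getline), it reads from that file or command instead of the current input. Because the file is newly opened and has not been read by awk yet, getline returns its first line rather than the next line of the current input. getline returns 1 when it reads a line successfully, 0 at end of file, and -1 on error. So in this example, while awk processes the first line of test.txt, the while loop reads num.txt until the file pointer reaches the end; for every later line of test.txt getline returns 0, so nothing more from num.txt is printed. When getline is redirected from a file, the file name must be a string expression, so a literal file name has to be enclosed in double quotes.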

[root@linux01 ~]# cat test.txt 
1 /sdfa/,/ewra/p
2 sfwefov
3 wsfresdfwek
4 fewfklfs;ar
5 sfdsfawowesd
[root@linux01 ~]# cat num.txt 
10 20 30
40 50 60
[root@linux01 ~]# awk '' test.txt 
[root@linux01 ~]# awk '{print $0;while((getline < "num.txt") > 0) print $0}' test.txt 
1 /sdfa/,/ewra/p
10 20 30
40 50 60
2 sfwefov
3 wsfresdfwek
4 fewfklfs;ar
5 sfdsfawowesd
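If the intent is to print num.txt after every line, the file has to be closed with close() so that the next getline reopens it from the start. A minimal sketch that also reads into a variable (getline line < f) so $0 and NF are left alone, plus the pipe form cmd | getline; the temporary file and function names are made up for illustration.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '10 20 30\n40 50 60\n' > "$tmp/num.txt"

# After each input line, reread the whole file; close(f) rewinds it.
reread_each_time() {
  printf 'a\nb\n' | awk -v f="$tmp/num.txt" '{
    print
    while ((getline line < f) > 0) print "  " line
    close(f)
  }'
}

# cmd | getline var reads one line of a command's output.
from_command() {
  awk 'BEGIN { cmd = "echo hello"; cmd | getline out; close(cmd); print out }'
}

reread_each_time
from_command   # prints hello
```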

** Calling the shell with system(): awk runs shell commands through the system() function.

[root@linux01 ~]# awk 'BEGIN {system("ls -al")}'
total 136
dr-xr-x---.  8 root root  4096 Dec 11 15:13 .
dr-xr-xr-x. 18 root root   256 Nov 23 21:53 ..
-rw-r-----   1 root root  1502 Nov 24 21:29 1.log
-rw-r--r--   1 root root   161 Nov 20 10:38 adduser.sh
-rw-------.  1 root root  1422 Nov 14 18:11 anaconda-ks.cfg
-rw-------.  1 root root 18728 Dec 11 11:56 .bash_history
-rw-r--r--.  1 root root    18 Dec 29  2013 .bash_logout
-rw-r--r--.  1 root root   176 Dec 29  2013 .bash_profile
-rw-r--r--.  1 root root   176 Dec 29  2013 .bashrc
drwxr-xr-x   3 root root    18 Nov 28 23:08 .config
-rw-r--r--.  1 root root   100 Dec 29  2013 .cshrc
-rw-r--r--   1 root root    91 Nov 20 10:33 deluser.sh
-rw-r--r--   1 root root   151 Dec 10 12:12 file1.txt
-rw-r--r--   1 root root    56 Nov 24 01:08 .gitconfig
-rw-r--r--   1 root root  4360 Nov 22 15:11 :ii:wq
-rw-r--r--   1 root root    20 Dec 11 10:59 k1.txt
-rw-------   1 root root    54 Nov 23 01:29 .lesshst
drwxr-xr-x   3 root root    19 Nov 28 23:08 .local
drwxr-x---   2 root root    19 Nov 24 21:28 logs
-rw-r--r--   1 root root    66 Dec 10 21:33 new.txt
-rw-r--r--   1 root root    18 Dec 11 15:13 num.txt
-rw-r--r--   1 root root  1494 Nov 23 17:38 passwd
-rw-r--r--   1 root root 12288 Nov 27 14:54 .passwd.swp
drwxr-----   3 root root    19 Nov 20 20:47 .pki
-rw-------   1 root root  1024 Nov 29 00:45 .rnd
drwxr-xr-x   3 root root    32 Nov 23 21:38 sample
drwx------   2 root root    82 Nov 23 16:34 .ssh
-rw-r--r--.  1 root root   129 Dec 29  2013 .tcshrc
-rw-r--r--   1 root root    43 Nov 22 11:27 tee
-rw-r--r--   1 root root    70 Dec 11 11:40 test.txt
-rw-------   1 root root  5627 Dec 10 12:12 .viminfo
-rw-------   1 root root  4286 Nov 27 18:01 .viminfo.tmp
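system() also returns the command's exit status, which lets an awk program react to success or failure. A minimal sketch that passes the command in with -v; status_of is a hypothetical name, and the exact non-zero value reported for a failing command may vary between awk implementations.

```shell
# Run a shell command from awk and print the status system() returns.
status_of() {
  awk -v cmd="$1" 'BEGIN { rc = system(cmd); print "status", rc }'
}

status_of true    # prints status 0
status_of false   # prints a non-zero status
```

Building a command string from field data (for example system("echo " $1)) hands that data to the shell unescaped, so it is safest with trusted input.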