Note: this article is a translation of Chapter 5, Useful SystemTap Scripts.
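This chapter lists several SystemTap scripts that you can use to monitor and investigate different subsystems. Once you install the systemtap-testsuite RPM package, all of these scripts are available in the /usr/share/systemtap/testsuite/systemtap.examples/ directory.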
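The following sections showcase scripts that trace network-related functions and build a profile of network activity.
This section describes how to profile network activity. nettop.stp gives a glimpse of how much network traffic each process on a machine is generating.
nettop.stp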
#! /usr/bin/env stap

global ifxmit, ifrecv
global ifmerged

probe netdev.transmit
{
  ifxmit[pid(), dev_name, execname(), uid()] <<< length
}

probe netdev.receive
{
  ifrecv[pid(), dev_name, execname(), uid()] <<< length
}

function print_activity()
{
  printf("%5s %5s %-7s %7s %7s %7s %7s %-15s\n",
         "PID", "UID", "DEV", "XMIT_PK", "RECV_PK",
         "XMIT_KB", "RECV_KB", "COMMAND")

  foreach ([pid, dev, exec, uid] in ifrecv) {
    ifmerged[pid, dev, exec, uid] += @count(ifrecv[pid,dev,exec,uid]);
  }
  foreach ([pid, dev, exec, uid] in ifxmit) {
    ifmerged[pid, dev, exec, uid] += @count(ifxmit[pid,dev,exec,uid]);
  }
  foreach ([pid, dev, exec, uid] in ifmerged-) {
    n_xmit = @count(ifxmit[pid, dev, exec, uid])
    n_recv = @count(ifrecv[pid, dev, exec, uid])
    printf("%5d %5d %-7s %7d %7d %7d %7d %-15s\n",
           pid, uid, dev, n_xmit, n_recv,
           n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0,
           n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0,
           exec)
  }

  print("\n")

  delete ifxmit
  delete ifrecv
  delete ifmerged
}

probe timer.ms(5000), end, error
{
  print_activity()
}
Note that function print_activity() uses the following expressions:
n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0
n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0
These expressions are if/else conditionals. The second expression is a more concise way of writing the following pseudocode:
if n_recv != 0 then @sum(ifrecv[pid, dev, exec, uid])/1024 else 0
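The guard in these expressions can be illustrated outside SystemTap. The following Python sketch mirrors the pseudocode: it sums the recorded packet lengths only when at least one packet was counted. The function name kilobytes and the plain list standing in for a SystemTap statistical aggregate are assumptions made for illustration.

```python
def kilobytes(lengths):
    """Return the total size in KB, or 0 when no packets were recorded.

    `lengths` stands in for one statistical aggregate such as
    ifrecv[pid, dev, exec, uid]; len() plays the role of @count and
    sum() the role of @sum.
    """
    n = len(lengths)
    # Same shape as: n_recv ? @sum(...)/1024 : 0
    return sum(lengths) // 1024 if n else 0


print(kilobytes([]))            # no packets recorded
print(kilobytes([1500, 1500]))  # two 1500-byte packets
```

The guard matters in the script because the aggregate may hold no samples for a key that only appears in the other array.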
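nettop.stp tracks which processes are generating network traffic on the system, and reports the following for each process: the process ID (PID), the user ID (UID), the network device used (DEV), the number of packets transmitted and received (XMIT_PK, RECV_PK), the amount of data transmitted and received in kilobytes (XMIT_KB, RECV_KB), and the command name (COMMAND).
nettop.stp samples network activity every 5 seconds. You can change this interval by editing probe timer.ms(5000). Example 5.1, “nettop.stp Sample Output” contains an excerpt of the output from nettop.stp over a 20-second period.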
Example 5.1. nettop.stp Sample Output
[...]
  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0          0       5       0       0 swapper
11178     0 eth0          2       0       0       0 synergyc

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
 2886     4 eth0         79       0       5       0 cups-polld
11362     0 eth0          0      61       0       5 firefox
    0     0 eth0          3      32       0       3 swapper
 2886     4 lo            4       4       0       0 cups-polld
11178     0 eth0          3       0       0       0 synergyc

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0          0       6       0       0 swapper
 2886     4 lo            2       2       0       0 cups-polld
11178     0 eth0          3       0       0       0 synergyc
 3611     0 eth0          0       1       0       0 Xorg

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0          3      42       0       2 swapper
11178     0 eth0         43       1       3       0 synergyc
11362     0 eth0          0       7       0       0 firefox
 3897     0 eth0          0       1       0       0 multiload-apple
[...]
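This section describes how to trace functions called from the kernel's net/socket.c file. This task helps you identify, in finer detail, how each process interacts with the network at the kernel level.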
socket-trace.stp
#! /usr/bin/env stap

probe kernel.function("*@net/socket.c").call {
  printf ("%s -> %s\n", thread_indent(1), ppfunc())
}
probe kernel.function("*@net/socket.c").return {
  printf ("%s <- %s\n", thread_indent(-1), ppfunc())
}
socket-trace.stp is identical to Example 3.6, “thread_indent.stp”, which was used earlier in SystemTap Functions to illustrate how thread_indent() works.
Example 5.2. socket-trace.stp Sample Output
[...]
0 Xorg(3611): -> sock_poll
3 Xorg(3611): <- sock_poll
0 Xorg(3611): -> sock_poll
3 Xorg(3611): <- sock_poll
0 gnome-terminal(11106): -> sock_poll
5 gnome-terminal(11106): <- sock_poll
0 scim-bridge(3883): -> sock_poll
3 scim-bridge(3883): <- sock_poll
0 scim-bridge(3883): -> sys_socketcall
4 scim-bridge(3883): -> sys_recv
8 scim-bridge(3883): -> sys_recvfrom
12 scim-bridge(3883):-> sock_from_file
16 scim-bridge(3883):<- sock_from_file
20 scim-bridge(3883):-> sock_recvmsg
24 scim-bridge(3883):<- sock_recvmsg
28 scim-bridge(3883): <- sys_recvfrom
31 scim-bridge(3883): <- sys_recv
35 scim-bridge(3883): <- sys_socketcall
[...]
Example 5.2, “socket-trace.stp Sample Output” contains a 3-second excerpt of the output of socket-trace.stp. For more information about the output that thread_indent() provides for this script, refer to SystemTap Functions, Example 3.6, “thread_indent.stp”.
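To make the arrow-and-indentation pattern concrete, here is a rough Python analogue. It is only a sketch: the traced decorator and the two sample functions are hypothetical, and it tracks nesting depth only, whereas thread_indent() also prints a time offset together with the process name and ID.

```python
import functools

_depth = 0   # current call nesting depth
lines = []   # collected trace lines


def traced(func):
    """Record '->' on entry and '<-' on exit, indented by call depth."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _depth
        lines.append(" " * _depth + "-> " + func.__name__)
        _depth += 1
        try:
            return func(*args, **kwargs)
        finally:
            _depth -= 1
            lines.append(" " * _depth + "<- " + func.__name__)
    return wrapper


@traced
def sock_recvmsg():
    return 0


@traced
def sys_recvfrom():
    return sock_recvmsg()


sys_recvfrom()
print("\n".join(lines))
```

Incrementing on entry and decrementing on exit is the same idea as passing 1 and -1 to thread_indent() in the .call and .return probes.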
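This section illustrates how to monitor incoming TCP connections. This task is useful for identifying unauthorized, suspicious, or otherwise unwanted network access requests in real time.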
tcp_connections.stp
#! /usr/bin/env stap

probe begin {
  printf("%6s %16s %6s %6s %16s\n",
         "UID", "CMD", "PID", "PORT", "IP_SOURCE")
}

probe kernel.function("tcp_accept").return?,
      kernel.function("inet_csk_accept").return? {
  sock = $return
  if (sock != 0)
    printf("%6d %16s %6d %6d %16s\n", uid(), execname(), pid(),
           inet_get_local_port(sock), inet_get_ip_source(sock))
}
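While tcp_connections.stp is running, it prints the following information about any incoming TCP connection accepted by the system, in real time: the current UID, the command accepting the connection (CMD), the PID of that command, the local port used by the connection (PORT), and the IP address from which the connection originated (IP_SOURCE).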
Example 5.3. tcp_connections.stp Sample Output
   UID              CMD    PID   PORT        IP_SOURCE
     0             sshd   3165     22      10.64.0.227
     0             sshd   3165     22      10.64.0.227
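This section illustrates how to monitor TCP packets received by the system. This is useful for analyzing the network traffic generated by applications running on the system.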
tcpdumplike.stp
#! /usr/bin/env stap

// A TCP dump like example

probe begin, timer.s(1) {
  printf("-----------------------------------------------------------------\n")
  printf("       Source IP         Dest IP SPort DPort U A P R S F \n")
  printf("-----------------------------------------------------------------\n")
}

probe udp.recvmsg /* ,udp.sendmsg */ {
  printf(" %15s %15s %5d %5d UDP\n",
         saddr, daddr, sport, dport)
}

probe tcp.receive {
  printf(" %15s %15s %5d %5d %d %d %d %d %d %d\n",
         saddr, daddr, sport, dport, urg, ack, psh, rst, syn, fin)
}
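While tcpdumplike.stp is running, it prints the following information about any received TCP packet, in real time: the source and destination IP addresses, the source and destination ports, and the packet flags.
To determine the flags used by a packet, tcpdumplike.stp uses the following values, which the tcp.receive probe makes available: urg (urgent), ack (acknowledgement), psh (push), rst (reset), syn (synchronize), and fin (finished).
Each of these is 1 or 0, indicating whether the packet uses the corresponding flag.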
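The six flags correspond to single bits in the flags byte of the TCP header. The Python sketch below decodes such a byte into the same 1/0 values. The function name decode_flags is hypothetical, and the bit masks are the standard TCP header ones rather than anything taken from the script.

```python
# Standard TCP header flag bits (low six bits of the flags byte).
FLAG_BITS = [
    ("urg", 0x20),
    ("ack", 0x10),
    ("psh", 0x08),
    ("rst", 0x04),
    ("syn", 0x02),
    ("fin", 0x01),
]


def decode_flags(flags_byte):
    """Return a dict mapping each flag name to 1 or 0."""
    return {name: 1 if flags_byte & mask else 0 for name, mask in FLAG_BITS}


# 0x12 has the SYN and ACK bits set, as in a SYN-ACK segment.
flags = decode_flags(0x12)
print(" ".join(str(flags[name]) for name, _ in FLAG_BITS))
```

The printed columns follow the same U A P R S F order as the header of the script's output.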
Example 5.4. tcpdumplike.stp Sample Output
-----------------------------------------------------------------
       Source IP         Dest IP SPort DPort U A P R S F
-----------------------------------------------------------------
  209.85.229.147       10.0.2.15    80 20373 0 1 1 0 0 0
  92.122.126.240       10.0.2.15    80 53214 0 1 0 0 1 0
  92.122.126.240       10.0.2.15    80 53214 0 1 0 0 0 0
  209.85.229.118       10.0.2.15    80 63433 0 1 0 0 1 0
  209.85.229.118       10.0.2.15    80 63433 0 1 0 0 0 0
  209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
  209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
  209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
  209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
  209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
  209.85.229.118       10.0.2.15    80 63433 0 1 1 0 0 0
[...]
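The network stack in Linux can discard packets for various reasons. Some Linux kernels include a tracepoint, kernel.trace("kfree_skb"), which makes it easy to track where packets are discarded. dropwatch.stp uses kernel.trace("kfree_skb") to trace packet drops; every 5 seconds the script summarizes the locations at which packets were dropped.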
dropwatch.stp
#! /usr/bin/env stap

############################################################
# Dropwatch.stp
# Author: Neil Horman <nhorman@redhat.com>
# An example script to mimic the behavior of the dropwatch utility
# http://fedorahosted.org/dropwatch
############################################################

# Array to hold the list of drop points we find
global locations

# Note when we turn the monitor on and off
probe begin { printf("Monitoring for dropped packets\n") }
probe end { printf("Stopping dropped packet monitor\n") }

# increment a drop counter for every location we drop at
probe kernel.trace("kfree_skb") { locations[$location] <<< 1 }

# Every 5 seconds report our drop locations
probe timer.sec(5)
{
  printf("\n")
  foreach (l in locations-) {
    printf("%d packets dropped at %s\n",
           @count(locations[l]), symname(l))
  }
  delete locations
}
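kernel.trace("kfree_skb") traces the places in the kernel where network packets are dropped. It has two arguments: a pointer to the buffer being freed ($skb) and the location in kernel code at which the buffer is being freed ($location). Where possible, the dropwatch.stp script reports the name of the function containing $location. The information needed to map $location back to a function is not included in the instrumentation by default. In SystemTap 1.4, the --all-modules option includes the required mapping information, and the following command can be used to run the script: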
stap --all-modules dropwatch.stp
On older versions of SystemTap, you can use the following command to emulate the --all-modules option:
stap -dkernel \
`cat /proc/modules | awk 'BEGIN { ORS = " " } {print "-d"$1}'` \
dropwatch.stp
Running the dropwatch.stp script for 15 seconds produces output similar to Example 5.5, “dropwatch.stp Sample Output”.
Example 5.5. dropwatch.stp Sample Output
Monitoring for dropped packets

1762 packets dropped at unix_stream_recvmsg
4 packets dropped at tun_do_read
2 packets dropped at nf_hook_slow

467 packets dropped at unix_stream_recvmsg
20 packets dropped at nf_hook_slow
6 packets dropped at tun_do_read

446 packets dropped at unix_stream_recvmsg
4 packets dropped at tun_do_read
4 packets dropped at nf_hook_slow
Stopping dropped packet monitor
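When the script is compiled on one machine and run on another, the --all-modules option and /proc/modules are not available, and the symname function prints only the raw address. To make the raw addresses of packet drops more meaningful, refer to the /boot/System.map-`uname -r` file. This file lists the starting address of each function, which allows you to map the addresses in the output of Example 5.5, “dropwatch.stp Sample Output” to specific function names. Given the following snippet of the /boot/System.map-`uname -r` file, the address 0xffffffff8149a8ed maps to the function unix_stream_recvmsg: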
[...]
ffffffff8149a420 t unix_dgram_poll
ffffffff8149a5e0 t unix_stream_recvmsg
ffffffff8149ad00 t unix_find_other
[...]
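The lookup described above amounts to finding the last symbol whose start address is not greater than the raw address. A minimal Python sketch of that search follows; the helper names load_symbols and lookup are hypothetical, and the snippet is embedded as a string so the example does not depend on a real System.map file.

```python
import bisect

# Excerpt in the same "address type name" format as System.map.
SNIPPET = """\
ffffffff8149a420 t unix_dgram_poll
ffffffff8149a5e0 t unix_stream_recvmsg
ffffffff8149ad00 t unix_find_other
"""


def load_symbols(text):
    """Parse System.map-style text into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        addr, _type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return the name of the last symbol starting at or before address."""
    starts = [addr for addr, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    return symbols[index][1] if index >= 0 else None


symbols = load_symbols(SNIPPET)
print(lookup(symbols, 0xffffffff8149a8ed))
```

Because the map records only start addresses, an address beyond the last listed symbol is still attributed to that symbol, so the result is most trustworthy when the full file is loaded.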
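The following sections showcase scripts that monitor disk and I/O activity.
This section describes how to identify which processes are performing the heaviest disk reads and writes.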
disktop.stp
#!/usr/bin/env stap
#
# Copyright (C) 2007 Oracle Corp.
#
# Get the status of reading/writing disk every 5 seconds,
# output top ten entries
#
# This is free software,GNU General Public License (GPL);
# either version 2, or (at your option) any later version.
#
# Usage:
#  ./disktop.stp
#
global io_stat,device
global read_bytes,write_bytes

probe vfs.read.return {
  if ($return>0) {
    if (devname!="N/A") {/*skip read from cache*/
      io_stat[pid(),execname(),uid(),ppid(),"R"] += $return
      device[pid(),execname(),uid(),ppid(),"R"] = devname
      read_bytes += $return
    }
  }
}

probe vfs.write.return {
  if ($return>0) {
    if (devname!="N/A") { /*skip update cache*/
      io_stat[pid(),execname(),uid(),ppid(),"W"] += $return
      device[pid(),execname(),uid(),ppid(),"W"] = devname
      write_bytes += $return
    }
  }
}

probe timer.ms(5000) {
  /* skip non-read/write disk */
  if (read_bytes+write_bytes) {

    printf("\n%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb\n\n",
           ctime(gettimeofday_s()),
           "Average:", ((read_bytes+write_bytes)/1024)/5,
           "Read:",read_bytes/1024,
           "Write:",write_bytes/1024)

    /* print header */
    printf("%8s %8s %8s %25s %8s %4s %12s\n",
           "UID","PID","PPID","CMD","DEVICE","T","BYTES")
  }
  /* print top ten I/O */
  foreach ([process,cmd,userid,parent,action] in io_stat- limit 10)
    printf("%8d %8d %8d %25s %8s %4s %12d\n",
           userid,process,parent,cmd,
           device[process,cmd,userid,parent,action],
           action,io_stat[process,cmd,userid,parent,action])

  /* clear data */
  delete io_stat
  delete device
  read_bytes = 0
  write_bytes = 0
}

probe end{
  delete io_stat
  delete device
  delete read_bytes
  delete write_bytes
}
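disktop.stp outputs the ten processes responsible for the heaviest reads from or writes to disk. Example 5.6, “disktop.stp Sample Output” shows sample output from this script. Each listed process includes the following data: the user ID (UID), the process ID (PID), the parent process ID (PPID), the command name (CMD), the device read from or written to (DEVICE), the type of action (T, where R is a read and W is a write), and the amount of data read or written in bytes (BYTES).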
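The time and date in the output of disktop.stp are returned by the functions ctime() and gettimeofday_s(). gettimeofday_s() returns the number of seconds elapsed since the UNIX epoch (January 1, 1970), and ctime() converts that number of seconds into calendar time. Together they produce a fairly accurate, human-readable timestamp for the output.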
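The same two-step idea (take the seconds since the epoch, then format them) can be seen in Python. This is only an analogy: it assumes UTC and uses Python's time module rather than the SystemTap tapset functions. The sketch also reproduces the average-rate arithmetic from the script's summary line, using a hypothetical helper name.

```python
import time


def readable(seconds_since_epoch):
    """Format seconds since the Unix epoch as a UTC calendar string."""
    return time.asctime(time.gmtime(seconds_since_epoch))


def average_kb_per_sec(read_bytes, write_bytes, interval_s=5):
    """Mirror ((read_bytes+write_bytes)/1024)/5 using integer division."""
    return ((read_bytes + write_bytes) // 1024) // interval_s


print(readable(0))            # the epoch itself
print(readable(time.time()))  # the current time
print(average_kb_per_sec(8064, 91907))
```

The byte counts passed to average_kb_per_sec are the per-process totals from the first block of the sample output, which is why the result matches its 19Kb/sec average.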
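In this script, $return is a local variable that stores the actual number of bytes each process reads from or writes to the virtual file system. $return can only be used in return probes (for example, vfs.read.return).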
Example 5.6. disktop.stp Sample Output
[...]
Mon Sep 29 03:38:28 2008 , Average:  19Kb/sec, Read:       7Kb, Write:      89Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
       0    26319    26294                   firefox     sda5    W        90229
       0     2758     2757           pam_timestamp_c     sda5    R         8064
       0     2885        1                     cupsd     sda5    W         1678

Mon Sep 29 03:38:38 2008 , Average:   1Kb/sec, Read:       7Kb, Write:       1Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
       0     2758     2757           pam_timestamp_c     sda5    R         8064
       0     2885        1                     cupsd     sda5    W         1678
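This section describes how to monitor the amount of time each process takes to read from or write to any file. This is useful for determining which files are slow to load on a system.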
iotime.stp
#! /usr/bin/env stap

/*
 * Copyright (C) 2006-2007 Red Hat Inc.
 *
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU General Public License v.2.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 *
 * Print out the amount of time spent in the read and write systemcall
 * when each file opened by the process is closed. Note that the systemtap
 * script needs to be running before the open operations occur for
 * the script to record data.
 *
 * This script could be used to find out which files are slow to load
 * on a machine. e.g.
 *
 * stap iotime.stp -c 'firefox'
 *
 * Output format is:
 * timestamp pid (executable) info_type path ...
 *
 * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
 * 200283143 2573 (cupsd) iotime /etc/printcap time: 69
 *
 */

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global filehandles, fileread, filewrite

probe syscall.open.return {
  filename = user_string($filename)
  if ($return != -1) {
    filehandles[pid(), $return] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)
  }
}

probe syscall.read.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.write.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.close {
  if ([pid(), $fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), $fd],
           fileread[pid(), $fd], filewrite[pid(), $fd])
    if (@count(time_io[pid(), $fd]))
      printf("%d %s iotime %s time: %d\n", timestamp(), proc(),
             filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
  }
  delete fileread[pid(), $fd]
  delete filewrite[pid(), $fd]
  delete filehandles[pid(), $fd]
  delete time_io[pid(),$fd]
}
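iotime.stp tracks each time a system call opens, closes, reads from, or writes to a file. For each file that a system call accesses, iotime.stp counts the number of microseconds that any reads or writes take to finish, and tracks the amount of data (in bytes) read from or written to the file.
iotime.stp is also described as using the local variable $count to track the amount of data (in bytes) that a system call attempts to read or write, although the listing above records only $return. Note that $return (as used by disktop.stp in Section 5.2.1, “Summarizing Disk Read/Write Traffic”) stores the actual amount of data read or written. $count can only be used on probes that track data reads or writes (that is, syscall.read and syscall.write).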
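The difference between the requested amount ($count) and the delivered amount ($return) is easy to observe from user space. The Python sketch below asks for 4096 bytes from a file that holds only 5. The temporary file and the variable names are assumptions made for illustration, not part of iotime.stp.

```python
import os
import tempfile

# Create a small file holding exactly 5 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

requested = 4096                 # what the caller asks for (like $count)
fd = os.open(path, os.O_RDONLY)
try:
    data = os.read(fd, requested)
finally:
    os.close(fd)
    os.remove(path)

actual = len(data)               # what the call delivered (like $return)
print(requested, actual)
```

A script that sums the requested sizes would therefore overstate the traffic for short reads, which is one reason to record the return value instead.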
Example 5.7. iotime.stp Sample Output
[...]
825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0
825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9
[...]
117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0
117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7
[...]
3973737 2886 (sendmail) access /proc/loadavg read: 4096 write: 0
3973744 2886 (sendmail) iotime /proc/loadavg time: 11
[...]
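Example 5.7, “iotime.stp Sample Output” prints the following data: a timestamp in microseconds, the process ID and process name, an access or iotime flag, and the file accessed.
If a process was able to read or write any data, a pair of access and iotime lines should appear together. The timestamp on the access line refers to the time at which the given process accessed the file; the end of the line shows the number of bytes read and written. The iotime line shows the time, in microseconds, that the process took to perform the reads or writes.
If an access line is not followed by an iotime line, the process did not read or write any data.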
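Because access and iotime lines share a process ID and a path, they are straightforward to pair up after the fact. The Python sketch below does this for a few lines copied from the sample output. The function name summarize and the regular expression are assumptions made for illustration and are not part of SystemTap.

```python
import re

SAMPLE = """\
825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0
825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9
117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0
117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7
"""

LINE = re.compile(r"(\d+) (\d+) \((.+?)\) (access|iotime) (\S+) (.*)")


def summarize(text):
    """Return {(pid, path): {'read', 'write', 'time'}} from iotime.stp output."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip "[...]" markers and blank lines
        _ts, pid, _name, kind, path, rest = match.groups()
        entry = result.setdefault((int(pid), path), {})
        numbers = [int(n) for n in re.findall(r"\d+", rest)]
        if kind == "access" and len(numbers) == 2:
            entry["read"], entry["write"] = numbers
        elif kind == "iotime" and len(numbers) == 1:
            entry["time"] = numbers[0]
    return result


for key, value in summarize(SAMPLE).items():
    print(key, value)
```

An entry that ends up without a time key corresponds to an access line with no matching iotime line.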
This section describes how to track the cumulative amount of I/O on the system.
traceio.stp
#! /usr/bin/env stap
# traceio.stp
# Copyright (C) 2007 Red Hat, Inc., Eugene Teo <eteo@redhat.com>
# Copyright (C) 2009 Kai Meyer <kai@unixlords.com>
#   Fixed a bug that allows this to run longer
#   And added the humanreadable function
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
#

global reads, writes, total_io

probe vfs.read.return {
  if ($return > 0) {
    reads[pid(),execname()] += $return
    total_io[pid(),execname()] += $return
  }
}

probe vfs.write.return {
  if ($return > 0) {
    writes[pid(),execname()] += $return
    total_io[pid(),execname()] += $return
  }
}

function humanreadable(bytes) {
  if (bytes > 1024*1024*1024) {
    return sprintf("%d GiB", bytes/1024/1024/1024)
  } else if (bytes > 1024*1024) {
    return sprintf("%d MiB", bytes/1024/1024)
  } else if (bytes > 1024) {
    return sprintf("%d KiB", bytes/1024)
  } else {
    return sprintf("%d B", bytes)
  }
}

probe timer.s(1) {
  foreach([p,e] in total_io- limit 10)
    printf("%8d %15s r: %12s w: %12s\n",
           p, e, humanreadable(reads[p,e]),
           humanreadable(writes[p,e]))
  printf("\n")
  # Note we don't zero out reads, writes and total_io,
  # so the values are cumulative since the script started.
}
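traceio.stp prints the top ten executables generating I/O traffic over time. It also tracks the cumulative amount of I/O reads and writes performed by those ten executables. This information is tracked and printed at 1-second intervals, in descending order.
Note that traceio.stp also uses the local variable $return, which is also used by disktop.stp in Section 5.2.1, “Summarizing Disk Read/Write Traffic”.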
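The unit selection performed by the script's humanreadable() function is plain threshold arithmetic. Below is a direct Python port offered as a sketch; only the parameter name differs, and integer division is used to match SystemTap's integer arithmetic.

```python
def humanreadable(num_bytes):
    """Python port of the humanreadable() function in traceio.stp."""
    if num_bytes > 1024 * 1024 * 1024:
        return "%d GiB" % (num_bytes // 1024 // 1024 // 1024)
    elif num_bytes > 1024 * 1024:
        return "%d MiB" % (num_bytes // 1024 // 1024)
    elif num_bytes > 1024:
        return "%d KiB" % (num_bytes // 1024)
    else:
        return "%d B" % num_bytes


for value in (512, 1024, 2048, 5 * 1024 * 1024):
    print(value, "->", humanreadable(value))
```

Since the comparisons are strict, a value of exactly 1024 is still reported in bytes.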
Example 5.8. traceio.stp Sample Output
[...]
           Xorg r:   583401 KiB w:        0 KiB
       floaters r:       96 KiB w:     7130 KiB
multiload-apple r:      538 KiB w:      537 KiB
           sshd r:       71 KiB w:       72 KiB
pam_timestamp_c r:      138 KiB w:        0 KiB
        staprun r:       51 KiB w:       51 KiB
          snmpd r:       46 KiB w:        0 KiB
          pcscd r:       28 KiB w:        0 KiB
     irqbalance r:       27 KiB w:        4 KiB
          cupsd r:        4 KiB w:       18 KiB

           Xorg r:   588140 KiB w:        0 KiB
       floaters r:       97 KiB w:     7143 KiB
multiload-apple r:      543 KiB w:      542 KiB
           sshd r:       72 KiB w:       72 KiB
pam_timestamp_c r:      138 KiB w:        0 KiB
        staprun r:       51 KiB w:       51 KiB
          snmpd r:       46 KiB w:        0 KiB
          pcscd r:       28 KiB w:        0 KiB
     irqbalance r:       27 KiB w:        4 KiB
          cupsd r:        4 KiB w:       18 KiB
This section describes how to monitor I/O activity on a specific device.
traceio2.stp
#! /usr/bin/env stap

global device_of_interest

probe begin {
  /* The following is not the most efficient way to do this.
     One could directly put the result of usrdev2kerndev()
     into device_of_interest. However, want to test out
     the other device functions */
  dev = usrdev2kerndev($1)
  device_of_interest = MKDEV(MAJOR(dev), MINOR(dev))
}

probe vfs.write, vfs.read
{
  if (dev == device_of_interest)
    printf ("%s(%d) %s 0x%x\n",
            execname(), pid(), ppfunc(), dev)
}
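traceio2.stp takes one argument: the whole device number. To get this number, use stat -c "0x%D" directory, where directory is located on the device to be monitored.
The usrdev2kerndev() function converts the whole device number into the format understood by the kernel. The output of usrdev2kerndev() is used together with the MKDEV(), MINOR(), and MAJOR() functions to determine the major and minor numbers of a specific device.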
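The conversion can be reproduced with a little bit arithmetic. The sketch below assumes the usual Linux encodings (glibc's user-space dev_t layout and a 20-bit minor field inside the kernel). The function names echo the tapset functions but are plain Python re-implementations written for illustration. Under those assumptions, a user-space device number of 0x805 (major 8, minor 5) becomes 0x800005.

```python
MINORBITS = 20  # assumed width of the minor field in the kernel's dev_t


def user_major(dev):
    """Major number from a user-space device number such as 0x805."""
    return (dev >> 8) & 0xFFF


def user_minor(dev):
    """Minor number from a user-space device number."""
    return (dev & 0xFF) | ((dev >> 12) & 0xFFF00)


def mkdev(major, minor):
    """Combine major and minor the way the kernel's MKDEV macro does."""
    return (major << MINORBITS) | minor


def usrdev_to_kerndev(dev):
    """Convert a user-space device number to the kernel's encoding."""
    return mkdev(user_major(dev), user_minor(dev))


dev = 0x805  # the kind of value reported by: stat -c "0x%D" directory
print(user_major(dev), user_minor(dev), hex(usrdev_to_kerndev(dev)))
```

This is why the number passed on the command line and the device number printed by the script look different even though they name the same device.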
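The output of traceio2.stp includes the name and ID of any process performing a read or write, the function it is performing (vfs_read or vfs_write), and the kernel device number.
The following example is an excerpt from the full output of stap traceio2.stp 0x805, where 0x805 is the whole device number of /home. /home resides on /dev/sda5, which is the device we want to monitor.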
Example 5.9. traceio2.stp Sample Output
[...]
synergyc(3722) vfs_read 0x800005
synergyc(3722) vfs_read 0x800005
cupsd(2889) vfs_write 0x800005
cupsd(2889) vfs_write 0x800005
cupsd(2889) vfs_write 0x800005
[...]
This section describes how to monitor reads from and writes to a file in real time.
inodewatch.stp
#! /usr/bin/env stap

probe vfs.write, vfs.read
{
  # dev and ino are defined by vfs.write and vfs.read
  if (dev == MKDEV($1,$2) # major/minor device
      && ino == $3)
    printf ("%s(%d) %s 0x%x/%u\n",
            execname(), pid(), ppfunc(), dev, ino)
}
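The listing reads three arguments: the major device number ($1), the minor device number ($2), and the inode number ($3). One way to obtain them for a given file is sketched below in Python. The helper name inodewatch_args is hypothetical, the temporary file only stands in for whichever file you actually want to watch, and os.major() and os.minor() are assumed to be available, as they are on Linux.

```python
import os
import tempfile


def inodewatch_args(path):
    """Return (major, minor, inode) for the file at `path`."""
    info = os.stat(path)
    return os.major(info.st_dev), os.minor(info.st_dev), info.st_ino


with tempfile.NamedTemporaryFile() as tmp:
    major, minor, inode = inodewatch_args(tmp.name)
    # These three numbers are what the script expects as $1, $2 and $3.
    print("stap inodewatch.stp %d %d %d" % (major, minor, inode))
```

The printed command line is only a suggestion of how the values would be passed; running it requires a working SystemTap installation.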