Original article: http://www.yellow-bricks.com/esxtop/ by Duncan Epping
esxtop metrics and their corresponding thresholds (reference values the original author gives based on the official documentation, testing, and hands-on experience)
Display | Metric | Threshold | Explanation
---|---|---|---
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP, or a limit has been set (check %MLMTD). See Jason's explanation for vSMP VMs.
CPU | %CSTP | 3 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU | %SYS | 20 | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for the possible root cause.
CPU | %MLMTD | 0 | The percentage of time the vCPU was ready to run but deliberately wasn't scheduled because that would violate the "CPU limit" settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU | %SWPWT | 5 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
MEM | MCTLSZ | 1 | If larger than 0 the host is forcing VMs to inflate the balloon driver to reclaim memory as the host is overcommitted.
MEM | SWCUR | 1 | If larger than 0 the host has swapped memory pages in the past. Possible cause: overcommitment.
MEM | SWR/s | 1 | If larger than 0 the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.
MEM | SWW/s | 1 | If larger than 0 the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.
MEM | CACHEUSD | 0 | If larger than 0 the host has compressed memory. Possible cause: memory overcommitment.
MEM | ZIP/s | 0 | If larger than 0 the host is actively compressing memory. Possible cause: memory overcommitment.
MEM | UNZIP/s | 0 | If larger than 0 the host is accessing compressed memory. Possible cause: the host was previously overcommitted on memory.
MEM | N%L | 80 | If less than 80 the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and "remotely" uses memory via the "interconnect". Check "GST_ND(X)" to find out which NUMA nodes are used.
NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization.
NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization.
DISK | GAVG | 25 | Look at "DAVG" and "KAVG" as the sum of both is GAVG.
DISK | DAVG | 25 | Disk latency most likely caused by the array.
DISK | KAVG | 2 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check "QUED".
DISK | QUED | 1 | Queue maxed out. Possibly the queue depth is set too low. Check with the array vendor for the optimal queue depth value.
DISK | ABRTS/s | 1 | Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths fail or the array is not accepting any IO for whatever reason.
DISK | RESETS/s | 1 | The number of commands reset per second.
DISK | CONS/s | 20 | SCSI reservation conflicts per second. If many SCSI reservation conflicts occur, performance could be degraded due to the lock on the VMFS.
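To put these thresholds to use outside the interactive view, a batch-mode capture (described further down) can be scanned for breaches. The following is only a minimal sketch, assuming the perfmon-style CSV that batch mode produces (a quoted, comma-separated header row followed by one row per sample) and a CPU counter whose name contains "% Ready"; the exact counter names vary by esxtop version, so treat them as assumptions.
# Print every sample where a counter containing "% Ready" exceeds the 10 threshold.
# Assumes no commas inside counter or VM names.
awk -F',' '
NR == 1 { for (i = 1; i <= NF; i++) if ($i ~ /% Ready/) name[i] = $i; next }
{ for (i in name) { v = $i; gsub(/"/, "", v); if (v + 0 > 10) printf "sample %d: %s = %s\n", NR - 1, name[i], v } }
' esxtopcapture.csv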
Basic usage
Log in via the local console or over SSH and start esxtop by running:
esxtop
The default sampling interval is 5 seconds; press s and enter a positive integer to change it, for example:
s 2
Switch between views with the following keys:
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device (includes NFS as of 4.0 Update 2)
v = disk VM
p = power states
V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limiting the number of entities, for instance the top 5
2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world
Add or remove fields:
f <press the letter shown on screen for the field you want>
Change the order of the fields:
o <press the letter for a field to move it; uppercase moves it left, lowercase moves it right>
Save the settings:
W
If you save it under the default filename (without changing the name), it becomes the default configuration.
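If you saved the settings under a different name, you can point esxtop at that configuration explicitly with the -c option; the path below is only a placeholder for the file you saved with W.
esxtop -c /path/to/my-esxtop-config   # placeholder path: the configuration file saved with W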
Get help:
?
In large environments, the sheer amount of data esxtop has to collect and compute can cause esxtop itself to consume a lot of CPU. You can reduce this by using command-line options to lock esxtop to specific entities and specific information:
esxtop -l
For more information, see here.
Collecting data in batch mode
First, decide which information you need: add or remove fields as required (f) and save the configuration (W).
Then run the following command to collect the data and write the results to a CSV file:
esxtop -b -d 2 -n 100 > esxtopcapture.csv
其中,"-b"表示批处理模式,"-d 2"表示采集间隔2秒,"-n 100"表示采集100次。间隔2秒采集100次,也就是采集200秒的数据。若是须要采集全部指标,使用"-a"参数
If there are many entities or the collection period is long, the data set can get large; you can compress it with gzip:
esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz
Note that data collected this way will not include VMs created after the command starts, nor VMs vMotioned over from other hosts; in this respect it behaves like the -l option.
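To peek at the compressed capture without expanding it on disk first, for example:
gzip -dc esxtopoutput.csv.gz | head -1 | tr ',' '\n' | head -10   # show the first few counter names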
Analyzing the data
There are several options. The official approach is Windows Performance Monitor or Excel; there are also tools on http://labs.vmware.com/flings/ that can visualize the data, such as visualEsxtop and esxplot.
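Before loading the CSV into one of those tools, it can help to check which counters it actually contains. A quick sketch; the "Ready" pattern is only an example and counter names differ between esxtop versions:
head -1 esxtopcapture.csv | tr ',' '\n' | wc -l                    # rough count of captured counters (includes the timestamp column)
head -1 esxtopcapture.csv | tr ',' '\n' | grep -i "Ready" | head -5   # sample a few CPU-ready counters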
Other notes
In practice the number of entities, field widths, or the screen resolution may keep the view from fitting on screen. You can limit what is displayed by exporting the entity list, editing it, and importing it again:
esxtop -export-entity filename
After exporting, edit this file and comment out the parts you do not need, then import it:
esxtop -import-entity filename
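As a sketch of the editing step, assuming the exported file is plain text with one entity entry per line (the exact format may differ by version), you could simply drop the lines for a hypothetical VM named testvm01 instead of commenting them out:
grep -v "testvm01" filename > filename.trimmed   # drop entity lines mentioning the hypothetical VM
esxtop -import-entity filename.trimmed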
The following command line filters out the information for a specific VM; replace virtualmachinename as needed (untested):
VMWID=`vm-support -x | grep <virtualmachinename> | awk '{gsub("wid=", ""); print $1}'`
VMXCARTEL=`vsish -e cat /vm/$VMWID/vmxCartelID`
vsish -e cat /sched/memClients/$VMXCARTEL/SchedGroupID