Original article: http://www.yellow-bricks.com/esxtop/ by Duncan Epping
esxtop metrics and their corresponding thresholds (reference values the original author gives based on the official documentation, testing, and hands-on experience)
Display | Metric | Threshold | Explanation
---|---|---|---
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP, or a limit has been set (check %MLMTD). See Jason's explanation for vSMP VMs.
CPU | %CSTP | 3 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU | %SYS | 20 | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for the possible root cause.
CPU | %MLMTD | 0 | The percentage of time the vCPU was ready to run but deliberately wasn't scheduled because that would violate the "CPU limit" settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU | %SWPWT | 5 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
MEM | MCTLSZ | 1 | If larger than 0 the host is forcing VMs to inflate the balloon driver to reclaim memory as the host is overcommitted.
MEM | SWCUR | 1 | If larger than 0 the host has swapped memory pages in the past. Possible cause: overcommitment.
MEM | SWR/s | 1 | If larger than 0 the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.
MEM | SWW/s | 1 | If larger than 0 the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.
MEM | CACHEUSD | 0 | If larger than 0 the host has compressed memory. Possible cause: memory overcommitment.
MEM | ZIP/s | 0 | If larger than 0 the host is actively compressing memory. Possible cause: memory overcommitment.
MEM | UNZIP/s | 0 | If larger than 0 the host is accessing compressed memory. Possible cause: the host was previously overcommitted on memory.
MEM | N%L | 80 | If less than 80 the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and "remotely" uses memory via the "interconnect". Check "GST_ND(X)" to find out which NUMA nodes are used.
NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization.
NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization.
DISK | GAVG | 25 | Look at "DAVG" and "KAVG" as the sum of both is GAVG.
DISK | DAVG | 25 | Disk latency most likely caused by the array.
DISK | KAVG | 2 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check "QUED".
DISK | QUED | 1 | Queue maxed out. Possibly the queue depth is set too low. Check with the array vendor for the optimal queue depth value.
DISK | ABRTS/s | 1 | Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths fail or the array is not accepting any IO for whatever reason.
DISK | RESETS/s | 1 | The number of commands reset per second.
DISK | CONS/s | 20 | SCSI reservation conflicts per second. If many SCSI reservation conflicts occur, performance could be degraded due to the lock on the VMFS.
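To put these thresholds to use outside the interactive view, a batch-mode capture (described further down) can be scanned for breaches. The following is only a minimal sketch, assuming the perfmon-style CSV that batch mode produces (a quoted, comma-separated header row followed by one row per sample) and a CPU counter whose name contains "% Ready"; the exact counter names vary by esxtop version, so treat them as assumptions.
# Print every sample where a counter containing "% Ready" exceeds the 10 threshold.
# Assumes no commas inside counter or VM names.
awk -F',' '
NR == 1 { for (i = 1; i <= NF; i++) if ($i ~ /% Ready/) name[i] = $i; next }
{ for (i in name) { v = $i; gsub(/"/, "", v); if (v + 0 > 10) printf "sample %d: %s = %s\n", NR - 1, name[i], v } }
' esxtopcapture.csv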
Basic usage
Log in via the local console or over SSH and start esxtop by running:
esxtop
The default sampling interval is 5 seconds; press s and enter a positive integer to change it, for example:
s 2
Switch between views with the following keys:
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device (includes NFS as of 4.0 Update 2)
v = disk VM
p = power states
V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limiting the number of entities, for instance the top 5
2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world
Add or remove fields:
f <press the letter shown on screen for the field you want>
Change the order of the fields:
o <press the letter for a field to move it; uppercase moves it left, lowercase moves it right>
Save the settings:
W
If you save it under the default filename (without changing the name), it becomes the default configuration.
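If you saved the settings under a different name, you can point esxtop at that configuration explicitly with the -c option; the path below is only a placeholder for the file you saved with W.
esxtop -c /path/to/my-esxtop-config   # placeholder path: the configuration file saved with W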
Get help:
?
In large environments, the sheer amount of data esxtop has to collect and compute can cause esxtop itself to consume a lot of CPU. You can reduce this by using command-line options to lock esxtop to specific entities and specific information:
esxtop -l
For more information, see here.
Collecting data in batch mode
First, decide which information you need: add or remove fields as required (f) and save the configuration (W).
Then run the following command to collect the data and write the results to a CSV file:
esxtop -b -d 2 -n 100 > esxtopcapture.csv
其中,"-b"表示批处理模式,"-d 2"表示采集间隔2秒,"-n 100"表示采集100次。间隔2秒采集100次,也就是采集200秒的数据。若是须要采集全部指标,使用"-a"参数
If there are many entities or the collection period is long, the data set can get large; you can compress it with gzip:
esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz
Note that data collected this way will not include VMs created after the command starts, nor VMs vMotioned over from other hosts; in this respect it behaves like the -l option.
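To peek at the compressed capture without expanding it on disk first, for example:
gzip -dc esxtopoutput.csv.gz | head -1 | tr ',' '\n' | head -10   # show the first few counter names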
Analyzing the data
There are several options. The official approach is Windows Performance Monitor or Excel; there are also tools on http://labs.vmware.com/flings/ that can visualize the data, such as visualEsxtop and esxplot.
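Before loading the CSV into one of those tools, it can help to check which counters it actually contains. A quick sketch; the "Ready" pattern is only an example and counter names differ between esxtop versions:
head -1 esxtopcapture.csv | tr ',' '\n' | wc -l                    # rough count of captured counters (includes the timestamp column)
head -1 esxtopcapture.csv | tr ',' '\n' | grep -i "Ready" | head -5   # sample a few CPU-ready counters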
Other notes
In practice the number of entities, field widths, or the screen resolution may keep the view from fitting on screen. You can limit what is displayed by exporting the entity list, editing it, and importing it again:
esxtop -export-entity filename
After exporting, edit this file and comment out the parts you do not need, then import it:
esxtop -import-entity filename
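As a sketch of the editing step, assuming the exported file is plain text with one entity entry per line (the exact format may differ by version), you could simply drop the lines for a hypothetical VM named testvm01 instead of commenting them out:
grep -v "testvm01" filename > filename.trimmed   # drop entity lines mentioning the hypothetical VM
esxtop -import-entity filename.trimmed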
The following command line filters out the information for a specific VM; replace virtualmachinename as needed (untested):
VMWID=`vm-support -x | grep <virtualmachinename> | awk '{gsub("wid=", ""); print $1}'`
VMXCARTEL=`vsish -e cat /vm/$VMWID/vmxCartelID`
vsish -e cat /sched/memClients/$VMXCARTEL/SchedGroupID