On October 20, 2018, a VM on one of our hosts triggered an OOM and was killed by the kernel, even though the host still had plenty of free memory at the time. The relevant messages log is shown below.
Note
The order field in the log indicates how much memory was requested: order=0 means a request for 2^0 pages, i.e. a single 4K page.
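For reference, the size of such a request is PAGE_SIZE * 2^order; a minimal shell check (assuming the usual 4 KiB page size):
# bytes requested for a gfp allocation of a given order
ORDER=0
echo $(( $(getconf PAGE_SIZE) * (1 << ORDER) ))    # prints 4096 for order=0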
Oct 20 00:43:07 kernel: qemu-kvm invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Oct 20 00:43:07 kernel: qemu-kvm cpuset=emulator mems_allowed=1
Oct 20 00:43:07 kernel: CPU: 7 PID: 1194284 Comm: qemu-kvm Tainted: G OE ------------ 3.10.0-327.el7.x86_64 #1
Oct 20 00:43:07 kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.5.5 08/16/2017
Oct 20 00:43:07 kernel: ffff882e328f0b80 000000008b0f4108 ffff882f6f367b00 ffffffff816351f1
Oct 20 00:43:07 kernel: ffff882f6f367b90 ffffffff81630191 ffff882e32a91980 0000000000000001
Oct 20 00:43:07 kernel: 000000000000420f 0000000000000010 ffffffff8197d740 00000000b922b922
Oct 20 00:43:07 kernel: Call Trace:
Oct 20 00:43:07 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
Oct 20 00:43:07 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
Oct 20 00:43:07 kernel: [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0
Oct 20 00:43:07 kernel: [<ffffffff8116c956>] ? find_lock_task_mm+0x56/0xc0
Oct 20 00:43:07 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
Oct 20 00:43:07 kernel: [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90
Oct 20 00:43:07 kernel: [<ffffffff811b78ca>] alloc_pages_vma+0x9a/0x140
Oct 20 00:43:07 kernel: [<ffffffff81197655>] handle_mm_fault+0xb85/0xf50
Oct 20 00:43:07 kernel: [<ffffffff8122bb37>] ? eventfd_ctx_read+0x67/0x210
Oct 20 00:43:07 kernel: [<ffffffff81640e22>] __do_page_fault+0x152/0x420
Oct 20 00:43:07 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
Oct 20 00:43:07 kernel: [<ffffffff8163d408>] page_fault+0x28/0x30
Oct 20 00:43:07 kernel: Mem-Info:
Oct 20 00:43:07 kernel: active_anon:87309259 inactive_anon:444334 isolated_anon:0#012 active_file:101827 inactive_file:1066463 isolated_file:0#012 unevictable:0 dirty:16777 writeback:0 unstable:0#012 free:8521193 slab_reclaimable:179558 slab_unreclaimable:138991#012 mapped:14804 shmem:1180357 pagetables:195678 bounce:0#012 free_cma:0
Oct 20 00:43:07 kernel: Node 1 Normal free:44244kB min:45096kB low:56368kB high:67644kB active_anon:194740280kB inactive_anon:795780kB active_file:80kB inactive_file:100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:201326592kB managed:198168156kB mlocked:0kB dirty:4kB writeback:0kB mapped:2500kB shmem:2177236kB slab_reclaimable:158548kB slab_unreclaimable:199088kB kernel_stack:109552kB pagetables:478460kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:301 all_unreclaimable? yes
Oct 20 00:43:07 kernel: lowmem_reserve[]: 0 0 0 0
Oct 20 00:43:07 kernel: Node 1 Normal: 10147*4kB (UEM) 22*8kB (UE) 3*16kB (U) 11*32kB (UR) 8*64kB (R) 6*128kB (R) 2*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 44492kB
Oct 20 00:43:07 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 20 00:43:07 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 20 00:43:07 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 20 00:43:07 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 20 00:43:07 kernel: 2349178 total pagecache pages
Oct 20 00:43:07 kernel: 0 pages in swap cache
Oct 20 00:43:07 kernel: Swap cache stats: add 0, delete 0, find 0/0
Oct 20 00:43:07 kernel: Free swap = 0kB
Oct 20 00:43:07 kernel: Total swap = 0kB
Oct 20 00:43:07 kernel: 100639322 pages RAM
Oct 20 00:43:07 kernel: 0 pages HighMem/MovableOnly
Oct 20 00:43:07 kernel: 1646159 pages reserved
Oct 20 00:43:07 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 20 00:43:07 kernel: Out of memory: Kill process 1409878 (qemu-kvm) score 666 or sacrifice child
Oct 20 00:43:07 kernel: Killed process 1409878 (qemu-kvm) total-vm:136850144kB, anon-rss:133909332kB, file-rss:4724kB
Oct 20 00:43:30 libvirtd: 2018-10-19 16:43:30.303+0000: 81546: error : qemuMonitorIO:705 : internal error: End of file from qemu monitor
Oct 20 00:43:30 systemd-machined: Machine qemu-7-c2683281-6cbd-4100-ba91-e221ed06ee60 terminated.
Oct 20 00:43:30 kvm: 6 guests now active
The log above omits the detailed meminfo output and the per-process memory usage list.
The log shows that Node 1 Normal free memory was down to roughly 44 MB, which triggered the OOM, even though node 0 still had plenty of unused memory at the time. The process that triggered the OOM was qemu-kvm with pid 1194284. By searching the logs we traced the problem to the VM 25913bd0-d869-4310-ab53-8df6855dd258, and its XML file showed the following memory NUMA configuration:
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
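For reference, the same fragment can be pulled straight from the domain definition with virsh dumpxml, using the UUID identified above:
virsh dumpxml 25913bd0-d869-4310-ab53-8df6855dd258 | grep -A 2 '<numatune>'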
The information reported by the virsh client was:
virsh # numatune 25913bd0-d869-4310-ab53-8df6855dd258
numa_mode : strict
numa_nodeset : 1
It turns out that when mode is strict and placement is auto, a suitable NUMA node is computed (libvirt consults numad for this) and assigned to the VM. This VM's memory was therefore confined to node 1, and once node 1's memory was exhausted the OOM killer fired.
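Strict placement typically shows up in the cpuset cgroup (the oom log's cpuset=emulator mems_allowed=1 line reflects this), so the effective node restriction can also be read from the cgroup filesystem. A minimal sketch; the exact machine.slice path varies with the libvirt and systemd versions, so the glob below is only illustrative:
# memory nodes the emulator thread is allowed to allocate from ("1" means node 1 only)
cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu*.scope/emulator/cpuset.mems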
See the official documentation for the three memory modes:
strict: if memory cannot be allocated on the target node, the allocation fails. When a NUMA nodeset is specified but no memory mode is defined, the mode defaults to strict.
interleave: memory pages are allocated across the specified set of nodes, following a round-robin method.
preferred: memory is allocated from a single preferred node; if that node does not have enough memory, memory is allocated from other nodes.
Important note
If memory is overcommitted in strict mode and the guest does not have enough swap space, the kernel will kill some guest processes to obtain additional memory. Red Hat therefore recommends using preferred and configuring a single node (for example, nodeset='0') to avoid this situation.
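A numatune fragment following that recommendation could look like this (preferred mode expects a single node in nodeset):
<numatune>
  <memory mode='preferred' nodeset='0'/>
</numatune>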
We took a new host, created a VM on it, modified the VM's numatune configuration, and tested how the VM behaved in the strict and preferred modes under the following three configurations. Since interleave spreads allocations across nodes, its performance is bound to be worse than the other two modes, and our goal was specifically to see whether strict and preferred trigger an OOM when a single node's memory is fully used, so interleave was excluded from the test.
Configuration 1: mode=strict, placement=auto
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
Configuration 2: mode=preferred, placement=auto
<numatune>
<memory mode='preferred' placement='auto'/>
</numatune>
Configuration 3: mode=strict, nodeset=0-1
<numatune>
<memory mode='strict' nodeset='0-1'/>
</numatune>
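To switch an existing domain between these configurations without editing the XML by hand, virsh numatune can also write the values. A sketch; the domain name is a placeholder, and a mode change generally only takes effect from the persistent config after a restart:
# persist configuration 3 (strict, nodeset 0-1) into the domain definition
virsh numatune <domain> --mode strict --nodeset 0-1 --config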
We filled a single node's memory on the host with memholder (a tool shipped in the ssplatform2-tools rpm; the exact command was numactl -i 0 memholder 64000 &), then ran memholder inside the VM as well, and observed how the memory was distributed across the NUMA nodes as the VM's usage kept climbing.
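While the guest allocates memory, the per-node distribution can be watched with the same two tools used below; a minimal observation loop (the two-second interval is arbitrary):
# refresh per-node usage of the qemu-kvm processes and the host's per-node free memory
watch -n 2 'numastat -c qemu-kvm; numactl --hardware | grep free'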
For the first configuration (mode strict, placement auto), the information obtained on the virsh client side is shown below; even though placement is auto, the qemu-kvm process still settled on one specific node:
virsh # numatune 638abba7-bba8-498b-88d6-ddc70f2cef18
numa_mode : strict
numa_nodeset : 1
Initial memory usage of the VMs:
# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 0 693 694
1764062 (qemu-kv 0 366 366
--------------- ------ ------ -----
Total 1 1060 1060
Host memory usage after filling node 1 on the host with memholder:
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58476 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 64 MB
node distances:
node 0 1
0: 10 21
1: 21 10
After memholder started consuming memory inside the VM, the VMs' memory usage was:
numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 6 685 692
1764062 (qemu-kv 7 4670 4677
--------------- ------ ------ -----
Total 13 5355 5368
Host memory usage:
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58650 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 52181 MB
node distances:
node 0 1
0: 10 21
1: 21 10
At this point we found that the qemu-kvm process had triggered an OOM; the memholder process that was consuming memory on the host had been killed by the kernel, freeing the host memory back up.
The log in messages was:
Nov 13 21:07:07 kernel: qemu-kvm invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
Nov 13 21:07:07 kernel: qemu-kvm cpuset=emulator mems_allowed=1
Nov 13 21:07:07 kernel: CPU: 28 PID: 1332894 Comm: qemu-kvm Not tainted 4.4.36-1.el7.elrepo.x86_64 #1
Nov 13 21:07:07 kernel: Mem-Info:
Nov 13 21:07:07 kernel: active_anon:1986423 inactive_anon:403229 isolated_anon:0#012 active_file:116773 inactive_file:577075 isolated_file:0#012 unevictable:14364416 dirty:142 writeback:0 unstable:0#012 slab_reclaimable:61182 slab_unreclaimable:296489#012 mapped:14400991 shmem:15542531 pagetables:35749 bounce:0#012 free:14983912 free_pcp:0 free_cma:0
Nov 13 21:07:07 kernel: Node 1 Normal free:44952kB min:45120kB low:56400kB high:67680kB active_anon:5485032kB inactive_anon:1571408kB active_file:308kB inactive_file:0kB unevictable:57286820kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:66044484kB mlocked:57286820kB dirty:48kB writeback:0kB mapped:57330444kB shmem:61948048kB slab_reclaimable:143752kB slab_unreclaimable:1107004kB kernel_stack:16592kB pagetables:129312kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2248 all_unreclaimable? yes
Nov 13 21:07:07 kernel: lowmem_reserve[]: 0 0 0 0
Nov 13 21:07:07 kernel: Node 1 Normal: 1018*4kB (UME) 312*8kB (UE) 155*16kB (UE) 34*32kB (UE) 293*64kB (UM) 53*128kB (U) 5*256kB (U) 1*512kB (U) 1*1024kB (E) 2*2048kB (UM) 2*4096kB (M) = 50776kB
Nov 13 21:07:07 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 13 21:07:07 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 13 21:07:07 kernel: 16236582 total pagecache pages
Nov 13 21:07:07 kernel: 0 pages in swap cache
Nov 13 21:07:07 kernel: Swap cache stats: add 0, delete 0, find 0/0
Nov 13 21:07:07 kernel: Free swap = 0kB
Nov 13 21:07:07 kernel: Total swap = 0kB
Nov 13 21:07:07 kernel: 33530456 pages RAM
Nov 13 21:07:07 kernel: 0 pages HighMem/MovableOnly
Nov 13 21:07:07 kernel: 551723 pages reserved
Nov 13 21:07:07 kernel: 0 pages hwpoisoned
The process we were testing was 1764062, but the process that triggered the OOM was 1332894. The VM behind that process also uses configuration 1, and the nodeset reported by the virsh client is likewise 1:
virsh # numatune c11a155a-95b0-4593-9ce5-f2a42dc0ccca
numa_mode : strict
numa_nodeset : 1
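To double-check from the host logs which task invoked the OOM killer and which process was actually killed, the relevant lines can be grepped straight out of messages (the patterns match the log format shown above):
grep -E 'invoked oom-killer|Out of memory: Kill process|Killed process' /var/log/messages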
For the second configuration (mode preferred, placement auto), the VM's numatune as reported by the virsh client was:
virsh # numatune 638abba7-bba8-498b-88d6-ddc70f2cef18
numa_mode : preferred
numa_nodeset : 1
Initial memory usage of the VMs:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 6 691 698
1897916 (qemu-kv 17 677 694
--------------- ------ ------ -----
Total 24 1368 1392
Host memory usage after filling node 1 on the host with memholder:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58403 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 56 MB
node distances:
node 0 1
0: 10 21
1: 21 10
After memholder started consuming memory inside the VM, the VMs' memory usage was:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1332894 (qemu-kv 7 690 697
1897916 (qemu-kv 4012 682 4695
--------------- ------ ------ -----
Total 4019 1372 5391
Host memory usage:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 54395 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 55 MB
node distances:
node 0 1
0: 10 21
1: 21 10
From this behavior, although the preferred node was node 1, when node 1 ran short of memory the process allocated from node 0 instead, and no OOM was triggered.
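The memory policy actually applied to a qemu-kvm process can also be verified at runtime through /proc; a minimal sketch using the test VM's pid from the tables above:
# each mapping line begins with its policy, e.g. "prefer:1" or "bind:1", followed by per-node page counts (N0=... N1=...)
head -n 5 /proc/1897916/numa_maps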
Note: for the third configuration (mode strict, nodeset 0-1), process 1308480 is the qemu-kvm process of the VM under test.
Initial memory usage of the VMs:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1308480 (qemu-kv 141 584 725
1332894 (qemu-kv 0 707 708
--------------- ------ ------ -----
Total 141 1291 1432
Host memory usage at this point (node 1 on the host already filled by memholder):
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 58241 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 131 MB
node distances:
node 0 1
0: 10 21
1: 21 10
After memholder started consuming memory inside the VM, the VMs' memory usage was:
[@ ~]# numastat -c qemu-kvm
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
--------------- ------ ------ -----
1308480 (qemu-kv 4017 682 4699
1332894 (qemu-kv 7 681 688
--------------- ------ ------ -----
Total 4024 1363 5387
Host memory usage:
[@ ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64326 MB
node 0 free: 54410 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64496 MB
node 1 free: 55 MB
node distances:
node 0 1
0: 10 21
1: 21 10
From these tests, neither the second nor the third configuration leads to an OOM caused by unbalanced memory usage across the two NUMA nodes, but which configuration performs better still needs follow-up testing.
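One simple signal for that follow-up comparison is the kernel's per-node allocation counters reported by plain numastat; comparing a snapshot taken before and after a guest workload shows how many allocations had to fall back to a remote node:
# numa_miss / other_node growing on a node means allocations that could not be satisfied locally
numastat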
References