Node Debugging Guide: Memory

Date: 2019-11-06

Tags: node, debugging guide, memory


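Node.js is now widely used for BFF layers with front-end/back-end separation, for full-stack development, for client-side tooling and more. Yet while the application layer thrives, the runtime itself remains a black box to most developers with a front-end background. This has not improved much, and it has held back the adoption of Node.js in business applications.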

Memory leak problems

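For leaks where memory creeps up slowly until the process runs out of memory (OOM), we have plenty of time to capture a Heapsnapshot and analyze it to locate the leak (see the earlier article "Node Crime Scene Investigation: Quickly Locating Memory Leaks in Production").
For cases such as a while loop whose exit condition never becomes true, a long-running regular expression that makes the process hang, or an abnormal request that drives the application to OOM within a short time, there is often no time to capture a Heapsnapshot, and there has been no particularly good way to handle them.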

There are two ways to generate a Coredump file:

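The first: when the application crashes unexpectedly, the operating system records the dump automatically. This is generally used for "post-mortem" analysis, for example of an OOM triggered by a cascading failure; it can also be set up to dump core automatically when an uncaught exception occurs.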

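Note that this is not an entirely safe operation. In production, processes are usually supervised by a daemon with automatic restart, such as pm2, so if the program keeps crashing and restarting, it will produce a large number of Coredump files and may even fill up the server's disk. After enabling this option, be sure to monitor the server's disk usage and set up alerts.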

The second: run gcore <pid> manually. This is generally used for "live examination", to locate the problem while a Node.js process is hung.

This article introduces several ways to debug memory problems in Node.js.

1 gcore + llnode

1.1 Core & Core Dump

Before we start, let's look at what Core and Core Dump are.

What is Core?

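Before semiconductors were used for memory, people used magnetic coils. Such a coil was called a core, and memory built from them was called core memory. Today the semiconductor industry is booming and nobody uses core memory anymore, but in many contexts people still call memory "core".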

What is Core Dump?

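When a program terminates abnormally or crashes, the operating system records the program's memory state at that moment and saves it to a file; this is called a Core Dump. We can think of a Core Dump as a "memory snapshot", but besides memory, other key runtime state is dumped as well, such as register contents (including the program counter and stack pointer), memory management information, and other processor and operating system state. Core Dumps are very helpful for diagnosing and debugging programs: some errors, such as invalid pointer accesses, are hard to reproduce, but a Core Dump file lets you reconstruct the scene at the moment the program failed.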

1.2 Test environment

$ uname -a
Darwin xiaopinguodeMBP 16.7.0 Darwin Kernel Version 16.7.0: Wed Oct 10 20:06:00 PDT 2018; root:xnu-3789.73.24~1/RELEASE_X86_64 x86_64

1.3 Enabling Core Dump

In a terminal, run:

$ ulimit -c

This shows the maximum size allowed for core dump files; 0 means Core Dump is disabled. Use the following command to enable Core Dump without limiting the size of the generated files:

$ ulimit -c unlimited

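The command above only affects the current terminal session. To make it permanent, edit /etc/security/limits.conf, for example by adding the lines `* soft core unlimited` and `* hard core unlimited`.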
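Resource limits are inherited from the shell that starts a process, so running ulimit -c unlimited in a new terminal does not change a Node.js process that is already running. A minimal Linux-only sketch for checking which core size limit a process actually has; the helper name readCoreLimit is hypothetical, and it relies on /proc/<pid>/limits, which does not exist on macOS (the test environment used above):

```javascript
const fs = require('fs')

// Linux-only: parse the "Max core file size" row of /proc/<pid>/limits.
// Returns { soft, hard } in bytes (Infinity for "unlimited"), or null if the row is missing.
function readCoreLimit(pid = 'self') {
  const text = fs.readFileSync(`/proc/${pid}/limits`, 'utf8')
  const label = 'Max core file size'
  const line = text.split('\n').find((l) => l.startsWith(label))
  if (!line) return null
  // Remaining columns: soft limit, hard limit, units
  const [soft, hard] = line.slice(label.length).trim().split(/\s+/)
  const toBytes = (v) => (v === 'unlimited' ? Infinity : Number(v))
  return { soft: toBytes(soft), hard: toBytes(hard) }
}

if (process.platform === 'linux') {
  const limit = readCoreLimit()
  console.log(limit && limit.soft === 0
    ? 'core dumps are disabled for this process (ulimit -c is 0)'
    : `core file size limit: ${limit && limit.soft}`)
}
```

Passing a pid, e.g. readCoreLimit(33833), checks another process instead of the current one.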

1.4 gcore

gcore dumps the core file of a given process without restarting the program. Usage:

$ gcore [-o filename] pid
# usage:
$ gcore
gcore: no pid specified
usage:
        gcore [-s] [-v] [[-o file] | [-c pathfmt ]] [-b size] pid

By default, the dump is written to a file named core.pid in the directory where gcore is run.

1.5 llnode

What is llnode?

Node.js v4.x+ C++ plugin for LLDB - a next generation, high-performance debugger.

What is LLDB?

LLDB is a next generation, high-performance debugger. It is built as a set of reusable components which highly leverage existing libraries in the larger LLVM Project, such as the Clang expression parser and LLVM disassembler.

Installing llnode + lldb:

github.com/nodejs/llno…

# Prerequisites: Install LLDB and its Library
brew update && brew install --with-lldb --with-toolchain llvm
# install
npm install -g llnode

1.6 Testing with a memory leak example

Below, a typical memory leak caused by caching objects in a global variable is used to try out llnode. The code:

const leaks = []
function LeakingClass() {
  this.name = Math.random().toString(36)
  this.age = Math.floor(Math.random() * 100)
}
setInterval(() => {
  for (let i = 0; i < 100; i++) {
    leaks.push(new LeakingClass)
  }
  console.warn('Leaks: %d', leaks.length)
}, 1000)

Run the program:

$ node app.js

Wait a few seconds, then run gcore in another terminal:

$ ulimit -c unlimited
$ pgrep -n node
33833
$ sudo gcore -c core.33833 33833

This produces the file core.33833.

1.7 Analyzing the Core file

Load the generated Core file with llnode, which starts lldb with the llnode plugin:

llnode -c ./core.33833 
(lldb) target create --core "./core.33833"
Core file '/Users/xiaopingguo/repos/my_repos/node_repos/node-in-debugging/./core.33833' (x86_64) was loaded.
(lldb) plugin load '/usr/local/lib/node_modules/llnode/llnode.dylib'

Type v8 to see its documentation. The following subcommands are available:

v8
The following subcommands are supported:
      bt                -- Show a backtrace with node.js JavaScript functions and their args. An optional argument is accepted; if that argument is a number, it
                           specifies the number of frames to display. Otherwise all frames will be dumped.
                           Syntax: v8 bt [number]
      findjsinstances   -- List every object with the specified type name.
                           Flags:
                           * -v, --verbose                  - display detailed `v8 inspect` output for each object.
                           * -n <num>  --output-limit <num> - limit the number of entries displayed to `num` (use 0 to show all). To get next page repeat
                           command or press [ENTER].
                           Accepts the same options as `v8 inspect`
      findjsobjects     -- List all object types and instance counts grouped by type name and sorted by instance count. Use -d or --detailed to get an output
                           grouped by type name, properties, and array length, as well as more information regarding each type.
      findrefs          -- Finds all the object properties which meet the search criteria.
                           The default is to list all the object properties that reference the specified value.
                           Flags:
                           * -v, --value expr     - all properties that refer to the specified JavaScript object (default)
                           * -n, --name  name     - all properties with the specified name
                           * -s, --string string  - all properties that refer to the specified JavaScript string value
      getactivehandles  -- Print all pending handles in the queue. Equivalent to running process._getActiveHandles() on the living process.
      getactiverequests -- Print all pending requests in the queue. Equivalent to running process._getActiveRequests() on the living process.
      inspect           -- Print detailed description and contents of the JavaScript value.
                           Possible flags (all optional):
                           * -F, --full-string    - print whole string without adding ellipsis
                           * -m, --print-map      - print object's map address
                           * -s, --print-source   - print source code for function objects
                           * -l num, --length num - print maximum of `num` elements from string/array
                           Syntax: v8 inspect [flags] expr
      nodeinfo          -- Print information about Node.js
      print             -- Print short description of the JavaScript value.
                           Syntax: v8 print expr
      settings          -- Interpreter settings
      source            -- Source code information

For more help on any particular subcommand, type 'help <command> <subcommand>'.

bt
findjsinstances
findjsobjects
findrefs
inspect
nodeinfo
print
source

Run v8 findjsobjects to list all object types with their instance counts and total size:

(llnode) v8 findjsobjects
 Instances  Total Size Name
 ---------- ---------- ----
        ...
        356      11392 (Array)
        632      35776 Object
       8300     332000 LeakingClass
      14953      53360 (String)
 ---------- ---------- 
      24399     442680
      

LeakingClass has 8300 instances occupying 332000 bytes. Use v8 findjsinstances to list all LeakingClass instances:

(lldb) v8 findjsinstances LeakingClass
...
0x221fb297fbb9:<Object: LeakingClass>
0x221fb297fc29:<Object: LeakingClass>
0x221fb297fc99:<Object: LeakingClass>
0x221fb297fd09:<Object: LeakingClass>
0x221fb297fd79:<Object: LeakingClass>
0x221fb297fde9:<Object: LeakingClass>
0x221fb297fe59:<Object: LeakingClass>
0x221fb297fec9:<Object: LeakingClass>
0x221fb297ff39:<Object: LeakingClass>
0x221fb297ffa9:<Object: LeakingClass>
(Showing 1 to 8300 of 8300 instances)

Use v8 i to inspect the contents of an instance:

(llnode) v8 i 0x221fb297ffa9
0x221fb297ffa9:<Object: LeakingClass properties {
    .name=0x221f9bc82201:<String: "0.s3psjp4ctzj">,
    .age=<Smi: 95>}>
(llnode) v8 i 0x221fb297ff39
0x221fb297ff39:<Object: LeakingClass properties {
    .name=0x221fb297ff71:<String: "0.q1t4gikp9a">,
    .age=<Smi: 6>}>
(llnode) v8 i 0x221fb297fec9
0x221fb297fec9:<Object: LeakingClass properties {
    .name=0x221fb297ff01:<String: "0.zzomfpcmgn">,
    .age=<Smi: 52>}>

This shows the values of the name and age fields of each LeakingClass instance.

Use v8 findrefs to find what references an object:

(llnode) v8 findrefs 0x221fb297ffa9
0x221fd136cb51: (Array)[7041]=0x221fb297ffa9
(llnode) v8 i 0x221fd136cb51
0x221fd136cb51:<Array: length=10018 {
    [0]=0x221f9b627171:<Object: LeakingClass>,
    [1]=0x221f9b627199:<Object: LeakingClass>,
    [2]=0x221f9b6271c1:<Object: LeakingClass>,
    [3]=0x221f9b6271e9:<Object: LeakingClass>,
    [4]=0x221f9b627211:<Object: LeakingClass>,
    [5]=0x221f9b627239:<Object: LeakingClass>,
    [6]=0x221f9b627261:<Object: LeakingClass>,
    [7]=0x221f9b627289:<Object: LeakingClass>,
    [8]=0x221f9b6272b1:<Object: LeakingClass>,
    [9]=0x221f9b6272d9:<Object: LeakingClass>,
    [10]=0x221f9b627301:<Object: LeakingClass>,
    [11]=0x221f9b627329:<Object: LeakingClass>,
    [12]=0x221f9b627351:<Object: LeakingClass>,
    [13]=0x221f9b627379:<Object: LeakingClass>,
    [14]=0x221f9b6273a1:<Object: LeakingClass>,
    [15]=0x221f9b6273c9:<Object: LeakingClass>}>

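Starting from the memory address of one LeakingClass instance, v8 findrefs gave us the address of the array that references it. Inspecting that address shows an array of length 10018 whose items are all LeakingClass instances. This is exactly the leaks array in our code.

Tip: v8 i is short for v8 inspect, and v8 p is short for v8 print.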

1.8 `--abort-on-uncaught-exception`

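Starting a Node.js program with the --abort-on-uncaught-exception flag makes it abort and dump core automatically when it crashes because of an uncaught exception (provided core dumps are enabled with ulimit -c), which is convenient for post-mortem analysis.

Start the test program with --abort-on-uncaught-exception: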

$ ulimit -c unlimited
$ node --abort-on-uncaught-exception app.js

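In another terminal, run the following. Note that the default action of SIGBUS already terminates a process with a core dump, so this step only simulates a crash; the flag itself matters when the crash is caused by an uncaught exception.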

$ kill -BUS `pgrep -n node`

The first terminal shows:

Leaks: 100
Leaks: 200
Leaks: 300
Leaks: 400
Leaks: 500
Leaks: 600
Leaks: 700
Leaks: 800
Bus error (core dumped)

The debugging steps are the same as above:

(llnode) v8 findjsobjects
 Instances  Total Size Name
 ---------- ---------- ----
        ...
        356      11392 (Array)
        632      35776 Object
       8300     332000 LeakingClass
      14953      53360 (String)
 ---------- ---------- 
      24399     442680
      

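1.9 Summary

Our test code is simple and uses no third-party modules. In a larger project with many dependencies, the output of v8 findjsobjects is hard to interpret. In that case, run gcore several times, compare the dumps to find the objects whose counts keep growing, and then diagnose those.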
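Comparing several dumps by eye is tedious. The sketch below assumes you have copied the text printed by v8 findjsobjects for two dumps into strings; the helper names parseFindJsObjects and diffFindJsObjects are hypothetical, and the parser only understands the table layout shown above:

```javascript
// Hypothetical helpers: parse the table printed by llnode's `v8 findjsobjects`
// and report which types gained instances between two dumps.
function parseFindJsObjects(text) {
  const rows = new Map()
  for (const line of text.split('\n')) {
    // Data rows look like "       8300     332000 LeakingClass";
    // the header, separators, "..." and the unnamed total row do not match
    const m = line.match(/^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$/)
    if (m) rows.set(m[3], { instances: Number(m[1]), totalSize: Number(m[2]) })
  }
  return rows
}

function diffFindJsObjects(beforeText, afterText) {
  const before = parseFindJsObjects(beforeText)
  const after = parseFindJsObjects(afterText)
  const growth = []
  for (const [name, a] of after) {
    const b = before.get(name) || { instances: 0, totalSize: 0 }
    const deltaInstances = a.instances - b.instances
    if (deltaInstances > 0) {
      growth.push({ name, deltaInstances, deltaSize: a.totalSize - b.totalSize })
    }
  }
  // Types that gained the most instances come first
  return growth.sort((x, y) => y.deltaInstances - x.deltaInstances)
}
```

A type whose instance count keeps climbing across dumps (LeakingClass in this example) is the natural candidate to examine next with v8 findjsinstances and v8 findrefs.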

2 Using heapdump

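heapdump is a tool for dumping V8 heap information. v8-profiler offers the same feature, and both work the same way: they call v8::Isolate::GetCurrent()->GetHeapProfiler()->TakeHeapSnapshot(title, control). heapdump is simpler to use, though, so below we use it to show how to analyze a Node.js memory leak.

The test code is a classic memory leak: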
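On Node.js 11.13 and later, the built-in v8.writeHeapSnapshot() produces the same kind of .heapsnapshot file without a native add-on. A minimal sketch that mimics heapdump's kill -USR2 trigger; the dumpHeap helper and the file name pattern are our own choices, not part of heapdump's API:

```javascript
const v8 = require('v8')
const path = require('path')

// Write a heap snapshot with the built-in API (Node.js >= 11.13).
// Returns the path of the written file.
function dumpHeap(dir = process.cwd()) {
  const file = path.join(dir, `heapdump-${process.pid}-${Date.now()}.heapsnapshot`)
  // Synchronous: the event loop is blocked until the snapshot is on disk
  return v8.writeHeapSnapshot(file)
}

// Same trigger as heapdump: `kill -USR2 <pid>` (SIGUSR1 is reserved for the inspector)
process.on('SIGUSR2', () => {
  console.log(`heap snapshot written to ${dumpHeap()}`)
})
```

Node.js 12 and later can do the same without any code via the --heapsnapshot-signal=SIGUSR2 command-line flag. Either way, writing a snapshot blocks the process and may temporarily need memory on the order of the current heap size again, so use it with care on a production machine that is already short of memory.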

const heapdump = require('heapdump') // loaded for its side effect: kill -USR2 <pid> writes a heap snapshot
let leakObject = null
let count = 0
setInterval(function testMemoryLeak() {
  const originLeakObject = leakObject
  const unused = function () {
    if (originLeakObject) {
      console.log('originLeakObject')
    }
  }
  leakObject = {
    count: String(count++),
    leakStr: new Array(1e7).join('*'), // ~10 MB string; join('') would produce ''
    leakMethod: function () {
      console.log('leakMessage')
    }
  }
}, 1000)

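Why does this program leak? First we need to understand how closures work: all closures created inside the same function share a single closure scope. Conceptually, when the function runs and encounters a closure, it creates the closure scope and adds the local variables that closure uses; when it encounters another closure, it adds to the same scope any variables that this closure uses and the earlier ones did not. When the function ends, variables that are not referenced by the closure scope are released.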

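The cause of the leak: testMemoryLeak contains two closures, unused and leakMethod. unused references the variable originLeakObject from the enclosing scope. Without leakMethod, that closure scope would be released once the function finished. But leakObject is a global variable, so leakMethod stays reachable, and the closure scope it references (which contains originLeakObject, used by unused) is never released. As testMemoryLeak keeps being called, originLeakObject points to the previous leakObject, and the next leakObject.leakMethod in turn references that originLeakObject, forming a chain of closure references. Since every leakStr is a large string that can never be freed, memory leaks.

Fix: declare originLeakObject with let instead of const and set originLeakObject = null at the end of testMemoryLeak (with const, that assignment would throw a TypeError).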
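Applied to the test code, the fix could look like the sketch below. Only the let declaration and the final assignment change; leakStr uses join('*') so that it really is a string of about 10 MB (joining an array of empty slots with '' yields an empty string):

```javascript
let leakObject = null
let count = 0

function testMemoryLeak() {
  // `let` instead of `const`, so the variable can be cleared at the end
  let originLeakObject = leakObject
  const unused = function () {
    if (originLeakObject) {
      console.log('originLeakObject')
    }
  }
  leakObject = {
    count: String(count++),
    // 1e7 empty slots joined by '*' give a 9,999,999-character string
    leakStr: new Array(1e7).join('*'),
    leakMethod: function () {
      console.log('leakMessage')
    }
  }
  // The context shared by `unused` and `leakMethod` still has an
  // originLeakObject slot, but it no longer points at the previous
  // leakObject, so older objects (and their leakStr) can be collected.
  originLeakObject = null
}

// Scheduled as in the original example:
// setInterval(testMemoryLeak, 1000)
```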

Run the test code:

$ node app

Then run the following twice, one after the other:

$ kill -USR2 `pgrep -n node`

Two heapsnapshot files are generated in the current directory:

heapdump-100427359.61348.heapsnapshot
heapdump-100438986.797085.heapsnapshot

2.1 Chrome DevTools

We use Chrome DevTools to analyze the heapsnapshot files generated above. Open Chrome DevTools -> Memory -> Load and load the heapsnapshot files in order. Click the second heap snapshot; the drop-down menu at the top left offers 4 views:

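Summary: objects grouped by constructor name.
Comparison: the differences between multiple snapshots.
Containment: the whole GC path.
Statistics: memory usage as a pie chart.

Usually only the first two views are used. The third is rarely needed, because expanding any entry in Summary or Comparison already shows the path from the GC roots to that object; the fourth only shows the share of memory, as in the figure below: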

The Summary view has the following 5 columns:

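Constructor: the constructor name, e.g. Object, Module or Socket; parenthesized entries such as (array), (string) and (regexp) stand for the built-in Array, String and RegExp.
Distance: the distance from the GC roots. The GC root is generally the window object in browsers and the global object in Node.js. A larger distance means a deeper reference chain; such objects deserve a closer look, since they are likely candidates for a leak.
Objects Count: the number of objects.
Shallow Size: the size of the object itself, excluding the objects it references.
Retained Size: the size of the object plus the objects that are kept alive only through it, i.e. the memory that can be freed once this object is garbage-collected.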

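Tips:

An object's Retained Size = its own Shallow Size + the Shallow Sizes of the objects reachable only through it, directly or indirectly (objects also referenced from elsewhere are not included).
For (boolean), (number) and (string), Shallow Size == Retained Size: they cannot reference other values and are always leaf nodes.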
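The difference between "everything an object references" and "what it alone keeps alive" is easiest to see on a tiny graph. The sketch below uses a made-up heap (all node names and sizes are hypothetical) and computes Retained Size by brute force: remove the object and sum the shallow sizes of whatever becomes unreachable from the root. DevTools computes the same quantity more efficiently with a dominator tree.

```javascript
// Toy heap: shallow sizes (bytes) and outgoing references. All names are made up.
const graph = {
  root:   { shallow: 0,   refs: ['a', 'b'] },
  a:      { shallow: 10,  refs: ['shared', 'onlyA'] },
  b:      { shallow: 20,  refs: ['shared'] },
  shared: { shallow: 100, refs: [] },
  onlyA:  { shallow: 50,  refs: [] },
}

// Nodes reachable from `start`, pretending `removed` does not exist
function reachable(start, removed) {
  const seen = new Set()
  const stack = [start]
  while (stack.length > 0) {
    const id = stack.pop()
    if (id === removed || seen.has(id)) continue
    seen.add(id)
    stack.push(...graph[id].refs)
  }
  return seen
}

// Retained size: shallow sizes of everything that becomes unreachable
// from the root once `target` is gone (target included)
function retainedSize(target) {
  const before = reachable('root', null)
  const after = reachable('root', target)
  let size = 0
  for (const id of before) {
    if (!after.has(id)) size += graph[id].shallow
  }
  return size
}

console.log(retainedSize('a')) // 60: a + onlyA; shared is still held by b
console.log(retainedSize('b')) // 20
```

Summing everything a references would give 160 (10 + 100 + 50), but freeing a only releases 60 bytes, because shared is still reachable through b.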

Click Retained Size to sort in descending order; the (closure) entry retains 99% of the memory. Expanding it further:

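One leakStr takes 5% of the memory, while leakMethod retains 88%. The retaining tree (Retainers; called Object's retaining tree in older Chrome versions) shows the object's GC path. Clicking leakStr in the figure above (Distance 13) automatically expands Retainers, with Distance decreasing from 13 to 1.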

Expanding leakMethod further:

The context of the leakMethod function of an originLeakObject with count="18" references an originLeakObject with count="17"; the context of that object's leakMethod in turn references the originLeakObject with count="16", and so on. Every originLeakObject carries a large leakStr string (8% of the memory), which causes the leak, just as we reasoned earlier.

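Tip: a yellow background means the object is still referenced from JavaScript and therefore may not have been collected. A red background means the object is no longer referenced from JavaScript but is still alive in memory. This is typical of DOM objects, which are stored differently from JavaScript objects, and is rarely seen in Node.js.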

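2.2 Comparing snapshots

The Comparison view shows columns such as #New, #Deleted and #Delta, where + and - are relative to the snapshot being compared against. Comparing the second snapshot with the first:

(string) increased by 5 objects, each 10000024 bytes.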
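The same kind of comparison can be scripted. The sketch below is based on the JSON layout V8 writes into .heapsnapshot files (snapshot.meta.node_fields, node_types, a flat nodes array and a strings table); that format is not a stable public API, and the grouping only roughly matches DevTools (JS objects by constructor name, everything else as "(type)"). The helper names summarize and compareSummaries are hypothetical, and the result is a net per-group delta rather than DevTools' per-object #New/#Deleted tracking.

```javascript
const fs = require('fs')

// Group the nodes of a .heapsnapshot file roughly like the Summary view
function summarize(file) {
  const snap = JSON.parse(fs.readFileSync(file, 'utf8'))
  const { node_fields: fields, node_types: nodeTypes } = snap.snapshot.meta
  const typeNames = nodeTypes[0] // enum values of the "type" field
  const stride = fields.length
  const typeOff = fields.indexOf('type')
  const nameOff = fields.indexOf('name')
  const sizeOff = fields.indexOf('self_size')
  const summary = new Map()
  for (let i = 0; i < snap.nodes.length; i += stride) {
    const type = typeNames[snap.nodes[i + typeOff]]
    const key = type === 'object' ? snap.strings[snap.nodes[i + nameOff]] : `(${type})`
    const entry = summary.get(key) || { count: 0, size: 0 }
    entry.count += 1
    entry.size += snap.nodes[i + sizeOff]
    summary.set(key, entry)
  }
  return summary
}

// Groups that gained objects, largest growth in bytes first
function compareSummaries(before, after) {
  const rows = []
  for (const [key, a] of after) {
    const b = before.get(key) || { count: 0, size: 0 }
    if (a.count > b.count) {
      rows.push({ key, deltaCount: a.count - b.count, deltaSize: a.size - b.size })
    }
  }
  return rows.sort((x, y) => y.deltaSize - x.deltaSize)
}
```

Calling compareSummaries(summarize(first), summarize(second)) on two snapshot files lists the groups that grew, largest first; those are the entries worth opening in DevTools.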

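3 Using memwatch-next

memwatch-next (memwatch for short below) is a module for detecting memory leaks in Node.js and comparing heap information. Below, code in which event listeners cause a leak is used to show how to use memwatch.

The test code: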

let count = 1
const memwatch = require('memwatch-next')
memwatch.on('stats', (stats) => { 
  console.log(count++, stats)
})
memwatch.on('leak', (info) => {
  console.log('---')
  console.log(info)
  console.log('---')
})
const http = require('http')
const server = http.createServer((req, res) => {
  for (let i = 0; i < 10000; i++) {
    server.on('request', function leakEventCallback() {})
  }
  res.end('Hello World')
  global.gc()
}).listen(3000)

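On every incoming request, 10000 listeners for the request event are registered on server (this large number of listener functions stays in memory, causing the leak), and then a GC is triggered manually.

Run the program: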

$ node --expose-gc app.js

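Note: the program is started with the --expose-gc flag so that we can trigger GC manually from the code.

memwatch emits two events:

stats: the GC event. It fires after every GC and prints heap-related information, for example: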

{
  num_full_gc: 1,          // number of full GCs
  num_inc_gc: 1,           // number of incremental GCs
  heap_compactions: 1,     // number of heap compactions
  usage_trend: 0,          // usage trend
  estimated_base: 5350136, // estimated baseline
  current_base: 5350136,   // current baseline
  min: 0,                  // minimum
  max: 0                   // maximum
}

leak: the memory leak event. It fires when heap usage has grown after 5 consecutive GCs, and prints:

{ 
  growth: 3616040,
  reason: 'heap growth over 5 consecutive GCs (0s) - -2147483648 bytes/hr' 
}
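memwatch-next is a native add-on that is no longer maintained and may not build on recent Node.js versions. The sketch below reimplements only the rule described above, "the heap grew after several consecutive GCs", in plain JavaScript; it is a simplified heuristic, not memwatch-next's actual algorithm, and the class name HeapGrowthDetector is hypothetical.

```javascript
const { EventEmitter } = require('events')

// Simplified heuristic: emit 'leak' when the heap size measured after GC
// has grown `limit` times in a row.
class HeapGrowthDetector extends EventEmitter {
  constructor(limit = 5) {
    super()
    this.limit = limit
    this.streak = 0   // consecutive growth count
    this.start = null // heap size before the current streak began
    this.last = null  // previous sample
  }

  // Feed one heap measurement (in bytes) taken right after a GC
  sample(heapBytes) {
    if (this.last !== null && heapBytes > this.last) {
      if (this.streak === 0) this.start = this.last
      this.streak += 1
      if (this.streak >= this.limit) {
        this.emit('leak', { growth: heapBytes - this.start, gcs: this.streak })
        this.streak = 0
      }
    } else {
      this.streak = 0 // any shrink or plateau resets the streak
    }
    this.last = heapBytes
  }
}

const detector = new HeapGrowthDetector(5)
detector.on('leak', (info) => console.log('leak', info))
```

How to feed it is a design choice left open here; one option is a PerformanceObserver from perf_hooks subscribed to entryTypes: ['gc'] that calls detector.sample(process.memoryUsage().heapUsed) for each GC entry.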

Run:

$ ab -c 1 -n 5 http://localhost:3000/

Output:

(node:35513) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 request listeners added. Use emitter.setMaxListeners() to increase limit
1 { num_full_gc: 1,
  num_inc_gc: 2,
  heap_compactions: 1,
  usage_trend: 0,
  estimated_base: 5674608,
  current_base: 5674608,
  min: 0,
  max: 0 }
2 { num_full_gc: 2,
  num_inc_gc: 4,
  heap_compactions: 2,
  usage_trend: 0,
  estimated_base: 6668760,
  current_base: 6668760,
  min: 0,
  max: 0 }
3 { num_full_gc: 3,
  num_inc_gc: 5,
  heap_compactions: 3,
  usage_trend: 0,
  estimated_base: 7570424,
  current_base: 7570424,
  min: 7570424,
  max: 7570424 }
4 { num_full_gc: 4,
  num_inc_gc: 7,
  heap_compactions: 4,
  usage_trend: 0,
  estimated_base: 8488368,
  current_base: 8488368,
  min: 7570424,
  max: 8488368 }
--------------
{ growth: 3616040,
  reason: 'heap growth over 5 consecutive GCs (0s) - -2147483648 bytes/hr' }
--------------
5 { num_full_gc: 5,
  num_inc_gc: 9,
  heap_compactions: 5,
  usage_trend: 0,
  estimated_base: 9290648,
  current_base: 9290648,
  min: 7570424,
  max: 9290648 }
  

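Node.js has already warned us that more than 10 request listeners were added (10 is the default limit), which may indicate a leak. After the heap grew over 5 consecutive GCs, the leak event fired and printed how many bytes the heap grew and an estimated growth rate in bytes per hour. The negative rate here is not meaningful, because all 5 GCs happened within 0s.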

3.1 Heap Diffing

memwatch provides HeapDiff, which compares two heap snapshots and computes the difference. Modify the test code as follows:

const memwatch = require('memwatch-next')
const http = require('http')
const server = http.createServer((req, res) => {
  for (let i = 0; i < 10000; i++) {
    server.on('request', function leakEventCallback() {})
  }
  res.end('Hello World')
  global.gc()
}).listen(3000)
const hd = new memwatch.HeapDiff()
memwatch.on('leak', (info) => {
  const diff = hd.end()
  console.dir(diff, { depth: 10 })
})

Run this code and the same ab command; it prints:

(node:35690) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 request listeners added. Use emitter.setMaxListeners() to increase limit
{ before: { nodes: 35864, size_bytes: 4737664, size: '4.52 mb' },
  after: { nodes: 87476, size_bytes: 8946784, size: '8.53 mb' },
  change: 
   { size_bytes: 4209120,
     size: '4.01 mb',
     freed_nodes: 894,
     allocated_nodes: 52506,
     details: 
      [ ...
        { what: 'Array',
          size_bytes: 533008,
          size: '520.52 kb',
          '+': 1038,
          '-': 517 },
        { what: 'Closure',
          size_bytes: 3599856,
          size: '3.43 mb',
          '+': 50001,
          '-': 3 }
        ...
      ]
    }
}

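The heap grew from 4.52 mb to 8.53 mb, mostly due to Closure and Array. This fits what registering an event listener does: it essentially pushes the listener function (a Closure) into the array of listeners for that event (an Array).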
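The mechanism can be reproduced without an HTTP server. In this sketch a plain EventEmitter stands in for server (the names emitter and handleRequest are hypothetical); every call registers 10000 new closures, and all of them stay in the emitter's listener list:

```javascript
const { EventEmitter } = require('events')

// Stand-in for the http server above
const emitter = new EventEmitter()
emitter.setMaxListeners(0) // 0 = unlimited; only silences the warning for this demo

// Same pattern as the leaking request handler: 10000 new closures per call
function handleRequest() {
  for (let i = 0; i < 10000; i++) {
    emitter.on('request', function leakEventCallback() {})
  }
}

handleRequest()
handleRequest()
console.log(emitter.listenerCount('request')) // 20000
```

Each registration creates a distinct function object, which matches the 50,000 or so Closure objects reported for 5 requests. The fix is to register the listener once at startup, or to remove it with emitter.off() when it is no longer needed, instead of registering it inside the request handler.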

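3.2 Combining with heapdump

memwatch is most useful in combination with heapdump: memwatch detects that a leak is happening, heapdump exports several heap snapshots, and Chrome DevTools is used to analyze and compare them to find the culprit.

Modify the code as follows: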

const memwatch = require('memwatch-next')
const heapdump = require('heapdump')
const http = require('http')
const server = http.createServer((req, res) => {
  for (let i = 0; i < 10000; i++) {
    server.on('request', function leakEventCallback() {})
  }
  res.end('Hello World')
  global.gc()
}).listen(3000)
dump()
memwatch.on('leak', () => {
  dump()
})
function dump() {
  const filename = `${__dirname}/heapdump-${process.pid}-${Date.now()}.heapsnapshot`
  heapdump.writeSnapshot(filename, () => {
    console.log(`${filename} dump completed.`)
  })
}

The program writes a heap dump right after startup and another one whenever the leak event fires. Run this code and the same ab command, and two heapsnapshot files are generated:

heapdump-21126-1519545957879.heapsnapshot
heapdump-21126-1519545975702.heapsnapshot

Load the two heapsnapshot files in Chrome DevTools and choose the Comparison view:

50,000 leakEventCallback functions were added. Click any of them to see more details in Retainers, such as the GC path and the file where the function is defined.

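We have covered heapdump and memwatch-next, but in practice they are not that convenient. We cannot keep watching the server and trigger a dump by hand once memory keeps growing past some threshold; in most cases, by the time a problem is noticed, the scene is already gone. So we may need cpu-memory-monitor. As the name suggests, this module monitors CPU and memory usage and, according to a configured policy, can automatically dump CPU profiles (cpuprofile) and heap snapshots (heapsnapshot).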

4 Using cpu-memory-monitor

Let's first see how to use cpu-memory-monitor. It is simple: just add the following code to the entry file of the process:

require('cpu-memory-monitor')({
  cpu: {
    interval: 1000,
    duration: 30000,
    threshold: 60,
    profileDir: '/tmp',
    counter: 3,
    limiter: [5, 'hour']
  }
})

This code checks CPU usage every 1000 ms (interval). If CPU usage is above 60% (threshold) 3 times in a row (counter), it records a 30000 ms (duration) CPU profile to cpu-${process.pid}-${Date.now()}.cpuprofile in the /tmp directory (profileDir), at most 5 times (limiter[0]) per hour (limiter[1]).
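The policy just described can be sketched in a few lines. This is a hypothetical reimplementation for illustration, not the package's code: createTrigger fires after counter consecutive samples above threshold, at most limit times per sliding window, and createCpuSampler measures CPU usage with the built-in process.cpuUsage() instead of the pidusage module the package uses.

```javascript
// Fires after `counter` consecutive samples above `threshold`,
// at most `limit` times per sliding window of `windowMs`.
function createTrigger({ threshold, counter, limit, windowMs, now = Date.now }) {
  let streak = 0
  let fired = [] // timestamps of recent dumps

  return function check(value) {
    if (value <= threshold) {
      streak = 0 // one normal sample breaks the streak
      return false
    }
    streak += 1
    if (streak < counter) return false
    const t = now()
    fired = fired.filter((ts) => t - ts < windowMs)
    if (fired.length >= limit) return false // over the rate-limit budget
    fired.push(t)
    streak = 0
    return true
  }
}

// CPU usage of this process since the previous call, in percent of one core
// (may exceed 100 because V8 and libuv use several threads)
function createCpuSampler() {
  let lastUsage = process.cpuUsage()
  let lastTime = process.hrtime.bigint()
  return function sample() {
    const usage = process.cpuUsage()
    const time = process.hrtime.bigint()
    const busyMicros = (usage.user - lastUsage.user) + (usage.system - lastUsage.system)
    const elapsedMicros = Number(time - lastTime) / 1000
    lastUsage = usage
    lastTime = time
    return elapsedMicros > 0 ? (busyMicros / elapsedMicros) * 100 : 0
  }
}

// interval: 1000, threshold: 60, counter: 3, limiter: [5, 'hour']
const shouldDump = createTrigger({ threshold: 60, counter: 3, limit: 5, windowMs: 60 * 60 * 1000 })
const sampleCpu = createCpuSampler()
const timer = setInterval(() => {
  if (shouldDump(sampleCpu())) console.log('would start a 30000 ms CPU profile here')
}, 1000)
timer.unref() // monitoring alone should not keep the process alive
```

Injecting now makes the limiter easy to test with a fake clock; when the limiter refuses, the streak is kept, so the next high sample retries once the window has room again.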

That is the policy for dumping CPU usage automatically. The memory policy works the same way:

require('cpu-memory-monitor')({
  memory: {
    interval: 1000,
    threshold: '1.2gb',
    profileDir: '/tmp',
    counter: 3,
    limiter: [3, 'hour']
  }
})

This code checks memory usage every 1000 ms (interval). If memory is above 1.2gb (threshold) 3 times in a row (counter), it dumps memory once to memory-${process.pid}-${Date.now()}.heapsnapshot in the /tmp directory (profileDir), at most 3 times (limiter[0]) per hour (limiter[1]).
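The memory threshold is given as a string such as '1.2gb'. A minimal sketch of turning it into bytes and checking it; parseSize and isOverThreshold are hypothetical helpers, 1 kb is assumed to be 1024 bytes, and comparing against RSS is an assumption, since the text does not say which memory metric the package uses:

```javascript
// Convert sizes like '1.2gb' to bytes (1 kb = 1024 b)
const UNITS = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3 }

function parseSize(text) {
  const m = /^\s*(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)\s*$/i.exec(String(text))
  if (!m) throw new Error(`invalid size: ${text}`)
  return Math.floor(Number(m[1]) * UNITS[m[2].toLowerCase()])
}

// Assumption: compare resident set size (RSS) with the threshold
function isOverThreshold(threshold, usage = process.memoryUsage()) {
  return usage.rss > parseSize(threshold)
}

console.log(parseSize('1.2gb')) // 1288490188
console.log(isOverThreshold('1.2gb')) // false unless this process's RSS exceeds ~1.2 GB
```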

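Note: the memory config has no duration parameter, because a memory dump captures a single moment rather than a period of time.

A sharp reader will ask: can the cpu and memory configs be used together? For example: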

require('cpu-memory-monitor')({
  cpu: {
    interval: 1000,
    duration: 30000,
    threshold: 60,
    ...
  },
  memory: {
    interval: 10000,
    threshold: '1.2gb',
    ...
  }
})
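
The answer: yes, but don't. Doing so can lead to the following chain:

Memory is high and reaches the threshold -> Memory Dump/GC is triggered -> CPU usage rises and reaches its threshold -> CPU Dump is triggered -> more and more requests pile up (for example, many SQL queries pile up in memory) -> Memory Dump is triggered -> cascading failure.

Usually, using just one of them is enough.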

4.1 Source code walkthrough

The source of cpu-memory-monitor is only a hundred or so lines. The overall logic is:

const processing = {
  cpu: false,
  memory: false
}
const counter = {
  cpu: 0,
  memory: 0
}
function dumpCpu(cpuProfileDir, cpuDuration) { ... }
function dumpMemory(memProfileDir) { ... }
module.exports = function cpuMemoryMonitor(options = {}) {
  ...
  if (options.cpu) {
    const cpuTimer = setInterval(() => {
      if (processing.cpu) {
        return
      }
      pusage.stat(process.pid, (err, stat) => {
        if (err) {
          clearInterval(cpuTimer)
          return
        }
        if (stat.cpu > cpuThreshold) {
          counter.cpu += 1
          if (counter.cpu >= cpuCounter) {
            // Rate-limit CPU dumps (limiter: [5, 'hour'])
            cpuLimiter.removeTokens(1, (limiterErr, remaining) => {
              if (limiterErr) {
                return
              }
              if (remaining > -1) {
                dumpCpu(cpuProfileDir, cpuDuration)
                counter.cpu = 0
              }
            })
          }
        } else {
          // A sample below the threshold breaks the streak
          counter.cpu = 0
        }
      })
    }, cpuInterval)
  }
  if (options.memory) {
    ...
    memwatch.on('leak', () => {
      dumpMemory(...)
    })
  }
}
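
As you can see, cpu-memory-monitor uses nothing new; it is simply a combination of v8-profiler, heapdump and memwatch-next, all covered earlier.

A few points are worth noting:

CPU or memory monitoring is enabled only when the corresponding cpu or memory config is passed in.
When a memory config is passed in, memwatch-next is also used to listen for the leak event, which dumps memory as well, to a file named leak-memory-${process.pid}-${Date.now()}.heapsnapshot.
heapdump is required at the top of the module, so even without a memory config you can trigger a memory dump manually with kill -USR2 <PID>.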