Redis BGSAVE fails to fork for lack of memory, leaving the target Redis instance inaccessible

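During the Mid-Autumn Festival I was out happily drinking coffee and playing on my laptop when a Redis alert suddenly fired. The error surfaced through Sentry on the application side: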

MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, 
because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option).
Please check the Redis logs for details about the RDB error.

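So I happily got to work on it. It looked like a failure during RDB snapshot persistence, so I logged on to the Redis machine and checked the logs to pin down the exact problem: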

420:M 14 Sep 15:56:27.067 # Can't save in background: fork: Cannot allocate memory
420:M 14 Sep 15:56:33.071 * 10000 changes in 60 seconds. Saving...
420:M 14 Sep 15:56:33.072 # Can't save in background: fork: Cannot allocate memory
420:M 14 Sep 15:56:39.079 * 10000 changes in 60 seconds. Saving...
420:M 14 Sep 15:56:39.080 # Can't save in background: fork: Cannot allocate memory
420:M 14 Sep 15:56:45.083 * 10000 changes in 60 seconds. Saving...
420:M 14 Sep 15:56:45.083 # Can't save in background: fork: Cannot allocate memory
420:M 14 Sep 15:56:51.094 * 10000 changes in 60 seconds. Saving...
420:M 14 Sep 15:56:51.095 # Can't save in background: fork: Cannot allocate memory
420:M 14 Sep 15:56:57.002 * 10000 changes in 60 seconds. Saving...

It is clear that the fork attempt failed because there was not enough memory, and the Linux kernel refused to let it through.
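
To see how long the failure has been going on, the log can be scanned for the fork error. The following is a minimal sketch, assuming the log line layout shown above; the function name `fork_failures` and the `sample` text are illustrative, not part of Redis.

```python
def fork_failures(log_text):
    """Return the timestamps of log lines that report a failed background-save fork."""
    marker = "Can't save in background: fork:"
    stamps = []
    for line in log_text.splitlines():
        if marker in line:
            # Assumed layout: "<pid>:<role> <day> <month> <time> <level> <message>"
            fields = line.split()
            stamps.append(" ".join(fields[1:4]))
    return stamps


# Sample taken from the log excerpt above.
sample = """\
420:M 14 Sep 15:56:27.067 # Can't save in background: fork: Cannot allocate memory
420:M 14 Sep 15:56:33.071 * 10000 changes in 60 seconds. Saving...
420:M 14 Sep 15:56:33.072 # Can't save in background: fork: Cannot allocate memory
"""

for stamp in fork_failures(sample):
    print("fork failed at", stamp)
```

The first and last timestamps give a rough window during which no snapshot reached the disk.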

There are two points here that I think deserve attention. The first is that Redis, in its default configuration, enables this option:

stop-writes-on-bgsave-error yes

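In other words, if BGSAVE fails to store the snapshot, Redis blocks further writes. If this option is set to no, Redis does not immediately stop accepting writes even when BGSAVE fails to write the snapshot to disk.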
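
The rule can be sketched as a small check over the text returned by INFO persistence, which contains a `rdb_last_bgsave_status` field. This is a simplified model, not the exact Redis logic: the real check also depends on conditions such as save points being configured. The function names and the `sample_info` text are assumptions made for illustration.

```python
def last_bgsave_failed(info_text):
    """Return True if INFO persistence text reports a failed last BGSAVE."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "rdb_last_bgsave_status":
            return value.strip() != "ok"
    return False  # field not present: treat as "no known failure"


def writes_refused(info_text, stop_writes_on_bgsave_error="yes"):
    """Simplified model: writes are refused only if the option is on and the last save failed."""
    return stop_writes_on_bgsave_error == "yes" and last_bgsave_failed(info_text)


# Illustrative INFO persistence excerpt (values are made up).
sample_info = "# Persistence\r\nloading:0\r\nrdb_bgsave_in_progress:0\r\nrdb_last_bgsave_status:err\r\n"

print(writes_refused(sample_info, "yes"))
print(writes_refused(sample_info, "no"))
```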

But being unable to BGSAVE the data to disk is still a hazard: if the machine restarts, the data is gone. So I looked for a hotfix that would repair the problem.

In the end, the Linux parameter vm.overcommit_memory solves the problem. Its default value is 0, and it can be set to one of three values.

At this point memory is short. What the operating system does next is decided by our protagonist, the overcommit_memory parameter (/proc/sys/vm/overcommit_memory):

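vm.overcommit_memory = 0   Heuristic policy
Compares the size of the virtual memory being requested with the system's currently free physical memory plus swap, and decides whether to allow it. When allocating virtual address space for a process, the system checks whether the requested size exceeds the remaining memory; if it does, the allocation fails. So if the process itself occupies a large virtual address space, or little memory remains, calls such as fork and malloc may fail.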

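vm.overcommit_memory = 1   Always overcommit
Allows the request outright. The system places no limit at all on virtual address space allocation. This avoids the possible fork failure, but because malloc first allocates virtual address space and only later obtains real physical memory by faulting into the kernel, when memory is short this effectively hides the system's memory state from the application. malloc always succeeds, and once memory actually runs out the OOM killer starts killing processes, with consequences the application cannot predict.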

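vm.overcommit_memory = 2   Never overcommit
The upper limit on virtual address space is fixed from the system's memory state. Because a process's virtual address space is often far larger than the physical memory it really uses, once memory usage climbs, dynamically created processes (which need to copy the parent's address space) can easily fail to start. If the workload does little dynamic allocation or child-process creation, the impact is small; otherwise it can be significant. In this mode the system will not commit more memory than CommitLimit (reported in /proc/meminfo); once that much has been used up, any further attempt to allocate memory returns an error, which usually means no new program can be started.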
————————————————
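Copyright notice: this is an original article by CSDN blogger 「朱清震」, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/zqz_zqz/article/details/53384854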
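
The three policies can be put side by side in a toy model. This is only a sketch of the description quoted above, not the kernel's real accounting: the mode 0 branch uses the simplified "free RAM plus free swap" comparison from the text, and the CommitLimit formula ignores huge pages. The function names and the `sample_meminfo` numbers are made up for illustration; the default `overcommit_ratio` of 50 matches the kernel default.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def commit_limit_kb(meminfo, overcommit_ratio=50):
    """CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100 (huge pages ignored)."""
    return meminfo["SwapTotal"] + meminfo["MemTotal"] * overcommit_ratio // 100


def fork_allowed(mode, request_kb, meminfo, overcommit_ratio=50):
    """Toy decision for a request of request_kb under each overcommit_memory mode."""
    if mode == 1:
        return True  # always allowed
    if mode == 2:
        limit = commit_limit_kb(meminfo, overcommit_ratio)
        return meminfo["Committed_AS"] + request_kb <= limit
    # mode 0: simplified heuristic taken from the description above
    return request_kb <= meminfo["MemFree"] + meminfo["SwapFree"]


# Made-up machine: 8 GiB RAM, 2 GiB free, no swap, 4 GiB already committed.
sample_meminfo = parse_meminfo("""\
MemTotal:        8388608 kB
MemFree:         2097152 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Committed_AS:    4194304 kB
""")

redis_size_kb = 3 * 1024 * 1024  # a 3 GiB Redis process asking to fork
for mode in (0, 1, 2):
    print(mode, fork_allowed(mode, redis_size_kb, sample_meminfo))
```

With these numbers only mode 1 lets the 3 GiB fork through, which mirrors the situation in the logs above.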

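So for BGSAVE, Redis tries to fork the main process, and the kernel refuses the memory request because there is not enough memory. As a hotfix I set vm.overcommit_memory to 1 so that the request is allowed straight through.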

Add this line to /etc/sysctl.conf:
vm.overcommit_memory=1

Then reload the settings:
sysctl -p

Once the setting took effect, the logs showed the background save succeeding.
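
To double-check that the persistent setting and the running kernel agree, something like the following can be used. It is a minimal sketch: the function names are hypothetical, and the parser only handles plain `key = value` lines and `#` or `;` comments in sysctl.conf-style text.

```python
def configured_overcommit(conf_text):
    """Return the last vm.overcommit_memory value set in sysctl.conf-style text, or None."""
    mode = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "vm.overcommit_memory":
            mode = int(value.strip())  # later entries override earlier ones
    return mode


def runtime_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read the value the running kernel is using (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


sample_conf = """\
# kernel tuning
vm.swappiness = 10
vm.overcommit_memory=1
"""
print(configured_overcommit(sample_conf))
```

On the Redis host, comparing `configured_overcommit(open("/etc/sysctl.conf").read())` with `runtime_overcommit()` shows whether sysctl -p has actually been applied.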

 

I also found that the official FAQ describes a similar problem:

Background saving fails with a fork() error under Linux even if I have a lot of free RAM!

Short answer: echo 1 > /proc/sys/vm/overcommit_memory :)

And now the long one:

Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.

Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.

A good source to understand how Linux Virtual Memory works and other alternatives for overcommit_memory and overcommit_ratio is this classic from Red Hat Magazine, "Understanding Virtual Memory". Beware, this article had 1 and 2 configuration values for overcommit_memory reversed: refer to the proc(5) man page for the right meaning of the available values.
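
The snapshot behaviour the FAQ describes can be observed with a small experiment. This is only an illustration of fork semantics on a Unix system, not how Redis is implemented; the function name `snapshot_via_fork` and the dictionary contents are made up. The parent changes the data after forking, and the child still reports the value it had at fork time.

```python
import os


def snapshot_via_fork(data):
    """Fork, mutate `data` in the parent, and return the value the child still sees."""
    ready_r, ready_w = os.pipe()    # parent -> child: "I have modified the data"
    result_r, result_w = os.pipe()  # child -> parent: the child's view of the data
    pid = os.fork()
    if pid == 0:
        # Child: wait until the parent has written, then report our own copy.
        try:
            os.close(ready_w)
            os.close(result_r)
            os.read(ready_r, 1)
            os.write(result_w, data["key"].encode())
        finally:
            os._exit(0)
    # Parent: modify the data after the fork, then let the child continue.
    os.close(ready_r)
    os.close(result_w)
    data["key"] = "changed-after-fork"
    os.write(ready_w, b"x")
    os.close(ready_w)
    seen_by_child = os.read(result_r, 1024).decode()
    os.close(result_r)
    os.waitpid(pid, 0)
    return seen_by_child


dataset = {"key": "original"}
print("child saw:", snapshot_via_fork(dataset))
print("parent has:", dataset["key"])
```

Only the pages the parent touches after the fork need to be duplicated, which is why the child rarely costs as much memory as the kernel has to assume in the worst case.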

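After the hotfix, we cleaned up some large keys that had gone unreleased for a long time, which brought memory back down to a fairly low level, and things have been stable since. This time I did not blindly restart anything; instead I quickly worked through the problem with a clear line of reasoning. I feel my approach to solving problems has matured a little.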

Reference:

https://zhuanlan.zhihu.com/p/36872365    How fork works and how it is implemented

https://stackoverflow.com/questions/11752544/redis-bgsave-failed-because-fork-cannot-allocate-memory    redis bgsave failed because fork Cannot allocate memory

https://www.freebsd.org/doc/zh_CN/books/handbook/configtuning-sysctl.html    12.11. Tuning with sysctl

https://blog.csdn.net/zqz_zqz/article/details/53384854    redis Can't save in background: fork: Cannot allocate memory: fix and underlying principle

https://redis.io/topics/faq    Official FAQ
