Troubleshooting Nginx "104: Connection reset by peer" Errors

Symptoms

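1. The error log was about as large as the access log (199M vs 191M), so normal entries and error entries were being written at a ratio of almost 1:1.
2. Every entry in the error log was the same error: readv() failed (104: Connection reset by peer) while reading upstream.
3. The access log showed no matching wave of other HTTP error status codes; the vast majority of requests returned 200.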

[root@VM_0_22_centos logs]# ls -lh
total 389M
-rw-r--r-- 1 work work 191M Oct 30 17:30 ttt.minminmsn.com_access.log
-rw-r--r-- 1 work work 199M Oct 30 17:30 ttt.minminmsn.com_error.log
[root@VM_0_22_centos logs]# tail -n 1  ttt.minminmsn.com_error.log
2020/10/30 17:30:27 [error] 14063#0: *807476828 readv() failed (104: Connection reset by peer) while reading upstream, client: 117.61.242.104, server: ttt.minminmsn.com, request: "POST /yycp-launcherSnapshot/launcherSnapshot/querySnapshotSync HTTP/1.1", upstream: "http://192.168.88.31:8081/ttt", host: "ttt.minminmsn.com"
[root@VM_0_22_centos logs]# cat ttt.minminmsn.com_access.log |awk '{print $9}'|sort |uniq -dc
1081274 200
      6 304
    125 400
  27482 404
    145 429
    106 499
      8 500
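The same tally can be sketched in Python, together with a count of the upstream read errors in the error log. This is a minimal sketch assuming nginx's default "combined" access log format, where the status code is the 9th whitespace-separated field (awk's $9), and error lines shaped like the one above; the function names and sample lines are made up for illustration. One difference worth knowing: uniq -dc only prints values that occur more than once, so a status code seen exactly once would be silently dropped, while Counter keeps it.

```python
from collections import Counter
import re

# Matches the errno and message nginx logs when reading from an upstream fails,
# e.g. "readv() failed (104: Connection reset by peer) while reading upstream".
UPSTREAM_READ_ERROR = re.compile(r"\((\d+): ([^)]+)\) while reading upstream")


def status_counts(access_lines):
    """Count HTTP status codes; field 9 in the combined format, like awk '{print $9}'."""
    counts = Counter()
    for line in access_lines:
        fields = line.split()
        if len(fields) >= 9:
            counts[fields[8]] += 1  # awk's $9 is index 8 here
    return counts


def upstream_read_errors(error_lines):
    """Count 'while reading upstream' errors by (errno, message)."""
    counts = Counter()
    for line in error_lines:
        match = UPSTREAM_READ_ERROR.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice read the two log files line by line.
    access = [
        '117.61.242.104 - - [30/Oct/2020:17:30:27 +0800] "POST /q HTTP/1.1" 200 35 "-" "app"',
        '117.61.242.104 - - [30/Oct/2020:17:30:28 +0800] "POST /q HTTP/1.1" 200 35 "-" "app"',
        '117.61.242.104 - - [30/Oct/2020:17:30:29 +0800] "GET /x HTTP/1.1" 404 0 "-" "app"',
    ]
    errors = [
        "2020/10/30 17:30:27 [error] 14063#0: *807476828 readv() failed "
        "(104: Connection reset by peer) while reading upstream, client: 117.61.242.104",
    ]
    for code, n in status_counts(access).most_common():  # like sort -rn
        print(f"{n:>8} {code}")
    print(upstream_read_errors(errors))
```

For the real files, pass an open file object (for example open("ttt.minminmsn.com_access.log")) instead of the sample lists; both functions only iterate over lines.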

Analysis

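1. After checking the business scenario with the people responsible for the service, we found that nearly all client requests were POST. At first I suspected that large file uploads were timing out, but the developers confirmed that POST was used only for security reasons and that there were no large uploads.
2. "Connection reset by peer" while reading upstream means the backend side reset the connection. We also found a large number of connections in TIME-WAIT: at high QPS each request is handled quickly and its connection is closed right away, so at any moment many connections are still in the closing stage.
3. As a reverse proxy, nginx is both a client (towards the backends) and a server (towards users). By default it does not use persistent (keep-alive) connections to the backends, so enabling them should improve performance considerably.
4. Keep-alive to the backends requires the keepalive directive in the upstream block, together with proxy_http_version 1.1 and an empty Connection header in the location block. The official description of keepalive is quoted below. I did not fully understand it at first, but one of the articles in the references explains it clearly: the directive activates a cache of connections to the upstream servers, which is what allows those connections to be reused.
5. The keepalive value is related to QPS and does not need to be very large to begin with; if 5xx errors show up in the access log, tune it to the actual traffic to get the best result.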
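Point 2 can be made more concrete with Little's law: the number of sockets in TIME-WAIT is roughly the rate at which connections are actively closed on this host multiplied by how long each one stays in that state. On Linux that time is fixed at 60 seconds (TCP_TIMEWAIT_LEN in the kernel). The sketch below assumes that 60 s value and a steady close rate; the function names are hypothetical, and the result is an estimate, not a measurement.

```python
# Linux holds a closed TCP socket in TIME-WAIT for TCP_TIMEWAIT_LEN = 60 s.
TIME_WAIT_SECONDS = 60


def close_rate(time_wait_sockets, time_wait_seconds=TIME_WAIT_SECONDS):
    """Little's law L = lambda * W, solved for lambda (closes per second)."""
    return time_wait_sockets / time_wait_seconds


def steady_state_time_wait(closes_per_second, time_wait_seconds=TIME_WAIT_SECONDS):
    """Expected TIME-WAIT count for a steady rate of actively closed connections."""
    return closes_per_second * time_wait_seconds


if __name__ == "__main__":
    for label, count in [("before keep-alive", 5045), ("after keep-alive", 52)]:
        print(f"{label}: {count} TIME-WAIT -> about {close_rate(count):.1f} closes/s")
```

With the counts from the ss output in the Verification section, 5045 / 60 ≈ 84 connections closed per second before the change and 52 / 60 ≈ 0.9 after it. Note that ss -an counts all sockets on the host, both client-facing and upstream.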

Here is the official description of the keepalive directive:
Syntax: keepalive connections;
Default: —
Context: upstream
This directive appeared in version 1.1.4.

Activates the cache for connections to upstream servers.

The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed.

It should be particularly noted that the keepalive directive does not limit the total number of connections to upstream servers that an nginx worker process can open. The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.
When using load balancing methods other than the default round-robin method, it is necessary to activate them before the keepalive directive.
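To make the quoted semantics concrete, here is a toy Python model of one worker's idle-connection cache for a single upstream server. It is a simplified illustration, not nginx's actual implementation: connections are plain integers, max_idle stands in for the connections parameter, and the class and function names are hypothetical. It shows the two points the documentation stresses: only idle connections are capped (busy ones are unlimited), and when the cap is exceeded the least recently used idle connection is closed.

```python
from collections import OrderedDict
import itertools


class IdleConnectionCache:
    """Toy model of one worker's upstream keepalive cache (hypothetical names)."""

    def __init__(self, max_idle):
        self.max_idle = max_idle    # plays the role of "keepalive <connections>;"
        self.idle = OrderedDict()   # idle conn id -> None, least recently used first
        self.busy = set()           # busy connections are not limited at all
        self.opened = 0             # new TCP connections opened
        self.closed = 0             # idle connections closed by eviction
        self._ids = itertools.count(1)

    def acquire(self):
        """Reuse the most recently idled connection, else open a new one."""
        if self.idle:
            conn, _ = self.idle.popitem(last=True)
        else:
            conn = next(self._ids)
            self.opened += 1
        self.busy.add(conn)
        return conn

    def release(self, conn):
        """Put a finished connection back; close the LRU idle one if over the cap."""
        self.busy.remove(conn)
        self.idle[conn] = None
        if len(self.idle) > self.max_idle:
            self.idle.popitem(last=False)  # least recently used
            self.closed += 1


def run_bursts(cache, burst_sizes):
    """For each burst, start n concurrent requests, then finish them all."""
    for n in burst_sizes:
        conns = [cache.acquire() for _ in range(n)]
        for conn in conns:
            cache.release(conn)
    return cache


if __name__ == "__main__":
    no_cache = run_bursts(IdleConnectionCache(max_idle=0), [1] * 100)
    cached = run_bursts(IdleConnectionCache(max_idle=4), [1] * 100)
    print("no cache:    ", no_cache.opened, "connections opened for 100 requests")
    print("keepalive 4: ", cached.opened, "connections opened for 100 requests")
```

With max_idle=0 (no cache) every request opens and then closes its own connection, which is the pattern that leaves many sockets in TIME-WAIT. With a small cache, sequential traffic reuses a single connection, while a burst of 10 concurrent requests still opens 10 connections and only 4 stay idle afterwards.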

Solution

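1. Change the nginx configuration to enable keep-alive connections to the backends, backed by the upstream connection cache.
2. Restart nginx.
The key configuration is as follows: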

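upstream gateway {
    server 192.168.88.31:8081;
    server 192.168.88.44:8081;
    server 192.168.88.115:8081;
    server 192.168.88.80:8081;
    # added: keep up to 100 idle connections per worker process
    keepalive 100;
}

# location inside the relevant server { } block
location / {
    proxy_pass http://gateway;
    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    # added: timeouts in seconds; the nginx docs note that
    # proxy_connect_timeout usually cannot exceed 75 seconds
    proxy_connect_timeout      120;
    proxy_send_timeout         300;
    proxy_read_timeout         300;
    # added: required for upstream keep-alive; HTTP/1.1 and an empty
    # Connection header replace the default HTTP/1.0 with "Connection: close"
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}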
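Both proxy_http_version 1.1 and the empty Connection header matter because, by default, nginx talks to upstreams with HTTP/1.0 and sends "Connection: close", so the backend closes the connection after every response. The following self-contained Python sketch (standard library only; an analogy, not nginx itself) runs a local HTTP/1.1 server that counts accepted TCP connections, then sends five requests through one http.client.HTTPConnection, first with "Connection: close" and then without it. The handler and function names are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Toy backend; socketserver creates one handler instance per TCP connection."""

    protocol_version = "HTTP/1.1"  # allow persistent connections
    connections = 0                # TCP connections accepted so far

    def setup(self):
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if self.headers.get("Connection", "").lower() == "close":
            # Echo the close: the client then reconnects for its next request,
            # and send_header() makes this handler close the socket afterwards.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def connections_used(port, requests, headers):
    """Send `requests` GETs through one HTTPConnection; return TCP connections used."""
    CountingHandler.connections = 0
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for _ in range(requests):
        conn.request("GET", "/", headers=headers)
        conn.getresponse().read()
    conn.close()
    return CountingHandler.connections


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    # Analogous to nginx's default: every request carries "Connection: close".
    print("Connection: close ->", connections_used(port, 5, {"Connection": "close"}))
    # Analogous to proxy_http_version 1.1 plus proxy_set_header Connection "".
    print("keep-alive        ->", connections_used(port, 5, {}))
    server.shutdown()
    server.server_close()
```

The first line should report one connection per request (5) and the second a single reused connection (1), which mirrors the drop in TIME-WAIT sockets after the change.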

Verification

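1. Check the error log
After the error log was emptied it has hardly grown: it is only 446 bytes and was last written at 18:10, while the access log kept growing until 18:50.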

[root@VM_0_22_centos logs]# ls -lh
total 389M
-rw-r--r-- 1 work work 389M Oct 30 18:50 ttt.minminmsn.com_access.log
-rw-r--r-- 1 work work  446 Oct 30 18:10 ttt.minminmsn.com_error.log

2. Check connection states
Before enabling keep-alive, most sockets were in TIME-WAIT:

[root@VM_0_22_centos logs]# ss -an |awk '{print $2}'|sort |uniq -dc |sort -rn
   5045 TIME-WAIT
    156 ESTAB
     62 UNCONN
     21 LISTEN

After enabling keep-alive, most sockets are ESTAB (established):

[root@VM_0_22_centos ~]# ss -an |awk '{print $2}'|sort |uniq -dc |sort -rn
    511 ESTAB
     62 UNCONN
     52 TIME-WAIT
     21 LISTEN
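The ss | awk | sort | uniq pipeline can also be reproduced in Python, for example to record the before/after comparison from a script. This is a minimal sketch assuming the default ss -an column layout (Netid, State, Recv-Q, Send-Q, ...), so the state is the second column; the function names are hypothetical. It parses captured text so it can run without ss; live_state_counts shows how it could be fed on a Linux host with iproute2 installed. Unlike uniq -dc, Counter does not hide states that occur only once, so the header row is skipped explicitly instead.

```python
from collections import Counter
import subprocess


def socket_state_counts(ss_output):
    """Count socket states from `ss -an` output (column 2, like awk '{print $2}')."""
    counts = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


def live_state_counts():
    """Run `ss -an` on this host (Linux, iproute2) and count socket states."""
    result = subprocess.run(["ss", "-an"], capture_output=True, text=True, check=True)
    return socket_state_counts(result.stdout)


if __name__ == "__main__":
    # Made-up sample in ss -an layout; the local address is hypothetical.
    sample = """Netid State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
tcp   ESTAB      0      0      10.0.0.22:80         117.61.242.104:51234
tcp   ESTAB      0      0      10.0.0.22:41000      192.168.88.31:8081
tcp   TIME-WAIT  0      0      10.0.0.22:41002      192.168.88.44:8081
tcp   LISTEN     0      128    *:80                 *:*
"""
    for state, n in socket_state_counts(sample).most_common():  # like sort -rn
        print(f"{n:>8} {state}")
```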

References

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive
https://www.cnblogs.com/sunsky303/p/10648861.html
http://blog.51yip.com/apachenginx/2203.html
