A pitfall story: OpenResty generating massive numbers of TIME_WAIT connections

    The background: in our CDN purge system, agents send large numbers of GET / PURGE requests to OpenResty, which then forwards them to the backend ATS. During load testing, OpenResty's performance degraded, and investigation showed a huge number of sockets stuck in TIME_WAIT.

This article is still being updated and revised; please head over to the original at http://dmwan.cc

    First, our OpenResty configuration looked like this:

lua_package_path "/usr/local/openresty/nginx/sp_lua/?.lua;/usr/local/openresty/nginx/sp_lua/?.sp;?.lua;/usr/local/openresty/lualib/?.lua";
lua_code_cache on;
lua_shared_dict refresh_db 16m;
  

upstream ats {
    server 127.0.0.1:8080;
}

server {
    listen       80;
    server_name  localhost default.com;

    location / {
        set $module_conf "/usr/local/openresty/nginx/conf/lua_modules_conf/module_conf";
        include "lua_include_conf/include_location.conf";
        proxy_set_header Host $host;
        proxy_pass http://ats;
    }

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   html;
    }
}

    With OpenResty acting as the proxy and forwarding requests to ATS on port 8080, my assumption was that the upstream connections would naturally be kept alive. As it turned out, they were not.

    ss -s showed the TIME_WAIT count reaching 50,000 within a short period, which clearly indicated something was wrong.

    Checking the port status:

netstat -an |grep 8080

    The output looked like this:

tcp        0      0 127.0.0.1:58009             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:57931             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:60167             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:58079             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:58149             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:60375             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:57657             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:59569             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:56999             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:63087             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:61483             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:61461             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:62133             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:63053             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:62125             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:61197             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:63139             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:57719             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:60967             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:58575             127.0.0.1:8080              TIME_WAIT   -
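
    For a quick count instead of eyeballing the full listing, something like the following works (a minimal sketch using the standard ss and netstat tools; 8080 is the ATS port from the config above):

# overall socket summary, including the total TIME_WAIT count
ss -s

# count only the TIME_WAIT sockets going to the ATS port
netstat -an | grep ':8080 ' | grep -c TIME_WAIT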

    Since TIME_WAIT ends up on the side that closes the connection first, this tells us that OpenResty itself was actively closing the connections.

    Time to capture some packets and pin it down:

ngrep -W byline port 8080 -d lo -X

    The captured traffic looked like this:

##
T 127.0.0.1:31278 -> 127.0.0.1:8080 [AP]
GET /test.html HTTP/1.0.
Host: www.default.com.
Connection: close.
User-Agent: refresh_fetcher.
Accept-Encoding: gzip.
.

##
T 127.0.0.1:8080 -> 127.0.0.1:31278 [AP]
HTTP/1.0 200 OK.
Server: ATS/6.2.3.
Date: Tue, 27 Mar 2018 13:34:37 GMT.
Content-Type: text/html.
Content-Length: 4.
Last-Modified: Tue, 05 Sep 2017 05:11:20 GMT.
ETag: "59ae31f8-4".
Expires: Tue, 27 Mar 2018 14:34:37 GMT.
Cache-Control: max-age=3600.
Accept-Ranges: bytes.
Age: 489.
Ws-Cache-Key: http://www.xxx.com/test.html.
Ws-Milestone: UA-BEGIN=1522158166703713, UA-FIRST-READ=1522158166703713, UA-READ-HEADER-DONE=1522158166703713, UA-BEGIN-WRITE=1522158166703758, CACHE-OPEN-READ-BEGIN=1522158166703733, CACHE-OPEN-READ-END=1522158166703733, PLUGIN-ACTIVE=1522158166703715, PLUGIN-TOTAL=152215816670371.
Ws-Hit-Miss-Code: TCP_MEM_HIT.
Ws-Is-Hit: 1.

    Two important points stand out in this capture: first, the request goes out as HTTP/1.0, meaning the proxy side is deliberately forwarding over short-lived HTTP/1.0 connections; second, the Connection header is set to close.

    Checking the official OpenResty/nginx documentation, I found the relevant description:

The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed.

It should be particularly noted that the keepalive directive does not limit the total number of connections to upstream servers that an nginx worker process can open. The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.
Example configuration of memcached upstream with keepalive connections:

upstream memcached_backend {
    server 127.0.0.1:11211;
    server 10.0.0.2:11211;

    keepalive 32;
}

server {
    ...

    location /memcached/ {
        set $memcached_key $uri;
        memcached_pass memcached_backend;
    }

}
For HTTP, the proxy_http_version directive should be set to “1.1” and the “Connection” header field should be cleared:

upstream http_backend {
    server 127.0.0.1:8080;

    keepalive 16;
}

server {
    ...

    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}
Alternatively, HTTP/1.0 persistent connections can be used by passing the “Connection: Keep-Alive” header field to an upstream server, though this method is not recommended.

    In short, three settings are needed: the keepalive connection count on the upstream, proxy_http_version, and the Connection header. After making these changes, the problem was resolved.
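
    Applied to the configuration above, the fix looks roughly like this (a sketch only; keepalive 32 is an arbitrary pool size and should be tuned to the actual load, as the documentation quoted above advises):

upstream ats {
    server 127.0.0.1:8080;
    keepalive 32;                        # idle keep-alive connections cached per worker
}

server {
    listen       80;
    server_name  localhost default.com;

    location / {
        set $module_conf "/usr/local/openresty/nginx/conf/lua_modules_conf/module_conf";
        include "lua_include_conf/include_location.conf";
        proxy_set_header Host $host;
        proxy_http_version 1.1;          # upstream requests are HTTP/1.0 by default
        proxy_set_header Connection "";  # clear the "Connection: close" header
        proxy_pass http://ats;
    }
}

    With this in place the proxied requests go out as HTTP/1.1 without the Connection: close header, so the upstream connections are reused instead of being torn down after every request.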

    Summary: I initially assumed that a project with performance as good as OpenResty's would keep upstream connections alive as a matter of course. That was wishful thinking.
