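I had never paid much attention to this issue until it came up in recent job interviews. I interviewed at two companies and both of them asked about it, which pushed me to look into it properly. If you are not familiar with the CLOSE_WAIT state, start by looking at the TCP state transition diagram.
Closing a socket takes one of two forms: active close and passive close. An active close is initiated by the local host. A passive close happens when the local host detects that the remote host has initiated a close, responds to it, and thereby closes the whole connection. Extracting the close-related part of the state transitions gives the following diagram: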
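Causes
Using the diagram, let us work out when a connection is in the CLOSE_WAIT state.
During a passive close, the connection is in CLOSE_WAIT from the moment it has received the peer's FIN until it sends its own FIN.
Normally CLOSE_WAIT should last only a very short time, just like SYN_RCVD. In some special cases, however, a connection can stay in CLOSE_WAIT for a long time.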
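The passive-close path described above can be written down as a tiny state machine. This is only an illustrative sketch: the enum and function names are made up for this example, and it models the passive-close path only (the active-close states such as FIN_WAIT_1 are left out).

```c
/* Hypothetical names; a toy model of the passive-close path only. */
enum conn_state { ST_ESTABLISHED, ST_CLOSE_WAIT, ST_LAST_ACK, ST_CLOSED };
enum conn_event { EV_RECV_FIN, EV_APP_CLOSE, EV_RECV_ACK };

/* Return the next state; events that do not apply leave the state unchanged. */
enum conn_state passive_close_next(enum conn_state s, enum conn_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        /* The peer closed first; the kernel ACKs its FIN. */
        return e == EV_RECV_FIN ? ST_CLOSE_WAIT : s;
    case ST_CLOSE_WAIT:
        /* Only the application's close() makes our side send its FIN. */
        return e == EV_APP_CLOSE ? ST_LAST_ACK : s;
    case ST_LAST_ACK:
        /* The peer ACKs our FIN and the connection is gone. */
        return e == EV_RECV_ACK ? ST_CLOSED : s;
    default:
        return s;
    }
}
```

The point of the model is the middle case: nothing except the application closing the socket moves a connection out of CLOSE_WAIT.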
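The main reason for a large number of CLOSE_WAIT connections is that the peer has closed the socket while our side is busy reading or writing and never closes its end. The code has to check the socket: as soon as read returns 0, close the connection; if read returns a negative value, check errno, and if it is not EAGAIN, close the connection.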
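The rule above can be sketched as a small helper. The function name and its return convention are assumptions made for this example; besides EAGAIN it also treats EWOULDBLOCK and EINTR as transient, which goes slightly beyond what the text says.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: read once from fd and close it when the peer is gone.
 * Returns the number of bytes read (> 0), 0 if nothing is available yet
 * (try again later), or -1 if this call closed the connection.
 */
ssize_t read_or_close(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n > 0)
        return n;                   /* normal data */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;                   /* transient condition: keep the connection */

    close(fd);                      /* n == 0 (peer sent FIN) or a hard error */
    return -1;
}
```

A caller would drop fd from its bookkeeping as soon as the helper returns -1, since the descriptor is no longer valid.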
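Reference 4 describes producing CLOSE_WAIT connections by sending SYN-FIN packets. I have not tested this myself. Personally I think the protocol stack would drop such illegal packets; interested readers are welcome to try it and tell me the result ;-)
To illustrate the problem more clearly, let us write a test program. Note that this test program is deliberately flawed.
All we need is a situation in which the peer has closed the socket while we are still reading, or in which we simply never close the socket.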
server.c:
#include <stdio.h>
#include <string.h>
#include <strings.h>     /* bzero */
#include <ctype.h>       /* toupper */
#include <unistd.h>      /* read, write */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_ntop */

#define MAXLINE 80
#define SERV_PORT 8000

int main(void)
{
    struct sockaddr_in servaddr, cliaddr;
    socklen_t cliaddr_len;
    int listenfd, connfd;
    char buf[MAXLINE];
    char str[INET_ADDRSTRLEN];
    int i, n;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);

    int opt = 1;
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
    listen(listenfd, 20);

    printf("Accepting connections ...\n");
    while (1) {
        cliaddr_len = sizeof(cliaddr);
        connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
        //while (1)
        {
            n = read(connfd, buf, MAXLINE);
            if (n <= 0) {   /* peer closed, or read failed */
                printf("the other side has been closed.\n");
                break;
            }
            printf("received from %s at PORT %d\n",
                   inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
                   ntohs(cliaddr.sin_port));

            for (i = 0; i < n; i++)
                buf[i] = toupper((unsigned char)buf[i]);
            write(connfd, buf, n);
        }
        // The socket is deliberately not closed here; adding a sleep before close has the same effect
        //sleep(5);
        //close(connfd);
    }
    return 0;
}
client.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>     /* bzero */
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_pton */

#define MAXLINE 80
#define SERV_PORT 8000

int main(int argc, char *argv[])
{
    struct sockaddr_in servaddr;
    char buf[MAXLINE];
    int sockfd, n;
    char *str;

    if (argc != 2) {
        fputs("usage: ./client message\n", stderr);
        exit(1);
    }
    str = argv[1];

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
    servaddr.sin_port = htons(SERV_PORT);

    connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

    write(sockfd, str, strlen(str));

    n = read(sockfd, buf, MAXLINE);
    printf("Response from server:\n");
    fflush(stdout);   /* keep stdio output ordered before the raw write below */
    if (n > 0)        /* only echo what was actually received */
        write(STDOUT_FILENO, buf, n);
    write(STDOUT_FILENO, "\n", 1);

    /* The client closes first, so the server side becomes the passive closer */
    close(sockfd);
    return 0;
}
The result is as follows:
debian-wangyao:~$ ./client a
Response from server:
A
debian-wangyao:~$ ./client b
Response from server:
B
debian-wangyao:~$ ./client c
Response from server:
C
debian-wangyao:~$ netstat -antp | grep CLOSE_WAIT
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        1      0 127.0.0.1:8000          127.0.0.1:58309         CLOSE_WAIT  6979/server
tcp        1      0 127.0.0.1:8000          127.0.0.1:58308         CLOSE_WAIT  6979/server
tcp        1      0 127.0.0.1:8000          127.0.0.1:58307         CLOSE_WAIT  6979/server
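As an alternative to grepping the netstat output, the same count can be obtained programmatically. The sketch below assumes a Linux system, where /proc/net/tcp lists one socket per line and the fourth column ("st") holds the state in hexadecimal, with 08 meaning CLOSE_WAIT. The function names are made up for this example, and IPv6 sockets (/proc/net/tcp6) are not covered.

```c
#include <stdio.h>

#define STATE_CLOSE_WAIT 0x08   /* value of the "st" column for CLOSE_WAIT */

/* Return 1 if one line of /proc/net/tcp describes a CLOSE_WAIT socket. */
int line_is_close_wait(const char *line)
{
    unsigned int st;

    /* Columns: "sl:", local_address, rem_address, st (hex), ... */
    if (sscanf(line, "%*s %*s %*s %x", &st) != 1)
        return 0;               /* header line or malformed input */
    return st == STATE_CLOSE_WAIT;
}

/* Count CLOSE_WAIT sockets listed in the given file; -1 if it cannot be opened. */
int count_close_wait(const char *path)
{
    char line[512];
    int count = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        count += line_is_close_wait(line);
    fclose(f);
    return count;
}
```

Calling count_close_wait("/proc/net/tcp") while the flawed server is running should report the leaked connections, and the number keeps growing with every client run.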
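Solutions
The basic idea is to detect sockets that the peer has already closed, and then close them.
1. The code has to check the socket: as soon as read returns 0, close the connection; if read returns a negative value, check errno, and if it is not EAGAIN, close the connection as well. (Note: Figure 7.6 in Section 7.5 of UNP shows that select can detect that the peer has sent a FIN; combined with this rule, connections in CLOSE_WAIT can be handled.)
2. Give every socket a timestamp, last_update, and refresh it with the current time whenever data is successfully received or sent. Check all timestamps periodically; if the difference between a timestamp and the current time exceeds a certain threshold, close that socket.
3. Use a heartbeat thread that periodically sends a heartbeat packet in an agreed format on each socket. If an RST is received from the peer, the peer has already closed its socket, so we close ours too.
4. Set the SO_KEEPALIVE option and tune the kernel parameters.
The prerequisite is to enable the KEEPALIVE mechanism on the socket: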
// Enable KEEPALIVE on the socket connection
int iKeepAlive = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (void *)&iKeepAlive, sizeof(iKeepAlive));
tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
The number of seconds between TCP keep-alive probes.
tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
echo 120 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 2 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_probes
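To see what these three values mean together, the time until keep-alive gives up on an unresponsive peer can be estimated as the idle time plus one interval per unanswered probe. The helper below is only a rough sketch of that arithmetic, and its name is made up for this example.

```c
/*
 * Hypothetical helper: rough number of seconds before keep-alive gives up
 * on a peer that never answers, given the three tunables described above.
 */
long keepalive_detect_seconds(long keepalive_time, long keepalive_intvl,
                              long keepalive_probes)
{
    return keepalive_time + keepalive_intvl * keepalive_probes;
}
```

With the defaults this gives 7200 + 75 * 9 = 7875 seconds, which matches the "2 hours plus about 11 minutes" in the man page excerpt. With the values written above it is 120 + 2 * 1 = 122 seconds.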
Besides changing the kernel parameters, you can also set these values per socket with setsockopt; see man 7 socket and man 7 tcp.
// TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL are declared in <netinet/tcp.h>
int KeepAliveProbes = 1;
int KeepAliveIntvl = 2;
int KeepAliveTime = 120;
setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (void *)&KeepAliveProbes, sizeof(KeepAliveProbes));
setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (void *)&KeepAliveTime, sizeof(KeepAliveTime));
setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (void *)&KeepAliveIntvl, sizeof(KeepAliveIntvl));
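The timestamp approach (solution 2 in the list above) could look roughly like the sketch below. The struct layout, the function names, and the convention that fd < 0 marks a free slot are all assumptions made for this example; only the last_update field name comes from the text.

```c
#include <time.h>
#include <unistd.h>

/* Hypothetical per-connection record; fd < 0 marks a free slot. */
struct conn {
    int    fd;
    time_t last_update;   /* refreshed after every successful read or write */
};

/* Record activity on a connection. */
void conn_touch(struct conn *c, time_t now)
{
    c->last_update = now;
}

/*
 * Close every connection that has been idle for more than max_idle seconds.
 * Returns the number of connections closed by this call.
 */
int close_idle_conns(struct conn *conns, int n, time_t now, time_t max_idle)
{
    int closed = 0;

    for (int i = 0; i < n; i++) {
        if (conns[i].fd >= 0 && now - conns[i].last_update > max_idle) {
            close(conns[i].fd);
            conns[i].fd = -1;
            closed++;
        }
    }
    return closed;
}
```

A server would call conn_touch after each successful read or write and run close_idle_conns from a periodic timer, passing time(NULL) as now. The threshold has to be longer than any legitimate quiet period, otherwise healthy connections get closed too.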
References:
http://blog.chinaunix.net/u/20146/showart_1217433.html
http://blog.csdn.net/eroswang/archive/2008/03/10/2162986.aspx
http://haka.sharera.com/blog/BlogTopic/32309.htm
http://learn.akae.cn/media/ch37s02.html
http://faq.csdn.net/read/208036.html
http://www.cndw.com/tech/server/2006040430203.asp
http://davidripple.bokee.com/1741575.html
http://doserver.net/post/keepalive-linux-1.php
man 7 tcp