Solving the PHP socket "Address already in use" problem

Date: 2019-11-06


In order for a network connection to close, both ends have to send FIN (final) packets, which indicate they will not send any additional data, and both ends must ACK (acknowledge) each other's FIN packets. The FIN packets are initiated by the application performing a close(), a shutdown(), or an exit(). The ACKs are handled by the kernel after the close() has completed. Because of this, it is possible for the process to complete before the kernel has released the associated network resource, and this port cannot be bound to another process until the kernel has decided that it is done.

Figure 1

Figure 1 shows all of the possible states that can occur during a normal closure, depending on the order in which things happen. Note that if you initiate closure, there is a TIME_WAIT state that is absent from the other side. This TIME_WAIT is necessary in case the ACK you sent wasn't received, or in case spurious packets show up for other reasons. I'm really not sure why this state isn't necessary on the other side, when the remote end initiates closure, but this is definitely the case. TIME_WAIT is the state that typically ties up the port for several minutes after the process has completed. The length of the associated timeout varies between operating systems, and may be dynamic on some of them; however, typical values are in the range of one to four minutes.

If both ends send a FIN before either end receives it, both ends will have to go through TIME_WAIT.

Normal Closure of Listen Sockets

A socket which is listening for connections can be closed immediately if there are no connections pending, and the state proceeds directly to CLOSED. If connections are pending, however, FIN_WAIT_1 is entered, and a TIME_WAIT is inevitable.

Note that it is impossible to completely guarantee a clean closure here. While you can check the connections using a select() call before closure, a tiny but real possibility exists that a connection could arrive after the select() but before the close().
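The select()-before-close() check mentioned above might look roughly like the sketch below. It is only an illustration: the helper names has_pending_connection and close_if_idle are hypothetical, and the race between the check and the close() is still there.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now (for a listening socket this
 * means a connection is waiting to be accepted), 0 if not, -1 on error.
 * Uses a zero timeout, so it never blocks. */
int has_pending_connection(int listen_fd)
{
    if (listen_fd < 0 || listen_fd >= FD_SETSIZE)
        return -1;                      /* select() cannot handle this fd */

    fd_set readable;
    struct timeval no_wait = {0, 0};

    FD_ZERO(&readable);
    FD_SET(listen_fd, &readable);

    int n = select(listen_fd + 1, &readable, NULL, NULL, &no_wait);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(listen_fd, &readable)) ? 1 : 0;
}

/* Closes the listening socket only if nothing is pending.
 * Returns 1 if it was closed, 0 if it was left open, -1 on error.
 * A connection can still arrive between the check and close(). */
int close_if_idle(int listen_fd)
{
    int pending = has_pending_connection(listen_fd);
    if (pending < 0)
        return -1;
    if (pending > 0)
        return 0;
    return close(listen_fd) == 0 ? 1 : -1;
}
```

A server could call close_if_idle() during shutdown and, when it returns 0, accept and finish the pending connection before trying again.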

Abnormal Closure

If the remote application dies unexpectedly while the connection is established, the local end will have to initiate closure. In this case TIME_WAIT is unavoidable. If the remote end disappears due to a network failure, or the remote machine reboots (both are rare), the local port will be tied up until each state times out. Worse, some older operating systems do not implement a timeout for FIN_WAIT_2, and it is possible to get stuck there forever, in which case restarting your server could require a reboot.

If the local application dies while a connection is active, the port will be tied up in TIME_WAIT. This is also true if the application dies while a connection is pending.

Strategies for Avoidance

SO_REUSEADDR

You can use setsockopt() to set the SO_REUSEADDR socket option, which explicitly allows a process to bind to a port which remains in TIME_WAIT (it still only allows a single process to be bound to that port). This is both the simplest and the most effective option for reducing the "address already in use" error.
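As a rough sketch of this option, the function below sets SO_REUSEADDR before calling bind(). The name listen_with_reuseaddr and the choice of INADDR_ANY are assumptions made for illustration, not part of any particular server.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP listening socket on the given port (host byte order;
 * 0 lets the kernel pick one). SO_REUSEADDR is set before bind().
 * Returns the socket descriptor, or -1 on error. */
int listen_with_reuseaddr(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option only helps if it is set before bind(). */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call listen_with_reuseaddr(8080) at startup in place of a plain socket()/bind()/listen() sequence and then accept() on the returned descriptor.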

Oddly, using SO_REUSEADDR can actually lead to more difficult "address already in use" errors. SO_REUSEADDR permits you to use a port that is stuck in TIME_WAIT, but you still cannot use that port to establish a connection to the last place it connected to. What? Suppose I pick local port 1010, and connect to foobar.com port 300, and then close locally, leaving that port in TIME_WAIT. I can reuse local port 1010 right away to connect to anywhere except for foobar.com port 300.

A situation where this might be a problem is if my program is trying to find a reserved local port (< 1024) to connect to some service which likes reserved ports. If I used SO_REUSEADDR, then each time I run the program on my machine, I'll keep getting the same local reserved port, even if it is stuck in TIME_WAIT, and I risk getting a "connect: Address already in use" error if I go back to any place I've been to in the last few minutes. The solution here is to avoid SO_REUSEADDR.

Some folks don't like SO_REUSEADDR because it has a security stigma attached to it. On some operating systems it allows the same port to be used with a different address on the same machine by different processes at the same time. This is a problem because most servers bind to the port without binding to a specific address; instead they use INADDR_ANY (this is why things show up in netstat output as *.8080). So if the server is bound to *.8080, a malicious user on the local machine can bind to local-machine.8080, which will intercept all of your connections since it is more specific. This is only a problem on multi-user machines that don't have restricted logins; it is NOT a vulnerability from outside the machine. And it is easily avoided by binding your server to the machine's address.

Additionally, others don't like that a busy server may have hundreds or thousands of these TIME_WAIT sockets stacking up and using kernel resources. For these reasons, there's another option for avoiding this problem.

Client Closes First

Looking at the diagram above, it is clear that TIME_WAIT can be avoided if the remote end initiates the closure. So the server can avoid problems by letting the client close first. The application protocol must be designed so that the client knows when to close. The server can safely close in response to an EOF from the client; however, it will also need to set a timeout when it is expecting an EOF, in case the client has left the network ungracefully. In many cases simply waiting a few seconds before the server closes will be adequate.
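One way the "wait for the client's EOF, but with a timeout" idea could be written is sketched below. The function name close_after_peer_eof and the use of poll() are illustrative assumptions; note that the timeout here applies to each wait, not to the total time spent.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Waits for the peer to close its side (read() returns 0), discarding
 * any data that arrives in the meantime, then closes fd.
 * timeout_ms limits each individual wait.
 * Returns 1 if EOF was seen before closing, 0 if the wait timed out,
 * -1 on error. fd is closed in every case. */
int close_after_peer_eof(int fd, int timeout_ms)
{
    char scratch[512];
    int result;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int n = poll(&pfd, 1, timeout_ms);
        if (n < 0) { result = -1; break; }
        if (n == 0) { result = 0; break; }      /* peer never closed */

        ssize_t got = read(fd, scratch, sizeof(scratch));
        if (got < 0) { result = -1; break; }
        if (got == 0) { result = 1; break; }    /* EOF: peer closed first */
        /* got > 0: late data from the peer, ignore it and keep waiting */
    }

    close(fd);
    return result;
}
```

When the function returns 1, the remote end initiated the closure, so TIME_WAIT lands on the remote side; when it returns 0, the local end closes first after all and goes through TIME_WAIT itself.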

It probably makes more sense to call this method "Remote Closes First", because otherwise it depends on what you are calling the client and the server. If you are developing some system where a cluster of client programs sit on one machine and contact a variety of different servers, then you would want to foist the responsibility for closure onto the servers, to protect the resources on the client.

For example, I wrote a script that uses rsh to contact all of the machines on our network, and it does it in parallel, keeping some number of connections open at all times. rsh source ports are arbitrary available ports less than 1024. I initially used "rsh -n", which it turns out causes the local end to close first. After a few tests, every single free port less than 1024 was stuck in TIME_WAIT and I couldn't proceed. Removing the "-n" option causes the remote (server) end to close first (understanding why is left as an exercise for the reader), and should've eliminated the TIME_WAIT problem. However, without the -n, rsh can hang waiting for input. And, if you close input at the local end, this can again result in the port going into TIME_WAIT. I ended up avoiding the system-installed rsh program, and developing my own implementation in perl. My current implementation, multi-rsh, is available for download.

Reduce Timeout

If (for whatever reason) neither of these options works for you, it may also be possible to shorten the timeout associated with TIME_WAIT. Whether this is possible and how it should be accomplished depends on the operating system you are using. Also, making this timeout too short could have negative side-effects, particularly in lossy or congested networks.

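Starting from "address already in use"

1 The Problem

Background: quite often, when a server restarts or crashes, it fails to come back up with "Address already in use". A few minutes later it can be started again.

This raises two questions:

A) Why does this happen?

B) How can it be fixed so that the server can start right away?

2 Analysis

It turns out that when the server restarts or crashes, its connections enter the TIME_WAIT state and stay there for 2MSL. During that time the server is not allowed to restart on the same port.

So why does the server end up in TIME_WAIT rather than CLOSED? To answer that, we need to look at how a TCP connection is closed.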

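2.1 How a TCP Connection Is Closed

In TCP, the side that performs the active close enters the TIME_WAIT state. In the example in the figure, it is the client that enters TIME_WAIT.

After entering TIME_WAIT, the connection waits for 2MSL (MSL is the Maximum Segment Lifetime; it is 2 min, 1 min or 30 s depending on the implementation, and RFC 793 suggests 2 min).

For reference, below is the TCP connection state transition diagram.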

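2.2 The Purpose of TIME_WAIT

TIME_WAIT serves two purposes:

1) If the final ACK sent by the active closer is lost, the other side will retransmit its FIN. The TIME-WAIT state keeps the connection state around so that this retransmission can be handled.

– If the active closer dropped the connection immediately, then when the retransmitted FIN arrived TCP would no longer have any information about the connection, so it would answer with an RST (reset). The peer would then end up in an error state instead of terminating in an orderly way.

– When a retransmitted FIN is handled, the 2MSL timer is restarted, in case the ACK is lost again.

2) It gives "stray segments" belonging to the connection time to disappear from the network.

Because of delays and similar factors, packets may arrive only after the connection has been closed. Without TIME_WAIT, suppose that both of the following hold:

A) A new connection has been established with the same 4-tuple as the previous one, that is, the same Src_IP, Src_Port, Dst_IP and Dst_Port.

B) The sequence number of the delayed packet happens to fall inside the receive window of the new connection.

When both conditions are met, the packet is accepted and corrupts the new connection.

Entering TIME_WAIT and waiting for 2MSL gives these stray segments time to die out.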
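Conditions A and B can be written down as a small predicate, sketched below. This is a simplified model rather than what any TCP stack actually runs: it looks only at the first sequence number of the delayed segment and ignores its length, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A connection is identified by its 4-tuple. */
struct four_tuple {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

/* Condition A: the new connection has the same 4-tuple as the old one. */
static bool same_tuple(const struct four_tuple *a, const struct four_tuple *b)
{
    return a->src_ip == b->src_ip && a->src_port == b->src_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}

/* Condition B: seq lies in [rcv_nxt, rcv_nxt + rcv_wnd).
 * Unsigned subtraction handles 32-bit sequence number wraparound. */
static bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* True if a delayed segment from old_conn would be accepted by new_conn. */
bool stale_segment_accepted(const struct four_tuple *old_conn,
                            const struct four_tuple *new_conn,
                            uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return same_tuple(old_conn, new_conn) &&
           seq_in_window(seq, rcv_nxt, rcv_wnd);
}
```

With rcv_nxt = 1000 and rcv_wnd = 1000, a delayed segment with sequence number 1500 on an identical 4-tuple passes both checks, while the same segment is rejected as soon as one field of the 4-tuple differs.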

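2.3 How Can the TIME_WAIT State Be Ended?

There is a phenomenon known as TIME_WAIT Assassination. Two situations can cause it.

A) Accidental termination.

As shown in the figure below, a delayed segment arrives while HOST1, which performed the active close, is in TIME_WAIT. Because the sequence number of the delayed segment is outside the window HOST1 can currently accept, HOST1 sends an ACK telling the peer which sequence number it expects. The peer has already closed and is in the CLOSED state, so on receiving this ACK it replies to HOST1 with an RST, which makes HOST1 leave TIME_WAIT immediately.

The TIME_WAIT state has been assassinated. Is there a way to avoid this?

Yes. Some implementations simply ignore RST segments while in TIME_WAIT.

B) Deliberate termination.

You can call setsockopt to set SO_LINGER, so that the connection is closed directly without going through the four-way termination sequence and without entering TIME_WAIT.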

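About SO_LINGER

– By default, when the application closes a connection, the close or closesocket call returns immediately. If data remains in the socket send buffer, the system tries to deliver it to the peer, but the application does not know whether the delivery succeeded.

– When lingering is enabled, a successful return from close only tells us that the data we sent (and the FIN) have been acknowledged by the peer's TCP; it does not tell us whether the peer application has read the data. If the socket is non-blocking, it does not wait for the close to complete.

– The SO_LINGER option is used to change the default behavior.

The linger structure used with SO_LINGER:

struct linger {
    int l_onoff;   /* 0 = off, nonzero = on */
    int l_linger;  /* linger time, in seconds */
};

– If l_onoff is 0, the option is off and the value of l_linger is ignored. This is the default: close returns immediately.

– If l_onoff is nonzero and l_linger is 0, TCP aborts the connection when the socket is closed. It discards any data still in the socket send buffer and sends an RST to the peer instead of the normal four-segment termination sequence. This avoids the TIME_WAIT state.

– If l_onoff is nonzero and l_linger is nonzero, the kernel lingers for a while (determined by l_linger) when the socket is closed. If data remains in the socket send buffer, the process sleeps until either all the data has been sent and acknowledged by the peer, after which the normal termination sequence takes place, or the linger time expires. In this case it is very important for the application to check the return value of close: if the time expires before the data has been sent and acknowledged, close returns EWOULDBLOCK and any data left in the send buffer is lost.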
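The three combinations above could be wrapped in small helpers like the ones below. This is a minimal sketch; the function names are hypothetical, and fd is assumed to be an already created socket.

```c
#include <sys/socket.h>

/* Sets SO_LINGER on fd. Returns 0 on success, -1 on error. */
static int set_linger(int fd, int onoff, int seconds)
{
    struct linger lg;
    lg.l_onoff = onoff;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* l_onoff = 0: default behavior, close() returns immediately and the
 * kernel keeps trying to deliver any unsent data. */
int linger_default(int fd)
{
    return set_linger(fd, 0, 0);
}

/* l_onoff != 0, l_linger = 0: abortive close. Unsent data is discarded,
 * an RST is sent, and TIME_WAIT is skipped. */
int linger_abort(int fd)
{
    return set_linger(fd, 1, 0);
}

/* l_onoff != 0, l_linger != 0: close() waits up to `seconds` for the
 * remaining data to be sent and acknowledged. */
int linger_wait(int fd, int seconds)
{
    return set_linger(fd, 1, seconds);
}
```

After linger_wait(fd, 5), the return value of close(fd) should be checked, since an expired linger time means data was dropped.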

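2.4 Conclusion About the TIME_WAIT State

A robust application should never interfere with the TIME-WAIT state: it is an important part of TCP's reliability mechanism.

3 Analysis of the Server Problem

With the background on TIME_WAIT covered above, we now know that when the server restarts or crashes it is the side performing the active close, so it enters TIME_WAIT, and that is what prevents the server from restarting.

Can we restart right away anyway? Yes, we can.

4 How to Restart the Server Immediately

Set SO_REUSEADDR before calling bind.

That might seem to be the end of the story. But as just described, TIME_WAIT serves two purposes, so could letting the server reuse this address cause problems?

The answer is yes, it could. As long as the 4-tuple is the same and the sequence number of a delayed packet falls inside the window the new connection can accept, problems can occur.

Someone asked about this on Stack Overflow:

Using SO_REUSEADDR - What happens to previously open socket?

The answer: The SO_REUSEADDR option overrides that behavior, allowing you to reuse the port immediately.

Effectively, you're saying: "I understand the risks and would like to use the port anyway."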

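Analysis of the TCP TIME_WAIT State on Linux

Today I ran into a port problem. In socket programming it is worth noting that after close(sock_id) is called, the socket sock_id is not released immediately. This is a property of the TCP protocol, mainly intended to give both sides enough time to complete the four-way close.

Let us review the TCP handshake from computer networking:

Therefore, after close() is called, the socket goes from the ESTABLISHED state to the TIME_WAIT state (passing through the FIN_WAIT states). During this time the port is not released, and calling bind() on it fails with "can't bind server socket: address already in use". The time the protocol stays in TIME_WAIT can be changed; one way to do it is shown below.

The following snippet is quoted from the article "linux下解决大量的TIME_WAIT" (Dealing with a large number of TIME_WAIT sockets on Linux):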

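[root@web02 ~]# vi /etc/sysctl.conf
Add the following lines:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
Apply the kernel parameters:
[root@web02 ~]# sysctl -p
readme:
# Enable SYN cookies.
net.ipv4.tcp_syncookies=1
# Enable reuse of TIME-WAIT sockets; very effective for web servers with a large number of connections.
# Note: tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12.
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
# Reduce the time a connection spends in FIN-WAIT-2, so that the system can handle more connections.
net.ipv4.tcp_fin_timeout=30
# Reduce the TCP keepalive probe time, so that the system can handle more connections.
net.ipv4.tcp_keepalive_time=1800
# Increase the TCP SYN queue length, so that the system can handle more concurrent connections.
net.ipv4.tcp_max_syn_backlog=8192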

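1. If you still want bind() to succeed, you can get around the TIME_WAIT state by using setsockopt to allow the port to be reused, so that bind() no longer fails. For example:

int sock, opt = 1;   /* opt = 0 would disable reuse */
sock = socket(...);
/* must be set before bind() */
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
bind(...);

For the details of setsockopt, see its prototype: int setsockopt(int socket, int level, int option_name, const void *option_value, socklen_t option_len);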

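2. Disable the TIME_WAIT state.

The SO_LINGER option of setsockopt controls whether closing the socket is delayed.
struct linger {
    int l_onoff;   /* 0 = off, nonzero = on */
    int l_linger;  /* linger time, in seconds */
};

int server_fd;
server_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
struct linger li;
li.l_onoff = 1;    /* enable SO_LINGER ... */
li.l_linger = 0;   /* ... with a zero timeout: close() aborts with an RST */
setsockopt(server_fd, SOL_SOCKET, SO_LINGER, (const char *)&li, sizeof(li));

In my tests, the code above does close the socket forcibly and at once: after running sudo netstat -anp | grep 8080 in a terminal, port 8080 is not in the TIME_WAIT state.

If your program follows the client/server model: with the code above, after the server closes the socket, the client does not see a "peer reset" and exit on its own. My guess is that once lingering close is disabled, the four-way close is not performed; the server simply disconnects by itself without notifying the client.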