在工做中,咱们常常须要重启PHP-FPM,那么这个重启过程都发生了那些事情呢?让咱们从PHP源码中一探究竟吧。php
运行环境: Mac 10.14.2 + PHP 7.3.7segmentfault
信号在fpm的重启中扮演着重要的角色。那什么是信号呢?bash
信号是由用户、系统或者进程发送给目标进程的信息,以通知目标进程某个状态的改变或系统异常。Linux信号可由以下条件产生:异步
- 对于前台进程,用户能够经过输入特殊的终端字符来给它发送信号。
- 系统异常。好比浮点异常和非法内存段访问。
- 系统状态变化。好比 alarm 定时器到期将引发 SIGALARM 信号。
- 运行 kill 命令或调用 kill 函数
在PHP-FPM中,用户经过kill
命令来重启fpm,master进程也是经过kill()
函数向worker进程发送信号来结束进程。fpm的重启分为优雅重启(kill -SIGUSR2
)和强制重启(kill -SIGTERM
)两种,下面是以优雅重启为例,master进程将收到SIGUSR2
信号。socket
master进程信号初始化函数fpm_signals_init_main()
主要作了两件事情:ide
经过socketpair()
来建立这一对双全工的unix_socket,其中sp[0]
的可读事件在fpm_event_loop()
中被注册到事件队列中,其回调函数为fpm_got_signal()
,这样往sp[1]
写入数据时将触发sp[0]
的可读事件回调。对这俩unix_socket还有两个操做:函数
fcntl(fd, F_SETFL, old_flags|O_NONBLOCK)
,这样当fd不可读或不可写的时候,read()
、write()
不会阻塞,而是直接返回-1,errno设为EAGAIN。fcntl(fd, F_SETFD, FD_CLOEXEC)
,这样当进程调用exec()
族函数前会关闭该fd。这么作是为了防止文件描述符的泄露,由于调用exec()
族函数会用新程序替换掉当前进程执行的程序,进程的正文、数据、堆和栈段都会被替换,这就致使原先保存文件描述符的变量不存在了,也就没法关闭“老进程“的fd,致使文件描述符泄露。注册的信号有SIGTERM
、SIGINT
、SIGUSR1
、SIGUSR2
、SIGCHLD
、SIGQUIT
六种。php-fpm
int fpm_signals_init_main() /* {{{ */ {
struct sigaction act;
if (0 > socketpair(AF_UNIX, SOCK_STREAM, 0, sp)) {
zlog(ZLOG_SYSERROR, "failed to init signals: socketpair()");
return -1;
}
if (0 > fd_set_blocked(sp[0], 0) || 0 > fd_set_blocked(sp[1], 0)) {
zlog(ZLOG_SYSERROR, "failed to init signals: fd_set_blocked()");
return -1;
}
if (0 > fcntl(sp[0], F_SETFD, FD_CLOEXEC) || 0 > fcntl(sp[1], F_SETFD, FD_CLOEXEC)) {
zlog(ZLOG_SYSERROR, "falied to init signals: fcntl(F_SETFD, FD_CLOEXEC)");
return -1;
}
memset(&act, 0, sizeof(act));
act.sa_handler = sig_handler;
sigfillset(&act.sa_mask);
if (0 > sigaction(SIGTERM, &act, 0) ||
0 > sigaction(SIGINT, &act, 0) ||
0 > sigaction(SIGUSR1, &act, 0) ||
0 > sigaction(SIGUSR2, &act, 0) ||
0 > sigaction(SIGCHLD, &act, 0) ||
0 > sigaction(SIGQUIT, &act, 0)) {
zlog(ZLOG_SYSERROR, "failed to init signals: sigaction()");
return -1;
}
return 0;
}
复制代码
worker进程信号初始化函数fpm_signals_init_child()
主要作了三件事情:oop
这对unix_socket继承自master进程,worker进程用不到它们。ui
sig_soft_quit()
,sa_flags
变量设为SA_RESTART
表示信号处理函数返回后从新调用被中断的系统调用,这样worker进程正在处理中的事情不会受到影响。SIG_DFL
,即采用默认行为。调用zend_signal_init()
,这个不展开讲了。
int fpm_signals_init_child() /* {{{ */ {
struct sigaction act, act_dfl;
memset(&act, 0, sizeof(act));
memset(&act_dfl, 0, sizeof(act_dfl));
act.sa_handler = &sig_soft_quit;
act.sa_flags |= SA_RESTART;
act_dfl.sa_handler = SIG_DFL;
close(sp[0]);
close(sp[1]);
if (0 > sigaction(SIGTERM, &act_dfl, 0) ||
0 > sigaction(SIGINT, &act_dfl, 0) ||
0 > sigaction(SIGUSR1, &act_dfl, 0) ||
0 > sigaction(SIGUSR2, &act_dfl, 0) ||
0 > sigaction(SIGCHLD, &act_dfl, 0) ||
0 > sigaction(SIGQUIT, &act, 0)) {
zlog(ZLOG_SYSERROR, "failed to init child signals: sigaction()");
return -1;
}
zend_signal_init();
return 0;
}
复制代码
master进程收到SIGUSR2
信号后将回调sig_handler()
进行信号处理。咱们能够看到SIGUSR2
被映射为2
,并写入到 sp[1]
。
static void sig_handler(int signo) /* {{{ */ {
static const char sig_chars[NSIG + 1] = {
[SIGTERM] = 'T',
[SIGINT] = 'I',
[SIGUSR1] = '1',
[SIGUSR2] = '2',
[SIGQUIT] = 'Q',
[SIGCHLD] = 'C'
};
char s;
int saved_errno;
if (fpm_globals.parent_pid != getpid()) {
/* prevent a signal race condition when child process have not set up it's own signal handler yet */
return;
}
saved_errno = errno;
s = sig_chars[signo];
zend_quiet_write(sp[1], &s, sizeof(s)); //实际调用write()
errno = saved_errno;
}
复制代码
当往sp[1]
写入数据后,sp[0]
变为可读,触发事件回调fpm_got_signal()
。从sp[0]
读取到写入的数据 2
,以后调用fpm_pctl()
来进行重启操做。
static void fpm_got_signal(struct fpm_event_s *ev, short which, void *arg) /* {{{ */ {
char c;
int res, ret;
int fd = ev->fd;
do {
do {
res = read(fd, &c, 1);
} while (res == -1 && errno == EINTR);
if (res <= 0) {
if (res < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
zlog(ZLOG_SYSERROR, "unable to read from the signal pipe");
}
return;
}
switch (c) {
case 'C' : /* SIGCHLD */
zlog(ZLOG_DEBUG, "received SIGCHLD");
fpm_children_bury();
break;
......
case '2' : /* SIGUSR2 */
zlog(ZLOG_DEBUG, "received SIGUSR2");
zlog(ZLOG_NOTICE, "Reloading in progress ...");
fpm_pctl(FPM_PCTL_STATE_RELOADING, FPM_PCTL_ACTION_SET);
break;
}
if (fpm_globals.is_child) {
break;
}
} while (1);
return;
}
复制代码
由下面的fpm_pctl()
代码可知,对于FPM_PCTL_ACTION_SET
操做只有当fpm状态fpm_state
为正常时(FPM_PCTL_STATE_NORMAL
),重启操做才能进行下去。
以后将重置已发送信号(fpm_signal_sent=0
),并设置fpm当前状态为FPM_PCTL_STATE_RELOADING
,而后调用fpm_pctl_action_next()
进行下一步操做。
void fpm_pctl(int new_state, int action) /* {{{ */ {
switch (action) {
case FPM_PCTL_ACTION_SET :
if (fpm_state == new_state) { /* already in progress - just ignore duplicate signal */
return;
}
switch (fpm_state) { /* check which states can be overridden */
case FPM_PCTL_STATE_NORMAL :
/* 'normal' can be overridden by any other state */
break;
case FPM_PCTL_STATE_RELOADING :
/* 'reloading' can be overridden by 'finishing' */
if (new_state == FPM_PCTL_STATE_FINISHING) break;
case FPM_PCTL_STATE_FINISHING :
/* 'reloading' and 'finishing' can be overridden by 'terminating' */
if (new_state == FPM_PCTL_STATE_TERMINATING) break;
case FPM_PCTL_STATE_TERMINATING :
/* nothing can override 'terminating' state */
zlog(ZLOG_DEBUG, "not switching to '%s' state, because already in '%s' state",
fpm_state_names[new_state], fpm_state_names[fpm_state]);
return;
}
fpm_signal_sent = 0;
fpm_state = new_state;
zlog(ZLOG_DEBUG, "switching to '%s' state", fpm_state_names[fpm_state]);
/* fall down */
case FPM_PCTL_ACTION_TIMEOUT :
fpm_pctl_action_next();
break;
case FPM_PCTL_ACTION_LAST_CHILD_EXITED :
fpm_pctl_action_last();
break;
}
}
复制代码
此阶段能够当作是三个升级信号的发送过程:
SIGQUIT
信号,worker进程收到后会进行优雅关闭,并设置一个超时时为process_control_timeout
的定时器事件,关于process_control_timeout
能够看我另一篇文章【PHP】配置文件中的超时时间解析,定时器超时后最终将调用fpm_pctl(FPM_PCTL_STATE_UNSPECIFIED, FPM_PCTL_ACTION_TIMEOUT);
,从action名称能够看出是要进行超时的操做。fpm_pctl()
源码可知,action FPM_PCTL_ACTION_TIMEOUT
仍然调用fpm_pctl_action_next()
,只不过此次SIGQUIT
信号会升级为SIGTERM
发送给worker进程,定时器超时时间变为1s。SIGTERM
会升级为终极信号SIGKILL
。SIGKILL
信号相比SIGTERM
是不可被捕获或者忽略的,它将强行终止worker进程。static void fpm_pctl_action_next() /* {{{ */ {
int sig, timeout;
if (!fpm_globals.running_children) {
fpm_pctl_action_last();
}
if (fpm_signal_sent == 0) {
if (fpm_state == FPM_PCTL_STATE_TERMINATING) {
sig = SIGTERM;
} else {
sig = SIGQUIT;
}
timeout = fpm_global_config.process_control_timeout;
} else {
if (fpm_signal_sent == SIGQUIT) {
sig = SIGTERM;
} else {
sig = SIGKILL;
}
timeout = 1;
}
// 实际调用kill()
fpm_pctl_kill_all(sig);
fpm_signal_sent = sig;
fpm_pctl_timeout_set(timeout);
}
复制代码
worker进程主要处理master发送过来的三个信号,即SIGQUIT
、SIGTERM
、SIGKILL
。
SIGQUIT
信号的回调事件是sig_soft_quit()
。它首先会关闭listening_socket
,而且将in_shutdown
置为1,这样accept()
系统调用将当即返回-1,worker进程再也不接收请求,开始结束进程的操做。static void sig_soft_quit(int signo) /* {{{ */ {
int saved_errno = errno;
/* closing fastcgi listening socket will force fcgi_accept() exit immediately */
close(fpm_globals.listening_socket);
if (0 > socket(AF_UNIX, SOCK_STREAM, 0)) {
zlog(ZLOG_WARNING, "failed to create a new socket");
}
// 设置in_shutdown=1
fpm_php_soft_quit();
errno = saved_errno;
}
int fcgi_accept_request(fcgi_request *req) {
while (1) {
if (req->fd < 0) {
while (1) {
if (in_shutdown) {
return -1;
}
......
req->fd = accept(listen_socket, (struct sockaddr *)&sa, &len);
......
}
} else {
fcgi_close(req, 1, 1);
}
}
}
复制代码
SIGTERM
信号采用SIG_DFL
默认处理方式,即终止进程,能够被阻塞、捕获、忽略。SIGKILL
信号不能被捕获或者忽略,将强行终止worker进程。worker进程的状态发生变化时,被终止或者暂停,内核会向master进程发送一个异步通知,即SIGCHLD
信号,由信号处理函数fpm_got_signal()
可知将执行fpm_children_bury()
。
下面将fpm_children_bury()
的代码拆解到对应部分下。
在这里先介绍下waitpid()
是干吗的:
当子进程结束的时候,内核会为终止子进程保存必定量的信息,这些信息至少包括进程ID、该进程的的终止状态、以及该进程使用的CPU时间总量。
一个已经终止、可是其父进程还没有对其进行善后处理(获取终止子进程的有关信息,释放它仍占用的资源)的进程会成为僵尸进程。僵尸进程的进程号会被一直占用着,可是系统所能使用的进程号是有限的,因此若是有大量的僵尸进程产生,将由于没有可用的进程号而致使系统不能产生新的进程。
wait()
或waitpid()
就可让父进程获取到这些信息,并被内核释放掉。
// 最外层循环
while ( (pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
......
}
复制代码
master进程经过waitpid()
获取到终止的worker进程的pid
和终止状态status
后,将对status
进行一些判断
WTERMSIG(status)
来获取时子进程终止的信号编号。request_slowlog_timeout
后,master进程的心跳检测模块会给worker进程发送SIGSTOP
信号,worker进程被暂停,状态发生变化,内核向master进程发送SIGCHLD
信号,以后就会执行到这里。最后将调用fpm_php_trace()
函数来打印致使请求slow的堆栈信息。if (WIFEXITED(status)) {
snprintf(buf, sizeof(buf), "with code %d", WEXITSTATUS(status));
/* if it's been killed because of dynamic process management * don't restart it automaticaly */
if (child && child->idle_kill) {
restart_child = 0;
}
// 调用fpm_php_trace()
if (WEXITSTATUS(status) != FPM_EXIT_OK) {
severity = ZLOG_WARNING;
}
} else if (WIFSIGNALED(status)) {
const char *signame = fpm_signal_names[WTERMSIG(status)];
const char *have_core = WCOREDUMP(status) ? " - core dumped" : "";
if (signame == NULL) {
signame = "";
}
snprintf(buf, sizeof(buf), "on signal %d (%s%s)", WTERMSIG(status), signame, have_core);
/* if it's been killed because of dynamic process management * don't restart it automaticaly */
if (child && child->idle_kill && WTERMSIG(status) == SIGQUIT) {
restart_child = 0;
}
if (WTERMSIG(status) != SIGQUIT) { /* possible request loss */
severity = ZLOG_WARNING;
}
} else if (WIFSTOPPED(status)) {
zlog(ZLOG_NOTICE, "child %d stopped for tracing", (int) pid);
if (child && child->tracer) {
child->tracer(child);
}
continue;
}
复制代码
child = fpm_child_find(pid);
if (child) {
struct fpm_worker_pool_s *wp = child->wp;
struct timeval tv1, tv2;
// 资源释放
fpm_child_unlink(child);
fpm_scoreboard_proc_free(wp->scoreboard, child->scoreboard_i);
fpm_clock_get(&tv1);
timersub(&tv1, &child->started, &tv2);
......
// 关闭标准输出、标准错误
fpm_child_close(child, 1 /* in event_loop */);
// 在后文中详解
fpm_pctl_child_exited();
......
} else {
zlog(ZLOG_ALERT, "oops, unknown child (%d) exited %s. Please open a bug report (https://bugs.php.net).", pid, buf);
}
复制代码
从fpm_pctl_child_exited()
源码可知,若是这是最后一个worker进程的终止,将调用fpm_pctl(FPM_PCTL_STATE_UNSPECIFIED, FPM_PCTL_ACTION_LAST_CHILD_EXITED);
。
int fpm_pctl_child_exited() /* {{{ */ {
if (fpm_state == FPM_PCTL_STATE_NORMAL) {
return 0;
}
if (!fpm_globals.running_children) {
fpm_pctl(FPM_PCTL_STATE_UNSPECIFIED, FPM_PCTL_ACTION_LAST_CHILD_EXITED);
}
return 0;
}
复制代码
继续追踪源码会发现,在重启操做中最后会调用fpm_pctl_exec()
。
execvp()
函数将从新执行php-fpm
程序,当前进程的正文、数据、堆和栈段都将被替换掉。
static void fpm_pctl_exec() /* {{{ */ {
fpm_cleanups_run(FPM_CLEANUP_PARENT_EXEC);
execvp(saved_argv[0], saved_argv);
// 正常状况不会走到这里
zlog(ZLOG_SYSERROR, "failed to reload: execvp() failed");
exit(FPM_EXIT_SOFTWARE);
}
复制代码
至此,PHP-FPM就完成了重启。
PHP打印了不少Debug日志,你们能够在php-fpm.conf中将log_level
选项设置为debug
来开启。下面是debug日志的例子,能够对照着理解下上文内容。
[16-Jul-2019 16:51:40.248439] DEBUG: pid 36507, fpm_got_signal(), line 110: received SIGUSR2
[16-Jul-2019 16:51:40.248711] NOTICE: pid 36507, fpm_got_signal(), line 111: Reloading in progress ...
[16-Jul-2019 16:51:40.248909] DEBUG: pid 36507, fpm_pctl(), line 229: switching to 'reloading' state
[16-Jul-2019 16:51:40.249112] DEBUG: pid 36507, fpm_pctl_kill_all(), line 157: [pool www] sending signal 3 SIGQUIT to child 36508
[16-Jul-2019 16:51:40.249360] DEBUG: pid 36507, fpm_pctl_kill_all(), line 166: 1 child(ren) still alive
[16-Jul-2019 16:51:40.249624] DEBUG: pid 36507, fpm_event_loop(), line 417: event module triggered 1 events
[16-Jul-2019 16:51:40.256626] DEBUG: pid 36507, fpm_got_signal(), line 74: received SIGCHLD
[16-Jul-2019 16:51:40.256968] DEBUG: pid 36507, fpm_children_bury(), line 259: [pool www] child 36508 exited with code 0 after 16.412179 seconds from start
[16-Jul-2019 16:51:40.257411] NOTICE: pid 36507, fpm_pctl_exec(), line 96: reloading: execvp("/usr/local/Cellar/php/7.3.7/sbin/php-fpm", {"/usr/local/Cellar/php/7.3.7/sbin/php-fpm", "--fpm-config=/usr/local/etc/php/7.3.7/php-fpm.conf", "--pid=/usr/local/Cellar/php/7.3.7/var/run/php-fpm.pid"})
[16-Jul-2019 16:51:40.319184] DEBUG: pid 36507, fpm_unix_init_main(), line 518: The calling process is waiting for the master process to ping via fd=4
[16-Jul-2019 16:51:40.321064] DEBUG: pid 36699, fpm_scoreboard_init_main(), line 38: got clock tick '100'
[16-Jul-2019 16:51:40.321588] NOTICE: pid 36699, fpm_sockets_init_main(), line 417: using inherited socket fd=7, "127.0.0.1:9001"
[16-Jul-2019 16:51:40.321588] NOTICE: pid 36699, fpm_sockets_init_main(), line 417: using inherited socket fd=7, "127.0.0.1:9001"
[16-Jul-2019 16:51:40.321782] DEBUG: pid 36699, fpm_socket_af_inet_socket_by_addr(), line 290: Found address for 127.0.0.1, socket opened on 127.0.0.1
[16-Jul-2019 16:51:40.321969] DEBUG: pid 36699, fpm_event_init_main(), line 335: event module is kqueue and 1 fds have been reserved
[16-Jul-2019 16:51:40.322374] NOTICE: pid 36699, fpm_init(), line 83: fpm is running, pid 36699
[16-Jul-2019 16:51:40.322505] DEBUG: pid 36699, main(), line 1858: Sending "1" (OK) to parent via fd=5
[16-Jul-2019 16:51:40.322648] DEBUG: pid 36507, fpm_unix_init_main(), line 537: I received a valid acknowledge from the master process, I can exit without error
[16-Jul-2019 16:51:40.322977] DEBUG: pid 36699, fpm_children_make(), line 428: [pool www] child 36702 started
[16-Jul-2019 16:51:40.323302] DEBUG: pid 36699, fpm_event_loop(), line 364: 1296 bytes have been reserved in SHM
[16-Jul-2019 16:51:40.323498] NOTICE: pid 36699, fpm_event_loop(), line 365: ready to handle connections
复制代码