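The theory and practice in the previous articles showed that Docker Swarm automatically judges the health of the containers in a service and decides whether to remove and recreate them, so that the configured number of replicas is maintained. But how does it make that judgment?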
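Every container has a STATUS that describes its running state: created, restarting, running, removing, paused, exited, or dead. What matters most is the exit code the container reports when it stops: any Exited (N) status with N != 0 indicates an abnormal exit.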
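The exit-code rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, applied to STATUS strings in the form that docker ps -a prints; the helper names exit_code and exited_abnormally are made up for this example and are not part of Docker or Swarm.

```python
import re

# Matches the STATUS column of an exited container, e.g. "Exited (2) 4 seconds ago".
EXITED_RE = re.compile(r"^Exited \((\d+)\)")


def exit_code(status: str):
    """Return the exit code embedded in a STATUS string, or None if not exited."""
    match = EXITED_RE.match(status.strip())
    return int(match.group(1)) if match else None


def exited_abnormally(status: str) -> bool:
    """True only for an exited container whose exit code is non-zero."""
    code = exit_code(status)
    return code is not None and code != 0


for status in ["Exited (2) 4 seconds ago", "Exited (0) 1 minute ago", "Up 17 seconds"]:
    print(f"{status!r}: abnormal exit = {exited_abnormally(status)}")
```

A running container ("Up ...") has no exit code at all, which is exactly why this rule alone cannot say anything about a service that is still running but no longer responding.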
Use docker kill http.2.a6i8uov6efb4e0wjioha02o9y to simulate one replica of the service exiting abnormally, then check with docker ps -a:
    ...
    CONTAINER ID   IMAGE                     COMMAND   CREATED          STATUS                     PORTS    NAMES
    aad2191ae7d5   nandy/show-host-info:v2   "/app"    18 seconds ago   Exited (2) 4 seconds ago            http.2.a6i8uov6efb4e0wjioha02o9y
    5e0b2da6b4e5   nandy/show-host-info:v2   "/app"    18 seconds ago   Up 17 seconds              80/tcp   http.1.p35na0wooa509aqd53qzr1n50
    ...
Exited (2) shows that the container terminated abnormally. The service notices this non-zero exit right away and immediately creates a new container:
    ...
    CONTAINER ID   IMAGE                     COMMAND   CREATED          STATUS                      PORTS    NAMES
    c08cdedcf461   nandy/show-host-info:v2   "/app"    9 seconds ago    Up 4 seconds                80/tcp   http.2.ak0lef9jebrqz2racsf2vq8fl
    aad2191ae7d5   nandy/show-host-info:v2   "/app"    24 seconds ago   Exited (2) 10 seconds ago            http.2.a6i8uov6efb4e0wjioha02o9y
    5e0b2da6b4e5   nandy/show-host-info:v2   "/app"    24 seconds ago   Up 23 seconds               80/tcp   http.1.p35na0wooa509aqd53qzr1n50
    ...
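But think about this in light of real-world server operations experience: is the container's abnormal exit, or the lack of one, enough to establish that the service is healthy, that is, responding to requests normally? What if the service hangs (abnormal CPU usage, abnormal memory usage, a code bug being triggered, …) while the container itself never exits? And what if the service exposed an endpoint, and health were judged by requesting it periodically and checking that it returns the expected value?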
The star of this article is healthcheck. It has 5 sub-options:
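--health-cmd=command: the command that performs the check, here a request against the service's endpoint.
--health-interval=duration (default: 30s): the time between runs of the health check.
--health-timeout=duration (default: 30s): a check that does not finish within this time counts as a failure.
--health-retries=N (default: 3): the number of consecutive failures after which the container is reported as unhealthy; Swarm then replaces the task.
--health-start-period=duration (default: 0s): a warm-up window after the container starts. --health-cmd still runs during this window, but its failures do not count toward --health-retries.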
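How the retries and start-period options interact is easier to see in a small simulation. The sketch below is a simplified model written for this article, not Docker's code: the class HealthModel and its fields are hypothetical names. It assumes the behaviour described in the Docker documentation, namely that failures inside the start period are not counted until a check has succeeded once, and that a success resets the failure streak.

```python
from dataclasses import dataclass


@dataclass
class HealthModel:
    """Simplified, hypothetical model of a container's health status."""
    retries: int = 3            # mirrors --health-retries
    start_period: float = 0.0   # mirrors --health-start-period, in seconds
    status: str = "starting"
    failing_streak: int = 0
    started: bool = False       # becomes True once any check has succeeded

    def record(self, elapsed: float, ok: bool) -> str:
        """Feed one check result taken `elapsed` seconds after container start."""
        if ok:
            self.started = True
            self.failing_streak = 0
            self.status = "healthy"
            return self.status
        # Failures during warm-up are ignored until the first success.
        if not self.started and elapsed < self.start_period:
            return self.status
        self.failing_streak += 1
        if self.failing_streak >= self.retries:
            self.status = "unhealthy"
        return self.status


model = HealthModel(retries=3, start_period=30)
checks = [(5, False), (10, False), (15, True), (20, False), (25, False), (30, False)]
for elapsed, ok in checks:
    print(f"t={elapsed:>2}s ok={ok!s:<5} -> {model.record(elapsed, ok)}")
```

In this run the two early failures leave the status at starting, the success at 15s makes it healthy, and only the third consecutive failure afterwards flips it to unhealthy.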
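Create the service again, this time adding a container health check to the earlier setup. Note that the check command, curl -f http://localhost:80, requires curl to be installed in the image: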
    docker service create --network httpnet --name http --replicas 2 -p 81:80 \
      --health-cmd "curl -f http://localhost:80 || exit 1" --health-interval 5s --health-timeout 3s --health-retries 3 --health-start-period 30s \
      nandy/show-host-info:v2
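If adding curl to the image is not an option, the same kind of check could be written with the Python standard library, since the container output in this article shows the image running a Python application. The following is a minimal sketch under that assumption: the function name probe and the idea of shipping it as a script inside the image are illustrative and not something the original setup includes. Like curl -f, it reports failure for HTTP error responses, refused connections, and timeouts by returning 1.

```python
import http.client
import sys
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 3.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP 4xx/5xx responses, refused connections, and timeouts.
        return 1


# When run as a script with exactly one URL argument, exit with the probe result
# so that the exit status can serve as the health check outcome.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(probe(sys.argv[1]))
```

Saved in the image under some path of your choosing, the script would be passed to --health-cmd in place of the curl command, with http://localhost:80 as its argument.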
Run docker ps to check:
    CONTAINER ID   IMAGE                     COMMAND           CREATED              STATUS                        PORTS    NAMES
    add8115df6f1   nandy/show-host-info:v2   "python run.py"   About a minute ago   Up About a minute (healthy)   80/tcp   http.1.q1jjcnjcrfcxsafzdvtvhi7ho
    66e77ee79c0a   nandy/show-host-info:v2   "python run.py"   About a minute ago   Up About a minute (healthy)   80/tcp   http.2.m0oaoqjia0d6m3rzmsh5sr7n6
STATUS now shows an extra (healthy) marker. To verify that curl -f http://localhost:80 really runs every 5s and that it succeeds, look at the container log with docker logs --tail 5 http.1.q1jjcnjcrfcxsafzdvtvhi7ho:
    [2018-10-16 17:54:42 +0800] - (sanic.access)[INFO][127.0.0.1:50220]: GET http://localhost/ 200 41
    [2018-10-16 17:54:47 +0800] - (sanic.access)[INFO][127.0.0.1:50224]: GET http://localhost/ 200 41
    [2018-10-16 17:54:52 +0800] - (sanic.access)[INFO][127.0.0.1:50228]: GET http://localhost/ 200 41
    [2018-10-16 17:54:58 +0800] - (sanic.access)[INFO][127.0.0.1:50232]: GET http://localhost/ 200 41
    [2018-10-16 17:55:03 +0800] - (sanic.access)[INFO][127.0.0.1:50236]: GET http://localhost/ 200 41
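Rather than reading the interval off by eye, the gaps between requests can be computed from the log timestamps. The sketch below embeds the five access-log lines from this article and prints the seconds between consecutive entries; the function name probe_gaps is made up for this example, and the timestamp pattern assumes the sanic access-log layout seen in these lines.

```python
import re
from datetime import datetime

LOG = """\
[2018-10-16 17:54:42 +0800] - (sanic.access)[INFO][127.0.0.1:50220]: GET http://localhost/ 200 41
[2018-10-16 17:54:47 +0800] - (sanic.access)[INFO][127.0.0.1:50224]: GET http://localhost/ 200 41
[2018-10-16 17:54:52 +0800] - (sanic.access)[INFO][127.0.0.1:50228]: GET http://localhost/ 200 41
[2018-10-16 17:54:58 +0800] - (sanic.access)[INFO][127.0.0.1:50232]: GET http://localhost/ 200 41
[2018-10-16 17:55:03 +0800] - (sanic.access)[INFO][127.0.0.1:50236]: GET http://localhost/ 200 41
"""

# Leading "[YYYY-MM-DD HH:MM:SS +ZZZZ]" timestamp of each access-log line.
STAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def probe_gaps(log_text: str) -> list:
    """Return the seconds elapsed between consecutive timestamped log lines."""
    stamps = []
    for line in log_text.splitlines():
        match = STAMP_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


print(probe_gaps(LOG))  # [5.0, 5.0, 6.0, 5.0]
```

The occasional 6-second gap is consistent with the interval being counted from the end of the previous check rather than from its start, so the spacing is the 5s interval plus however long the check itself took.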