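Prepare three machines for a minimal swarm cluster made up of one manager node and two worker nodes:
172.16.0.200 Ubuntu 14.04
172.16.0.201 Ubuntu 14.04
172.16.0.202 Ubuntu 14.04
In production, use an odd number of manager nodes, 2n+1 (n >= 1). More is not always better, though: Docker recommends a maximum of 7 manager nodes.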
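The preference for an odd manager count can be checked with a little arithmetic. Swarm managers use Raft consensus, which needs a majority of managers to be reachable. The sketch below is only an illustration; the function names quorum and tolerated_failures are made up for this example and are not Docker commands.

```shell
#!/bin/sh
# Raft majority arithmetic for a swarm with N manager nodes.

# quorum N: number of managers that must be reachable (a strict majority).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: managers that can be lost while a majority remains.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 4 5 7; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 4 managers the number of tolerated failures is still 1, the same as with 3, so an even count adds cost without adding fault tolerance.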
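For test or development environments, Docker provides a convenience install script that simplifies installation. On Ubuntu it can be used as follows:
$ curl -fsSL get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh --mirror Aliyun
The first command downloads the script to get-docker.sh; the second runs it using the Aliyun mirror for downloads. The script installs the Edge release of Docker CE on the system, and Docker is started by default.
Alternatively, install directly with the DaoCloud install script: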
curl -sSL https://get.daocloud.io/docker | sh
service docker start
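By default, the docker command talks to the Docker engine over a Unix socket, and only the root user and members of the docker group can access that socket. For security reasons, the root user is usually not used directly on Linux systems, so the better practice is to add the users who need docker to the docker group.
Create the docker group:
groupadd docker
Add the current user to the docker group:
usermod -aG docker $USER
Log out of the current terminal and log back in, then run the following test.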
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
ca4f61b1923c: Pull complete
Digest: sha256:be0cd392e45be79ffeffa6b05338b98ebb16c87b255f48e297ec7f98e123905c
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://cloud.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/
If you see the output above, Docker is installed correctly.
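If the test fails with a permission error instead, it can help to confirm that the new group membership is active in the current login session. The following is a minimal sketch; has_group is a hypothetical helper written for this example, and id -nG is the standard command that prints the current user's group names.

```shell
#!/bin/sh
# has_group GROUP_LIST GROUP: succeed if GROUP appears in the
# space-separated GROUP_LIST.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# id -nG prints the groups of the current login session.
if has_group "$(id -nG)" docker; then
    echo "current session is in the docker group"
else
    echo "current session is not in the docker group; log out and back in after usermod"
fi
```

The check works on the session's groups rather than on /etc/group, because a change made with usermod only takes effect at the next login.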
Install Docker on all three machines listed at the top. We now use 172.16.0.200 as the manager and the other two as workers.
Use docker swarm init to initialize a Swarm cluster on this machine.
root@ubuntu:~# docker swarm init --advertise-addr 172.16.0.200
Swarm initialized: current node (u66elsqnr7cx3ufefopuvchbm) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
If your Docker host has several network interfaces and therefore several IPs, you must specify the IP with --advertise-addr.
The node that runs docker swarm init automatically becomes a manager node.
Log in to 201:
root@api:~# ssh 172.16.0.201
root@172.16.0.201's password:
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Last login: Fri Dec 29 12:05:09 2017 from 172.16.0.200
Join the cluster.
Run the join command printed after the successful initialization above:
root@ubuntu:~# docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
This node joined a swarm as a worker.
Log in to 202:
root@api:~# ssh 172.16.0.202
root@172.16.0.202's password:
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Last login: Fri Dec 29 12:05:09 2017 from 172.16.0.200
Join the cluster:
root@ubuntu:~# docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
This node joined a swarm as a worker.
After these two steps we have a minimal Swarm cluster with one manager node and two worker nodes.
On the manager node, use docker node ls to view the cluster.
root@ubuntu:~# docker node ls
ID                            HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c88xqfzcbhhlg6c07oochr2g7     ubuntu    Ready   Active
rq5oh6hfdo32t9xbl7z7sgikm     ubuntu    Ready   Active
u66elsqnr7cx3ufefopuvchbm *   ubuntu    Ready   Active        Leader
Leave the cluster (run on a worker):
root@ubuntu:~# docker swarm leave
Node left the swarm.
After both workers have left, check on the manager:
root@ubuntu:~# docker node ls
ID                            HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c88xqfzcbhhlg6c07oochr2g7     ubuntu    Down    Active
rq5oh6hfdo32t9xbl7z7sgikm     ubuntu    Down    Active
u66elsqnr7cx3ufefopuvchbm *   ubuntu    Ready   Active        Leader
The STATUS of both workers is now Down.
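In a script, the node status can be counted instead of read by eye. The sketch below assumes the input is one "HOSTNAME STATUS" pair per line, which is what docker node ls --format '{{.Hostname}} {{.Status}}' prints; count_ready is a hypothetical helper, and the sample lines simply mirror the listing above.

```shell
#!/bin/sh
# count_ready: read "HOSTNAME STATUS" lines on stdin and print how many
# nodes report the status Ready.
count_ready() {
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample input mirroring the listing above: two workers Down, manager Ready.
printf '%s\n' 'ubuntu Down' 'ubuntu Down' 'ubuntu Ready' | count_ready
```

On a manager node the same helper could be fed live data with docker node ls --format '{{.Hostname}} {{.Status}}' | count_ready.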
The output above includes a token. If I forget it, how do I add new worker or manager nodes later? Use docker swarm join-token worker to display it again:
root@ubuntu:~# docker swarm join-token worker
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
Adding a manager works the same way; just change the argument:
root@ubuntu:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-469wlqjh1jjuupv8ha29l2rzm 172.16.0.200:2377
The token can also be rotated:
$ docker swarm join-token --rotate worker
Succesfully rotated worker join token.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-3pu6hszjas19xyp7ghgosyx9k8atbfcr8p2is99znpy26u2lkl-b30ljddcqhef9b9v4rs7mel7t \
    172.16.0.200:2377
After rotating the token with --rotate, only the new token can be used to join the cluster.
The -q or --quiet option prints only the token:
root@ubuntu:~# docker swarm join-token -q worker
SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii
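Because -q prints nothing but the token, it is convenient for scripting. The sketch below shows one way to assemble the join command from a token and the manager address; join_command is a hypothetical helper, and the token value is a placeholder rather than a real token.

```shell
#!/bin/sh
# join_command TOKEN MANAGER_ADDR: print the command a new node would run
# to join the swarm.
join_command() {
    printf 'docker swarm join --token %s %s\n' "$1" "$2"
}

# Placeholder token for illustration only. On a manager it would come from:
#   TOKEN=$(docker swarm join-token -q worker)
TOKEN='SWMTKN-1-example'
join_command "$TOKEN" 172.16.0.200:2377
```

The printed command could then be executed on each worker, for example over ssh, instead of copying the token by hand.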
Use the docker service command to manage the services in the Swarm cluster. This command can only be run on a manager node.
Run on the manager node:
root@ubuntu:~# docker service create --name nginx --replicas 3 -p 80:80 nginx
tqd95pxsro7o0rs33ylz9zimj
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged
Now open a browser and enter the IP of any node to see the nginx default page, or use curl, for example curl http://172.16.0.200:
root@ubuntu:~# curl http://172.16.0.172
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Checking on each machine shows that a container for the service has been started:
root@worker2:~# docker ps
CONTAINER ID  IMAGE         COMMAND                 CREATED         STATUS         PORTS   NAMES
08a92a22061b  nginx:latest  "nginx -g 'daemon of…"  33 seconds ago  Up 33 seconds  80/tcp  nginx.1.tfhtoyshmzun54x17l3l3sop5
Use docker service ls to list the services currently running in the Swarm cluster.
root@ubuntu:~# docker service ls
ID            NAME   MODE        REPLICAS  IMAGE         PORTS
tqd95pxsro7o  nginx  replicated  3/3       nginx:latest  *:80->80/tcp
Use docker service ps xxx to view the details of a service, such as which node each task runs on.
root@manager1:~# docker service ps nginx
ID            NAME     IMAGE         NODE      DESIRED STATE  CURRENT STATE           ERROR  PORTS
tfhtoyshmzun  nginx.1  nginx:latest  worker2   Running        Running 18 minutes ago
p6b1gfragl8y  nginx.2  nginx:latest  manager1  Running        Running 16 minutes ago
250owqr51y50  nginx.3  nginx:latest  worker1   Running        Running 20 minutes ago
Use docker service logs xxx to view the log of a service. Earlier I ran curl http://172.16.0.200 three times, and the log looks like this:
root@manager1:~# docker service logs nginx
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - [02/Jan/2018:09:58:31 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - [02/Jan/2018:10:01:48 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - [02/Jan/2018:10:05:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
Above, nginx is the service name. You can also view the log of a single task of the service; pressing Tab after logs lists the available names. I sent three more requests, and each of the three tasks now shows two requests, which indicates that the requests are load balanced.
root@manager1:~# docker service logs 250owqr51y50
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - [02/Jan/2018:10:05:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - [02/Jan/2018:10:06:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
root@manager1:~# docker service logs p6b1gfragl8y
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - [02/Jan/2018:09:58:31 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - [02/Jan/2018:10:06:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
root@manager1:~# docker service logs tfhtoyshmzun
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - [02/Jan/2018:10:01:48 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - [02/Jan/2018:10:06:42 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
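Rather than querying each task separately, the per-task request counts can be tallied from the combined service log. This is a rough sketch that assumes every log line has the "task@node | message" shape shown above; count_by_task is a hypothetical helper, and the sample lines are shortened copies of the log entries.

```shell
#!/bin/sh
# count_by_task: read `docker service logs` lines on stdin and print the
# number of lines per "task@node" prefix (the text before the first "|").
count_by_task() {
    awk -F'|' '{ sub(/[ \t]+$/, "", $1); hits[$1]++ }
               END { for (t in hits) print t, hits[t] }' | sort
}

# Sample input: two requests per task, as in the logs above.
count_by_task <<'EOF'
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - "GET / HTTP/1.1" 200 612
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - "GET / HTTP/1.1" 200 612
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - "GET / HTTP/1.1" 200 612
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - "GET / HTTP/1.1" 200 612
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - "GET / HTTP/1.1" 200 612
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - "GET / HTTP/1.1" 200 612
EOF
```

On a manager node, docker service logs nginx | count_by_task would give the same kind of summary for the live service.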
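Use docker service rm xxx to remove a service from the swarm cluster.
root@manager1:~# docker service rm nginx
nginx
Check again; the nginx service is gone:
root@manager1:~# docker service ls
ID  NAME  MODE  REPLICAS  IMAGE  PORTS
On the worker nodes, the containers of the service have been removed as well:
root@worker1:~# docker ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
A swarm cluster can also be configured with a compose file (docker-compose.yml) that starts several services at once. The example here deploys an nginx web service together with a visualizer service.
docker-compose.yml on the manager1 node: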
version: "3"
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
    ports:
      - "80:80"
    networks:
      - webnet
  visualizer:
    image: dockersamples/visualizer:stable
    ports:
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      placement:
        constraints: [node.role == manager]
    networks:
      - webnet
networks:
  webnet:
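Here:
1. Two services are started: web and visualizer.
2. web consists of 3 nginx replicas. visualizer is an open-source project that draws a diagram of the containers running across the whole swarm; it is constrained to run only on manager nodes.
3. A network named webnet is created. Its type is overlay (see the last line of the listing below), and all the started containers are connected through it.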
root@manager1:~# docker network ls
NETWORK ID    NAME             DRIVER   SCOPE
965de71420b3  bridge           bridge   local
f66cd75a8741  docker_gwbridge  bridge   local
136fb30aa99c  host             host     local
73ha87ntxcd5  ingress          overlay  swarm
eebca5dc00a0  none             null     local
pjxvmr7b1wcq  proj_webnet      overlay  swarm
deploy: deploys the stack
-c: specifies the compose file
proj: the stack name, which can be chosen freely
root@manager1:~# docker stack deploy -c docker-compose.yml proj
Creating network proj_webnet
Creating service proj_web
Creating service proj_visualizer
Once the deployment has finished, open http://<any node>:8080 to see the visualizer page.
List all stacks:
root@manager1:~# docker stack ls
NAME  SERVICES
proj  2
One stack with 2 services (web and visualizer).
List all services in the stack:
root@manager1:~# docker stack services proj
ID            NAME             MODE        REPLICAS  IMAGE                            PORTS
44u5poqrieoy  proj_visualizer  replicated  1/1       dockersamples/visualizer:stable  *:8080->8080/tcp
s2hlawvwwal4  proj_web         replicated  3/3       nginx:latest                     *:80->80/tcp
List the tasks in the stack and where they run:
root@manager1:~# docker stack ps proj
ID            NAME               IMAGE                            NODE      DESIRED STATE  CURRENT STATE           ERROR  PORTS
quexa31y9kkt  proj_visualizer.1  dockersamples/visualizer:stable  manager1  Running        Running 20 minutes ago
4ozvbgpkrqtb  proj_web.1         nginx:latest                     worker2   Running        Running 21 minutes ago
skkh5uzsyl9o  proj_web.2         nginx:latest                     manager1  Running        Running 21 minutes ago
uns9r5vdx1x4  proj_web.3         nginx:latest                     worker1   Running        Running 21 minutes ago
To scale a service, for example from 3 replicas to 5, use docker service scale proj_web=5:
root@manager1:~# docker service scale proj_web=5
proj_web scaled to 5
overall progress: 5 out of 5 tasks
1/5: running   [==================================================>]
2/5: running   [==================================================>]
3/5: running   [==================================================>]
4/5: running   [==================================================>]
5/5: running   [==================================================>]
verify: Service converged
Check the services:
root@manager1:~# docker service ls
ID            NAME             MODE        REPLICAS  IMAGE                            PORTS
lhxiyrevf4dp  proj_visualizer  replicated  1/1       dockersamples/visualizer:stable  *:8080->8080/tcp
uxo0piooz4it  proj_web         replicated  5/5       nginx:latest                     *:80->80/tcp
root@manager1:~# docker service ps proj_web
ID            NAME        IMAGE         NODE      DESIRED STATE  CURRENT STATE          ERROR  PORTS
6jqcnikrh39j  proj_web.1  nginx:latest  worker1   Running        Running 4 minutes ago
o7gq6w3wisbk  proj_web.2  nginx:latest  worker2   Running        Running 4 minutes ago
15hzrtciynx7  proj_web.3  nginx:latest  manager1  Running        Running 4 minutes ago
zjtcv1e68aag  proj_web.4  nginx:latest  worker1   Running        Running 2 minutes ago
7mvta3g7m8fm  proj_web.5  nginx:latest  worker2   Running        Running 2 minutes ago
To scale down, simply set a smaller number, for example back to 3 replicas:
root@manager1:~# docker service scale proj_web=3
proj_web scaled to 3
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged
root@manager1:~# docker service ls
ID            NAME             MODE        REPLICAS  IMAGE                            PORTS
lhxiyrevf4dp  proj_visualizer  replicated  1/1       dockersamples/visualizer:stable  *:8080->8080/tcp
uxo0piooz4it  proj_web         replicated  3/3       nginx:latest                     *:80->80/tcp
root@manager1:~# docker service ps proj_web
ID            NAME        IMAGE         NODE      DESIRED STATE  CURRENT STATE          ERROR  PORTS
15hzrtciynx7  proj_web.3  nginx:latest  manager1  Running        Running 6 minutes ago
zjtcv1e68aag  proj_web.4  nginx:latest  worker1   Running        Running 4 minutes ago
7mvta3g7m8fm  proj_web.5  nginx:latest  worker2   Running        Running 4 minutes ago
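When scaling from a script, the REPLICAS column can be checked to see whether the service has reached the requested size. The sketch below assumes the value has the running/desired form shown above (for example 3/3); replicas_converged is a hypothetical helper, not a Docker command.

```shell
#!/bin/sh
# replicas_converged "RUNNING/DESIRED": succeed when the number of running
# replicas equals the desired number.
replicas_converged() {
    running=${1%%/*}      # text before the first "/"
    desired=${1##*/}      # text after the last "/"
    desired=${desired%% *} # drop any trailing annotation after a space
    [ "$running" = "$desired" ]
}

for r in 2/3 3/3 5/5; do
    if replicas_converged "$r"; then
        echo "$r converged"
    else
        echo "$r still converging"
    fi
done
```

On a manager node the value could be read with docker service ls --filter name=proj_web --format '{{.Replicas}}' and passed to the helper in a polling loop.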
Use docker stack rm xxx to remove a stack and its services:
root@manager1:~# docker stack rm proj
Removing service proj_visualizer
Removing service proj_web
Removing network proj_webnet
root@manager1:~# docker stack ls
NAME  SERVICES
root@manager1:~# docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
The webnet network has been removed as well:
root@worker2:~# docker network ls
NETWORK ID    NAME             DRIVER   SCOPE
feff2a6d1503  bridge           bridge   local
2f0e401d10a6  docker_gwbridge  bridge   local
7a60d8ee6f8b  host             host     local
73ha87ntxcd5  ingress          overlay  swarm
98442c4c5766  none             null     local
-----------------------
For example, I stop a container on 202:
root@worker2:~# docker stop 505
505
root@manager1:~# docker stack ps proj
ID            NAME               IMAGE                            NODE      DESIRED STATE  CURRENT STATE           ERROR  PORTS
quexa31y9kkt  proj_visualizer.1  dockersamples/visualizer:stable  manager1  Running        Running 31 minutes ago
4ozvbgpkrqtb  proj_web.1         nginx:latest                     worker2   Shutdown       Complete 3 seconds ago
skkh5uzsyl9o  proj_web.2         nginx:latest                     manager1  Running        Running 32 minutes ago
uns9r5vdx1x4  proj_web.3         nginx:latest                     worker1   Running        Running 32 minutes ago
The task on worker2 is now Shutdown.
I expected a new container to be started after a while, but none was:
root@manager1:~# docker stack ps proj
ID            NAME               IMAGE                            NODE      DESIRED STATE  CURRENT STATE           ERROR  PORTS
quexa31y9kkt  proj_visualizer.1  dockersamples/visualizer:stable  manager1  Running        Running 37 minutes ago
4ozvbgpkrqtb  proj_web.1         nginx:latest                     worker2   Shutdown       Complete 6 minutes ago
skkh5uzsyl9o  proj_web.2         nginx:latest                     manager1  Running        Running 38 minutes ago
uns9r5vdx1x4  proj_web.3         nginx:latest                     worker1   Running        Running 38 minutes ago
I expected that starting the container on 202 again would restore the task, but it did not:
root@worker2:~# docker start 505
505
root@manager1:~# docker stack ps proj
ID            NAME               IMAGE                            NODE      DESIRED STATE  CURRENT STATE           ERROR  PORTS
quexa31y9kkt  proj_visualizer.1  dockersamples/visualizer:stable  manager1  Running        Running 40 minutes ago
4ozvbgpkrqtb  proj_web.1         nginx:latest                     worker2   Shutdown       Complete 8 minutes ago
skkh5uzsyl9o  proj_web.2         nginx:latest                     manager1  Running        Running 41 minutes ago
uns9r5vdx1x4  proj_web.3         nginx:latest                     worker1   Running        Running 41 minutes ago
What about removing the container? Still nothing:
root@worker2:~# docker rm -f 505
505
root@worker2:~# docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
root@manager1:~# docker stack ps proj
ID            NAME               IMAGE                            NODE      DESIRED STATE  CURRENT STATE        ERROR  PORTS
quexa31y9kkt  proj_visualizer.1  dockersamples/visualizer:stable  manager1  Running        Running 1 hours ago
4ozvbgpkrqtb  proj_web.1         nginx:latest                     worker2   Shutdown       Complete 1 hours ago
skkh5uzsyl9o  proj_web.2         nginx:latest                     manager1  Running        Running 1 hours ago
uns9r5vdx1x4  proj_web.3         nginx:latest                     worker1   Running        Running 1 hours ago
Although that task is down, the service is still reachable through this node, for example with curl http://172.16.0.202:
root@api:~# curl http://172.16.0.174
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
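After several tests I found that a killed container is detected and replaced right away:
Kill the container on 202:
root@worker2:~# docker kill 2ff47
2ff47
root@worker2:~# docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
The replica count drops from 3 to 2:
root@manager1:~# docker service ls
ID            NAME             MODE        REPLICAS  IMAGE                            PORTS
lhxiyrevf4dp  proj_visualizer  replicated  1/1       dockersamples/visualizer:stable  *:8080->8080/tcp
uxo0piooz4it  proj_web         replicated  2/3       nginx:latest                     *:80->80/tcp
A few seconds later, however, a new container is started and the 3 replicas are restored. docker service ps still lists the task that was shut down:
root@manager1:~# docker service ls
ID            NAME             MODE        REPLICAS  IMAGE                            PORTS
lhxiyrevf4dp  proj_visualizer  replicated  1/1       dockersamples/visualizer:stable  *:8080->8080/tcp
uxo0piooz4it  proj_web         replicated  3/3       nginx:latest                     *:80->80/tcp
root@manager1:~# docker service ps proj_web
ID            NAME            IMAGE         NODE      DESIRED STATE  CURRENT STATE           ERROR                        PORTS
15hzrtciynx7  proj_web.3      nginx:latest  manager1  Running        Running 9 minutes ago
zjtcv1e68aag  proj_web.4      nginx:latest  worker1   Running        Running 6 minutes ago
gu528xwks2h8  proj_web.5      nginx:latest  worker2   Running        Running 22 seconds ago
7mvta3g7m8fm   \_ proj_web.5  nginx:latest  worker2   Shutdown       Failed 28 seconds ago   "task: non-zero exit (137)"
So a container stopped with docker stop is not replaced by the swarm, while a killed one is. My first guess was that docker stop is treated as a deliberate manual action, or that this is a bug. The task states point to a more likely cause: the stopped task ends as Complete (a clean exit), the killed task ends as Failed with a non-zero exit code (137), and the compose file sets restart_policy condition: on-failure, under which only failed tasks are restarted.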
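The exit code 137 in the task error can be reproduced without Docker: a process killed by a signal reports 128 plus the signal number, and SIGKILL is signal 9. The sketch below illustrates that arithmetic and how an on-failure style policy would classify an exit code; describe_exit is a hypothetical helper, and its wording is an interpretation of the behavior observed above rather than Docker's own logic.

```shell
#!/bin/sh
# describe_exit CODE: describe how a restart-on-failure policy would be
# expected to treat a process that ended with exit code CODE.
describe_exit() {
    if [ "$1" -eq 0 ]; then
        echo "exit $1: clean exit, not restarted under on-failure"
    elif [ "$1" -gt 128 ]; then
        echo "exit $1: killed by signal $(( $1 - 128 )), restarted under on-failure"
    else
        echo "exit $1: failure, restarted under on-failure"
    fi
}

# A child shell that sends SIGKILL to itself ends with status 137 (128 + 9).
code=0
sh -c 'kill -9 $$' || code=$?
describe_exit "$code"

# A clean exit, as reported for the container stopped with docker stop.
describe_exit 0
```

This matches the listings: the killed container shows "non-zero exit (137)" and is replaced, while the stopped one is reported as Complete and is left alone.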
Alternatively, running docker service update proj_web also brings back a task whose container was stopped with docker stop:
root@manager1:~# docker service update proj_web
proj_web
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3:
3/3: running   [==================================================>]
verify: Service converged
It also cleans up the tasks that had exited for various reasons. Below, the earlier entry "7mvta3g7m8fm \_ proj_web.5 nginx:latest worker2 Shutdown Failed 5 minutes ago "task: non-zero exit (137)"" is gone:
root@manager1:~# docker service ps proj_web
ID            NAME        IMAGE         NODE      DESIRED STATE  CURRENT STATE           ERROR  PORTS
r5ievoac5x3o  proj_web.1  nginx:latest  worker2   Running        Running 11 seconds ago
15hzrtciynx7  proj_web.3  nginx:latest  manager1  Running        Running 16 minutes ago
zjtcv1e68aag  proj_web.4  nginx:latest  worker1   Running        Running 13 minutes ago
On 202, the exited container is gone as well:
root@worker2:~# docker ps -a
CONTAINER ID  IMAGE         COMMAND                 CREATED        STATUS        PORTS   NAMES
69346c5e694e  nginx:latest  "nginx -g 'daemon of…"  2 minutes ago  Up 2 minutes  80/tcp  proj_web.1.r5ievoac5x3obvgelod00o9vw
I ran into another problem: when I tried to enter a container with attach, it hung, and after I interrupted it manually the container died:
root@manager1:~# docker attach proj_web.3.15hzrtciynx72plzhjula2cd5
^C
root@manager1:~#
root@manager1:~# docker service ls
ID            NAME             MODE        REPLICAS  IMAGE                            PORTS
lhxiyrevf4dp  proj_visualizer  replicated  1/1       dockersamples/visualizer:stable  *:8080->8080/tcp
uxo0piooz4it  proj_web         replicated  2/3       nginx:latest                     *:80->80/tcp
root@manager1:~# docker stack ps proj
ID            NAME               IMAGE                            NODE      DESIRED STATE  CURRENT STATE            ERROR  PORTS
r5ievoac5x3o  proj_web.1         nginx:latest                     worker2   Running        Running 18 hours ago
ige19sbe6zma  proj_visualizer.1  dockersamples/visualizer:stable  manager1  Running        Running 19 hours ago
15hzrtciynx7  proj_web.3         nginx:latest                     manager1  Shutdown       Complete 18 seconds ago
zjtcv1e68aag  proj_web.4         nginx:latest                     worker1   Running        Running 19 hours ago
proj_web.3 is now Shutdown with state Complete, and the cluster does not restart it, just as with the manual stop earlier.
docker service update also hangs and does not start a replacement for the dead container:
root@manager1:~# docker service update proj_web
proj_web
overall progress: 2 out of 3 tasks
1/3:
2/3: running   [==================================================>]
3/3: running   [==================================================>]
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
root@manager1:~# docker service ps proj_web
ID            NAME        IMAGE         NODE      DESIRED STATE  CURRENT STATE            ERROR  PORTS
r5ievoac5x3o  proj_web.1  nginx:latest  worker2   Running        Running 19 hours ago
15hzrtciynx7  proj_web.3  nginx:latest  manager1  Shutdown       Complete 12 minutes ago
zjtcv1e68aag  proj_web.4  nginx:latest  worker1   Running        Running 19 hours ago
docker service scale proj_web=4 hangs as well: the newly added container starts, but the dead one stays unreachable. I suspected a problem with the service's internal network:
root@manager1:~# docker service scale proj_web=4
proj_web scaled to 4
overall progress: 3 out of 4 tasks
1/4: running   [==================================================>]
2/4:
3/4: running   [==================================================>]
4/4: running   [==================================================>]
Suspecting the container network, I tried deleting the problematic container:
root@manager1:~# docker ps -a
CONTAINER ID  IMAGE                            COMMAND                 CREATED             STATUS                     PORTS     NAMES
ecd4a8bebc9c  nginx:latest                     "nginx -g 'daemon of…"  About a minute ago  Up About a minute          80/tcp    proj_web.2.xwmvxfs23ijab0wessptod8d7
d58eb620e9aa  nginx:latest                     "nginx -g 'daemon of…"  19 hours ago        Exited (0) 21 minutes ago            proj_web.3.15hzrtciynx72plzhjula2cd5
1879b088ca37  dockersamples/visualizer:stable  "npm start"             19 hours ago        Up 19 hours                8080/tcp  proj_visualizer.1.ige19sbe6zma322oplynf7csp
5bc982461a41  dockersamples/visualizer:stable  "npm start"             27 hours ago        Exited (0) 19 hours ago              proj_visualizer.1.quexa31y9kkt1m4a4pqcwskzt
root@manager1:~# docker rm d58
d58
root@manager1:~# docker service update proj_web
proj_web
overall progress: 3 out of 4 tasks
1/4:
2/4: running   [==================================================>]
3/4: running   [==================================================>]
4/4: running   [==================================================>]
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
##### Deleting the container does not help either: the service still has 4 replicas, and one of them stays unreachable
root@manager1:~# docker rm 5bc
5bc
root@manager1:~# docker service update proj_web
proj_web
overall progress: 3 out of 4 tasks
1/4:
2/4: running   [==================================================>]
3/4: running   [==================================================>]
4/4: running   [==================================================>]
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
root@manager1:~# docker service scale proj_web=3
proj_web scaled to 3
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged
##### After resetting to 3 replicas everything is OK again, which I take as a sign that the problem was tied to the broken container (I suspected its network)
root@manager1:~# docker service ps proj_web
ID            NAME        IMAGE         NODE      DESIRED STATE  CURRENT STATE          ERROR  PORTS
r5ievoac5x3o  proj_web.1  nginx:latest  worker2   Running        Running 19 hours ago
xwmvxfs23ija  proj_web.2  nginx:latest  manager1  Running        Running 4 minutes ago
zjtcv1e68aag  proj_web.4  nginx:latest  worker1   Running        Running 19 hours ago
root@manager1:~#