详谈kubernetes更新-2

时间 2019-11-11

标签详谈 kubernetes 更新繁體版

原文原文链接

系列目录html

本文详细探索deployment在滚动更新时候的行为node

要详细探讨的参数描述:nginx

livenessProbe：存活性探测。判断pod是否已经中止
readinessProbe：就绪性探测。判断pod是否可以提供正常服务
maxSurge：在滚动更新过程当中最多能够存在的pod数
maxUnavailable：在滚动更新过程当中最多不可用的pod数

准备镜像和yaml文件

首先准备2个不一样版本的镜像，用于测试（已经在阿里云上建立好2个不一样版本的nginx镜像）docker

docker pull registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
docker pull registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1

2个镜像都提供相同的服务，只不过nginx:delay_v1会延迟启动20才启动nginx后端

root@k8s-master:~# docker run -d --rm -p 10080:80 nginx:v1
e88097841c5feef92e4285a2448b943934ade5d86412946bc8d86e262f80a050
root@k8s-master:~# curl http://127.0.0.1:10080
----------
version: v1
hostname: f5189a5d3ad3

yaml文件：api

root@k8s-master:~# more roll_update.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
    selector:
      app: roll-update
    ports:
    - protocol: TCP
      port: 10080
      targetPort: 80

livenessProbe与readinessProbe

livenessProbe：存活性探测，最主要是用来探测pod是否须要重启bash
readinessProbe：就绪性探测，用来探测pod是否已经可以提供服务app
- 在滚动更新的过程当中，pod会动态的被delete，而后又被create出来。存活性探测保证了始终有足够的pod存活提供服务，一旦出现pod数量不足，k8s会当即拉起新的podcurl
- 可是在pod启动的过程当中，服务正在打开，并不可用，这时候若是有流量打过来，就会形成报错tcp

下面来模拟一下这个场景：

首先apply上述的配置文件

root@k8s-master:~# kubectl apply -f roll_update.yaml
deployment.extensions "update-deployment" created
service "nginx-service" created
root@k8s-master:~# kubectl get pod -owide
NAME                                 READY     STATUS    RESTARTS   AGE       IP              NODE
update-deployment-7db77f7cc6-c4s2v   1/1       Running   0          28s       10.10.235.232   k8s-master
update-deployment-7db77f7cc6-nfgtd   1/1       Running   0          28s       10.10.36.82     k8s-node1
update-deployment-7db77f7cc6-tflfl   1/1       Running   0          28s       10.10.169.158   k8s-node2
root@k8s-master:~# kubectl get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
nginx-service   ClusterIP   10.254.254.199   <none>        10080/TCP   1m

从新打开终端，测试当前服务的可用性（每秒作一次循环去获取nginx的服务内容）：

root@k8s-master:~# while :; do curl http://10.254.254.199:10080; sleep 1; done
----------
version: v1
hostname: update-deployment-7db77f7cc6-nfgtd
----------
version: v1
hostname: update-deployment-7db77f7cc6-c4s2v
----------
version: v1
hostname: update-deployment-7db77f7cc6-tflfl
----------
version: v1
hostname: update-deployment-7db77f7cc6-nfgtd
...

这时候把镜像版本更新到nginx:delay_v1，这个镜像会延迟启动nginx，也就是说，会先sleep 20s，而后才去启动nginx服务。这就模拟了在服务启动过程当中，虽然pod已是存在的状态，可是并无真正提供服务

root@k8s-master:~# kubectl patch deployment update-deployment --patch '{"metadata":{"annotations":{"kubernetes.io/change-cause":"update version to v2"}} ,"spec": {"template": {"spec": {"containers": [{"name": "nginx","image":"registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1"}]}}}}'
deployment.extensions "update-deployment" patched

...
----------
version: v1
hostname: update-deployment-7db77f7cc6-h6hvt
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-6th87
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-n22vz
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-njmpz
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-6th87

能够看到，因为延迟启动，nginx并无真正作好准备提供服务，此时流量已经发到后端，致使服务不可用的状态

因此，加入readinessProbe是很是必要的举措：

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
    selector:
      app: roll-update
    ports:
    - protocol: TCP
      port: 10080
      targetPort: 80

重复上述步骤，先建立nginx:v1，而后patch到nginx:delay_v1

root@k8s-master:~# kubectl apply -f roll_update.yaml
deployment.extensions "update-deployment" created
service "nginx-service" created
root@k8s-master:~# kubectl patch deployment update-deployment --patch '{"metadata":{"annotations":{"kubernetes.io/change-cause":"update version to v2"}} ,"spec": {"template": {"spec": {"containers": [{"name": "nginx","image":"registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1"}]}}}}'
deployment.extensions "update-deployment" patched

root@k8s-master:~# kubectl get pod -owide
NAME                                 READY     STATUS        RESTARTS   AGE       IP              NODE
busybox                              1/1       Running       0          45d       10.10.235.255   k8s-master
lifecycle-demo                       1/1       Running       0          32d       10.10.169.186   k8s-node2
private-reg                          1/1       Running       0          92d       10.10.235.209   k8s-master
update-deployment-54d497b7dc-4mlqc   0/1       Running       0          13s       10.10.169.178   k8s-node2
update-deployment-54d497b7dc-pk4tb   0/1       Running       0          13s       10.10.36.98     k8s-node1
update-deployment-6d5d7c9947-l7dkb   1/1       Terminating   0          1m        10.10.169.177   k8s-node2
update-deployment-6d5d7c9947-pbzmf   1/1       Running       0          1m        10.10.36.97     k8s-node1
update-deployment-6d5d7c9947-zwt4z   1/1       Running       0          1m        10.10.235.246   k8s-master

因为设置了readinessProbe，虽然pod已经启动起来了，可是并不会当即投入使用，因此出现了 READY: 0/1 的状况
而且有pod出现了一直持续Terminating状态，由于滚动更新的限制，至少要保证有pod可用

再查看curl的状态，image的版本平滑更新到了nginx:delay_v1，没有出现报错的情况

root@k8s-master:~# while :; do curl http://10.254.66.136:10080; sleep 1; done
...
version: v1
hostname: update-deployment-6d5d7c9947-pbzmf
----------
version: v1
hostname: update-deployment-6d5d7c9947-zwt4z
----------
version: v1
hostname: update-deployment-6d5d7c9947-pbzmf
----------
version: v1
hostname: update-deployment-6d5d7c9947-zwt4z
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-pk4tb
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-4mlqc
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-pk4tb
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-4mlqc
...

maxSurge与maxUnavailable

在滚动更新中，有几种更新方案：先删除老的pod，而后添加新的pod；先添加新的pod，而后删除老的pod。在这个过程当中，服务必须是可用的（也就是livenessProbe与readiness必须检测经过）
在具体的实施中，由maxSurge与maxUnavailable来控制到底是先删老的仍是先加新的以及粒度
若指定的副本数为3：
maxSurge=1 maxUnavailable=0：最多容许存在4个（3+1）pod，必须有3个pod（3-0）同时提供服务。先建立一个新的pod，可用以后删除老的pod，直至所有更新完毕
maxSurge=0 maxUnavailable=1：最多容许存在3个（3+0）pod，必须有2个pod（3-1）同时提供服务。先删除一个老的pod，而后建立新的pod，直至所有更新完毕
归根结底，必须知足maxSurge与maxUnavailable的条件，若是maxSurge与maxUnavailable同时为0，那就无法更新了，由于又不让删除，也不让添加，这种条件是没法知足的