In 《Kubernetes探秘-多master节点容错部署》 I described how to improve Kubernetes' fault tolerance by deploying multiple master nodes. The crux is that the etcd service holding the cluster's control data must replicate in real time across the member nodes, while kube-apiserver fails its primary IP address over between nodes via keepalived. 《Kubernetes探秘-etcd节点和实例扩容》 already covered the steps for scaling etcd to multiple nodes in detail, but in practice I found that the operations must be performed in a particular order and with attention to detail; otherwise the scale-out fails, and it is easy to leave the entire etcd cluster unreachable. This article shares the hands-on experience I gathered while scaling an etcd cluster.
All connections to the etcd cluster use HTTPS, so certificates must be generated and configured; the procedure in 《Kubernetes探秘-etcd节点和实例扩容》 applies, and the certificate files must be copied into /etc/kubernetes/pki on every node. Every node also needs a fixed IP address. On Ubuntu 18.04 this is configured with Netplan (see 《Ubuntu 18.04设置静态IP》); sudo netplan apply makes it take effect immediately (allow a moment for the configuration to finish).
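As a sketch of the static-IP step (the interface name enp3s0, gateway, and DNS below are assumptions for illustration; see 《Ubuntu 18.04设置静态IP》 and adjust to your host), a Netplan file such as /etc/netplan/01-netcfg.yaml might look like:

```yaml
# Illustrative Netplan static-IP config; interface name and gateway are assumptions.
network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0:
      dhcp4: no
      addresses: [10.1.1.201/24]
      gateway4: 10.1.1.1
      nameservers:
        addresses: [10.1.1.1]
```

Apply it with sudo netplan apply as noted above.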
Copy the certificate directory to the node, for example:
sudo scp -r root@10.1.1.201:/etc/kubernetes/pki /etc/kubernetes
We first install a single Kubernetes master node; the configuration of the other nodes will be derived from it. Reference:
With the preparation done, install a single-instance Kubernetes master node with the following command.
sudo kubeadm init --kubernetes-version=v1.13.1 --apiserver-advertise-address=10.1.1.199
Because my machine has multiple network interfaces, --apiserver-advertise-address=10.1.1.199 pins the apiserver's service address. This address is also keepalived's virtual IP (keepalived must be installed beforehand; see 《Keepalived快速使用》): on failure the address automatically floats to another node, so the remaining nodes can still reach the apiserver.
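As a sketch of that keepalived setup (the interface name, router ID, and priority below are assumptions; see 《Keepalived快速使用》 for the full procedure), the VRRP instance floating 10.1.1.199 might be configured in /etc/keepalived/keepalived.conf like this:

```
vrrp_instance VI_1 {
    state MASTER            # BACKUP on the standby masters
    interface enp3s0        # assumption: adjust to your NIC
    virtual_router_id 51
    priority 100            # lower value on the standby masters
    advert_int 1
    virtual_ipaddress {
        10.1.1.199
    }
}
```

The node with the highest priority holds 10.1.1.199; when it fails, VRRP moves the address to the next node.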
Run kubectl get pod --all-namespaces to check that this single-instance cluster is running normally.
The master node already runs one etcd instance, which stores the cluster's core state. To avoid data loss during the etcd scale-out, back it up first; see 《Kubernetes的etcd数据查看和迁移》 for the details. Note that backup and restore differ between the etcd v2 and v3 APIs; Kubernetes 1.13.0 and later use the v3 API, and all commands below are the v3 variants.
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.202]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    snapshot save /home/supermap/openthings/etcd$(date +%Y%m%d_%H%M%S)_snapshot.db
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.199]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --data-dir=/var/lib/etcd \
    --initial-advertise-peer-urls=https://10.1.1.199:2380 \
    --initial-cluster=podc01=https://10.1.1.199:2380 \
    --initial-cluster-token=etcd-cluster \
    --name=podc01 \
    snapshot restore /home/supermap/etcd_snapshot.db
You may name the backup file anything you like, as long as the restore command refers to the same file.
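As a small sketch of the naming pattern used above (the /tmp path is only an example), the timestamped file name can be generated once into a variable and reused for both the save and a post-save integrity check:

```shell
# Build a timestamped snapshot path like the one used in the save command above.
SNAP="/tmp/etcd$(date +%Y%m%d_%H%M%S)_snapshot.db"
echo "$SNAP"
# After saving, the snapshot can be sanity-checked (hash, revision, key count):
#   ETCDCTL_API=3 etcdctl snapshot status "$SNAP" -w table
```

Checking the snapshot before the risky steps below gives confidence that the restore path actually works.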
This involves the following steps:
First, check the running state of the etcd cluster:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.199]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    endpoint status -w table
Then, update the etcd member's peer-urls:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --endpoints=https://[10.1.1.199]:2379 \
    member update podc01 --peer-urls=https://10.1.1.201:2380
Third, change the etcd instance's client-urls: stop kubelet, edit the etcd manifest as shown below, then start kubelet again.
sudo systemctl stop kubelet
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.1.1.201:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.pem
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.1.1.201:2380
    - --initial-cluster=podc01=https://10.1.1.201:2380
    - --key-file=/etc/kubernetes/pki/etcd/server-key.pem
    - --listen-client-urls=https://127.0.0.1:2379,https://10.1.1.201:2379
    - --listen-peer-urls=https://10.1.1.201:2380
    - --name=podc01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer1.pem
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer1-key.pem
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.pem --cert=/etc/kubernetes/pki/etcd/client.pem --key=/etc/kubernetes/pki/etcd/client-key.pem get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd-certs
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
sudo systemctl start kubelet
Check the etcd service status:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    endpoint status -w table
Modify the /etc/kubernetes/manifests/kube-apiserver.yaml file as follows:
    # - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    # - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    # - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    # - --etcd-servers=https://127.0.0.1:2379
    - --etcd-cafile=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --etcd-certfile=/etc/kubernetes/pki/etcd-certs/client.pem
    - --etcd-keyfile=/etc/kubernetes/pki/etcd-certs/client-key.pem
    - --etcd-servers=https://10.1.1.201:2379
This points kube-apiserver at the new etcd service address.
Restart kubelet, as follows:
# Restart the kubelet service.
sudo systemctl restart kubelet
# List the running containers.
docker ps
# List all containers, including stopped ones;
# if etcd failed to start and exited, find it here.
docker ps -a
# View the log of a specific container.
docker logs idxxxx
Check the etcd status again:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    endpoint status -w table
Check the Kubernetes cluster status (kubectl get pod --all-namespaces).
Now join the remaining nodes one by one (each etcd node's IP address must already be included in the certificates generated earlier).
I let Kubernetes' kubelet host the etcd instances (etcd could also run as a standalone system service, e.g. under systemd).
Use kubeadm join to add each new node (this sets up the base kubelet service, so the machine serves as both an etcd node and a Kubernetes node). Obtain the join command on the master node, as follows:
# Run on the master node.
kubeadm token create --print-join-command
Copy the master node's /etc/kubernetes/pki directory directly to the child node, as follows:
# Run on the child node.
sudo scp -r root@10.1.1.201:/etc/kubernetes/pki /etc/kubernetes/
Stop the kubelet on the child node first. The command is:
sudo systemctl stop kubelet
Add the new member with etcdctl's member add command:
# Run on the child node: register its peer-urls with the etcd cluster.
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --endpoints=https://[10.1.1.201]:2379 \
    member add podc02 --peer-urls=https://10.1.1.202:2380
At this point etcdctl member list shows the new member as unstarted. The command is:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    member list -w table
Place the etcd.yaml file into /etc/kubernetes/manifests on each child node, just as on the master node, then restart kubelet with sudo systemctl restart kubelet. At startup, kubelet runs every *.yaml under /etc/kubernetes/manifests as a static pod (deleting a static pod from the Dashboard only removes the current instance, which kubelet restarts automatically; it is never removed permanently).
# Run on the child node.
sudo scp -r root@10.1.1.201:/etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/manifests/
sudo nano /etc/kubernetes/manifests/etcd.yaml
# /etc/kubernetes/manifests/etcd.yaml on child node podc02
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.1.1.202:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.pem
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.1.1.202:2380
    - --initial-cluster=podc01=https://10.1.1.201:2380,podc02=https://10.1.1.202:2380
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server-key.pem
    - --listen-client-urls=https://127.0.0.1:2379,https://10.1.1.202:2379
    - --listen-peer-urls=https://10.1.1.202:2380
    - --name=podc02
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer2.pem
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer2-key.pem
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.pem --cert=/etc/kubernetes/pki/etcd/client.pem --key=/etc/kubernetes/pki/etcd/client-key.pem get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd-certs
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
Once the etcd parameters are confirmed correct, the kubelet service can be started. The command is:
sudo systemctl start kubelet
! Repeat the procedure in 3.1-3.6 above to join every remaining etcd node to the cluster (follow the order strictly).
You can install etcd-client on the host; etcdctl can then connect directly to the etcd services running inside the containers.
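A minimal sketch, assuming an apt-based host (Ubuntu 18.04, where the client tools ship in the etcd-client package): install the package, then select the v3 API once per shell instead of prefixing every command:

```shell
# Install the host-side client tools (assumption: Ubuntu's etcd-client package):
#   sudo apt-get update && sudo apt-get install -y etcd-client
# The packaged etcdctl defaults to the v2 API; select v3 once for this shell:
export ETCDCTL_API=3
echo "$ETCDCTL_API"
```

With the variable exported, the commands below can drop the ETCDCTL_API=3 prefix.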
List the etcd cluster members:
# etcd cluster member list
echo ""
echo "============================="
echo "+ etcd cluster member list..."
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    member list -w table --endpoints=https://[10.1.1.201]:2379
The output looks like this:
=============================
+ etcd cluster member list...
+------------------+---------+--------+-------------------------+-------------------------+
|        ID        | STATUS  |  NAME  |       PEER ADDRS        |      CLIENT ADDRS       |
+------------------+---------+--------+-------------------------+-------------------------+
|  741ead392743e35 | started | podc02 | https://10.1.1.202:2380 | https://10.1.1.202:2379 |
| 72077d56570df47f | started | podc01 | https://10.1.1.201:2380 | https://10.1.1.201:2379 |
| dfc70cacefa4fbbb | started | podc04 | https://10.1.1.204:2380 | https://10.1.1.204:2379 |
| e3ecb8f6d5866785 | started | podc03 | https://10.1.1.203:2380 | https://10.1.1.203:2379 |
+------------------+---------+--------+-------------------------+-------------------------+
Check the status of the etcd cluster endpoints:
# endpoint status, all members
echo ""
echo "========================="
echo "+ etcd cluster status... "
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --endpoints=https://[10.1.1.201]:2379,https://[10.1.1.202]:2379,https://[10.1.1.203]:2379,https://[10.1.1.204]:2379 \
    endpoint status -w table
The output looks like this:
=========================
+ etcd cluster status...
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://[10.1.1.201]:2379 | 72077d56570df47f |  3.2.24 |  4.2 MB |      true |      1875 |     253980 |
| https://[10.1.1.202]:2379 |  741ead392743e35 |  3.2.24 |  4.2 MB |     false |      1875 |     253980 |
| https://[10.1.1.203]:2379 | e3ecb8f6d5866785 |  3.2.24 |  4.2 MB |     false |      1875 |     253980 |
| https://[10.1.1.204]:2379 | dfc70cacefa4fbbb |  3.2.24 |  4.2 MB |     false |      1875 |     253980 |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
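Rather than repeating the four endpoints in every command, the list can be built once; a hedged sketch (the health check in the trailing comment requires the live cluster and certificates):

```shell
# Build the comma-separated client-endpoint list used by the checks above.
ENDPOINTS=""
for ip in 10.1.1.201 10.1.1.202 10.1.1.203 10.1.1.204; do
  ENDPOINTS="${ENDPOINTS:+$ENDPOINTS,}https://[$ip]:2379"
done
echo "$ENDPOINTS"
# Then, for example:
#   ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINTS" \
#       --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
#       --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
#       --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
#       endpoint health
```

endpoint health reports per-member reachability, complementing the endpoint status table above.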
⚠️ Note:
Next:
References: