A Kubernetes cluster installed with kubeadm has only a single etcd instance by default, which is a single point of failure. Ways to improve the availability of a Kubernetes cluster include: 1. backups (see "Kubernetes探秘—etcd状态数据及其备份"); 2. scaling out etcd nodes and instances; 3. running the apiserver on multiple nodes behind a load balancer. This article focuses on scaling out the etcd nodes and instances.
etcd is a standalone service. When used inside Kubernetes, its configuration and data directories are mapped to host directories, and the pod uses hostNetwork (the host's own network). Specifically, /etc/kubernetes/manifests/etcd.yaml is the static pod manifest containing the startup parameters, /etc/kubernetes/pki/etcd holds the certificates used for HTTPS, and /var/lib/etcd holds this node's etcd data files.
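For orientation, these mappings can be confirmed directly on a kubeadm master (a minimal check; the exact contents vary by version):

# Static pod manifest the kubelet watches for etcd
ls -l /etc/kubernetes/manifests/etcd.yaml
# Certificates for the etcd HTTPS endpoints
ls -l /etc/kubernetes/pki/etcd/
# etcd data directory on the host
ls -l /var/lib/etcd/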
For a single-master Kubernetes cluster installed with kubeadm, there is only one running etcd instance. We want to expand it to multiple instances to reduce the risk of a single point of failure. The approach to scaling out etcd in Kubernetes is as follows:
Prepare the nodes that will run etcd. I use Ubuntu 18.04 LTS with Docker CE 18.06 and Kubernetes 1.12.3 installed.
The three nodes here are:
The container images used by Kubernetes need to be pulled to every node in advance. Reference:
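As a hedged sketch (separate from the reference above; the exact image list depends on your Kubernetes version), the required images can be listed with kubeadm and the etcd image used later in this article pulled explicitly:

# List the images required by this kubeadm version (assumes kubeadm 1.12)
kubeadm config images list
# Pull the etcd image referenced by the manifests below
docker pull k8s.gcr.io/etcd-amd64:3.2.18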
I first tried copying the master node's /etc/kubernetes/pki and /etc/kubernetes/manifests directories to all of the secondary nodes, but after starting, various problems prevented normal access, with errors pointing to CA certificate issues. In the end I decided to create my own certificates and deployment YAML files from scratch.
The certificates are created with cfssl. You need to download the template files and adjust the definition files, including the CA, the ca-config configuration, the ca-key private key, the CSR request, and the configuration templates for the server/peer/client certificates, adapting the information in them to your own environment.
The detailed procedure is described below (see http://www.javashuo.com/article/p-zzahcksu-hh.html for more information).
mkdir ~/cfssl && cd ~/cfssl
mkdir bin && cd bin
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -O cfssl
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -O cfssljson
chmod +x {cfssl,cfssljson}
export PATH=$PATH:~/cfssl/bin
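A quick sanity check (not in the original steps) that both binaries are installed and on the PATH:

which cfssl cfssljson
cfssl version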
Create the directory for the certificate configuration files:
mkdir -p ~/cfssl/etcd-certs && cd ~/cfssl/etcd-certs
Generate the certificate configuration files and place them in the ~/cfssl/etcd-certs directory. The file templates are as follows:
# ==============================================
# ca-config.json
{
    "signing": {
        "default": {
            "expiry": "43800h"
        },
        "profiles": {
            "server": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth"
                ]
            },
            "client": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "client auth"
                ]
            },
            "peer": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}

# ==============================================
# ca-csr.json
{
    "CN": "My own CA",
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "US",
            "L": "CA",
            "O": "My Company Name",
            "ST": "San Francisco",
            "OU": "Org Unit 1",
            "OU": "Org Unit 2"
        }
    ]
}

# ==============================================
# server.json
{
    "CN": "etcd0",
    "hosts": [
        "127.0.0.1",
        "0.0.0.0",
        "10.1.1.201",
        "10.1.1.202",
        "10.1.1.203"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "US",
            "L": "CA",
            "ST": "San Francisco"
        }
    ]
}

# ==============================================
# peer1.json
# use this node's own IP here
{
    "CN": "etcd0",
    "hosts": [
        "10.1.1.201"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "US",
            "L": "CA",
            "ST": "San Francisco"
        }
    ]
}

# ==============================================
# client.json
{
    "CN": "client",
    "hosts": [
        ""
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "US",
            "L": "CA",
            "ST": "San Francisco"
        }
    ]
}
Run the following to generate the certificates:
cd ~/cfssl/etcd-certs
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer peer1.json | cfssljson -bare peer1
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client
Check the generated certificate files:
ls -l ~/cfssl/etcd-certs
The files include:
...
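As an optional check (my addition, not part of the original steps), verify that the server certificate's SANs cover all three node IPs, since a missing SAN is a common cause of TLS handshake failures:

# Inspect the SANs of the generated server certificate
openssl x509 -in ~/cfssl/etcd-certs/server.pem -noout -text | grep -A 2 'Subject Alternative Name'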
Before starting an etcd instance, be sure to empty the /var/lib/etcd directory; otherwise some of the configured parameters will not take effect and the instance will keep its previous state.
Note that several etcd parameters take effect only on the first start (initialization): these are the --initial-* flags, such as --initial-cluster, --initial-advertise-peer-urls, and --initial-cluster-state. A sketch of resetting the data directory follows.
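A minimal sketch of resetting the data directory on a node before (re)starting its etcd instance. The backup file names are illustrative, and this assumes you already have an etcd backup as mentioned at the start of the article:

# Move the manifest away so the kubelet stops the running etcd pod
mv /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
# Keep the old state aside instead of deleting it outright
mv /var/lib/etcd /var/lib/etcd.old && mkdir -p /var/lib/etcd
# Restore the (edited) manifest; the kubelet starts etcd with a clean data dir
mv /root/etcd.yaml.bak /etc/kubernetes/manifests/etcd.yaml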
Copy the ~/cfssl/etcd-certs directory to /etc/kubernetes/pki/etcd-certs on each node; you can upload it with scp or sftp.
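For example (assuming root ssh access and that /etc/kubernetes/pki already exists on each node), the copy can be done with scp in one loop:

for ip in 10.1.1.201 10.1.1.202 10.1.1.203; do
  scp -r ~/cfssl/etcd-certs root@${ip}:/etc/kubernetes/pki/
done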
Edit the /etc/kubernetes/manifests/etcd.yaml file; this is the static pod manifest the kubelet uses to start the etcd instance.
# /etc/kubernetes/manifests/etcd.yaml (first node, 10.1.1.201)
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.1.1.201:2379
    - --cert-file=/etc/kubernetes/pki/etcd-certs/server.pem
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.1.1.201:2380
    - --initial-cluster=etcd0=https://10.1.1.201:2380
    - --key-file=/etc/kubernetes/pki/etcd-certs/server-key.pem
    - --listen-client-urls=https://10.1.1.201:2379
    - --listen-peer-urls=https://10.1.1.201:2380
    - --name=etcd0   # must match the member name used in --initial-cluster
    - --peer-cert-file=/etc/kubernetes/pki/etcd-certs/peer1.pem
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd-certs/peer1-key.pem
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
    image: k8s.gcr.io/etcd-amd64:3.2.18
    imagePullPolicy: IfNotPresent
    #livenessProbe:
    #  exec:
    #    command:
    #    - /bin/sh
    #    - -ec
    #    - ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem
    #      --cert=/etc/kubernetes/pki/etcd-certs/client.pem --key=/etc/kubernetes/pki/etcd-certs/client-key.pem
    #      get foo
    #  failureThreshold: 8
    #  initialDelaySeconds: 15
    #  timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd-certs   # must match the certificate paths in the flags above
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd-certs
      type: DirectoryOrCreate
    name: etcd-certs
status: {}
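After saving the manifest, the kubelet restarts the static pod automatically. One way to confirm the instance came back up with the new flags (a sketch; the pod name carries the node's hostname as a suffix):

# The etcd container should be running with the new command-line flags
docker ps | grep etcd
# Or check the mirror pod through the API server
kubectl -n kube-system get pods | grep etcd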
Following the pattern above, adjust the etcd startup parameters in /etc/kubernetes/manifests/etcd.yaml on each of the secondary nodes.
Enter the running etcd container (on the first node) and run:
alias etcdv3="ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem --cert=/etc/kubernetes/pki/etcd-certs/client.pem --key=/etc/kubernetes/pki/etcd-certs/client-key.pem"
etcdv3 member add etcd1 --peer-urls="https://10.1.1.202:2380"
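member add also prints the ETCD_INITIAL_CLUSTER values expected by the new node. Before starting the second instance, the registration can be confirmed with the same alias (the new member shows up as registered but not yet started):

etcdv3 member list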
Copy the certificates from the first node (10.1.1.201) to the second node (10.1.1.202), copy peer1.json to peer2.json, and modify peer2.json:
# peer2.json
{
    "CN": "etcd1",
    "hosts": [
        "10.1.1.202"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "US",
            "L": "CA",
            "ST": "San Francisco"
        }
    ]
}
Regenerate the peer certificate for the second node (peer2) from peer2.json:
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer peer2.json | cfssljson -bare peer2
Start the etcd instance on the second node; its manifest is as follows:
# /etc/kubernetes/manifests/etcd.yaml (second node, 10.1.1.202)
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.1.1.202:2379
    - --cert-file=/etc/kubernetes/pki/etcd-certs/server.pem
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.1.1.202:2380
    - --initial-cluster=etcd0=https://10.1.1.201:2380,etcd1=https://10.1.1.202:2380
    - --key-file=/etc/kubernetes/pki/etcd-certs/server-key.pem
    - --listen-client-urls=https://10.1.1.202:2379
    - --listen-peer-urls=https://10.1.1.202:2380
    - --name=etcd1   # must match the name used in "member add"
    - --peer-cert-file=/etc/kubernetes/pki/etcd-certs/peer2.pem
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd-certs/peer2-key.pem
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --initial-cluster-state=existing   # do not put quotes around this value; that mistake cost me hours
    image: k8s.gcr.io/etcd-amd64:3.2.18
    imagePullPolicy: IfNotPresent
    # livenessProbe:
    #   exec:
    #     command:
    #     - /bin/sh
    #     - -ec
    #     - ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.202]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.crt
    #       --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd-certs/healthcheck-client.key
    #       get foo
    #   failureThreshold: 8
    #   initialDelaySeconds: 15
    #   timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd-certs   # must match the certificate paths in the flags above
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd-certs
      type: DirectoryOrCreate
    name: etcd-certs
status: {}
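Once the kubelet on the second node has started this pod, you can verify from inside the etcd container on the first node that both endpoints are healthy (a sketch using the certificate paths from the manifests above):

ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.1.1.201:2379,https://10.1.1.202:2379 \
  --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
  --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
  --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
  endpoint health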
To add the third member, enter the etcd container again and run:
alias etcdv3="ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem --cert=/etc/kubernetes/pki/etcd-certs/client.pem --key=/etc/kubernetes/pki/etcd-certs/client-key.pem"
etcdv3 member add etcd2 --peer-urls="https://10.1.1.203:2380"
Following the same steps as above, add the third instance (etcd2, 10.1.1.203), then check the cluster health:
# etcdctl --endpoints=https://[10.1.1.201]:2379 --ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem --cert-file=/etc/kubernetes/pki/etcd-certs/client.pem --key-file=/etc/kubernetes/pki/etcd-certs/client-key.pem cluster-health
member 5856099674401300 is healthy: got healthy result from https://10.1.1.201:2379
member df99f445ac908d15 is healthy: got healthy result from https://10.1.1.202:2379
cluster is healthy
The kube-apiserver also needs to use the new client certificates when talking to etcd; update the following flags in its manifest:

- --etcd-cafile=/etc/kubernetes/pki/etcd-certs/ca.pem
- --etcd-certfile=/etc/kubernetes/pki/etcd-certs/client.pem
- --etcd-keyfile=/etc/kubernetes/pki/etcd-certs/client-key.pem
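These flags belong in /etc/kubernetes/manifests/kube-apiserver.yaml on the master; the --etcd-servers flag in the same file is where additional etcd endpoints would be listed (that detail is my addition, not covered above). A quick way to review the current settings:

# Show the etcd-related flags the apiserver is currently started with
grep -- '--etcd' /etc/kubernetes/manifests/kube-apiserver.yaml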
At this point, etcd has been expanded into a multi-node distributed cluster, and Kubernetes on each of the nodes can access it.
Note:
The worker nodes deployed above can still connect to only one apiserver; the apiservers on the other secondary nodes are up but cannot be reached by the worker nodes.
The next step is to implement fault tolerance across multiple master nodes, so that when the primary master fails, access can fail over to the other secondary nodes.