主机名 | Centos版本 | ip | docker version | flannel version | Keepalived version | 主机配置 | 备注 |
---|---|---|---|---|---|---|---|
lvs-keepalived01 | 7.6.1810 | 172.27.34.28 | / | / | v1.3.5 | 4C4G | lvs-keepalived |
lvs-keepalived01 | 7.6.1810 | 172.27.34.29 | / | / | v1.3.5 | 4C4G | lvs-keepalived |
master01 | 7.6.1810 | 172.27.34.35 | 18.09.9 | v0.11.0 | / | 4C4G | control plane |
master02 | 7.6.1810 | 172.27.34.36 | 18.09.9 | v0.11.0 | / | 4C4G | control plane |
master03 | 7.6.1810 | 172.27.34.37 | 18.09.9 | v0.11.0 | / | 4C4G | control plane |
work01 | 7.6.1810 | 172.27.34.161 | 18.09.9 | / | / | 4C4G | worker nodes |
work02 | 7.6.1810 | 172.27.34.162 | 18.09.9 | / | / | 4C4G | worker nodes |
work03 | 7.6.1810 | 172.27.34.163 | 18.09.9 | / | / | 4C4G | worker nodes |
VIP | 7.6.1810 | 172.27.34.222 | / | / | v1.3.5 | 4C4G | 在lvs-keepalived两台主机上浮动 |
client | 7.6.1810 | 172.27.34.85 | / | / | / | 4C4G | client |
共有9台服务器,2台为lvs-keepalived集群,3台control plane集群,3台work集群,1台client。node
主机名 | kubelet version | kubeadm version | kubectl version | 备注 |
---|---|---|---|---|
master01 | v1.16.4 | v1.16.4 | v1.16.4 | kubectl选装 |
master02 | v1.16.4 | v1.16.4 | v1.16.4 | kubectl选装 |
master03 | v1.16.4 | v1.16.4 | v1.16.4 | kubectl选装 |
work01 | v1.16.4 | v1.16.4 | v1.16.4 | kubectl选装 |
work02 | v1.16.4 | v1.16.4 | v1.16.4 | kubectl选装 |
work03 | v1.16.4 | v1.16.4 | v1.16.4 | kubectl选装 |
client | / | / | v1.16.4 | client |
本文采用kubeadm方式搭建高可用k8s集群,k8s集群的高可用实际是k8s各核心组件的高可用,这里使用集群模式(针对apiserver来说),架构以下:linux
核心组件 | 高可用模式 | 高可用实现方式 |
---|---|---|
apiserver | 集群 | lvs+keepalived |
controller-manager | 主备 | leader election |
scheduler | 主备 | leader election |
etcd | 集群 | kubeadm |
- apiserver 经过lvs-keepalived实现高可用,vip将请求分发至各个control plane节点的apiserver组件;
- controller-manager k8s内部经过选举方式产生领导者(由--leader-elect 选型控制,默认为true),同一时刻集群内只有一个controller-manager组件运行;
- scheduler k8s内部经过选举方式产生领导者(由--leader-elect 选型控制,默认为true),同一时刻集群内只有一个scheduler组件运行;
- etcd 经过运行kubeadm方式自动建立集群来实现高可用,部署的节点数为奇数,3节点方式最多容忍一台机器宕机。
本文全部的服务器都为Centos7.6,Centos7.6安装详见:Centos7.6操做系统安装及优化全纪录 nginx
安装Centos时已经禁用了防火墙和selinux并设置了阿里源。git
control plane和work节点都执行本部分操做,以master01为例记录搭建过程。github
[root@centos7 ~]# hostnamectl set-hostname master01 [root@centos7 ~]# more /etc/hostname master01
退出从新登录便可显示新设置的主机名master01,各服务器修改成对应的主机名。算法
[root@master01 ~]# cat >> /etc/hosts << EOF 172.27.34.35 master01 172.27.34.36 master02 172.27.34.37 master03 172.27.34.161 work01 172.27.34.162 work02 172.27.34.163 work03 EOF
[root@master01 ~]# cat /sys/class/net/ens160/address [root@master01 ~]# cat /sys/class/dmi/id/product_uuid
保证各节点mac和uuid惟一docker
[root@master01 ~]# swapoff -a
若须要重启后也生效,在禁用swap后还需修改配置文件/etc/fstab,注释swapjson
[root@master01 ~]# sed -i.bak '/swap/s/^/#/' /etc/fstab
本文的k8s网络使用flannel,该网络须要设置内核参数bridge-nf-call-iptables=1,修改这个参数须要系统有br_netfilter模块。centos
查看br_netfilter模块:api
[root@master01 ~]# lsmod |grep br_netfilter
若是系统没有br_netfilter模块则执行下面的新增命令,若有则忽略。
临时新增br_netfilter模块:
[root@master01 ~]# modprobe br_netfilter
该方式重启后会失效
永久新增br_netfilter模块:
[root@master01 ~]# cat > /etc/rc.sysinit << EOF #!/bin/bash for file in /etc/sysconfig/modules/*.modules ; do [ -x $file ] && $file done EOF [root@master01 ~]# cat > /etc/sysconfig/modules/br_netfilter.modules << EOF modprobe br_netfilter EOF [root@master01 ~]# chmod 755 /etc/sysconfig/modules/br_netfilter.modules
[root@master01 ~]# sysctl net.bridge.bridge-nf-call-iptables=1 net.bridge.bridge-nf-call-iptables = 1 [root@master01 ~]# sysctl net.bridge.bridge-nf-call-ip6tables=1 net.bridge.bridge-nf-call-ip6tables = 1
[root@master01 ~]# cat <<EOF > /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-ip6tables = 1 net.bridge.bridge-nf-call-iptables = 1 EOF [root@master01 ~]# sysctl -p /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-ip6tables = 1 net.bridge.bridge-nf-call-iptables = 1
[root@master01 ~]# cat <<EOF > /etc/yum.repos.d/kubernetes.repo [kubernetes] name=Kubernetes baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/ enabled=1 gpgcheck=1 repo_gpgcheck=1 gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg EOF
- [] 中括号中的是repository id,惟一,用来标识不一样仓库
- name 仓库名称,自定义
- baseurl 仓库地址
- enable 是否启用该仓库,默认为1表示启用
- gpgcheck 是否验证从该仓库得到程序包的合法性,1为验证
- repo_gpgcheck 是否验证元数据的合法性 元数据就是程序包列表,1为验证
- gpgkey=URL 数字签名的公钥文件所在位置,若是gpgcheck值为1,此处就须要指定gpgkey文件的位置,若是gpgcheck值为0就不须要此项了
[root@master01 ~]# yum clean all [root@master01 ~]# yum -y makecache
配置master01到master0二、master03免密登陆,本步骤只在master01上执行。
[root@master01 ~]# ssh-keygen -t rsa
[root@master01 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.27.34.35 [root@master01 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.27.34.36
[root@master01 ~]# ssh 172.27.34.36 [root@master01 ~]# ssh master03
master01能够直接登陆master02和master03,不须要输入密码。
重启各control plane和work节点。
control plane和work节点都执行本部分操做。
[root@master01 ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@master01 ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
[root@master01 ~]# yum list docker-ce --showduplicates | sort -r
[root@master01 ~]# yum install docker-ce-18.09.9 docker-ce-cli-18.09.9 containerd.io -y
指定安装的docker版本为18.09.9
[root@master01 ~]# systemctl start docker [root@master01 ~]# systemctl enable docker
[root@master01 ~]# yum -y install bash-completion
[root@master01 ~]# source /etc/profile.d/bash_completion.sh
因为Docker Hub的服务器在国外,下载镜像会比较慢,能够配置镜像加速器。主要的加速器有:Docker官方提供的中国registry mirror、阿里云加速器、DaoCloud 加速器,本文以阿里加速器配置为例。
登录地址为:https://cr.console.aliyun.com ,未注册的能够先注册阿里云帐户
配置daemon.json文件
[root@master01 ~]# mkdir -p /etc/docker [root@master01 ~]# tee /etc/docker/daemon.json <<-'EOF' { "registry-mirrors": ["https://v16stybc.mirror.aliyuncs.com"] } EOF
重启服务
[root@master01 ~]# systemctl daemon-reload [root@master01 ~]# systemctl restart docker
加速器配置完成
[root@master01 ~]# docker --version [root@master01 ~]# docker run hello-world
经过查询docker版本和运行容器hello-world来验证docker是否安装成功。
修改daemon.json,新增‘"exec-opts": ["native.cgroupdriver=systemd"’
[root@master01 ~]# more /etc/docker/daemon.json { "registry-mirrors": ["https://v16stybc.mirror.aliyuncs.com"], "exec-opts": ["native.cgroupdriver=systemd"] }
[root@master01 ~]# systemctl daemon-reload [root@master01 ~]# systemctl restart docker
修改cgroupdriver是为了消除告警:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
control plane和work节点都执行本部分操做。
[root@master01 ~]# yum list kubelet --showduplicates | sort -r
本文安装的kubelet版本是1.16.4,该版本支持的docker版本为1.13.1, 17.03, 17.06, 17.09, 18.06, 18.09。
[root@master01 ~]# yum install -y kubelet-1.16.4 kubeadm-1.16.4 kubectl-1.16.4
- kubelet 运行在集群全部节点上,用于启动Pod和容器等对象的工具
- kubeadm 用于初始化集群,启动集群的命令工具
- kubectl 用于和集群通讯的命令行,经过kubectl能够部署和管理应用,查看各类资源,建立、删除和更新各类组件
启动kubelet并设置开机启动
[root@master01 ~]# systemctl enable kubelet && systemctl start kubelet
[root@master01 ~]# echo "source <(kubectl completion bash)" >> ~/.bash_profile [root@master01 ~]# source .bash_profile
Kubernetes几乎全部的安装组件和Docker镜像都放在goolge本身的网站上,直接访问可能会有网络问题,这里的解决办法是从阿里云镜像仓库下载镜像,拉取到本地之后改回默认的镜像tag。本文经过运行image.sh脚本方式拉取镜像。
[root@master01 ~]# more image.sh #!/bin/bash url=registry.cn-hangzhou.aliyuncs.com/loong576 version=v1.16.4 images=(`kubeadm config images list --kubernetes-version=$version|awk -F '/' '{print $2}'`) for imagename in ${images[@]} ; do docker pull $url/$imagename docker tag $url/$imagename k8s.gcr.io/$imagename docker rmi -f $url/$imagename done
url为阿里云镜像仓库地址,version为安装的kubernetes版本。
运行脚本image.sh,下载指定版本的镜像
[root@master01 ~]# ./image.sh [root@master01 ~]# docker images
master01节点执行本部分操做。
[root@master01 ~]# more kubeadm-config.yaml apiVersion: kubeadm.k8s.io/v1beta2 kind: ClusterConfiguration kubernetesVersion: v1.16.4 apiServer: certSANs: #填写全部kube-apiserver节点的hostname、IP、VIP - master01 - master02 - master03 - work01 - work02 - work03 - 172.27.34.35 - 172.27.34.36 - 172.27.34.37 - 172.27.34.161 - 172.27.34.162 - 172.27.34.163 - 172.27.34.222 controlPlaneEndpoint: "172.27.34.222:6443" networking: podSubnet: "10.244.0.0/16"
kubeadm.conf为初始化的配置文件
在master01上起虚ip:172.27.34.222
[root@master01 ~]# ifconfig ens160:2 172.27.34.222 netmask 255.255.255.0 up
起虚ip目的是为了执行master01的初始化,待初始化完成后去掉该虚ip
[root@master01 ~]# kubeadm init --config=kubeadm-config.yaml
记录kubeadm join的输出,后面须要这个命令将work节点和其余control plane节点加入集群中。
You can now join any number of control-plane nodes by copying certificate authorities and service account keys on each node and then running the following as root: kubeadm join 172.27.34.222:6443 --token lw90fv.j1lease5jhzj9ih2 \ --discovery-token-ca-cert-hash sha256:79575e7a39eac086e121364f79e58a33f9c9de2a4e9162ad81d0abd1958b24f4 \ --control-plane Then you can join any number of worker nodes by running the following on each as root: kubeadm join 172.27.34.222:6443 --token lw90fv.j1lease5jhzj9ih2 \ --discovery-token-ca-cert-hash sha256:79575e7a39eac086e121364f79e58a33f9c9de2a4e9162ad81d0abd1958b24f4
初始化失败:
若是初始化失败,可执行kubeadm reset后从新初始化
[root@master01 ~]# kubeadm reset [root@master01 ~]# rm -rf $HOME/.kube/config
[root@master01 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile [root@master01 ~]# source .bash_profile
本文全部操做都在root用户下执行,若为非root用户,则执行以下操做:
mkdir -p $HOME/.kube cp -i /etc/kubernetes/admin.conf $HOME/.kube/config chown $(id -u):$(id -g) $HOME/.kube/config
在master01上新建flannel网络
[root@master01 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
因为网络缘由,可能会安装失败,能够在文末直接下载kube-flannel.yml文件,而后再执行apply
在master01上运行脚本cert-main-master.sh,将证书分发至master02和master03
[root@master01 ~]# ll|grep cert-main-master.sh -rwxr--r-- 1 root root 638 1月 16 10:25 cert-main-master.sh [root@master01 ~]# more cert-main-master.sh USER=root # customizable CONTROL_PLANE_IPS="172.27.34.36 172.27.34.37" for host in ${CONTROL_PLANE_IPS}; do scp /etc/kubernetes/pki/ca.crt "${USER}"@$host: scp /etc/kubernetes/pki/ca.key "${USER}"@$host: scp /etc/kubernetes/pki/sa.key "${USER}"@$host: scp /etc/kubernetes/pki/sa.pub "${USER}"@$host: scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host: scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host: scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt # Quote this line if you are using external etcd scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key done
在master02上运行脚本cert-other-master.sh,将证书移至指定目录
[root@master02 ~]# more cert-other-master.sh USER=root # customizable mkdir -p /etc/kubernetes/pki/etcd mv /${USER}/ca.crt /etc/kubernetes/pki/ mv /${USER}/ca.key /etc/kubernetes/pki/ mv /${USER}/sa.pub /etc/kubernetes/pki/ mv /${USER}/sa.key /etc/kubernetes/pki/ mv /${USER}/front-proxy-ca.crt /etc/kubernetes/pki/ mv /${USER}/front-proxy-ca.key /etc/kubernetes/pki/ mv /${USER}/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt # Quote this line if you are using external etcd mv /${USER}/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key [root@master02 ~]# ./cert-other-master.sh
在master03上也运行脚本cert-other-master.sh
[root@master03 ~]# pwd /root [root@master03 ~]# ll|grep cert-other-master.sh -rwxr--r-- 1 root root 484 1月 16 10:30 cert-other-master.sh [root@master03 ~]# ./cert-other-master.sh
[root@master03 ~]# kubeadm join 172.27.34.222:6443 --token lw90fv.j1lease5jhzj9ih2 --discovery-token-ca-cert-hash sha256:79575e7a39eac086e121364f79e58a33f9c9de2a4e9162ad81d0abd1958b24f4 --control-plane
运行初始化master生成的control plane节点加入集群的命令
[root@master03 ~]# kubeadm join 172.27.34.222:6443 --token 0p7rzn.fdanprq4y8na36jh --discovery-token-ca-cert-hash sha256:fc7a828208d554329645044633159e9dc46b0597daf66769988fee8f3fc0636b --control-plane
master02和master03加载环境变量
[root@master02 ~]# scp master01:/etc/kubernetes/admin.conf /etc/kubernetes/ [root@master02 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile [root@master02 ~]# source .bash_profile [root@master03 ~]# scp master01:/etc/kubernetes/admin.conf /etc/kubernetes/ [root@master03 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile [root@master03 ~]# source .bash_profile
该步操做是为了在master02和master03上也能执行kubectl命令。
[root@master01 ~]# kubectl get nodes [root@master01 ~]# kubectl get po -o wide -n kube-system
发现master01和master03下载flannel异常,分别在master01和master03上手动下载该镜像后正常。
[root@master01 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/loong576/flannel:v0.11.0-amd64 [root@master03 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/loong576/flannel:v0.11.0-amd64
[root@work01 ~]# kubeadm join 172.27.34.222:6443 --token lw90fv.j1lease5jhzj9ih2 --discovery-token-ca-cert-hash sha256:79575e7a39eac086e121364f79e58a33f9c9de2a4e9162ad81d0abd1958b24f4
运行初始化master生成的work节点加入集群的命令
[root@work02 ~]# kubeadm join 172.27.34.222:6443 --token lw90fv.j1lease5jhzj9ih2 --discovery-token-ca-cert-hash sha256:79575e7a39eac086e121364f79e58a33f9c9de2a4e9162ad81d0abd1958b24f4
[root@work03 ~]# kubeadm join 172.27.34.222:6443 --token lw90fv.j1lease5jhzj9ih2 --discovery-token-ca-cert-hash sha256:79575e7a39eac086e121364f79e58a33f9c9de2a4e9162ad81d0abd1958b24f4
[root@master01 ~]# kubectl get nodes [root@master01 ~]# kubectl get po -o wide -n kube-system
lvs-keepalived01和lvs-keepalived02都执行本操做。
LVS无需安装,安装的是管理工具,第一种叫ipvsadm,第二种叫keepalive。ipvsadm是经过命令行管理,而keepalive读取配置文件管理。
[root@lvs-keepalived01 ~]# yum -y install ipvsadm
把ipvsadm模块加载进系统
[root@lvs-keepalived01 ~]# ipvsadm IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn [root@lvs-keepalived01 ~]# lsmod | grep ip_vs ip_vs 145497 0 nf_conntrack 133095 1 ip_vs libcrc32c 12644 3 xfs,ip_vs,nf_conntrack
lvs相关实践详见:LVS+Keepalived+Nginx负载均衡搭建测试
lvs-keepalived01和lvs-keepalived02都执行本操做。
[root@lvs-keepalived01 ~]# yum -y install keepalived
lvs-keepalived01配置以下:
[root@lvs-keepalived01 ~]# more /etc/keepalived/keepalived.conf ! Configuration File for keepalived global_defs { router_id lvs-keepalived01 #router_id 机器标识,一般为hostname,但不必定非得是hostname。故障发生时,邮件通知会用到。 } vrrp_instance VI_1 { #vrrp实例定义部分 state MASTER #设置lvs的状态,MASTER和BACKUP两种,必须大写 interface ens160 #设置对外服务的接口 virtual_router_id 100 #设置虚拟路由标示,这个标示是一个数字,同一个vrrp实例使用惟一标示 priority 100 #定义优先级,数字越大优先级越高,在一个vrrp——instance下,master的优先级必须大于backup advert_int 1 #设定master与backup负载均衡器之间同步检查的时间间隔,单位是秒 authentication { #设置验证类型和密码 auth_type PASS #主要有PASS和AH两种 auth_pass 1111 #验证密码,同一个vrrp_instance下MASTER和BACKUP密码必须相同 } virtual_ipaddress { #设置虚拟ip地址,能够设置多个,每行一个 172.27.34.222 } } virtual_server 172.27.34.222 6443 { #设置虚拟服务器,须要指定虚拟ip和服务端口 delay_loop 6 #健康检查时间间隔 lb_algo wrr #负载均衡调度算法 lb_kind DR #负载均衡转发规则 #persistence_timeout 50 #设置会话保持时间,对动态网页很是有用 protocol TCP #指定转发协议类型,有TCP和UDP两种 real_server 172.27.34.35 6443 { #配置服务器节点1,须要指定real server的真实IP地址和端口 weight 10 #设置权重,数字越大权重越高 TCP_CHECK { #realserver的状态监测设置部分单位秒 connect_timeout 10 #链接超时为10秒 retry 3 #重连次数 delay_before_retry 3 #重试间隔 connect_port 6443 #链接端口为6443,要和上面的保持一致 } } real_server 172.27.34.36 6443 { #配置服务器节点1,须要指定real server的真实IP地址和端口 weight 10 #设置权重,数字越大权重越高 TCP_CHECK { #realserver的状态监测设置部分单位秒 connect_timeout 10 #链接超时为10秒 retry 3 #重连次数 delay_before_retry 3 #重试间隔 connect_port 6443 #链接端口为6443,要和上面的保持一致 } } real_server 172.27.34.37 6443 { #配置服务器节点1,须要指定real server的真实IP地址和端口 weight 10 #设置权重,数字越大权重越高 TCP_CHECK { #realserver的状态监测设置部分单位秒 connect_timeout 10 #链接超时为10秒 retry 3 #重连次数 delay_before_retry 3 #重试间隔 connect_port 6443 #链接端口为6443,要和上面的保持一致 } } }
lvs-keepalived02配置以下:
[root@lvs-keepalived02 ~]# more /etc/keepalived/keepalived.conf ! Configuration File for keepalived global_defs { router_id lvs-keepalived02 #router_id 机器标识,一般为hostname,但不必定非得是hostname。故障发生时,邮件通知会用到。 } vrrp_instance VI_1 { #vrrp实例定义部分 state BACKUP #设置lvs的状态,MASTER和BACKUP两种,必须大写 interface ens160 #设置对外服务的接口 virtual_router_id 100 #设置虚拟路由标示,这个标示是一个数字,同一个vrrp实例使用惟一标示 priority 90 #定义优先级,数字越大优先级越高,在一个vrrp——instance下,master的优先级必须大于backup advert_int 1 #设定master与backup负载均衡器之间同步检查的时间间隔,单位是秒 authentication { #设置验证类型和密码 auth_type PASS #主要有PASS和AH两种 auth_pass 1111 #验证密码,同一个vrrp_instance下MASTER和BACKUP密码必须相同 } virtual_ipaddress { #设置虚拟ip地址,能够设置多个,每行一个 172.27.34.222 } } virtual_server 172.27.34.222 6443 { #设置虚拟服务器,须要指定虚拟ip和服务端口 delay_loop 6 #健康检查时间间隔 lb_algo wrr #负载均衡调度算法 lb_kind DR #负载均衡转发规则 #persistence_timeout 50 #设置会话保持时间,对动态网页很是有用 protocol TCP #指定转发协议类型,有TCP和UDP两种 real_server 172.27.34.35 6443 { #配置服务器节点1,须要指定real server的真实IP地址和端口 weight 10 #设置权重,数字越大权重越高 TCP_CHECK { #realserver的状态监测设置部分单位秒 connect_timeout 10 #链接超时为10秒 retry 3 #重连次数 delay_before_retry 3 #重试间隔 connect_port 6443 #链接端口为6443,要和上面的保持一致 } } real_server 172.27.34.36 6443 { #配置服务器节点1,须要指定real server的真实IP地址和端口 weight 10 #设置权重,数字越大权重越高 TCP_CHECK { #realserver的状态监测设置部分单位秒 connect_timeout 10 #链接超时为10秒 retry 3 #重连次数 delay_before_retry 3 #重试间隔 connect_port 6443 #链接端口为6443,要和上面的保持一致 } } real_server 172.27.34.37 6443 { #配置服务器节点1,须要指定real server的真实IP地址和端口 weight 10 #设置权重,数字越大权重越高 TCP_CHECK { #realserver的状态监测设置部分单位秒 connect_timeout 10 #链接超时为10秒 retry 3 #重连次数 delay_before_retry 3 #重试间隔 connect_port 6443 #链接端口为6443,要和上面的保持一致 } } }
[root@master01 ~]# ifconfig ens160:2 172.27.34.222 netmask 255.255.255.0 down
master01上去掉初始化使用的ip 172.27.34.222
lvs-keepalived01和lvs-keepalived02都启动keepalived并设置为开机启动
[root@lvs-keepalived01 ~]# service keepalived start Redirecting to /bin/systemctl start keepalived.service [root@lvs-keepalived01 ~]# systemctl enable keepalived Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[root@lvs-keepalived01 ~]# ip a
此时vip在lvs-keepalived01上
control plane都执行本操做。
打开control plane所在服务器的“路由”功能、关闭“ARP查询”功能并设置回环ip,三台control plane配置相同,以下:
[root@master01 ~]# cd /etc/rc.d/init.d/ [root@master01 init.d]# more realserver.sh #!/bin/bash SNS_VIP=172.27.34.222 case "$1" in start) ifconfig lo:0 $SNS_VIP netmask 255.255.255.255 broadcast $SNS_VIP /sbin/route add -host $SNS_VIP dev lo:0 echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce sysctl -p >/dev/null 2>&1 echo "RealServer Start OK" ;; stop) ifconfig lo:0 down route del $SNS_VIP >/dev/null 2>&1 echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce echo "RealServer Stoped" ;; *) echo "Usage: $0 {start|stop}" exit 1 esac exit 0
此脚本用于control plane节点绑定 VIP ,并抑制响应 VIP 的 ARP 请求。这样作的目的是为了避免让关于 VIP 的 ARP 广播时,节点服务器应答( 由于control plane节点都绑定了 VIP ,若是不作设置它们会应答,就会乱套 )。
在全部control plane节点执行realserver.sh脚本:
[root@master01 init.d]# chmod u+x realserver.sh [root@master01 init.d]# /etc/rc.d/init.d/realserver.sh start RealServer Start OK
给realserver.sh脚本授予执行权限并运行realserver.sh脚本
[root@master01 init.d]# sed -i '$a /etc/rc.d/init.d/realserver.sh start' /etc/rc.d/rc.local [root@master01 init.d]# chmod u+x /etc/rc.d/rc.local
[root@client ~]# cat <<EOF > /etc/yum.repos.d/kubernetes.repo [kubernetes] name=Kubernetes baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/ enabled=1 gpgcheck=1 repo_gpgcheck=1 gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg EOF
[root@client ~]# yum clean all [root@client ~]# yum -y makecache
[root@client ~]# yum install -y kubectl-1.16.4
安装版本与集群版本保持一致
[root@client ~]# yum -y install bash-completion
[root@client ~]# source /etc/profile.d/bash_completion.sh
[root@client ~]# mkdir -p /etc/kubernetes [root@client ~]# scp 172.27.34.35:/etc/kubernetes/admin.conf /etc/kubernetes/ [root@client ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile [root@client ~]# source .bash_profile
[root@master01 ~]# echo "source <(kubectl completion bash)" >> ~/.bash_profile [root@master01 ~]# source .bash_profile
[root@client ~]# kubectl get nodes [root@client ~]# kubectl get cs [root@client ~]# kubectl cluster-info [root@client ~]# kubectl get po -o wide -n kube-system
本节内容都在client节点完成。
[root@client ~]# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml
若是链接超时,能够多试几回。recommended.yaml已上传,也能够在文末下载。
[root@client ~]# sed -i 's/kubernetesui/registry.cn-hangzhou.aliyuncs.com\/loong576/g' recommended.yaml
因为默认的镜像仓库网络访问不通,故改为阿里镜像
[root@client ~]# sed -i '/targetPort: 8443/a\ \ \ \ \ \ nodePort: 30001\n\ \ type: NodePort' recommended.yaml
配置NodePort,外部经过https://NodeIp:NodePort 访问Dashboard,此时端口为30001
[root@client ~]# cat >> recommended.yaml << EOF --- # ------------------- dashboard-admin ------------------- # apiVersion: v1 kind: ServiceAccount metadata: name: dashboard-admin namespace: kubernetes-dashboard --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRoleBinding metadata: name: dashboard-admin subjects: - kind: ServiceAccount name: dashboard-admin namespace: kubernetes-dashboard roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-admin
建立超级管理员的帐号用于登陆Dashboard
[root@client ~]# kubectl apply -f recommended.yaml
[root@client ~]# kubectl get all -n kubernetes-dashboard
[root@client ~]# kubectl describe secrets -n kubernetes-dashboard dashboard-admin
令牌为:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ii1SOU1pNGswQnJCVUtCaks2TlBnMGxUdGRSdTlPS0s0MjNjUkdlNzFRVXMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tbXRuZ3giLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNWVjOTdkNzItZTgwZi00MDE2LTk2NTEtZDhkMTYwOGJkODViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmRhc2hib2FyZC1hZG1pbiJ9.WJPzxkAGYjtq556d3HuXNh6g0sDYm2h6U_FsPDvvfhquYSccPGJ1UzX-lKxhPYyCegc603D7yFCc9zQOzpONttkue3rGdOz8KePOAHCUX7Xp_yTcJg15BPxQDDny6Lebu0fFXh_fpbU2_35nG28lRjiwKG3mV3O5uHdX5nk500RBmLkw3F054ww66hgFBfTH2HVDi1jOlAKWC0xatdxuqp2JkMqiBCZ_8Zwhi66EQYAMT1xu8Sn5-ur_6QsgaNNYhCeNxqHUiEFIZdLNu8QAnsKJJuhxxXd2KhIF6dwMvvOPG1djKCKSyNRn-SGILDucu1_6FoBG1DiNcIr90cPAtA
请使用火狐浏览器访问:https://control plane ip:30001,即https://172.27.34.35/36/37:30001/
接受风险
经过令牌方式登陆
登陆的首页显示
切换到命名空间kubernetes-dashboard,查看资源。
Dashboard提供了能够实现集群管理、工做负载、服务发现和负载均衡、存储、字典配置、日志视图等功能。
为了丰富dashboard的统计数据和图表,能够安装heapster组件。heapster组件实践详见:k8s实践(十一):heapster+influxdb+grafana实现kubernetes集群监
经过ipvsadm查看apiserver所在节点,经过leader-elect查看scheduler和controller-manager所在节点:
在lvs-keepalived01上执行ipvsadm查看apiserver转发到的服务器
[root@lvs-keepalived01 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 172.27.34.222:6443 wrr -> 172.27.34.35:6443 Route 10 2 0 -> 172.27.34.36:6443 Route 10 2 0 -> 172.27.34.37:6443 Route 10 2 0
在client节点上查看controller-manager和scheduler组件所在节点
[root@client ~]# kubectl get endpoints kube-controller-manager -n kube-system -o yaml |grep holderIdentity control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master01_0a2bcea9-d17e-405b-8b28-5059ca434144","leaseDurationSeconds":15,"acquireTime":"2020-01-19T03:07:51Z","renewTime":"2020-01-19T04:40:20Z","leaderTransitions":2}' [root@client ~]# kubectl get endpoints kube-scheduler -n kube-system -o yaml |grep holderIdentity control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master01_c284cee8-57cf-46e7-a578-6c0a10aedb37","leaseDurationSeconds":15,"acquireTime":"2020-01-19T03:07:51Z","renewTime":"2020-01-19T04:40:30Z","leaderTransitions":2}'
组件名 | 所在节点 |
---|---|
apiserver | master0一、master0二、master03 |
controller-manager | master01 |
scheduler | master01 |
关闭master01,模拟宕机
[root@master01 ~]# init 0
lvs-keepalived01上查看apiserver节点连接状况
[root@lvs-keepalived01 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 172.27.34.222:6443 wrr -> 172.27.34.36:6443 Route 10 4 0 -> 172.27.34.37:6443 Route 10 2 0
发现master01的apiserver被移除集群,即访问172.27.34.222:64443时不会被调度到master01
client节点上再次运行查看controller-manager和scheduler命令
[root@client ~]# kubectl get endpoints kube-controller-manager -n kube-system -o yaml |grep holderIdentity control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master03_9481b109-f236-432a-a2cb-8d0c27417396","leaseDurationSeconds":15,"acquireTime":"2020-01-19T04:42:22Z","renewTime":"2020-01-19T04:45:45Z","leaderTransitions":3}' [root@client ~]# kubectl get endpoints kube-scheduler -n kube-system -o yaml |grep holderIdentity control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master03_6d84981b-3ab9-4a00-a86a-47bd2f5c7729","leaseDurationSeconds":15,"acquireTime":"2020-01-19T04:42:23Z","renewTime":"2020-01-19T04:45:48Z","leaderTransitions":3}' [root@client ~]#
controller-manager和scheduler都被切换到master03节点
组件名 | 所在节点 |
---|---|
apiserver | master0二、master03 |
controller-manager | master03 |
scheduler | master03 |
全部功能性测试都在client节点完成。
[root@client ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION master01 NotReady master 22h v1.16.4 master02 Ready master 22h v1.16.4 master03 Ready master 22h v1.16.4 work01 Ready <none> 22h v1.16.4 work02 Ready <none> 22h v1.16.4 work03 Ready <none> 22h v1.16.4
master01状态为NotReady
[root@client ~]# more nginx-master.yaml apiVersion: apps/v1 #描述文件遵循extensions/v1beta1版本的Kubernetes API kind: Deployment #建立资源类型为Deployment metadata: #该资源元数据 name: nginx-master #Deployment名称 spec: #Deployment的规格说明 selector: matchLabels: app: nginx replicas: 3 #指定副本数为3 template: #定义Pod的模板 metadata: #定义Pod的元数据 labels: #定义label(标签) app: nginx #label的key和value分别为app和nginx spec: #Pod的规格说明 containers: - name: nginx #容器的名称 image: nginx:latest #建立容器所使用的镜像 [root@client ~]# kubectl apply -f nginx-master.yaml deployment.apps/nginx-master created [root@client ~]# kubectl get po -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-master-75b7bfdb6b-9d66p 1/1 Running 0 20s 10.244.3.6 work01 <none> <none> nginx-master-75b7bfdb6b-h4bql 1/1 Running 0 20s 10.244.5.5 work03 <none> <none> nginx-master-75b7bfdb6b-zmc68 1/1 Running 0 20s 10.244.4.5 work02 <none> <none>
以新建pod nginx为例测试集群是否能正常对外提供服务。
在3节点的k8s集群中,当有一个control plane节点宕机时,集群各项功能不受影响。
在master01处于关闭状态下,继续关闭master02,测试集群还可否正常对外服务。
[root@master02 ~]# init 0
[root@lvs-keepalived01 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 172.27.34.222:6443 wrr -> 172.27.34.37:6443 Route 10 6 20
此时对集群的访问都转到master03
[root@client ~]# kubectl get nodes The connection to the server 172.27.34.222:6443 was refused - did you specify the right host or port?
在3节点的k8s集群中,当有两个control plane节点同时宕机时,etcd集群崩溃,整个k8s集群也不能正常对外服务。
[root@client ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION master01 Ready master 161m v1.16.4 master02 Ready master 144m v1.16.4 master03 Ready master 142m v1.16.4 work01 Ready <none> 137m v1.16.4 work02 Ready <none> 135m v1.16.4 work03 Ready <none> 134m v1.16.4
集群内个节点运行正常
[root@lvs-keepalived01 ~]# ip a|grep 222 inet 172.27.34.222/32 scope global ens160
发现vip运行在lvs-keepalived01上
lvs-keepalived01:
[root@lvs-keepalived01 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 172.27.34.222:6443 wrr -> 172.27.34.35:6443 Route 10 6 0 -> 172.27.34.36:6443 Route 10 0 0 -> 172.27.34.37:6443 Route 10 38 0
lvs-keepalived02:
[root@lvs-keepalived02 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 172.27.34.222:6443 wrr -> 172.27.34.35:6443 Route 10 0 0 -> 172.27.34.36:6443 Route 10 0 0 -> 172.27.34.37:6443 Route 10 0 0
关闭lvs-keepalived01,模拟宕机
[root@lvs-keepalived01 ~]# init 0
[root@client ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION master01 Ready master 166m v1.16.4 master02 Ready master 148m v1.16.4 master03 Ready master 146m v1.16.4 work01 Ready <none> 141m v1.16.4 work02 Ready <none> 139m v1.16.4 work03 Ready <none> 138m v1.16.4
集群内个节点运行正常
[root@lvs-keepalived02 ~]# ip a|grep 222 inet 172.27.34.222/32 scope global ens160
发现vip已漂移至lvs-keepalived02
lvs-keepalived02:
[root@lvs-keepalived02 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 172.27.34.222:6443 wrr -> 172.27.34.35:6443 Route 10 1 0 -> 172.27.34.36:6443 Route 10 4 0 -> 172.27.34.37:6443 Route 10 1 0
[root@client ~]# kubectl delete -f nginx-master.yaml deployment.apps "nginx-master" deleted [root@client ~]# kubectl get po -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-master-75b7bfdb6b-9d66p 0/1 Terminating 0 20m 10.244.3.6 work01 <none> <none> nginx-master-75b7bfdb6b-h4bql 0/1 Terminating 0 20m 10.244.5.5 work03 <none> <none> nginx-master-75b7bfdb6b-zmc68 0/1 Terminating 0 20m 10.244.4.5 work02 <none> <none> [root@client ~]# kubectl get po -o wide No resources found in default namespace.
删除以前新建的pod nginx,成功删除。
当lvs-keepalived集群有一台宕机时,对k8s集群无影响,仍能正常对外提供服务。
本文全部脚本和配置文件已上传github:lvs-keepalived-install-k8s-HA-cluster
单机版k8s集群部署详见:k8s实践(一):Centos7.6部署k8s(v1.14.2)集群
主备高可用版k8s集群部署详见:k8s实践(十五):Centos7.6部署k8s v1.16.4高可用集群(主备模式)