kube-apiserver
kube-controller-manager
kube-scheduler
kubelet
kube-proxy
cluster add-ons
Note:
Set a permanent hostname, then log in again:
$ sudo hostnamectl set-hostname kube-node1 # replace kube-node1 with the current host's name
The hostname is saved in the /etc/hostname file. If DNS cannot resolve the hostnames, modify the /etc/hosts file on every machine and add the hostname-to-IP mappings:
cat >> /etc/hosts <<EOF
192.168.75.110 kube-node1
192.168.75.111 kube-node2
192.168.75.112 kube-node3
EOF
Add a docker account on every machine:
useradd -m docker
Unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands on the other nodes remotely.
Configure the root account on kube-node1 for password-less login to all nodes:
ssh-keygen -t rsa
ssh-copy-id root@kube-node1
ssh-copy-id root@kube-node2
ssh-copy-id root@kube-node3
Add the binaries directory to the PATH environment variable:
mkdir -p /opt/k8s/bin
echo 'PATH=/opt/k8s/bin:$PATH' >>/root/.bashrc
source /root/.bashrc
Install the dependency packages on every machine:
CentOS:
yum install -y epel-release
yum install -y conntrack ntpdate ntp ipvsadm ipset jq iptables curl sysstat libseccomp wget
Ubuntu:
apt-get install -y conntrack ipvsadm ntp ipset jq iptables curl sysstat libseccomp
On every machine, stop the firewall, flush its rules, and set the default forward policy to ACCEPT:
systemctl stop firewalld
systemctl disable firewalld
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
iptables -P FORWARD ACCEPT
If a swap partition is enabled, kubelet fails to start (this can be ignored by setting --fail-swap-on to false), so swap must be disabled on every machine. Also comment out the corresponding entry in /etc/fstab so the swap partition is not mounted at boot:
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
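To confirm that swap is actually off, a quick check like the following can be run on each machine (a sketch; swapon --show should print nothing and free -h should report 0 swap):
# verify no swap device is active
swapon --show
# the Swap line should show 0 total
free -h
# verify the fstab entry was commented out
grep swap /etc/fstab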
Disable SELinux; otherwise Kubernetes may later report Permission denied errors when mounting directories:
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
If dnsmasq is enabled on the Linux system (e.g. in a GUI environment), it sets the system DNS server to 127.0.0.1, which prevents docker containers from resolving domain names, so it must be disabled:
systemctl stop dnsmasq
systemctl disable dnsmasq
modprobe ip_vs_rr
modprobe br_netfilter
cat > kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
# disallow using swap space; only allow it when the system is OOM
vm.swappiness=0
# do not check whether enough physical memory is available
vm.overcommit_memory=1
# enable OOM handling (do not panic on OOM)
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
vm.max_map_count=655360
EOF
cp kubernetes.conf /etc/sysctl.d/kubernetes.conf
sysctl -p /etc/sysctl.d/kubernetes.conf
# set the system timezone
timedatectl set-timezone Asia/Shanghai
# write the current UTC time to the hardware clock
timedatectl set-local-rtc 0
# restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond
systemctl stop postfix && systemctl disable postfix
systemd's journald is the default logging tool on CentOS 7; it records logs from the system, the kernel, and every service unit.
Compared with traditional syslog, the logs recorded by journald have the following advantages:
By default journald forwards logs to rsyslog, which results in logs being written multiple times; /var/log/messages then contains too many irrelevant entries, which makes later inspection inconvenient and also hurts system performance.
mkdir /var/log/journal # directory for persisted logs
mkdir /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# persist logs to disk
Storage=persistent
# compress historical logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# use at most 10G of disk space
SystemMaxUse=10G
# a single log file is at most 200M
SystemMaxFileSize=200M
# keep logs for 2 weeks
MaxRetentionSec=2week
# do not forward logs to syslog
ForwardToSyslog=no
EOF
systemctl restart systemd-journald
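After restarting systemd-journald, a quick check that persistent logging is in effect (a sketch; the exact sizes differ per machine):
# logs should now accumulate under /var/log/journal
ls /var/log/journal
# show how much disk space the journal uses (bounded by SystemMaxUse=10G above)
journalctl --disk-usage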
Create directories:
mkdir -p /opt/k8s/{bin,work} /etc/{kubernetes,etcd}/cert
The stock 3.10.x kernel shipped with CentOS 7.x has bugs that make Docker and Kubernetes unstable, for example:
The workarounds are as follows:
git clone --branch v1.14.1 --single-branch --depth 1 https://github.com/kubernetes/kubernetes
cd kubernetes
KUBE_GIT_VERSION=v1.14.1 ./build/run.sh make kubelet GOFLAGS="-tags=nokmem"
Here we take the approach of upgrading the kernel:
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# after installation, check whether the corresponding kernel menuentry in /boot/grub2/grub.cfg contains the initrd16 setting; if not, install again!
yum --enablerepo=elrepo-kernel install -y kernel-lt
# boot from the new kernel
grub2-set-default 0
Install the kernel source files (optional; run after the kernel upgrade and a reboot):
# yum erase kernel-headers
yum --enablerepo=elrepo-kernel install kernel-lt-devel-$(uname -r) kernel-lt-headers-$(uname -r)
cp /etc/default/grub{,.bak}
vim /etc/default/grub
# add the `numa=off` parameter to the GRUB_CMDLINE_LINUX line, as shown below:
diff /etc/default/grub.bak /etc/default/grub
6c6
< GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet"
---
> GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet numa=off"
Regenerate the grub2 configuration file:
cp /boot/grub2/grub.cfg{,.bak}
grub2-mkconfig -o /boot/grub2/grub.cfg
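After rebooting into the new kernel, the result can be verified as follows (a sketch; the exact version string depends on the kernel that kernel-lt installed):
# after the reboot:
uname -r                      # should print the long-term kernel installed by kernel-lt (e.g. a 4.4.x version)
grep numa=off /proc/cmdline   # confirms the kernel was booted with numa=off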
All environment variables used later are defined in the file environment.sh; adjust them for your own machines and network. Then copy it to the /opt/k8s/bin directory on every node:
#!/usr/bin/bash
# encryption key used to generate the EncryptionConfig
export ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
# array of cluster machine IPs
export NODE_IPS=(192.168.75.110 192.168.75.111 192.168.75.112)
# array of hostnames corresponding to the cluster IPs
export NODE_NAMES=(kube-node1 kube-node2 kube-node3)
# etcd cluster service endpoints
export ETCD_ENDPOINTS="https://192.168.75.110:2379,https://192.168.75.111:2379,https://192.168.75.112:2379"
# IPs and ports used for communication between etcd cluster members
export ETCD_NODES="kube-node1=https://192.168.75.110:2380,kube-node2=https://192.168.75.111:2380,kube-node3=https://192.168.75.112:2380"
# address and port of the kube-apiserver reverse proxy (kube-nginx)
export KUBE_APISERVER="https://127.0.0.1:8443"
# name of the network interface used for inter-node communication
export VIP_IF="ens33"
# etcd data directory
export ETCD_DATA_DIR="/data/k8s/etcd/data"
# etcd WAL directory; an SSD partition, or at least a partition different from ETCD_DATA_DIR, is recommended
export ETCD_WAL_DIR="/data/k8s/etcd/wal"
# data directory for the k8s components
export K8S_DIR="/data/k8s/k8s"
# docker data directory
export DOCKER_DIR="/data/k8s/docker"
## the parameters below normally do not need to be changed
# token used for TLS Bootstrapping; can be generated with: head -c 16 /dev/urandom | od -An -t x | tr -d ' '
BOOTSTRAP_TOKEN="41f7e4ba8b7be874fcff18bf5cf41a7c"
# preferably use currently unused ranges for the service and Pod networks
# service network: unroutable before deployment, routable inside the cluster afterwards (guaranteed by kube-proxy)
SERVICE_CIDR="10.254.0.0/16"
# Pod network, a /16 range is recommended: unroutable before deployment, routable inside the cluster afterwards (guaranteed by flanneld)
CLUSTER_CIDR="172.30.0.0/16"
# service port range (NodePort Range)
export NODE_PORT_RANGE="30000-32767"
# flanneld network configuration prefix
export FLANNEL_ETCD_PREFIX="/kubernetes/network"
# kubernetes service IP (usually the first IP in SERVICE_CIDR)
export CLUSTER_KUBERNETES_SVC_IP="10.254.0.1"
# cluster DNS service IP (pre-allocated from SERVICE_CIDR)
export CLUSTER_DNS_SVC_IP="10.254.0.2"
# cluster DNS domain (no trailing dot)
export CLUSTER_DNS_DOMAIN="cluster.local"
# add the binaries directory /opt/k8s/bin to PATH
export PATH=/opt/k8s/bin:$PATH
source environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  scp environment.sh root@${node_ip}:/opt/k8s/bin/
  ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

# The script above uses the user@IP form, which still prompts for the user's password. Since password-less SSH login to the other nodes was configured earlier, the hostname form can be used instead:
source environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  scp environment.sh root@${node_name}:/opt/k8s/bin/
  ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
done
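A quick sanity check that environment.sh was distributed and loads correctly on every node (a sketch using the hostname form; the variables printed are just examples):
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  # the values should be identical on every node
  ssh root@${node_name} "source /opt/k8s/bin/environment.sh && echo \${ETCD_ENDPOINTS} \${CLUSTER_CIDR}"
done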
To ensure security, the kubernetes components use x509 certificates to encrypt and authenticate their communication.
The CA (Certificate Authority) is a self-signed root certificate used to sign all other certificates created later.
This document uses CloudFlare's PKI toolkit cfssl to create all certificates.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
mkdir -p /opt/k8s/cert && cd /opt/k8s wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 mv cfssl_linux-amd64 /opt/k8s/bin/cfssl wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 mv cfssljson_linux-amd64 /opt/k8s/bin/cfssljson wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 mv cfssl-certinfo_linux-amd64 /opt/k8s/bin/cfssl-certinfo chmod +x /opt/k8s/bin/* export PATH=/opt/k8s/bin:$PATH
The CA certificate is shared by all nodes in the cluster; only one CA certificate needs to be created, and all certificates created afterwards are signed by it.
The CA configuration file defines the usage profiles of the root certificate and their parameters (usages, expiry, server auth, client auth, encryption, etc.); a specific profile is selected later when signing other certificates.
cd /opt/k8s/work cat > ca-config.json <<EOF { "signing": { "default": { "expiry": "87600h" }, "profiles": { "kubernetes": { "usages": [ "signing", "key encipherment", "server auth", "client auth" ], "expiry": "87600h" } } } } EOF
signing: the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE;
server auth: the client can use this certificate to verify the certificate presented by the server;
client auth: the server can use this certificate to verify the certificate presented by the client;
Create the certificate signing request file for the CA:
cd /opt/k8s/work cat > ca-csr.json <<EOF { "CN": "kubernetes", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ], "ca": { "expiry": "876000h" } } EOF
CN: Common Name; kube-apiserver extracts this field from the certificate as the username (User Name) of the request, and browsers use it to verify whether a website is legitimate;
O: Organization; kube-apiserver extracts this field from the certificate as the group (Group) the requesting user belongs to;
kube-apiserver uses the extracted User and Group as the identity for RBAC authorization;
Generate the CA certificate and private key:
cd /opt/k8s/work cfssl gencert -initca ca-csr.json | cfssljson -bare ca ls ca*
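As a sanity check, the generated CA certificate can be inspected with the cfssl-certinfo tool downloaded above; this is only a sketch and the expectations in the comments follow from ca-csr.json:
cd /opt/k8s/work
# print the parsed certificate; expect CN=kubernetes, O=k8s and a not_after roughly 100 years out (876000h)
cfssl-certinfo -cert ca.pem
# if openssl is available, CA=TRUE can be checked explicitly
openssl x509 -in ca.pem -noout -text | grep -A 1 'Basic Constraints'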
Copy the generated CA certificate, private key, and configuration file to the /etc/kubernetes/cert directory on all nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
  scp ca*.pem ca-config.json root@${node_ip}:/etc/kubernetes/cert
done

# The user@IP form above still prompts for a password; since password-less SSH login was configured earlier, the hostname form can be used instead:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  ssh root@${node_name} "mkdir -p /etc/kubernetes/cert"
  scp ca*.pem ca-config.json root@${node_name}:/etc/kubernetes/cert
done
1. Various CA certificate types: https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md
This document describes how to install and configure kubectl, the command-line management tool for the kubernetes cluster.
By default kubectl reads the kube-apiserver address and authentication information from the ~/.kube/config file; if it is not configured, running kubectl commands may fail:
$ kubectl get pods The connection to the server localhost:8080 was refused - did you specify the right host or port?
Note:
the configuration file used is ~/.kube/config;
Download and unpack:
cd /opt/k8s/work
# if the direct download is slow, download the file elsewhere (e.g. with Thunder) and upload it
wget https://dl.k8s.io/v1.14.2/kubernetes-client-linux-amd64.tar.gz
tar -xzvf kubernetes-client-linux-amd64.tar.gz
Distribute it to all nodes that will use kubectl:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  scp kubernetes/client/bin/kubectl root@${node_ip}:/opt/k8s/bin/
  ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

# the same script using hostnames
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  scp kubernetes/client/bin/kubectl root@${node_name}:/opt/k8s/bin/
  ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
done
kubectl talks to the apiserver over the https secure port, and the apiserver authenticates and authorizes the certificate it presents.
As the cluster management tool, kubectl needs the highest privileges, so an admin certificate with the highest privileges is created here.
Create the certificate signing request:
cd /opt/k8s/work cat > admin-csr.json <<EOF { "CN": "admin", "hosts": [], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "system:masters", "OU": "4Paradigm" } ] } EOF
The O field is system:masters; when kube-apiserver receives this certificate it sets the Group of the request to system:masters;
the predefined ClusterRoleBinding cluster-admin binds the Group system:masters to the Role cluster-admin, which grants access to all APIs;
Generate the certificate and private key:
cd /opt/k8s/work cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes admin-csr.json | cfssljson -bare admin ls admin*
kubeconfig is kubectl's configuration file; it contains all the information needed to access the apiserver, such as the apiserver address, the CA certificate, and the client certificate kubectl uses;
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
# set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/k8s/work/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kubectl.kubeconfig
# set client authentication parameters
kubectl config set-credentials admin \
  --client-certificate=/opt/k8s/work/admin.pem \
  --client-key=/opt/k8s/work/admin-key.pem \
  --embed-certs=true \
  --kubeconfig=kubectl.kubeconfig
# set context parameters
kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=admin \
  --kubeconfig=kubectl.kubeconfig
# set the default context
kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig
--certificate-authority: the root certificate used to verify the kube-apiserver certificate;
--client-certificate, --client-key: the admin certificate and private key just generated, used when connecting to kube-apiserver;
--embed-certs=true: embed the contents of ca.pem and admin.pem into the generated kubectl.kubeconfig file (without it, only the certificate file paths are written, and the certificate files would have to be copied separately whenever the kubeconfig is copied to another machine, which is inconvenient);
Distribute it to all nodes that use the kubectl command:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p ~/.kube"
  scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
done

# the same script using hostnames
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  ssh root@${node_name} "mkdir -p ~/.kube"
  scp kubectl.kubeconfig root@${node_name}:~/.kube/config
done
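Even before the apiserver is up, the generated kubeconfig can be sanity-checked: kubectl config view prints the cluster/user/context entries and shows the embedded certificate data as redacted placeholders (a sketch; kube-node2 is just an example target):
cd /opt/k8s/work
# the server should be https://127.0.0.1:8443 and certificate data should be embedded
kubectl config view --kubeconfig=kubectl.kubeconfig
# the same check on a node that received the file
ssh root@kube-node2 "/opt/k8s/bin/kubectl config view"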
The kubeconfig file is saved as ~/.kube/config;
etcd is a Raft-based distributed key-value store developed by CoreOS, commonly used for service discovery, shared configuration, and concurrency control (e.g. leader election and distributed locks). kubernetes uses etcd to store all of its runtime data.
This document describes how to deploy a three-node highly available etcd cluster:
The IPs and hostnames of the etcd cluster nodes are as follows:
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Download the latest release from the etcd release page:
cd /opt/k8s/work wget https://github.com/coreos/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz tar -xvf etcd-v3.3.13-linux-amd64.tar.gz
Distribute the binaries to all cluster nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  scp etcd-v3.3.13-linux-amd64/etcd* root@${node_ip}:/opt/k8s/bin
  ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

# the same script using hostnames
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  scp etcd-v3.3.13-linux-amd64/etcd* root@${node_name}:/opt/k8s/bin
  ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
done
Create the certificate signing request:
cd /opt/k8s/work cat > etcd-csr.json <<EOF { "CN": "etcd", "hosts": [ "127.0.0.1", "192.168.75.110", "192.168.75.111", "192.168.75.112" ], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ] } EOF
Generate the certificate and private key:
cd /opt/k8s/work cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes etcd-csr.json | cfssljson -bare etcd ls etcd*pem
Distribute the generated certificate and private key to each etcd node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /etc/etcd/cert"
  scp etcd*.pem root@${node_ip}:/etc/etcd/cert/
done

# the same script using hostnames
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
  echo ">>> ${node_name}"
  ssh root@${node_name} "mkdir -p /etc/etcd/cert"
  scp etcd*.pem root@${node_name}:/etc/etcd/cert/
done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > etcd.service.template <<EOF [Unit] Description=Etcd Server After=network.target After=network-online.target Wants=network-online.target Documentation=https://github.com/coreos [Service] Type=notify WorkingDirectory=${ETCD_DATA_DIR} ExecStart=/opt/k8s/bin/etcd \\ --data-dir=${ETCD_DATA_DIR} \\ --wal-dir=${ETCD_WAL_DIR} \\ --name=##NODE_NAME## \\ --cert-file=/etc/etcd/cert/etcd.pem \\ --key-file=/etc/etcd/cert/etcd-key.pem \\ --trusted-ca-file=/etc/kubernetes/cert/ca.pem \\ --peer-cert-file=/etc/etcd/cert/etcd.pem \\ --peer-key-file=/etc/etcd/cert/etcd-key.pem \\ --peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem \\ --peer-client-cert-auth \\ --client-cert-auth \\ --listen-peer-urls=https://##NODE_IP##:2380 \\ --initial-advertise-peer-urls=https://##NODE_IP##:2380 \\ --listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \\ --advertise-client-urls=https://##NODE_IP##:2379 \\ --initial-cluster-token=etcd-cluster-0 \\ --initial-cluster=${ETCD_NODES} \\ --initial-cluster-state=new \\ --auto-compaction-mode=periodic \\ --auto-compaction-retention=1 \\ --max-request-bytes=33554432 \\ --quota-backend-bytes=6442450944 \\ --heartbeat-interval=250 \\ --election-timeout=2000 Restart=on-failure RestartSec=5 LimitNOFILE=65536 [Install] WantedBy=multi-user.target EOF
WorkingDirectory, --data-dir: set the working directory and data directory to ${ETCD_DATA_DIR}; this directory must be created before the service is started;
--wal-dir: the WAL directory; for better performance, use an SSD or a disk different from --data-dir;
--name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;
--cert-file, --key-file: the certificate and private key used for communication between the etcd server and its clients;
--trusted-ca-file: the CA certificate that signed the client certificates, used to verify client certificates;
--peer-cert-file, --peer-key-file: the certificate and private key used for communication between etcd peers;
--peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify peer certificates;
Substitute the variables in the template file to create a systemd unit file for each node:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for (( i=0; i < 3; i++ )) do sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" etcd.service.template > etcd-${NODE_IPS[i]}.service done ls *.service
Distribute the generated systemd unit files:
cd /opt/k8s/work
# the generated etcd-*.service files are named by IP, so the hostname form cannot be used here
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  scp etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd " & done
systemctl start etcd may hang for a while; this is normal.
Check the startup result:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status etcd|grep Active" done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u etcd
After the etcd cluster is deployed, run the following command on any etcd node:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ETCDCTL_API=3 /opt/k8s/bin/etcdctl \ --endpoints=https://${node_ip}:2379 \ --cacert=/etc/kubernetes/cert/ca.pem \ --cert=/etc/etcd/cert/etcd.pem \ --key=/etc/etcd/cert/etcd-key.pem endpoint health done
Expected output:
>>> 192.168.75.110 https://192.168.75.110:2379 is healthy: successfully committed proposal: took = 69.349466ms >>> 192.168.75.111 https://192.168.75.111:2379 is healthy: successfully committed proposal: took = 2.989018ms >>> 192.168.75.112 https://192.168.75.112:2379 is healthy: successfully committed proposal: took = 1.926582ms
When every endpoint reports healthy, the cluster is working properly.
View the current leader:
source /opt/k8s/bin/environment.sh ETCDCTL_API=3 /opt/k8s/bin/etcdctl \ -w table --cacert=/etc/kubernetes/cert/ca.pem \ --cert=/etc/etcd/cert/etcd.pem \ --key=/etc/etcd/cert/etcd-key.pem \ --endpoints=${ETCD_ENDPOINTS} endpoint status
Output:
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX | +-----------------------------+------------------+---------+---------+-----------+-----------+------------+ | https://192.168.75.110:2379 | f3373394e2909c16 | 3.3.13 | 20 kB | true | 2 | 8 | | https://192.168.75.111:2379 | bd1095e88a91da45 | 3.3.13 | 20 kB | false | 2 | 8 | | https://192.168.75.112:2379 | 110570bfaa8447c2 | 3.3.13 | 20 kB | false | 2 | 8 | +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
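Beyond the health check, a simple read/write smoke test can confirm that data written on one member is visible on another; a sketch using the same certificates, with an arbitrary throwaway key:
source /opt/k8s/bin/environment.sh
# write a test key through one endpoint
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
  --endpoints=https://192.168.75.110:2379 \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  put /test/hello world
# read it back from a different endpoint
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
  --endpoints=https://192.168.75.111:2379 \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  get /test/hello
# clean up
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  del /test/hello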
kubernetes requires that all nodes in the cluster (including the master nodes) can reach each other over the Pod network. flannel uses vxlan to build a Pod network that connects all nodes; it uses UDP port 8472 (this port must be opened, e.g. on public clouds such as AWS).
When flanneld starts for the first time, it reads the configured Pod network from etcd, allocates an unused subnet to the local node, and creates the flannel.1 network interface (the name may differ, e.g. flannel1).
flannel writes the Pod subnet allocated to the node into the /run/flannel/docker file; docker later uses the environment variables in this file to configure the docker0 bridge, so that all Pod containers on the node get IPs from this subnet.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Download the latest release from the flannel release page:
cd /opt/k8s/work mkdir flannel wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz tar -xzvf flannel-v0.11.0-linux-amd64.tar.gz -C flannel
Distribute the binaries to all cluster nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp flannel/{flanneld,mk-docker-opts.sh} root@${node_ip}:/opt/k8s/bin/ ssh root@${node_ip} "chmod +x /opt/k8s/bin/*" done
flanneld reads and writes subnet allocation information in the etcd cluster, and etcd has mutual (two-way) x509 certificate authentication enabled, so a certificate and private key must be generated for flanneld.
Create the certificate signing request:
cd /opt/k8s/work cat > flanneld-csr.json <<EOF { "CN": "flanneld", "hosts": [], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ] } EOF
Generate the certificate and private key:
cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld ls flanneld*pem
Distribute the generated certificate and private key to all nodes (master and worker):
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p /etc/flanneld/cert" scp flanneld*.pem root@${node_ip}:/etc/flanneld/cert done
Note: this step only needs to be performed once.
cd /opt/k8s/work source /opt/k8s/bin/environment.sh etcdctl \ --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/opt/k8s/work/ca.pem \ --cert-file=/opt/k8s/work/flanneld.pem \ --key-file=/opt/k8s/work/flanneld-key.pem \ mk ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}'
The ${CLUSTER_CIDR} network (e.g. a /16) must be larger than the per-node SubnetLen (i.e. have a shorter prefix), and must match the value of kube-controller-manager's --cluster-cidr parameter;
Create the flanneld systemd unit file:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > flanneld.service << EOF [Unit] Description=Flanneld overlay address etcd agent After=network.target After=network-online.target Wants=network-online.target After=etcd.service Before=docker.service [Service] Type=notify ExecStart=/opt/k8s/bin/flanneld \\ -etcd-cafile=/etc/kubernetes/cert/ca.pem \\ -etcd-certfile=/etc/flanneld/cert/flanneld.pem \\ -etcd-keyfile=/etc/flanneld/cert/flanneld-key.pem \\ -etcd-endpoints=${ETCD_ENDPOINTS} \\ -etcd-prefix=${FLANNEL_ETCD_PREFIX} \\ -iface=${VIP_IF} \\ -ip-masq ExecStartPost=/opt/k8s/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker Restart=always RestartSec=5 StartLimitInterval=0 [Install] WantedBy=multi-user.target RequiredBy=docker.service EOF
The mk-docker-opts.sh script writes the Pod subnet allocated to flanneld into the /run/flannel/docker file; docker later uses the environment variables in this file to configure the docker0 bridge;
the -iface parameter specifies the interface used for communication with other nodes;
-ip-masq: flanneld sets up SNAT rules for traffic leaving the Pod network, and sets the --ip-masq variable passed to Docker (in the /run/flannel/docker file) to false, so Docker no longer creates SNAT rules itself. When Docker's --ip-masq is true, the SNAT rules it creates are rather "brutal": every request from a Pod on this node to any destination other than the docker0 interface is SNATed, so requests to Pods on other nodes get their source IP rewritten to the flannel.1 interface IP and the destination Pod cannot see the real source Pod IP. The SNAT rules created by flanneld are gentler: only requests to destinations outside the Pod network are SNATed.
Distribute flanneld.service to all nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp flanneld.service root@${node_ip}:/etc/systemd/system/ done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld" done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status flanneld|grep Active" done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u flanneld
Check the cluster Pod network (/16):
source /opt/k8s/bin/environment.sh etcdctl \ --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/cert/ca.pem \ --cert-file=/etc/flanneld/cert/flanneld.pem \ --key-file=/etc/flanneld/cert/flanneld-key.pem \ get ${FLANNEL_ETCD_PREFIX}/config
Output:
{"Network":"172.30.0.0/16", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}
List the allocated Pod subnets (/21):
source /opt/k8s/bin/environment.sh etcdctl \ --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/cert/ca.pem \ --cert-file=/etc/flanneld/cert/flanneld.pem \ --key-file=/etc/flanneld/cert/flanneld-key.pem \ ls ${FLANNEL_ETCD_PREFIX}/subnets
Output (results depend on your deployment):
/kubernetes/network/subnets/172.30.24.0-21 /kubernetes/network/subnets/172.30.40.0-21 /kubernetes/network/subnets/172.30.200.0-21
Check the node IP and flannel interface address corresponding to one of the Pod subnets:
source /opt/k8s/bin/environment.sh etcdctl \ --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/cert/ca.pem \ --cert-file=/etc/flanneld/cert/flanneld.pem \ --key-file=/etc/flanneld/cert/flanneld-key.pem \ get ${FLANNEL_ETCD_PREFIX}/subnets/172.30.24.0-21
Output (results depend on your deployment):
{"PublicIP":"192.168.75.110","BackendType":"vxlan","BackendData":{"VtepMAC":"62:08:2f:f4:b8:a9"}}
[root@kube-node1 work]# ip addr show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:4f:53:fa brd ff:ff:ff:ff:ff:ff inet 192.168.75.110/24 brd 192.168.75.255 scope global ens33 valid_lft forever preferred_lft forever 3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default link/ether 62:08:2f:f4:b8:a9 brd ff:ff:ff:ff:ff:ff inet 172.30.24.0/32 scope global flannel.1 valid_lft forever preferred_lft forever
[root@kube-node1 work]# ip route show |grep flannel.1 172.30.40.0/21 via 172.30.40.0 dev flannel.1 onlink 172.30.200.0/21 via 172.30.200.0 dev flannel.1 onlink
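As described earlier, flanneld (via mk-docker-opts.sh) writes the subnet it received into /run/flannel/docker, which docker will consume when it starts. The file can be inspected directly; this is a sketch, the variable names follow from the -k DOCKER_NETWORK_OPTIONS option, and the exact values depend on the subnet the node was allocated:
cat /run/flannel/docker
# expected to look roughly like:
# DOCKER_OPT_BIP="--bip=172.30.24.1/21"
# DOCKER_OPT_IPMASQ="--ip-masq=false"
# DOCKER_OPT_MTU="--mtu=1450"
# DOCKER_NETWORK_OPTIONS=" --bip=172.30.24.1/21 --ip-masq=false --mtu=1450"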
When other nodes need to reach this Pod subnet, they look up ${FLANNEL_ETCD_PREFIX}/subnets/172.30.24.0-21 in etcd to decide which node's interconnect IP the request should be sent to;
After flannel is deployed on all nodes, check that the flannel interface has been created (its name may be flannel0, flannel.0, flannel.1, etc.):
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet" done
Output:
>>> 192.168.75.110 inet 172.30.24.0/32 scope global flannel.1 >>> 192.168.75.111 inet 172.30.40.0/32 scope global flannel.1 >>> 192.168.75.112 inet 172.30.200.0/32 scope global flannel.1
On each node, ping all of the flannel interface IPs reported above and make sure they are reachable:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh ${node_ip} "ping -c 1 172.30.24.0"
  ssh ${node_ip} "ping -c 1 172.30.40.0"
  ssh ${node_ip} "ping -c 1 172.30.200.0"
done
This document describes how to use nginx's layer-4 transparent proxying to give all K8S nodes (master and worker) highly available access to kube-apiserver.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Download the source:
cd /opt/k8s/work wget http://nginx.org/download/nginx-1.15.3.tar.gz tar -xzvf nginx-1.15.3.tar.gz
Configure the build:
cd /opt/k8s/work/nginx-1.15.3 mkdir nginx-prefix ./configure --with-stream --without-http --prefix=$(pwd)/nginx-prefix --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
--with-stream: enables layer-4 transparent forwarding (TCP proxy);
--without-xxx: disables all other functionality, so the resulting dynamically linked binary has minimal dependencies;
Output:
Configuration summary + PCRE library is not used + OpenSSL library is not used + zlib library is not used nginx path prefix: "/root/tmp/nginx-1.15.3/nginx-prefix" nginx binary file: "/root/tmp/nginx-1.15.3/nginx-prefix/sbin/nginx" nginx modules path: "/root/tmp/nginx-1.15.3/nginx-prefix/modules" nginx configuration prefix: "/root/tmp/nginx-1.15.3/nginx-prefix/conf" nginx configuration file: "/root/tmp/nginx-1.15.3/nginx-prefix/conf/nginx.conf" nginx pid file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/nginx.pid" nginx error log file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/error.log" nginx http access log file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/access.log" nginx http client request body temporary files: "client_body_temp" nginx http proxy temporary files: "proxy_temp"
Build and install:
cd /opt/k8s/work/nginx-1.15.3 make && make install
cd /opt/k8s/work/nginx-1.15.3 ./nginx-prefix/sbin/nginx -v
Output:
nginx version: nginx/1.15.3
Check the libraries nginx is dynamically linked against:
$ ldd ./nginx-prefix/sbin/nginx
Output:
linux-vdso.so.1 => (0x00007ffc945e7000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f4385072000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4384e56000) libc.so.6 => /lib64/libc.so.6 (0x00007f4384a89000) /lib64/ld-linux-x86-64.so.2 (0x00007f4385276000)
Create the directory structure:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}" done
Copy the binary:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}" scp /opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx root@${node_ip}:/opt/k8s/kube-nginx/sbin/kube-nginx ssh root@${node_ip} "chmod a+x /opt/k8s/kube-nginx/sbin/*" done
Configure nginx to enable layer-4 transparent forwarding:
cd /opt/k8s/work cat > kube-nginx.conf << \EOF worker_processes 1; events { worker_connections 1024; } stream { upstream backend { hash $remote_addr consistent; server 192.168.75.110:6443 max_fails=3 fail_timeout=30s; server 192.168.75.111:6443 max_fails=3 fail_timeout=30s; server 192.168.75.112:6443 max_fails=3 fail_timeout=30s; } server { listen 127.0.0.1:8443; proxy_connect_timeout 1s; proxy_pass backend; } } EOF
Distribute the configuration file:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-nginx.conf root@${node_ip}:/opt/k8s/kube-nginx/conf/kube-nginx.conf done
Configure the kube-nginx systemd unit file:
cd /opt/k8s/work cat > kube-nginx.service <<EOF [Unit] Description=kube-apiserver nginx proxy After=network.target After=network-online.target Wants=network-online.target [Service] Type=forking ExecStartPre=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -t ExecStart=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx ExecReload=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -s reload PrivateTmp=true Restart=always RestartSec=5 StartLimitInterval=0 LimitNOFILE=65536 [Install] WantedBy=multi-user.target EOF
Distribute the systemd unit file:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-nginx.service root@${node_ip}:/etc/systemd/system/ done
Start the kube-nginx service:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx" done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status kube-nginx |grep 'Active:'" done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-nginx
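A quick check that kube-nginx is listening on the local proxy port on every node (a sketch; the 6443 backends will only answer once kube-apiserver is deployed in the next section):
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  # kube-nginx should be listening on 127.0.0.1:8443
  ssh root@${node_ip} "netstat -lnpt | grep 8443"
done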
The kubernetes master nodes run the following components:
kube-apiserver, kube-scheduler, and kube-controller-manager all run as multiple instances:
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
See 06-0.apiserver高可用之nginx代理.md.
Download the binary tar file from the CHANGELOG page and unpack it:
cd /opt/k8s/work
# if the direct download is slow, download the file elsewhere (e.g. with Thunder) and upload it;
# note that Thunder may save it as kubernetes-server-linux-amd64.tar.tar (.tar instead of .gz), so rename it before use
wget https://dl.k8s.io/v1.14.2/kubernetes-server-linux-amd64.tar.gz
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes
tar -xzvf kubernetes-src.tar.gz
Copy the binaries to all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kubernetes/server/bin/{apiextensions-apiserver,cloud-controller-manager,kube-apiserver,kube-controller-manager,kube-proxy,kube-scheduler,kubeadm,kubectl,kubelet,mounter} root@${node_ip}:/opt/k8s/bin/ ssh root@${node_ip} "chmod +x /opt/k8s/bin/*" done
This document describes how to deploy a three-instance kube-apiserver cluster; the instances are accessed through the kube-nginx proxy to ensure service availability.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
For downloading the latest binaries and installing and configuring flanneld, see 06-1.部署master节点.md.
Create the certificate signing request:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kubernetes-csr.json <<EOF { "CN": "kubernetes", "hosts": [ "127.0.0.1", "192.168.75.110", "192.168.75.111", "192.168.75.112", "${CLUSTER_KUBERNETES_SVC_IP}", "kubernetes", "kubernetes.default", "kubernetes.default.svc", "kubernetes.default.svc.cluster", "kubernetes.default.svc.cluster.local." ], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ] } EOF
The hosts field lists the IPs and domain names authorized to use this certificate; here it contains the master node IPs and the IP and domain names of the kubernetes service;
The kubernetes service IP is created automatically by the apiserver and is usually the first IP of the network specified by --service-cluster-ip-range; it can be retrieved later with:
Generate the certificate and private key:
cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes ls kubernetes*pem
Copy the generated certificate and private key files to all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert" scp kubernetes*.pem root@${node_ip}:/etc/kubernetes/cert/ done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > encryption-config.yaml <<EOF kind: EncryptionConfig apiVersion: v1 resources: - resources: - secrets providers: - aescbc: keys: - name: key1 secret: ${ENCRYPTION_KEY} - identity: {} EOF
Copy the encryption configuration file to the /etc/kubernetes directory on all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp encryption-config.yaml root@${node_ip}:/etc/kubernetes/ done
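Once kube-apiserver is running (later in this section), the effect of the EncryptionConfig can be verified: create a secret and read its raw value from etcd, which should be stored with the k8s:enc:aescbc:v1: prefix rather than in plain text. A sketch, using a throwaway secret name (test-enc):
# run after kube-apiserver is up
kubectl create secret generic test-enc -n default --from-literal=foo=bar
source /opt/k8s/bin/environment.sh
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --cacert=/opt/k8s/work/ca.pem \
  --cert=/opt/k8s/work/etcd.pem \
  --key=/opt/k8s/work/etcd-key.pem \
  get /registry/secrets/default/test-enc | hexdump -C | head
# the stored value should begin with k8s:enc:aescbc:v1: instead of plain text
kubectl delete secret test-enc -n default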
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > audit-policy.yaml <<EOF apiVersion: audit.k8s.io/v1beta1 kind: Policy rules: # The following requests were manually identified as high-volume and low-risk, so drop them. - level: None resources: - group: "" resources: - endpoints - services - services/status users: - 'system:kube-proxy' verbs: - watch - level: None resources: - group: "" resources: - nodes - nodes/status userGroups: - 'system:nodes' verbs: - get - level: None namespaces: - kube-system resources: - group: "" resources: - endpoints users: - 'system:kube-controller-manager' - 'system:kube-scheduler' - 'system:serviceaccount:kube-system:endpoint-controller' verbs: - get - update - level: None resources: - group: "" resources: - namespaces - namespaces/status - namespaces/finalize users: - 'system:apiserver' verbs: - get # Don't log HPA fetching metrics. - level: None resources: - group: metrics.k8s.io users: - 'system:kube-controller-manager' verbs: - get - list # Don't log these read-only URLs. - level: None nonResourceURLs: - '/healthz*' - /version - '/swagger*' # Don't log events requests. - level: None resources: - group: "" resources: - events # node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes - level: Request omitStages: - RequestReceived resources: - group: "" resources: - nodes/status - pods/status users: - kubelet - 'system:node-problem-detector' - 'system:serviceaccount:kube-system:node-problem-detector' verbs: - update - patch - level: Request omitStages: - RequestReceived resources: - group: "" resources: - nodes/status - pods/status userGroups: - 'system:nodes' verbs: - update - patch # deletecollection calls can be large, don't log responses for expected namespace deletions - level: Request omitStages: - RequestReceived users: - 'system:serviceaccount:kube-system:namespace-controller' verbs: - deletecollection # Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data, # so only log at the Metadata level. - level: Metadata omitStages: - RequestReceived resources: - group: "" resources: - secrets - configmaps - group: authentication.k8s.io resources: - tokenreviews # Get repsonses can be large; skip them. - level: Request omitStages: - RequestReceived resources: - group: "" - group: admissionregistration.k8s.io - group: apiextensions.k8s.io - group: apiregistration.k8s.io - group: apps - group: authentication.k8s.io - group: authorization.k8s.io - group: autoscaling - group: batch - group: certificates.k8s.io - group: extensions - group: metrics.k8s.io - group: networking.k8s.io - group: policy - group: rbac.authorization.k8s.io - group: scheduling.k8s.io - group: settings.k8s.io - group: storage.k8s.io verbs: - get - list - watch # Default level for known APIs - level: RequestResponse omitStages: - RequestReceived resources: - group: "" - group: admissionregistration.k8s.io - group: apiextensions.k8s.io - group: apiregistration.k8s.io - group: apps - group: authentication.k8s.io - group: authorization.k8s.io - group: autoscaling - group: batch - group: certificates.k8s.io - group: extensions - group: metrics.k8s.io - group: networking.k8s.io - group: policy - group: rbac.authorization.k8s.io - group: scheduling.k8s.io - group: settings.k8s.io - group: storage.k8s.io # Default level for all other requests. - level: Metadata omitStages: - RequestReceived EOF
Distribute the audit policy file:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp audit-policy.yaml root@${node_ip}:/etc/kubernetes/audit-policy.yaml done
Create the certificate signing request:
cat > proxy-client-csr.json <<EOF { "CN": "aggregator", "hosts": [], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ] } EOF
The CN name must be included in kube-apiserver's --requestheader-allowed-names parameter; otherwise later requests for metrics are rejected as unauthorized.
Generate the certificate and private key:
cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \ -ca-key=/etc/kubernetes/cert/ca-key.pem \ -config=/etc/kubernetes/cert/ca-config.json \ -profile=kubernetes proxy-client-csr.json | cfssljson -bare proxy-client ls proxy-client*.pem
Copy the generated certificate and private key files to all master nodes:
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp proxy-client*.pem root@${node_ip}:/etc/kubernetes/cert/ done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kube-apiserver.service.template <<EOF [Unit] Description=Kubernetes API Server Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=network.target [Service] WorkingDirectory=${K8S_DIR}/kube-apiserver ExecStart=/opt/k8s/bin/kube-apiserver \\ --advertise-address=##NODE_IP## \\ --default-not-ready-toleration-seconds=360 \\ --default-unreachable-toleration-seconds=360 \\ --feature-gates=DynamicAuditing=true \\ --max-mutating-requests-inflight=2000 \\ --max-requests-inflight=4000 \\ --default-watch-cache-size=200 \\ --delete-collection-workers=2 \\ --encryption-provider-config=/etc/kubernetes/encryption-config.yaml \\ --etcd-cafile=/etc/kubernetes/cert/ca.pem \\ --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem \\ --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem \\ --etcd-servers=${ETCD_ENDPOINTS} \\ --bind-address=##NODE_IP## \\ --secure-port=6443 \\ --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem \\ --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem \\ --insecure-port=0 \\ --audit-dynamic-configuration \\ --audit-log-maxage=15 \\ --audit-log-maxbackup=3 \\ --audit-log-maxsize=100 \\ --audit-log-truncate-enabled \\ --audit-log-path=${K8S_DIR}/kube-apiserver/audit.log \\ --audit-policy-file=/etc/kubernetes/audit-policy.yaml \\ --profiling \\ --anonymous-auth=false \\ --client-ca-file=/etc/kubernetes/cert/ca.pem \\ --enable-bootstrap-token-auth \\ --requestheader-allowed-names="aggregator" \\ --requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\ --requestheader-extra-headers-prefix="X-Remote-Extra-" \\ --requestheader-group-headers=X-Remote-Group \\ --requestheader-username-headers=X-Remote-User \\ --service-account-key-file=/etc/kubernetes/cert/ca.pem \\ --authorization-mode=Node,RBAC \\ --runtime-config=api/all=true \\ --enable-admission-plugins=NodeRestriction \\ --allow-privileged=true \\ --apiserver-count=3 \\ --event-ttl=168h \\ --kubelet-certificate-authority=/etc/kubernetes/cert/ca.pem \\ --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem \\ --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem \\ --kubelet-https=true \\ --kubelet-timeout=10s \\ --proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem \\ --proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \\ --service-cluster-ip-range=${SERVICE_CIDR} \\ --service-node-port-range=${NODE_PORT_RANGE} \\ --logtostderr=true \\ --v=2 Restart=on-failure RestartSec=10 Type=notify LimitNOFILE=65536 [Install] WantedBy=multi-user.target EOF
--advertise-address: the IP the apiserver advertises to the outside (the backend node IP of the kubernetes service);
--default-*-toleration-seconds: thresholds related to node failures;
--max-*-requests-inflight: maximum numbers of in-flight requests;
--etcd-*: the certificates used to access etcd and the etcd server addresses;
--experimental-encryption-provider-config: the configuration used to encrypt secrets stored in etcd;
--bind-address: the IP the https endpoint listens on; it must not be 127.0.0.1, otherwise the secure port 6443 cannot be reached from outside;
--secure-port: the https listening port;
--insecure-port=0: disables the insecure http port (8080);
--tls-*-file: the certificate, private key, and CA files used by the apiserver;
--audit-*: parameters for the audit policy and audit log files;
--client-ca-file: verifies the certificates presented by clients (kube-controller-manager, kube-scheduler, kubelet, kube-proxy, etc.);
--enable-bootstrap-token-auth: enables token authentication for kubelet bootstrap;
--requestheader-*: configuration for kube-apiserver's aggregator layer, required by proxy-client and HPA;
--requestheader-client-ca-file: the CA that signed the certificates specified by --proxy-client-cert-file and --proxy-client-key-file; used when the metric aggregator is enabled;
--requestheader-allowed-names: must not be empty; a comma-separated list of the CN names of the --proxy-client-cert-file certificates, set to "aggregator" here;
--service-account-key-file: the public key file used to sign ServiceAccount tokens; kube-controller-manager's --service-account-private-key-file specifies the matching private key, and the two are used as a pair;
--runtime-config=api/all=true: enables all API versions, e.g. autoscaling/v2alpha1;
--authorization-mode=Node,RBAC, --anonymous-auth=false: enables the Node and RBAC authorization modes and rejects unauthorized requests;
--enable-admission-plugins: enables some plugins that are disabled by default;
--allow-privileged: allows running containers with privileged permissions;
--apiserver-count=3: the number of apiserver instances;
--event-ttl: how long events are kept;
--kubelet-*: if specified, the kubelet APIs are accessed over https; RBAC rules must be defined for the user of the corresponding certificate (the user of the kubernetes*.pem certificate above is kubernetes), otherwise access to the kubelet API is denied as unauthorized;
--proxy-client-*: the certificates the apiserver uses to access metrics-server;
--service-cluster-ip-range: the Service Cluster IP range;
--service-node-port-range: the NodePort port range;
If the kube-apiserver machine does not run kube-proxy, the --enable-aggregator-routing=true parameter must also be added;
For the --requestheader-XXX parameters, see:
Note:
If --requestheader-allowed-names is not empty and the CN name of the --proxy-client-cert-file certificate is not in allowed-names, later queries for node or pod metrics fail with:
[root@zhangjun-k8s01 1.8+]# kubectl top nodes Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "aggregator" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope
Substitute the variables in the template file to generate a systemd unit file for each node:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for (( i=0; i < 3; i++ )) do sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-apiserver.service.template > kube-apiserver-${NODE_IPS[i]}.service done ls kube-apiserver*.service
Distribute the generated systemd unit files:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-apiserver-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-apiserver.service done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-apiserver" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver" done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status kube-apiserver |grep 'Active:'" done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-apiserver
source /opt/k8s/bin/environment.sh ETCDCTL_API=3 etcdctl \ --endpoints=${ETCD_ENDPOINTS} \ --cacert=/opt/k8s/work/ca.pem \ --cert=/opt/k8s/work/etcd.pem \ --key=/opt/k8s/work/etcd-key.pem \ get /registry/ --prefix --keys-only
$ kubectl cluster-info Kubernetes master is running at https://127.0.0.1:8443 To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. $ kubectl get all --all-namespaces NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE default service/kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 12m $ kubectl get componentstatuses NAME STATUS MESSAGE ERROR controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused etcd-0 Healthy {"health":"true"} etcd-2 Healthy {"health":"true"} etcd-1 Healthy {"health":"true"}
If kubectl commands print the following error, the ~/.kube/config file being used is wrong; first check that the file exists, then check that none of its parameters are missing values, and run the command again:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
When kubectl get componentstatuses is executed, the apiserver sends the health checks to 127.0.0.1 by default. When controller-manager and scheduler run in cluster mode they may not be on the same machine as kube-apiserver, in which case their status shows as Unhealthy even though they are actually working correctly.
$ sudo netstat -lnpt|grep kube tcp 0 0 172.27.137.240:6443 0.0.0.0:* LISTEN 101442/kube-apiserv
When kubectl exec, run, logs, and similar commands are executed, the apiserver forwards the request to the kubelet's https port. The RBAC rule defined here grants the user of the certificate used by the apiserver (kubernetes.pem, CN: kubernetes) access to the kubelet API:
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
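The binding can be confirmed afterwards; the subject should be the User kubernetes and the role system:kubelet-api-admin (a sketch):
kubectl describe clusterrolebinding kube-apiserver:kubelet-apis
# expect: Role ClusterRole/system:kubelet-api-admin, Subjects: User kubernetes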
This document describes how to deploy a highly available kube-controller-manager cluster.
The cluster has 3 nodes; after startup, a leader is elected through competition and the other nodes are blocked. When the leader becomes unavailable, the blocked nodes elect a new leader, which keeps the service available.
To secure communication, this document first generates an x509 certificate and private key; kube-controller-manager uses the certificate in the following two situations:
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
For downloading the latest binaries and installing and configuring flanneld, see 06-1.部署master节点.md.
Create the certificate signing request:
cd /opt/k8s/work cat > kube-controller-manager-csr.json <<EOF { "CN": "system:kube-controller-manager", "key": { "algo": "rsa", "size": 2048 }, "hosts": [ "127.0.0.1", "192.168.75.110", "192.168.75.111", "192.168.75.112" ], "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "system:kube-controller-manager", "OU": "4Paradigm" } ] } EOF
The CN and O are both system:kube-controller-manager; the built-in ClusterRoleBinding system:kube-controller-manager grants kube-controller-manager the permissions it needs.
Generate the certificate and private key:
cd /opt/k8s/work cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager ls kube-controller-manager*pem
Distribute the generated certificate and private key to all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-controller-manager*.pem root@${node_ip}:/etc/kubernetes/cert/ done
kube-controller-manager uses a kubeconfig file to access the apiserver; the file provides the apiserver address, the embedded CA certificate, and the kube-controller-manager certificate:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh kubectl config set-cluster kubernetes \ --certificate-authority=/opt/k8s/work/ca.pem \ --embed-certs=true \ --server=${KUBE_APISERVER} \ --kubeconfig=kube-controller-manager.kubeconfig kubectl config set-credentials system:kube-controller-manager \ --client-certificate=kube-controller-manager.pem \ --client-key=kube-controller-manager-key.pem \ --embed-certs=true \ --kubeconfig=kube-controller-manager.kubeconfig kubectl config set-context system:kube-controller-manager \ --cluster=kubernetes \ --user=system:kube-controller-manager \ --kubeconfig=kube-controller-manager.kubeconfig kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig
Distribute the kubeconfig to all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-controller-manager.kubeconfig root@${node_ip}:/etc/kubernetes/ done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kube-controller-manager.service.template <<EOF [Unit] Description=Kubernetes Controller Manager Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] WorkingDirectory=${K8S_DIR}/kube-controller-manager ExecStart=/opt/k8s/bin/kube-controller-manager \\ --profiling \\ --cluster-name=kubernetes \\ --controllers=*,bootstrapsigner,tokencleaner \\ --kube-api-qps=1000 \\ --kube-api-burst=2000 \\ --leader-elect \\ --use-service-account-credentials\\ --concurrent-service-syncs=2 \\ --bind-address=##NODE_IP## \\ --secure-port=10252 \\ --tls-cert-file=/etc/kubernetes/cert/kube-controller-manager.pem \\ --tls-private-key-file=/etc/kubernetes/cert/kube-controller-manager-key.pem \\ --port=0 \\ --authentication-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\ --client-ca-file=/etc/kubernetes/cert/ca.pem \\ --requestheader-allowed-names="" \\ --requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\ --requestheader-extra-headers-prefix="X-Remote-Extra-" \\ --requestheader-group-headers=X-Remote-Group \\ --requestheader-username-headers=X-Remote-User \\ --authorization-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\ --cluster-signing-cert-file=/etc/kubernetes/cert/ca.pem \\ --cluster-signing-key-file=/etc/kubernetes/cert/ca-key.pem \\ --experimental-cluster-signing-duration=876000h \\ --horizontal-pod-autoscaler-sync-period=10s \\ --concurrent-deployment-syncs=10 \\ --concurrent-gc-syncs=30 \\ --node-cidr-mask-size=24 \\ --service-cluster-ip-range=${SERVICE_CIDR} \\ --pod-eviction-timeout=6m \\ --terminated-pod-gc-threshold=10000 \\ --root-ca-file=/etc/kubernetes/cert/ca.pem \\ --service-account-private-key-file=/etc/kubernetes/cert/ca-key.pem \\ --kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\ --logtostderr=true \\ --v=2 Restart=on-failure RestartSec=5 [Install] WantedBy=multi-user.target EOF
--port=0: disables the insecure (http) port; the --address parameter then has no effect and --bind-address takes effect;
--secure-port=10252, --bind-address=0.0.0.0: listen for https /metrics requests on port 10252 on all network interfaces;
--kubeconfig: path to the kubeconfig file kube-controller-manager uses to connect to and authenticate with kube-apiserver;
--authentication-kubeconfig and --authorization-kubeconfig: kube-controller-manager uses them to connect to the apiserver and to authenticate and authorize client requests. kube-controller-manager no longer uses --tls-ca-file to verify the certificates of clients requesting the https metrics. If these two kubeconfig parameters are not configured, client requests to kube-controller-manager's https port are rejected (with a permission error).
--cluster-signing-*-file: sign the certificates created by TLS Bootstrap;
--experimental-cluster-signing-duration: the validity period of the TLS Bootstrap certificates;
--root-ca-file: the CA certificate placed into containers' ServiceAccounts, used to verify the kube-apiserver certificate;
--service-account-private-key-file: the private key used to sign ServiceAccount tokens; it must pair with the public key specified by kube-apiserver's --service-account-key-file;
--service-cluster-ip-range: the Service Cluster IP range; must match the parameter of the same name in kube-apiserver;
--leader-elect=true: cluster mode with leader election enabled; the node elected leader does the work, the other nodes are blocked;
--controllers=*,bootstrapsigner,tokencleaner: the list of enabled controllers; tokencleaner automatically cleans up expired Bootstrap tokens;
--horizontal-pod-autoscaler-*: custom-metrics-related parameters, supporting autoscaling/v2alpha1;
--tls-cert-file, --tls-private-key-file: the server certificate and key used when serving metrics over https;
--use-service-account-credentials=true: each controller in kube-controller-manager uses its own ServiceAccount to access kube-apiserver;
Substitute the variables in the template file to create a systemd unit file for each node:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for (( i=0; i < 3; i++ )) do sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-controller-manager.service.template > kube-controller-manager-${NODE_IPS[i]}.service done ls kube-controller-manager*.service
Distribute to all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-controller-manager-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-controller-manager.service done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-controller-manager" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager" done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status kube-controller-manager|grep Active" done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-controller-manager
kube-controller-manager listens on port 10252 and serves https requests:
[root@kube-node1 work]# netstat -lnpt | grep kube-cont tcp 0 0 192.168.75.110:10252 0.0.0.0:* LISTEN 11439/kube-controll
Note: the following commands are run on a kube-controller-manager node.
[root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://172.27.137.240:10252/metrics |head ^X^Z [1]+ Stopped curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://172.27.137.240:10252/metrics | head [root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10252/metrics |head # HELP ClusterRoleAggregator_adds (Deprecated) Total number of adds handled by workqueue: ClusterRoleAggregator # TYPE ClusterRoleAggregator_adds counter ClusterRoleAggregator_adds 13 # HELP ClusterRoleAggregator_depth (Deprecated) Current depth of workqueue: ClusterRoleAggregator # TYPE ClusterRoleAggregator_depth gauge ClusterRoleAggregator_depth 0 # HELP ClusterRoleAggregator_longest_running_processor_microseconds (Deprecated) How many microseconds has the longest running processor for ClusterRoleAggregator been running. # TYPE ClusterRoleAggregator_longest_running_processor_microseconds gauge ClusterRoleAggregator_longest_running_processor_microseconds 0 # HELP ClusterRoleAggregator_queue_latency (Deprecated) How long an item stays in workqueueClusterRoleAggregator before being requested.
The ClusterRole system:kube-controller-manager has only limited permissions (it can only create secrets, serviceaccounts, and a few other resources); the permissions of the individual controllers are split out into the ClusterRoles system:controller:XXX:
[root@kube-node1 work]# kubectl describe clusterrole system:kube-controller-manager Name: system:kube-controller-manager Labels: kubernetes.io/bootstrapping=rbac-defaults Annotations: rbac.authorization.kubernetes.io/autoupdate: true PolicyRule: Resources Non-Resource URLs Resource Names Verbs --------- ----------------- -------------- ----- secrets [] [] [create delete get update] endpoints [] [] [create get update] serviceaccounts [] [] [create get update] events [] [] [create patch update] tokenreviews.authentication.k8s.io [] [] [create] subjectaccessreviews.authorization.k8s.io [] [] [create] configmaps [] [] [get] namespaces [] [] [get] *.* [] [] [list watch]
The --use-service-account-credentials=true parameter must be added to kube-controller-manager's startup parameters so that the main controller creates a ServiceAccount XXX-controller for each controller. The built-in ClusterRoleBinding system:controller:XXX then grants each XXX-controller ServiceAccount the corresponding ClusterRole system:controller:XXX permissions.
$ kubectl get clusterrole|grep controller system:controller:attachdetach-controller 51m system:controller:certificate-controller 51m system:controller:clusterrole-aggregation-controller 51m system:controller:cronjob-controller 51m system:controller:daemon-set-controller 51m system:controller:deployment-controller 51m system:controller:disruption-controller 51m system:controller:endpoint-controller 51m system:controller:expand-controller 51m system:controller:generic-garbage-collector 51m system:controller:horizontal-pod-autoscaler 51m system:controller:job-controller 51m system:controller:namespace-controller 51m system:controller:node-controller 51m system:controller:persistent-volume-binder 51m system:controller:pod-garbage-collector 51m system:controller:pv-protection-controller 51m system:controller:pvc-protection-controller 51m system:controller:replicaset-controller 51m system:controller:replication-controller 51m system:controller:resourcequota-controller 51m system:controller:route-controller 51m system:controller:service-account-controller 51m system:controller:service-controller 51m system:controller:statefulset-controller 51m system:controller:ttl-controller 51m system:kube-controller-manager 51m
Take the deployment controller as an example:
$ kubectl describe clusterrole system:controller:deployment-controller Name: system:controller:deployment-controller Labels: kubernetes.io/bootstrapping=rbac-defaults Annotations: rbac.authorization.kubernetes.io/autoupdate: true PolicyRule: Resources Non-Resource URLs Resource Names Verbs --------- ----------------- -------------- ----- replicasets.apps [] [] [create delete get list patch update watch] replicasets.extensions [] [] [create delete get list patch update watch] events [] [] [create patch update] pods [] [] [get list update watch] deployments.apps [] [] [get list update watch] deployments.extensions [] [] [get list update watch] deployments.apps/finalizers [] [] [update] deployments.apps/status [] [] [update] deployments.extensions/finalizers [] [] [update] deployments.extensions/status [] [] [update]
[root@kube-node1 work]# kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml apiVersion: v1 kind: Endpoints metadata: annotations: control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-node3_ef7efd0f-0149-11ea-8f8a-000c291d1820","leaseDurationSeconds":15,"acquireTime":"2019-11-07T10:39:33Z","renewTime":"2019-11-07T10:43:10Z","leaderTransitions":2}' creationTimestamp: "2019-11-07T10:32:42Z" name: kube-controller-manager namespace: kube-system resourceVersion: "3766" selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager uid: ee2f71e3-0149-11ea-98c9-000c291d1820
As shown, the current leader is the kube-node3 node.
To test failover, stop the kube-controller-manager service on one or two nodes and watch the logs on the other nodes to see whether one of them acquires leadership, as sketched below.
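A minimal failover test, assuming kube-node3 is the current leader as shown above; after roughly the lease duration (15s) the holderIdentity in the endpoints annotation should switch to another node:
# on the current leader
ssh root@kube-node3 "systemctl stop kube-controller-manager"
# wait for the lease to expire, then check the new leader
sleep 30
kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml | grep holderIdentity
# restart the stopped instance afterwards
ssh root@kube-node3 "systemctl start kube-controller-manager"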
This document describes how to deploy a highly available kube-scheduler cluster.
The cluster has 3 nodes; after startup, a leader is elected through competition and the other nodes are blocked. When the leader becomes unavailable, the remaining nodes elect a new leader, which keeps the service available.
To secure communication, this document first generates an x509 certificate and private key; kube-scheduler uses the certificate in the following two situations:
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
For downloading the latest binaries and installing and configuring flanneld, see 06-1.部署master节点.md.
Create the certificate signing request:
cd /opt/k8s/work cat > kube-scheduler-csr.json <<EOF { "CN": "system:kube-scheduler", "hosts": [ "127.0.0.1", "192.168.75.110", "192.168.75.111", "192.168.75.112" ], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "system:kube-scheduler", "OU": "4Paradigm" } ] } EOF
The CN and O are both system:kube-scheduler; the built-in ClusterRoleBinding system:kube-scheduler grants kube-scheduler the permissions it needs;
Generate the certificate and private key:
cd /opt/k8s/work cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler ls kube-scheduler*pem
Distribute the generated certificate and private key to all master nodes:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-scheduler*.pem root@${node_ip}:/etc/kubernetes/cert/ done
kube-scheduler uses a kubeconfig file to access the apiserver; the file provides the apiserver address, the embedded CA certificate, and the kube-scheduler certificate:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh kubectl config set-cluster kubernetes \ --certificate-authority=/opt/k8s/work/ca.pem \ --embed-certs=true \ --server=${KUBE_APISERVER} \ --kubeconfig=kube-scheduler.kubeconfig kubectl config set-credentials system:kube-scheduler \ --client-certificate=kube-scheduler.pem \ --client-key=kube-scheduler-key.pem \ --embed-certs=true \ --kubeconfig=kube-scheduler.kubeconfig kubectl config set-context system:kube-scheduler \ --cluster=kubernetes \ --user=system:kube-scheduler \ --kubeconfig=kube-scheduler.kubeconfig kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig
分发 kubeconfig 到全部 master 节点:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-scheduler.kubeconfig root@${node_ip}:/etc/kubernetes/ done
cd /opt/k8s/work cat >kube-scheduler.yaml.template <<EOF apiVersion: kubescheduler.config.k8s.io/v1alpha1 kind: KubeSchedulerConfiguration bindTimeoutSeconds: 600 clientConnection: burst: 200 kubeconfig: "/etc/kubernetes/kube-scheduler.kubeconfig" qps: 100 enableContentionProfiling: false enableProfiling: true hardPodAffinitySymmetricWeight: 1 healthzBindAddress: ##NODE_IP##:10251 leaderElection: leaderElect: true metricsBindAddress: ##NODE_IP##:10251 EOF
--kubeconfig:指定 kubeconfig 文件路径,kube-scheduler 使用它连接和验证 kube-apiserver;
--leader-elect=true:集群运行模式,启用选举功能;被选为 leader 的节点负责处理工作,其它节点为阻塞状态。
替换模板文件中的变量:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for (( i=0; i < 3; i++ )) do sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.yaml.template > kube-scheduler-${NODE_IPS[i]}.yaml done ls kube-scheduler*.yaml
分发 kube-scheduler 配置文件到全部 master 节点:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-scheduler-${node_ip}.yaml root@${node_ip}:/etc/kubernetes/kube-scheduler.yaml done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kube-scheduler.service.template <<EOF [Unit] Description=Kubernetes Scheduler Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] WorkingDirectory=${K8S_DIR}/kube-scheduler ExecStart=/opt/k8s/bin/kube-scheduler \\ --config=/etc/kubernetes/kube-scheduler.yaml \\ --bind-address=##NODE_IP## \\ --secure-port=10259 \\ --port=0 \\ --tls-cert-file=/etc/kubernetes/cert/kube-scheduler.pem \\ --tls-private-key-file=/etc/kubernetes/cert/kube-scheduler-key.pem \\ --authentication-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\ --client-ca-file=/etc/kubernetes/cert/ca.pem \\ --requestheader-allowed-names="" \\ --requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\ --requestheader-extra-headers-prefix="X-Remote-Extra-" \\ --requestheader-group-headers=X-Remote-Group \\ --requestheader-username-headers=X-Remote-User \\ --authorization-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\ --logtostderr=true \\ --v=2 Restart=always RestartSec=5 StartLimitInterval=0 [Install] WantedBy=multi-user.target EOF
替换模板文件中的变量,为各节点建立 systemd unit 文件:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for (( i=0; i < 3; i++ )) do sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.service.template > kube-scheduler-${NODE_IPS[i]}.service done ls kube-scheduler*.service
分发 systemd unit 文件到全部 master 节点:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp kube-scheduler-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-scheduler.service done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-scheduler" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler" done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status kube-scheduler|grep Active" done
确保状态为 active (running),否则查看日志,确认原因:
journalctl -u kube-scheduler
注意:如下命令在 kube-scheduler 节点上执行。
kube-scheduler 监听 10251 和 10259 端口:10251 为非安全端口,接收 http 请求,不需要认证授权;10259 为安全端口,接收 https 请求,需要认证授权。两个端口都对外提供 /metrics 和 /healthz 的访问。
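例如,可以先通过非安全端口快速检查健康状态(示例,以 kube-node1 的 IP 为例,预期输出为 ok):
curl -s http://192.168.75.110:10251/healthz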
[root@kube-node1 work]# netstat -lnpt |grep kube-sch tcp 0 0 192.168.75.110:10259 0.0.0.0:* LISTEN 17034/kube-schedule tcp 0 0 192.168.75.110:10251 0.0.0.0:* LISTEN 17034/kube-schedule [root@kube-node1 work]# curl -s http://192.168.75.110:10251/metrics |head # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend. # TYPE apiserver_audit_event_total counter apiserver_audit_event_total 0 # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend. # TYPE apiserver_audit_requests_rejected_total counter apiserver_audit_requests_rejected_total 0 # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request. # TYPE apiserver_client_certificate_expiration_seconds histogram apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0 apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0 [root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10259/metrics |head # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend. # TYPE apiserver_audit_event_total counter apiserver_audit_event_total 0 # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend. # TYPE apiserver_audit_requests_rejected_total counter apiserver_audit_requests_rejected_total 0 # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request. # TYPE apiserver_client_certificate_expiration_seconds histogram apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0 apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
[root@kube-node1 work]# kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml apiVersion: v1 kind: Endpoints metadata: annotations: control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-node1_a0c24012-0152-11ea-9e7b-000c294f53fa","leaseDurationSeconds":15,"acquireTime":"2019-11-07T11:34:59Z","renewTime":"2019-11-07T11:39:36Z","leaderTransitions":0}' creationTimestamp: "2019-11-07T11:34:57Z" name: kube-scheduler namespace: kube-system resourceVersion: "6598" selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler uid: a00f12ce-0152-11ea-98c9-000c291d1820
可见,当前的 leader 为 kube-node1 节点。
随便找一个或两个 master 节点,停掉 kube-scheduler 服务,看其它节点是否获取了 leader 权限。
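与 kube-controller-manager 的验证方式类似,下面是一个参考示例(以当前 leader kube-node1 为例;命令仅为示意):
ssh root@kube-node1 "systemctl stop kube-scheduler"
# 等待超过 leaseDurationSeconds(15 秒)后查看新的 leader
sleep 20
kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml | grep holderIdentity
ssh root@kube-node1 "systemctl start kube-scheduler"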
kubernetes worker 节点运行以下组件:
注意:若是没有特殊指明,本文档的全部操做均在 kube-node1 节点上执行,而后远程分发文件和执行命令。
参考 06-0.apiserver高可用之nginx代理.md。
CentOS:
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "yum install -y epel-release" ssh root@${node_ip} "yum install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp && modprobe ip_vs " done
Ubuntu:
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "apt-get install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp && modprobe ip_vs " done
docker 运行和管理容器,kubelet 经过 Container Runtime Interface (CRI) 与它进行交互。
注意:若是没有特殊指明,本文档的全部操做均在 kube-node1 节点上执行,而后远程分发文件和执行命令。
到 docker 下载页面 下载最新发布包:
cd /opt/k8s/work wget https://download.docker.com/linux/static/stable/x86_64/docker-18.09.6.tgz tar -xvf docker-18.09.6.tgz
分发二进制文件到全部 worker 节点:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp docker/* root@${node_ip}:/opt/k8s/bin/ ssh root@${node_ip} "chmod +x /opt/k8s/bin/*" done
cd /opt/k8s/work cat > docker.service <<"EOF" [Unit] Description=Docker Application Container Engine Documentation=http://docs.docker.io [Service] WorkingDirectory=##DOCKER_DIR## Environment="PATH=/opt/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin" EnvironmentFile=-/run/flannel/docker ExecStart=/opt/k8s/bin/dockerd $DOCKER_NETWORK_OPTIONS ExecReload=/bin/kill -s HUP $MAINPID Restart=on-failure RestartSec=5 LimitNOFILE=infinity LimitNPROC=infinity LimitCORE=infinity Delegate=yes KillMode=process [Install] WantedBy=multi-user.target EOF
EOF 前后有双引号,这样 bash 不会替换文档中的变量,如 $DOCKER_NETWORK_OPTIONS(这些环境变量是 systemd 负责替换的);
dockerd 运行时会调用其它 docker 命令,如 docker-proxy,因此须要将 docker 命令所在的目录加到 PATH 环境变量中;
flanneld 启动时将网络配置写入 /run/flannel/docker 文件中,dockerd 启动前读取该文件中的环境变量 DOCKER_NETWORK_OPTIONS,然后设置 docker0 网桥网段;
如果指定了多个 EnvironmentFile 选项,则必须将 /run/flannel/docker 放在最后(确保 docker0 使用 flanneld 生成的 bip 参数);
docker 需要以 root 用户运行;
docker 从 1.13 版本开始,可能将 iptables FORWARD chain 的默认策略设置为 DROP,从而导致 ping 其它 Node 上的 Pod IP 失败,遇到这种情况时,需要手动设置策略为 ACCEPT:
$ sudo iptables -P FORWARD ACCEPT
并且把以下命令写入 /etc/rc.local 文件中,防止节点重启后 iptables FORWARD chain 的默认策略又还原为 DROP:
/sbin/iptables -P FORWARD ACCEPT
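下面是一个在所有 worker 节点上追加该命令的参考做法(示例脚本,假设已配置好 environment.sh;CentOS 7 下还需要给 /etc/rc.d/rc.local 增加可执行权限,rc.local 才会在开机时执行):
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "echo '/sbin/iptables -P FORWARD ACCEPT' >> /etc/rc.local"
    ssh root@${node_ip} "chmod +x /etc/rc.d/rc.local"
  done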
分发 systemd unit 文件到全部 worker 机器:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh sed -i -e "s|##DOCKER_DIR##|${DOCKER_DIR}|" docker.service for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" scp docker.service root@${node_ip}:/etc/systemd/system/ done
使用国内的仓库镜像服务器以加快 pull image 的速度,同时增长下载的并发数 (须要重启 dockerd 生效):
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > docker-daemon.json <<EOF { "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"], "insecure-registries": ["docker02:35000"], "max-concurrent-downloads": 20, "live-restore": true, "max-concurrent-uploads": 10, "debug": true, "data-root": "${DOCKER_DIR}/data", "exec-root": "${DOCKER_DIR}/exec", "log-opts": { "max-size": "100m", "max-file": "5" } } EOF
分发 docker 配置文件到全部 worker 节点:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p /etc/docker/ ${DOCKER_DIR}/{data,exec}" scp docker-daemon.json root@${node_ip}:/etc/docker/daemon.json done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker" done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status docker|grep Active" done
确保状态为 active (running),否则查看日志,确认原因:
journalctl -u docker
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0" done
确认各 worker 节点的 docker0 网桥和 flannel.1 接口的 IP 处于同一个网段中(如下面的输出中,flannel.1 的 172.30.24.0/32 与 docker0 的 172.30.24.1/21 同属 172.30.24.0/21 网段):
>>> 192.168.75.110 3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default link/ether ea:90:d9:9a:7c:a7 brd ff:ff:ff:ff:ff:ff inet 172.30.24.0/32 scope global flannel.1 valid_lft forever preferred_lft forever 4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default link/ether 02:42:a8:55:ff:36 brd ff:ff:ff:ff:ff:ff inet 172.30.24.1/21 brd 172.30.31.255 scope global docker0 valid_lft forever preferred_lft forever
注意:如果您的服务安装顺序不对或者机器环境比较复杂,docker 服务早于 flanneld 服务安装,此时 worker 节点的 docker0 网桥和 flannel.1 接口的 IP 可能不在同一个网段下,这时请先停止 docker 服务,手工删除 docker0 网卡,重新启动 docker 服务后即可修复:
systemctl stop docker ip link delete docker0 systemctl start docker
[root@kube-node1 work]# ps -elfH|grep docker 4 S root 22497 1 0 80 0 - 108496 ep_pol 20:44 ? 00:00:00 /opt/k8s/bin/dockerd --bip=172.30.24.1/21 --ip-masq=false --mtu=1450 4 S root 22515 22497 0 80 0 - 136798 futex_ 20:44 ? 00:00:00 containerd --config /data/k8s/docker/exec/containerd/containerd.toml --log-level debug [root@kube-node1 work]# docker info Containers: 0 Running: 0 Paused: 0 Stopped: 0 Images: 0 Server Version: 18.09.6 Storage Driver: overlay2 Backing Filesystem: xfs Supports d_type: true Native Overlay Diff: true Logging Driver: json-file Cgroup Driver: cgroupfs Plugins: Volume: local Network: bridge host macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: runc Default Runtime: runc Init Binary: docker-init containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84 runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30 init version: fec3683 Security Options: seccomp Profile: default Kernel Version: 4.4.199-1.el7.elrepo.x86_64 Operating System: CentOS Linux 7 (Core) OSType: linux Architecture: x86_64 CPUs: 4 Total Memory: 1.936GiB Name: kube-node1 ID: MQYP:O7RJ:F22K:TYEC:C5UW:XOLP:XRMF:VF6J:6JVH:AMGN:YLAI:U2FJ Docker Root Dir: /data/k8s/docker/data Debug Mode (client): false Debug Mode (server): true File Descriptors: 22 Goroutines: 43 System Time: 2019-11-07T20:48:23.252463652+08:00 EventsListeners: 0 Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: docker02:35000 127.0.0.0/8 Registry Mirrors: https://docker.mirrors.ustc.edu.cn/ https://hub-mirror.c.163.com/ Live Restore Enabled: true Product License: Community Engine
kubelet 运行在每一个 worker 节点上,接收 kube-apiserver 发送的请求,管理 Pod 容器,执行交互式命令,如 exec、run、logs 等。
kubelet 启动时自动向 kube-apiserver 注册节点信息,内置的 cadvisor 统计和监控节点的资源使用状况。
为确保安全,部署时关闭了 kubelet 的非安全 http 端口,对请求进行认证和受权,拒绝未受权的访问(如 apiserver、heapster 的请求)。
注意:若是没有特殊指明,本文档的全部操做均在 kube-node1 节点上执行,而后远程分发文件和执行命令。
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_name in ${NODE_NAMES[@]} do echo ">>> ${node_name}" # 建立 token export BOOTSTRAP_TOKEN=$(kubeadm token create \ --description kubelet-bootstrap-token \ --groups system:bootstrappers:${node_name} \ --kubeconfig ~/.kube/config) # 设置集群参数 kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/cert/ca.pem \ --embed-certs=true \ --server=${KUBE_APISERVER} \ --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig # 设置客户端认证参数 kubectl config set-credentials kubelet-bootstrap \ --token=${BOOTSTRAP_TOKEN} \ --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig # 设置上下文参数 kubectl config set-context default \ --cluster=kubernetes \ --user=kubelet-bootstrap \ --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig # 设置默认上下文 kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig done
查看 kubeadm 为各节点建立的 token:
[root@kube-node1 work]# kubeadm token list --kubeconfig ~/.kube/config TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS 83n69a.70n786zxgkhl1agc 23h 2019-11-08T20:52:48+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube-node1 99ljss.x7u9m04h01js5juo 23h 2019-11-08T20:52:48+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube-node2 9pfh4d.2on6eizmkzy3pgr1 23h 2019-11-08T20:52:48+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube-node3
Token 认证通过后,kube-apiserver 会将请求的 user 设置为 system:bootstrap:<Token ID>,group 设置为 system:bootstrappers,后续将为这个 group 设置 ClusterRoleBinding;
查看各 token 关联的 Secret:
[root@kube-node1 work]# kubectl get secrets -n kube-system|grep bootstrap-token bootstrap-token-83n69a bootstrap.kubernetes.io/token 7 63s bootstrap-token-99ljss bootstrap.kubernetes.io/token 7 62s bootstrap-token-9pfh4d bootstrap.kubernetes.io/token 7 62s
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_name in ${NODE_NAMES[@]} do echo ">>> ${node_name}" scp kubelet-bootstrap-${node_name}.kubeconfig root@${node_name}:/etc/kubernetes/kubelet-bootstrap.kubeconfig done
从 v1.10 开始,部分 kubelet 参数需在配置文件中配置,kubelet --help 会提示:
DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag
建立 kubelet 参数配置文件模板(可配置项参考代码中注释 ):
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kubelet-config.yaml.template <<EOF kind: KubeletConfiguration apiVersion: kubelet.config.k8s.io/v1beta1 address: "##NODE_IP##" staticPodPath: "" syncFrequency: 1m fileCheckFrequency: 20s httpCheckFrequency: 20s staticPodURL: "" port: 10250 readOnlyPort: 0 rotateCertificates: true serverTLSBootstrap: true authentication: anonymous: enabled: false webhook: enabled: true x509: clientCAFile: "/etc/kubernetes/cert/ca.pem" authorization: mode: Webhook registryPullQPS: 0 registryBurst: 20 eventRecordQPS: 0 eventBurst: 20 enableDebuggingHandlers: true enableContentionProfiling: true healthzPort: 10248 healthzBindAddress: "##NODE_IP##" clusterDomain: "${CLUSTER_DNS_DOMAIN}" clusterDNS: - "${CLUSTER_DNS_SVC_IP}" nodeStatusUpdateFrequency: 10s nodeStatusReportFrequency: 1m imageMinimumGCAge: 2m imageGCHighThresholdPercent: 85 imageGCLowThresholdPercent: 80 volumeStatsAggPeriod: 1m kubeletCgroups: "" systemCgroups: "" cgroupRoot: "" cgroupsPerQOS: true cgroupDriver: cgroupfs runtimeRequestTimeout: 10m hairpinMode: promiscuous-bridge maxPods: 220 podCIDR: "${CLUSTER_CIDR}" podPidsLimit: -1 resolvConf: /etc/resolv.conf maxOpenFiles: 1000000 kubeAPIQPS: 1000 kubeAPIBurst: 2000 serializeImagePulls: false evictionHard: memory.available: "100Mi" nodefs.available: "10%" nodefs.inodesFree: "5%" imagefs.available: "15%" evictionSoft: {} enableControllerAttachDetach: true failSwapOn: true containerLogMaxSize: 20Mi containerLogMaxFiles: 10 systemReserved: {} kubeReserved: {} systemReservedCgroup: "" kubeReservedCgroup: "" enforceNodeAllocatable: ["pods"] EOF
为各节点建立和分发 kubelet 配置文件:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" sed -e "s/##NODE_IP##/${node_ip}/" kubelet-config.yaml.template > kubelet-config-${node_ip}.yaml.template scp kubelet-config-${node_ip}.yaml.template root@${node_ip}:/etc/kubernetes/kubelet-config.yaml done
建立 kubelet systemd unit 文件模板:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kubelet.service.template <<EOF [Unit] Description=Kubernetes Kubelet Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=docker.service Requires=docker.service [Service] WorkingDirectory=${K8S_DIR}/kubelet ExecStart=/opt/k8s/bin/kubelet \\ --allow-privileged=true \\ --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \\ --cert-dir=/etc/kubernetes/cert \\ --cni-conf-dir=/etc/cni/net.d \\ --container-runtime=docker \\ --container-runtime-endpoint=unix:///var/run/dockershim.sock \\ --root-dir=${K8S_DIR}/kubelet \\ --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\ --config=/etc/kubernetes/kubelet-config.yaml \\ --hostname-override=##NODE_NAME## \\ --pod-infra-container-image=registry.cn-beijing.aliyuncs.com/images_k8s/pause-amd64:3.1 \\ --image-pull-progress-deadline=15m \\ --volume-plugin-dir=${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/ \\ --logtostderr=true \\ --v=2 Restart=always RestartSec=5 StartLimitInterval=0 [Install] WantedBy=multi-user.target EOF
如果设置了 --hostname-override 选项,则 kube-proxy 也需要设置该选项,否则会出现找不到 Node 的情况;
--bootstrap-kubeconfig:指向 bootstrap kubeconfig 文件,kubelet 使用该文件中的用户名和 token 向 kube-apiserver 发送 TLS Bootstrapping 请求;
K8S approve kubelet 的 CSR 请求后,在 --cert-dir 目录创建证书和私钥文件,然后写入 --kubeconfig 文件;
--pod-infra-container-image 不使用 redhat 的 pod-infrastructure:latest 镜像,它不能回收容器的僵尸进程;
为各节点创建和分发 kubelet systemd unit 文件:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_name in ${NODE_NAMES[@]} do echo ">>> ${node_name}" sed -e "s/##NODE_NAME##/${node_name}/" kubelet.service.template > kubelet-${node_name}.service scp kubelet-${node_name}.service root@${node_name}:/etc/systemd/system/kubelet.service done
kubelet 启动时查找 --kubeconfig 参数对应的文件是否存在,如果不存在则使用 --bootstrap-kubeconfig 指定的 kubeconfig 文件向 kube-apiserver 发送证书签名请求 (CSR)。
kube-apiserver 收到 CSR 请求后,对其中的 Token 进行认证,认证通过后将请求的 user 设置为 system:bootstrap:<Token ID>,group 设置为 system:bootstrappers,这一过程称为 Bootstrap Token Auth。
默认状况下,这个 user 和 group 没有建立 CSR 的权限,kubelet 启动失败,错误日志以下:
$ sudo journalctl -u kubelet -a |grep -A 2 'certificatesigningrequests' May 26 12:13:41 zhangjun-k8s01 kubelet[128468]: I0526 12:13:41.798230 128468 certificate_manager.go:366] Rotating certificates May 26 12:13:41 zhangjun-k8s01 kubelet[128468]: E0526 12:13:41.801997 128468 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot cre ate certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:82jfrm" cannot create resource "certificatesigningrequests" i n API group "certificates.k8s.io" at the cluster scope May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.044828 128468 kubelet.go:2244] node "zhangjun-k8s01" not found May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.078658 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Unauthor ized May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.079873 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Unauthorize d May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.082683 128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Unau thorized May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.084473 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unau thorized May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.088466 128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: U nauthorized
解决办法是:建立一个 clusterrolebinding,将 group system:bootstrappers 和 clusterrole system:node-bootstrapper 绑定:
$ kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers
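创建后可以用下面的命令确认绑定关系(示例):
kubectl describe clusterrolebinding kubelet-bootstrap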
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/" ssh root@${node_ip} "/usr/sbin/swapoff -a" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet" done
$ journalctl -u kubelet |tail 8月 15 12:16:49 zhangjun-k8s01 kubelet[7807]: I0815 12:16:49.578598 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]} 8月 15 12:16:49 zhangjun-k8s01 kubelet[7807]: I0815 12:16:49.578698 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]} 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.205871 7807 mount_linux.go:214] Detected OS with systemd 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.205939 7807 server.go:408] Version: v1.11.2 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206013 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]} 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206101 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]} 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206217 7807 plugins.go:97] No cloud provider specified. 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206237 7807 server.go:524] No cloud provider specified: "" from the config file: "" 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206264 7807 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file 8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.208628 7807 bootstrap.go:86] No valid private key and/or certificate found, reusing existing private key or creating a new one
kubelet 启动后使用 --bootstrap-kubeconfig 向 kube-apiserver 发送 CSR 请求,当这个 CSR 被 approve 后,kube-controller-manager 为 kubelet 创建 TLS 客户端证书、私钥和 --kubeconfig 指定的 kubeconfig 文件。
注意:kube-controller-manager 需要配置 --cluster-signing-cert-file 和 --cluster-signing-key-file 参数,才会为 TLS Bootstrap 创建证书和私钥。
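可以用类似下面的命令快速确认这两个参数已经生效(示例,在任一 master 节点上执行):
ps -ef | grep kube-controller-manager | grep cluster-signing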
[root@kube-node1 work]# kubectl get csr NAME AGE REQUESTOR CONDITION csr-4stvn 67m system:bootstrap:9pfh4d Pending csr-5dc4g 18m system:bootstrap:99ljss Pending csr-5xbbr 18m system:bootstrap:9pfh4d Pending csr-6599v 64m system:bootstrap:83n69a Pending csr-7z2mv 3m34s system:bootstrap:9pfh4d Pending csr-89fmf 3m35s system:bootstrap:99ljss Pending csr-9kqzb 34m system:bootstrap:83n69a Pending csr-c6chv 3m38s system:bootstrap:83n69a Pending csr-cxk4d 49m system:bootstrap:83n69a Pending csr-h7prh 49m system:bootstrap:9pfh4d Pending csr-jh6hp 34m system:bootstrap:9pfh4d Pending csr-jwv9x 64m system:bootstrap:99ljss Pending csr-k8ss7 18m system:bootstrap:83n69a Pending csr-nnwwm 49m system:bootstrap:99ljss Pending csr-q87ps 67m system:bootstrap:99ljss Pending csr-t4bb5 64m system:bootstrap:9pfh4d Pending csr-wpjh5 34m system:bootstrap:99ljss Pending csr-zmrbh 67m system:bootstrap:83n69a Pending [root@kube-node1 work]# kubectl get nodes No resources found.
建立三个 ClusterRoleBinding,分别用于自动 approve client、renew client、renew server 证书:
cd /opt/k8s/work cat > csr-crb.yaml <<EOF # Approve all CSRs for the group "system:bootstrappers" kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: auto-approve-csrs-for-group subjects: - kind: Group name: system:bootstrappers apiGroup: rbac.authorization.k8s.io roleRef: kind: ClusterRole name: system:certificates.k8s.io:certificatesigningrequests:nodeclient apiGroup: rbac.authorization.k8s.io --- # To let a node of the group "system:nodes" renew its own credentials kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: node-client-cert-renewal subjects: - kind: Group name: system:nodes apiGroup: rbac.authorization.k8s.io roleRef: kind: ClusterRole name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient apiGroup: rbac.authorization.k8s.io --- # A ClusterRole which instructs the CSR approver to approve a node requesting a # serving cert matching its client cert. kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: approve-node-server-renewal-csr rules: - apiGroups: ["certificates.k8s.io"] resources: ["certificatesigningrequests/selfnodeserver"] verbs: ["create"] --- # To let a node of the group "system:nodes" renew its own server credentials kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: node-server-cert-renewal subjects: - kind: Group name: system:nodes apiGroup: rbac.authorization.k8s.io roleRef: kind: ClusterRole name: approve-node-server-renewal-csr apiGroup: rbac.authorization.k8s.io EOF kubectl apply -f csr-crb.yaml
等待一段时间(1-10 分钟),三个节点的 CSR 都被自动 approved:
[root@kube-node1 work]# kubectl get csr NAME AGE REQUESTOR CONDITION csr-4stvn 70m system:bootstrap:9pfh4d Pending csr-5dc4g 22m system:bootstrap:99ljss Pending csr-5xbbr 22m system:bootstrap:9pfh4d Pending csr-6599v 67m system:bootstrap:83n69a Pending csr-7z2mv 7m22s system:bootstrap:9pfh4d Approved,Issued csr-89fmf 7m23s system:bootstrap:99ljss Approved,Issued csr-9kqzb 37m system:bootstrap:83n69a Pending csr-c6chv 7m26s system:bootstrap:83n69a Approved,Issued csr-cxk4d 52m system:bootstrap:83n69a Pending csr-h7prh 52m system:bootstrap:9pfh4d Pending csr-jfvv4 30s system:node:kube-node1 Pending csr-jh6hp 37m system:bootstrap:9pfh4d Pending csr-jwv9x 67m system:bootstrap:99ljss Pending csr-k8ss7 22m system:bootstrap:83n69a Pending csr-nnwwm 52m system:bootstrap:99ljss Pending csr-q87ps 70m system:bootstrap:99ljss Pending csr-t4bb5 67m system:bootstrap:9pfh4d Pending csr-w2w2k 16s system:node:kube-node3 Pending csr-wpjh5 37m system:bootstrap:99ljss Pending csr-z5nww 23s system:node:kube-node2 Pending csr-zmrbh 70m system:bootstrap:83n69a Pending
全部节点均 ready:
[root@kube-node1 work]# kubectl get nodes NAME STATUS ROLES AGE VERSION kube-node1 Ready <none> 76s v1.14.2 kube-node2 Ready <none> 69s v1.14.2 kube-node3 Ready <none> 61s v1.14.2
kube-controller-manager 为各 node 生成了 kubeconfig 文件和公私钥:
[root@kube-node1 work]# ls -l /etc/kubernetes/kubelet.kubeconfig -rw------- 1 root root 2310 Nov 7 21:04 /etc/kubernetes/kubelet.kubeconfig [root@kube-node1 work]# ls -l /etc/kubernetes/cert/|grep kubelet -rw------- 1 root root 1277 Nov 7 22:11 kubelet-client-2019-11-07-22-11-52.pem lrwxrwxrwx 1 root root 59 Nov 7 22:11 kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
基于安全性考虑,CSR approving controllers 不会自动 approve kubelet server 证书签名请求,须要手动 approve:
# 以下这个根据实际状况而定 # kubectl get csr NAME AGE REQUESTOR CONDITION csr-5f4vh 9m25s system:bootstrap:82jfrm Approved,Issued csr-5r7j7 6m11s system:node:zhangjun-k8s03 Pending csr-5rw7s 9m23s system:bootstrap:b1f7np Approved,Issued csr-9snww 8m3s system:bootstrap:82jfrm Approved,Issued csr-c7z56 6m12s system:node:zhangjun-k8s02 Pending csr-j55lh 6m12s system:node:zhangjun-k8s01 Pending csr-m29fm 9m25s system:bootstrap:3gzd53 Approved,Issued csr-rc8w7 8m3s system:bootstrap:3gzd53 Approved,Issued csr-vd52r 8m2s system:bootstrap:b1f7np Approved,Issued # kubectl certificate approve csr-5r7j7 certificatesigningrequest.certificates.k8s.io/csr-5r7j7 approved # kubectl certificate approve csr-c7z56 certificatesigningrequest.certificates.k8s.io/csr-c7z56 approved # kubectl certificate approve csr-j55lh certificatesigningrequest.certificates.k8s.io/csr-j55lh approved [root@kube-node1 work]# ls -l /etc/kubernetes/cert/kubelet-* -rw------- 1 root root 1277 Nov 7 22:11 /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem lrwxrwxrwx 1 root root 59 Nov 7 22:11 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem -rw------- 1 root root 1317 Nov 7 22:23 /etc/kubernetes/cert/kubelet-server-2019-11-07-22-23-05.pem lrwxrwxrwx 1 root root 59 Nov 7 22:23 /etc/kubernetes/cert/kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2019-11-07-22-23-05.pem
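如果待 approve 的 server CSR 较多,也可以参考下面的方式批量处理(示例命令;执行前请确认这些 Pending 的 CSR 确实来自集群内的节点):
kubectl get csr | grep "system:node" | grep Pending | awk '{print $1}' | xargs -r kubectl certificate approve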
kubelet 启动后监听多个端口,用于接收 kube-apiserver 或其它客户端发送的请求:
[root@kube-node1 work]# netstat -lnpt|grep kubelet tcp 0 0 127.0.0.1:38735 0.0.0.0:* LISTEN 24609/kubelet tcp 0 0 192.168.75.110:10248 0.0.0.0:* LISTEN 24609/kubelet tcp 0 0 192.168.75.110:10250 0.0.0.0:* LISTEN 24609/kubelet
未配置 --cadvisor-port 参数(默认 4194 端口),因此不支持访问 cAdvisor UI & API。
例如执行 kubectl exec -it nginx-ds-5rmws -- sh 命令时,kube-apiserver 会向 kubelet 发送如下请求:
POST /exec/default/nginx-ds-5rmws/my-nginx?command=sh&input=1&output=1&tty=1
kubelet 接收 10250 端口的 https 请求,能够访问以下资源:
详情参考:https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L434:3
因为关闭了匿名认证,同时开启了 webhook 受权,全部访问 10250 端口 https API 的请求都须要被认证和受权。
预约义的 ClusterRole system:kubelet-api-admin 授予访问 kubelet 全部 API 的权限(kube-apiserver 使用的 kubernetes 证书 User 授予了该权限):
[root@kube-node1 work]# kubectl describe clusterrole system:kubelet-api-admin Name: system:kubelet-api-admin Labels: kubernetes.io/bootstrapping=rbac-defaults Annotations: rbac.authorization.kubernetes.io/autoupdate: true PolicyRule: Resources Non-Resource URLs Resource Names Verbs --------- ----------------- -------------- ----- nodes/log [] [] [*] nodes/metrics [] [] [*] nodes/proxy [] [] [*] nodes/spec [] [] [*] nodes/stats [] [] [*] nodes [] [] [get list watch proxy]
kubelet 配置了以下认证参数:
同时配置了以下受权参数:
kubelet 收到请求后,使用 clientCAFile 对证书签名进行认证,或者查询 bearer token 是否有效。若是二者都没经过,则拒绝请求,提示 Unauthorized:
[root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem https://192.168.75.110:10250/metrics Unauthorized [root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.75.110:10250/metrics Unauthorized
经过认证后,kubelet 使用 SubjectAccessReview API 向 kube-apiserver 发送请求,查询证书或 token 对应的 user、group 是否有操做资源的权限(RBAC);
$ # 权限不足的证书; [root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /etc/kubernetes/cert/kube-controller-manager.pem --key /etc/kubernetes/cert/kube-controller-manager-key.pem https://192.168.75.110:10250/metrics Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics) # 使用部署 kubectl 命令行工具时建立的、具备最高权限的 admin 证书 [root@kube-node1 work]# curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10250/metrics|head # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend. # TYPE apiserver_audit_event_total counter apiserver_audit_event_total 0 # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend. # TYPE apiserver_audit_requests_rejected_total counter apiserver_audit_requests_rejected_total 0 # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request. # TYPE apiserver_client_certificate_expiration_seconds histogram apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0 apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
--cacert、--cert、--key 的参数值必须是文件路径,如上面的 ./admin.pem 不能省略 ./,否则返回 401 Unauthorized;
创建一个 ServiceAccount,将它和 ClusterRole system:kubelet-api-admin 绑定,从而具有调用 kubelet API 的权限:
kubectl create sa kubelet-api-test kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}') TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}') echo ${TOKEN} [root@kube-node1 work]# curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.75.110:10250/metrics|head # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend. # TYPE apiserver_audit_event_total counter apiserver_audit_event_total 0 # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend. # TYPE apiserver_audit_requests_rejected_total counter apiserver_audit_requests_rejected_total 0 # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request. # TYPE apiserver_client_certificate_expiration_seconds histogram apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0 apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
cadvisor 是内嵌在 kubelet 二进制中的,统计所在节点各容器的资源(CPU、内存、磁盘、网卡)使用状况的服务。
浏览器访问 https://192.168.75.110:10250/metrics 和 https://192.168.75.110:10250/metrics/cadvisor 分别返回 kubelet 和 cadvisor 的 metrics(IP 换成各自节点的地址)。
注意:
从 kube-apiserver 获取各节点 kubelet 的配置:
$ # 使用部署 kubectl 命令行工具时建立的、具备最高权限的 admin 证书; [root@kube-node1 work]# source /opt/k8s/bin/environment.sh [root@kube-node1 work]# curl -sSL --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem ${KUBE_APISERVER}/api/v1/nodes/kube-node1/proxy/configz | jq \ > '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' { "syncFrequency": "1m0s", "fileCheckFrequency": "20s", "httpCheckFrequency": "20s", "address": "192.168.75.110", "port": 10250, "rotateCertificates": true, "serverTLSBootstrap": true, "authentication": { "x509": { "clientCAFile": "/etc/kubernetes/cert/ca.pem" }, "webhook": { "enabled": true, "cacheTTL": "2m0s" }, "anonymous": { "enabled": false } }, "authorization": { "mode": "Webhook", "webhook": { "cacheAuthorizedTTL": "5m0s", "cacheUnauthorizedTTL": "30s" } }, "registryPullQPS": 0, "registryBurst": 20, "eventRecordQPS": 0, "eventBurst": 20, "enableDebuggingHandlers": true, "enableContentionProfiling": true, "healthzPort": 10248, "healthzBindAddress": "192.168.75.110", "oomScoreAdj": -999, "clusterDomain": "cluster.local", "clusterDNS": [ "10.254.0.2" ], "streamingConnectionIdleTimeout": "4h0m0s", "nodeStatusUpdateFrequency": "10s", "nodeStatusReportFrequency": "1m0s", "nodeLeaseDurationSeconds": 40, "imageMinimumGCAge": "2m0s", "imageGCHighThresholdPercent": 85, "imageGCLowThresholdPercent": 80, "volumeStatsAggPeriod": "1m0s", "cgroupsPerQOS": true, "cgroupDriver": "cgroupfs", "cpuManagerPolicy": "none", "cpuManagerReconcilePeriod": "10s", "runtimeRequestTimeout": "10m0s", "hairpinMode": "promiscuous-bridge", "maxPods": 220, "podCIDR": "172.30.0.0/16", "podPidsLimit": -1, "resolvConf": "/etc/resolv.conf", "cpuCFSQuota": true, "cpuCFSQuotaPeriod": "100ms", "maxOpenFiles": 1000000, "contentType": "application/vnd.kubernetes.protobuf", "kubeAPIQPS": 1000, "kubeAPIBurst": 2000, "serializeImagePulls": false, "evictionHard": { "memory.available": "100Mi" }, "evictionPressureTransitionPeriod": "5m0s", "enableControllerAttachDetach": true, "makeIPTablesUtilChains": true, "iptablesMasqueradeBit": 14, "iptablesDropBit": 15, "failSwapOn": true, "containerLogMaxSize": "20Mi", "containerLogMaxFiles": 10, "configMapAndSecretChangeDetectionStrategy": "Watch", "enforceNodeAllocatable": [ "pods" ], "kind": "KubeletConfiguration", "apiVersion": "kubelet.config.k8s.io/v1beta1" }
或者参考代码中的注释。
kube-proxy 运行在全部 worker 节点上,它监听 apiserver 中 service 和 endpoint 的变化状况,建立路由规则以提供服务 IP 和负载均衡功能。
本文档讲解使用 ipvs 模式的 kube-proxy 的部署过程。
注意:若是没有特殊指明,本文档的全部操做均在 kube-node1 节点上执行,而后远程分发文件和执行命令。
各节点需要安装 ipvsadm 和 ipset 命令,加载 ip_vs 内核模块。
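可以用下面的命令确认各节点已经加载 ip_vs 相关模块(示例脚本,沿用前文的 environment.sh):
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/usr/sbin/lsmod | grep ip_vs"
  done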
建立证书签名请求:
cd /opt/k8s/work cat > kube-proxy-csr.json <<EOF { "CN": "system:kube-proxy", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ] } EOF
CN 指定该证书的 User 为 system:kube-proxy;
预定义的 RoleBinding system:node-proxier 将 User system:kube-proxy 与 Role system:node-proxier 绑定,该 Role 授予了调用 kube-apiserver Proxy 相关 API 的权限;
该证书只会被 kube-proxy 当做 client 证书使用,因此 hosts 字段为空;
生成证书和私钥:
cd /opt/k8s/work cfssl gencert -ca=/opt/k8s/work/ca.pem \ -ca-key=/opt/k8s/work/ca-key.pem \ -config=/opt/k8s/work/ca-config.json \ -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy ls kube-proxy*
cd /opt/k8s/work source /opt/k8s/bin/environment.sh kubectl config set-cluster kubernetes \ --certificate-authority=/opt/k8s/work/ca.pem \ --embed-certs=true \ --server=${KUBE_APISERVER} \ --kubeconfig=kube-proxy.kubeconfig kubectl config set-credentials kube-proxy \ --client-certificate=kube-proxy.pem \ --client-key=kube-proxy-key.pem \ --embed-certs=true \ --kubeconfig=kube-proxy.kubeconfig kubectl config set-context default \ --cluster=kubernetes \ --user=kube-proxy \ --kubeconfig=kube-proxy.kubeconfig kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
--embed-certs=true:将 ca.pem 和 kube-proxy.pem 证书内容嵌入到生成的 kube-proxy.kubeconfig 文件中(不加时,写入的是证书文件路径);
分发 kubeconfig 文件:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_name in ${NODE_NAMES[@]} do echo ">>> ${node_name}" scp kube-proxy.kubeconfig root@${node_name}:/etc/kubernetes/ done
从 v1.10 开始,kube-proxy 部分参数可以在配置文件中配置。可以使用 --write-config-to 选项生成该配置文件,或者参考源代码中的注释。
建立 kube-proxy config 文件模板:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kube-proxy-config.yaml.template <<EOF kind: KubeProxyConfiguration apiVersion: kubeproxy.config.k8s.io/v1alpha1 clientConnection: burst: 200 kubeconfig: "/etc/kubernetes/kube-proxy.kubeconfig" qps: 100 bindAddress: ##NODE_IP## healthzBindAddress: ##NODE_IP##:10256 metricsBindAddress: ##NODE_IP##:10249 enableProfiling: true clusterCIDR: ${CLUSTER_CIDR} hostnameOverride: ##NODE_NAME## mode: "ipvs" portRange: "" kubeProxyIPTablesConfiguration: masqueradeAll: false kubeProxyIPVSConfiguration: scheduler: rr excludeCIDRs: [] EOF
bindAddress:监听地址;
clientConnection.kubeconfig:连接 apiserver 的 kubeconfig 文件;
clusterCIDR:kube-proxy 根据 --cluster-cidr 判断集群内部和外部流量,指定 --cluster-cidr 或 --masquerade-all 选项后 kube-proxy 才会对访问 Service IP 的请求做 SNAT;
hostnameOverride:参数值必须与 kubelet 的值一致,否则 kube-proxy 启动后会找不到该 Node,从而不会创建任何 ipvs 规则;
mode:使用 ipvs 模式;
为各节点创建和分发 kube-proxy 配置文件:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for (( i=0; i < 3; i++ )) do echo ">>> ${NODE_NAMES[i]}" sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-proxy-config.yaml.template > kube-proxy-config-${NODE_NAMES[i]}.yaml.template scp kube-proxy-config-${NODE_NAMES[i]}.yaml.template root@${NODE_NAMES[i]}:/etc/kubernetes/kube-proxy-config.yaml done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh cat > kube-proxy.service <<EOF [Unit] Description=Kubernetes Kube-Proxy Server Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=network.target [Service] WorkingDirectory=${K8S_DIR}/kube-proxy ExecStart=/opt/k8s/bin/kube-proxy \\ --config=/etc/kubernetes/kube-proxy-config.yaml \\ --logtostderr=true \\ --v=2 Restart=on-failure RestartSec=5 LimitNOFILE=65536 [Install] WantedBy=multi-user.target EOF
分发 kube-proxy systemd unit 文件:
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_name in ${NODE_NAMES[@]} do echo ">>> ${node_name}" scp kube-proxy.service root@${node_name}:/etc/systemd/system/ done
cd /opt/k8s/work source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-proxy" ssh root@${node_ip} "modprobe ip_vs_rr" ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy" done
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "systemctl status kube-proxy|grep Active" done
确保状态为 active (running),否则查看日志,确认原因:
journalctl -u kube-proxy
[root@kube-node1 work]# netstat -lnpt|grep kube-proxy tcp 0 0 192.168.75.110:10249 0.0.0.0:* LISTEN 6648/kube-proxy tcp 0 0 192.168.75.110:10256 0.0.0.0:* LISTEN 6648/kube-proxy
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh root@${node_ip} "/usr/sbin/ipvsadm -ln" done
预期输出:
>>> 192.168.75.110 IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.254.0.1:443 rr -> 192.168.75.110:6443 Masq 1 0 0 -> 192.168.75.111:6443 Masq 1 0 0 -> 192.168.75.112:6443 Masq 1 0 0 >>> 192.168.75.111 IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.254.0.1:443 rr -> 192.168.75.110:6443 Masq 1 0 0 -> 192.168.75.111:6443 Masq 1 0 0 -> 192.168.75.112:6443 Masq 1 0 0 >>> 192.168.75.112 IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.254.0.1:443 rr -> 192.168.75.110:6443 Masq 1 0 0 -> 192.168.75.111:6443 Masq 1 0 0 -> 192.168.75.112:6443 Masq 1 0 0
可见全部经过 https 访问 K8S SVC kubernetes 的请求都转发到 kube-apiserver 节点的 6443 端口;
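也可以在任一节点上直接访问该 Service IP 做个快速验证(示例;由于是自签证书,这里用 -k 跳过证书校验,能收到 kube-apiserver 返回的 401/403 JSON 响应即说明 ipvs 转发已生效,具体状态码取决于 apiserver 的匿名访问配置):
curl -sk https://10.254.0.1/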
本文档使用 daemonset 验证 master 和 worker 节点是否工做正常。
注意:若是没有特殊指明,本文档的全部操做均在 kube-node1 节点上执行,而后远程分发文件和执行命令。
[root@kube-node1 work]# kubectl get nodes NAME STATUS ROLES AGE VERSION kube-node1 Ready <none> 16h v1.14.2 kube-node2 Ready <none> 16h v1.14.2 kube-node3 Ready <none> 16h v1.14.2
都为 Ready 时正常。
cd /opt/k8s/work cat > nginx-ds.yml <<EOF apiVersion: v1 kind: Service metadata: name: nginx-ds labels: app: nginx-ds spec: type: NodePort selector: app: nginx-ds ports: - name: http port: 80 targetPort: 80 --- apiVersion: extensions/v1beta1 kind: DaemonSet metadata: name: nginx-ds labels: addonmanager.kubernetes.io/mode: Reconcile spec: template: metadata: labels: app: nginx-ds spec: containers: - name: my-nginx image: nginx:1.7.9 ports: - containerPort: 80 EOF
[root@kube-node1 work]# kubectl create -f nginx-ds.yml service/nginx-ds created daemonset.extensions/nginx-ds created
在这中间有一个逐步建立并启动的过程
[root@kube-node1 work]# kubectl get pods -o wide|grep nginx-ds nginx-ds-7z464 0/1 ContainerCreating 0 22s <none> kube-node2 <none> <none> nginx-ds-hz5fd 0/1 ContainerCreating 0 22s <none> kube-node1 <none> <none> nginx-ds-skcrt 0/1 ContainerCreating 0 22s <none> kube-node3 <none> <none> [root@kube-node1 work]# kubectl get pods -o wide|grep nginx-ds nginx-ds-7z464 0/1 ContainerCreating 0 34s <none> kube-node2 <none> <none> nginx-ds-hz5fd 0/1 ContainerCreating 0 34s <none> kube-node1 <none> <none> nginx-ds-skcrt 1/1 Running 0 34s 172.30.200.2 kube-node3 <none> <none> [root@kube-node1 work]# kubectl get pods -o wide|grep nginx-ds nginx-ds-7z464 1/1 Running 0 70s 172.30.40.2 kube-node2 <none> <none> nginx-ds-hz5fd 1/1 Running 0 70s 172.30.24.2 kube-node1 <none> <none> nginx-ds-skcrt 1/1 Running 0 70s 172.30.200.2 kube-node3 <none> <none>
可见,nginx-ds 的 Pod IP 分别是 172.30.40.2、172.30.24.2、172.30.200.2,在所有 Node 上分别 ping 这三个 IP,看是否连通:
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh ${node_ip} "ping -c 1 172.30.24.2" ssh ${node_ip} "ping -c 1 172.30.40.2" ssh ${node_ip} "ping -c 1 172.30.200.2" done
[root@kube-node1 work]# kubectl get svc |grep nginx-ds nginx-ds NodePort 10.254.94.213 <none> 80:32039/TCP 3m24s
可见:Service 的 Cluster IP 为 10.254.94.213,服务端口为 80,对应的 NodePort 为 32039。
在全部 Node 上 curl Service IP:
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh ${node_ip} "curl -s 10.254.94.213" done
预期输出 nginx 欢迎页面内容。
在全部 Node 上执行:
source /opt/k8s/bin/environment.sh for node_ip in ${NODE_IPS[@]} do echo ">>> ${node_ip}" ssh ${node_ip} "curl -s ${node_ip}:32039" done
预期输出 nginx 欢迎页面内容。
插件是集群的附件组件,丰富和完善了集群的功能。
注意:
注意:
将下载的 kubernetes-server-linux-amd64.tar.gz 解压后,再解压其中的 kubernetes-src.tar.gz 文件。
cd /opt/k8s/work/kubernetes/ tar -xzvf kubernetes-src.tar.gz
coredns 目录是 cluster/addons/dns
:
cd /opt/k8s/work/kubernetes/cluster/addons/dns/coredns cp coredns.yaml.base coredns.yaml source /opt/k8s/bin/environment.sh sed -i -e "s/__PILLAR__DNS__DOMAIN__/${CLUSTER_DNS_DOMAIN}/" -e "s/__PILLAR__DNS__SERVER__/${CLUSTER_DNS_SVC_IP}/" coredns.yaml ### 注意 ### 在文件coredns.yaml中,拉取的coredns镜像是k8s.gcr.io/coredns:1.3.1,可是网站k8s.gcr.io被墙,没法访问,因此须要使用文档中提供的地址更换镜像下载地址: 地址:http://mirror.azure.cn/help/gcr-proxy-cache.html 文档中须要修改的地方: 将image: k8s.gcr.io/coredns:1.3.1 换成 image: gcr.azk8s.cn/google_containers/coredns:1.3.1 此时才能拉取镜像,避免后面因镜像没法拉取而致使的容器启动错误
kubectl create -f coredns.yaml # 注意 若在上一步中忘记修改镜像地址,形成coredns没法成功运行,可使用以下命令先删除操做,而后修改上述步骤提到的修改镜像地址,而后再建立
[root@kube-node1 coredns]# kubectl get all -n kube-system NAME READY STATUS RESTARTS AGE pod/coredns-58c479c699-blpdq 1/1 Running 0 4m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP,9153/TCP 4m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/coredns 1/1 1 1 4m NAME DESIRED CURRENT READY AGE replicaset.apps/coredns-58c479c699 1 1 1 4m # 注意:pod/coredns状态应该是Running才行,不然后面的步骤都没法验证
新建一个 Deployment
cd /opt/k8s/work cat > my-nginx.yaml <<EOF apiVersion: extensions/v1beta1 kind: Deployment metadata: name: my-nginx spec: replicas: 2 template: metadata: labels: run: my-nginx spec: containers: - name: my-nginx image: nginx:1.7.9 ports: - containerPort: 80 EOF kubectl create -f my-nginx.yaml
export 该 Deployment, 生成 my-nginx
服务:
[root@kube-node1 work]# kubectl expose deploy my-nginx service/my-nginx exposed [root@kube-node1 work]# kubectl get services --all-namespaces |grep my-nginx default my-nginx ClusterIP 10.254.63.243 <none> 80/TCP 11s
创建另外一个 Pod,查看 /etc/resolv.conf 是否包含 kubelet 配置的 --cluster-dns 和 --cluster-domain,是否能够将服务 my-nginx 解析到上面显示的 Cluster IP 10.254.63.243:
cd /opt/k8s/work cat > dnsutils-ds.yml <<EOF apiVersion: v1 kind: Service metadata: name: dnsutils-ds labels: app: dnsutils-ds spec: type: NodePort selector: app: dnsutils-ds ports: - name: http port: 80 targetPort: 80 --- apiVersion: extensions/v1beta1 kind: DaemonSet metadata: name: dnsutils-ds labels: addonmanager.kubernetes.io/mode: Reconcile spec: template: metadata: labels: app: dnsutils-ds spec: containers: - name: my-dnsutils image: tutum/dnsutils:latest command: - sleep - "3600" ports: - containerPort: 80 EOF kubectl create -f dnsutils-ds.yml
[root@kube-node1 work]# kubectl get pods -lapp=dnsutils-ds NAME READY STATUS RESTARTS AGE dnsutils-ds-5krtg 1/1 Running 0 64s dnsutils-ds-cxzlg 1/1 Running 0 64s dnsutils-ds-tln64 1/1 Running 0 64s
[root@kube-node1 work]# kubectl -it exec dnsutils-ds-5krtg bash root@dnsutils-ds-5krtg:/# cat /etc/resolv.conf nameserver 10.254.0.2 search default.svc.cluster.local svc.cluster.local cluster.local mshome.net options ndots:5
注意:若下面这些步骤均无法验证,则很大可能是 coredns 镜像拉取不到,此时可以通过如下命令查看具体原因: kubectl get pod -n kube-system # 查看 coredns kubectl describe pods -n kube-system coredns名称全称 # 查看具体描述信息
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kubernetes Server: 10.254.0.2 Address: 10.254.0.2#53 Name: kubernetes.default.svc.cluster.local Address: 10.254.0.1
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup www.baidu.com Server: 10.254.0.2 Address: 10.254.0.2#53 Non-authoritative answer: Name: www.baidu.com.mshome.net Address: 218.28.144.36
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup my-nginx Server: 10.254.0.2 Address: 10.254.0.2#53 Name: my-nginx.default.svc.cluster.local Address: 10.254.63.243
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster Server: 10.254.0.2 Address: 10.254.0.2#53 Non-authoritative answer: Name: kube-dns.kube-system.svc.cluster.mshome.net Address: 218.28.144.37
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc Server: 10.254.0.2 Address: 10.254.0.2#53 Name: kube-dns.kube-system.svc.cluster.local Address: 10.254.0.2
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster.local Server: 10.254.0.2 Address: 10.254.0.2#53 Name: kube-dns.kube-system.svc.cluster.local Address: 10.254.0.2
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster.local. Server: 10.254.0.2 Address: 10.254.0.2#53 Name: kube-dns.kube-system.svc.cluster.local Address: 10.254.0.2
注意:
将下载的 kubernetes-server-linux-amd64.tar.gz 解压后,再解压其中的 kubernetes-src.tar.gz 文件。
cd /opt/k8s/work/kubernetes/ tar -xzvf kubernetes-src.tar.gz
dashboard 对应的目录是:cluster/addons/dashboard
:
cd /opt/k8s/work/kubernetes/cluster/addons/dashboard
修改 service 定义,指定端口类型为 NodePort,这样外界能够经过地址 NodeIP:NodePort 访问 dashboard;
# cat dashboard-service.yaml apiVersion: v1 kind: Service metadata: name: kubernetes-dashboard namespace: kube-system labels: k8s-app: kubernetes-dashboard kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile spec: type: NodePort # 增长这一行 selector: k8s-app: kubernetes-dashboard ports: - port: 443 targetPort: 8443
# ls *.yaml dashboard-configmap.yaml dashboard-controller.yaml dashboard-rbac.yaml dashboard-secret.yaml dashboard-service.yaml # 注意,须要修改其中镜像地址的文件 dashboard-controller.yaml image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1修改为image: gcr.azk8s.cn/google_containers/kubernetes-dashboard-amd64:v1.10.1 # kubectl apply -f .
[root@kube-node1 dashboard]# kubectl get deployment kubernetes-dashboard -n kube-system NAME READY UP-TO-DATE AVAILABLE AGE kubernetes-dashboard 1/1 1 1 14s [root@kube-node1 dashboard]# kubectl --namespace kube-system get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES coredns-58c479c699-blpdq 1/1 Running 0 30m 172.30.200.4 kube-node3 <none> <none> kubernetes-dashboard-64ffdff795-5rgd2 1/1 Running 0 33s 172.30.24.3 kube-node1 <none> <none> <none> [root@kube-node1 dashboard]# kubectl get services kubernetes-dashboard -n kube-system NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes-dashboard NodePort 10.254.110.235 <none> 443:31673/TCP 47s
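也可以先用 curl 粗略验证 NodePort 是否可达(示例;31673 为上面输出中分配的端口,实际以各自环境为准,-k 用于跳过自签证书校验,预期返回 dashboard 登录页的 HTML):
curl -sk https://192.168.75.110:31673/ | head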
# kubernetes-dashboard-64ffdff795-5rgd2 是pod名称 [root@kube-node1 dashboard]# kubectl exec --namespace kube-system -it kubernetes-dashboard-64ffdff795-5rgd2 -- /dashboard --help 2019/11/08 07:55:04 Starting overwatch Usage of /dashboard: --alsologtostderr log to standard error as well as files --api-log-level string Level of API request logging. Should be one of 'INFO|NONE|DEBUG'. Default: 'INFO'. (default "INFO") --apiserver-host string The address of the Kubernetes Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8080. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and local discovery is attempted. --authentication-mode strings Enables authentication options that will be reflected on login screen. Supported values: token, basic. Default: token.Note that basic option should only be used if apiserver has '--authorization-mode=ABAC' and '--basic-auth-file' flags set. (default [token]) --auto-generate-certificates When set to true, Dashboard will automatically generate certificates used to serve HTTPS. Default: false. --bind-address ip The IP address on which to serve the --secure-port (set to 0.0.0.0 for all interfaces). (default 0.0.0.0) --default-cert-dir string Directory path containing '--tls-cert-file' and '--tls-key-file' files. Used also when auto-generating certificates flag is set. (default "/certs") --disable-settings-authorizer When enabled, Dashboard settings page will not require user to be logged in and authorized to access settings page. --enable-insecure-login When enabled, Dashboard login view will also be shown when Dashboard is not served over HTTPS. Default: false. --enable-skip-login When enabled, the skip button on the login page will be shown. Default: false. --heapster-host string The address of the Heapster Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8082. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and service proxy will be used. --insecure-bind-address ip The IP address on which to serve the --port (set to 0.0.0.0 for all interfaces). (default 127.0.0.1) --insecure-port int The port to listen to for incoming HTTP requests. (default 9090) --kubeconfig string Path to kubeconfig file with authorization and master location information. --log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0) --log_dir string If non-empty, write log files in this directory --logtostderr log to standard error instead of files --metric-client-check-period int Time in seconds that defines how often configured metric client health check should be run. Default: 30 seconds. (default 30) --port int The secure port to listen to for incoming HTTPS requests. (default 8443) --stderrthreshold severity logs at or above this threshold go to stderr (default 2) --system-banner string When non-empty displays message to Dashboard users. Accepts simple HTML tags. Default: ''. --system-banner-severity string Severity of system banner. Should be one of 'INFO|WARNING|ERROR'. Default: 'INFO'. (default "INFO") --tls-cert-file string File containing the default x509 Certificate for HTTPS. --tls-key-file string File containing the default x509 private key matching --tls-cert-file. --token-ttl int Expiration time (in seconds) of JWE tokens generated by dashboard. Default: 15 min. 
0 - never expires (default 900) -v, --v Level log level for V logs --vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging pflag: help requested command terminated with exit code 2
dashboard 的 --authentication-mode 支持 token、basic,默认为 token。如果使用 basic,则 kube-apiserver 必须配置 --authorization-mode=ABAC 和 --basic-auth-file 参数。
从 1.7 开始,dashboard 只容许经过 https 访问,若是使用 kube proxy 则必须监听 localhost 或 127.0.0.1。对于 NodePort 没有这个限制,可是仅建议在开发环境中使用。
对于不知足这些条件的登陆访问,在登陆成功后浏览器不跳转,始终停在登陆界面。
可以通过 https://NodeIP:NodePort 地址访问 dashboard;这一步不操作。
启动代理:
$ kubectl proxy --address='localhost' --port=8086 --accept-hosts='^*$' Starting to serve on 127.0.0.1:8086
需要指定 --accept-hosts 选项,否则浏览器访问 dashboard 页面时提示 “Unauthorized”;
浏览器访问 URL:http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
使用这种方式访问
获取集群服务地址列表:
[root@kube-node1 work]# kubectl cluster-info Kubernetes master is running at https://127.0.0.1:8443 CoreDNS is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy kubernetes-dashboard is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
浏览器访问 URL:https://192.168.75.110:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
dashboard 默认只支持 token 认证(不支持 client 证书认证),因此若是使用 Kubeconfig 文件,须要将 token 写入到该文件。
kubectl create sa dashboard-admin -n kube-system kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin ADMIN_SECRET=$(kubectl get secrets -n kube-system | grep dashboard-admin | awk '{print $1}') DASHBOARD_LOGIN_TOKEN=$(kubectl describe secret -n kube-system ${ADMIN_SECRET} | grep -E '^token' | awk '{print $2}') echo ${DASHBOARD_LOGIN_TOKEN} eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tZnpjbWwiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiY2FmYzk3MDctMDFmZi0xMWVhLThlOTctMDAwYzI5MWQxODIwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.YdK7a1YSUa-Y4boHDM2qLrI5PrimxIUd3EfuCX7GiiDVZ3EvJZQFA4_InGWcbHdZoA8AYyh2pQn-hGhiVz0lU2jLIIIFEF2zHc5su1CSISRciONv6NMrFBlTr6tNFsf6SEeEep9tvGILAFTHXPqSVsIb_lCmHeBdH_CDo4sAyLFATDYqI5Q2jBxnCU7DsD73j3LvLY9WlgpuLwAhOrNHc6USxPvB91-z-4GGbcpGIQPpDQ6OlT3cAP47zFRBIpIc2JwBZ63EmcZJqLxixgPMROqzFvV9mtx68o_GEAccsIELMEMqq9USIXibuFtQT6mV0U3p_wntIhr4OPxe5b7jvQ
使用输出的 token 登陆 Dashboard。
在浏览器登录界面选择使用令牌
source /opt/k8s/bin/environment.sh # 设置集群参数 kubectl config set-cluster kubernetes \ --certificate-authority=/etc/kubernetes/cert/ca.pem \ --embed-certs=true \ --server=${KUBE_APISERVER} \ --kubeconfig=dashboard.kubeconfig # 设置客户端认证参数,使用上面建立的 Token kubectl config set-credentials dashboard_user \ --token=${DASHBOARD_LOGIN_TOKEN} \ # 注意这个参数,若使用shell脚本,有可能获取不到这个值,能够在shell脚本中手动设置这个值 --kubeconfig=dashboard.kubeconfig # 设置上下文参数 kubectl config set-context default \ --cluster=kubernetes \ --user=dashboard_user \ --kubeconfig=dashboard.kubeconfig # 设置默认上下文 kubectl config use-context default --kubeconfig=dashboard.kubeconfig
Because the Heapster add-on is missing, the dashboard currently cannot display CPU and memory statistics or charts for Pods and Nodes.
Note:
metrics-server discovers all nodes through kube-apiserver, then calls the kubelet APIs (over https) to obtain CPU, memory and other resource usage for each Node and Pod.
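For illustration, the kubelet Summary API that metrics-server scrapes can also be queried by hand. This is a sketch, not one of the documented steps; it assumes the admin client certificate generated earlier is available under /opt/k8s/work and that the kubelet listens on its default secure port 10250:

```bash
# Query one node's kubelet Summary API directly to see the raw data metrics-server aggregates.
curl -s --cacert /etc/kubernetes/cert/ca.pem \
     --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem \
     https://192.168.75.110:10250/stats/summary | head -n 40
```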
Starting with Kubernetes 1.12 the installation scripts dropped Heapster, and from 1.13 support for Heapster was removed entirely; Heapster is no longer maintained.
The replacements are as follows:
Kubernetes Dashboard does not yet support metrics-server (PR #3504). If metrics-server replaces Heapster, the dashboard cannot show Pod memory and CPU graphs; a monitoring stack such as Prometheus and Grafana is needed to fill that gap.
Note: unless otherwise stated, all operations in this document are performed on the kube-node1 node.
Clone the source from GitHub:
$ cd /opt/k8s/work/ $ git clone https://github.com/kubernetes-incubator/metrics-server.git $ cd metrics-server/deploy/1.8+/ $ ls aggregated-metrics-reader.yaml auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml
Modify the metrics-server-deployment.yaml file and add the following command-line arguments for metrics-server:
# cat metrics-server-deployment.yaml
34        args:
35        - --cert-dir=/tmp
36        - --secure-port=4443
37        - --metric-resolution=30s                                                                     # added
38        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP    # added
# The image pull address also needs to be changed:
# replace image: k8s.gcr.io/metrics-server-amd64:v0.3.6 with image: gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6
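If you prefer not to edit the file by hand, the image swap can be applied with sed (a sketch; the file name and mirror address are the ones used above):

```bash
cd /opt/k8s/work/metrics-server/deploy/1.8+/
# Replace the default image with the mirror used in this document.
sed -i 's|k8s.gcr.io/metrics-server-amd64:v0.3.6|gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6|' \
    metrics-server-deployment.yaml
grep 'image:' metrics-server-deployment.yaml    # confirm the change
```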
Deploy metrics-server:
# cd /opt/k8s/work/metrics-server/deploy/1.8+/ # kubectl create -f .
[root@kube-node1 1.8+]# kubectl -n kube-system get pods -l k8s-app=metrics-server NAME READY STATUS RESTARTS AGE metrics-server-65879bf98c-ghqbk 1/1 Running 0 38s [root@kube-node1 1.8+]# kubectl get svc -n kube-system metrics-server NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE metrics-server ClusterIP 10.254.244.235 <none> 443/TCP 55s
# docker run -it --rm gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6 --help Launch metrics-server Usage: [flags] Flags: --alsologtostderr log to standard error as well as files --authentication-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to create tokenaccessreviews.authentication.k8s.io. --authentication-skip-lookup If false, the authentication-kubeconfig will be used to lookup missing authentication configuration from the clust er. --authentication-token-webhook-cache-ttl duration The duration to cache responses from the webhook token authenticator. (default 10s) --authentication-tolerate-lookup-failure If true, failures to look up missing authentication configuration from the cluster are not considered fatal. Note that this can result in authentication that treats all requests as anonymous. --authorization-always-allow-paths strings A list of HTTP paths to skip during authorization, i.e. these are authorized without contacting the 'core' kubernetes server. --authorization-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to create subjectaccessreviews.authorization.k8s.io. --authorization-webhook-cache-authorized-ttl duration The duration to cache 'authorized' responses from the webhook authorizer. (default 10s) --authorization-webhook-cache-unauthorized-ttl duration The duration to cache 'unauthorized' responses from the webhook authorizer. (default 10s) --bind-address ip The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces). (default 0.0.0.0) --cert-dir string The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "apiserver.local.config/certificates") --client-ca-file string If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate. --contention-profiling Enable lock contention profiling, if profiling is enabled -h, --help help for this command --http2-max-streams-per-connection int The limit that the server gives to clients for the maximum number of streams in an HTTP/2 connection. Zero means t o use golang's default. --kubeconfig string The path to the kubeconfig used to connect to the Kubernetes API server and the Kubelets (defaults to in-cluster c onfig) --kubelet-certificate-authority string Path to the CA to use to validate the Kubelet's serving certificates. --kubelet-insecure-tls Do not verify CA of serving certificates presented by Kubelets. For testing purposes only. --kubelet-port int The port to use to connect to Kubelets. 
(default 10250) --kubelet-preferred-address-types strings The priority of node address types to use when determining which address to use to connect to a particular node (d efault [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP]) --log-flush-frequency duration Maximum number of seconds between log flushes (default 5s) --log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0) --log_dir string If non-empty, write log files in this directory --log_file string If non-empty, use this log file --logtostderr log to standard error instead of files (default true) --metric-resolution duration The resolution at which metrics-server will retain metrics. (default 1m0s) --profiling Enable profiling via web interface host:port/debug/pprof/ (default true) --requestheader-allowed-names strings List of client certificate common names to allow to provide usernames in headers specified by --requestheader-user name-headers. If empty, any client certificate validated by the authorities in --requestheader-client-ca-file is allowed. --requestheader-client-ca-file string Root certificate bundle to use to verify client certificates on incoming requests before trusting usernames in hea ders specified by --requestheader-username-headers. WARNING: generally do not depend on authorization being already done for incoming requests. --requestheader-extra-headers-prefix strings List of request header prefixes to inspect. X-Remote-Extra- is suggested. (default [x-remote-extra-]) --requestheader-group-headers strings List of request headers to inspect for groups. X-Remote-Group is suggested. (default [x-remote-group]) --requestheader-username-headers strings List of request headers to inspect for usernames. X-Remote-User is common. (default [x-remote-user]) --secure-port int The port on which to serve HTTPS with authentication and authorization.If 0, don't serve HTTPS at all. (default 443) --skip_headers If true, avoid header prefixes in the log messages --stderrthreshold severity logs at or above this threshold go to stderr --tls-cert-file string File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir. --tls-cipher-suites strings Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be use. Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA --tls-min-version string Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12 --tls-private-key-file string File containing the default x509 private key matching --tls-cert-file. 
--tls-sni-cert-key namedCertKey A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default []) -v, --v Level number for the log level verbosity --vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
Access via kube-apiserver or kubectl proxy:
Access with a browser; the results are returned directly:
https://192.168.75.110:6443/apis/metrics.k8s.io/v1beta1/nodes
https://192.168.75.110:6443/apis/metrics.k8s.io/v1beta1/pods
Or query directly with kubectl:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
# kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq . { "kind": "APIResourceList", "apiVersion": "v1", "groupVersion": "metrics.k8s.io/v1beta1", "resources": [ { "name": "nodes", "singularName": "", "namespaced": false, "kind": "NodeMetrics", "verbs": [ "get", "list" ] }, { "name": "pods", "singularName": "", "namespaced": true, "kind": "PodMetrics", "verbs": [ "get", "list" ] } ] } # kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq . { "kind": "NodeMetricsList", "apiVersion": "metrics.k8s.io/v1beta1", "metadata": { "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes" }, "items": [ { "metadata": { "name": "zhangjun-k8s01", "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s01", "creationTimestamp": "2019-05-26T10:55:10Z" }, "timestamp": "2019-05-26T10:54:52Z", "window": "30s", "usage": { "cpu": "311155148n", "memory": "2881016Ki" } }, { "metadata": { "name": "zhangjun-k8s02", "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s02", "creationTimestamp": "2019-05-26T10:55:10Z" }, "timestamp": "2019-05-26T10:54:54Z", "window": "30s", "usage": { "cpu": "253796835n", "memory": "1028836Ki" } }, { "metadata": { "name": "zhangjun-k8s03", "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s03", "creationTimestamp": "2019-05-26T10:55:10Z" }, "timestamp": "2019-05-26T10:54:54Z", "window": "30s", "usage": { "cpu": "280441339n", "memory": "1072772Ki" } } ] }
The kubectl top command obtains basic node metrics from metrics-server:
[root@kube-node1 1.8+]# kubectl top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% kube-node1 125m 3% 833Mi 44% kube-node2 166m 4% 891Mi 47% kube-node3 126m 3% 770Mi 40%
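kubectl top also works at the Pod level once metrics-server is serving data; for example (output will differ per cluster):

```bash
# Per-Pod CPU and memory usage across all namespaces
kubectl top pod --all-namespaces
# Only the kube-system namespace
kubectl top pod -n kube-system
```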
Note:
After unpacking the downloaded kubernetes-server-linux-amd64.tar.gz, also unpack the kubernetes-src.tar.gz file inside it.
cd /opt/k8s/work/kubernetes/
tar -xzvf kubernetes-src.tar.gz
The EFK manifests are in the kubernetes/cluster/addons/fluentd-elasticsearch directory.
# cd /opt/k8s/work/kubernetes/cluster/addons/fluentd-elasticsearch
# vim fluentd-es-ds.yaml
#   change path: /var/lib/docker/containers to path: /data/k8s/docker/data/containers/
#   change image: k8s.gcr.io/fluentd-elasticsearch:v2.4.0 to image: gcr.azk8s.cn/google_containers/fluentd-elasticsearch:v2.4.0
# vim es-statefulset.yaml
#   the container name and image in the upstream manifest are wrong and need to be changed to the following:
      serviceAccountName: elasticsearch-logging
      containers:
      - name: elasticsearch-logging
        #image: gcr.io/fluentd-elasticsearch/elasticsearch:v6.6.1
        image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1   #gcr.azk8s.cn/fluentd-elasticsearch/elasticsearch:v6.6.1
[root@kube-node1 fluentd-elasticsearch]# pwd /opt/k8s/work/kubernetes/cluster/addons/fluentd-elasticsearch [root@kube-node1 fluentd-elasticsearch]# ls *.yaml es-service.yaml es-statefulset.yaml fluentd-es-configmap.yaml fluentd-es-ds.yaml kibana-deployment.yaml kibana-service.yaml # kubectl apply -f .
# The ideal result:
[root@kube-node1 fluentd-elasticsearch]# kubectl get pods -n kube-system -o wide|grep -E 'elasticsearch|fluentd|kibana'
elasticsearch-logging-0          1/1   Running            1   92s   172.30.24.6    kube-node1   <none>   <none>
elasticsearch-logging-1          1/1   Running            1   85s   172.30.40.6    kube-node2   <none>   <none>
fluentd-es-v2.4.0-k72m9          1/1   Running            0   91s   172.30.200.7   kube-node3   <none>   <none>
fluentd-es-v2.4.0-klvbr          1/1   Running            0   91s   172.30.24.7    kube-node1   <none>   <none>
fluentd-es-v2.4.0-pcq8p          1/1   Running            0   91s   172.30.40.5    kube-node2   <none>   <none>
kibana-logging-f4d99b69f-779gm   1/1   Running            0   91s   172.30.200.6   kube-node3   <none>   <none>
# A less ideal result:
# Only one of the two elasticsearch-logging pods is healthy, and only two of the three fluentd-es pods are healthy.
# After a while the failing pods recover and previously healthy ones start failing.
# The preliminary diagnosis is high system load average on the hosts where the failures occur.
[root@kube-node1 fluentd-elasticsearch]# kubectl get pods -n kube-system -o wide|grep -E 'elasticsearch|fluentd|kibana'
elasticsearch-logging-0          1/1   Running            0   16m   172.30.48.3    kube-node2   <none>   <none>
elasticsearch-logging-1          0/1   CrashLoopBackOff   7   15m   172.30.24.6    kube-node1   <none>   <none>
fluentd-es-v2.4.0-lzcl7          1/1   Running            0   16m   172.30.96.3    kube-node3   <none>   <none>
fluentd-es-v2.4.0-mm6gs          0/1   CrashLoopBackOff   5   16m   172.30.48.4    kube-node2   <none>   <none>
fluentd-es-v2.4.0-vx5vj          1/1   Running            0   16m   172.30.24.3    kube-node1   <none>   <none>
kibana-logging-f4d99b69f-6kjlr   1/1   Running            0   16m   172.30.96.5    kube-node3   <none>   <none>
[root@kube-node1 fluentd-elasticsearch]# kubectl get service -n kube-system|grep -E 'elasticsearch|kibana'
elasticsearch-logging   ClusterIP   10.254.202.87   <none>   9200/TCP   116s
kibana-logging          ClusterIP   10.254.185.3    <none>   5601/TCP   114s
On first startup the kibana Pod takes a fairly long time (up to about 20 minutes) to optimize and cache its status pages; you can tail the Pod's logs to watch the progress:
$ kubectl logs kibana-logging-7445dc9757-pvpcv -n kube-system -f {"type":"log","@timestamp":"2019-05-26T11:36:18Z","tags":["info","optimize"],"pid":1,"message":"Optimizing and caching bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page. This may take a few minutes"} {"type":"log","@timestamp":"2019-05-26T11:40:03Z","tags":["info","optimize"],"pid":1,"message":"Optimization of bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page complete in 224.57 seconds"}
Note: the kibana dashboard can only be viewed in a browser after the Kibana pod has finished starting; before that, access is refused.
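Instead of repeatedly reloading the browser, you can wait for the Deployment to report ready (a small sketch; kibana-logging is the Deployment created by kibana-deployment.yaml above):

```bash
# Blocks until all replicas of the kibana Deployment are ready.
kubectl -n kube-system rollout status deployment/kibana-logging
```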
Perform this step.
```bash [root@kube-node1 fluentd-elasticsearch]# kubectl cluster-info|grep -E 'Elasticsearch|Kibana' Elasticsearch is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy Kibana is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/kibana-logging/proxy ```
Browse to the URL: https://192.168.75.111:6443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
If VirtualBox port forwarding is configured: http://127.0.0.1:8080/api/v1/namespaces/kube-system/services/kibana-logging/proxy
Access via kubectl proxy
This step is not performed here.
Create the proxy:
$ kubectl proxy --address='172.27.137.240' --port=8086 --accept-hosts='^*$' Starting to serve on 172.27.129.150:8086
Browse to the URL: http://172.27.137.240:8086/api/v1/namespaces/kube-system/services/kibana-logging/proxy
If VirtualBox port forwarding is configured: http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/kibana-logging/proxy
On the Management -> Indices page, create an index (roughly equivalent to a database in mysql): select "Index contains time-based events", use the default logstash-* pattern, and click Create (this step puts a heavy load on the node where it runs). After the index is created, wait a few minutes and the logs aggregated in Elasticsearch will appear under the Discover menu.
High system load average shows up as the kswapd0 process consuming a lot of CPU; the underlying cause is that the host is short of physical memory.
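To confirm that the symptom really is memory pressure rather than something else, a few standard commands on the affected node are enough (a sketch):

```bash
free -h                        # how much RAM and swap are actually free
vmstat 1 5                     # the si/so columns show swap-in/swap-out activity
top -b -n 1 | grep -i kswapd   # CPU usage of the kswapd threads
```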
Note: this step is not performed; the private registry is deployed with Harbor instead.
Note: this section describes deploying a private registry with the official docker registry v2 image; you can also deploy a Harbor private registry (see the Harbor section below).
This section walks through deploying a private docker registry with TLS, HTTP Basic authentication, and ceph rgw as the backend storage. If you use another type of backend storage, you can start from the "Create the docker registry" step.
The IPs of the two example machines are as follows:
$ ceph-deploy rgw create 172.27.132.66    # rgw listens on port 7480 by default
$ radosgw-admin user create --uid=demo --display-name="ceph rgw demo user" $
The registry currently only supports accessing ceph rgw storage via the swift protocol; the s3 protocol is not yet supported.
$ radosgw-admin subuser create --uid demo --subuser=demo:swift --access=full --secret=secretkey --key-type=swift $
$ radosgw-admin key create --subuser=demo:swift --key-type=swift --gen-secret { "user_id": "demo", "display_name": "ceph rgw demo user", "email": "", "suspended": 0, "max_buckets": 1000, "auid": 0, "subusers": [ { "id": "demo:swift", "permissions": "full-control" } ], "keys": [ { "user": "demo", "access_key": "5Y1B1SIJ2YHKEHO5U36B", "secret_key": "nrIvtPqUj7pUlccLYPuR3ntVzIa50DToIpe7xFjT" } ], "swift_keys": [ { "user": "demo:swift", "secret_key": "ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb" } ], "caps": [], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": [], "bucket_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "user_quota": { "enabled": false, "max_size_kb": -1, "max_objects": -1 }, "temp_url_keys": [] }
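Optionally, the swift credentials just generated can be smoke-tested against rgw before they are wired into the registry. This is a sketch using the python-swiftclient, not one of the original steps; the secret key is the swift_keys value from the output above:

```bash
# Verify that the demo:swift sub-user can authenticate against the rgw swift v1 endpoint.
swift -A http://172.27.132.66:7480/auth/v1 \
      -U demo:swift \
      -K "ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb" \
      stat
```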
That value, ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb, is the secret key of the sub-user demo:swift. Next, create the x509 certificate used by the registry:
$ mkdir -p registry/{auth,certs} $ cat > registry-csr.json <<EOF { "CN": "registry", "hosts": [ "127.0.0.1", "172.27.132.67" ], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "4Paradigm" } ] } EOF $ cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \ -ca-key=/etc/kubernetes/cert/ca-key.pem \ -config=/etc/kubernetes/cert/ca-config.json \ -profile=kubernetes registry-csr.json | cfssljson -bare registry $ cp registry.pem registry-key.pem registry/certs $
Create the HTTP Basic authentication file:
$ docker run --entrypoint htpasswd registry:2 -Bbn foo foo123 > registry/auth/htpasswd $ cat registry/auth/htpasswd foo:$2y$05$iZaM45Jxlcg0DJKXZMggLOibAsHLGybyU.CgU9AHqWcVDyBjiScN.
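Optionally, the generated bcrypt entry can be verified with the same image (a sketch; the -v verify flag requires a reasonably recent htpasswd build):

```bash
# Should report that the password for user foo is correct.
docker run --rm -v $(pwd)/registry/auth:/auth --entrypoint htpasswd registry:2 -vb /auth/htpasswd foo foo123
```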
Configure the registry parameters:
export RGW_AUTH_URL="http://172.27.132.66:7480/auth/v1"
export RGW_USER="demo:swift"
export RGW_SECRET_KEY="ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb"

cat > config.yml << EOF
# https://docs.docker.com/registry/configuration/#list-of-configuration-options
version: 0.1
log:
  level: info
  formatter: text
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  delete:
    enabled: true
  swift:
    authurl: ${RGW_AUTH_URL}
    username: ${RGW_USER}
    password: ${RGW_SECRET_KEY}
    container: registry
auth:
  htpasswd:
    realm: basic-realm
    path: /auth/htpasswd
http:
  addr: 0.0.0.0:8000
  headers:
    X-Content-Type-Options: [nosniff]
  tls:
    certificate: /certs/registry.pem
    key: /certs/registry-key.pem
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
EOF

[k8s@zhangjun-k8s01 cert]$ cp config.yml registry
[k8s@zhangjun-k8s01 cert]$ scp -r registry 172.27.132.67:/opt/k8s
Create the docker registry:
ssh k8s@172.27.132.67 $ docker run -d -p 8000:8000 --privileged \ -v /opt/k8s/registry/auth/:/auth \ -v /opt/k8s/registry/certs:/certs \ -v /opt/k8s/registry/config.yml:/etc/docker/registry/config.yml \ --name registry registry:2
Copy the CA certificate that signed the registry certificate into the /etc/docker/certs.d/172.27.132.67:8000 directory:
[k8s@zhangjun-k8s01 cert]$ sudo mkdir -p /etc/docker/certs.d/172.27.132.67:8000 [k8s@zhangjun-k8s01 cert]$ sudo cp /etc/kubernetes/cert/ca.pem /etc/docker/certs.d/172.27.132.67:8000/ca.crt
Log in to the private registry:
$ docker login 172.27.132.67:8000 Username: foo Password: Login Succeeded
The login credentials are written to the ~/.docker/config.json file:
$ cat ~/.docker/config.json { "auths": { "172.27.132.67:8000": { "auth": "Zm9vOmZvbzEyMw==" } } }
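The auth field is simply base64("username:password"), which is why this file should be protected; decoding it shows the original credentials:

```bash
# Prints foo:foo123 — the HTTP Basic credentials stored in (merely) encoded form.
echo 'Zm9vOmZvbzEyMw==' | base64 -d
```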
Tag the local image with the private registry's address:
$ docker tag prom/node-exporter:v0.16.0 172.27.132.67:8000/prom/node-exporter:v0.16.0
$ docker images | grep node-exporter
prom/node-exporter:v0.16.0                      latest   f9d5de079539   2 years ago   239.8 kB
172.27.132.67:8000/prom/node-exporter:v0.16.0   latest   f9d5de079539   2 years ago   239.8 kB
Push the image to the private registry:
$ docker push 172.27.132.67:8000/prom/node-exporter:v0.16.0 The push refers to a repository [172.27.132.67:8000/prom/node-exporter:v0.16.0] 5f70bf18a086: Pushed e16a89738269: Pushed latest: digest: sha256:9a6b437e896acad3f5a2a8084625fdd4177b2e7124ee943af642259f2f283359 size: 916
Check whether the pushed image's files are now present in ceph:
$ rados lspools rbd cephfs_data cephfs_metadata .rgw.root k8s default.rgw.control default.rgw.meta default.rgw.log default.rgw.buckets.index default.rgw.buckets.data $ rados --pool default.rgw.buckets.data ls|grep node-exporter 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/cdb7590af5f064887f3d6008d46be65e929c74250d747813d85199e04fc70463/link 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/revisions/sha256/55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086/link 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/tags/v0.16.0/current/link 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/tags/v0.16.0/index/sha256/55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086/link 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/224a21997e8ca8514d42eb2ed98b19a7ee2537bce0b3a26b8dff510ab637f15c/link 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/528dda9cf23d0fad80347749d6d06229b9a19903e49b7177d5f4f58736538d4e/link 1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/188af75e2de0203eac7c6e982feff45f9c340eaac4c7a0f59129712524fa2984/link
$ curl --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/_catalog {"repositories":["prom/node-exporter"]}
$ curl --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/tags/list {"name":"prom/node-exporter","tags":["v0.16.0"]}
Send a GET request to v2/<repoName>/manifests/<tagName>: the image digest is taken from the Docker-Content-Digest response header, and the layer digests from the fsLayers.blobSum fields (layers[].digest in manifest schema v2) of the response body. Note that the request must include the header Accept: application/vnd.docker.distribution.manifest.v2+json:
$ curl -v -H "Accept: application/vnd.docker.distribution.manifest.v2+json" --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/manifests/v0.16.0 * About to connect() to 172.27.132.67 port 8000 (#0) * Trying 172.27.132.67... * Connected to 172.27.132.67 (172.27.132.67) port 8000 (#0) * Initializing NSS with certpath: sql:/etc/pki/nssdb * CAfile: /etc/docker/certs.d/172.27.132.67:8000/ca.crt CApath: none * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: * subject: CN=registry,OU=4Paradigm,O=k8s,L=BeiJing,ST=BeiJing,C=CN * start date: Jul 05 12:52:00 2018 GMT * expire date: Jul 02 12:52:00 2028 GMT * common name: registry * issuer: CN=kubernetes,OU=4Paradigm,O=k8s,L=BeiJing,ST=BeiJing,C=CN * Server auth using Basic with user 'foo' > GET /v2/prom/node-exporter/manifests/v0.16.0 HTTP/1.1 > Authorization: Basic Zm9vOmZvbzEyMw== > User-Agent: curl/7.29.0 > Host: 172.27.132.67:8000 > Accept: application/vnd.docker.distribution.manifest.v2+json > < HTTP/1.1 200 OK < Content-Length: 949 < Content-Type: application/vnd.docker.distribution.manifest.v2+json < Docker-Content-Digest: sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086 < Docker-Distribution-Api-Version: registry/2.0 < Etag: "sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086" < X-Content-Type-Options: nosniff < Date: Fri, 06 Jul 2018 06:18:41 GMT < { "schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "config": { "mediaType": "application/vnd.docker.container.image.v1+json", "size": 3511, "digest": "sha256:188af75e2de0203eac7c6e982feff45f9c340eaac4c7a0f59129712524fa2984" }, "layers": [ { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 2392417, "digest": "sha256:224a21997e8ca8514d42eb2ed98b19a7ee2537bce0b3a26b8dff510ab637f15c" }, { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 560703, "digest": "sha256:cdb7590af5f064887f3d6008d46be65e929c74250d747813d85199e04fc70463" }, { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 5332460, "digest": "sha256:528dda9cf23d0fad80347749d6d06229b9a19903e49b7177d5f4f58736538d4e" } ]
Send a DELETE request to /v2/<name>/manifests/<reference>, where reference is the Docker-Content-Digest value returned in the previous step:
$ curl -X DELETE --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/manifests/sha256:68effe31a4ae8312e47f54bec52d1fc925908009ce7e6f734e1b54a4169081c5 $
Send a DELETE request to /v2/<name>/blobs/<digest>, where digest is a layer digest (fsLayers.blobSum / layers[].digest) returned in the previous step:
$ curl -X DELETE --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 $ curl -X DELETE --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/blobs/sha256:04176c8b224aa0eb9942af765f66dae866f436e75acef028fe44b8a98e045515 $
Running the s3test.py program from http://docs.ceph.com/docs/master/install/install-ceph-gateway/ fails:
[k8s@zhangjun-k8s01 cert]$ python s3test.py Traceback (most recent call last): File "s3test.py", line 12, in bucket = conn.create_bucket('my-new-bucket') File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 625, in create_bucket response.status, response.reason, body) boto.exception.S3ResponseError: S3ResponseError: 416 Requested Range Not Satisfiable
Solution:
For anyone hitting this issue: set the default pg_num and pgp_num to a lower value (8, for example), or set mon_max_pg_per_osd to a higher value in ceph.conf. radosgw-admin doesn't throw a proper error when internal pool creation fails, hence the confusing upper-level error.
https://tracker.ceph.com/issues/21497
[root@zhangjun-k8s01 ~]# docker login 172.27.132.67:8000 Username: foo Password: Error response from daemon: login attempt to https://172.27.132.67:8000/v2/ failed with status: 503 Service Unavailable
Cause: the docker run command was missing the --privileged flag.
This section describes deploying the harbor private registry with docker-compose; you can also deploy a private registry with the official docker registry image (see the Docker Registry section above).
The variables used in this section are defined as follows:
# This environment variable is used later; set it to the IP of the node on which harbor is being deployed.
export NODE_IP=10.64.3.7 # IP of the node where harbor is deployed
Download the latest docker-compose binary from the docker compose releases page:
cd /opt/k8s/work
wget https://github.com/docker/compose/releases/download/1.21.2/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 /opt/k8s/bin/docker-compose
chmod a+x /opt/k8s/bin/docker-compose
export PATH=/opt/k8s/bin:$PATH
Download the latest harbor offline installer from the harbor releases page:
cd /opt/k8s/work
wget --continue https://storage.googleapis.com/harbor-releases/release-1.5.0/harbor-offline-installer-v1.5.1.tgz
tar -xzvf harbor-offline-installer-v1.5.1.tgz
Load the harbor docker images from the offline installer:
cd harbor docker load -i harbor.v1.5.1.tar.gz
Create the harbor certificate signing request:
cd /opt/k8s/work
# If NODE_IP was not exported earlier, replace "${NODE_IP}" below with the node IP directly.
cat > harbor-csr.json <<EOF
{
  "CN": "harbor",
  "hosts": [
    "127.0.0.1",
    "${NODE_IP}"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}
EOF
Generate the harbor certificate and private key:
cd /opt/k8s/work cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \ -ca-key=/etc/kubernetes/cert/ca-key.pem \ -config=/etc/kubernetes/cert/ca-config.json \ -profile=kubernetes harbor-csr.json | cfssljson -bare harbor ls harbor* harbor.csr harbor-csr.json harbor-key.pem harbor.pem mkdir -p /etc/harbor/ssl cp harbor*.pem /etc/harbor/ssl
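The freshly issued certificate can be sanity-checked before harbor is configured to use it (a sketch using standard openssl commands and the cluster CA from the earlier sections):

```bash
# Show the subject and validity period of the harbor certificate.
openssl x509 -noout -subject -dates -in /etc/harbor/ssl/harbor.pem
# Confirm that it chains to the cluster CA.
openssl verify -CAfile /etc/kubernetes/cert/ca.pem /etc/harbor/ssl/harbor.pem
```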
cd /opt/k8s/work/harbor
cp harbor.cfg{,.bak}    # back up the configuration file
vim harbor.cfg
  hostname = 172.27.129.81
  ui_url_protocol = https
  ssl_cert = /etc/harbor/ssl/harbor.pem
  ssl_cert_key = /etc/harbor/ssl/harbor-key.pem
cp prepare{,.bak}
vim prepare
  # change empty_subj = "/C=/ST=/L=/O=/CN=/" to empty_subj = "/"
The empty_subj parameter in the prepare script must be modified, otherwise the subsequent install step exits with an error:
Fail to generate key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
Reference: https://github.com/vmware/harbor/issues/2920
cd /opt/k8s/work/harbor
mkdir -p /data    # used to store logs and data; consider changing this to another path later
./install.sh

[Step 0]: checking installation environment ...
Note: docker version: 18.03.0
Note: docker-compose version: 1.21.2

[Step 1]: loading Harbor images ...
Loaded image: vmware/clair-photon:v2.0.1-v1.5.1
Loaded image: vmware/postgresql-photon:v1.5.1
Loaded image: vmware/harbor-adminserver:v1.5.1
Loaded image: vmware/registry-photon:v2.6.2-v1.5.1
Loaded image: vmware/photon:1.0
Loaded image: vmware/harbor-migrator:v1.5.1
Loaded image: vmware/harbor-ui:v1.5.1
Loaded image: vmware/redis-photon:v1.5.1
Loaded image: vmware/nginx-photon:v1.5.1
Loaded image: vmware/mariadb-photon:v1.5.1
Loaded image: vmware/notary-signer-photon:v0.5.1-v1.5.1
Loaded image: vmware/harbor-log:v1.5.1
Loaded image: vmware/harbor-db:v1.5.1
Loaded image: vmware/harbor-jobservice:v1.5.1
Loaded image: vmware/notary-server-photon:v0.5.1-v1.5.1

[Step 2]: preparing environment ...
loaded secret from file: /data/secretkey
Generated configuration file: ./common/config/nginx/nginx.conf
Generated configuration file: ./common/config/adminserver/env
Generated configuration file: ./common/config/ui/env
Generated configuration file: ./common/config/registry/config.yml
Generated configuration file: ./common/config/db/env
Generated configuration file: ./common/config/jobservice/env
Generated configuration file: ./common/config/jobservice/config.yml
Generated configuration file: ./common/config/log/logrotate.conf
Generated configuration file: ./common/config/jobservice/config.yml
Generated configuration file: ./common/config/ui/app.conf
Generated certificate, key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
The configuration files are ready, please use docker-compose to start the service.

[Step 3]: checking existing instance of Harbor ...

[Step 4]: starting Harbor ...
Creating network "harbor_harbor" with the default driver
Creating harbor-log ... done
Creating redis ... done
Creating harbor-adminserver ... done
Creating harbor-db ... done
Creating registry ... done
Creating harbor-ui ... done
Creating harbor-jobservice ... done
Creating nginx ... done

✔ ----Harbor has been installed and started successfully.----
Now you should be able to visit the admin portal at https://192.168.75.110.
For more details, please visit https://github.com/vmware/harbor .
Confirm that all components are working properly:
[root@kube-node1 harbor]# docker-compose ps Name Command State Ports ------------------------------------------------------------------------------------------------------------------------------------- harbor-adminserver /harbor/start.sh Up (healthy) harbor-db /usr/local/bin/docker-entr ... Up (healthy) 3306/tcp harbor-jobservice /harbor/start.sh Up harbor-log /bin/sh -c /usr/local/bin/ ... Up (healthy) 127.0.0.1:1514->10514/tcp harbor-ui /harbor/start.sh Up (healthy) nginx nginx -g daemon off; Up (healthy) 0.0.0.0:443->443/tcp, 0.0.0.0:4443->4443/tcp, 0.0.0.0:80->80/tcp redis docker-entrypoint.sh redis ... Up 6379/tcp registry /entrypoint.sh serve /etc/ ... Up (healthy) 5000/tcp
Browse to https://192.168.75.110;
Log in with the account admin and the default password Harbor12345 from the harbor.cfg configuration file.
harbor writes its logs to directories under /var/log/harbor; docker logs XXX or docker-compose logs XXX will not show the containers' logs.
# Log directory
ls /var/log/harbor
adminserver.log  jobservice.log  mysql.log  proxy.log  registry.log  ui.log
# Data directory, including the database and the image registry
ls /data/
ca_download  config  database  job_logs  registry  secretkey
# Change the path of "secretkey"
vim harbor.cfg
#The path of secretkey storage
secretkey_path = /data/harbor-data    # the default is /data
# Change every volume mount path that previously defaulted to "/data"
vim docker-compose.yml
# After the changes above, redeploy the containers:
./prepare
docker-compose up -d
# Note: do not manually modify the contents of the mounted paths above during deployment.
# If anything there needs to be changed, make sure the containers are fully removed first (docker-compose down).
Copy the CA certificate that signed the harbor certificate to the client's designated directory. Suppose the Harbor registry is deployed on the host 192.168.75.110 and the host 192.168.75.111 wants to log in to that registry remotely.
# On 192.168.75.111, create the directory that holds the registry's CA. Note the last path component:
# if the registry address is an IP use the IP, if it is a domain name use the domain name.
mkdir -p /etc/docker/certs.d/192.168.75.110
# On 192.168.75.110, copy the CA certificate into the directory created above on the client, renaming it ca.crt.
scp /etc/kubernetes/cert/ca.pem root@192.168.75.111:/etc/docker/certs.d/192.168.75.110/ca.crt
Log in to harbor:
# docker login https://192.168.75.110
Username: admin
Password: Harbor12345    # the default password
The credentials are automatically saved to the ~/.docker/config.json file.
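After logging in from the client, images can be pushed into harbor. A sketch, assuming a local busybox:latest image and harbor's default library project:

```bash
# Tag a local image into the default "library" project and push it over the trusted TLS endpoint.
docker tag busybox:latest 192.168.75.110/library/busybox:latest
docker push 192.168.75.110/library/busybox:latest
```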
The working directory for the following operations is the harbor directory created by unpacking the offline installer.
# These steps are needed when changing the image storage path, the log path, etc.; see the section above on changing the default data directory.
# Stop harbor
docker-compose down -v
# Modify the configuration
vim harbor.cfg
# Propagate the modified configuration to the docker-compose.yml file
./prepare
Clearing the configuration file: ./common/config/ui/app.conf
Clearing the configuration file: ./common/config/ui/env
Clearing the configuration file: ./common/config/ui/private_key.pem
Clearing the configuration file: ./common/config/db/env
Clearing the configuration file: ./common/config/registry/root.crt
Clearing the configuration file: ./common/config/registry/config.yml
Clearing the configuration file: ./common/config/jobservice/app.conf
Clearing the configuration file: ./common/config/jobservice/env
Clearing the configuration file: ./common/config/nginx/cert/admin.pem
Clearing the configuration file: ./common/config/nginx/cert/admin-key.pem
Clearing the configuration file: ./common/config/nginx/nginx.conf
Clearing the configuration file: ./common/config/adminserver/env
loaded secret from file: /data/secretkey
Generated configuration file: ./common/config/nginx/nginx.conf
Generated configuration file: ./common/config/adminserver/env
Generated configuration file: ./common/config/ui/env
Generated configuration file: ./common/config/registry/config.yml
Generated configuration file: ./common/config/db/env
Generated configuration file: ./common/config/jobservice/env
Generated configuration file: ./common/config/jobservice/app.conf
Generated configuration file: ./common/config/ui/app.conf
Generated certificate, key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
The configuration files are ready, please use docker-compose to start the service.
chmod -R 666 common    ## make sure the container processes can read the generated configuration
# Start harbor
docker-compose up -d
Stop the related processes:
systemctl stop kubelet kube-proxy flanneld docker kube-nginx
Clean up files:
source /opt/k8s/bin/environment.sh
# umount the directories mounted by kubelet and docker
mount | grep "${K8S_DIR}" | awk '{print $3}'|xargs sudo umount
# Delete the kubelet working directory
rm -rf ${K8S_DIR}/kubelet
# Delete the docker working directory
rm -rf ${DOCKER_DIR}
# Delete the network configuration files written by flanneld
rm -rf /var/run/flannel/
# Delete docker runtime files
rm -rf /var/run/docker/
# Delete systemd unit files
rm -rf /etc/systemd/system/{kubelet,docker,flanneld,kube-nginx}.service
# Delete program files
rm -rf /opt/k8s/bin/*
# Delete certificate files
rm -rf /etc/flanneld/cert /etc/kubernetes/cert
Clean up the iptables rules created by kube-proxy and docker:
iptables -F && sudo iptables -X && sudo iptables -F -t nat && sudo iptables -X -t nat
Delete the bridges created by flanneld and docker:
ip link del flannel.1
ip link del docker0
Stop the related processes:
systemctl stop kube-apiserver kube-controller-manager kube-scheduler kube-nginx
Clean up files:
# Delete systemd unit files
rm -rf /etc/systemd/system/{kube-apiserver,kube-controller-manager,kube-scheduler,kube-nginx}.service
# Delete program files
rm -rf /opt/k8s/bin/{kube-apiserver,kube-controller-manager,kube-scheduler}
# Delete certificate files
rm -rf /etc/flanneld/cert /etc/kubernetes/cert
Stop the related process:
systemctl stop etcd
Clean up files:
source /opt/k8s/bin/environment.sh
# Delete etcd's working directory and data directory
rm -rf ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}
# Delete the systemd unit file
rm -rf /etc/systemd/system/etcd.service
# Delete the program file
rm -rf /opt/k8s/bin/etcd
# Delete the x509 certificate files
rm -rf /etc/etcd/cert/*
When a browser accesses kube-apiserver's secure port 6443, it warns that the certificate is not trusted:
This is because the kube-apiserver server certificate is signed by the root certificate ca.pem that we created, so ca.pem must be imported into the operating system and marked as permanently trusted.
For macOS, proceed as follows:
For Windows, import ca.pem with the following command:
keytool -import -v -trustcacerts -alias appmanagement -file "PATH...\\ca.pem" -storepass password -keystore cacerts
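For a CentOS client (not covered by the original Mac/Windows steps), a sketch of adding the CA to the system trust store:

```bash
# Copy the cluster CA into the system anchors and rebuild the trust store.
cp /etc/kubernetes/cert/ca.pem /etc/pki/ca-trust/source/anchors/k8s-ca.crt
update-ca-trust extract
```

Note that some browsers (Firefox, for example) keep their own certificate store and need the CA imported separately.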
Accessing the apiserver address again, the certificate is now trusted, but a 401 Unauthorized error is returned:
Note: start performing the operations from this point.
We need to generate a client certificate for the browser, to be used when accessing the apiserver's 6443 https port.
Here we use the admin certificate and private key created when deploying the kubectl command-line tool, plus the ca certificate above, to create a PKCS#12/PFX format certificate that the browser can use:
$ openssl pkcs12 -export -out admin.pfx -inkey admin-key.pem -in admin.pem -certfile ca.pem
# Leave the password empty at each prompt and just press Enter.
# On Windows, import the generated certificate directly in the browser's settings.
Import the created admin.pfx into the system certificates. For macOS, proceed as follows:
Restart the browser and access the apiserver address again; when prompted to choose a browser certificate, select the admin.pfx imported above:
This time, access to kube-apiserver's secure port is authorized:
Take verifying the kubernetes certificate (generated later when deploying the master nodes) as an example:
$ openssl x509 -noout -text -in kubernetes.pem ... Signature Algorithm: sha256WithRSAEncryption Issuer: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=Kubernetes Validity Not Before: Apr 5 05:36:00 2017 GMT Not After : Apr 5 05:36:00 2018 GMT Subject: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=kubernetes ... X509v3 extensions: X509v3 Key Usage: critical Digital Signature, Key Encipherment X509v3 Extended Key Usage: TLS Web Server Authentication, TLS Web Client Authentication X509v3 Basic Constraints: critical CA:FALSE X509v3 Subject Key Identifier: DD:52:04:43:10:13:A9:29:24:17:3A:0E:D7:14:DB:36:F8:6C:E0:E0 X509v3 Authority Key Identifier: keyid:44:04:3B:60:BD:69:78:14:68:AF:A0:41:13:F6:17:07:13:63:58:CD X509v3 Subject Alternative Name: DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:10.64.3.7, IP Address:10.254.0.1 ...
$ cfssl-certinfo -cert kubernetes.pem ... { "subject": { "common_name": "kubernetes", "country": "CN", "organization": "k8s", "organizational_unit": "System", "locality": "BeiJing", "province": "BeiJing", "names": [ "CN", "BeiJing", "BeiJing", "k8s", "System", "kubernetes" ] }, "issuer": { "common_name": "Kubernetes", "country": "CN", "organization": "k8s", "organizational_unit": "System", "locality": "BeiJing", "province": "BeiJing", "names": [ "CN", "BeiJing", "BeiJing", "k8s", "System", "Kubernetes" ] }, "serial_number": "174360492872423263473151971632292895707129022309", "sans": [ "kubernetes", "kubernetes.default", "kubernetes.default.svc", "kubernetes.default.svc.cluster", "kubernetes.default.svc.cluster.local", "127.0.0.1", "10.64.3.7", "10.64.3.8", "10.66.3.86", "10.254.0.1" ], "not_before": "2017-04-05T05:36:00Z", "not_after": "2018-04-05T05:36:00Z", "sigalg": "SHA256WithRSA", ...