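Problem description: the whole office building lost power on Friday, and when we came back on Monday many pods were stuck in the Terminating state. Investigation showed that some nodes in the test-environment Kubernetes cluster are desktop PCs, which have to be powered on by hand after an outage. Once they were booted the nodes returned to normal, but following the kubelet log with journalctl -fu kubelet showed the error below repeating continuously: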
[root@k8s-node4 pods]# journalctl -fu kubelet
-- Logs begin at 二 2019-05-21 08:52:08 CST. --
5月 21 14:48:48 k8s-node4 kubelet[2493]: E0521 14:48:48.748460 2493 kubelet_volumes.go:140] Orphaned pod "d29f26dc-77bb-11e9-971b-0050568417a2" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
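When several pods are affected, it can help to pull the pod IDs out of the log instead of reading them by eye. The following is a minimal sketch, assuming the message format shown above; the function name orphaned_pod_uids is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read kubelet log lines on stdin and print the unique
# pod IDs that appear in 'Orphaned pod "<id>"' messages.
orphaned_pod_uids() {
  grep -o 'Orphaned pod "[^"]*"' | cut -d'"' -f2 | sort -u
}

# Example use on a node (reads the journal once instead of following it):
#   journalctl -u kubelet --no-pager | orphaned_pod_uids
```

Note that the message says "There were a total of 1 errors similar to this", so the kubelet may only print one ID at a time at the default verbosity; the list can be incomplete.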
We cd into the /var/lib/kubelet/pods directory and list it with ls:
[root@k8s-node4 pods]# ls
36e224e2-7b73-11e9-99bc-0050568417a2  42e8cd65-76b1-11e9-971b-0050568417a2  42eaca2d-76b1-11e9-971b-0050568417a2
36e30462-7b73-11e9-99bc-0050568417a2  42e94e29-76b1-11e9-971b-0050568417a2  d29f26dc-77bb-11e9-971b-0050568417a2
The pod ID from the error message is among these directories. We cd into it (d29f26dc-77bb-11e9-971b-0050568417a2) and find the following entries:
[root@k8s-node4 d29f26dc-77bb-11e9-971b-0050568417a2]# ls
containers  etc-hosts  plugins  volumes
We look at the etc-hosts file:
[root@k8s-node4 d29f26dc-77bb-11e9-971b-0050568417a2]# cat etc-hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.7.7 sagent-b4dd8b5b9-zq649
The last line gives the pod's IP and name. On the master node we run kubectl get pod|grep sagent-b4dd8b5b9-zq649 and find that this pod no longer exists.
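The lookup above (read the pod name from etc-hosts, then search for it on the master) can be sketched as a small helper. This assumes the pod's own entry is the last non-comment line of the file, as in the output shown, and that the hostname there equals the pod name, which may not hold for pods that set a custom hostname. The function name pod_name_from_etc_hosts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the hostname from the last non-comment entry of a
# kubelet-managed etc-hosts file (assumed here to be the pod's own entry).
pod_name_from_etc_hosts() {
  awk '!/^#/ && NF >= 2 { name = $2 } END { if (name != "") print name }' "$1"
}

# Example use: get the name on the node, then search for it on the master.
# --all-namespaces is added here on the assumption that the pod may not have
# been in the default namespace.
#   pod_name_from_etc_hosts /var/lib/kubelet/pods/<pod-id>/etc-hosts
#   kubectl get pod --all-namespaces | grep <pod-name>
```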
This problem has an open discussion, and someone has submitted a PR to fix it; as of this writing the PR is still unmerged.
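The current workaround is to go to the /var/lib/kubelet/pods directory on the affected node and delete the directory named after the pod ID in the error (rm -rf name), then delete this node from the cluster on the master node (kubectl delete node), and then run the following on the affected node: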
kubeadm reset
systemctl stop kubelet
systemctl stop docker
systemctl start docker
systemctl start kubelet
Once these commands have finished, rejoin this node to the cluster.
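Because the first step of the workaround is an rm -rf under /var/lib/kubelet/pods, a mistyped name could remove more than intended. The sketch below guards that one step by refusing anything that does not look like a pod ID in the format seen above. It assumes nothing under the directory is still mounted, which is worth checking first; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if $1 looks like a pod ID such as
# d29f26dc-77bb-11e9-971b-0050568417a2 (lowercase hex, 8-4-4-4-12).
is_pod_uid() {
  case $1 in
    ''|*[!0-9a-f-]*) return 1 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'
}

# Hypothetical helper: remove <pods-dir>/<pod-id>, refusing any name that is
# not a pod ID or that does not exist as a directory.
remove_orphaned_pod_dir() {
  if ! is_pod_uid "$2"; then
    echo "refusing: '$2' does not look like a pod ID" >&2
    return 1
  fi
  if [ ! -d "$1/$2" ]; then
    echo "no such directory: $1/$2" >&2
    return 1
  fi
  rm -rf -- "${1:?}/$2"
}

# Example use on the affected node:
#   remove_orphaned_pod_dir /var/lib/kubelet/pods d29f26dc-77bb-11e9-971b-0050568417a2
```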