After my Ceph cluster had been running for a while, it reported the following warning:

    # ceph -s
        cluster c6e7e7d9-2b91-4550-80b0-6fa46d0644f6
         health HEALTH_WARN
                clock skew detected on mon.c
                896 pgs stuck inactive
                896 pgs stuck unclean
                noscrub flag(s) set
                Monitor clock skew detected
         monmap e1: 5 mons at {a=101.71.4.11:6789/0,b=101.71.4.12:6789/0,c=101.71.4.13:6789/0,d=101.71.4.14:6789/0,e=101.71.4.15:6789/0}
                election epoch 28, quorum 0,1,2,3,4 a,b,c,d,e
         osdmap e1616: 240 osds: 216 up, 216 in
                flags noscrub
          pgmap v16891: 4992 pgs, 18 pools, 1093 GB data, 38340 objects
                5446 GB used, 361 TB / 386 TB avail
                    4096 active+clean
                     896 creating

The key message is "clock skew detected on mon.c".
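When there are several monitors, it can help to pull the affected monitor names out of the status text instead of reading it by eye. The following is a minimal sketch, not part of Ceph: the function name skewed_mons is hypothetical, and it assumes the warning line has the form "clock skew detected on mon.X" (optionally followed by ", mon.Y"), as in the output above.

```shell
# Hypothetical helper: read `ceph -s` (or similar) text on stdin and print
# the names of monitors listed on "clock skew detected on ..." lines.
skewed_mons() {
    awk '/clock skew detected on/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^mon\./) {
                name = $i
                sub(/^mon\./, "", name)   # drop the "mon." prefix
                sub(/,$/, "", name)       # drop a trailing comma, if any
                print name
            }
        }
    }' | sort -u
}

# Usage (requires a working ceph CLI):
#   ceph -s | skewed_mons
```

For the output shown above, this would print the single name c.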
I resolved the problem as follows:

1. On every MON machine, stop the ntpd service:

    service ntpd stop

2. Run ntpdate to synchronize the time:

    [root@gnop029-ct-zhejiang_wenzhou-16-14 ~]# ntpdate us.pool.ntp.org
     5 Dec 16:27:20 ntpdate[30359]: adjust time server 209.118.204.201 offset 0.000712 sec

3. Start the ntpd service again:

    service ntpd start

4. Run ceph -s again. The cluster no longer reports the clock problem:

    [root@gnop029-ct-zhejiang_wenzhou-16-14 ~]# ceph -s
        cluster c6e7e7d9-2b91-4550-80b0-6fa46d0644f6
         health HEALTH_WARN
                896 pgs stuck inactive
                896 pgs stuck unclean
                noscrub flag(s) set
         monmap e1: 5 mons at {a=101.71.4.11:6789/0,b=101.71.4.12:6789/0,c=101.71.4.13:6789/0,d=101.71.4.14:6789/0,e=101.71.4.15:6789/0}
                election epoch 28, quorum 0,1,2,3,4 a,b,c,d,e
         osdmap e1616: 240 osds: 216 up, 216 in
                flags noscrub
          pgmap v16891: 4992 pgs, 18 pools, 1093 GB data, 38340 objects
                5446 GB used, 361 TB / 386 TB avail
                    4096 active+clean
                     896 creating
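To judge whether a host is likely to trip the warning again, the offset printed by ntpdate can be compared against the monitors' tolerance. The sketch below assumes the tolerance is 0.05 seconds, which as far as I know is the default of the "mon clock drift allowed" option; check your own configuration. The function names are hypothetical, and the parser assumes a summary line ending in "offset N sec", like the one in step 2.

```shell
# Hypothetical helper: extract the offset (seconds) from an ntpdate summary
# line on stdin, e.g. "... adjust time server 1.2.3.4 offset 0.000712 sec".
ntpdate_offset() {
    sed -n 's/.*offset \(-\{0,1\}[0-9][0-9.]*\) sec.*/\1/p'
}

# Succeed (exit status 0) when the absolute offset exceeds the threshold.
# $1 = offset in seconds, $2 = threshold in seconds (0.05 is an assumption).
offset_exceeds() {
    awk -v o="$1" -v t="$2" 'BEGIN {
        o += 0; t += 0              # force numeric comparison
        if (o < 0) o = -o           # absolute value
        exit !(o > t)
    }'
}

# Usage (assumes `ntpdate -q`, query only, prints the same summary line last):
#   off=$(ntpdate -q us.pool.ntp.org | tail -n 1 | ntpdate_offset)
#   offset_exceeds "$off" 0.05 && echo "clock offset $off s exceeds threshold"
```

With the offset from step 2 (0.000712 s), offset_exceeds returns a non-zero status, meaning the clock is well inside the assumed tolerance.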
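---------------------------------------------------------------------
The synchronization can also be scheduled as a periodic task with crontab:

    crontab -e

Add the following entry, which runs ntpdate at minute 10 of every hour:

    10 * * * * /usr/sbin/ntpdate us.pool.ntp.org

Then restart crond:

    service crond restart
    Stopping crond: [ OK ]
    Starting crond: [ OK ]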
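A cron job runs unattended, so it is easy to miss a sync that silently fails. In particular, ntpdate normally declines to run while ntpd is holding the NTP port on the same host. One option is to wrap the command so that every run is recorded. This is only a sketch: log_sync and the log path in the usage comment are hypothetical names, not part of ntpdate or cron.

```shell
# Hypothetical wrapper: run a command, append a timestamped line with its
# status and output to a log file, and return success only if it succeeded.
# $1 = log file, remaining arguments = command to run.
log_sync() {
    logfile=$1
    shift
    if out=$("$@" 2>&1); then
        status=ok
    else
        status=failed
    fi
    printf '%s %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$out" >> "$logfile"
    [ "$status" = ok ]
}

# Usage, e.g. from a script that the crontab entry calls instead of ntpdate:
#   log_sync /var/log/ntpdate-cron.log /usr/sbin/ntpdate us.pool.ntp.org
```

A line containing "failed" in the log then shows at a glance which hourly runs did not adjust the clock.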