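Because of a hard company requirement, all production VM servers have to be migrated to ZStack virtualization servers. After checking the servers used by our own project, the ZooKeeper cluster turned out to be affected, so it needs to be migrated.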
To keep the migration from affecting the business, the best approach is to scale out first and then scale in.
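Notes:
1. The original production cluster is a 3-node ZK cluster made up of VM-1, VM-2, and VM-3;
2. Scale the cluster out to 6 nodes (adding ZS-1, ZS-2, ZS-3) and wait for data synchronization to finish;
3. Scale in by taking the original three nodes (VM-1, VM-2, VM-3) offline;
4. Replace the addresses that nginx resolves to.
OK! The goal is clear and so is the process, so let's get to work.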
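During the scale-out phase in step 2, every node's zoo.cfg has to list all six members. A minimal sketch that prints those entries, assuming the host names from the list above resolve as-is, that server ids 1 to 6 are assigned in order, and that 2888 and 3888 are used as the quorum and election ports (none of these values are given in the original setup):

```shell
#!/usr/bin/env bash
# Print the server.N lines for a ZooKeeper ensemble.
# ASSUMPTIONS (not from the original setup): ids are assigned in argument
# order starting at 1; 2888/3888 are the quorum and election ports.
print_server_lines() {
    local id=1 host
    for host in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$id" "$host"
        id=$((id + 1))
    done
}

# Ensemble during the scale-out phase: old VM nodes plus new ZS nodes.
print_server_lines VM-1 VM-2 VM-3 ZS-1 ZS-2 ZS-3
```

ZooKeeper 3.4.x has no dynamic reconfiguration, so an updated list like this would have to be written to each node's zoo.cfg and picked up by a restart.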
ZS-1 started successfully, but checking it with zkServer.sh status returns the following error:
    [root@localhost bin]# ./zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /usr/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
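Checking the data at this point shows that synchronization is working.
So ZS-1 synchronizes data correctly, but the node's status cannot be queried.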
The following checks were collected from the internet:
First, zoo.cfg configuration: the directory specified by dataLogDir has not been created.
1. zoo.cfg
    [root@SIA-215 conf]# cat zoo.cfg
    ...
    dataDir=/app/zookeeperdata/data
    dataLogDir=/app/zookeeperdata/log
    ...
2. Paths
    [root@SIA-215 conf]# cd /app/zookeeperdata/
    [root@SIA-215 zookeeperdata]# ll
    total 8
    drwxr-xr-x 3 root root 4096 Apr 23 19:59 data
    drwxr-xr-x 3 root root 4096 Aug 29 2015 log
After checking, this cause was ruled out.
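The same check can be scripted so it runs identically on every node. A rough sketch, assuming zoo.cfg uses plain `key=value` lines; the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Read a key's value from a zoo.cfg-style file (last occurrence wins).
# Trailing whitespace is stripped from the value.
cfg_value() {
    local key=$1 cfg=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$cfg" \
        | tail -n 1 | sed -e 's/[[:space:]]*$//'
}

# Report whether dataDir and dataLogDir from the given zoo.cfg exist.
# Returns 0 only if every configured directory is present.
check_zk_dirs() {
    local cfg=$1 key dir rc=0
    for key in dataDir dataLogDir; do
        dir=$(cfg_value "$key" "$cfg")
        if [ -z "$dir" ]; then
            continue                      # key not set in this file
        fi
        if [ -d "$dir" ]; then
            echo "$key ok: $dir"
        else
            echo "$key missing: $dir"
            rc=1
        fi
    done
    return "$rc"
}
```

It would be called as `check_zk_dirs /path/to/conf/zoo.cfg`, with the path adjusted to the local install.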
Second, the integer in the myid file is malformed, or it does not match the server number in zoo.cfg.
    [root@SIA-215 data]# cd /app/zookeeperdata/data
    [root@SIA-215 data]# cat myid
    2[root@SIA-215 data]#
After checking, this was also ruled out as the cause.
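A sketch of how the myid check could be automated, assuming the usual `server.N=host:port:port` lines in zoo.cfg. The function name is hypothetical, and the check is deliberately strict: any character other than digits on the first line counts as malformed.

```shell
#!/usr/bin/env bash
# Check that the myid file holds a bare integer that also appears as a
# server.N entry in zoo.cfg. Prints a diagnosis; returns 0 when consistent.
check_myid() {
    local cfg=$1 myid_file=$2 id=""
    # Read the first line as-is; `read` returns non-zero when the file has
    # no trailing newline (as in the listing above), which is fine here.
    IFS= read -r id < "$myid_file" || true
    case $id in
        ''|*[!0-9]*)
            echo "myid is not a plain integer"
            return 1
            ;;
    esac
    if grep -q "^[[:space:]]*server\.${id}[[:space:]]*=" "$cfg"; then
        echo "myid $id matches a server.$id entry"
    else
        echo "no server.$id entry in $cfg"
        return 1
    fi
}
```

Usage would look like `check_myid conf/zoo.cfg /app/zookeeperdata/data/myid`.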
Third, the firewall has not been turned off.
Use service iptables stop to stop the firewall;
use service iptables status to confirm;
use chkconfig iptables off to keep the firewall disabled across reboots.
The firewall was confirmed to be off:
    [root@localhost ~]# service iptables status
    iptables: Firewall is not running.
Fourth, the port is already in use.
    [root@localhost bin]# netstat -tunlp | grep 2181
    tcp 0 0 :::12181 :::* LISTEN 30035/java
    tcp 0 0 :::22181 :::* LISTEN 30307/java
The port was confirmed not to be in use.
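Note that `grep 2181` matches any line containing that substring, which is why the output above lists 12181 and 22181. A sketch of a stricter filter, assuming the `netstat -tunlp` layout shown above, where the fourth column is the local address and ends in `:PORT`:

```shell
#!/usr/bin/env bash
# Filter `netstat -tunlp`-style lines on stdin, keeping only sockets whose
# local address (4th column) ends in exactly the given port.
listeners_on_port() {
    local port=$1
    awk -v p="$port" '{ n = split($4, a, ":"); if (a[n] == p) print }'
}

# Usage (on a host with netstat installed):
#   netstat -tunlp | listeners_on_port 2181
```

With this filter, an empty result means nothing is bound to the exact port, rather than to a port that merely contains those digits.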
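Fifth, a host name in zoo.cfg is wrong.
Tests in the test environment showed that the host names are correct and that resolution of multiple domain names also works, so this problem does not apply.
Sixth, the local host name has two mappings in the hosts file; only the mapping between the host name and the IP address should be kept.
Tests in the test environment showed that the host names are correct and that resolution of multiple domain names also works, so this problem does not apply either. Ruled out.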
Seventh, the nc command in zkServer.sh is the problem.
The machine may not have nc installed. Another suggestion is to find this line in zkServer.sh:
    STAT=`echo stat | nc localhost $(grep clientPort "$ZOOCFG" | sed -e 's/.*=//') 2> /dev/null| grep Mode`
and add -q 1 between nc and localhost (the digit 1, not the letter l).
However, the ZooKeeper version here is 3.4.6, and its zkServer.sh does not contain that line at all (the statement that fetches the status does not use nc):
    # -q is necessary on some versions of linux where nc returns too quickly, and no stat result is output
    clientPortAddress=`grep "^[[:space:]]*clientPortAddress[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
    if ! [ $clientPortAddress ]
    then
        clientPortAddress="localhost"
    fi
    clientPort=`grep "^[[:space:]]*clientPort[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
    STAT=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
        -cp "$CLASSPATH" $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \
        $clientPortAddress $clientPort srvr 2> /dev/null \
        | grep Mode`
    if [ "x$STAT" = "x" ]
    then
        echo "Error contacting service. It is probably not running."
        exit 1
    else
        echo $STAT
        exit 0
    fi
    ;;
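Current symptoms: the old cluster synchronizes data normally and can also hold leader elections (according to the logs), but node status cannot be queried and the same error message appears; when the cluster is scaled out, data cannot be synchronized.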
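1. Try starting in foreground mode: pick a non-leader node and restart it, so the startup log can be watched in the foreground.
    zkServer.sh start-foreground
The node starts normally with no abnormal output.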
2. Look at the shell script: analyze zkServer.sh.
    STAT=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
        -cp "$CLASSPATH" $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \
        $clientPortAddress $clientPort srvr 2> /dev/null \
        | grep Mode`
    if [ "x$STAT" = "x" ]
    then
        echo "Error contacting service. It is probably not running."
        exit 1
    else
        echo $STAT
        exit 0
    fi
    ;;
Something goes wrong when $STAT is obtained. If the STAT variable is empty, the script prints "Error contacting service. It is probably not running." OK, so let's work out what is actually going on with $STAT.
    if [ "x$STAT" = "x" ]
    then
        echo "Error contacting service. It is probably not running."
        exit 1
    else
        echo $STAT
        exit 0
    fi
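One way to see what the status check should receive, independently of the Java client, is to send the `srvr` four-letter word to the client port by hand. A rough sketch, assuming the local bash supports `/dev/tcp` redirections; both function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Keep only the "Mode:" line from four-letter-word output read on stdin,
# mirroring the `| grep Mode` step in zkServer.sh. Empty output means the
# reply had no Mode line, which is the case zkServer.sh reports as an error.
extract_mode() {
    grep '^Mode:' || true
}

# Send a four-letter word to a ZooKeeper client port and print the raw reply.
# ASSUMPTION: bash was built with /dev/tcp support. The server closes the
# connection after replying, which is what lets `cat` finish.
zk_four_letter() {
    local host=$1 port=$2 word=${3:-srvr}
    {
        printf '%s' "$word" >&3
        cat <&3
    } 3<>"/dev/tcp/${host}/${port}"
}

# Usage (needs a reachable server; host and port are placeholders):
#   zk_four_letter localhost 2181 srvr | extract_mode
```

If this prints a Mode line while zkServer.sh status still fails, the server itself is answering and the problem lies in how the script builds its command.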
3. Try the shell's debug mode to look at the execution:
    ++ grep '^[[:space:]]*clientPort[^[:alpha:]]' /app/zookeeper-3.4.6/bin/../conf/zoo.cfg
    + clientPort=5181
    ++ grep Mode
    ++ /opt/jdk1.8.0_131/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp '/app/zookeeper-3.4.6/bin/../build/classes:/app/zookeeper-3.4.6/bin/../build/lib/*.jar:/app/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/app/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/app/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/app/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/app/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/app/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/app/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/app/zookeeper-3.4.6/bin/../conf:.:/opt/jdk1.8.0_131/lib/dt.jar:/opt/jdk1.8.0_131/lib/tools.jar' org.apache.zookeeper.client.FourLetterWordMain localhost 5181 srvr
    + STAT=
    + '[' x = x ']'
    + echo 'Error contacting service. It is probably not running.'
    Error contacting service. It is probably not running.
    + exit 1
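A trace like the one above is what `bash -x` produces. The sketch below reproduces the tail of it on a reduced stand-in for the status branch, so the `+ STAT=` and `+ '[' x = x ']'` lines can be seen in isolation; `trace_snippet` is a hypothetical helper, not part of zkServer.sh:

```shell
#!/usr/bin/env bash
# Run a command string under `bash -x` and emit the trace. The trace is
# written to stderr, so it is merged into stdout to make it capturable.
trace_snippet() {
    bash -x -c "$1" 2>&1
}

# Reduced stand-in for the status branch: STAT is empty, so the test
# succeeds and the error message is printed.
trace_snippet 'STAT=; if [ "x$STAT" = "x" ]; then echo "Error contacting service."; fi'
```

For the real script the equivalent would be `bash -x ./zkServer.sh status`.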
4. Modify the shell script (zkServer.sh): add a statement to the script that prints the STAT content, this time without filtering it.
    STAT1=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
        -cp "$CLASSPATH" $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \
        $clientPortAddress $clientPort srvr 2> test.log`
    echo "$STAT1"
    [root@localhost bin]# ./zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /usr/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
    Exception in thread "main" java.lang.NumberFormatException: For input string: "2181
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.parseInt(Integer.java:527)
        at org.apache.zookeeper.client.FourLetterWordMain.main(FourLetterWordMain.java:76)
zkServer.sh contains this line:
    clientPort=`grep "^[[:space:]]*clientPort[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
When it runs, the command actually executed is:
    grep '^[[:space:]]*clientPort[^[:alpha:]]' /app/zookeeper-3.4.6/bin/../conf/zoo.cfg | sed -e 's/.*=//'
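The exception prints the input string as `"2181` with no closing quote on the same line, which suggests that the value extracted by this command carries an extra trailing character. The text does not say which character it was; a carriage return from Windows-style line endings or a trailing space are common candidates, and that is only an assumption here. A sketch for making such a character visible and for comparing the raw value with a cleaned one (all three function names are hypothetical):

```shell
#!/usr/bin/env bash
# Extract clientPort the same way zkServer.sh does, with no cleanup.
raw_client_port() {
    grep "^[[:space:]]*clientPort[^[:alpha:]]" "$1" | sed -e 's/.*=//'
}

# Same extraction with spaces, tabs and carriage returns removed.
# ASSUMPTION: the stray character is one of these; the original text
# does not state what it actually was.
clean_client_port() {
    raw_client_port "$1" | tr -d ' \t\r'
}

# Dump the raw value byte by byte so hidden characters become visible
# (a carriage return shows up as \r in `od -c` output).
show_client_port_bytes() {
    raw_client_port "$1" | od -c
}
```

If `show_client_port_bytes conf/zoo.cfg` reveals anything after the digits, removing that character from the clientPort line in zoo.cfg would be the natural thing to try next.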
Author: 毛正卫