The odds of the Spark master going down are low, but it still happened to me once. I touched on standalone HA in an earlier article on Spark standalone; here is the deployment procedure in detail. It is actually quite simple.
ZooKeeper cluster
zk1:2181 zk2:2181 zk3:2181
Spark masters
spark-m1 spark-m2
Spark workers
several
1. On spark-m1
Edit conf/spark-env.sh
vi spark-env.sh
export SPARK_MASTER_IP=spark-m1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark"
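The SPARK_DAEMON_JAVA_OPTS line must stay identical on both masters; only SPARK_MASTER_IP differs. As a minimal sketch (the ZK_HOSTS variable and the string assembly are my own illustration, not part of Spark), you could build the ZooKeeper URL from a host list so both spark-env.sh files share one definition:

```shell
# Sketch: assemble spark.deploy.zookeeper.url from a host list.
# ZK_HOSTS and this helper are illustrative only; port 2181 is assumed.
ZK_HOSTS="zk1 zk2 zk3"
ZK_URL=$(printf '%s:2181,' $ZK_HOSTS)  # "zk1:2181,zk2:2181,zk3:2181,"
ZK_URL=${ZK_URL%,}                     # drop the trailing comma
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=$ZK_URL -Dspark.deploy.zookeeper.dir=/spark"
echo "$SPARK_DAEMON_JAVA_OPTS"
```

This keeps the two config files from silently drifting apart if the ZooKeeper ensemble ever changes.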
Start the master and the slaves:
./sbin/start-master.sh
./sbin/start-slaves.sh
2. On spark-m2
Edit conf/spark-env.sh
vi spark-env.sh
export SPARK_MASTER_IP=spark-m2
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark"
Start the master and the slaves:
./sbin/start-master.sh
./sbin/start-slaves.sh
In the spark-m1 web UI you can see the master's status, while the spark-m2 web UI shows it in the STANDBY state.
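You can also check this from the command line: the standalone master serves its state as JSON on the web UI port (8080 by default; adjust if yours differs). A sketch that derives each master host from the multi-master URL; the actual curl is left commented out because the hostnames here are placeholders:

```shell
# Sketch: list the status endpoint for each master in a multi-master URL.
# Assumes the default web UI port 8080.
MASTERS="spark://spark-m1:7077,spark-m2:7077"
for hp in $(echo "${MASTERS#spark://}" | tr ',' ' '); do
  host=${hp%%:*}                       # strip the :7077 port
  echo "status endpoint: http://$host:8080/json"
  # curl -s "http://$host:8080/json" | grep '"status"'   # ALIVE or STANDBY
done
```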
When submitting an application, set the master URL to
--master spark://spark-m1:7077,spark-m2:7077
Start a Spark shell on spark-m1:
spark-shell --master spark://spark-m1:7077,spark-m2:7077
After it connects, stop the master on spark-m1:
./sbin/stop-master.sh
You will find that spark-shell does not disconnect; it fails over to the master on spark-m2 and keeps running (the failover takes roughly one minute, during which the workers re-register with spark-m2), and spark-m2 switches to the ALIVE state.
You can see this in the spark-m2 master log:
15/08/17 14:45:35 INFO ZooKeeperLeaderElectionAgent: We have gained leadership
15/08/17 14:45:36 INFO Master: I have been elected leader! New state: RECOVERING
15/08/17 14:45:36 INFO Master: Trying to recover worker:...
15/08/17 14:45:36 INFO Master: Trying to recover worker: ...
15/08/17 14:45:36 INFO Master: Trying to recover worker: ...
......
15/08/17 14:45:36 INFO Master: Worker has been re-registered: worker-...
15/08/17 14:45:36 INFO Master: Worker has been re-registered: worker-...
15/08/17 14:45:36 INFO Master: Worker has been re-registered: worker-...
...
15/08/17 14:45:36 INFO Master: Recovery complete - resuming operations!
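Instead of eyeballing the log, the failover test can be scripted by polling the standby's log for the recovery marker above. A sketch using a stand-in log file for illustration; the real path depends on your installation (typically under $SPARK_HOME/logs):

```shell
# Sketch: wait until the standby master reports recovery.
# LOG is a stand-in file here; point it at the real master log in practice.
LOG=$(mktemp)
echo "INFO Master: Recovery complete - resuming operations!" >> "$LOG"
recovered=no
for i in $(seq 1 60); do                 # poll for up to ~60 seconds
  if grep -q "Recovery complete" "$LOG"; then
    recovered=yes
    break
  fi
  sleep 1
done
echo "recovered=$recovered"
```

The sixty-second budget matches the roughly one-minute failover window observed above.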
Deployment complete.