Symptom: at first, the NameNode log was continuously flooded with the following error:
2014-01-27 17:55:59,388 WARN resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR
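What happened next was similar to the case described in the article "Hadoop运维笔记 之 Namenode异常中止后没法正常启动" (Hadoop Operations Notes: NameNode cannot start normally after an abnormal shutdown).
This is likewise a bug in the Hadoop-2.10-beta release (testNamenodeRestart fails with NullPointerException in trunk):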
This is actually due to a bug in the NN. The http services are started before the image is loaded, the edits are processed, and the rpc server is started. During image loading and edits processing, webhdfs will NPE on the rpc server.
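Since the NameNode could not be started, the only option was to rebuild the Standby. The steps are as follows:
1. First, run the following commands on the Active node, then manually back up the entire name directory: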
# Stop the automatic failover controller (ZKFC)
hadoop-daemon.sh stop zkfc
# Enter safe mode
hdfs dfsadmin -safemode enter
# Save the namespace, merging the edit log into the fsimage
hdfs dfsadmin -saveNamespace
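The manual backup in step 1 is not spelled out. A minimal sketch, assuming the name directory is /data/hadoop/name (the path used by the scp command in step 2) and that a timestamped tar archive is an acceptable backup. The function name `backup_dir` and the /data/backup target are hypothetical, not part of Hadoop:

```shell
#!/usr/bin/env bash
# backup_dir SRC DEST: archive directory SRC into DEST as <name>-<timestamp>.tar.gz
# and print the archive path. Hypothetical helper, not a Hadoop command.
backup_dir() {
    local src dest stamp archive
    src=${1%/}                       # strip one trailing slash, if any
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-$stamp.tar.gz"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -C keeps paths inside the archive relative to the parent of SRC
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example (assumed paths; run on the Active node after saveNamespace):
#   backup_dir /data/hadoop/name /data/backup
```

Taking the archive while the NameNode is still in safe mode means the namespace cannot change while the directory is being copied.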
2. Then, on the Standby node, first back up the entire name and journal directories, and then run:
hadoop-daemon.sh stop zkfc
hdfs namenode -bootstrapStandby
If it fails with an error like this:
FATAL ha.BootstrapStandby: Unable to read transaction ids 10-100 from the configured shared edits storage qjournal://1.1.1.1:8485;1.1.1.2:8485/sec-hdfs-cluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 10 but unable to find any edit logs containing txid 10
then copy the entire name directory from the Active node to the Standby node and simply start the NameNode:
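# On the Active node: copy the name directory to the Standby
scp -r /data/hadoop/name/ $standby_ip:/data/hadoop
# On the Standby node: start the NameNode
hadoop-daemon.sh start namenode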
3. Note: do not run "bootstrapStandby" at this point; otherwise it will re-create the name directory and wipe what was just copied over.
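The "Gap in transactions" error in step 2 means that no edit log segment containing the requested txid could be found in the shared edits storage. A rough way to check a storage directory by hand, assuming finalized segments follow the usual `edits_<first txid>-<last txid>` file naming inside a `current` directory. The function name `has_txid` and the example path are hypothetical:

```shell
#!/usr/bin/env bash
# has_txid DIR TXID: succeed (and print the file) if a finalized edits
# segment in DIR covers TXID. Hypothetical helper, not a Hadoop command.
has_txid() {
    local dir txid f name start end
    dir=$1
    txid=$2
    for f in "$dir"/edits_*-*; do
        [ -e "$f" ] || continue      # glob matched nothing
        name=${f##*/}                # edits_<start>-<end>
        name=${name#edits_}
        start=${name%%-*}
        end=${name##*-}
        # [ ... -le ... ] compares as decimal, so zero padding is harmless
        if [ "$start" -le "$txid" ] && [ "$txid" -le "$end" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Example (assumed journal path; adjust to the actual JournalNode directory):
#   has_txid /data/hadoop/journal/sec-hdfs-cluster/current 10 || echo "txid 10 missing"
```

In-progress segments (`edits_inprogress_<first txid>`) are deliberately skipped, because their file names carry no end txid to compare against.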
References: