Hadoop HA: Rebuilding the Standby NameNode

Date: 2019-11-09


Symptom: at first, the NameNode log kept printing the following error:

2014-01-27 17:55:59,388 WARN resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR

What happened next was similar to the case described in "Hadoop Ops Notes: NameNode Fails to Start Properly After an Abnormal Shutdown".

It is the same bug reported against Hadoop-2.10-beta (testNamenodeRestart fails with NullPointerException in trunk):

This is actually due to a bug in the NN. The http services are started before the image is loaded, the edits are processed, and the rpc server is started. During image loading and edits processing, webhdfs will NPE on the rpc server.

Because the NameNode could not start, the only option left was to rebuild the Standby. The steps are as follows:

1. First, run the following commands on the Active NameNode, then manually back up the entire name directory:

# Stop the ZKFC (automatic failover controller)
hadoop-daemon.sh stop zkfc

# Enter safe mode
hdfs dfsadmin -safemode enter

# Flush the edit log into a new fsimage
hdfs dfsadmin -saveNamespace
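Step 1 can be scripted roughly as below. This is a minimal sketch, not the author's script: it assumes the Hadoop 2.x `hadoop-daemon.sh` and `hdfs` commands are on `PATH`, reuses the `/data/hadoop/name` path from this note, and writes the archive to a hypothetical `/data/backup`. By default it only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch of step 1 on the Active NameNode (assumes Hadoop 2.x scripts on PATH).
set -eu

NAME_DIR="${NAME_DIR:-/data/hadoop/name}"   # path taken from these notes
BACKUP_DIR="${BACKUP_DIR:-/data/backup}"    # hypothetical backup location
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

prepare_active() {
  local stamp
  stamp="$(date +%Y%m%d%H%M%S)"
  run hadoop-daemon.sh stop zkfc        # stop automatic failover
  run hdfs dfsadmin -safemode enter     # block namespace changes
  run hdfs dfsadmin -saveNamespace      # merge edits into a fresh fsimage
  run hdfs dfsadmin -safemode get       # confirm safe mode is still ON
  run mkdir -p "$BACKUP_DIR"
  # Archive the whole name directory after the checkpoint has been written.
  run tar -czf "$BACKUP_DIR/name-active-$stamp.tar.gz" \
    -C "$(dirname "$NAME_DIR")" "$(basename "$NAME_DIR")"
}

prepare_active
```

The `-safemode get` line is only a check that the NameNode is still in safe mode when the archive is taken, so no new namespace changes slip in between the checkpoint and the backup.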

2. Next, on the Standby, first back up the entire name and journal directories, then run:

hadoop-daemon.sh stop zkfc
hdfs namenode -bootstrapStandby
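A matching sketch for step 2, under the same assumptions and dry-run default as above. `JOURNAL_DIR` and `BACKUP_DIR` are hypothetical: the journal path should be whatever `dfs.journalnode.edits.dir` points to on this host. Backups are taken before ZKFC is stopped and `-bootstrapStandby` runs, in the order the note gives.

```shell
#!/bin/sh
# Sketch of step 2 on the Standby NameNode (assumes Hadoop 2.x scripts on PATH).
set -eu

NAME_DIR="${NAME_DIR:-/data/hadoop/name}"           # path taken from these notes
JOURNAL_DIR="${JOURNAL_DIR:-/data/hadoop/journal}"  # hypothetical: dfs.journalnode.edits.dir
BACKUP_DIR="${BACKUP_DIR:-/data/backup}"            # hypothetical backup location
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

rebuild_standby() {
  local stamp dir
  stamp="$(date +%Y%m%d%H%M%S)"
  run mkdir -p "$BACKUP_DIR"
  for dir in "$NAME_DIR" "$JOURNAL_DIR"; do
    # Full copy with permissions and timestamps preserved.
    run cp -a "$dir" "$BACKUP_DIR/$(basename "$dir")-standby-$stamp"
  done
  run hadoop-daemon.sh stop zkfc
  run hdfs namenode -bootstrapStandby
}

rebuild_standby
```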

If it fails with the following error:

FATAL ha.BootstrapStandby: Unable to read transaction ids 10-100 from the configured shared edits storage qjournal://1.1.1.1:8485;1.1.1.2:8485/sec-hdfs-cluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 10 but unable to find any edit logs containing txid 10

then copy the entire name directory from the Active to the Standby and simply start the NameNode there:

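# On the Active: copy the whole name directory to the Standby
scp -r /data/hadoop/name/ "$standby_ip":/data/hadoop
# On the Standby: start the NameNode
hadoop-daemon.sh start namenode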
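Before starting the NameNode it may be worth confirming that the copy is complete. Hadoop 2.x NameNode storage directories keep each checkpoint as `current/fsimage_<txid>` with an `.md5` sidecar, so one hedged sanity check is to compare the newest `<txid>` on both hosts. The helper name `latest_fsimage_txid` is hypothetical; it prints the zero-padded txid exactly as it appears in the file name.

```shell
#!/bin/sh
# Print the largest <txid> among NAME_DIR/current/fsimage_<txid> files.
set -eu

latest_fsimage_txid() {
  local dir f txid best
  dir="$1"
  best=""
  for f in "$dir"/current/fsimage_*; do
    txid="${f##*/fsimage_}"
    # Skip .md5 sidecars, rollback images and unmatched globs (non-digit txids).
    case "$txid" in ''|*[!0-9]*) continue ;; esac
    # test(1) compares in base 10, so the leading zeros are harmless.
    if [ -z "$best" ] || [ "$txid" -gt "$best" ]; then
      best="$txid"
    fi
  done
  if [ -z "$best" ]; then
    echo "no fsimage_<txid> file under $dir/current" >&2
    return 1
  fi
  echo "$best"
}

NAME_DIR="${NAME_DIR:-/data/hadoop/name}"
if [ -d "$NAME_DIR/current" ]; then
  latest_fsimage_txid "$NAME_DIR"
fi
```

For example, run it against `/data/hadoop/name` on the Active and on the Standby (e.g. over `ssh`) and only start the Standby NameNode if both print the same value.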

3. Note: do not run "bootstrapStandby" at this point; otherwise it will re-format and wipe the name directory you just copied over.
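The note stops here, but step 1 left the Active in safe mode with ZKFC stopped, and HDFS stays read-only until safe mode is left; ZKFC was also stopped on the Standby in step 2. A possible wrap-up is sketched below, assuming the NameNode IDs are `nn1` and `nn2` (hypothetical; use the values of `dfs.ha.namenodes.<nameservice>`). It keeps the same dry-run default as the earlier sketches.

```shell
#!/bin/sh
# Sketch of a wrap-up after the Standby is back (assumes Hadoop 2.x scripts on PATH).
set -eu

NN_IDS="${NN_IDS:-nn1 nn2}"   # hypothetical NameNode IDs
DRY_RUN="${DRY_RUN:-1}"       # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

finish_rebuild() {
  local id
  for id in $NN_IDS; do
    run hdfs haadmin -getServiceState "$id"   # expect one active, one standby
  done
  run hdfs dfsadmin -safemode leave           # make HDFS writable again
  run hadoop-daemon.sh start zkfc             # repeat on each NameNode host
}

finish_rebuild
```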

References: