In non-HA mode, the NameNode's metadata can be kept reliable, but once the NameNode goes down there is nothing to keep the service available to clients. Hadoop 2.x therefore provides an HA deployment mode to solve this problem.
A few design questions have to be worked through first.

The approach: within a NameNode cluster, only one node should be in the active state and answer client requests at any given time, while the other stays in standby. Each active/standby pair forms one nameservice, and additional nameservices can be added (HDFS Federation) to scale the service out. Clients therefore no longer configure a NameNode host; they configure the nameservice ID instead, e.g. ns1, ns2, and each nameservice contains two NameNodes, nn1 and nn2.
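As a concrete illustration, once the cluster is up, clients address HDFS through the nameservice rather than through any single NameNode host. A minimal sketch, assuming the nameservice ns1 configured below:

```bash
# ns1 resolves to whichever NameNode is currently active,
# so the same URI keeps working across a failover.
hdfs dfs -ls hdfs://ns1/

# With fs.defaultFS set to hdfs://ns1/ in core-site.xml,
# the scheme and authority can be omitted entirely:
hdfs dfs -ls /
```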
Host | IP | Software | Processes |
---|---|---|---|
hdcluster01 | 10.211.55.22 | jdk, hadoop | NameNode, DFSZKFailoverController(zkfc) |
hdcluster02 | 10.211.55.23 | jdk, hadoop | NameNode, DFSZKFailoverController(zkfc) |
hdcluster03 | 10.211.55.27 | jdk, hadoop | ResourceManager |
hdcluster04 | 10.211.55.28 | jdk, hadoop | ResourceManager |
zk01 | 10.211.55.24 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
zk02 | 10.211.55.25 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
zk03 | 10.211.55.26 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
Configure hdcluster01 (the configuration files below live under $HADOOP_HOME/etc/hadoop)
1) Configure hadoop-env.sh
export JAVA_HOME=/home/parallels/app/jdk1.7.0_65
2) Configure core-site.xml
```xml
<configuration>
  <!-- Set the HDFS nameservice to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- Hadoop working/temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/parallels/app/hadoop-2.4.1/data/</value>
  </property>
  <!-- ZooKeeper quorum address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk01:2181,zk02:2181,zk03:2181</value>
  </property>
</configuration>
```
3) Configure hdfs-site.xml
```xml
<configuration>
  <!-- The HDFS nameservice, ns1; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hdcluster01:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hdcluster01:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hdcluster02:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hdcluster02:50070</value>
  </property>
  <!-- Where the shared NameNode edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://zk01:8485;zk02:8485;zk03:8485/ns1</value>
  </property>
  <!-- Where each JournalNode keeps its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/parallels/app/hadoop-2.4.1/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Proxy provider the client uses to locate the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence needs passwordless SSH; point it at the private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/parallels/.ssh/id_rsa</value>
  </property>
  <!-- Connection timeout for sshfence -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <!-- When a DataNode process dies or a network fault keeps a DataNode from
       reaching the NameNode, the NameNode does not mark it dead immediately;
       it waits for a timeout. The HDFS default is 10 minutes + 30 seconds:
       timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval -->
  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <!-- in milliseconds -->
    <value>2000</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <!-- in seconds -->
    <value>1</value>
  </property>
  <!-- A common situation in day-to-day operation: a node is marked dead because of a
       network fault or a dead DataNode process, HDFS immediately re-replicates its blocks,
       and when the node rejoins the cluster with its data intact, some blocks end up with
       more replicas than configured. By default it takes about an hour before these excess
       replicas are cleaned up, because the cleanup depends on the block report interval:
       each DataNode reports all of its blocks to the NameNode once an hour by default.
       The parameter below changes the report interval. -->
  <property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>10000</value>
    <description>Determines block reporting interval in milliseconds.</description>
  </property>
</configuration>
```
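A quick sanity check of the formula: with the values above, the dead-node timeout is 2 × 2 s + 10 × 1 s = 14 s, while with the HDFS defaults (a 5-minute recheck interval and a 3-second heartbeat) it works out to 2 × 300 s + 10 × 3 s = 630 s, i.e. the 10 minutes + 30 seconds mentioned in the comment.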
4) Configure mapred-site.xml
```xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
5) Configure yarn-site.xml
```xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id for the RM pair -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Logical ids of the two RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Hosts of the two RMs -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hdcluster03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hdcluster04</value>
  </property>
  <!-- ZooKeeper quorum address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk01:2181,zk02:2181,zk03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
6) Edit the slaves file
```
[parallels@hdcluster01 hadoop]$ less slaves
zk01
zk02
zk03
```
The slaves file specifies where the worker daemons run. Because HDFS is started from hdcluster01 and YARN from hdcluster03, the slaves file on hdcluster01 lists the DataNode hosts, while the slaves file on hdcluster03 lists the NodeManager hosts.
7) scp the configured Hadoop installation directory to the other six nodes
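A minimal sketch of the copy, assuming the installation path used throughout this post and that the target directories already exist:

```bash
# Push the configured installation to every other node in the cluster.
for host in hdcluster02 hdcluster03 hdcluster04 zk01 zk02 zk03; do
  scp -r /home/parallels/app/hadoop-2.4.1 parallels@${host}:/home/parallels/app/
done
```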
8) Configure passwordless SSH login
hdcluster01 needs passwordless SSH access to hdcluster02 and to zk01, zk02, zk03. First generate a key pair:
ssh-keygen -t rsa
Then copy the public key to the nodes above (hdcluster01 itself included):
```bash
ssh-copy-id hdcluster01
ssh-copy-id hdcluster02
ssh-copy-id zk01
ssh-copy-id zk02
ssh-copy-id zk03
```
hdcluster03 starts the ResourceManager and needs passwordless login to the DataNode/NodeManager hosts, so configure SSH keys from hdcluster03 to zk01, zk02, zk03 in the same way.
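A quick way to confirm that a key was installed correctly is to run a remote command and check that no password prompt appears; a minimal sketch, run from hdcluster01 (the same check applies from hdcluster03):

```bash
# Each command should print the remote hostname without asking for a password.
for host in hdcluster02 zk01 zk02 zk03; do
  ssh parallels@${host} hostname
done
```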
9) Start the ZooKeeper cluster (shown on zk01; start zk01, zk02 and zk03 in the same way)
```
[parallels@zk01 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/parallels/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
```
Check the status:
```
[parallels@zk01 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/parallels/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
```
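Since the same two commands have to be repeated on every ZooKeeper node, the whole ensemble can also be started and checked from one machine with a small SSH loop; a sketch, assuming the ZooKeeper path shown above:

```bash
# Start ZooKeeper on all three nodes, then print each node's role
# (one should report "leader", the others "follower").
for host in zk01 zk02 zk03; do
  ssh ${host} '/home/parallels/app/zookeeper-3.4.5/bin/zkServer.sh start'
done
for host in zk01 zk02 zk03; do
  ssh ${host} '/home/parallels/app/zookeeper-3.4.5/bin/zkServer.sh status'
done
```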
10) Start a JournalNode on each of zk01, zk02 and zk03
```
[parallels@zk01 sbin]$ ./hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk01.out
[parallels@zk01 sbin]$ jps
11528 Jps
8530 QuorumPeerMain
11465 JournalNode
[parallels@zk01 sbin]$ pwd
/home/parallels/app/hadoop-2.4.1/sbin
```
11) Format HDFS (on hdcluster01)
hdfs namenode -format
To keep the initial fsimage consistent on both NameNodes, either copy the hadoop.tmp.dir directory configured in core-site.xml from hdcluster01 to hdcluster02 by hand, or, once HDFS has been started on hdcluster01, run the following on hdcluster02:

hdfs namenode -bootstrapStandby

Note, however, that this stops the NameNode on hdcluster02, so it has to be started again afterwards.
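A sketch of both options, run from hdcluster01 and assuming the paths from the configuration above (absolute paths are used because the remote shell may not have Hadoop on its PATH):

```bash
# Option 1: copy the freshly formatted metadata directory (hadoop.tmp.dir) to the standby.
scp -r /home/parallels/app/hadoop-2.4.1/data parallels@hdcluster02:/home/parallels/app/hadoop-2.4.1/

# Option 2: after HDFS is up on hdcluster01, bootstrap the standby remotely,
# then start its NameNode again.
ssh hdcluster02 '/home/parallels/app/hadoop-2.4.1/bin/hdfs namenode -bootstrapStandby'
ssh hdcluster02 '/home/parallels/app/hadoop-2.4.1/sbin/hadoop-daemon.sh start namenode'
```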
12) Format ZKFC (on hdcluster01)
hdfs zkfc -formatZK
At this point the ZooKeeper data shows the new HA znode:
```
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zkData, zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[ns1]
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha/ns1
[]
[zk: localhost:2181(CONNECTED) 3] get /hadoop-ha/ns1
cZxid = 0x300000003
ctime = Tue Oct 02 14:32:43 CST 2018
mZxid = 0x300000003
mtime = Tue Oct 02 14:32:43 CST 2018
pZxid = 0x30000000e
cversion = 4
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2
```
13) Start HDFS (from hdcluster01)
```
[parallels@hdcluster01 sbin]$ start-dfs.sh
Starting namenodes on [hdcluster01 hdcluster02]
hdcluster01: starting namenode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-namenode-hdcluster01.out
hdcluster02: starting namenode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-namenode-hdcluster02.out
zk01: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk01.out
zk03: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk03.out
zk02: starting datanode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-datanode-zk02.out
Starting journal nodes [zk01 zk02 zk03]
zk03: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk03.out
zk01: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk01.out
zk02: starting journalnode, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-journalnode-zk02.out
Starting ZK Failover Controllers on NN hosts [hdcluster01 hdcluster02]
hdcluster01: starting zkfc, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-zkfc-hdcluster01.out
hdcluster02: starting zkfc, logging to /home/parallels/app/hadoop-2.4.1/logs/hadoop-parallels-zkfc-hdcluster02.out
```
14) Start YARN (on hdcluster03 and hdcluster04)
```
[parallels@hdcluster03 sbin]$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-resourcemanager-hdcluster03.out
zk03: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk03.out
zk01: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk01.out
zk02: starting nodemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-nodemanager-zk02.out
[parallels@hdcluster03 sbin]$ jps
29964 Jps
28916 ResourceManager
```
```
[parallels@hdcluster04 sbin]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/parallels/app/hadoop-2.4.1/logs/yarn-parallels-resourcemanager-hdcluster04.out
[parallels@hdcluster04 sbin]$ jps
29404 ResourceManager
29455 Jps
```
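With both ResourceManagers running, their HA state can be checked as well; a quick sketch using the rm-ids configured in yarn-site.xml above:

```bash
# One ResourceManager should report "active" and the other "standby".
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```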
At this point the HA deployment of the Hadoop cluster is complete. The state of the cluster can be inspected with hdfs dfsadmin -report:
```
[parallels@hdcluster01 bin]$ hdfs dfsadmin -report
Configured Capacity: 64418205696 (59.99 GB)
Present Capacity: 60574326784 (56.41 GB)
DFS Remaining: 60140142592 (56.01 GB)
DFS Used: 434184192 (414.07 MB)
DFS Used%: 0.72%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.211.55.24:50010 (zk01)
Hostname: zk01
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281323008 (1.19 GB)
DFS Remaining: 20046684160 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018

Name: 10.211.55.26:50010 (zk03)
Hostname: zk03
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281269760 (1.19 GB)
DFS Remaining: 20046737408 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018

Name: 10.211.55.25:50010 (zk02)
Hostname: zk02
Decommission Status : Normal
Configured Capacity: 21472735232 (20.00 GB)
DFS Used: 144728064 (138.02 MB)
Non DFS Used: 1281286144 (1.19 GB)
DFS Remaining: 20046721024 (18.67 GB)
DFS Used%: 0.67%
DFS Remaining%: 93.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Tue Oct 02 15:41:47 CST 2018
```
Check the HA state of the two NameNodes:

```
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn1
standby
[parallels@hdcluster01 bin]$ hdfs haadmin -getServiceState nn2
active
```
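Automatic failover can be exercised by killing the active NameNode and watching the standby take over; a rough sketch, assuming nn2 on hdcluster02 is the active node as shown above (the process id is a placeholder):

```bash
# On hdcluster02, the currently active NameNode:
jps                        # note the NameNode process id
kill -9 <namenode-pid>     # simulate a NameNode crash

# From hdcluster01, the former standby should switch to active within seconds:
hdfs haadmin -getServiceState nn1
```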
Afterwards, the stopped NameNode and its ZKFC can be started again individually on that host:

```bash
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start zkfc
```