My learning process goes deep one chunk at a time: only after I had HDFS basically understood and HA working did I start setting up YARN. I suggest reading the reprinted theory article before building — once you understand the principles, the setup goes quickly. To stress it again: I assume HDFS has already been set up successfully.
rtest-mysql-01: active NameNode and ResourceManager. Running processes: NameNode, ResourceManager, DFSZKFailoverController.
Mainly runs the active NN and the ResourceManager.

rtest-mysql-02: standby NameNode and ResourceManager. Running processes: NameNode, DFSZKFailoverController, DataNode, ResourceManager, JournalNode, NodeManager, zookeeper.
Standby NN and ResourceManager; together with 03 and 04 it also forms the ZooKeeper and JournalNode clusters.

rtest-mysql-03: running processes: DataNode, JournalNode, NodeManager, zookeeper.
Data node.

rtest-mysql-04: running processes: DataNode, JournalNode, NodeManager, zookeeper.
Data node.
In Hadoop 2.x, HDFS HA generally consists of two NameNodes: one active and one standby. The active NameNode serves client requests, while the standby NameNode serves no requests and only synchronizes the active NameNode's state so it can take over quickly if the active fails. Hadoop 2.0 officially provides two HDFS HA solutions: one based on NFS and one based on QJM (proposed by Cloudera; similar in principle to ZooKeeper). I use QJM here. The active and standby NameNodes synchronize edit-log metadata through a group of JournalNodes: a record is considered successfully written once it has been written to a majority of the JournalNodes. An odd number of JournalNodes is normally configured.
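The majority-write rule above can be sketched as a toy quorum check. This is purely an illustration of the principle, not Hadoop's actual QJM code; the JournalNode names and ack sets are made up:

```python
# Toy illustration of the QJM quorum rule: an edit-log write counts as
# successful once a majority of JournalNodes have acknowledged it.

def quorum_write(acks, total_journal_nodes):
    """Return True if the write reached a strict majority of JournalNodes."""
    return len(acks) > total_journal_nodes // 2

# With 3 JournalNodes, 2 acks are enough; 1 is not.
print(quorum_write({"jn1", "jn2"}, 3))   # True
print(quorum_write({"jn1"}, 3))          # False
# This is also why an odd count is preferred: with 4 nodes a write
# needs 3 acks, so 4 nodes tolerate no more failures than 3 do.
print(quorum_write({"jn1", "jn2"}, 4))   # False
```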
HDFS HA is already working. YARN HA also depends on ZKFC: if ZKFC is not working, YARN failover naturally will not work either.
mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs.
      Can be one of local, classic or yarn.</description>
  </property>
  <!-- jobhistory properties -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
  </property>
</configuration>
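Hand-editing these files makes it easy to break the XML. A quick way to sanity-check that a config file parses, and to list its properties, is a few lines of standard-library Python (a generic sketch; the demo writes a small stand-in file rather than reading your real mapred-site.xml):

```python
# Parse a Hadoop-style configuration file and return its property
# name/value pairs; a malformed file raises ParseError immediately.
import os
import tempfile
import xml.etree.ElementTree as ET

def list_properties(path):
    """Return {name: value} for each <property> in a Hadoop config file."""
    root = ET.parse(path).getroot()   # raises ParseError if the XML is broken
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}

# Demo on a tiny config written to a temp file (stand-in for mapred-site.xml).
demo = """<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>"""
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(demo)
props = list_properties(f.name)
os.unlink(f.name)
print(props)  # {'mapreduce.framework.name': 'yarn'}
```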
yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Resource Manager configs -->
  <!-- How long to wait before retrying the connection to a lost RM -->
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <!-- Enable ResourceManager HA (default: false) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Logical IDs of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>rtest-mysql-02:2181,rtest-mysql-03:2181,rtest-mysql-04:2181</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>rtest-mysql-01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rtest-mysql-01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rtest-mysql-02</value>
  </property>
  <!-- NOTE: it is common to copy the finished config to the other machines,
       but this value MUST be changed on the other YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper connection addresses -->
  <property>
    <name>yarn.resourcemanager.zk-state-store.address</name>
    <value>rtest-mysql-02:2181,rtest-mysql-03:2181,rtest-mysql-04:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>rtest-mysql-02:2181,rtest-mysql-03:2181,rtest-mysql-04:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <!-- How long the AM waits between attempts to reconnect to a lost scheduler -->
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
  <!-- NOTE: again, when copying this file to the other machine, these
       rm1-specific addresses must be adjusted there -->
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>rtest-mysql-01:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>rtest-mysql-01:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>rtest-mysql-01:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>rtest-mysql-01:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>rtest-mysql-01:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>rtest-mysql-01:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>rtest-mysql-01:23142</value>
  </property>
  <!--*******************************************************-->
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/biedong/hadoop/yarn/log</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
  </property>
  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/home/biedong/hadoop-2.7.0/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid},
      will be subdirectories of this.
    </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/biedong/hadoop/yarn/local</value>
  </property>
  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <description>Number of CPU cores that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <!-- Failover proxy provider class -->
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
    <value>/yarn-leader-election</value>
    <description>Optional setting. The default value is /yarn-leader-election</description>
  </property>
</configuration>
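As the NOTE comments in the config say, after copying yarn-site.xml to the second ResourceManager you must flip yarn.resourcemanager.ha.id from rm1 to rm2 there. A toy sketch of that edit using Python's standard library (shown on an in-memory fragment rather than your real file, whose path is up to you):

```python
# Flip the yarn.resourcemanager.ha.id value in a Hadoop-style config tree.
import xml.etree.ElementTree as ET

def set_ha_id(tree, new_id):
    """Set yarn.resourcemanager.ha.id to new_id; return True if found."""
    for prop in tree.getroot().findall("property"):
        if prop.findtext("name") == "yarn.resourcemanager.ha.id":
            prop.find("value").text = new_id
            return True
    return False

# Demo on a minimal in-memory fragment instead of the real yarn-site.xml.
xml_src = """<configuration>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
</configuration>"""
tree = ET.ElementTree(ET.fromstring(xml_src))
set_ha_id(tree, "rm2")
print(tree.getroot().find("property/value").text)  # rm2
```

For the real file you would call ET.parse on yarn-site.xml and tree.write it back after the edit.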
You can start the ResourceManager and NodeManager separately with:
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager (if there are multiple DataNodes, use yarn-daemons.sh)
Or start everything at once: sbin/start-yarn.sh
Verify which RM is active and which is standby with:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
In a browser, open http://rtest-mysql-02:8088 and check whether the page loads.
Accessing the web UI of the standby ResourceManager prompts:
This is standby RM. Redirecting to the current active RM: http://rtest-mysql-01:8088/cluster/apps
Next, kill the ResourceManager process on rm2
and verify the active/standby states again:
Run the example job: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 20 10
If the job completes without obvious errors, the installation was successful!
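For reference, the pi example estimates π by sampling points inside the unit square. The idea can be sketched in a few lines of plain Python (a simple Monte Carlo version with a fixed seed; the Hadoop example actually uses a quasi-Monte Carlo scheme, so the numbers will differ):

```python
# Monte Carlo estimate of pi: the fraction of random points inside the
# unit quarter-circle approaches pi/4 as the sample count grows.
import random

def estimate_pi(samples, seed=42):
    rng = random.Random(seed)
    inside = sum(1 for _ in range(samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # roughly 3.14
```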