My learning process goes deep one chunk at a time: only after I had HDFS basically understood and HA working did I start setting up YARN. I suggest reading the reprinted theory article before building — once you understand the principles, the setup goes quickly. To stress it again: I assume HDFS has already been set up successfully.
rtest-mysql-01: active NameNode and ResourceManager. Running processes: NameNode, ResourceManager, DFSZKFailoverController.
Mainly runs the active NN and the ResourceManager.

rtest-mysql-02: standby NameNode and ResourceManager. Running processes: NameNode, DFSZKFailoverController, DataNode, ResourceManager, JournalNode, NodeManager, zookeeper.
Standby NN and ResourceManager; together with 03 and 04 it also forms the ZooKeeper and JournalNode clusters.

rtest-mysql-03: running processes: DataNode, JournalNode, NodeManager, zookeeper.
Data node.

rtest-mysql-04: running processes: DataNode, JournalNode, NodeManager, zookeeper.
Data node.
In Hadoop 2.x, HDFS HA generally consists of two NameNodes: one active and one standby. The active NameNode serves client requests, while the standby NameNode serves no requests and only synchronizes the active NameNode's state so it can take over quickly if the active fails. Hadoop 2.0 officially provides two HDFS HA solutions: one based on NFS and one based on QJM (proposed by Cloudera; similar in principle to ZooKeeper). I use QJM here. The active and standby NameNodes synchronize edit-log metadata through a group of JournalNodes: a record is considered successfully written once it has been written to a majority of the JournalNodes. An odd number of JournalNodes is normally configured.
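The majority-write rule above can be sketched as a toy quorum check. This is purely an illustration of the principle, not Hadoop's actual QJM code; the JournalNode names and ack sets are made up:

```python
# Toy illustration of the QJM quorum rule: an edit-log write counts as
# successful once a majority of JournalNodes have acknowledged it.

def quorum_write(acks, total_journal_nodes):
    """Return True if the write reached a strict majority of JournalNodes."""
    return len(acks) > total_journal_nodes // 2

# With 3 JournalNodes, 2 acks are enough; 1 is not.
print(quorum_write({"jn1", "jn2"}, 3))   # True
print(quorum_write({"jn1"}, 3))          # False
# This is also why an odd count is preferred: with 4 nodes a write
# needs 3 acks, so 4 nodes tolerate no more failures than 3 do.
print(quorum_write({"jn1", "jn2"}, 4))   # False
```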
HDFS HA is already working. YARN HA also depends on ZKFC: if ZKFC is not working, YARN failover naturally will not work either.
mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs.
      Can be one of local, classic or yarn.</description>
  </property>
  <!-- jobhistory properties -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
  </property>
</configuration>
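Hand-editing these files makes it easy to break the XML. A quick way to sanity-check that a config file parses, and to list its properties, is a few lines of standard-library Python (a generic sketch; the demo writes a small stand-in file rather than reading your real mapred-site.xml):

```python
# Parse a Hadoop-style configuration file and return its property
# name/value pairs; a malformed file raises ParseError immediately.
import os
import tempfile
import xml.etree.ElementTree as ET

def list_properties(path):
    """Return {name: value} for each <property> in a Hadoop config file."""
    root = ET.parse(path).getroot()   # raises ParseError if the XML is broken
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}

# Demo on a tiny config written to a temp file (stand-in for mapred-site.xml).
demo = """<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>"""
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(demo)
props = list_properties(f.name)
os.unlink(f.name)
print(props)  # {'mapreduce.framework.name': 'yarn'}
```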
yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Resource Manager configs -->
  <!-- How long to wait before retrying the connection to a lost RM -->
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <!-- Enable ResourceManager HA (default: false) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Logical IDs of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>rtest-mysql-02:2181,rtest-mysql-03:2181,rtest-mysql-04:2181</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>rtest-mysql-01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rtest-mysql-01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rtest-mysql-02</value>
  </property>
  <!-- NOTE: it is common to copy the finished config to the other machines,
       but this value MUST be changed on the other YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper connection addresses -->
  <property>
    <name>yarn.resourcemanager.zk-state-store.address</name>
    <value>rtest-mysql-02:2181,rtest-mysql-03:2181,rtest-mysql-04:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>rtest-mysql-02:2181,rtest-mysql-03:2181,rtest-mysql-04:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <!-- How long the AM waits between attempts to reconnect to a lost scheduler -->
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
  <!-- NOTE: again, when copying this file to the other machine, these
       rm1-specific addresses must be adjusted there -->
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>rtest-mysql-01:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>rtest-mysql-01:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>rtest-mysql-01:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>rtest-mysql-01:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>rtest-mysql-01:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>rtest-mysql-01:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>rtest-mysql-01:23142</value>
  </property>
  <!--*******************************************************-->
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/biedong/hadoop/yarn/log</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
  </property>
  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/home/biedong/hadoop-2.7.0/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid},
      will be subdirectories of this.
    </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/biedong/hadoop/yarn/local</value>
  </property>
  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <description>Number of CPU cores that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <!-- Failover proxy provider class -->
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
    <value>/yarn-leader-election</value>
    <description>Optional setting. The default value is /yarn-leader-election</description>
  </property>
</configuration>
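As the NOTE comments in the config say, after copying yarn-site.xml to the second ResourceManager you must flip yarn.resourcemanager.ha.id from rm1 to rm2 there. A toy sketch of that edit using Python's standard library (shown on an in-memory fragment rather than your real file, whose path is up to you):

```python
# Flip the yarn.resourcemanager.ha.id value in a Hadoop-style config tree.
import xml.etree.ElementTree as ET

def set_ha_id(tree, new_id):
    """Set yarn.resourcemanager.ha.id to new_id; return True if found."""
    for prop in tree.getroot().findall("property"):
        if prop.findtext("name") == "yarn.resourcemanager.ha.id":
            prop.find("value").text = new_id
            return True
    return False

# Demo on a minimal in-memory fragment instead of the real yarn-site.xml.
xml_src = """<configuration>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
</configuration>"""
tree = ET.ElementTree(ET.fromstring(xml_src))
set_ha_id(tree, "rm2")
print(tree.getroot().find("property/value").text)  # rm2
```

For the real file you would call ET.parse on yarn-site.xml and tree.write it back after the edit.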
You can start the ResourceManager and NodeManager separately with:
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager (if there are multiple DataNodes, use yarn-daemons.sh)
Or start everything at once: sbin/start-yarn.sh
Verify which RM is active and which is standby with:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
In a browser, open http://rtest-mysql-02:8088 and check whether the page loads.
Accessing the web UI of the standby ResourceManager prompts:
This is standby RM. Redirecting to the current active RM: http://rtest-mysql-01:8088/cluster/apps
Next, kill the ResourceManager process on rm2
and verify the active/standby states again:
Run the example job: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 20 10
If the job completes without obvious errors, the installation was successful!
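For reference, the pi example estimates π by sampling points inside the unit square. The idea can be sketched in a few lines of plain Python (a simple Monte Carlo version with a fixed seed; the Hadoop example actually uses a quasi-Monte Carlo scheme, so the numbers will differ):

```python
# Monte Carlo estimate of pi: the fraction of random points inside the
# unit quarter-circle approaches pi/4 as the sample count grows.
import random

def estimate_pi(samples, seed=42):
    rng = random.Random(seed)
    inside = sum(1 for _ in range(samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # roughly 3.14
```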