Set up a highly available hadoop+yarn+hbase+storm+kafka+spark+zookeeper cluster, and install the related components: JDK, MySQL, Hive, Flume
Number of virtual machines: 8
Operating system image: CentOS-7-x86_64-Minimal-1611.iso
Each virtual machine is configured as follows:
VM name | CPU cores | Memory (GB) | Disk (GB) | NICs |
---|---|---|---|---|
hadoop1 | 2 | 8 | 100 | 2 |
hadoop2 | 2 | 8 | 100 | 2 |
hadoop3 | 2 | 8 | 100 | 2 |
hadoop4 | 2 | 8 | 100 | 2 |
hadoop5 | 2 | 8 | 100 | 2 |
hadoop6 | 2 | 8 | 100 | 2 |
hadoop7 | 2 | 8 | 100 | 2 |
hadoop8 | 2 | 8 | 100 | 2 |
8-node Hadoop + Yarn + Spark + HBase + Kafka + Storm + ZooKeeper HA cluster layout:
Cluster | VM nodes |
---|---|
Hadoop HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Yarn HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
ZooKeeper cluster | hadoop3,hadoop4,hadoop5 |
HBase cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Kafka cluster | hadoop6,hadoop7,hadoop8 |
Storm cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Spark HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Detailed cluster plan:
VM name | IP | Installed software | Processes | Role |
---|---|---|---|---|
hadoop1 | 59.68.29.79 | jdk,hadoop,mysql | NameNode,ResourceManager,DFSZKFailoverController(zkfc),master(spark) | Hadoop NameNode, Spark master, Yarn ResourceManager |
hadoop2 | 10.230.203.11 | jdk,hadoop,spark | NameNode,ResourceManager,DFSZKFailoverController(zkfc),worker(spark) | Standby (failover) node for hadoop/yarn and for spark |
hadoop3 | 10.230.203.12 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HMaster,…(storm),worker(spark) | Master node for storm, hbase and zookeeper |
hadoop4 | 10.230.203.13 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HRegionServer,…(storm),worker(spark) | |
hadoop5 | 10.230.203.14 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HRegionServer,…(storm),worker(spark) | |
hadoop6 | 10.230.203.15 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,journalnode,kafka,HRegionServer,…(storm),worker(spark) | Kafka master node |
hadoop7 | 10.230.203.16 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,journalnode,kafka,HRegionServer,…(storm),worker(spark) | |
hadoop8 | 10.230.203.17 | jdk,hadoop,kafka,spark | DataNode,NodeManager,journalnode,kafka,worker(spark) | |
JDK version: jdk-8u65-linux-x64.tar.gz
Hadoop version: hadoop-2.7.6.tar.gz
ZooKeeper version: zookeeper-3.4.12.tar.gz
HBase version: hbase-1.2.6-bin.tar.gz
Storm version: apache-storm-1.1.3.tar.gz
Kafka version: kafka_2.11-2.0.0.tgz
MySQL version: mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz
Hive version: apache-hive-2.3.3-bin.tar.gz
Flume version: apache-flume-1.8.0-bin.tar.gz
Spark version: spark-2.3.1-bin-hadoop2.7.tgz
Apply the same settings on every host node.
Important: do not configure the cluster as root.
$> groupadd centos
$> useradd centos -g centos
$> passwd centos
$> nano /etc/sudoers
Add the following lines:
## Allow root to run any commands anywhere
root    ALL=(ALL)    ALL
centos  ALL=(ALL)    ALL
$> sudo nano /etc/hostname
Set each machine's hostname: hadoop1, hadoop2, ..., hadoop8
$> sudo nano /etc/hosts
Add the following entries:
127.0.0.1      localhost
59.68.29.79    hadoop1
10.230.203.11  hadoop2
10.230.203.12  hadoop3
10.230.203.13  hadoop4
10.230.203.14  hadoop5
10.230.203.15  hadoop6
10.230.203.16  hadoop7
10.230.203.17  hadoop8
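As a quick check that /etc/hosts is correct, each hostname should be reachable from this node. A minimal sketch (it assumes all eight VMs are already running):
# ping each node once by hostname; an "unreachable" line points to a bad /etc/hosts entry
for i in 1 2 3 4 5 6 7 8 ; do
    ping -c 1 hadoop$i > /dev/null && echo "hadoop$i OK" || echo "hadoop$i unreachable"
done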
Use pwd in the shell prompt so that the full working directory is shown (e.g. ~ appears as /home/centos), which makes it easy to tell where the current files live.
[centos@hadoop1 ~]$ sudo nano /etc/profile
Append at the end:
export PS1='[\u@\h `pwd`]\$'
// apply immediately
[centos@hadoop1 ~]$ source /etc/profile
[centos@hadoop1 /home/centos]$
hadoop1 and hadoop2 are the failover nodes (they remove the single point of failure), so besides reaching each other these two hosts also need to log in to all the other nodes; set up passwordless SSH login for them.
[centos@hadoop1 /home/centos]$ yum list installed | grep ssh
[centos@hadoop1 /home/centos]$ ps -Af | grep sshd
[centos@hadoop1 /home/centos]$ mkdir .ssh
[centos@hadoop1 /home/centos]$ chmod 700 ~/.ssh
// generate the key pair
[centos@hadoop1 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
// go into the ~/.ssh directory
[centos@hadoop1 /home/centos]$ cd ~/.ssh
// append the public key to ~/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa.pub >> authorized_keys
// change the permissions of authorized_keys to 644
[centos@hadoop1 /home/centos/.ssh]$ chmod 644 authorized_keys
// rename the public key
[centos@hadoop1 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop1.pub
// copy it to the other nodes as their authorized_keys file
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop2:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop3:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop4:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop5:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop6:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop7:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop8:/home/centos/.ssh/authorized_keys
// generate hadoop2's key pair
[centos@hadoop2 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
// rename the public key
[centos@hadoop2 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop2.pub
// copy hadoop2's public key over to hadoop1 and append it to ~/.ssh/authorized_keys there
[centos@hadoop2 /home/centos/.ssh]$ scp id_rsa_hadoop2.pub centos@hadoop1:/home/centos/.ssh/
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa_hadoop2.pub >> authorized_keys
// distribute the combined authorized_keys to the other nodes
[centos@hadoop1 /home/centos/.ssh]$ scp authorized_keys centos@hadoop2:/home/centos/.ssh/
... repeat for the remaining nodes
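Passwordless login can then be verified from hadoop1 (and again from hadoop2) by running a remote command against every node; none of the calls below should prompt for a password. This is only a quick sanity-check sketch:
# print the remote hostname from every node; BatchMode makes ssh fail instead of prompting
for i in 1 2 3 4 5 6 7 8 ; do
    ssh -o BatchMode=yes centos@hadoop$i hostname
done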
To make sure the cluster can start properly, first turn off the firewall on every host. The relevant commands are listed below:
[CentOS 6.5 and earlier, where the firewall service is iptables]
$> sudo service iptables stop       // stop the service
$> sudo service iptables start      // start the service
$> sudo service iptables status     // check the status
[CentOS 7]
$> sudo systemctl enable firewalld.service    // enable start on boot
$> sudo systemctl disable firewalld.service   // disable start on boot
$> sudo systemctl start firewalld.service     // start the firewall
$> sudo systemctl stop firewalld.service      // stop the firewall
$> sudo systemctl status firewalld.service    // check the firewall status
[start on boot]
$> sudo chkconfig firewalld on     // enable start on boot
$> sudo chkconfig firewalld off    // disable start on boot
Tip: put the scripts under /usr/local/bin so that they are on the PATH everywhere. They only need to be set up on the hadoop1 and hadoop2 nodes.
// create xcall.sh as the local user (centos)
$> touch ~/xcall.sh
// move it to /usr/local/bin
$> sudo mv ~/xcall.sh /usr/local/bin
// make it executable
$> sudo chmod a+x /usr/local/bin/xcall.sh
// edit the script
$> sudo nano /usr/local/bin/xcall.sh
#!/bin/bash
params=$@
i=1
for (( i=1 ; i <= 8 ; i = $i + 1 )) ; do
    echo ============= hadoop$i $params =============
    ssh hadoop$i "$params"
done
Create the distribution script xsync.sh the same way (also under /usr/local/bin):
#!/bin/bash
if [[ $# -lt 1 ]] ; then
    echo no params ;
    exit ;
fi
p=$1
#echo p=$p
dir=`dirname $p`
#echo dir=$dir
filename=`basename $p`
#echo filename=$filename
cd $dir
fullpath=`pwd -P .`
#echo fullpath=$fullpath
user=`whoami`
for (( i = 1 ; i <= 8 ; i = $i + 1 )) ; do
    echo ======= hadoop$i =======
    rsync -lr $p ${user}@hadoop$i:$fullpath
done ;
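With both scripts executable under /usr/local/bin, typical usage looks like the sketch below; /home/centos/test.txt is only a hypothetical example path:
// run the same command on all eight nodes
$> xcall.sh hostname
// rsync a file (or directory) to the same path on all eight nodes
$> xsync.sh /home/centos/test.txt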
Prepare the JDK: upload jdk-8u65-linux-x64.tar.gz to /home/centos/localsoft on hadoop1; this directory holds all the installation packages that need to be installed.
Create a soft directory under the root directory (/) and change its user and group ownership to centos; all the software will be installed under this directory.
// create the soft directory
[centos@hadoop1 /home/centos]$ sudo mkdir /soft
// change the ownership (use your own local user name, here centos)
[centos@hadoop1 /home/centos]$ sudo chown centos:centos /soft
// extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf jdk-8u65-linux-x64.tar.gz -C /soft
// create a symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/jdk1.8.0_65 jdk
// edit /etc/profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
// environment variables
# jdk
export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
// apply immediately
[centos@hadoop1 /home/centos]$ source /etc/profile
// verify
[centos@hadoop1 /home/centos]$ java -version
// output:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
// extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf hadoop-2.7.6.tar.gz -C /soft
// create a symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/hadoop-2.7.6 hadoop
// edit /etc/profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
// environment variables
# hadoop
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
// apply immediately
[centos@hadoop1 /home/centos]$ source /etc/profile
// check that the installation succeeded
[centos@hadoop1 /home/centos]$ hadoop version
Output:
Hadoop 2.7.6
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 085099c66cf28be31604560c376fa282e69282b8
Compiled by kshvachk on 2018-04-18T01:33Z
Compiled with protoc 2.5.0
From source with checksum 71e2695531cb3360ab74598755d036
This command was run using /soft/hadoop-2.7.6/share/hadoop/common/hadoop-common-2.7.6.jar
Tip: all of the work so far is on hadoop1; there is no need to install or configure the other nodes yet. Once the configuration is complete it will be pushed to the other nodes in one go, which saves a great deal of work.
Set up Hadoop's native NameNode HA first; it will later be integrated with the ZooKeeper cluster to get automatic failover (for both Yarn and the NameNode).
// make copies of the default configuration directory, one per mode
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop ha
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop full
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop pesudo
// remove the original directory and create a symbolic link pointing at the HA configuration
[centos@hadoop1 /soft/hadoop/etc]$ rm -rf hadoop
[centos@hadoop1 /soft/hadoop/etc]$ ln -s /soft/hadoop/etc/ha hadoop
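The point of the symlink is that the active configuration set can be switched later simply by re-pointing it; a minimal sketch (assuming GNU ln, as shipped with CentOS 7):
// point the active configuration at the "full" copy
[centos@hadoop1 /soft/hadoop/etc]$ ln -sfT /soft/hadoop/etc/full hadoop
// switch back to the HA configuration
[centos@hadoop1 /soft/hadoop/etc]$ ln -sfT /soft/hadoop/etc/ha hadoop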
[core-site.xml]
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- configure the new local data directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/centos/hadoop</value>
    </property>
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>20</value>
    </property>
    <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>5000</value>
    </property>
</configuration>
[hdfs-site.xml]
<configuration>
    <!-- configure the nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- the two NameNode ids under mycluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of each NameNode -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop2:8020</value>
    </property>
    <!-- web UI ports -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop2:50070</value>
    </property>
    <!-- shared edits directory for the NameNodes (JournalNodes) -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop3:8485;hadoop4:8485;hadoop5:8485;hadoop6:8485;hadoop7:8485;hadoop8:8485/mycluster</value>
    </property>
    <!-- Java class that clients use to determine which NameNode is active -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!