OS: CentOS 7.2 (1511)
Java JDK: jdk-8u65-linux-x64.tar.gz
Hadoop: hadoop-2.8.3.tar.gz
Download:
Link: https://pan.baidu.com/s/1iQfjO-d2ojA6mAeOOKb6CA  Extraction code: l0qp
node1 | node2 | node3 |
---|---|---|
NameNode | ResourceManager | |
DataNode | DataNode | DataNode |
NodeManager | NodeManager | NodeManager |
HistoryServer | SecondaryNameNode | |
The three hosts are named node1, node2, and node3; the hostname-to-IP mapping is as follows:
No. | Hostname | IP address | Role |
---|---|---|---|
1 | node1 | 192.168.100.11 | master node |
2 | node2 | 192.168.100.12 | worker node |
3 | node3 | 192.168.100.13 | worker node |
Run the hostname-change command on each of the three nodes:
node1:
[root@localhost ~]# hostnamectl set-hostname node1
node2:
[root@localhost ~]# hostnamectl set-hostname node2
node3:
[root@localhost ~]# hostnamectl set-hostname node3
Press Ctrl+D or type exit to log out; after logging back in, verify that the hostname has changed.
Taking node1 as an example, change the IP address on each of the three nodes (note that the NIC name may differ from machine to machine; on node1 it is eno16777736):
[root@node1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
Set the IP addresses of node1, node2, and node3 to 192.168.100.11, 192.168.100.12, and 192.168.100.13 respectively.
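For reference, a minimal static-IP version of the ifcfg file for node1 might look like the sketch below; the NIC name follows the example above, while the gateway and DNS values are assumptions for this subnet, not taken from the original text:
TYPE=Ethernet
BOOTPROTO=static
NAME=eno16777736
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.100.11
NETMASK=255.255.255.0
GATEWAY=192.168.100.1
DNS1=192.168.100.1
After saving, the change typically takes effect after restarting the network service (systemctl restart network).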
On each of the three nodes, add the hostname-to-IP mappings:
[root@node1 ~]# vi /etc/hosts
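Append the mappings from the table above:
192.168.100.11 node1
192.168.100.12 node2
192.168.100.13 node3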
Generate an SSH key pair on node1, node2, and node3 (press Enter at every prompt):
[root@node1 ~]# ssh-keygen
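If you prefer to skip the prompts entirely, the same key can be generated non-interactively; this one-liner is an equivalent sketch (-N '' sets an empty passphrase):
[root@node1 ~]# ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa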
Enter the .ssh directory and check the generated keys:
[root@node1 ~]# cd ~/.ssh/
[root@node1 .ssh]# ls
id_rsa  id_rsa.pub
Copy the public key to every node (including the node itself):
On node1:
[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub root@node1
The authenticity of host 'node1 (192.168.100.11)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@node1'"
and check to make sure that only the key(s) you wanted were added.

[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub root@node2
The authenticity of host 'node2 (192.168.100.12)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@node2'"
and check to make sure that only the key(s) you wanted were added.

[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub root@node3
The authenticity of host 'node3 (192.168.100.13)' can't be established.
ECDSA key fingerprint is e1:6c:f3:7f:be:79:dc:87:15:97:51:4d:e5:b4:56:78.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node3's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@node3'"
and check to make sure that only the key(s) you wanted were added.
On node2 (the output is the same as on node1, so only the commands are shown):
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub root@node1
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub root@node2
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub root@node3
On node3 (likewise, commands only):
[root@node3 .ssh]# ssh-copy-id -i id_rsa.pub root@node1
[root@node3 .ssh]# ssh-copy-id -i id_rsa.pub root@node2
[root@node3 .ssh]# ssh-copy-id -i id_rsa.pub root@node3
On each of the three nodes, ssh to every node (including the node itself); if no password is required, passwordless login is working (shown here from node3):
[root@node3 .ssh]# ssh node1
Last login: Thu Jan 21 11:32:29 2021 from 192.168.100.1
[root@node1 ~]# exit
logout
Connection to node1 closed.
[root@node3 .ssh]# ssh node2
Last login: Thu Jan 21 16:01:47 2021 from node1
[root@node2 ~]# exit
logout
Connection to node2 closed.
[root@node3 .ssh]# ssh node3
Last login: Thu Jan 21 16:01:59 2021 from node1
[root@node3 ~]# exit
logout
Connection to node3 closed.
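To spot-check all three hosts in one go, a small loop like this sketch can be run on each node; it should print the three hostnames without any password prompt:
[root@node3 .ssh]# for h in node1 node2 node3; do ssh $h hostname; done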
Stop and disable the firewall on all three nodes:
[root@node1 .ssh]# systemctl stop firewalld
[root@node1 .ssh]# systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
Set SELinux to disabled on all three nodes:
[root@node1 ~]# vi /etc/selinux/config
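Change the SELINUX line so that the file contains:
SELINUX=disabled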
Setting SELinux to disabled only takes effect after a reboot; alternatively, run the following to switch SELinux to permissive immediately (again on all three nodes):
[root@node1 ~]# setenforce 0
[root@node1 ~]# getenforce
Permissive
On node1, create the directory /opt/jdk and upload the JDK archive into it:
[root@node1 ~]# mkdir -p /opt/jdk
[root@node1 ~]# cd /opt/jdk
[root@node1 jdk]# ls
jdk-8u65-linux-x64.tar.gz
Extract jdk-8u65-linux-x64.tar.gz into the current directory, then delete the archive:
[root@node1 jdk]# tar zxvf jdk-8u65-linux-x64.tar.gz
[root@node1 jdk]# rm -f jdk-8u65-linux-x64.tar.gz
Edit /etc/profile and append the Java environment settings:
[root@node1 jdk]# vi /etc/profile
#Java Start
export JAVA_HOME=/opt/jdk/jdk1.8.0_65
export PATH=$PATH:${JAVA_HOME}/bin
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
#Java End
Apply the Java environment settings and verify:
[root@node1 jdk]# source /etc/profile
[root@node1 jdk]# java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
On node1, create the directory /opt/hadoop and upload the Hadoop archive into it:
[root@node1 ~]# mkdir -p /opt/hadoop
[root@node1 ~]# cd /opt/hadoop/
[root@node1 hadoop]# ls
hadoop-2.8.3.tar.gz
Extract hadoop-2.8.3.tar.gz into the current directory, then delete the archive:
[root@node1 hadoop]# tar zxvf hadoop-2.8.3.tar.gz
[root@node1 hadoop]# rm -f hadoop-2.8.3.tar.gz
In turn, edit hadoop-env.sh, mapred-env.sh, and yarn-env.sh under the etc/hadoop directory and point the JDK path in each to /opt/jdk/jdk1.8.0_65/; note that where the line is commented out as "# export ...", remove the leading "#":
[root@node1 ~]# cd /opt/hadoop/hadoop-2.8.3/etc/hadoop/
[root@node1 hadoop]# vi hadoop-env.sh
[root@node1 hadoop]# vi mapred-env.sh
[root@node1 hadoop]# vi yarn-env.sh
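In each of the three files, the JAVA_HOME line should end up uncommented and pointing at the JDK:
export JAVA_HOME=/opt/jdk/jdk1.8.0_65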
Create the Hadoop temporary directory /opt/datas/tmp on all three nodes:
[root@node1 ~]# mkdir -p /opt/datas/tmp
[root@node2 ~]# mkdir -p /opt/datas/tmp
[root@node3 ~]# mkdir -p /opt/datas/tmp
On node1, edit core-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/core-site.xml
Add the following content:
<configuration>
    <!-- NameNode host address and port -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:8020</value>
    </property>
    <!-- Hadoop temporary directory (the /opt/datas/tmp created above) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/datas/tmp</value>
    </property>
</configuration>
On each of the three nodes, create the NameNode data directory /opt/datas/dfs/namenode and the DataNode data directory /opt/datas/dfs/datanode (shown on node1; node2 and node3 are identical):
[root@node1 ~]# mkdir -p /opt/datas/dfs/namenode
[root@node1 ~]# mkdir -p /opt/datas/dfs/datanode
Edit hdfs-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/hdfs-site.xml
<configuration>
    <!-- Number of replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- SecondaryNameNode address and port; node2 serves as the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node2:50090</value>
    </property>
    <!-- NameNode data directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/datas/dfs/namenode</value>
    </property>
    <!-- DataNode data directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/datas/dfs/datanode</value>
    </property>
</configuration>
The slaves file lists the HDFS DataNode worker nodes; edit it:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/slaves
Change its content to the three DataNode hostnames:
node1
node2
node3
Edit yarn-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/yarn-site.xml
Change its content to:
<configuration>
    <!-- Auxiliary service on the NodeManager; must be mapreduce_shuffle to run MapReduce jobs -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node2</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- How long aggregated logs are kept on HDFS, in seconds -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
    </property>
</configuration>
Using mapred-site.xml.template as a template, copy it to mapred-site.xml:
[root@node1 ~]# cp /opt/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml
Edit mapred-site.xml:
[root@node1 ~]# vi /opt/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce JobHistory server address and port -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
    </property>
    <!-- MapReduce JobHistory web UI address and port -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
    </property>
</configuration>
Edit /etc/profile again, this time appending the Hadoop settings:
[root@node1 ~]# vi /etc/profile
#Hadoop Start
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
#Hadoop End
Apply the environment settings:
[root@node1 ~]# source /etc/profile
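As a quick sanity check (not in the original steps), hadoop version should now resolve from the PATH and report release 2.8.3:
[root@node1 ~]# hadoop version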
Create the target directories on node2 and node3:
[root@node2 ~]# mkdir -p /opt/jdk
[root@node2 ~]# mkdir -p /opt/hadoop
[root@node3 ~]# mkdir -p /opt/jdk
[root@node3 ~]# mkdir -p /opt/hadoop
Copy the JDK, the Hadoop installation, and /etc/profile from node1 to node2 and node3:
[root@node1 ~]# scp -r /opt/jdk/jdk1.8.0_65/ node2:/opt/jdk
[root@node1 ~]# scp -r /opt/jdk/jdk1.8.0_65/ node3:/opt/jdk
[root@node1 ~]# scp -r /opt/hadoop/hadoop-2.8.3/ node2:/opt/hadoop
[root@node1 ~]# scp -r /opt/hadoop/hadoop-2.8.3/ node3:/opt/hadoop
[root@node1 ~]# scp /etc/profile node2:/etc/profile
[root@node1 ~]# scp /etc/profile node3:/etc/profile
Then apply the environment settings on each:
node2:
[root@node2 ~]# source /etc/profile
node3:
[root@node3 ~]# source /etc/profile
If the NameNode ever needs to be reformatted, first delete everything under the old NameNode and DataNode directories, or errors will follow. By default, each format generates a new cluster ID and writes it into the VERSION files of the NameNode and DataNodes (under dfs/namenode/current and dfs/datanode/current). Reformatting without clearing the old directories leaves the new cluster ID in the NameNode's VERSION file while the DataNodes keep the old one; the mismatch causes errors. Alternatively, pass the old cluster ID as a parameter when formatting.
The NameNode and DataNode directories are the ones configured in hdfs-site.xml via dfs.namenode.name.dir and dfs.datanode.data.dir.
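For example, the two cluster IDs can be compared directly before reformatting; this is a sanity-check sketch using the directories configured above:
[root@node1 ~]# grep clusterID /opt/datas/dfs/namenode/current/VERSION
[root@node1 ~]# grep clusterID /opt/datas/dfs/datanode/current/VERSION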
Format the NameNode on node1:
[root@node1 ~]# cd /opt/hadoop/hadoop-2.8.3/bin/
[root@node1 bin]# ./hdfs namenode -format
Start HDFS from node1:
[root@node1 ~]# cd /opt/hadoop/hadoop-2.8.3/sbin/
[root@node1 sbin]# ./start-dfs.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-namenode-node1.out
node3: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-node3.out
node2: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-node2.out
node1: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-datanode-node1.out
Starting secondary namenodes [node2]
node2: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-root-secondarynamenode-node2.out
[root@node1 sbin]#
Check with jps; node1 now runs the NameNode and DataNode processes.
[root@node1 sbin]# jps
1588 NameNode
1717 DataNode
1930 Jps
On node2, start YARN:
[root@node2 ~]# cd /opt/hadoop/hadoop-2.8.3/sbin/
[root@node2 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-resourcemanager-node2.out
node3: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-node3.out
node1: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-node1.out
node2: starting nodemanager, logging to /opt/hadoop/hadoop-2.8.3/logs/yarn-root-nodemanager-node2.out
[root@node2 sbin]#
Check with jps; node2 now runs the ResourceManager process:
[root@node2 sbin]# jps
2629 NodeManager
2937 Jps
1434 DataNode
1531 SecondaryNameNode
2525 ResourceManager
[root@node2 sbin]#
Note: if $HADOOP_HOME/sbin/start-yarn.sh is not run on the ResourceManager host, the ResourceManager process will not start; in that case, log in to the ResourceManager host and run ./yarn-daemon.sh start resourcemanager to start it.
Start the MapReduce history server on node1:
[root@node1 sbin]# ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/hadoop/hadoop-2.8.3/logs/mapred-root-historyserver-node1.out
[root@node1 sbin]#
Check with jps; node1 now runs the JobHistoryServer process:
[root@node1 sbin]# jps
1588 NameNode
1717 DataNode
2502 Jps
2462 JobHistoryServer
2303 NodeManager
[root@node1 sbin]#
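Since passwordless SSH is already in place, the daemons on all three nodes can be checked from one shell; this sketch calls jps by its full path because /etc/profile is not sourced for non-interactive SSH sessions:
[root@node1 sbin]# for h in node1 node2 node3; do echo "== $h =="; ssh $h /opt/jdk/jdk1.8.0_65/bin/jps; done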
HDFS web UI: the IP of the host running the NameNode, port 50070 (http://192.168.100.11:50070).
YARN web UI: node2's IP, port 8088 (http://192.168.100.12:8088).
JobHistory web UI: node1's IP, port 19888 (http://192.168.100.11:19888/jobhistory).
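The UIs can also be probed from the command line; this is a sketch assuming curl is installed, where a 200 response indicates the UI is up:
[root@node1 ~]# curl -s -o /dev/null -w '%{http_code}\n' http://192.168.100.11:50070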
Create a test file example.txt on node1:
[root@node1 ~]# vi example.txt
Add the following content to example.txt:
hadoop mapreduce hive hbase spark storm sqoop hadoop hive spark hadoop
Create an HDFS input directory and upload the file:
[root@node1 ~]# hadoop fs -mkdir -p /datas/input
[root@node1 ~]# hadoop fs -put ~/example.txt /datas/input
Run the wordcount example against it:
[root@node1 ~]# hadoop jar /opt/hadoop/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /datas/input/example.txt /datas/output
View the result:
[root@node1 ~]# hadoop fs -cat /datas/output/part-r-00000
hadoop	3
hbase	1
hive	2
mapreduce	1
spark	2
sqoop	1
storm	1
[root@node1 ~]#
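Note that MapReduce refuses to write into an existing output directory, so to rerun the example, remove /datas/output first:
[root@node1 ~]# hadoop fs -rm -r /datas/output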