6. Installing Hadoop
1) Download the stable, pre-compiled binary package from the Hadoop website and extract it.
[hadoop@master ~]$ wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
[hadoop@master ~]$ tar -zxvf hadoop-2.7.3.tar.gz -C ~/opt
[hadoop@master ~]$ cd ~/opt/hadoop-2.7.3
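A download from a third-party mirror is worth verifying. A minimal sketch of my own, assuming the official Apache archive still hosts a checksum file for this release (the .mds filename is an assumption; check the directory listing if it fails):

# Fetch the official checksum file and compare it against the local tarball by eye
[hadoop@master ~]$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds
[hadoop@master ~]$ sha256sum hadoop-2.7.3.tar.gz   # compare with the SHA256 line in the .mds file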
2) Set the environment variables:
[hadoop@master hadoop-2.7.3]$ vim ~/.bashrc
# User specific aliases and functions
export HADOOP_PREFIX=$HOME/opt/hadoop-2.7.3
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
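After saving, reload ~/.bashrc and spot-check the variables; a quick sanity check of my own, not from the book:

[hadoop@master hadoop-2.7.3]$ source ~/.bashrc
[hadoop@master hadoop-2.7.3]$ echo $HADOOP_PREFIX
/home/hadoop/opt/hadoop-2.7.3
# `hadoop version` should print Hadoop 2.7.3 once JAVA_HOME is set in step 3
[hadoop@master hadoop-2.7.3]$ which hadoop
~/opt/hadoop-2.7.3/bin/hadoop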
3) Edit the configuration file etc/hadoop/hadoop-env.sh and add the line below (note that JAVA_HOME must be set according to your own machine's actual situation):
## Change the value after JAVA_HOME to the JAVA_HOME set on your own machine
# I added
export JAVA_HOME=/usr/lib/jvm/java
## which I then changed to
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
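If you do not know the JAVA_HOME on your own machine, one way to find it (a sketch assuming a yum-installed OpenJDK, as used here):

# Resolve the symlink chain behind the `java` binary; JAVA_HOME is the
# directory above jre/bin/java (or bin/java)
[hadoop@master hadoop-2.7.3]$ readlink -f $(which java)
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/bin/java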
4) Edit the configuration file etc/hadoop/core-site.xml as follows (see note ① below):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.131:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/opt/var/hadoop/tmp/hadoop-$USER</value>
  </property>
</configuration>
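Two asides on this file, from my own reading rather than the book: Hadoop's configuration expands ${user.name}, not the shell-style $USER, so the value above creates a directory literally named hadoop-$USER (harmless, just oddly named). Also, pre-creating the parent directory makes permission problems visible early:

[hadoop@master hadoop-2.7.3]$ mkdir -p ~/opt/var/hadoop/tmp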
5) Edit the configuration file etc/hadoop/hdfs-site.xml as follows:
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/opt/var/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/opt/var/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/home/hadoop/opt/var/hadoop/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!--
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2</value>
  </property>
  -->
</configuration>
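These paths sit outside the hadoop-2.7.3 tree copied in step 8, so they need to exist and be writable by the hadoop user on every node that runs the corresponding daemon (datanode dirs on all three machines, namenode/namesecondary on master). Hadoop can usually create them itself, but pre-creating them surfaces permission problems early; a sketch, to be repeated per node:

[hadoop@master ~]$ mkdir -p ~/opt/var/hadoop/hdfs/{namenode,datanode,namesecondary}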
6) Edit the configuration file etc/hadoop/yarn-site.xml as follows:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
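Since yarn.resourcemanager.hostname is the name master, every node must be able to resolve it; a quick check I would run on each slave (assuming master was added to /etc/hosts when the cluster was set up):

[hadoop@slave1 ~]$ ping -c 1 master   # should answer from 192.168.0.131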
7) First, copy etc/hadoop/mapred-site.xml.template to etc/hadoop/mapred-site.xml:
[hadoop@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
Then edit the configuration file etc/hadoop/mapred-site.xml as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/home</value>
  </property>
</configuration>
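One file these steps never mention, but which the startup log in step 10 implies, is etc/hadoop/slaves: start-dfs.sh and start-yarn.sh read it to decide where to launch DataNodes and NodeManagers. Judging from that log, on this cluster it must list all three hosts (my inference; verify against your own copy):

[hadoop@master hadoop]$ cat slaves
master
slave1
slave2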
8) Copy hadoop to slave1 and slave2 (see note ② below):
[hadoop@master opt]$ scp -r /home/hadoop/opt/hadoop-2.7.3 hadoop@slave1:/home/hadoop/opt/
[hadoop@master opt]$ scp -r /home/hadoop/opt/hadoop-2.7.3 hadoop@slave2:/home/hadoop/opt/
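One caveat of mine: the scp above carries the Hadoop tree but not the environment variables from step 2, which live in ~/.bashrc. If the slaves were not configured separately, copy that file too:

[hadoop@master opt]$ scp ~/.bashrc hadoop@slave1:~/
[hadoop@master opt]$ scp ~/.bashrc hadoop@slave2:~/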
9) Format HDFS:
[hadoop@master hadoop]$ hdfs namenode -format
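A quick way to confirm the format succeeded (my own check): the namenode directory from step 5 should now contain a current/ subdirectory.

[hadoop@master hadoop]$ ls ~/opt/var/hadoop/hdfs/namenode/current/
# expect a VERSION file plus an initial fsimage file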
10) Start the hadoop cluster. When startup finishes, use the jps command to list the daemon processes and verify that the installation succeeded. (See note ③ below.)
# While using the cluster I found I was prompted for master's password every time.
# After copying the public key to master itself as well, the "three" password prompts went away!
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub master
# Start the hadoop cluster
[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/opt/hadoop-2.7.3/logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to /home/hadoop/opt/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave1.out
slave2: starting datanode, logging to /home/hadoop/opt/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave2.out
master: starting datanode, logging to /home/hadoop/opt/hadoop-2.7.3/logs/hadoop-hadoop-datanode-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/opt/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/opt/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-master.out
slave1: starting nodemanager, logging to /home/hadoop/opt/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave1.out
slave2: starting nodemanager, logging to /home/hadoop/opt/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave2.out
master: starting nodemanager, logging to /home/hadoop/opt/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-master.out
# master node
[hadoop@master ~]$ jps
44469 DataNode
45256 Jps
44651 SecondaryNameNode
44811 ResourceManager
44939 NodeManager
44319 NameNode
# slave1 node
[hadoop@slave1 ~]$ jps
35973 NodeManager
35847 DataNode
36106 Jps
# slave2 node
[hadoop@slave2 ~]$ jps
36360 NodeManager
36234 DataNode
36493 Jps
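Two further checks beyond jps that I find useful (standard Hadoop 2.x ports, not covered in the book):

# NameNode web UI:        http://192.168.0.131:50070
# ResourceManager web UI: http://192.168.0.131:8088
[hadoop@master ~]$ hdfs dfsadmin -report   # should report 3 live datanodes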
Note ①: Each <property> element amounts to one option: <name> holds the option's name and <value> its content. The individual settings can be looked up in the official Hadoop 2.7.3 documentation at http://hadoop.apache.org/docs/r2.7.3/. The options I kept after reading through it are not necessarily all required, but after configuring for half a month I am too worn out to keep trial-and-erroring. If anyone with the same setup can slim this configuration down, I would be very grateful.
Note ②: This step is not in the book 《Hadoop大数据分析与挖掘实战》. I have not tested omitting it, but I believe it is necessary.
Note ③: The book starts the cluster in two steps, start-dfs.sh and start-yarn.sh, but after starting that way I could not complete the exercise on page 30 of the book. After some troubleshooting I found that start-all.sh works. Presumably 2.7.3 added some other startup items (?!); corrections from anyone more knowledgeable are welcome.
This took half a month. The content is for my own use; please do not repost.