Hadoop Installation and Configuration

1. System and software environment

1. Operating system

CentOS release 6.5 (Final)

Kernel version: 2.6.32-431.el6.x86_64

master.fansik.com 192.168.83.118

node1.fansik.com 192.168.83.119

node2.fansik.com 192.168.83.120

2. JDK version: 1.7.0_75

3. Hadoop version: 2.7.2

2. Pre-installation preparation

1. Disable the firewall and SELinux

# setenforce 0

# service iptables stop
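Both commands above only last until the next reboot. A minimal sketch of making the change persistent, assuming the stock CentOS 6 layout (SELinux settings in /etc/selinux/config, iptables managed through chkconfig); the helper name `disable_selinux_in` is hypothetical:

```shell
# Hypothetical helper: rewrite the SELINUX= line of an SELinux config file
# so the setting survives a reboot. The file path is passed in explicitly.
disable_selinux_in() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# On each machine, as root (paths assumed, not taken from the steps above):
#   disable_selinux_in /etc/selinux/config
#   chkconfig iptables off
```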

2. Configure the hosts file

192.168.83.118 master.fansik.com

192.168.83.119 node1.fansik.com

192.168.83.120 node2.fansik.com
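The three entries need to be present on every machine. A small sketch of adding them without creating duplicates when the step is repeated; `add_host_entry` is a hypothetical helper, and the target file is passed in so it can be tried on a copy first:

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is
# already listed there.
add_host_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    if ! grep -qwF -- "$name" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Usage on each machine, as root:
#   add_host_entry /etc/hosts 192.168.83.118 master.fansik.com
#   add_host_entry /etc/hosts 192.168.83.119 node1.fansik.com
#   add_host_entry /etc/hosts 192.168.83.120 node2.fansik.com
```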

3. Generate the SSH key

On master.fansik.com, run # ssh-keygen and press Enter at every prompt

# scp ~/.ssh/id_rsa.pub node1.fansik.com:/root/.ssh/authorized_keys

# scp ~/.ssh/id_rsa.pub node2.fansik.com:/root/.ssh/authorized_keys

# chmod 600 /root/.ssh/authorized_keys
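Note that copying id_rsa.pub directly onto authorized_keys replaces any keys already stored on the node. A sketch of an append-only variant, assuming the public key has first been copied to the node under some temporary name; `append_pubkey` is a hypothetical helper (the standard `ssh-copy-id` tool does a similar job in one step):

```shell
# Hypothetical helper: add a public key to an authorized_keys file only if
# it is not already present, then tighten the file permissions.
append_pubkey() {
    pub_file="$1"; auth_file="$2"
    key=$(cat "$pub_file")
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Usage on a node (the /tmp path is an assumption for illustration):
#   append_pubkey /tmp/id_rsa.pub /root/.ssh/authorized_keys
# Alternative, run from the master:
#   ssh-copy-id root@node1.fansik.com
```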

4. Install the JDK

# tar xf jdk-7u75-linux-x64.tar.gz

# mv jdk1.7.0_75 /usr/local/jdk1.7

# vim /etc/profile.d/java.sh (add the following content)

export JAVA_HOME=/usr/local/jdk1.7

export JRE_HOME=/usr/local/jdk1.7/jre

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin

# source /etc/profile
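Before moving on, it may be worth confirming that JAVA_HOME really points at a JDK. A minimal sketch; `check_java_home` is a hypothetical helper that only tests for an executable bin/java under the given directory:

```shell
# Hypothetical helper: succeed only if DIR/bin/java exists and is executable.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1/bin/java"
    else
        echo "missing: $1/bin/java" >&2
        return 1
    fi
}

# Usage after sourcing /etc/profile:
#   check_java_home "$JAVA_HOME" && java -version
```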

5. Synchronize the time (otherwise problems may appear later when analyzing files)

# ntpdate 202.120.2.101 (a Shanghai Jiao Tong University server)

3. Install Hadoop

Official Hadoop download site, where you can pick the version you need: http://hadoop.apache.org/releases.html

Run the following on each of the three machines:

# tar xf hadoop-2.7.2.tar.gz

# mv hadoop-2.7.2 /usr/local/hadoop

# cd /usr/local/hadoop/

# mkdir tmp dfs dfs/data dfs/name
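Plain mkdir fails if any of the directories already exists, which gets in the way when the step is repeated on a machine. A sketch of a rerunnable variant using mkdir -p; `prepare_hadoop_dirs` is a hypothetical helper:

```shell
# Hypothetical helper: create the tmp, dfs/data and dfs/name directories
# under the given Hadoop base directory; safe to run more than once.
prepare_hadoop_dirs() {
    base="$1"
    mkdir -p "$base/tmp" "$base/dfs/data" "$base/dfs/name"
}

# Usage on each machine:
#   prepare_hadoop_dirs /usr/local/hadoop
```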

4. Configure Hadoop

Configuration on master.fansik.com

# vim /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>

  <property>

    <name>fs.defaultFS</name>

    <value>hdfs://192.168.83.118:9000</value>

  </property>

  <property>

    <name>hadoop.tmp.dir</name>

    <value>file:/usr/local/hadoop/tmp</value>

  </property>

  <property>

    <name>io.file.buffer.size</name>

    <value>121702</value>

  </property>

</configuration>
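After editing, a value can be read back to catch typos. A rough sketch, assuming each `<name>` and `<value>` sits on its own line as in the file above; `get_property` is a hypothetical helper, not a Hadoop tool (once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` gives the effective value):

```shell
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml file.
get_property() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Usage:
#   get_property /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```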

# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>

  <property>

    <name>dfs.namenode.name.dir</name>

    <value>file:/usr/local/hadoop/dfs/name</value>

  </property>

  <property>

    <name>dfs.datanode.data.dir</name>

    <value>file:/usr/local/hadoop/dfs/data</value>

  </property>

  <property>

    <name>dfs.replication</name>

    <value>2</value>

  </property>

  <property>

    <name>dfs.namenode.secondary.http-address</name>

    <value>192.168.83.118:9001</value>

  </property>

  <property>

    <name>dfs.webhdfs.enabled</name>

    <value>true</value>

  </property>

</configuration>
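Address properties such as dfs.namenode.secondary.http-address expect a host:port value, and a dot typed in place of the colon is easy to miss. A small sketch of a format check; `is_host_port` is a hypothetical helper that only validates the shape of the string, not whether the port is reachable:

```shell
# Hypothetical helper: succeed if the argument looks like HOST:PORT with a
# non-empty host and a numeric port between 1 and 65535.
is_host_port() {
    case "$1" in
        *:*) ;;
        *) return 1 ;;
    esac
    host="${1%:*}"
    port="${1##*:}"
    [ -n "$host" ] || return 1
    case "$port" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# Usage:
#   is_host_port 192.168.83.118:9001 || echo "not a host:port value"
```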

# cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

# vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

<configuration>

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

  </property>

  <property>

    <name>mapreduce.jobhistory.address</name>

    <value>192.168.83.118:10020</value>

  </property>

  <property>

    <name>mapreduce.jobhistory.webapp.address</name>

    <value>192.168.83.118:19888</value>

  </property>

</configuration>

# vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

    <value>org.apache.hadoop.mapred.ShuffleHandler</value>

  </property>

  <property>

    <name>yarn.resourcemanager.address</name>

    <value>192.168.83.118:8032</value>

  </property>

  <property>

    <name>yarn.resourcemanager.scheduler.address</name>

    <value>192.168.83.118:8030</value>

  </property>

  <property>

    <name>yarn.resourcemanager.resource-tracker.address</name>

    <value>192.168.83.118:8031</value>

  </property>

  <property>

    <name>yarn.resourcemanager.admin.address</name>

    <value>192.168.83.118:8033</value>

  </property>

  <property>

    <name>yarn.resourcemanager.webapp.address</name>

    <value>192.168.83.118:8088</value>

  </property>

  <property>

    <name>yarn.nodemanager.resource.memory-mb</name>

    <value>2048</value>

  </property>

</configuration>

# vim /usr/local/hadoop/etc/hadoop/slaves

192.168.83.119

192.168.83.120

Sync the etc directory on master to node1 and node2

# rsync -av /usr/local/hadoop/etc/ node1.fansik.com:/usr/local/hadoop/etc/

# rsync -av /usr/local/hadoop/etc/ node2.fansik.com:/usr/local/hadoop/etc/
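With more nodes, the same rsync call can be wrapped in a loop. A sketch under the assumption that every node uses the same /usr/local/hadoop layout; `sync_hadoop_conf` and the `RSYNC` override variable are hypothetical names introduced here (setting RSYNC=echo gives a dry run that only prints the arguments):

```shell
# Hypothetical helper: push the local Hadoop etc directory to each node
# given as an argument, stopping at the first failure.
sync_hadoop_conf() {
    for node in "$@"; do
        ${RSYNC:-rsync} -av /usr/local/hadoop/etc/ "$node:/usr/local/hadoop/etc/" || return 1
    done
}

# Usage on the master:
#   sync_hadoop_conf node1.fansik.com node2.fansik.com
```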

The remaining steps only need to be run on master.fansik.com; the two nodes are started automatically

Configure the Hadoop environment variables

# vim /etc/profile.d/hadoop.sh

export PATH=/usr/local/hadoop/bin:/usr/local/hadoop/sbin:$PATH

# source /etc/profile

Format the NameNode (initialization)

# hdfs namenode -format

Check whether it reported an error (an exit status of 0 means success)

# echo $?
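The same exit-status check can be folded into the command itself, so a failed step is reported right away. A minimal sketch; `run_step` is a hypothetical wrapper, not part of Hadoop:

```shell
# Hypothetical wrapper: run a command, report success or failure, and pass
# the command's exit status back to the caller.
run_step() {
    desc="$1"; shift
    if "$@"; then
        echo "OK: $desc"
    else
        status=$?
        echo "FAILED ($status): $desc" >&2
        return "$status"
    fi
}

# Usage:
#   run_step "format NameNode" hdfs namenode -format
```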

Start the services

# start-all.sh

Stop the services

# stop-all.sh

Once the services are started, the web interfaces are available at:

http://192.168.83.118:8088

http://192.168.83.118:50070
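Besides the web pages, the JDK's jps tool lists the running Java processes, which gives a quick check that the daemons came up. A sketch that reports which expected daemons are absent from jps output; `missing_daemons` is a hypothetical helper, and the daemon lists in the usage lines are what this particular layout would normally be expected to run:

```shell
# Hypothetical helper: read jps output on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        if ! printf '%s\n' "$jps_output" | grep -qw -- "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Usage on the master:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# Usage on each node:
#   jps | missing_daemons DataNode NodeManager
```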

5. Test Hadoop

Run on master.fansik.com

# hdfs dfs -mkdir /fansik

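If the following warning appears while creating the directory, it can be ignored

16/07/29 17:38:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

To get rid of it:

Download the matching version from the site below: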

http://dl.bintray.com/sequenceiq/sequenceiq-bin/

# tar -xvf hadoop-native-64-2.7.0.tar -C /usr/local/hadoop/lib/native/

If you see: copyFromLocal: Cannot create directory /123/. Name node is in safe mode

Hadoop has entered safe mode. To leave it:

# hdfs dfsadmin -safemode leave

Copy myservicce.sh into the /fansik directory

# hdfs dfs -copyFromLocal ./myservicce.sh /fansik

Check that myservicce.sh is now in the /fansik directory

# hdfs dfs -ls /fansik

Analyze the file with wordcount

# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /fansik/myservicce.sh /zhangshan/

List the output files:

# hdfs dfs -ls /zhangshan/

Found 2 items

-rw-r--r--   2 root supergroup          0 2016-08-02 15:19 /zhangshan/_SUCCESS

-rw-r--r--   2 root supergroup        415 2016-08-02 15:19 /zhangshan/part-r-00000

View the result:

# hdfs dfs -cat /zhangshan/part-r-00000
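The wordcount output has one word and its count per line, separated by a tab, ordered by word. A small sketch for viewing the most frequent words first; `top_words` is a hypothetical helper that reads that output on stdin:

```shell
# Hypothetical helper: sort "word<TAB>count" lines by count, highest first,
# and keep the first N lines (default 10).
top_words() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}

# Usage:
#   hdfs dfs -cat /zhangshan/part-r-00000 | top_words 5
```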