Hadoop Cluster Setup

1 Cluster Planning

Three virtual machines
Operating system: CentOS 7 Minimal
Networking via bridged mode (NAT and host-only should also work)
The IP addresses are:
192.168.1.101
192.168.1.102
192.168.1.103

2 Basic Cluster Configuration

Modify the /etc/hosts file on all three machines and add the following:

192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2

Modify /etc/hostname on each machine so that its contents are master, slave1, and slave2 respectively.
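
As an aside (not in the original steps), CentOS 7 also provides hostnamectl, which updates /etc/hostname for you; run the matching command on each node:

hostnamectl set-hostname master    # on 192.168.1.101
hostnamectl set-hostname slave1    # on 192.168.1.102
hostnamectl set-hostname slave2    # on 192.168.1.103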

3 Configure Passwordless SSH Access

On slave1:

su
vim /etc/ssh/sshd_config      # set the following options in sshd_config:
StrictModes no
RSAAuthentication yes
PubkeyAuthentication yes
/bin/systemctl restart sshd.service
mkdir .ssh                    # as the login user (galaxy), in galaxy's home directory

On master:

ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub | ssh galaxy@slave1 'cat - >> ~/.ssh/authorized_keys'
ssh slave1
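
As an alternative to the cat-over-ssh line above, ssh-copy-id (shipped with openssh-clients on CentOS 7) appends the key for you and creates the remote ~/.ssh if it does not exist; a sketch, assuming the DSA key generated above:

ssh-copy-id -i ~/.ssh/id_dsa.pub galaxy@slave1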

Handle slave2 and master in the same way (master must also be able to SSH to itself without a password). If passwordless login still fails, check the permissions as sketched below.
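
One common pitfall, not covered above: with OpenSSH's default StrictModes yes, passwordless login is refused when the home directory, ~/.ssh, or authorized_keys is group- or world-writable (the sshd_config change above sets StrictModes no partly to sidestep this). If you keep the default, a minimal fix on each node is:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys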

References:
http://my.oschina.net/u/1169607/blog/175899
http://segmentfault.com/a/1190000002911599

4 Install Hadoop

Omitted. A sketch of what this step typically involves is given below.
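
For completeness, a minimal sketch of this step; the download URL and the install location under the galaxy user's home directory are assumptions based on the paths used later in this article:

cd /home/galaxy
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.1/hadoop-2.5.1.tar.gz
tar -xzf hadoop-2.5.1.tar.gz
cd hadoop-2.5.1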

5 Configure Hadoop

hadoop-env.sh

#export JAVA_HOME=$JAVA_HOME                  # wrong, it cannot be set this way
export JAVA_HOME=/usr/java/jdk1.8.0_45

core-site.xml

<configuration>
	<property>
		<!-- in Hadoop 2.x this property is also known by its newer name, fs.defaultFS -->
		<name>fs.default.name</name>
		<value>hdfs://master:9000</value>
	</property>
	<property>
		<!-- note: /tmp is typically cleared on reboot, so HDFS data kept here will not survive a restart -->
		<name>hadoop.tmp.dir</name>
		<value>/tmp</value>
	</property>
</configuration>

hdfs-site.xml

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
</configuration>

mapred-site.xml

<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>master:9001</value>
	</property>
</configuration>

masters

master

slaves

slave1
slave2

Copy the configuration to the other two machines:

scp etc/hadoop/* galaxy@slave1:/home/galaxy/hadoop-2.5.1/etc/hadoop/
scp etc/hadoop/* galaxy@slave2:/home/galaxy/hadoop-2.5.1/etc/hadoop/

6 Start the Hadoop Cluster

Format the NameNode

./bin/hadoop namenode -format
You should see: 15/11/09 19:25:59 INFO common.Storage: Storage directory /tmp/dfs/name has been successfully formatted.
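
In Hadoop 2.x the hadoop namenode entry point is deprecated in favor of the hdfs command; the equivalent invocation is:

./bin/hdfs namenode -format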

Start Hadoop

./sbin/start-dfs.sh
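
start-dfs.sh only launches the HDFS daemons (NameNode, SecondaryNameNode, DataNodes). The jps output below also shows a ResourceManager on master (though no NodeManagers on the slaves), which start-dfs.sh does not start; if you want the YARN daemons, start them with:

./sbin/start-yarn.sh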

Use jps to verify that all daemons are running properly:

[galaxy@master hadoop-2.5.1]$ jps
5924 ResourceManager
6918 SecondaryNameNode
7718 Jps
6743 NameNode

[galaxy@slave1 ~]$ jps
6402 Jps
6345 DataNode

[galaxy@slave2 ~]$ jps
25552 Jps
25495 DataNode

7 Check Cluster Status

Command line

./bin/hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Web UI
http://192.168.1.101:50070
Note: the CentOS 7 firewall needs to be stopped: systemctl stop firewalld (if the dfsadmin report above is all zeros, no DataNodes have registered with the NameNode yet, and the firewall is the likely cause).
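
Stopping firewalld only lasts until the next reboot; to keep it from coming back (a blunt but common choice for a test cluster; opening just the required Hadoop ports would be the more careful option), also disable it:

systemctl stop firewalld
systemctl disable firewalld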

8 Run a Test Program

Create local test files

mkdir input
vim input/f1
vim input/f2
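
The original does not show the contents of f1 and f2; any few lines of words will do. For illustration only, the following contents are consistent with the file sizes and word counts shown later in this article:

printf 'hello world\nbye\n' > input/f1
printf 'hello hadoop\nbye hadoop\n' > input/f2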

Create the HDFS directories

./bin/hadoop fs  -mkdir /tmp
./bin/hadoop fs  -mkdir /tmp/input
./bin/hadoop fs -ls /

Upload the test files

./bin/hadoop fs -put input/ /tmp
Note: the CentOS 7 firewall must be stopped on all nodes (systemctl stop firewalld), otherwise uploading files will fail with an error.
./bin/hadoop fs -ls /tmp/input
-rw-r--r--   2 galaxy supergroup         16 2015-11-11 04:30 /tmp/input/f1
-rw-r--r--   2 galaxy supergroup         24 2015-11-11 04:30 /tmp/input/f2

Run wordcount

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /tmp/input /output
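
Note that MapReduce refuses to write into an existing output directory; if you rerun the job, delete /output first:

./bin/hadoop fs -rm -r /output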

View the output

[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -ls /output
Found 2 items
-rw-r--r--   2 galaxy supergroup          0 2015-11-11 04:44 /output/_SUCCESS
-rw-r--r--   2 galaxy supergroup         31 2015-11-11 04:44 /output/part-r-00000
[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -cat /output/*
bye	2
hadoop	2
hello	2
world	1

Author: galaxy

Created: 2015-11-11 Wed 18:00
