Installing a Hadoop cluster with Docker? What's the point? Nothing, really; it's just for fun. This post walks you through building a Hadoop cluster on Docker.
Don't ask why I'm doing something this idle; you can probably guess the answer: no girlfriend.......
Enough of the unnecessary chatter. First, let's install Docker.
1. Installing Docker
sudo yum install -y docker-io
sudo wget https://get.docker.com/builds/Linux/x86_64/docker-latest -O /usr/bin/docker
Now start Docker:
sudo service docker start
and make it start on boot as well:
sudo chkconfig docker on
2. Pulling an image
We want CentOS 6.5. Don't ask why 6.5; the host machine happens to be on 6.5...
sudo docker pull insaneworks/centos
Now go have lunch and a pot of tea... either way, just wait patiently...
.......
OK, lunch is over. Let's create a container:
sudo docker run -it insaneworks/centos /bin/bash
Ctrl+p followed by Ctrl+q detaches from the container back to the host.
sudo docker ps lists the running containers.
OK, suppose we don't want this container anymore. What then? Stop it:
sudo docker stop b152861ef001
and then remove it:
sudo docker rm b152861ef001
3. Building a Hadoop image
This is the most tedious part, but we can break it down step by step. Young man, once you have this down, you will never have to worry about installing
Hadoop again. Let's go!
sudo docker run -it -h master --name master insaneworks/centos /bin/bash
Inside the container, the first thing to do is install gcc:
yum install -y gcc
Install vim:
yum install -y vim
Install lrzsz:
yum install -y lrzsz
Install ssh:
yum -y install openssh-server
yum -y install openssh-clients
Note that a few sshd settings need changing: vim /etc/ssh/sshd_config
Uncomment PermitEmptyPasswords no
Change UsePAM to no
Uncomment PermitRootLogin yes
Then start sshd:
service sshd start
Next, set up passwordless ssh:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
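The two commands above can be bundled into one small script. The sketch below runs against a scratch directory so it is safe to try anywhere; the KEYDIR variable and final echo are my own additions (point KEYDIR at $HOME/.ssh for the real thing inside the container):

```shell
# Passwordless-ssh setup sketch.  KEYDIR is a scratch directory here;
# use $HOME/.ssh inside the container.
set -e
KEYDIR="$(mktemp -d)/.ssh"
mkdir -p "$KEYDIR" && chmod 700 "$KEYDIR"
ssh-keygen -t rsa -P '' -f "$KEYDIR/id_rsa" -q        # empty passphrase
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"                   # sshd rejects loose perms
echo "key authorized in $KEYDIR"
```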
With that done, try ssh-ing to ourselves:
ssh master
Works nicely.
Next, let's get Java installed.
Upload the Java rpm into the container with rz. Time to go have another pot of tea...
rpm -ivh jdk-7u75-linux-x64.rpm
Set the environment variables by appending to /etc/profile:
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source /etc/profile
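In script form this step looks like the sketch below. PROFILE points at a scratch file so it can be tried outside the container; in the container it would be /etc/profile:

```shell
# Append the Java environment and verify it takes effect when sourced.
PROFILE="$(mktemp)"
cat >> "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
EOF
. "$PROFILE"
echo "JAVA_HOME=$JAVA_HOME"    # prints JAVA_HOME=/usr/java/jdk1.7.0_75
```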
Now we get to install Hadoop. Excited? First, install tar:
yum install -y tar
Likewise, use rz to upload the Hadoop tarball into the container. A bit of foreshadowing: a truly nasty surprise is waiting further down; you'll see.
Hmm... what a long wait..........
Done at last. Unpack it:
tar zxvf hadoop-2.6.0.tar.gz
Perfect!
Configure the environment variables:
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
Then there is one more thing to do. It looks optional, but I have tried this n times: skip it and Hadoop simply will not start.
Edit hadoop-env.sh and yarn-env.sh and add the following at the very top (do not leave it out):
export JAVA_HOME=/usr/java/jdk1.7.0_75
Both files live under etc/hadoop in the unpacked Hadoop directory.
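If you'd rather not open vim twice, a sed one-liner per file also works. A sketch against scratch stand-ins for the two files (in the container, point HADOOP_CONF at /home/hadoop/hadoop-2.6.0/etc/hadoop instead):

```shell
# Prepend the JAVA_HOME export to hadoop-env.sh and yarn-env.sh.
HADOOP_CONF="$(mktemp -d)"
printf '# stock contents\n' > "$HADOOP_CONF/hadoop-env.sh"   # stand-ins for the real files
printf '# stock contents\n' > "$HADOOP_CONF/yarn-env.sh"
for f in hadoop-env.sh yarn-env.sh; do
  sed -i '1i export JAVA_HOME=/usr/java/jdk1.7.0_75' "$HADOOP_CONF/$f"
done
head -n1 "$HADOOP_CONF/yarn-env.sh"
```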
Now for the configuration files.
Edit core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop-2.6.0/tmp</value>
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.6.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.6.0/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Edit mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
</configuration>
Add the slave hostnames to the slaves file:
slave1
slave2
slave3
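The slaves file can be written in one shot with a heredoc. A sketch against a scratch path (the real file is etc/hadoop/slaves):

```shell
# Write the slaves file in one go.
SLAVES_FILE="$(mktemp)"
cat > "$SLAVES_FILE" <<'EOF'
slave1
slave2
slave3
EOF
grep -c '^slave' "$SLAVES_FILE"    # prints 3
```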
Everything seems done, right? Not so fast, young man. Try this and prepare to be scared:
ldd /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0
and you'll see:
/home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0)
linux-vdso.so.1 => (0x00007fff24dbc000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ff8c6371000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff8c5fdc000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff8c679b000)
Life is so heartless, life is so cold. A while back someone asked me about this very problem... I ignored him. Now let me kill it with my own hands!
By the way, you probably see now why I installed gcc right at the start.
yum install -y wget
wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz
tar zxvf glibc-2.14.tar.gz
cd glibc-2.14
mkdir build
cd build
../configure --prefix=/usr/local/glibc-2.14
make
make install
ln -sf /usr/local/glibc-2.14/lib/libc-2.14.so /lib64/libc.so.6
Now ldd /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0
runs without any complaint:
linux-vdso.so.1 => (0x00007fff72b7c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb996ce9000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb99695c000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb997113000)
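To spare yourself the scare next time, here is a quick pre-flight check, my own addition rather than part of the original build: compare the container's glibc against the 2.14 that the native library needs before touching Hadoop at all.

```shell
# Check whether glibc is new enough for libhadoop.so (needs GLIBC_2.14).
need=2.14
have=$(ldd --version | head -n1 | grep -oE '[0-9]+\.[0-9]+$')
if [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]; then
  echo "glibc $have: ok"
else
  echo "glibc $have: too old, rebuild glibc as shown above"
fi
```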
With that, the image is ready to commit:
docker commit master songfy/hadoop
docker images shows our images:
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
songfy/hadoop latest 311318c0a407 42 seconds ago 1.781 GB
insaneworks/centos latest 9d29fe7b2e52 9 days ago 121.1 MB
4. Starting the Hadoop cluster
docker rm master
sudo docker run -it -p 50070:50070 -p 19888:19888 -p 8088:8088 -h master --name master songfy/hadoop /bin/bash
sudo docker run -it -h slave1 --name slave1 songfy/hadoop /bin/bash
sudo docker run -it -h slave2 --name slave2 songfy/hadoop /bin/bash
sudo docker run -it -h slave3 --name slave3 songfy/hadoop /bin/bash
Attach to each node and run:
source /etc/profile
service sshd start
Next, every machine needs host entries for the others.
docker inspect --format='{{.NetworkSettings.IPAddress}}' master
shows a container's IP. Collecting them all gives:
172.17.0.4 master
172.17.0.5 slave1
172.17.0.6 slave2
172.17.0.7 slave3
Use scp to push this hosts file out to every node.
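Gathering the IPs by hand gets old. The loop below sketches how the fragment could be generated; since docker is not available outside the Docker host, the IPs from above are plugged in as sample data, with the real docker inspect call shown in a comment:

```shell
# Build the hosts fragment for all nodes.  On the Docker host you would
# fetch each IP with:
#   ip=$(docker inspect --format='{{.NetworkSettings.IPAddress}}' "$node")
HOSTS_FRAGMENT="$(mktemp)"
for entry in "172.17.0.4 master" "172.17.0.5 slave1" \
             "172.17.0.6 slave2" "172.17.0.7 slave3"; do
  printf '%s\n' "$entry" >> "$HOSTS_FRAGMENT"
done
cat "$HOSTS_FRAGMENT"
# then e.g.: for n in slave1 slave2 slave3; do scp "$HOSTS_FRAGMENT" $n:/etc/hosts; done
```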
At last, it is time to start Hadoop.
hadoop namenode -format
/home/hadoop/hadoop-2.6.0/sbin/start-dfs.sh
/home/hadoop/hadoop-2.6.0/sbin/start-yarn.sh
Check with jps: everything is up.
Let's try some simple HDFS operations:
hadoop fs -mkdir /input
hadoop fs -ls /
drwxr-xr-x - root supergroup 0 2015-08-09 09:09 /input
Now let's run the famous wordcount example:
hadoop fs -put /home/hadoop/hadoop-2.6.0/etc/hadoop/* /input/
hadoop jar /home/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input/ /output/wordcount/
Don't assume it succeeds on the first try. In fact the job never actually ran; checking the logs showed:
2015-08-09 09:23:23,481 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : slave1:41978 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
So the node does not have enough memory; let's give it 2 GB.
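Concretely, that means going back to yarn-site.xml and bumping yarn.nodemanager.resource.memory-mb from the 1024 set earlier to 2048, so the <memory:2048> request in the log fits. This is one way to do it; you could instead shrink the per-container request:

```xml
<!-- yarn-site.xml: give each NodeManager 2 GB instead of 1 GB -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
```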
It turns out the mighty Hadoop runs unbelievably slowly like this... The lesson: with plenty of real machines you will be fast; with Docker on a single box, forget about speed.
Of course, I have also tried a multi-host deployment; lacking that many physical machines, I ran the Docker Hadoop cluster across several VMware virtual machines.
Hadoop-in-the-cloud on virtual machines, so to speak... in practice it was good for almost nothing, except bringing one of the host VMs to its knees during the word count.
Anyway, the results came out. Let's take a look:
policy 3
port 5
ports 2
ports. 2
potential 2
preferred 3
prefix. 1
present, 1
principal 4
principal. 1
printed 1
priorities. 1
priority 1
privileged 2
privileges 1
privileges. 1
properties 6
property 11
protocol 6
protocol, 2
Not bad... fun, right?....... Next time we'll pick another interesting topic; maybe Hive or Storm... then again, I am not reliable.
It might well turn into an algorithm topic like LDA or word2vec, or even CUDA heterogeneous computing. Your blogger is a lunatic; who knows.