The cluster requires at least 3 nodes, i.e. at least 3 servers: 1 master and 2 slaves. The nodes are connected over a LAN and can ping each other. The table below lists the configuration of the 3 servers:
Hostname | IP | User | Password | Role
-------- | ------------- | ------ | -------- | --------
master | 192.168.1.101 | hadoop | 123456 | namenode
slave1 | 192.168.1.105 | hadoop | 123456 | datanode
slave2 | 192.168.1.106 | hadoop | 123456 | datanode
For ease of maintenance, it is best to use the same user name, the same password, and the same hadoop, hbase, and zookeeper directory layout on every node in the cluster.
All of the above software can be downloaded from http://www.apache.org/dyn/closer.cgi.
This step is simple: log in to all three nodes as root and create the hadoop account.
$ useradd hadoop
$ passwd hadoop    # set the password to 123456
Add the hosts mappings on each of the three nodes:
$ vim /etc/hosts
Add the following content:
192.168.1.101 master
192.168.1.105 slave1
192.168.1.106 slave2
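If you want to confirm the mappings took effect, a quick optional connectivity check from any node could look like this (the hostnames are the ones defined above):

$ ping -c 1 master && ping -c 1 slave1 && ping -c 1 slave2    # each name should resolve and answer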
CentOS ships with ssh installed by default; if it is missing, install ssh first.
The cluster depends on passwordless SSH: every node must be able to log in to itself without a password, and the master and the slaves must be able to log in to each other in both directions without a password. Passwordless login between the slaves themselves is not required.
There are three main steps:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/*
Test it. The first login may ask you to confirm with yes; after that you can log in directly:
$ ssh localhost
Last login: Sat Jul 18 22:57:44 2015 from localhost
Set up the same passwordless self-login on slave1 and slave2, following the steps above. Then, on master, append the master public key to each slave's authorized_keys:
$ cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'cat - >> ~/.ssh/authorized_keys'
$ cat ~/.ssh/id_rsa.pub | ssh hadoop@slave2 'cat - >> ~/.ssh/authorized_keys'
Test:
[hadoop@master ~]$ ssh hadoop@slave1
Last login: Sat Jul 18 23:25:41 2015 from master
[hadoop@master ~]$ ssh hadoop@slave2
Last login: Sat Jul 18 23:25:14 2015 from master
Run the following on slave1 and slave2 respectively, so that the slaves can also log in to master without a password:
$ cat ~/.ssh/id_rsa.pub | ssh hadoop@master 'cat - >> ~/.ssh/authorized_keys'
This is a prerequisite for installing the three packages. I recommend installing via rpm, which is very simple. I wrote a dedicated post on this, 《Centos6.6 64位安装配置JDK 8教程》; although I am now using CentOS 7, that tutorial is not affected by the system version.
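As a rough sketch of the rpm route (the exact file name depends on the JDK build you download, so jdk-8u101-linux-x64.rpm below is only a placeholder):

$ rpm -ivh jdk-8u101-linux-x64.rpm    # the Oracle JDK rpm installs under /usr/java/ and creates the /usr/java/default link
$ java -version                       # verify the installation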
On the master node, unpack the hadoop, hbase, and zookeeper archives into /home/hadoop (the hadoop account's home directory) and rename the directories to hadoop, hbase, and zookeeper. Hadoop's configuration files all live under ~/hadoop/etc/hadoop/; next, configure Hadoop.
Edit core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
  </property>
</configuration>
Edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
Edit mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
In hadoop-env.sh, change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/usr/java/default.
In yarn-env.sh, likewise set JAVA_HOME to /usr/java/default.
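If you would rather not edit the files by hand, a sed one-liner per file should achieve the same change (a sketch, assuming the export JAVA_HOME line is present and not commented out):

$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/default|' ~/hadoop/etc/hadoop/hadoop-env.sh
$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/default|' ~/hadoop/etc/hadoop/yarn-env.sh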
Edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>768</value>
  </property>
</configuration>
Edit the masters file (vi masters) and add the following content:
master
Edit the slaves file (vi slaves) and add the following content:
slave1
slave2
Use the scp command to copy the files from local to remote (or remote to local):
scp -r /home/hadoop/hadoop slave1:/home/hadoop
scp -r /home/hadoop/hadoop slave2:/home/hadoop
Go into the ~/hadoop directory on master and run:
$ bin/hadoop namenode -format
This formats the namenode. It is done once, before the services are started for the first time, and does not need to be run again later. (In Hadoop 2, bin/hdfs namenode -format is the preferred form; bin/hadoop namenode -format still works but prints a deprecation warning.)
On the master node, run the following command to start HDFS:
[hadoop@master ~]$ ~/hadoop/sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
[hadoop@master ~]$
On the master node, run the following command and check that the processes below are present:
[hadoop@master ~]$ jps
2598 SecondaryNameNode
2714 Jps
2395 NameNode
[hadoop@master ~]$
On the two slaves, run the same command and check that the processes below are present:
[hadoop@slave1 ~]$ jps
2394 Jps
2317 DataNode
[hadoop@slave1 ~]$
#############
[hadoop@slave2 ~]$ jps
2396 Jps
2319 DataNode
[hadoop@slave2 ~]$
On the master node, run the following command to start YARN:
[hadoop@master ~]$ ~/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master ~]$ jps
Run jps; if a ResourceManager process appears, YARN has started successfully.
Finally, verify that the cluster can run computations by executing one of the examples that ship with Hadoop:
~/hadoop/bin/hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar randomwriter out
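Another quick smoke test using the same examples jar is the pi estimator; if YARN is working, the job finishes and prints an estimate of Pi:

$ ~/hadoop/bin/hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10    # 2 map tasks, 10 samples each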
Open the following URLs in a browser:
http://192.168.1.101:8088/
http://192.168.1.101:50070/
Check that both pages open; 8088 is the YARN ResourceManager web UI and 50070 is the HDFS NameNode web UI, where you can browse the cluster information.
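If you prefer the command line, a quick check of HDFS (run on master) can be:

$ ~/hadoop/bin/hdfs dfsadmin -report    # shows capacity and should list both datanodes as live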
Unpack the zookeeper archive, rename the directory to zookeeper, and then do the following.
Go into the ~/zookeeper/conf directory and copy zoo_sample.cfg to zoo.cfg:
$ cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg so that it contains the following:
dataDir=/home/hadoop/zookeeper/data
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Create a myid file under the dataDir directory containing a single number (1 for master, 2 for slave1, 3 for slave2). For example, on the master host:
$ mkdir /home/hadoop/zookeeper/data
$ echo "1" > /home/hadoop/zookeeper/data/myid
Likewise, you can use scp to copy the directory to the other nodes remotely; just change the number in each node's myid file, as sketched below.
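For example, on master (the numbering follows the server.N entries in zoo.cfg above; this is only a convenience sketch):

$ scp -r /home/hadoop/zookeeper slave1:/home/hadoop
$ scp -r /home/hadoop/zookeeper slave2:/home/hadoop
$ ssh hadoop@slave1 'echo "2" > /home/hadoop/zookeeper/data/myid'    # slave1 -> 2
$ ssh hadoop@slave2 'echo "3" > /home/hadoop/zookeeper/data/myid'    # slave2 -> 3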
On every node of the ZooKeeper cluster, run the script that starts the ZooKeeper service:
$ ~/zookeeper/bin/zkServer.sh start
To emphasize: ZooKeeper must be started on all three machines; starting it only on master is not enough.
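Once all three are up, you can confirm the ensemble has formed with the status subcommand; one node should report itself as leader and the other two as follower:

$ ~/zookeeper/bin/zkServer.sh status    # run on each node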
Unpack the hbase archive, rename the directory to hbase, and then configure it as follows.
In hbase-env.sh, set:

export JAVA_HOME=/usr/java/default
export HBASE_CLASSPATH=/home/hadoop/hadoop/etc/hadoop/
export HBASE_MANAGES_ZK=false
Edit hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000000</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
Add the list of slaves to the regionservers file:
slave1
slave2
Copy the entire hbase installation directory to all the slave servers:
$ scp -r /home/hadoop/hbase slave1:/home/hadoop
$ scp -r /home/hadoop/hbase slave2:/home/hadoop
On the master node, run the following command:
[hadoop@master ~]$ ~/hbase/bin/start-hbase.sh
starting master, logging to /home/hadoop/hbase/bin/../logs/hbase-hadoop-master-master.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
slave1: starting regionserver, logging to /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave1.out
slave2: starting regionserver, logging to /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave2.out
slave1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
slave2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
[hadoop@master ~]$
On the master node, check the processes with jps:
[hadoop@master ~]$ jps
3586 Jps
2408 NameNode
2808 ResourceManager
3387 HMaster
2607 SecondaryNameNode
3231 QuorumPeerMain
[hadoop@master ~]$
On a slave node, check the processes with jps:
[hadoop@slave1 ~]$ jps
2736 HRegionServer
2952 Jps
2313 DataNode
2621 QuorumPeerMain
[hadoop@slave1 ~]$
If you see HMaster and HRegionServer, HBase has started successfully.
8.6. Using the shell to operate HBase
[hadoop@master ~]$ ~/hbase/bin/hbase shell
2016-10-27 11:52:47,394 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.3, rbd63744624a26dc3350137b564fe746df7a721a4, Mon Aug 29 15:13:42 PDT 2016

hbase(main):001:0> list
TABLE
member
1 row(s) in 0.4470 seconds

=> ["member"]
hbase(main):002:0> version
1.2.3, rbd63744624a26dc3350137b564fe746df7a721a4, Mon Aug 29 15:13:42 PDT 2016

hbase(main):004:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.5000 average load

hbase(main):005:0> exit
If these commands run without problems, HBase has started successfully. The member table in the output is one I created later; you can create one yourself with the create command.
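As a sketch of how such a table could be created (member and the info column family are just example names), the shell commands can also be fed in non-interactively:

$ ~/hbase/bin/hbase shell <<'EOF'
create 'member', 'info'                     # table with one column family
put 'member', 'row1', 'info:name', 'test'   # insert a cell
scan 'member'                               # read it back
EOF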
If you start Hadoop with the old start-all.sh script, it prints the following warning:

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Hadoop 2 manages HDFS and YARN separately. Managing them separately makes it easier to run HDFS with HA or Federation and to scale HDFS out linearly, which keeps the HDFS cluster highly available. Seen from another angle, HDFS can then serve as a general-purpose distributed storage system that is convenient for third-party distributed computing frameworks, whether YARN-style frameworks or others such as Spark.
YARN is essentially MapReduce V2: it splits the JobTracker of Hadoop 1.x into two parts, one responsible for resource management (the ResourceManager) and the other for task scheduling (the Scheduler).
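You can see this split directly from the command line; the following commands, run on master, query the ResourceManager rather than any per-job daemon:

$ ~/hadoop/bin/yarn node -list            # NodeManagers registered with the ResourceManager
$ ~/hadoop/bin/yarn application -list     # applications currently known to the ResourceManager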
https://www.iwwenbo.com/hadoop-hbase-zookeeper/ (Special thanks to this author; I completed my installation by following his tutorial. However, the hadoop, zookeeper, and hbase versions he used are quite old, so in this write-up I added the YARN configuration and my own understanding.)
http://blog.csdn.net/shirdrn/article/details/9731423
http://blog.csdn.net/renfengjun/article/details/25320043
http://blog.csdn.net/young_kim1/article/details/50324345