Plan: three machines.
Edit the hosts file:

```bash
vi /etc/hosts

#127.0.0.1 localhost.localdomain localhost
#::1 localhost6.localdomain6 localhost6
192.168.79.135 master
192.168.79.131 slave1
192.168.79.132 slave2
```

To be safe, I commented out 127.0.0.1 as well; normally it is enough to comment out only the IPv6 `::1` entry.
You can configure each machine individually, or use scp to copy the file between servers. At this point, however, passwordless login has not been set up yet (SSH passwordless login is described below), so you will be prompted for a password when copying.
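For example, assuming root access on the slaves (hostnames per the plan above), the hosts file can be pushed out like this; expect a password prompt for each machine at this stage:

```bash
# distribute /etc/hosts from master to both slaves
scp /etc/hosts root@slave1:/etc/hosts
scp /etc/hosts root@slave2:/etc/hosts
```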
```bash
vi /etc/sysconfig/network

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master
NTPSERVERARGS=iburst
```
Set HOSTNAME to the name planned above. All three machines must be changed, and of course the names must not repeat.
Run `ssh-keygen` on every machine. Then, on the master node, run:

```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
Once that finishes, distribute the authorized_keys file to the ~/.ssh/ directory on every machine.
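A sketch of the distribution and a quick check (assuming root's home directory, and that ~/.ssh already exists on the slaves from the ssh-keygen step):

```bash
# push the aggregated key file to both slaves
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/

# verify passwordless login now works (no password prompt expected)
ssh slave1 hostname
ssh slave2 hostname
```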
First check what is already installed:

```bash
rpm -qa | grep java
```

which shows:

```
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
```

Uninstall both:

```bash
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
```
Download the JDK tarball for the appropriate version and extract it to a directory of your choice (I used /usr/java/). After extracting, configure the JDK environment variables in /etc/profile:

```bash
export JAVA_HOME=/usr/java/jdk1.7.0_67
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
```

When done, run `source /etc/profile` to make the environment variables take effect.
Next, verify that the JDK is installed correctly:

```
[root@master ~]# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
[root@master ~]# javac -version
javac 1.7.0_67
[root@master ~]# $JAVA_HOME
-bash: /usr/java/jdk1.7.0_67: is a directory
```

If the version and JDK path are displayed as above, the installation succeeded. You can then copy the Java installation files and /etc/profile to the other nodes and run `source /etc/profile` on each.
(2)、core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The uri's
    scheme determines the config property (fs.SCHEME.impl) naming the
    FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- i/o properties -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>The size of buffer for use in sequence files. The size of
    this buffer should probably be a multiple of hardware page size (4096
    on Intel x86), and it determines how much data is buffered during read
    and write operations.</description>
  </property>
</configuration>
```
(3)、hdfs-site.xml
```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/hadoop-2.5.2/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
    should store the name table(fsimage). If this is a comma-delimited list
    of directories then the name table is replicated in all of the
    directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/hadoop-2.5.2/hdfs/data</value>
    <description>Determines where on the local filesystem an DFS data node
    should store its blocks. If this is a comma-delimited list of
    directories, then data will be stored in all named directories,
    typically on different devices. Directories that do not exist are
    ignored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of
    replications can be specified when the file is created. The default is
    used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <description>The default block size for new files, in bytes. You can
    use the following suffix (case insensitive): k(kilo), m(mega), g(giga),
    t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g,
    etc.), Or provide complete size in bytes (such as 134217728 for 128
    MB).</description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
    <description>The number of server threads for the namenode.</description>
  </property>
</configuration>
```
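The local directories referenced in core-site.xml and hdfs-site.xml should exist (or at least be creatable by the user running Hadoop) on the relevant nodes; a minimal preparation sketch, with paths taken from the values above:

```bash
mkdir -p /data/hadoop/tmp                       # hadoop.tmp.dir base (all nodes)
mkdir -p /data/hadoop/hadoop-2.5.2/hdfs/name    # NameNode metadata (master)
mkdir -p /data/hadoop/hadoop-2.5.2/hdfs/data    # DataNode blocks (slaves)
```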
(4)、mapred-site.xml
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs. Can be
    one of local, classic or yarn.</description>
  </property>
  <!-- jobhistory properties -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
  </property>
</configuration>
```
(5)、yarn-site.xml
```xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
    <description>The address of the applications manager interface in the
    RM.</description>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The minimum allocation for every container request at the
    RM, in MBs. Memory requests lower than this won't take effect, and the
    specified value will get allocated at minimum. default is 1024</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <description>The maximum allocation for every container request at the
    RM, in MBs. Memory requests higher than this won't take effect, and
    will get capped to this value. default value is 8192</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers. default value is 8192</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <description>Whether to enable log aggregation. Log aggregation
    collects each container's logs and moves these logs onto a file-system,
    for e.g. HDFS, after the application completes. Users can configure the
    "yarn.nodemanager.remote-app-log-dir" and
    "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
    where these logs are moved to. Users can access the logs via the
    Application Timeline Server.</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```
(6)、slaves

```
slave1
slave2
```
Many of the properties in the configuration files above are Hadoop default values, included here only to make things explicit. When configuring, any value identical to the defaults documentation can be omitted.
At this point the Hadoop configuration files are complete. Next, configure the Hadoop environment variables (the Java environment variables from earlier are included as well), as follows:
/etc/profile
```bash
#set java_env
export JAVA_HOME=/usr/java/jdk1.7.0_67
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

###set hadoop_env
export HADOOP_HOME=/data/hadoop/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
```
As before, run `source /etc/profile` to apply the changes.
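A quick sanity check that the variables resolve and the Hadoop binaries are on the PATH (expected output shown as comments):

```bash
which hadoop     # expected: /data/hadoop/hadoop-2.5.2/bin/hadoop
hadoop version   # should report Hadoop 2.5.2
```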
If you need the NameNode RPC server to listen on all interfaces, the dfs.namenode.rpc-bind-host property can be set (shown here with its default empty value; see its description):

```xml
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value></value>
  <description>The actual address the RPC server will bind to. If this
  optional address is set, it overrides only the hostname portion of
  dfs.namenode.rpc-address. It can also be specified per name node or name
  service for HA/Federation. This is useful for making the name node
  listen on all interfaces by setting it to 0.0.0.0.</description>
</property>
```
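The start-up commands between the configuration and the jps output below are not shown in the original; the standard Hadoop 2.x sequence, run on master, would be:

```bash
hdfs namenode -format   # format HDFS once, before the very first start
start-dfs.sh            # start NameNode, SecondaryNameNode and the DataNodes
```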
```
[root@master hadoop]# jps
2630 Jps
1955 SecondaryNameNode
1785 NameNode

[root@slave1 ~]# jps
1942 Jps
1596 DataNode
```
If jps on master shows the two daemons above and jps on the slaves shows a DataNode, DFS has started successfully (you should still check the logs on the slave nodes; any errors there still need investigating).
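YARN is started next (the command itself is not shown in the original; the standard one for this layout would be):

```bash
start-yarn.sh   # starts the ResourceManager on master and a NodeManager on each slave
```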
```
[root@master hadoop]# jps
2630 Jps
1955 SecondaryNameNode
1785 NameNode
2316 ResourceManager

[root@slave1 ~]# jps
1942 Jps
1596 DataNode
1774 NodeManager
```
After a successful start, the master gains a ResourceManager and each slave gains a NodeManager: YARN is up.
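The cluster state can also be checked from the command line with the standard Hadoop 2.x client commands:

```bash
hdfs dfsadmin -report   # lists live DataNodes and their capacity
yarn node -list         # lists registered NodeManagers (should show slave1 and slave2)
```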
The following command starts the MapReduce JobHistory Server (if you do not need to view job history, there is no need to start it); jps will then show an extra JobHistoryServer process:

```bash
./mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
```
| Daemon | Web Interface | Notes |
| --- | --- | --- |
| NameNode | http://nn_host:port/ | Default HTTP port is 50070. |
| ResourceManager | http://rm_host:port/ | Default HTTP port is 8088. |
| MapReduce JobHistory Server | http://jhs_host:port/ | Default HTTP port is 19888. |
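With the hostnames used in this cluster, the UIs can be probed quickly from any node (a sketch; curl is assumed to be installed, and each command should print an HTTP 200 status line):

```bash
curl -sI http://master:50070/ | head -1   # NameNode UI
curl -sI http://master:8088/  | head -1   # ResourceManager UI
curl -sI http://master:19888/ | head -1   # JobHistory Server UI (if started)
```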
An error like the following may appear in a slave's NodeManager log:

```
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException:
Call From slave1/192.168.79.131 to 0.0.0.0:8031 failed on connection exception:
java.net.ConnectException: Connection refused;
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
```

The slave is trying 0.0.0.0:8031, the default resource-tracker address, which suggests it never picked up yarn.resourcemanager.hostname=master; this typically means the yarn-site.xml above was not (or not correctly) copied to that slave.
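A short diagnostic sketch for this situation (run on the affected slave; nc may need to be installed):

```bash
# confirm the RM hostname is actually configured on this slave
grep -B1 -A1 'yarn.resourcemanager.hostname' $HADOOP_CONF_DIR/yarn-site.xml

# confirm the resource-tracker port on master is reachable from the slave
nc -vz master 8031
```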