Hadoop Cluster Setup

1. Prepare the Environment

1. Download Hadoop 2.9.2

https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz

2. Download Java 8

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

3. Extract the archives and set JAVA_HOME and PATH

export JAVA_HOME=/home/*/hadoop/jdk1.8.0_191
export J2SDKDIR=${JAVA_HOME}
export J2REDIR=${JAVA_HOME}/jre
export DERBY_HOME=${JAVA_HOME}/db

export PATH=${JAVA_HOME}/bin:${JAVA_HOME}/jre/bin:${JAVA_HOME}/db/bin:$PATH
export MANPATH=${JAVA_HOME}/man:$MANPATH

4. Set HADOOP_HOME and edit etc/hadoop/hadoop-env.sh

export HADOOP_HOME=/home/*/hadoop/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin

vim etc/hadoop/hadoop-env.sh
Change the JAVA_HOME line to:
export JAVA_HOME=/home/*/hadoop/jdk1.8.0_191
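The edit above can also be scripted with sed instead of vim. A sketch; the file and JDK paths are the examples from this guide, and a stand-in hadoop-env.sh is created here only so the snippet is self-contained (a real install already ships one):

```shell
# Script the hadoop-env.sh edit: point JAVA_HOME at an absolute JDK path.
HADOOP_ENV=etc/hadoop/hadoop-env.sh
JDK=/home/user/hadoop/jdk1.8.0_191   # example path, adjust to your install

# Stand-in file for demonstration only:
mkdir -p etc/hadoop
echo 'export JAVA_HOME=${JAVA_HOME}' > "$HADOOP_ENV"

# Replace whatever JAVA_HOME line is there with the absolute JDK path:
sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$JDK|" "$HADOOP_ENV"
grep '^export JAVA_HOME=' "$HADOOP_ENV"
```

Hadoop's start scripts do not inherit JAVA_HOME from your login shell on remote nodes, which is why the value must be hard-coded in hadoop-env.sh rather than left as `${JAVA_HOME}`.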

2. Configure the Hadoop Cluster

1. Configure etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/*/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hostname:8800</value>
    </property>
</configuration>
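fs.defaultFS is the NameNode RPC endpoint that every client and DataNode will use, so it must be a hostname reachable from all nodes (not localhost). As a small illustration of the URI's structure, using the placeholder host and port from above:

```shell
# fs.defaultFS is a URI: hdfs://<namenode-host>:<rpc-port>
# ("hostname" and 8800 are the placeholders used in core-site.xml above).
FS_DEFAULT="hdfs://hostname:8800"
hostport=${FS_DEFAULT#hdfs://}   # strip the scheme -> hostname:8800
NN_HOST=${hostport%:*}           # NameNode host
NN_PORT=${hostport##*:}          # NameNode RPC port
echo "NameNode RPC endpoint: $NN_HOST:$NN_PORT"
```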

2. Configure etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/*/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/*/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hostname:8801</value>
        <description>Web UI address of the SecondaryNameNode.</description>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
        <description>Enable WebHDFS access to HDFS over HTTP.</description>
    </property>
</configuration>
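One thing to watch: dfs.replication is set to 3 here, but the slaves file later in this guide lists only two hosts. HDFS cannot place three replicas on two DataNodes, so every block stays under-replicated (the dfsadmin report at the end of this guide indeed shows 569 under-replicated blocks). A quick sanity-check sketch, with both counts as placeholders:

```shell
REPLICATION=3   # value of dfs.replication above
DATANODES=2     # number of hosts in etc/hadoop/slaves

if [ "$REPLICATION" -gt "$DATANODES" ]; then
  RISK=yes
  echo "warning: replication $REPLICATION exceeds $DATANODES datanodes; blocks will be under-replicated"
else
  RISK=no
fi
```

For a two-node cluster, setting dfs.replication to 2 (or 1) avoids the warning entirely.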

3. Configure etc/hadoop/yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hostname</value>
    </property>
</configuration>

4. Configure etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>master:8010</value>
    </property>
</configuration>

5. Configure etc/hadoop/slaves

hostname
hostname-slave

6. Configure the slave nodes

a. You can copy the entire configuration to the same path on each slave;

b. The paths may differ, but then you must start the DataNode on the slave manually. If jps shows it did not start, check the logs to locate the problem; if there is a port conflict, change the port in hdfs-site.xml.

sbin/hadoop-daemon.sh --config etc/hadoop --script hdfs start datanode
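Option a above (same path on every slave) is easy to drive from the slaves file itself. A sketch; the copy commands are only printed here rather than executed, the `work` user and remote path are placeholders from this guide, and a stand-in slaves file is created so the snippet is self-contained:

```shell
# Build one copy command per host listed in etc/hadoop/slaves.
# (Printed here; on a real deployment, run the scp commands instead.)
mkdir -p etc/hadoop
printf 'hostname\nhostname-slave\n' > etc/hadoop/slaves   # stand-in for demo

CMDS=$(while IFS= read -r host; do
  [ -n "$host" ] || continue
  echo "scp -r etc/hadoop work@${host}:hadoop/hadoop-2.9.2/etc/"
done < etc/hadoop/slaves)
printf '%s\n' "$CMDS"
```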

3. Start the Cluster

1. Enable passwordless SSH between the cluster machines

ssh-keygen -t rsa -f /home/*/.ssh/id_rsa.hadoop
ssh-copy-id -i /home/*/.ssh/id_rsa.hadoop.pub work@hostname-slave
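ssh-keygen normally prompts for a file name and passphrase; it can also be run non-interactively. A sketch, where the key file name matches the .pub path used by ssh-copy-id above (under $HOME rather than the wildcard path, and with an empty passphrase, which is an assumption suited to unattended cluster logins):

```shell
# Generate an RSA key pair with no passphrase, skipping if one exists.
KEY="$HOME/.ssh/id_rsa.hadoop"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
[ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY" -q
ls -l "$KEY" "$KEY.pub"
```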

2. Format the HDFS filesystem (not needed if it has already been formatted)

bin/hdfs namenode -format

3. Start HDFS

sbin/start-dfs.sh

4. Check that the daemons started (NameNode, SecondaryNameNode, DataNode)

$ jps
47249 NameNode
53393 Jps
49952 SecondaryNameNode
48514 DataNode
bin/hdfs dfsadmin -report
bin/hdfs dfs -ls /

5. Start YARN

sbin/start-yarn.sh

6. Check YARN status (NodeManager, ResourceManager)

$ jps
45656 NodeManager
47249 NameNode
45375 ResourceManager
52477 Jps
49952 SecondaryNameNode
48514 DataNode
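The jps checks in steps 4 and 6 are easy to script. A sketch that greps for the expected daemon names; it runs here against the sample output captured above, since a live check needs a running cluster (on a real node, swap in `JPS_OUT="$(jps)"`):

```shell
# Sample jps output from above; on a live node use:  JPS_OUT="$(jps)"
JPS_OUT='45656 NodeManager
47249 NameNode
45375 ResourceManager
49952 SecondaryNameNode
48514 DataNode'

# Collect any daemon that is not running.
MISSING=""
for d in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
  printf '%s\n' "$JPS_OUT" | grep -qw "$d" || MISSING="$MISSING $d"
done
if [ -z "$MISSING" ]; then echo "all daemons up"; else echo "missing:$MISSING"; fi
```

`grep -w` matches whole words only, so "NameNode" does not falsely match inside "SecondaryNameNode".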

Web UI: http://hostname-master:8088/cluster

7. Check cluster status

$ bin/hdfs dfsadmin -report
Configured Capacity: 15501430390784 (14.10 TB)
Present Capacity: 13610185841062 (12.38 TB)
DFS Remaining: 13606793785344 (12.38 TB)
DFS Used: 3392055718 (3.16 GB)
DFS Used%: 0.02%
Under replicated blocks: 569
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.156.88.35:50010 (slave)
Hostname: hostname
Decommission Status : Normal
Configured Capacity: 3875357597696 (3.52 TB)
DFS Used: 1600417310 (1.49 GB)
Non DFS Used: 1882846601698 (1.71 TB)
DFS Remaining: 1990893801472 (1.81 TB)
DFS Used%: 0.04%
DFS Remaining%: 51.37%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jan 29 11:30:15 CST 2019
Last Block Report: Tue Jan 29 11:29:33 CST 2019


Name: 10.182.48.147:50010 (master)
Hostname: hostname
Decommission Status : Normal
Configured Capacity: 11626072793088 (10.57 TB)
DFS Used: 1791639552 (1.67 GB)
Non DFS Used: 8330874880 (7.76 GB)
DFS Remaining: 11615899947008 (10.56 TB)
DFS Used%: 0.02%
DFS Remaining%: 99.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jan 29 11:30:13 CST 2019
Last Block Report: Tue Jan 29 11:29:37 CST 2019