Learning Hadoop 01: Single Node Setup

Date: 2019-11-06

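Purpose

This document describes how to set up a single-node Hadoop installation so that Hadoop MapReduce and Hadoop Distributed File System (HDFS) operations can be run on one node.

Prerequisites

Supported Platforms

GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Required Software

The following software is required on both Linux and Windows:

Java^TM 1.6.x, preferably from Sun, must be installed.
ssh must be installed and sshd must be running, so that the Hadoop scripts can manage remote Hadoop daemons.

Windows additionally requires:

Cygwin - provides shell support for the software listed above.

Installing Software

If your cluster does not have the required software, install it.

For example, on Ubuntu Linux: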

$ sudo apt-get install ssh
$ sudo apt-get install rsync
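
As a rough way to confirm the prerequisites before continuing, the sketch below lists any commands that cannot be found on PATH. The helper name missing_tools is hypothetical and not part of Hadoop; it only checks that a command exists, not that sshd is actually running.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running missing_tools ssh rsync prints nothing when both commands are installed, and prints the name of each one that is missing otherwise.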

On Windows, if you did not install the required software when you installed Cygwin, start the Cygwin installer and select the following package:

openssh - the Net category

Download

Download a stable release of Hadoop.

Preparing to Configure Hadoop

Unpack the downloaded Hadoop distribution. Edit conf/hadoop-env.sh to define JAVA_HOME as the root of your Java installation.
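
One possible way to define JAVA_HOME is to append an export line to conf/hadoop-env.sh. The helper below is a minimal sketch; the function name set_java_home and any path passed to it are assumptions, so substitute the actual root of your Java installation.

```shell
# Hypothetical helper: append a JAVA_HOME export to a hadoop-env.sh file.
# $1 = path to hadoop-env.sh
# $2 = root directory of the Java installation (an assumption; adjust it)
set_java_home() {
    env_file=$1
    java_dir=$2
    # Appending is enough: a later assignment in the script overrides
    # any earlier one.
    printf 'export JAVA_HOME=%s\n' "$java_dir" >> "$env_file"
}
```

For example, set_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-sun, where the Java path is purely illustrative.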

Try the following command:

$ bin/hadoop

This displays the usage documentation for the hadoop script.

You are now ready to start your Hadoop cluster in one of the three supported modes:

Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode

Standalone Operation

By default, Hadoop runs in non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
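
To see what the regular expression 'dfs[a-z.]+' picks up, the same search can be approximated with ordinary shell tools. This is only a local sketch of what the example job computes (each distinct match with its count), not how Hadoop implements it; the function name count_dfs_matches is hypothetical.

```shell
# Hypothetical helper: count matches of 'dfs[a-z.]+' in the *.xml files
# of the directory given as $1, most frequent match first.
count_dfs_matches() {
    # -o prints only the matched text, -h omits file names,
    # -E enables extended syntax so that '+' means "one or more".
    grep -ohE 'dfs[a-z.]+' "$1"/*.xml | sort | uniq -c | sort -rn
}
```

Running count_dfs_matches input after the cp step above prints one line per distinct match, which gives something to compare against the contents of output/*.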

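Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following configuration: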

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
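
After editing the three files, it can help to read a value back and confirm it is what you expect. The helper below is a sketch that assumes each <name> and <value> element sits on its own line, with <name> first, as in the listings above; it is not a general XML parser, and get_property is a hypothetical name.

```shell
# Hypothetical helper: print the <value> that follows a given <name>.
# $1 = configuration file, $2 = property name
# Assumes <name> and <value> are on separate lines, <name> first.
get_property() {
    awk -v name="$2" '
        # Remember that the wanted <name> line has been seen.
        index($0, "<name>" name "</name>") { found = 1; next }
        # The next line carrying a <value> element holds the setting.
        found && match($0, /<value>.*<\/value>/) {
            # Skip the 7 characters of <value> and drop the 8 of </value>.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

With a core-site.xml identical to the listing above, get_property conf/core-site.xml fs.default.name prints hdfs://localhost:9000.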

Setting Up Passphraseless ssh

Check that you can ssh to localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
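
If key-based login still prompts for a password after these commands, a common cause is that sshd rejects key files whose permissions are too open (its StrictModes check). The sketch below tightens them; fix_ssh_perms is a hypothetical helper, and the commented ssh line uses the standard BatchMode option so that the check fails instead of prompting.

```shell
# Hypothetical helper: restrict permissions on an ssh directory.
# $1 = the ssh directory (defaults to ~/.ssh)
fix_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    # Only the owner may enter the directory or read the key list.
    chmod 700 "$ssh_dir" &&
    chmod 600 "$ssh_dir/authorized_keys"
}

# Non-interactive check, to be run by hand once sshd is up:
#   ssh -o BatchMode=yes localhost true && echo "passphraseless login works"
```

Calling fix_ssh_perms with no argument operates on ~/.ssh.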

Execution

Format a new distributed filesystem:

$ bin/hadoop namenode -format

Start the Hadoop daemons:

$ bin/start-all.sh

The Hadoop daemons write their log output to the directory given by ${HADOOP_LOG_DIR} (which defaults to ${HADOOP_HOME}/logs).
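
The fallback described here maps directly onto shell parameter expansion. The function below is a small sketch of that rule, useful for locating the logs from your own scripts; hadoop_log_dir is a hypothetical name, not a command shipped with Hadoop.

```shell
# Hypothetical helper: print the directory where daemon logs are expected.
# Uses HADOOP_LOG_DIR when it is set and non-empty,
# otherwise falls back to ${HADOOP_HOME}/logs.
hadoop_log_dir() {
    printf '%s\n' "${HADOOP_LOG_DIR:-${HADOOP_HOME}/logs}"
}
```

For example, ls "$(hadoop_log_dir)" lists the log files once the daemons have been started.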

Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:

$ bin/hadoop fs -put conf input

Run one of the provided examples:

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hadoop fs -get output output
$ cat output/*

Alternatively, view the output files directly on the distributed filesystem:

$ bin/hadoop fs -cat output/*

When you are done, stop the daemons with:

$ bin/stop-all.sh

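The above is my own translation of the official documentation, made while learning Hadoop. If you find any mistakes, please point them out. Thank you.

The official page for this document is: http://hadoop.apache.org/docs/stable/single_node_setup.html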
