Source: 05 - Configuring Hadoop's Local Run Mode
To run Hadoop in local mode in a Windows development environment, follow these steps:
Step 1: Install the JDK and hadoop-2.4.1 locally and configure the environment variables JAVA_HOME, HADOOP_HOME, and Path (after setting them, it is best to restart the machine).
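For reference, a sketch of the variables (the JDK path is an assumption, substitute your own; HADOOP_HOME matches the D:\hadoop-2.4.1 path that appears in the error message below):

    JAVA_HOME=C:\Program Files\Java\jdk1.7.0_79
    HADOOP_HOME=D:\hadoop-2.4.1
    Path=...;%JAVA_HOME%\bin;%HADOOP_HOME%\bin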
Step 2: Replace the bin directory of the local hadoop-2.4.1 with the bin directory from hadoop-common-2.2.0-bin-master, because the stock Hadoop 2.x distribution does not ship the two files hadoop.dll and winutils.exe.
If hadoop.dll and winutils.exe are missing, the program will throw exceptions such as:
java.io.IOException: Could not locate executable D:\hadoop-2.4.1\bin\winutils.exe in the Hadoop binaries.
java.lang.Exception: java.lang.NullPointerException
Replacing the bin directory of the local hadoop-2.4.1 with the one from hadoop-common-2.2.0-bin-master is therefore a necessary step.
Note: copying only the two files hadoop.dll and winutils.exe from hadoop-common-2.2.0-bin-master's bin directory into hadoop-2.4.1's bin directory also works, but replacing the whole bin directory is preferable.
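To verify the replacement, you can invoke winutils from a command prompt (a sketch; run with no arguments it should print its usage text rather than fail to load):

    C:\> D:\hadoop-2.4.1\bin\winutils.exe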
Once these two steps are done, we can run programs in Hadoop's local mode:
First, choose paths on the Windows file system for both the input and the output:
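For example, the input file C:\word.txt could contain tab-separated words like the following (an assumption, reconstructed from the part-r-00000 output shown later):

    hello	you
    hello	me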
The code is as follows:
package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "file:///C:\\word.txt"; // read input from the local Windows file system
    public static String path2 = "file:///D:\\dir";      // write output to the local Windows file system

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(conf);
        // Delete the output directory if it already exists, so the job can be rerun
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            // Emit <word, 1> for every tab-separated token on the line
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            // Sum the counts collected for each word
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}
Check the running Java processes from a DOS command prompt:
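A sketch of what the JDK's jps tool might print while the job is running (the WordCount and Jps PIDs are illustrative; 28568 is the Eclipse PID from this run):

    C:\> jps
    28568
    30412 WordCount
    31288 Jps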
Here, 28568 is the Eclipse process started on Windows.
Next, let's look at the output:
The contents of part-r-00000 are as follows:
hello	2
me	1
you	1
Next, keep the input path on the local Windows file system but switch the output path to the HDFS file system. The code is as follows:
The code is identical to the previous listing except for the two path constants (main(), MyMapper, and MyReducer are unchanged):

    public static String path1 = "file:///C:\\word.txt";     // read input from the Windows file system
    public static String path2 = "hdfs://hadoop20:9000/dir"; // write output to HDFS
This version throws an exception: with a default Configuration, FileSystem.get(conf) returns the local file system, which cannot handle the hdfs:// output path. The fix is to point the default file system at HDFS:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop20:9000/");
FileSystem fileSystem = FileSystem.get(conf); // now returns the FileSystem instance for HDFS
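Alternatively, a FileSystem for a specific URI can be obtained without changing the default file system; a minimal sketch, reusing the hadoop20 address from above:

    // FileSystem.get(URI, Configuration) binds directly to the given scheme and authority
    FileSystem fileSystem = FileSystem.get(new java.net.URI("hdfs://hadoop20:9000/"), new Configuration());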
Check the results:
[root@hadoop20 dir4]# hadoop fs -cat /dir/part-r-00000
hello	2
me	1
you	1
Good. That concludes Hadoop's local run mode. Note the following points:
1. file:// denotes the local file system, while hdfs:// denotes the HDFS distributed file system.
2. Hadoop's local run mode is trivial to set up on Linux, but on Windows it requires the file replacements described above.
3. It does not matter where the files used by MapReduce live (they can sit on the local Windows file system, the local Linux file system, or HDFS); in the end they are always obtained through a FileSystem instance, as the sketch after this list shows.
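A minimal sketch of point 3, reusing the paths from this article (the URI scheme on each Path decides which FileSystem implementation serves it):

    Configuration conf = new Configuration();
    FileSystem localFs = new Path("file:///C:/word.txt").getFileSystem(conf);      // LocalFileSystem
    FileSystem hdfsFs  = new Path("hdfs://hadoop20:9000/dir").getFileSystem(conf); // DistributedFileSystem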
If you spot any problems, feel free to leave a comment!
Note: if you are using Hadoop 1.0 and running local mode on Windows, you only need to set HADOOP_HOME and the PATH; no other configuration is required!
Another common error:
Exception: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
access0 is the Windows-only native method Hadoop uses to check whether the current process has the requested access rights to a given path, so we first make the check always succeed by modifying the source ourselves to return true. Download the matching Hadoop source, hadoop-2.7.3-src.tar.gz, extract it, and copy NativeIO.java from hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the corresponding package of your Eclipse project.
That is, modify the source highlighted in the original screenshot: change the access() method in the copied NativeIO.java so that it returns true, as sketched below.
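A sketch of the change, based on the Hadoop 2.7.3 source (everything else in the copied file stays exactly as copied):

    public static class Windows {
        // ... rest of the class as copied from the Hadoop 2.7.3 source ...

        /** Checks whether the current process has the desired access rights
         *  on the given path; patched here to always allow access on Windows. */
        public static boolean access(String path, AccessRight desiredAccess)
                throws IOException {
            // Original line: return access0(path, desiredAccess.accessRight());
            return true; // bypass the native access0 check
        }
    }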
Problem solved.
In summary, the handling steps are:
Step 1: download the hadoop.dll and winutils.exe for Hadoop 2.7.3, copy them over the files in the local Hadoop bin directory, and also copy them into C:\Windows\System32 (overwriting any existing copies).
Step 2: in the project, create a package named org.apache.hadoop.io.nativeio with a class NativeIO patched as shown above; then run the Hadoop program from Eclipse on Windows again, and it works.