Java Big Data Development (3) Hadoop (15) - MapReduce 2

Introduction: In the previous part we met MapReduce and learned its concepts and structure. Let's keep exploring how it works.

Java types versus Hadoop serialization types

Java type	Hadoop Writable type
boolean	BooleanWritable
byte	ByteWritable
int	IntWritable
float	FloatWritable
long	LongWritable
double	DoubleWritable
String	Text
Map	MapWritable
Array	ArrayWritable

Key point: Java's String corresponds to the Text type in Hadoop.
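To make the idea of a "serialization type" concrete, here is a minimal sketch in plain Java that does not use Hadoop at all. MiniIntWritable and WritableSketch are hypothetical names invented for this illustration. They only mirror the two methods that Hadoop's Writable interface declares, write(DataOutput) and readFields(DataInput); the real IntWritable has more to it than this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {

    // Hypothetical stand-in for a Writable wrapper around an int; not Hadoop code.
    static class MiniIntWritable {
        private int value;

        MiniIntWritable() { }
        MiniIntWritable(int value) { this.value = value; }

        int get() { return value; }
        void set(int value) { this.value = value; }

        // Serialize the wrapped value to a binary stream.
        void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        // Overwrite this object's state from a binary stream, so one instance can be reused.
        void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    // Turn the wrapper into raw bytes.
    static byte[] serialize(MiniIntWritable w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild a wrapper from raw bytes.
    static MiniIntWritable deserialize(byte[] bytes) {
        MiniIntWritable w = new MiniIntWritable();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w;
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new MiniIntWritable(42));
        MiniIntWritable copy = deserialize(bytes);
        System.out.println(bytes.length + " bytes, value = " + copy.get()); // prints "4 bytes, value = 42"
    }
}
```

The wrapper is mutable on purpose: set() and readFields() let a single object be reused for many records, which is the same pattern the WordCount mapper follows when it calls k.set(word) on one shared Text instance.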

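MapReduce programming conventions

A user program is divided into three parts: the Mapper, the Reducer, and the Driver (the client that submits and runs the MR program).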

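I. Mapper stage

(1) A user-defined Mapper must extend its parent class (Mapper).

(2) The Mapper's input data takes the form of KV pairs (the KV types can be customized).

(3) The Mapper's business logic goes in the map() method.

(4) The Mapper's output data takes the form of KV pairs (the KV types can be customized).

(5) The MapTask process calls map() once for each input <K,V> pair.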

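II. Reducer stage

(1) A user-defined Reducer must extend its parent class (Reducer).

(2) The Reducer's input types match the Mapper's output types, and are likewise KV pairs.

(3) The Reducer's business logic goes in the reduce() method.

(4) The ReduceTask process calls reduce() once for each group of <k,v> pairs that share the same key k.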

III. Driver stage

The whole program needs a Driver to submit it. What gets submitted is a Job object that describes all the required information.
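The three stages can be sketched in plain Java to show how data moves between them. This is only an in-memory simulation under stated assumptions, not Hadoop code: MapReduceFlowSketch and its methods map, group, reduce and run are hypothetical names, and group stands in for the grouping by key that the framework performs between the map and reduce phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceFlowSketch {

    // Map phase: called once per input line, emits one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Grouping step (assumption: simplified): collect the values of equal keys, keys kept sorted.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: called once per key, with all of that key's values.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // "Driver": wires the phases together for a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop spark", "hadoop hive hadoop");
        System.out.println(run(lines)); // prints {hadoop=3, hive=1, spark=1}
    }
}
```

In a real job the map and reduce calls run in separate MapTask and ReduceTask processes, possibly on different machines, and the intermediate pairs are serialized between them, which is why the keys and values must be Writable types rather than plain Java objects.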

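WordCount hands-on example

1. Requirement

Count the total number of occurrences of each word in a given text file and output the result.

2. Requirement analysis

Following the MapReduce programming conventions, write a Mapper, a Reducer, and a Driver.

3. Coding

(1) Create a project and add the following dependencies. Skip this step if they are already present.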

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>

(2) Add a log4j configuration file (log4j.properties)

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

(3) Write the Mapper class

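import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Output key and value objects, reused across calls; every word is emitted with a count of 1
    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1 Get one line
        String line = value.toString();
        // 2 Split the line into words
        String[] words = line.split(" ");
        // 3 Emit (word, 1) for each word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}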
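One detail of the mapper worth checking is how line.split(" ") behaves on input that is not separated by exactly one space. The sketch below uses only the Java standard library. SplitSketch and its two helper methods are hypothetical names, and the whitespace-based variant is an optional alternative, not part of the original example.

```java
import java.util.Arrays;

class SplitSketch {

    // Tokenization as used in the mapper: split on a single space character.
    static String[] splitOnSpace(String line) {
        return line.split(" ");
    }

    // Alternative (assumption: tabs and repeated spaces should also separate words):
    // trim the line, then split on runs of whitespace.
    static String[] splitOnWhitespace(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Two spaces between the first two words, a tab before the last one.
        String line = "heima  itcast\tss";
        System.out.println(Arrays.toString(splitOnSpace(line)));
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
    }
}
```

For this input, splitting on a single space yields three tokens: "heima", an empty string, and "itcast" still joined to "ss" by the tab. The whitespace-based variant yields "heima", "itcast" and "ss". With the single-space split, an empty token would be written to the context like any other word, so the input file may need consistent spacing, or the mapper could skip empty tokens.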

(4) Write the Reducer class

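import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Output value object, reused across calls
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1 Sum the counts for this key
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        // 2 Emit (word, total)
        v.set(sum);
        context.write(key, v);
    }
}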

(5) Write the Driver class

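import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordcountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1 Get the configuration and wrap it in a job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2 Set the jar by the driver class
        job.setJarByClass(WordcountDriver.class);

        // 3 Set the Mapper and Reducer classes
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);

        // 4 Set the map output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5 Set the final output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7 Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);

        System.exit(result ? 0 : 1);
    }
}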

(6) On drive D:, create an input directory, and inside it an inputword directory. Create a .txt file there and write some words in it.

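(7) Run the Driver in IDEA, passing the input directory and the output directory as the two program arguments. The output directory must not already exist, otherwise the job fails on submission.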

(8) Run the program and check the output directory

	2
cs	1
heima	3
itcast	1
ss	1


This article is shared from the WeChat official account 跟我一块儿学大数据 (java_big_data).