Hadoop 1.2.1 + Eclipse MapReduce Hello World: Study Notes

Date: 2020-08-17


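With Hadoop 1.2.1 successfully configured in pseudo-distributed mode and the National Day holiday over, it is time to get back to studying.

This time the goal is to run a Hadoop application from Eclipse.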

Development environment

Operating system: CentOS Linux release 6.0 (Final)

Eclipse 4.3

java version "1.7.0_25"

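Step one: run start-all.sh to start the daemons (see the previous post for details).

Startup failed. The cause turned out to be an IP address conflict, so the IP address set in my XML configuration never took effect. The only fix was to change the machine's IP: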

DEVICE="eth0"
BOOTPROTO=static
IPADDR=192.168.2.88
Change IPADDR to an IP address that is not already in use.

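Run /etc/rc.d/init.d/network restart to apply the change.

Once the change is in effect, update the IP address set in core-site.xml and mapred-site.xml (if they are set to localhost, nothing needs to change):

vim core-site.xml
vim mapred-site.xml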

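Configuring the Eclipse plugin

Getting the plugin

Reference: http://f.dataguru.cn/thread-187770-1-1.html
You can either build the plugin yourself or download a prebuilt one.

After installing it and restarting Eclipse, the new option shows up under Show View (screenshot in the original post).

Choose to display it next to the console.

Right-click in the view and create a new entry (screenshot in the original post).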

In the master field, enter the IP and port from mapred-site.xml; in the DFS master field, enter the IP and port from core-site.xml.
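To make it explicit which part of each setting goes into the dialog, here is a minimal Java sketch that splits the two values into host and port. It assumes the usual Hadoop 1.x property names, fs.default.name in core-site.xml and mapred.job.tracker in mapred-site.xml. The ports 9000 and 9001 are placeholders rather than values taken from this setup, and the class EndpointSketch is purely hypothetical.

```java
import java.net.URI;

final class EndpointSketch {
    /** Host and port extracted from a configuration value. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Parses a URI-style value such as "hdfs://192.168.2.88:9000" (fs.default.name). */
    static Endpoint fromUri(String value) {
        URI uri = URI.create(value);
        return new Endpoint(uri.getHost(), uri.getPort());
    }

    /** Parses a plain "host:port" value such as "192.168.2.88:9001" (mapred.job.tracker). */
    static Endpoint fromHostPort(String value) {
        int colon = value.lastIndexOf(':');
        return new Endpoint(value.substring(0, colon),
                Integer.parseInt(value.substring(colon + 1)));
    }

    public static void main(String[] args) {
        // Both ports below are placeholders; use the ones from your own XML files.
        Endpoint dfs = fromUri("hdfs://192.168.2.88:9000");
        Endpoint mapred = fromHostPort("192.168.2.88:9001");
        System.out.println("DFS master: " + dfs.host + ":" + dfs.port);
        System.out.println("master:     " + mapred.host + ":" + mapred.port);
    }
}
```

The point is only that the DFS master value comes from the hdfs:// URI while the master value comes from the plain host:port pair; mixing the two up is an easy mistake when filling in the dialog.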

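Set the Hadoop installation path (screenshot in the original post).

Once that is done, the DFS tree appears in the resource view (screenshot in the original post).

From here we can right-click to work with DFS files (create, delete, upload, download).

Creating the hello world project

File -> New -> Project, select "Map/Reduce Project", then enter a project name and create the project. The plugin automatically imports all the JARs from the Hadoop root directory and its lib directory.

(Screenshot in the original post.)

For the first example, I will run the sample from the documentation.

Open http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

Click through to the example (screenshot in the original post).

Following the example, create the package and class, then paste in the code: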

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts collected for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Types of the job's output key/value pairs.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // Plain text in, plain text out.
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] is the input path, args[1] is the output path.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
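To see what the job computes without a cluster, the same logic can be sketched in plain Java. This is not part of the tutorial and uses no Hadoop classes: the hypothetical LocalWordCountSketch below tokenizes each line the way Map does and sums per word the way Reduce does, using a TreeMap so the words come out sorted by key.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class LocalWordCountSketch {
    /** Counts whitespace-separated tokens across all lines (map and reduce in one pass). */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> totals = new TreeMap<String, Integer>(); // sorted by word
        for (String line : lines) {
            // "Map" step: every token in the line is worth a count of 1.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                // "Reduce" step: add that 1 to the running total for the word.
                Integer previous = totals.get(word);
                totals.put(word, previous == null ? 1 : previous + 1);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = count(Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop"));
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

For the two sample lines this counts Bye and Goodbye once each and Hadoop, Hello and World twice each. The real job produces the same kind of result, but spreads the tokenizing and summing across map and reduce tasks.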

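Running it directly fails with an error, because the program needs two arguments (see the documentation).

The arguments can be set according to the directories in DFS, or written directly as absolute paths (screenshot in the original post).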
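The failure happens because main reads args[0] and args[1] unconditionally, so an empty argument list ends in an ArrayIndexOutOfBoundsException. A small guard makes the problem obvious. The sketch below is my own addition, not part of the tutorial; the class name ArgsCheckSketch and the usage text are made up for illustration.

```java
final class ArgsCheckSketch {
    /** Returns a usage message if the arguments are unusable, or null if they look fine. */
    static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: WordCount <input path> <output path>";
        }
        if (args[0].isEmpty() || args[1].isEmpty()) {
            return "Input and output paths must not be empty";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("input  = " + args[0]);
        System.out.println("output = " + args[1]);
        // In the real job, the JobConf setup from WordCount.main would follow here.
    }
}
```

In Eclipse the two values go into the program arguments of the run configuration, separated by a space. The output directory must not already exist when the job starts, otherwise Hadoop refuses to run the job.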

Click Run, and the job completes successfully.

The output can be viewed with

hadoop dfs -cat /home/hadoop-1.2.1/output/part-00000

or by browsing the DFS directory tree in Eclipse.
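If you want to post-process the result, each line of part-00000 is a word, a tab, and a count, assuming the default TextOutputFormat separator, which the job above does not change. The hypothetical OutputParserSketch below reads such lines and reports a summary; it is a rough sketch, not something from the tutorial.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

final class OutputParserSketch {
    /** Parses "word<TAB>count" lines, silently skipping lines that do not match. */
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>(); // keeps file order
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // no separator: not a result line
            }
            try {
                counts.put(line.substring(0, tab),
                        Integer.parseInt(line.substring(tab + 1).trim()));
            } catch (NumberFormatException e) {
                // The value is not an integer: skip the line.
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Read everything piped in on standard input.
        List<String> lines = new ArrayList<String>();
        Scanner in = new Scanner(System.in, "UTF-8");
        while (in.hasNextLine()) {
            lines.add(in.nextLine());
        }
        Map<String, Integer> counts = parse(lines);
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        System.out.println(counts.size() + " distinct words, " + total + " tokens in total");
    }
}
```

One way to use it is to pipe the command above into the compiled class: hadoop dfs -cat /home/hadoop-1.2.1/output/part-00000 | java OutputParserSketch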