Connecting Eclipse to a Hadoop Cluster and Running WordCount

Date: 2019-12-12


Note: this is an original article by the author; please credit the source when reposting.
Author: 帅气陈吃苹果

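I. Environment preparation

1. JDK installation and configuration

2. Downloading Eclipse

Download and unzip; download link: https://pan.baidu.com/s/1i51UsVN

3. Downloading and configuring Hadoop

Download and unzip; download link: https://pan.baidu.com/s/1i57ZXqt
Configure the environment variables:
Create a new system variable HADOOP_HOME with the value E:\Hadoop\hadoop-2.6.5
Add Hadoop's bin directory to the Path system variable: E:\Hadoop\hadoop-2.6.5\bin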

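4. A healthy cluster

Make sure the cluster is running, that the local Windows machine and the cluster's master node can ping each other, and that an SSH connection can be established.
In C:\Windows\System32\drivers\etc\hosts, append the IP address and hostname mapping of the Hadoop cluster's master node, as follows:

192.168.29.188 vnet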
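
To illustrate what that entry provides, here is a small sketch of how hosts-style text maps a name to an address. HostsFile is a hypothetical helper written for this article, not part of Hadoop, Eclipse, or Windows; it only parses text in the hosts format and never reads or modifies the real file.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: parses hosts-file text into a hostname -> IP map. */
final class HostsFile {

    static Map<String, String> parse(String content) {
        Map<String, String> hostToIp = new LinkedHashMap<>();
        for (String rawLine : content.split("\\R")) {
            // Drop trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue;
            }
            // First field is the IP address; every following field is a name for it.
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                // The first mapping for a name wins, as in a top-to-bottom lookup.
                hostToIp.putIfAbsent(fields[i].toLowerCase(Locale.ROOT), fields[0]);
            }
        }
        return hostToIp;
    }

    public static void main(String[] args) {
        String sample = "# Hadoop master\n192.168.29.188 vnet\n";
        System.out.println(parse(sample).get("vnet")); // prints 192.168.29.188
    }
}
```

With the entry in place, the name vnet used in HDFS addresses such as hdfs://vnet:9000 can be resolved from the Windows machine.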

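5. Downloading the Eclipse Hadoop plugin

Download link: https://pan.baidu.com/s/1o7791VG

After downloading, put the plugin in the plugins directory under the Eclipse installation directory and restart Eclipse.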

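6. Setting up the Map/Reduce view in Eclipse

1) After restarting Eclipse, the Map/Reduce view becomes available in the left panel. To open it:

Open Window ---> Perspective ---> Open Perspective ---> Other... and select Map/Reduce. If the option is missing even though the plugin is in the plugins directory and Eclipse has been restarted, the likely cause is a version mismatch between Eclipse and the plugin; download versions that match each other.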


2) Open Window ---> Preferences ---> Hadoop Map/Reduce and set the Hadoop installation directory (E:\Hadoop\hadoop-2.6.5 in this article).

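II. WordCount in practice

1. Creating and configuring a Hadoop Location

In the bottom panel of Eclipse, select the Map/Reduce Locations view, then right-click and choose New Hadoop Locations.

Configure the location so that the DFS Master host and port match the cluster's HDFS address (hdfs://vnet:9000 for the cluster used in this article).

Click Finish. If no error is reported, the connection succeeded, and DFS Locations on the left side of Eclipse shows the directory structure and file contents of HDFS.

If you run into An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerExcept, the HDFS file system is currently empty. Create a file on HDFS and refresh DFS Locations, and the file system contents will show up.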

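2. Creating the input file and directory

On the master node, create the input file and upload it to the corresponding input directory on HDFS:

vi input.txt                                # enter the text to be counted, then save

hdfs dfs -mkdir -p /user/root/input         # create the input directory if it does not exist yet

hdfs dfs -put input.txt /user/root/input/   # upload the file from the local Linux file system to HDFS

Contents of input.txt:

hello world

hello hadoop

bye

bye hadoop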

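3. Creating the Map/Reduce project

File ---> New ---> Project ---> Map/Reduce Project. Enter a project name and choose the Hadoop library location; "Use default Hadoop" is enough here, which is the Hadoop installation configured in Eclipse earlier.

WordCount.java: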

package com.wecon.sqchen;

import java.io.IOException;  
import java.util.StringTokenizer;  
  
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.Path;  
import org.apache.hadoop.io.IntWritable;  
import org.apache.hadoop.io.LongWritable;  
import org.apache.hadoop.io.Text;  
import org.apache.hadoop.mapreduce.Job;  
import org.apache.hadoop.mapreduce.Mapper;  
import org.apache.hadoop.mapreduce.Reducer;  
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;  
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;  
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;  
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;  
  
public class WordCount {  
  
    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        // Reused across calls to avoid allocating new objects for every token.
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }
  
    // Reducer: sums the counts emitted for each word.
    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
  
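    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(2);
        }
        // Local Hadoop directory; needed on Windows so that bin\winutils.exe can be found.
        System.setProperty("hadoop.home.dir", "E:/Hadoop/hadoop-2.6.5");
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input directory, args[1] the output directory.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }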
}
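
To see what the job computes without a cluster, the following is a minimal local sketch in plain Java, not Hadoop code. LocalWordCount and its method names are hypothetical; it imitates the map step (one (word, 1) pair per token), the grouping by key that happens between map and reduce, and the summing reduce step, using the four lines of the sample input.txt.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical local imitation of the WordCount data flow (no Hadoop involved). */
final class LocalWordCount {

    /** Map step: one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleImmutableEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    /** Group the pairs by key (sorted), then sum each group, as the reducer does. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello world", "hello hadoop", "bye", "bye hadoop");
        // Print one tab-separated "word<TAB>count" pair per line.
        countWords(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

For the sample input this prints bye 2, hadoop 2, hello 2 and world 1, one pair per line. The output file written by the real job should contain the same counts.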

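Right-click and open Run As ---> Run Configurations, then set Arguments to the input and output directories that the program reads from args[0] and args[1], for example hdfs://vnet:9000/user/root/input and hdfs://vnet:9000/user/root/output.

Once configured, choose Run As ---> Java Application. If no error is reported, the program ran successfully; after refreshing DFS Locations on the left side of Eclipse, the output directory and output files are visible.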

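4. Problems encountered and solutions

1) java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Solution:

In the main method, before the job is submitted, specify the local Hadoop installation path by adding the following line:
System.setProperty("hadoop.home.dir", "E:/Hadoop/hadoop-2.6.5");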
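
The "null" in that error message is the Hadoop home directory that could not be determined: Hadoop looks at the hadoop.home.dir system property first and falls back to the HADOOP_HOME environment variable. The sketch below is a hypothetical pre-flight check in plain Java (HadoopHomeCheck is not a Hadoop class) that follows the same order and reports whether bin\winutils.exe exists under the resolved directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for the local Hadoop directory on Windows. */
final class HadoopHomeCheck {

    /** The explicit property wins; otherwise use the environment value. May return null. */
    static String resolveHome(String propertyValue, String envValue) {
        return propertyValue != null ? propertyValue : envValue;
    }

    /** Where winutils.exe is expected for a given Hadoop home directory. */
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    /** True only if a home directory is known and the executable is present. */
    static boolean hasWinutils(String hadoopHome) {
        return hadoopHome != null && Files.isRegularFile(winutilsPath(hadoopHome));
    }

    public static void main(String[] args) {
        String home = resolveHome(System.getProperty("hadoop.home.dir"),
                                  System.getenv("HADOOP_HOME"));
        if (hasWinutils(home)) {
            System.out.println("winutils.exe found under " + home);
        } else {
            System.out.println("winutils.exe not found; Hadoop home resolved to: " + home);
        }
    }
}
```

If the check reports a null home even though HADOOP_HOME was set, Eclipse was probably started before the variable was created and needs a restart to pick it up.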

2) `(null) entry in command string: null chmod 0700 E:\tmp\hadoop-Administrator\mapred\staging\Administr`

Solution:

Reference link: https://ask.hellobi.com/blog/...
Download link for the files required by the reference: https://pan.baidu.com/s/1i4Z4aVV

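3) org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="/user/root":root:supergroup:drwxr-xr-x

Solution:

This is an HDFS permission problem: the application is submitted as the local Windows user (Administrator), who has no write access to /user/root.
Reference link: http://blog.csdn.net/Camu7s/a...
Using the third method from that reference, run the following commands on the master node:

adduser Administrator

groupadd supergroup

usermod -a -G supergroup Administrator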
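
Why this helps is not obvious from the mode bits alone, since the group bits of /user/root (r-x) do not grant write access either. By default HDFS treats members of the group named supergroup as superusers who bypass permission checks, and group membership is looked up on the NameNode host, which is why the commands run on the master. The following is a simplified sketch of that decision in plain Java; HdfsWriteCheck is a hypothetical illustration, not the actual HDFS implementation, and it assumes the default superuser group name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of an HDFS write-permission decision. */
final class HdfsWriteCheck {

    /**
     * @param mode       listing-style mode string such as "drwxr-xr-x"
     * @param superGroup name of the superuser group (assumed default: "supergroup")
     */
    static boolean canWrite(String user, Set<String> userGroups,
                            String owner, String group, String mode, String superGroup) {
        // Members of the superuser group skip the mode-bit check entirely.
        if (userGroups.contains(superGroup)) {
            return true;
        }
        // Index 2 = owner write bit, 5 = group write bit, 8 = other write bit.
        if (user.equals(owner)) {
            return mode.charAt(2) == 'w';
        }
        if (userGroups.contains(group)) {
            return mode.charAt(5) == 'w';
        }
        return mode.charAt(8) == 'w';
    }

    public static void main(String[] args) {
        Set<String> before = Collections.emptySet();
        Set<String> after = new HashSet<>(Arrays.asList("supergroup"));
        // Values taken from the error message: "/user/root":root:supergroup:drwxr-xr-x
        System.out.println(canWrite("Administrator", before, "root", "supergroup", "drwxr-xr-x", "supergroup")); // false
        System.out.println(canWrite("Administrator", after, "root", "supergroup", "drwxr-xr-x", "supergroup"));  // true
    }
}
```

Keep in mind that this makes Administrator an HDFS superuser, which is convenient on a test cluster but broader than granting write access to a single directory.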

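4) org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://vnet:9000/user/root/output already exists

Solution:

The job's output directory already exists on HDFS. The output directory is created while the program runs and must not exist beforehand, so delete the corresponding output directory on HDFS (for example with hdfs dfs -rm -r /user/root/output) and run the job again.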
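
As an alternative to deleting the directory before every run, each run can write to a fresh directory. This is a different technique from the fix above and only a sketch: OutputPaths and timestamped are hypothetical names, and the resulting string would be passed as the output argument (args[1]) of the job.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: derives a per-run output directory name. */
final class OutputPaths {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Appends a timestamp suffix, ignoring one trailing slash on the base path. */
    static String timestamped(String baseDir, LocalDateTime when) {
        String trimmed = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return trimmed + "-" + STAMP.format(when);
    }

    public static void main(String[] args) {
        // Example: /user/root/output-20191212-083005 for a run at 08:30:05 on 2019-12-12.
        System.out.println(timestamped("/user/root/output", LocalDateTime.now()));
    }
}
```

The trade-off is that old output directories accumulate on HDFS and have to be cleaned up separately.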

5) log4j warnings:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Solution:

In the project's src directory, choose New ---> Other ---> General ---> File and create a file named "log4j.properties" with the following contents:

log4j.rootLogger=WARN, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
