Local development environment:
OS: Windows 10
JDK: jdk1.8.0_121
Scala: scala-2.11.11
IDE: IntelliJ IDEA Ultimate 2017.2.1

Cluster environment:
OS: CentOS 6.5 x64
JDK: jdk1.8.0_111
Hadoop: hadoop-2.6.5
Spark: spark-1.6.3-bin-hadoop2.6
Scala: scala-2.11.11
Configure the environment variables JAVA_HOME, CLASSPATH, and Path.
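A quick way to confirm the variables are actually visible to the JVM is to print them from a small Java program; this is just an optional sketch, not part of the original setup:

// Optional check (not in the original article): print the variables the JVM sees.
public class EnvCheck {
    public static void main(String[] args) {
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
        System.out.println("CLASSPATH = " + System.getenv("CLASSPATH"));
        System.out.println("Path      = " + System.getenv("Path"));
    }
}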
Hosts file location: C:\Windows\System32\drivers\etc

Add the following entries (the same as the cluster's hosts file; adjust them to match your own cluster):

192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
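To confirm the new hosts entries take effect, the cluster hostnames can be resolved from Java; an optional sketch assuming the example addresses above:

import java.net.InetAddress;

// Optional check (not in the original article): verify that the cluster hostnames resolve.
public class HostsCheck {
    public static void main(String[] args) throws Exception {
        for (String host : new String[]{"master", "slave1", "slave2"}) {
            System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress());
        }
    }
}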
Create a Maven project in IntelliJ IDEA.
In File -> Project Structure -> Libraries, add spark-assembly-1.6.3-hadoop2.6.0.jar (located on the server under spark/lib/).
Create a Java class ConnectionUtil under the src\main\java directory:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConnectionUtil {

    public static final String master = "spark://master:7077";

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("demo").setMaster(master);
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        System.out.println(javaSparkContext);
        javaSparkContext.stop();
    }
}
If the result shown in the figure above appears, the program runs correctly.
Create a test file and upload it to HDFS:

$ vim wordcount.txt
hello Tom
hello Jack
hello Ning
# Upload the file
$ hadoop fs -put wordcount.txt /user/hadoop/
# Check whether the upload succeeded
$ hadoop fs -ls /user/hadoop/
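Besides hadoop fs -ls, the upload can also be checked programmatically with the Hadoop FileSystem API (the classes are bundled in the spark-assembly jar added earlier); a minimal sketch assuming the NameNode address hdfs://master:9000 used later in this article:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Optional check (not in the original article): confirm the file exists on HDFS.
public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), new Configuration());
        Path file = new Path("/user/hadoop/wordcount.txt");
        System.out.println(file + " exists: " + fs.exists(file));
        fs.close();
    }
}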
The WordCount program (adapted from the example in the Spark installation package, with the application jar and the input file path specified explicitly):

import scala.Tuple2;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class JavaWordCount {

    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {

        // if (args.length < 1) {
        //     System.err.println("Usage: JavaWordCount <file>");
        //     System.exit(1);
        // }

        SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount")
                .setMaster("spark://master:7077")
                .set("spark.executor.memory", "512M");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        ctx.addJar("D:\\workspace\\spark\\JavaWordCount.jar");

        String path = "hdfs://master:9000/user/hadoop/wordcount.txt";
        JavaRDD<String> lines = ctx.textFile(path);

        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) {
                return Arrays.asList(SPACE.split(s));
            }
        });

        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        ctx.stop();
    }
}
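Since the local JDK is 1.8, the anonymous inner classes above can also be written as lambdas against the same Spark 1.6 Java API. A minimal sketch of the same pipeline (same master, jar path, and input path), offered only as a variant of the example above:

import java.util.Arrays;
import scala.Tuple2;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaWordCountLambda {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("JavaWordCountLambda")
                .setMaster("spark://master:7077")
                .set("spark.executor.memory", "512M");
        JavaSparkContext ctx = new JavaSparkContext(conf);
        ctx.addJar("D:\\workspace\\spark\\JavaWordCount.jar");  // same packaged jar as above

        JavaRDD<String> lines = ctx.textFile("hdfs://master:9000/user/hadoop/wordcount.txt");
        // In Spark 1.6 the FlatMapFunction returns an Iterable, so Arrays.asList works directly.
        JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(s.split(" ")));
        JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
        JavaPairRDD<String, Integer> counts = ones.reduceByKey((a, b) -> a + b);

        for (Tuple2<String, Integer> t : counts.collect()) {
            System.out.println(t._1() + ": " + t._2());
        }
        ctx.stop();
    }
}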
In File -> Project Structure -> Artifacts, click the green "+" and choose Add --> JAR --> From Modules with Dependencies. Enter the main class, then delete all the jars under Output Layout (the Spark runtime already provides them). If a META-INF folder already exists, delete that folder first, then click Apply and OK.
Build the program: Build --> Build Artifacts..., then select the project to build and build it.
The generated jar can be found in the project's out directory. Copy it to the location the program expects, that is, the path set in the addJar() method.
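As a side note, instead of calling addJar() after the context is created, the application jar can also be registered through SparkConf.setJars(); a small sketch using the same example jar path, just one possible variant rather than the approach used in this article:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Alternative (not the article's method): register the application jar via
// SparkConf.setJars() instead of calling JavaSparkContext.addJar() afterwards.
// The path below is the same example path used above.
public class ConfWithJars {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("JavaWordCount")
                .setMaster("spark://master:7077")
                .setJars(new String[]{"D:\\workspace\\spark\\JavaWordCount.jar"});
        JavaSparkContext ctx = new JavaSparkContext(conf);
        System.out.println(ctx);
        ctx.stop();
    }
}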
Error: java: cannot access scala.Cloneable - class file for scala.Cloneable not found

Cause: the project had been using spark-2.1.0-bin-hadoop2.4, which does not ship the spark-assembly-1.6.3-hadoop2.6.0.jar dependency, so the required classes were missing.

Fix: the Hadoop version originally used was 2.5.2, whose matching packages are no longer offered on the official site, so the cluster's Hadoop environment was updated to 2.6.5; and since documentation for Spark 2.x was still scarce, the Spark version was changed to 1.6.3.
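When a "cannot access scala.Cloneable" error like this appears, a plain reflection check can show whether the Scala and Spark classes from the assembly jar are actually on the project classpath; an optional sketch, not part of the original fix:

// Optional check (not in the original article): verify that the Scala library and
// Spark classes bundled in spark-assembly-1.6.3-hadoop2.6.0.jar are on the classpath.
public class ClasspathCheck {
    public static void main(String[] args) {
        for (String cls : new String[]{"scala.Cloneable", "org.apache.spark.SparkConf"}) {
            try {
                Class.forName(cls);
                System.out.println(cls + ": found");
            } catch (ClassNotFoundException e) {
                System.out.println(cls + ": NOT found - check Project Structure -> Libraries");
            }
        }
    }
}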
Create: 2017-08-12 10:33:55 Saturday    Update1: 2017-08-14 20:10:47 Monday