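Website: spark.apache.org
Clone a separate virtual machine and name it c
Change its IP address to 192.168.56.200
Set its hostname to c: hostnamectl set-hostname c
Edit /etc/hosts and add an entry that resolves this machine's own hostname
Restart the network service: systemctl restart network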
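The /etc/hosts step above can be sketched as a small helper. This is only an illustration: the function names add_host_entry and host_is_mapped are made up for this note, and the helper simply appends an "IP hostname" line when the hostname is not mapped yet.

```shell
# Return success if the hostname ($2) already appears as a name in the hosts file ($1).
host_is_mapped() {
    awk -v name="$2" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1" 2>/dev/null
}

# Append "IP hostname" to the hosts file unless the hostname is already mapped.
# Arguments: hosts file path, IP address, hostname (hypothetical helper).
add_host_entry() {
    if ! host_is_mapped "$1" "$3"; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# Example (run as root on machine c):
#   add_host_entry /etc/hosts 192.168.56.200 c
```

Running the helper twice leaves a single entry, so it is safe to repeat while setting up the machine.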
Upload the Spark installation archive to the root user's home directory (/root)
Extract Spark under /usr/local and rename the extracted directory to spark
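The extract-and-rename step could look roughly like the sketch below. The function name install_spark is hypothetical, and the archive file name in the usage comment is an assumption; the notes only show that the examples jar is version 2.1.0.

```shell
# Extract a Spark archive into a destination directory and rename the
# unpacked top-level directory to "spark".
# Arguments: archive path, destination directory (hypothetical helper).
install_spark() {
    archive=$1
    dest=$2
    # The first archive entry gives the name of the top-level directory.
    top_dir=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
    tar -xzf "$archive" -C "$dest" && mv "$dest/$top_dir" "$dest/spark"
}

# Example (archive name is an assumption, adjust to the file you uploaded):
#   install_spark /root/spark-2.1.0-bin-hadoop2.7.tgz /usr/local
```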
cd /usr/local/spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.1.0.jar 10000
Create a text file hello.txt under /root
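The notes do not show what hello.txt contains. As a minimal sketch, the helper below writes three lines of made-up sample text; both the function name write_sample_input and the sample lines are assumptions for illustration.

```shell
# Write a small sample input file to the given path.
# The three lines are invented sample text, not taken from the original notes.
write_sample_input() {
    printf '%s\n' \
        'hello spark' \
        'hello hadoop' \
        'hello world' > "$1"
}

# Example:
#   write_sample_input /root/hello.txt
```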
./bin/spark-shell
Open a second terminal and run jps to list the Java processes; you will see a spark-submit process (jps lists it as SparkSubmit)
sc
sc.textFile("/root/hello.txt")
val lineRDD = sc.textFile("/root/hello.txt")
lineRDD.foreach(println)
Check the job status in the web UI (port 4040 on this machine by default)
val wordRDD = lineRDD.flatMap(line => line.split(" "))
wordRDD.collect
val wordCountRDD = wordRDD.map(word => (word,1))
wordCountRDD.collect
val resultRDD = wordCountRDD.reduceByKey((x,y)=>x+y)
resultRDD.collect
val orderedRDD = resultRDD.sortByKey(false)
orderedRDD.collect
orderedRDD.saveAsTextFile("/root/result")
Inspect the result (saveAsTextFile creates /root/result as a directory of part-* files)
Shorthand: sc.textFile("/root/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortByKey().collect
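To sanity-check the Spark word count on a small file, the same counts can be reproduced in another terminal with standard Unix tools. This is a different technique, not Spark, and count_words is a hypothetical helper name. It assumes words are separated by single spaces; unlike split(" "), it ignores the empty strings produced by repeated spaces.

```shell
# Print "word count" pairs for a file, sorted ascending by word,
# which mirrors the ordering of sortByKey() for ASCII input.
count_words() {
    tr -s ' ' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Example:
#   count_words /root/hello.txt
```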
start-dfs.sh
In spark-shell, run: sc.textFile("hdfs://192.168.56.100:9000/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortByKey().collect (after adding master to /etc/hosts, the IP can be replaced with master)
sc.textFile("hdfs://192.168.56.100:9000/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortByKey().saveAsTextFile("hdfs://192.168.56.100:9000/output1")
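The HDFS commands above read /hello.txt from HDFS and write to /output1, so the input has to be uploaded first and the output is read back with the HDFS shell. A minimal sketch, assuming the hdfs command is on the PATH and its default file system is the cluster at 192.168.56.100:9000; the function names are hypothetical.

```shell
# Upload a local file to /hello.txt in HDFS, overwriting an existing copy.
upload_input() {
    hdfs dfs -put -f "$1" /hello.txt
}

# Print the job output; saveAsTextFile writes a directory of part-* files.
# The glob is quoted so that HDFS expands it, not the local shell.
show_output() {
    hdfs dfs -cat "$1/part-*"
}

# Example:
#   upload_input /root/hello.txt
#   show_output /output1
```

Note that saveAsTextFile fails if the output directory already exists, so /output1 has to be removed (hdfs dfs -rm -r /output1) before running the job again.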
Extract Spark on master and on every slave
On master, edit conf/slaves and add the hostname of each slave
Edit conf/spark-env.sh and add: export SPARK_MASTER_HOST=master
Copy spark-env.sh to every slave
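Copying spark-env.sh to each slave can be scripted with scp. The sketch below assumes Spark lives at the same path on every machine and that root can log in over SSH; push_spark_env and the host names slave1 and slave2 in the usage comment are hypothetical.

```shell
# Copy conf/spark-env.sh from the local Spark directory to the same
# path on each listed host.
# Arguments: Spark directory, then one or more slave hostnames.
push_spark_env() {
    spark_home=$1
    shift
    for host in "$@"; do
        scp "$spark_home/conf/spark-env.sh" "root@$host:$spark_home/conf/" || return 1
    done
}

# Example (host names are placeholders for the entries in conf/slaves):
#   push_spark_env /usr/local/spark slave1 slave2
```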
cd /usr/local/spark
./sbin/start-all.sh
On c, run: ./bin/spark-shell --master spark://192.168.56.100:7077 (the master URL can also be set in a configuration file)
Check the cluster status at http://master:8080
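For the configuration-file alternative to --master mentioned above, Spark reads the spark.master property from conf/spark-defaults.conf. The helper below is a sketch with a hypothetical name, set_default_master; it rewrites that one property and leaves other lines in the file untouched.

```shell
# Set spark.master in <spark dir>/conf/spark-defaults.conf.
# Arguments: Spark directory, master URL.
set_default_master() {
    conf_file=$1/conf/spark-defaults.conf
    touch "$conf_file"
    # Drop any earlier spark.master line so the file holds a single setting.
    grep -v '^spark\.master[[:space:]]' "$conf_file" > "$conf_file.tmp" || true
    printf 'spark.master %s\n' "$2" >> "$conf_file.tmp"
    mv "$conf_file.tmp" "$conf_file"
}

# Example (on machine c):
#   set_default_master /usr/local/spark spark://192.168.56.100:7077
```

With this property in place, ./bin/spark-shell can be started without the --master option.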