How to install Spark locally is not described again here.
List-1
# In SPARK_HOME/conf
>cp slaves.template slaves
# Then write the hostname into slaves, as follows
>more slaves
mjduan-host
>cp spark-env.sh.template spark-env.sh
# Edit spark-env.sh and add the following; set SPARK_MASTER_IP to mjduan-host
>more spark-env.sh
export JAVA_HOME=/opt/software/tool/jdk1.8
export HADOOP_HOME=/opt/software/docker/hadoop/hadoop-2.7.7
export SCALA_HOME=/opt/software/tool/scala2.12
export HADOOP_CONF_DIR=/opt/software/docker/hadoop/hadoop-2.7.7/etc/hadoop
export SPARK_MASTER_IP=mjduan-host
export SPARK_WORKER_MEMORY=2048M
Then go into $SPARK_HOME/sbin and run start-all.sh to check whether startup succeeds; after that, open localhost:8080 to see the Spark web UI.
The jps command should show a Master and a Worker process.
Note: Spark itself will start fine, but when a program is submitted to it, Spark reads data from HDFS by default rather than from the local filesystem. So Hadoop must be installed as well; when installing Hadoop, set up HDFS and YARN along with it. See the sketch below for the path behavior.
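To make the HDFS-vs-local distinction concrete, here is a minimal Java sketch; the namenode address and the file paths are assumptions for illustration only, not values taken from this setup. With HADOOP_CONF_DIR set as in List-1, a bare path or an hdfs:// URL is resolved against HDFS, while an explicit file:// scheme reads from the local filesystem (the file must then exist on every worker node).

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PathSchemeDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("PathSchemeDemo");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Bare and hdfs:// paths go to HDFS (the namenode host/port here is an assumption).
        JavaRDD<String> fromHdfs = sc.textFile("hdfs://mjduan-host:9000/data/input.txt");

        // file:// forces the local filesystem instead.
        JavaRDD<String> fromLocal = sc.textFile("file:///tmp/input.txt");

        System.out.println("HDFS lines: " + fromHdfs.count());
        System.out.println("local lines: " + fromLocal.count());
        sc.stop();
    }
}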
When submitting a job to Spark, if you do not know the URL to pass after --master, you can find it in the logs or in the Spark UI.
List-2
spark-submit --class com.mjduan.project.SimpleApp --master spark://mjduan-host:7077 Spark-helloworld.jar
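The source of Spark-helloworld.jar is not shown in this post. As a reference point, a minimal SimpleApp in the spirit of the official Spark quick-start might look like the Java sketch below; the input path is a placeholder and the filter logic is purely illustrative. Packaged into a jar (for example with Maven), it is what the spark-submit command in List-2 launches, with the master URL taken from --master rather than hard-coded. The lambdas require Java 8, which matches the jdk1.8 configured in List-1.

package com.mjduan.project;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleApp {
    public static void main(String[] args) {
        // The master URL is supplied by spark-submit (--master), not set here.
        SparkConf conf = new SparkConf().setAppName("SimpleApp");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Placeholder input; by default this path is resolved on HDFS.
        JavaRDD<String> lines = sc.textFile("/user/mjduan/README.md");

        long numAs = lines.filter(line -> line.contains("a")).count();
        long numBs = lines.filter(line -> line.contains("b")).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}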
To install Hadoop-2.7 and Hive-2.3, see https://blog.csdn.net/u013332124/article/details/85223496. The Hadoop tutorial linked there does not set up YARN, but YARN has to be configured, otherwise running INSERT from the Hive command line fails; for configuring YARN see https://blog.csdn.net/linbo_18874208784/article/details/74178236. When installing Hive, an error may be reported that the user is not allowed to impersonate hive; this requires changing the Hive configuration files, see https://stackoverflow.com/questions/40603714/hive-is-not-allowed-to-impersonate-hive
For installing Spark in pseudo-distributed mode, see: https://blog.csdn.net/zhihaoma/article/details/52296645