Building the Docker image
docker build --rm -t sequenceiq/spark:1.4.0 .
The -t option sets the tag of the sequenceiq/spark image you are building, just like ubuntu:13.10. The --rm option tells Docker to delete the intermediate containers once the build is complete; every instruction in the Dockerfile creates a temporary container, and you normally do not need these intermediate containers.
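If the build succeeds, you can verify that the image is available locally (a quick check, not part of the original walkthrough):
# list locally available images for this repository
docker images sequenceiq/spark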
Running the image
docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.4.0 bash
or
docker run -d -h sandbox sequenceiq/spark:1.4.0 -d
If you use -p or -P, the container exposes ports on the host, so anyone who can connect to the host can connect to the inside of the container. With -P, Docker picks an unused port on the host in the range 49153 to 65535 and binds it to the container. You can use docker port to look up which port was randomly bound.
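A minimal sketch of the -P workflow (the container name spark-sandbox is arbitrary, and the trailing -d is the same argument passed to the image's startup script as in the detached command above):
# publish all exposed ports to random high host ports
docker run -d -P --name spark-sandbox -h sandbox sequenceiq/spark:1.4.0 -d
# look up which host port was bound to container port 8088
docker port spark-sandbox 8088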
If you append -d=true or -d to docker run, the container runs in background (detached) mode. In that case all I/O has to go through the network or shared volumes, because the container no longer listens on the terminal where you ran docker run. You can reattach to the container's session with docker attach. Note that a container running in detached mode cannot use the --rm option.
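For example, assuming a container started with the detached command above, you can find it and reattach to it like this:
# list running containers to find the container ID or name
docker ps
# reattach to the container's session (replace <container-id> with the real ID)
docker attach <container-id>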
-p 8088:8088 maps the ResourceManager (cluster) port, and -p 8042:8042 maps the NodeManager port.
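Once the container is up with these mappings, you can check both web UIs from the host; the /cluster and /node paths below are the standard Hadoop 2.x endpoints, assuming the defaults have not been changed:
# ResourceManager web UI
curl http://localhost:8088/cluster
# NodeManager web UI
curl http://localhost:8042/node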
Version
Hadoop 2.6.0 and Apache Spark v1.4.0 on CentOS
Testing
There are two deploy modes that can be used to launch Spark applications on YARN.
In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
Estimating Pi (yarn-cluster mode):
# execute the following command which should write "Pi is roughly 3.1418" into the logs
# note you must specify --files argument in cluster mode to enable metrics
spark-submit \
--class org.apache.spark.examples.SparkPi \
--files $SPARK_HOME/conf/metrics.properties \
--master yarn-cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
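In yarn-cluster mode the result goes to the driver's container log rather than the console. One way to check it, assuming YARN log aggregation is available in this image, is the standard yarn logs command (the application ID is printed by spark-submit and also shown in the ResourceManager UI):
# fetch the aggregated application logs and grep for the result
yarn logs -applicationId <application-id> | grep "Pi is roughly"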
In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Estimating Pi (yarn-client mode):
# execute the following command which should print "Pi is roughly 3.1418" to the screen
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
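For interactive testing in yarn-client mode, the same resource options can also be passed to spark-shell (this simply mirrors the spark-submit flags above):
spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1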
Copyright notice: this is an original post by the author and may not be reproduced without the author's permission.