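Besides running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or with the launch scripts that ship with Spark. These daemons can also all run on a single machine for testing.
To install Spark in standalone mode, you only need to place a compiled version of Spark on each node of the cluster. You can download a pre-built release from the official site or build Spark yourself to suit your needs.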
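2. Starting a cluster manually
You can start a standalone master server by running:
./sbin/start-master.sh
Once started, the master prints a spark://HOST:PORT URL. You can use this URL to connect workers to it, or pass it as the "master" argument to SparkContext. The URL is also shown on the master's web UI, which is at http://localhost:8080 by default. Using http://<master-ip>:8080 instead lets any machine on the same local network as the master reach it.
Similarly, you can start one or more workers and register them with the master by running:
./sbin/start-slave.sh <master-spark-URL>
Once a worker has started, the master's web UI lists it together with its details, such as the number of CPU cores and the amount of memory.
Finally, the following configuration options can be passed to the master and the worker.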
| Argument | Meaning |
| --- | --- |
| -h HOST, --host HOST | Hostname to listen on |
| -i HOST, --ip HOST | Hostname to listen on (deprecated, use -h or --host) |
| -p PORT, --port PORT | Port for service to listen on (default: 7077 for master, random for worker) |
| --webui-port PORT | Port for web UI (default: 8080 for master, 8081 for worker) |
| -c CORES, --cores CORES | Total CPU cores to allow Spark applications to use on the machine (default: all available); only on worker |
| -m MEM, --memory MEM | Total amount of memory to allow Spark applications to use on the machine, in a format like 1000M or 2G (default: your machine's total RAM minus 1 GB); only on worker |
| -d DIR, --work-dir DIR | Directory to use for scratch space and job output logs (default: SPARK_HOME/work); only on worker |
| --properties-file FILE | Path to a custom Spark properties file to load (default: conf/spark-defaults.conf) |
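As a rough illustration of how these options combine, the sketch below assembles a worker start command that caps the worker at 4 cores and 8 GB. The host name master-host, the resource figures, and the /opt/spark fallback for SPARK_HOME are assumptions made for this example. It also assumes the start script forwards extra options to the worker. The command is printed rather than executed, so the sketch runs without a Spark installation.

```shell
# Hypothetical values; replace them with those of your own cluster.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"     # assumed install location
MASTER_URL="spark://master-host:7077"      # the URL printed by the master at startup

# Options from the table above: limit the worker to 4 cores and 8 GB of memory.
WORKER_OPTS="--cores 4 --memory 8G"

WORKER_CMD="$SPARK_HOME/sbin/start-slave.sh $MASTER_URL $WORKER_OPTS"

# Print only; run the command itself (without echo) on a real worker machine.
echo "$WORKER_CMD"
```

Leaving out --cores and --memory would fall back to the defaults listed in the table, that is, all available cores and total RAM minus 1 GB.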
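3. Cluster launch scripts
To launch a standalone cluster with the launch scripts, create a file called conf/slaves in your SPARK_HOME directory. It must contain the hostnames of all the machines where you intend to start workers, one per line. If conf/slaves does not exist, the launch scripts default to a single node on the local machine, which is useful for testing. Note that the master machine reaches each worker machine over ssh, so it needs ssh access to every host listed in the file.
Once this file is set up, you can start or stop the cluster with the shell scripts in SPARK_HOME/sbin (for example start-all.sh and stop-all.sh), which are modeled on Hadoop's deploy scripts.
Note that these scripts must be run on the machine where you want the Spark master to run, not on your local machine.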
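A minimal sketch of preparing conf/slaves follows. The three host names are hypothetical, and a scratch directory stands in for SPARK_HOME so that nothing real is overwritten.

```shell
# A scratch directory stands in for SPARK_HOME in this illustration.
DEMO_HOME="$(mktemp -d)"
mkdir -p "$DEMO_HOME/conf"

# One worker host name per line (hypothetical hosts).
cat > "$DEMO_HOME/conf/slaves" <<'EOF'
worker1.example.com
worker2.example.com
worker3.example.com
EOF

# Count the hosts the launch scripts would try to reach over ssh.
WORKER_COUNT=$(grep -c . "$DEMO_HOME/conf/slaves")
echo "conf/slaves lists $WORKER_COUNT workers"
```

On a real master machine the file would live at SPARK_HOME/conf/slaves, and the cluster would then be started from that machine with the scripts in SPARK_HOME/sbin.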
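You can optionally configure the cluster further by setting the environment variables below in conf/spark-env.sh. This file must be present on every machine in the cluster.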
| Environment Variable | Meaning |
| --- | --- |
| SPARK_MASTER_IP | Bind the master to a specific IP address, for example a public one. |
| SPARK_MASTER_PORT | Start the master on a different port (default: 7077). |
| SPARK_MASTER_WEBUI_PORT | Port for the master web UI (default: 8080). |
| SPARK_MASTER_OPTS | Configuration properties that apply only to the master in the form "-Dx=y" (default: none). See below for a list of possible options. |
| SPARK_LOCAL_DIRS | Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. |
| SPARK_WORKER_CORES | Total number of cores to allow Spark applications to use on the machine (default: all available cores). |
| SPARK_WORKER_MEMORY | Total amount of memory to allow Spark applications to use on the machine, e.g. 1000m, 2g (default: total memory minus 1 GB); note that each application's individual memory is configured using its spark.executor.memory property. |
| SPARK_WORKER_PORT | Start the Spark worker on a specific port (default: random). |
| SPARK_WORKER_WEBUI_PORT | Port for the worker web UI (default: 8081). |
| SPARK_WORKER_INSTANCES | Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, or else each worker will try to use all the cores. |
| SPARK_WORKER_DIR | Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work). |
| SPARK_WORKER_OPTS | Configuration properties that apply only to the worker in the form "-Dx=y" (default: none). See below for a list of possible options. |
| SPARK_DAEMON_MEMORY | Memory to allocate to the Spark master and worker daemons themselves (default: 1g). |
| SPARK_DAEMON_JAVA_OPTS | JVM options for the Spark master and worker daemons themselves in the form "-Dx=y" (default: none). |
| SPARK_PUBLIC_DNS | The public DNS name of the Spark master and workers (default: none). |
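To make the variables above concrete, here is a sketch of a small conf/spark-env.sh. Every value in it (address, port, sizes, directories) is hypothetical, and the file is written to a scratch directory instead of a real SPARK_HOME/conf.

```shell
# Scratch directory used in place of SPARK_HOME/conf for this illustration.
DEMO_CONF="$(mktemp -d)"

cat > "$DEMO_CONF/spark-env.sh" <<'EOF'
export SPARK_MASTER_IP=192.168.1.10
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=4      # set explicitly because two workers share each machine
export SPARK_WORKER_MEMORY=8g
export SPARK_LOCAL_DIRS=/data1/spark,/data2/spark
EOF

# Source the file the way a shell script would, then inspect the result.
. "$DEMO_CONF/spark-env.sh"
TOTAL_CORES=$(($SPARK_WORKER_INSTANCES * $SPARK_WORKER_CORES))
echo "Each machine offers $TOTAL_CORES cores to Spark"
```

Setting SPARK_WORKER_CORES alongside SPARK_WORKER_INSTANCES follows the note in the table: without it, each worker instance would try to use every core on the machine.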
SPARK_MASTER_OPTS supports the following system properties:
| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.deploy.retainedApplications | 200 | The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit. |
| spark.deploy.retainedDrivers | 200 | The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit. |
| spark.deploy.spreadOut | true | Whether the standalone cluster manager should spread applications out across nodes or try to consolidate them onto as few nodes as possible. Spreading out is usually better for data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. |
| spark.deploy.defaultCores | (infinite) | Default number of cores to give to applications in Spark's standalone mode if they don't set spark.cores.max. If not set, applications always get all available cores unless they configure spark.cores.max themselves. Set this lower on a shared cluster to prevent users from grabbing the whole cluster by default. |
| spark.worker.timeout | 60 | Number of seconds after which the standalone deploy master considers a worker lost if it receives no heartbeats. |
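Several of these properties can be combined in a single SPARK_MASTER_OPTS value. The sketch below shows one way to build it up. The chosen values (8 default cores, consolidation instead of spreading out, a 120-second timeout) are illustrative assumptions, not recommendations.

```shell
# Illustrative values only.
DEFAULT_CORES=8        # cap for applications that do not set spark.cores.max
WORKER_TIMEOUT=120     # seconds without heartbeats before a worker is considered lost

# Each property is passed to the master JVM in the form -Dx=y.
SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=$DEFAULT_CORES"
SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -Dspark.deploy.spreadOut=false"
SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -Dspark.worker.timeout=$WORKER_TIMEOUT"
export SPARK_MASTER_OPTS

echo "$SPARK_MASTER_OPTS"
```

Lines like these would normally sit in conf/spark-env.sh on the master machine.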
SPARK_WORKER_OPTS supports the following system properties:
| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.worker.cleanup.enabled | false | Enable periodic cleanup of worker / application directories. Note that this only affects standalone mode, as YARN works differently. Only the directories of stopped applications are cleaned up. |
| spark.worker.cleanup.interval | 1800 (30 minutes) | Controls the interval, in seconds, at which the worker cleans up old application work dirs on the local machine. |
| spark.worker.cleanup.appDataTtl | 7 * 24 * 3600 (7 days) | The number of seconds to retain application work directories on each worker. This is a Time To Live and should depend on the amount of available disk space you have. Application logs and jars are downloaded to each application work dir. Over time, the work dirs can quickly fill up disk space, especially if you run jobs very frequently. |
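Because both cleanup settings are expressed in seconds, it can help to compute them instead of typing the numbers by hand. The sketch below enables cleanup with a hypothetical 3-day retention and an hourly sweep; both figures are assumptions chosen for the example.

```shell
# Hypothetical retention policy: keep application directories for 3 days,
# and check for expired ones once an hour.
TTL_SECONDS=$((3 * 24 * 3600))
INTERVAL_SECONDS=3600

SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.interval=$INTERVAL_SECONDS"
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.appDataTtl=$TTL_SECONDS"
export SPARK_WORKER_OPTS

echo "$SPARK_WORKER_OPTS"
```

A shorter TTL suits machines with little disk space or very frequent jobs, since every application leaves its logs and jars in the work directory.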
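4. Submitting applications to the cluster
To run an application on the Spark cluster, pass the master's spark://IP:PORT URL to the SparkContext constructor.
To run an interactive Spark shell against the cluster, run:
./bin/spark-shell --master spark://IP:PORT
You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.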
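For instance, a shell session limited to 4 cores could be launched with a command like the one assembled below. The master address and the core count are hypothetical, and /opt/spark is only an assumed fallback for SPARK_HOME. The command is printed rather than executed so that the sketch runs without Spark installed.

```shell
SPARK_HOME="${SPARK_HOME:-/opt/spark}"     # assumed install location
MASTER_URL="spark://192.168.1.10:7077"     # hypothetical master address
SHELL_CORES=4                              # cores the shell may use across the cluster

SHELL_CMD="$SPARK_HOME/bin/spark-shell --master $MASTER_URL --total-executor-cores $SHELL_CORES"

# Print only; run the command itself to open the interactive shell.
echo "$SHELL_CMD"
```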
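5. Launching Spark applications
The spark-submit script is the most straightforward way to submit an application to the cluster. For standalone clusters, Spark currently supports two deploy modes. In client mode, the driver runs on the machine that issues the submit command. In cluster mode, the driver is launched on one of the worker nodes inside the cluster.
If your application is launched through spark-submit, the application jar is automatically distributed to all worker nodes. Any additional jars that your application depends on should be specified with the --jars option, using a comma as the delimiter (for example, --jars jar1,jar2).
In addition, standalone cluster mode can restart your application automatically if it exits with a non-zero exit code. To use this feature, pass the --supervise flag to spark-submit when launching the application. If you then need to kill an application that keeps failing, you can do so with:
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>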
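A cluster-mode submission that uses --supervise and --jars might be put together as in the sketch below. The master address, main class, and jar paths are all hypothetical placeholders. As before, the command is only printed, so the sketch runs without a Spark installation.

```shell
SPARK_HOME="${SPARK_HOME:-/opt/spark}"             # assumed install location
MASTER_URL="spark://192.168.1.10:7077"             # hypothetical master address
MAIN_CLASS="com.example.MyApp"                     # hypothetical application class
APP_JAR="/path/to/my-app.jar"                      # hypothetical application jar
EXTRA_JARS="/path/to/dep1.jar,/path/to/dep2.jar"   # comma-separated dependency jars

SUBMIT_CMD="$SPARK_HOME/bin/spark-submit --class $MAIN_CLASS --master $MASTER_URL"
SUBMIT_CMD="$SUBMIT_CMD --deploy-mode cluster --supervise --jars $EXTRA_JARS $APP_JAR"

# Print only; run the command itself to submit the application.
echo "$SUBMIT_CMD"
```

Dropping --deploy-mode cluster and --supervise gives a plain client-mode submission, where the driver stays on the submitting machine.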
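6. Resource scheduling
The standalone cluster mode currently supports only a simple FIFO scheduler across applications. To allow multiple concurrent users, however, you can cap the resources each application uses. By default, an application acquires all the cores in the cluster, which only makes sense if you run one application at a time. You can limit the number of cores by setting spark.cores.max in your SparkConf, for example: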
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster(...)
  .setAppName(...)
  .set("spark.cores.max", "10") // cap this application at 10 cores
val sc = new SparkContext(conf)
Alternatively, you can set spark.deploy.defaultCores on the cluster master to change the default for applications that do not set spark.cores.max, as follows:
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=<value>"
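7. Monitoring and logging
Spark's standalone mode offers a web-based user interface to monitor the cluster. The master and each worker have their own web UI. By default, the master's web UI is available on port 8080. The port can be changed either in the configuration file or through command-line options.
In addition, detailed log output for each job is written to the work directory of each slave node (SPARK_HOME/work by default). There you will find two files for each job, stdout and stderr.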
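8. Running alongside Hadoop
You can run Spark alongside your existing Hadoop cluster by simply launching it as a separate service on the same machines. To access Hadoop data from Spark, just use an hdfs:// URL (typically hdfs://<namenode>:9000/path). Alternatively, you can set up a separate cluster for Spark and still have it access HDFS over the network, although this is likely to be slower than disk-local access.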
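9. Configuring ports for network security
Spark makes heavy use of the network, and some environments have strict firewall requirements. For the list of ports to configure, see the security page.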
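10. High availability
By default, standalone scheduling clusters are resilient to worker failures. However, the scheduler relies on a master to make scheduling decisions, and by default there is only one. This creates a single point of failure: if the master crashes, no new applications can be created. To avoid this, two high-availability schemes are available, described below.
10.1 Standby masters with ZooKeeper
Using ZooKeeper for leader election and some state storage, you can launch multiple masters in your cluster connected to the same ZooKeeper instance. One is elected "leader" and the others remain in standby mode. If the current leader dies, another master is elected, recovers the old master's state, and resumes scheduling. The whole recovery process takes between 1 and 2 minutes. Note that this delay only affects the scheduling of new applications; applications that were already running are unaffected.
Configuration
To enable this recovery mode, set SPARK_DAEMON_JAVA_OPTS in spark-env.sh with the following properties: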
| System property | Meaning |
| --- | --- |
| spark.deploy.recoveryMode | Set to ZOOKEEPER to enable standby Master recovery mode (default: NONE). |
| spark.deploy.zookeeper.url | The ZooKeeper cluster url (e.g., 192.168.1.100:2181,192.168.1.101:2181). |
| spark.deploy.zookeeper.dir | The directory in ZooKeeper to store recovery state (default: /spark). |
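Put together, the three properties might appear in spark-env.sh roughly as follows. The ZooKeeper addresses are the example ones from the table and /spark is the documented default directory; substitute the values of your own ensemble.

```shell
# Example ZooKeeper ensemble (taken from the table above) and state directory.
ZK_URL="192.168.1.100:2181,192.168.1.101:2181"
ZK_DIR="/spark"

SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER"
SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.zookeeper.url=$ZK_URL"
SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.zookeeper.dir=$ZK_DIR"
export SPARK_DAEMON_JAVA_OPTS

echo "$SPARK_DAEMON_JAVA_OPTS"
```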
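Details
If you already have a ZooKeeper cluster set up, enabling high availability is straightforward. Simply start multiple master processes on different nodes with the same ZooKeeper configuration (ZooKeeper URL and directory). Masters can be added and removed at any time.
To schedule new applications or add workers to the cluster, they need to know the address of the current leader. This is done by passing a list of masters where you would otherwise pass a single one. For example, you might start your application with spark://host1:port1,host2:port2. If host1 goes down, this configuration still works, because the application will find the new leader, host2.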
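The multi-master URL is just the individual host:port pairs joined by commas after a single spark:// prefix. A small sketch of building it from a list, with hypothetical host names and the default master port:

```shell
# Hypothetical master addresses, separated by spaces.
MASTERS="host1:7077 host2:7077 host3:7077"

# Join them with commas and add the scheme once at the front.
MASTER_URL="spark://$(echo "$MASTERS" | tr ' ' ',')"

echo "$MASTER_URL"
```

The resulting value can be used wherever a single master URL is accepted, for example as the --master argument of spark-shell.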
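10.2 Single-node recovery with the local file system
ZooKeeper is the best way to achieve high availability, but if you only want to be able to restart the master when it goes down, file system mode can take care of it. When applications and workers register, enough of their state is written to the configured directory that it can be recovered when the master process restarts.
Configuration
To enable this recovery mode, set SPARK_DAEMON_JAVA_OPTS in spark-env.sh with the following properties: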
| System property | Meaning |
| --- | --- |
| spark.deploy.recoveryMode | Set to FILESYSTEM to enable single-node recovery mode (default: NONE). |
| spark.deploy.recoveryDirectory | The directory in which Spark will store recovery state, accessible from the Master's perspective. |
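A corresponding spark-env.sh fragment could look like the sketch below. The directory /var/spark/recovery is a hypothetical path; any location that the master process can read and write would do.

```shell
# Hypothetical recovery directory; it must be accessible to the master process.
RECOVERY_DIR="${RECOVERY_DIR:-/var/spark/recovery}"

SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM"
SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.recoveryDirectory=$RECOVERY_DIR"
export SPARK_DAEMON_JAVA_OPTS

echo "$SPARK_DAEMON_JAVA_OPTS"
```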
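Details
Although it is not officially supported, you can mount an NFS directory as the recovery directory. If the original master node dies, you can then start a master on a different node, and it will correctly recover all previously registered applications and workers.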