The scripts covered earlier (spark-submit, spark-class, and the launcher project) mainly prepare the runtime environment: dependency packages, JVM options, and so on. The actual submission is handled by the SparkSubmit class under the deploy package in the Spark code.
As mentioned before, the main entry point of the SparkSubmit class in the deploy package is the runMain method.
Let's look at the other methods first.
1. prepareSubmitEnvironment
This method prepares the environment and arguments for the submission.
It first determines the cluster manager (YARN, Mesos, Kubernetes, or standalone) and the deploy mode (client or cluster).
Based on this information, it later selects the appropriate backend and wrapper classes; a simplified sketch of the classification step follows.
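To make the branching below easier to follow, here is a minimal sketch of that classification. It is not the Spark source: the real code in SparkSubmit uses private integer constants for the cluster managers, and the names here (SubmitEnvSketch, ClusterManager, etc.) are made up for illustration.

object SubmitEnvSketch {
  sealed trait ClusterManager
  case object Yarn extends ClusterManager
  case object Standalone extends ClusterManager
  case object Mesos extends ClusterManager
  case object Kubernetes extends ClusterManager
  case object Local extends ClusterManager

  // Classify the cluster manager from the --master URL, mirroring the kind of
  // prefix matching prepareSubmitEnvironment performs.
  def clusterManager(master: String): ClusterManager = master match {
    case "yarn"                        => Yarn
    case m if m.startsWith("spark://") => Standalone
    case m if m.startsWith("mesos://") => Mesos
    case m if m.startsWith("k8s://")   => Kubernetes
    case m if m.startsWith("local")    => Local
    case other => throw new IllegalArgumentException(s"Unknown master: $other")
  }

  // --deploy-mode defaults to client when not specified.
  def deployMode(mode: Option[String]): String = mode.getOrElse("client")
}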
The submission logic is hard to explain in one pass, because it covers so many kinds of deployment environments, each with its own specifics, so we will work through it slowly.
For cluster mode we will only look at two cases: YARN cluster and standalone cluster. Once those two are understood, the rest is easy to follow.
This method returns a 4-tuple (its exact shape is sketched after the list):
@return a 4-tuple:
* (1) the arguments for the child process,
* (2) a list of classpath entries for the child,
* (3) a map of system properties, and
* (4) the main class for the child
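Note that in recent Spark versions the "map of system properties" is carried as a SparkConf, which matches the destructuring in the runMain excerpt later in this post. A hedged sketch of the shape (the alias name SubmitEnv is mine, not Spark's):

import org.apache.spark.SparkConf

object SubmitEnvShape {
  // (childArgs, childClasspath, conf, childMainClass)
  type SubmitEnv = (Seq[String], Seq[String], SparkConf, String)
}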
Core code:
if (deployMode == CLIENT) {
  childMainClass = args.mainClass
  if (localPrimaryResource != null && isUserJar(localPrimaryResource)) {
    childClasspath += localPrimaryResource
  }
  if (localJars != null) {
    childClasspath ++= localJars.split(",")
  }
}
// Add the main application jar and any added jars to classpath in case YARN client
// requires these jars.
// This assumes both primaryResource and user jars are local jars, or already downloaded
// to local by configuring "spark.yarn.dist.forceDownloadSchemes", otherwise it will not be
// added to the classpath of YARN client.
if (isYarnCluster) {
  if (isUserJar(args.primaryResource)) {
    childClasspath += args.primaryResource
  }
  if (args.jars != null) {
    childClasspath ++= args.jars.split(",")
  }
}

if (deployMode == CLIENT) {
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}

if (args.isStandaloneCluster) {
  if (args.useRest) {
    childMainClass = REST_CLUSTER_SUBMIT_CLASS
    childArgs += (args.primaryResource, args.mainClass)
  } else {
    // In legacy standalone cluster mode, use Client as a wrapper around the user class
    childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
    if (args.supervise) { childArgs += "--supervise" }
    Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
    Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
    childArgs += "launch"
    childArgs += (args.master, args.primaryResource, args.mainClass)
  }
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}

// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}
This block is the very core of the method. It decides, for each combination of cluster manager and deploy mode, which class is used to wrap our Spark program so that the submission flow fits the target cluster environment.
Let's spend a bit more time analyzing it.
First, look at childMainClass (the three constants are summarized in the sketch after this list):
Standalone cluster (REST mode): REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()
YARN cluster: YARN_CLUSTER_SUBMIT_CLASS = "org.apache.spark.deploy.yarn.YarnClusterApplication"
Standalone cluster (legacy, non-REST mode): STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
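Summarizing the branches above as a standalone sketch (my own helper, not Spark source; the fully-qualified class names are the ones I believe these constants resolve to):

object WrapperClassSketch {
  // Which class ends up as childMainClass in cluster deploy mode.
  // In client deploy mode, childMainClass is simply args.mainClass.
  def wrapperClass(isYarnCluster: Boolean,
                   isStandaloneCluster: Boolean,
                   useRest: Boolean): String =
    if (isYarnCluster) {
      "org.apache.spark.deploy.yarn.YarnClusterApplication"
    } else if (isStandaloneCluster && useRest) {
      "org.apache.spark.deploy.rest.RestSubmissionClientApp"
    } else if (isStandaloneCluster) {
      "org.apache.spark.deploy.ClientApp"
    } else {
      sys.error("client mode: childMainClass is args.mainClass, no wrapper")
    }
}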
2. runMain
Once the 4-tuple has been obtained in the previous step, the runMain flow begins.
Core code first:
private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
  val loader = getSubmitClassLoader(sparkConf)
  for (jar <- childClasspath) {
    addJarToClasspath(jar, loader)
  }

  var mainClass: Class[_] = null
  try {
    mainClass = Utils.classForName(childMainClass)
  } catch {
    case e: ClassNotFoundException =>
      // error reporting elided in this excerpt
      throw e
  }

  // If the class already implements SparkApplication, instantiate it directly;
  // otherwise wrap its static main method in a JavaMainApplication.
  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
  } else {
    new JavaMainApplication(mainClass)
  }

  try {
    app.start(childArgs.toArray, sparkConf)
  } catch {
    case t: Throwable =>
      throw findCause(t)
  }
}
Once prepareSubmitEnvironment is clear, runMain is straightforward: it instantiates childMainClass (a subclass of SparkApplication) and then calls its start method.
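For reference, the SparkApplication abstraction that runMain programs against is just a thin trait; this is essentially what it looks like in the Spark source (access modifiers omitted):

import org.apache.spark.SparkConf

trait SparkApplication {
  // Entry point invoked by runMain with the prepared child arguments and config.
  def start(args: Array[String], conf: SparkConf): Unit
}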
Note that in client mode (as opposed to cluster mode), childMainClass is args.mainClass itself. Since a user main class normally does not implement SparkApplication, it is wrapped in a JavaMainApplication:
new JavaMainApplication(mainClass)
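A minimal sketch of what that wrapper does, consistent with my reading of the Spark source (the Sketch suffix marks it as illustrative): it copies the resolved SparkConf into system properties and invokes the wrapped class's static main method via reflection, reusing the SparkApplication trait sketched above.

import java.lang.reflect.Modifier
import org.apache.spark.SparkConf

class JavaMainApplicationSketch(klass: Class[_]) extends SparkApplication {
  override def start(args: Array[String], conf: SparkConf): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    require(Modifier.isStatic(mainMethod.getModifiers),
      "The main method in the given main class must be static")
    // Expose the resolved Spark configuration to the user code as system properties.
    conf.getAll.foreach { case (k, v) => sys.props(k) = v }
    mainMethod.invoke(null, args)
  }
}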
What remains is to look at the implementation logic of RestSubmissionClientApp and org.apache.spark.deploy.yarn.YarnClusterApplication.