Spark Source Code: Task Scheduling

Date: 2019-11-06

Tags: spark, source code, task scheduling    Category: Spark


The main topics covered: task scheduling (Schedule), the shuffle mechanism, Executor, Task, BlockManager, DAG, SchedulerBackend and TaskSetManager.

1. Job Execution Flow

SparkContext.runJob --> DAGScheduler.runJob --> DAGScheduler.submitJob, which posts a JobSubmitted event to the DAGScheduler's event loop

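DAGScheduler.handleJobSubmitted --> creates the finalStage (a ResultStage), then calls submitStage, which recursively submits any missing parent stages first --> DAGScheduler.submitMissingTasks --> decides whether to create ShuffleMapTasks or ResultTasks (Spark's two task types) --> builds a TaskSet --> taskScheduler.submitTasks submits it for execution, where taskScheduler is a TaskSchedulerImpl.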
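The Scala sketch below makes the recursion in submitStage concrete. It is a toy model built on simplifying assumptions, not Spark's code. SimpleStage, ToyDagScheduler and stageFinished are hypothetical names. No tasks actually run. Stage completion is signalled by hand, whereas the real DAGScheduler reacts to CompletionEvents.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a Spark stage (not Spark's Stage class).
final case class SimpleStage(id: Int, parents: List[SimpleStage], isShuffleMap: Boolean)

// Toy model of DAGScheduler.submitStage / submitMissingTasks.
final class ToyDagScheduler {
  private val finished = mutable.Set.empty[Int]
  private val running  = mutable.Set.empty[Int]
  private val waiting  = mutable.LinkedHashSet.empty[SimpleStage]
  val submittedTaskSets = mutable.ArrayBuffer.empty[String]

  def submitStage(stage: SimpleStage): Unit =
    if (!finished(stage.id) && !running(stage.id) && !waiting(stage)) {
      val missing = stage.parents.filterNot(p => finished(p.id))
      if (missing.isEmpty) submitMissingTasks(stage)
      else {
        missing.foreach(submitStage) // submit missing parents first (recursively)
        waiting += stage             // this stage waits until its parents finish
      }
    }

  // One TaskSet per stage: ShuffleMapStage -> ShuffleMapTasks, ResultStage -> ResultTasks.
  private def submitMissingTasks(stage: SimpleStage): Unit = {
    val kind = if (stage.isShuffleMap) "ShuffleMapTask" else "ResultTask"
    submittedTaskSets += s"stage ${stage.id}: TaskSet of ${kind}s"
    running += stage.id
  }

  // Hypothetical hook; in Spark this happens while handling the stage's last CompletionEvent.
  def stageFinished(id: Int): Unit = {
    running -= id
    finished += id
    val ready = waiting.filter(_.parents.forall(p => finished(p.id))).toList
    ready.foreach { s => waiting -= s; submitStage(s) }
  }

  def runningStages: Set[Int] = running.toSet
}

val map0        = SimpleStage(0, Nil, isShuffleMap = true)
val map1        = SimpleStage(1, Nil, isShuffleMap = true)
val resultStage = SimpleStage(2, List(map0, map1), isShuffleMap = false) // the finalStage
val dag = new ToyDagScheduler
dag.submitStage(resultStage) // submits stages 0 and 1; stage 2 waits
dag.stageFinished(0)         // stage 2 still waits for stage 1
dag.stageFinished(1)         // now stage 2's ResultTasks are submitted
```

A stage whose parents are not yet available is parked in a waiting set. It becomes a TaskSet only once every parent stage has finished.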

The flow of TaskSchedulerImpl.submitTasks:

createTaskSetManager --> SchedulerBackend.reviveOffers() --> in cluster mode this is CoarseGrainedSchedulerBackend.reviveOffers():

```scala
override def reviveOffers(): Unit = {
  driverEndpoint.send(ReviveOffers)
}
```

ReviveOffers --> makeOffers() --> launchTasks() --> executorEndpoint.send(LaunchTask(...))

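In other words, ReviveOffers is sent to the driver's own DriverEndpoint. When the DriverEndpoint handles it, it makes resource offers and then sends a LaunchTask message to each chosen executor.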
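The ReviveOffers --> makeOffers --> LaunchTask chain can be sketched as a single-threaded toy. All names here are hypothetical: ToyDriverEndpoint, ToyExecutorData and the simplified LaunchTask. There is no real RPC; "sending" LaunchTask just appends to a buffer. Task placement is a plain round-robin over executors with free cores and ignores data locality.

```scala
import scala.collection.mutable

// Hypothetical messages mirroring the names in the text (not Spark's real classes).
sealed trait Msg
case object ReviveOffers extends Msg
final case class LaunchTask(taskId: Int, executorId: String) extends Msg

// Driver-side bookkeeping for one registered executor.
final case class ToyExecutorData(id: String, var freeCores: Int)

// Toy DriverEndpoint: ReviveOffers triggers makeOffers, which hands pending
// tasks to executors with free cores and "sends" each of them a LaunchTask.
final class ToyDriverEndpoint(executors: Seq[ToyExecutorData], cpusPerTask: Int = 1) {
  val pendingTasks = mutable.Queue.empty[Int]
  // Stands in for executorEndpoint.send(LaunchTask(...)).
  val sentLaunchTasks = mutable.ArrayBuffer.empty[LaunchTask]

  def receive(msg: Msg): Unit = msg match {
    case ReviveOffers => makeOffers()
    case other        => throw new IllegalArgumentException(s"driver does not handle $other")
  }

  // Round-robin: each pass gives at most one task to every executor with enough free cores.
  private def makeOffers(): Unit = {
    var launchedInPass = true
    while (launchedInPass && pendingTasks.nonEmpty) {
      launchedInPass = false
      for (exec <- executors) {
        if (exec.freeCores >= cpusPerTask && pendingTasks.nonEmpty) {
          val taskId = pendingTasks.dequeue()
          exec.freeCores -= cpusPerTask
          sentLaunchTasks += LaunchTask(taskId, exec.id)
          launchedInPass = true
        }
      }
    }
  }
}

val exec1 = ToyExecutorData("exec-1", freeCores = 2)
val exec2 = ToyExecutorData("exec-2", freeCores = 1)
val driver = new ToyDriverEndpoint(Seq(exec1, exec2))
driver.pendingTasks ++= Seq(0, 1, 2, 3)
driver.receive(ReviveOffers) // in Spark: driverEndpoint.send(ReviveOffers)
```

Task 3 stays pending because both executors run out of free cores. In Spark, such a task waits for a later ReviveOffers or a status update that frees cores.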

On the executor side, the message is handled by CoarseGrainedExecutorBackend:

```scala
// Inside CoarseGrainedExecutorBackend.receive
case LaunchTask(data) =>
  if (executor == null) {
    exitExecutor(1, "Received LaunchTask command but executor was null")
  } else {
    val taskDesc = TaskDescription.decode(data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    executor.launchTask(this, taskDesc)
  }
```

2. SchedulerBackend

A typical SchedulerBackend implementation for cluster environments is CoarseGrainedSchedulerBackend.

First, look at the types of messages exchanged between the driver and the executors in a coarse-grained cluster:

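Broadly, they fall into two categories: executor-related messages and task-related messages. Executor-related messages cover registering and removing executors, status updates, and so on. Task-related messages include LaunchTask, KillTask and status updates. Scheduling and executing tasks across the cluster is mainly coordinated through the SchedulerBackend.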

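Besides CoarseGrainedSchedulerBackend, other implementations include LocalSchedulerBackend and StandaloneSchedulerBackend (the latter is itself a subclass of CoarseGrainedSchedulerBackend).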

2.1 Executor side

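CoarseGrainedExecutorBackend is started on the executor side and communicates with the SchedulerBackend on the driver. It first connects to the driver and fetches the application's configuration from it. It then creates a SparkEnv. Finally, it starts the CoarseGrainedExecutorBackend message-handling endpoint, which receives messages from the SchedulerBackend, such as requests to launch tasks or stop the executor.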

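When it receives the registration-success message (RegisteredExecutor) from the SchedulerBackend, it creates an Executor, which runs the tasks. CoarseGrainedExecutorBackend is only responsible for communicating with the SchedulerBackend; it is not the class that actually executes tasks.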
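The division of labor between the backend (messaging) and the Executor (running tasks) can be modeled as a small state machine. This is a hypothetical sketch. ToyExecutorBackend and ToyExecutor are illustrative names. Instead of calling exitExecutor, the toy records the exit reason.

```scala
import scala.collection.mutable

// Hypothetical messages modeled on the ones named in the text.
sealed trait ExecMsg
case object RegisteredExecutor extends ExecMsg
final case class RegisterExecutorFailed(reason: String) extends ExecMsg
final case class LaunchTask(taskId: Long) extends ExecMsg

// Stands in for Executor: the class that actually runs tasks.
final class ToyExecutor {
  val ran = mutable.ArrayBuffer.empty[Long]
  def launchTask(taskId: Long): Unit = { ran += taskId } // the real Executor uses a thread pool
}

// Stands in for CoarseGrainedExecutorBackend: it only reacts to messages.
final class ToyExecutorBackend {
  var executor: Option[ToyExecutor] = None
  var exitReason: Option[String] = None // instead of calling exitExecutor(...)

  def receive(msg: ExecMsg): Unit = msg match {
    case RegisteredExecutor          => executor = Some(new ToyExecutor)
    case RegisterExecutorFailed(why) => exitReason = Some(s"registration failed: $why")
    case LaunchTask(id) =>
      executor match {
        case Some(e) => e.launchTask(id)
        case None    => exitReason = Some("Received LaunchTask command but executor was null")
      }
  }
}

val early = new ToyExecutorBackend
early.receive(LaunchTask(1L)) // arrives before registration succeeded

val backend = new ToyExecutorBackend
backend.receive(RegisteredExecutor)
backend.receive(LaunchTask(7L))
```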

3. DAGScheduler

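The DAG scheduling class. It analyzes an RDD's shuffle dependencies and splits the job into multiple stages at the shuffle boundaries. Submission starts from the last stage, which recursively submits its parent stages, so the parent stages actually run first. There are two kinds of stage: ResultStage and ShuffleMapStage. Each stage is split into one task per partition, and the resulting task set is managed by a TaskSetManager.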
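Stage splitting can be sketched as a backward walk over the lineage. Narrow dependencies are pipelined into the current stage, while each shuffle dependency starts a new parent ShuffleMapStage. The types below are hypothetical toys, and RDDs are identified by name, which the sketch assumes is unique. Unlike Spark, the sketch does not reuse a ShuffleMapStage that is shared by several child stages.

```scala
import scala.collection.mutable

// Hypothetical toy lineage: an RDD's dependencies are either narrow
// (pipelined into the same stage) or shuffle (a stage boundary).
sealed trait Dep { def rdd: ToyRDD }
final case class NarrowDep(rdd: ToyRDD) extends Dep
final case class ShuffleDep(rdd: ToyRDD) extends Dep
final case class ToyRDD(name: String, deps: List[Dep] = Nil)

final case class ToyStage(rddNames: List[String], parents: List[ToyStage], isShuffleMap: Boolean)

// Walk backwards from `last`; every ShuffleDep starts a new parent ShuffleMapStage.
def buildStage(last: ToyRDD, isShuffleMap: Boolean): ToyStage = {
  val visited = mutable.LinkedHashSet.empty[String] // RDD names assumed unique
  val parents = mutable.ListBuffer.empty[ToyStage]
  def visit(rdd: ToyRDD): Unit =
    if (!visited.contains(rdd.name)) {
      visited += rdd.name
      rdd.deps.foreach {
        case NarrowDep(p)  => visit(p)                                     // same stage
        case ShuffleDep(p) => parents += buildStage(p, isShuffleMap = true) // stage boundary
      }
    }
  visit(last)
  ToyStage(visited.toList, parents.toList, isShuffleMap)
}

val text    = ToyRDD("textFile")
val pairs   = ToyRDD("map", List(NarrowDep(text)))
val reduced = ToyRDD("reduceByKey", List(ShuffleDep(pairs)))
val output  = ToyRDD("mapValues", List(NarrowDep(reduced)))

val finalStage = buildStage(output, isShuffleMap = false) // the ResultStage
```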

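DAGScheduler processes its events with an event loop (DAGSchedulerEventProcessLoop). Some events are handled by calling TaskSchedulerImpl methods; others are handled by DAGScheduler's own private methods.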
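The event-loop pattern itself is small: a blocking queue plus one thread that handles events one at a time, in the order they were posted. The sketch below is loosely modeled on Spark's EventLoop class, which DAGSchedulerEventProcessLoop extends. ToyEventLoop and the two event types are hypothetical. The real loop dispatches many more DAGSchedulerEvent types.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingDeque, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Minimal single-threaded event loop in the spirit of Spark's EventLoop.
abstract class ToyEventLoop[E](name: String) {
  private val queue   = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val thread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit =
      try {
        while (!stopped.get()) onReceive(queue.take()) // one event at a time, in post order
      } catch {
        case _: InterruptedException => () // stop() interrupts a blocked take()
      }
  }

  def start(): Unit = thread.start()
  def post(event: E): Unit = queue.put(event)
  def stop(): Unit = { stopped.set(true); thread.interrupt(); thread.join() }
  protected def onReceive(event: E): Unit
}

// A tiny, hypothetical subset of DAGSchedulerEvent, dispatched to handler "methods".
sealed trait ToyDagEvent
final case class JobSubmitted(jobId: Int) extends ToyDagEvent
final case class JobCancelled(jobId: Int) extends ToyDagEvent

val handledLog = java.util.Collections.synchronizedList(new java.util.ArrayList[String]())
val done = new CountDownLatch(2)
val loop = new ToyEventLoop[ToyDagEvent]("toy-dag-scheduler-event-loop") {
  override protected def onReceive(event: ToyDagEvent): Unit = {
    event match {
      case JobSubmitted(id) => handledLog.add(s"handleJobSubmitted($id)")
      case JobCancelled(id) => handledLog.add(s"handleJobCancellation($id)")
    }
    done.countDown()
  }
}
loop.start()
loop.post(JobSubmitted(1))
loop.post(JobCancelled(1))
done.await(5, TimeUnit.SECONDS)
loop.stop()
```

Because one thread handles every event, the handlers in this toy never run concurrently. They can therefore update shared state without extra locking.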

3.1 DAGSchedulerEvent message types

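Name	Description
JobSubmitted	Creates the finalStage (a ResultStage) and an ActiveJob for it, then calls submitStage. A job is always triggered by its last stage, i.e. the ResultStage. A stage and its parent stages carry the same jobId.
MapStageSubmitted	Handles a ShuffleMapStage submitted on its own; the counterpart of JobSubmitted. Also calls clearCacheLocs().
StageCancelled	Runs handleJobCancellation for each job that contains the stage. For each stage of the job, handleJobCancellation executes: { taskScheduler.cancelTasks(stageId, shouldInterruptThread); markStageAsFinished(stage, Some(failureReason)) }. The stage is removed from runningStages and listenerBus is notified.
JobCancelled	Runs failJobAndIndependentStages: cleans up runningStages, calls the corresponding TaskScheduler methods, notifies listenerBus, and so on.
JobGroupCancelled	Applies the JobCancelled handling to every job in the group.
AllJobsCancelled	Applies the JobCancelled handling to every active job.
BeginEvent	Very simple: listenerBus.post(SparkListenerTaskStart(task.stageId, stageAttemptId, taskInfo))
GettingResultEvent	Very simple: listenerBus.post(SparkListenerTaskGettingResult(taskInfo))
CompletionEvent	A task has finished. Based on its completion status and result, decides whether to resubmit it and whether the whole stage has finished, then updates state accordingly. Finally, posts the event to listenerBus. This handler is fairly long.
ExecutorAdded	Executor event: a new executor has started. Removes the executor from failedEpoch.
ExecutorLost	Executor event: an executor has shut down. Removes the executor's BlockManager information, updates the map output information of the ShuffleMapStages, and clears cacheLocs.
TaskSetFailed	Runs failJobAndIndependentStages on the dependent jobs and stages.
ResubmitFailedStages	Calls submitStage again for the stages that have failed.
executorHeartbeatReceived	Forwards the heartbeat to the BlockManager master and notifies listenerBus. (Strictly speaking, this is a DAGScheduler method called directly by TaskSchedulerImpl rather than an event.)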

4. TaskSchedulerImpl

One of the key classes of the task scheduling system, alongside DAGScheduler and TaskSetManager.

Main methods:

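Name	Description
start	Starts the backend. If speculation is enabled, also starts the periodic check for speculatable tasks (checkSpeculatableTasks).
executorLost	Removes the executor and notifies the DAGScheduler.
submitTasks	Creates a TaskSetManager and adds it to the schedulableBuilder to wait for scheduling, then calls backend.reviveOffers() to trigger scheduling.
cancelTasks	Sends KillTask messages through the backend.
stop	Stops the backend and the taskResultGetter.
executorHeartbeatReceived	Updates metrics and notifies the DAGScheduler.
killTaskAttempt	Sends a KillTask message through the backend (the backend forwards KillTask to the corresponding executor).
applicationId	Returns the application ID, obtained from the backend. Each application has one TaskSchedulerImpl.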
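The sketch below covers the submitTasks path and the default FIFO ordering of task sets. Under FIFO scheduling, Spark orders task sets by priority (the job ID) and then by stage ID. The classes here are hypothetical simplifications: ToyTaskSetManager, ToyFifoPool and ToyTaskScheduler. reviveOffers is passed in as a plain callback.

```scala
import scala.collection.mutable

// Toy TaskSetManager: priority plays the role of the job ID.
final case class ToyTaskSetManager(name: String, priority: Int, stageId: Int)

// FIFO comparator: earlier job first, ties broken by smaller stage ID.
def fifoLessThan(a: ToyTaskSetManager, b: ToyTaskSetManager): Boolean =
  if (a.priority != b.priority) a.priority < b.priority else a.stageId < b.stageId

final class ToyFifoPool {
  private val queue = mutable.ArrayBuffer.empty[ToyTaskSetManager]
  def addTaskSetManager(m: ToyTaskSetManager): Unit = { queue += m }
  // Order in which resourceOffers would visit the task sets.
  def sortedTaskSetQueue: List[ToyTaskSetManager] = queue.toList.sortWith(fifoLessThan)
}

// submitTasks in miniature: wrap the TaskSet, add it to the pool, then revive offers.
final class ToyTaskScheduler(reviveOffers: () => Unit) {
  val rootPool = new ToyFifoPool
  def submitTasks(name: String, jobId: Int, stageId: Int): Unit = {
    rootPool.addTaskSetManager(ToyTaskSetManager(name, jobId, stageId))
    reviveOffers() // in Spark: backend.reviveOffers()
  }
}

var revives = 0
val scheduler = new ToyTaskScheduler(() => revives += 1)
scheduler.submitTasks("job1-stage3", jobId = 1, stageId = 3)
scheduler.submitTasks("job0-stage2", jobId = 0, stageId = 2)
scheduler.submitTasks("job0-stage1", jobId = 0, stageId = 1)
```

submitTasks itself does not assign tasks to executors. It only queues the task set and asks the backend to make new offers. The actual placement happens later in resourceOffers.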

Other methods (not part of the TaskScheduler interface):

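Name	Description
resourceOffers	Called by the backend to obtain the tasks to run. It searches the pending tasks as follows: for each TaskSet, at each locality level from the most local allowed to the least, it repeatedly calls resourceOfferSingleTaskSet until no more tasks satisfying the constraints can be found. The result is an array of tasks to execute. As the final step, the backend launches them.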
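Below is a simplified model of the resourceOffers loop. It assumes only two locality levels (node-local and any) and one CPU per task. ToyOffer, ToyTask and ToyTaskSet are hypothetical stand-ins. Real Spark has more locality levels, shuffles the offers, and uses delay scheduling before relaxing locality; none of that is modeled here.

```scala
import scala.collection.mutable

// Hypothetical, simplified locality levels (Spark's TaskLocality has more).
sealed trait Locality
case object NodeLocal extends Locality
case object AnyLocality extends Locality

// Simplified stand-ins for a worker offer and a task with preferred hosts.
final case class ToyOffer(executorId: String, host: String, var freeCores: Int)
final case class ToyTask(id: Int, preferredHosts: Set[String])

final class ToyTaskSet(tasks: Seq[ToyTask]) {
  private val pending = mutable.ArrayBuffer.empty[ToyTask] ++= tasks
  // At most one task for this host, respecting the maximum allowed locality level.
  def resourceOffer(host: String, maxLocality: Locality): Option[ToyTask] = {
    val idx = maxLocality match {
      case NodeLocal   => pending.indexWhere(_.preferredHosts.contains(host))
      case AnyLocality => if (pending.nonEmpty) 0 else -1
    }
    if (idx >= 0) Some(pending.remove(idx)) else None
  }
}

// One pass over all offers for one task set at one locality level.
def resourceOfferSingleTaskSet(ts: ToyTaskSet, level: Locality, offers: Seq[ToyOffer],
                               out: mutable.ArrayBuffer[(Int, String)], cpusPerTask: Int): Boolean = {
  var launched = false
  for (offer <- offers if offer.freeCores >= cpusPerTask) {
    ts.resourceOffer(offer.host, level).foreach { task =>
      out += ((task.id, offer.executorId))
      offer.freeCores -= cpusPerTask
      launched = true
    }
  }
  launched
}

def resourceOffers(taskSets: Seq[ToyTaskSet], offers: Seq[ToyOffer], cpusPerTask: Int = 1): List[(Int, String)] = {
  val out = mutable.ArrayBuffer.empty[(Int, String)]
  for (ts <- taskSets; level <- List[Locality](NodeLocal, AnyLocality)) {
    var launched = true // repeat at this level until a full pass launches nothing
    while (launched) launched = resourceOfferSingleTaskSet(ts, level, offers, out, cpusPerTask)
  }
  out.toList
}

val offers = Seq(ToyOffer("exec-A", "hostA", 1), ToyOffer("exec-B", "hostB", 3))
val taskSet = new ToyTaskSet(Seq(
  ToyTask(0, Set("hostB")), ToyTask(1, Set("hostA")),
  ToyTask(2, Set("hostB")), ToyTask(3, Set("hostC"))))
val assignment = resourceOffers(Seq(taskSet), offers)
```

Each pass gives every offer at most one task, so tasks spread across executors. Task 3 prefers a host that has no offer. It is therefore placed only at the "any" level, after the node-local passes stop launching tasks.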

5. CoarseGrainedSchedulerBackend

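An implementation of the SchedulerBackend interface: the scheduling backend that passes scheduling messages across the cluster. CoarseGrainedSchedulerBackend owns a DriverEndpoint, which receives messages in its receive method and does the actual message handling.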

Main methods:

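Name	Description
start	Creates and starts the DriverEndpoint.
stop	Stops the DriverEndpoint.
makeOffers	A DriverEndpoint method. (1) Calls TaskSchedulerImpl.resourceOffers to find, across all executors, the tasks that can be assigned. (2) Calls launchTasks, which sends a LaunchTask message to each chosen executor.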

6. CoarseGrainedExecutorBackend

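Runs on the executor side. It receives messages from the driver's scheduler endpoint (mainly LaunchTask) and hands the tasks to an Executor for execution. A local SparkEnv is created when the executor starts.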

Initialization parameters:

Name	Description
driverUrl	Address of the driver endpoint to connect to
executorId	Unique ID of the executor
cores	Number of CPU cores of the executor
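These parameters arrive on the executor's command line. The sketch below parses them from an argument list. The flag spellings (--driver-url, --executor-id, --cores) follow Spark 2.x's CoarseGrainedExecutorBackend.main. ExecutorArgs and parseArgs are hypothetical simplifications, and the example driver URL is made up.

```scala
// Hypothetical container for the three parameters listed above.
final case class ExecutorArgs(driverUrl: String, executorId: String, cores: Int)

def parseArgs(args: List[String]): ExecutorArgs = {
  // Collect "--flag value" pairs into a map.
  @annotation.tailrec
  def loop(rest: List[String], acc: Map[String, String]): Map[String, String] = rest match {
    case flag :: value :: tail if flag.startsWith("--") => loop(tail, acc + (flag -> value))
    case Nil   => acc
    case other => throw new IllegalArgumentException(s"Unrecognized options: ${other.mkString(" ")}")
  }
  val opts = loop(args, Map.empty)
  def required(flag: String): String =
    opts.getOrElse(flag, throw new IllegalArgumentException(s"missing $flag"))
  ExecutorArgs(required("--driver-url"), required("--executor-id"), required("--cores").toInt)
}

val parsed = parseArgs(List(
  "--driver-url", "spark://CoarseGrainedScheduler@driver-host:7078", // illustrative value
  "--executor-id", "1",
  "--cores", "4"))
```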

Core code (the receive method):

```scala
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }

  case RegisterExecutorFailed(message) =>
    exitExecutor(1, "Slave registration failed: " + message)

  case LaunchTask(data) =>
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskDesc)
    }

  case KillTask(taskId, _, interruptThread, reason) =>
    if (executor == null) {
      exitExecutor(1, "Received KillTask command but executor was null")
    } else {
      executor.killTask(taskId, interruptThread, reason)
    }

  case StopExecutor =>
    stopping.set(true)
    logInfo("Driver commanded a shutdown")
    // Cannot shutdown here because an ack may need to be sent back to the caller. So send
    // a message to self to actually do the shutdown.
    self.send(Shutdown)

  case Shutdown =>
    stopping.set(true)
    new Thread("CoarseGrainedExecutorBackend-stop-executor") {
      override def run(): Unit = {
        // executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
        // However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
        // stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
        // Therefore, we put this line in a new thread.
        executor.stop()
      }
    }.start()
}
```