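[Interactive Q&A Sharing] Session 15: "Winning in the Era of Cloud Computing and Big Data", Spark Asia-Pacific Research Institute Public Lecture Series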

Date: 2019-11-08

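“Winning in the Era of Cloud Computing and Big Data”

Spark Asia-Pacific Research Institute 100-Session Public Lecture Series [Session 15 Interactive Q&A Sharing]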

Q1: What is the relationship between AppClient, the Worker, and the Master?

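In Standalone mode, AppClient is the application's representative on the client machine at the time SparkContext.runJob is called; it is responsible for functions such as registerApplication.
Once the application has registered, the Master sends a message to the client through Akka to start the Driver.
The Driver manages the Tasks and controls the Executors on the Workers so that they work together.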
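
The hand-off described in this answer can be pictured with a toy, in-memory model. This is only a sketch: the classes `ToyMaster`, `ToyWorker`, `ToyExecutor`, `ToyAppClient` and the function `run_tasks` are hypothetical names invented for illustration, plain method calls stand in for Akka messages, and none of it is Spark's actual API.

```python
class ToyExecutor:
    """Hypothetical stand-in for an Executor: runs tasks handed to it."""

    def __init__(self, executor_id):
        self.executor_id = executor_id

    def run(self, task):
        # A task is modelled as a zero-argument callable.
        return task()


class ToyWorker:
    """Hypothetical stand-in for a Worker node that hosts Executors."""

    def __init__(self, name):
        self.name = name

    def launch_executor(self, app_id):
        return ToyExecutor(f"{app_id}/{self.name}")


class ToyMaster:
    """Hypothetical stand-in for the Master: tracks apps and Workers."""

    def __init__(self, workers):
        self.workers = workers
        self.apps = {}

    def register_application(self, app_name):
        # Assign an id, then ask every Worker for one Executor (an assumption
        # made to keep the sketch small).
        app_id = f"app-{len(self.apps)}"
        self.apps[app_id] = app_name
        executors = [w.launch_executor(app_id) for w in self.workers]
        return app_id, executors


class ToyAppClient:
    """Hypothetical stand-in for AppClient: registers the app with the Master."""

    def __init__(self, master, app_name):
        self.master = master
        self.app_name = app_name
        self.app_id = None
        self.executors = []

    def start(self):
        self.app_id, self.executors = self.master.register_application(self.app_name)


def run_tasks(executors, tasks):
    """Driver-side scheduling, simplified here to round-robin assignment."""
    return [executors[i % len(executors)].run(task) for i, task in enumerate(tasks)]
```

For instance, creating a `ToyMaster` with two `ToyWorker` objects, calling `start()` on a `ToyAppClient`, and passing the resulting executors to `run_tasks` mirrors the register, launch, and run sequence in miniature.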

Q2: Is there a big difference between Spark's shuffle and Hadoop's shuffle?

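Spark's shuffle is a shuffle in a fairly strict sense: within the lineage formed by the dependencies between RDD operations, a shuffle occurs when the contents of each partition of a parent RDD are handed to multiple partitions of the child RDD.
In Hadoop, shuffle is a comparatively loose concept: once the Mapper phase finishes and its data is handed to the Reducer, a shuffle takes place, and shuffle is the first of the Reducer's three phases.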
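
The "each parent partition feeds multiple child partitions" idea can be made concrete with a small pure-Python sketch. It does not use Spark; the functions `stable_hash`, `shuffle`, `fan_out`, and `reduce_by_key` are hypothetical helpers, and the byte-sum hash is an assumption chosen only so the result is deterministic.

```python
from collections import defaultdict


def stable_hash(key):
    """Deterministic toy hash: sum of the key's UTF-8 bytes (illustrative only)."""
    return sum(key.encode("utf-8"))


def shuffle(parent_partitions, num_child_partitions):
    """Redistribute (key, value) records so equal keys land in the same child partition."""
    child = [[] for _ in range(num_child_partitions)]
    for partition in parent_partitions:
        for key, value in partition:
            child[stable_hash(key) % num_child_partitions].append((key, value))
    return child


def fan_out(parent_partitions, num_child_partitions):
    """For each parent partition, list the child partitions it sends data to."""
    return [
        sorted({stable_hash(key) % num_child_partitions for key, _ in partition})
        for partition in parent_partitions
    ]


def reduce_by_key(partition):
    """Sum the values per key inside one child partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


# Two parent partitions, shuffled into two child partitions.
parents = [[("a", 1), ("b", 2)], [("b", 3), ("c", 4)]]
children = shuffle(parents, 2)
```

In this example `fan_out(parents, 2)` reports that both parent partitions send records to both child partitions, which is the wide (shuffle) dependency the answer describes; after the shuffle, each key can be reduced inside a single child partition.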

Q3: How does Spark handle HA?

In Standalone mode, Worker nodes are highly available automatically; for Master HA, ZooKeeper is typically used:
Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected “leader” and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master’s state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications – applications that were already running during Master failover are unaffected.
In YARN and Mesos modes, the resource manager (for example, YARN's ResourceManager) typically also relies on ZooKeeper for HA.
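
The standby-Master failover described above can be sketched as a toy, in-memory model. This is an illustration under stated assumptions, not ZooKeeper's or Spark's code: `ToyCoordinator` and `fail_over` are hypothetical names, and "lowest sequence number wins" is a simplified election rule assumed here for clarity.

```python
class ToyCoordinator:
    """Hypothetical in-memory stand-in for the coordination service."""

    def __init__(self):
        self._next_seq = 0
        self.candidates = {}  # sequence number -> Master name
        self.persisted_state = {"apps": [], "workers": []}

    def join(self, master_name):
        """Register a Master as a leader candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self.candidates[seq] = master_name
        return seq

    def leave(self, seq):
        """Remove a candidate, for example because its process died."""
        self.candidates.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number is the leader."""
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]


def fail_over(coordinator, dead_seq):
    """Drop the dead Master, elect a new leader, and hand it a copy of the stored state."""
    coordinator.leave(dead_seq)
    new_leader = coordinator.leader()
    recovered = {name: list(items) for name, items in coordinator.persisted_state.items()}
    return new_leader, recovered
```

A typical walk-through is to `join` three Masters, record an application in `persisted_state`, and then call `fail_over` with the first Master's sequence number: the second Master becomes leader and receives the recorded state, while the remaining standby stays in place.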