YARN Compared with MapReduce 1

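Apache YARN (Yet Another Resource Negotiator) was introduced in Hadoop 2. YARN provides resource management and application scheduling for the cluster, and its APIs are used to request and work with cluster resources.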

 

MapReduce 1:

JobTracker responsibilities:

(1) Job scheduling (matching tasks with TaskTrackers)

(2) Task progress monitoring (keeping track of tasks, restarting failed or slow tasks, and doing task bookkeeping, such as maintaining counter totals)

(3) Storing the history of completed jobs

TaskTracker responsibilities:

Running tasks and sending progress reports to the JobTracker.
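
The division of labor above can be sketched as a toy model. This is not Hadoop code: `ToyJobTracker`, `ToyTaskTracker`, and all of their methods are hypothetical names invented for illustration, and real MapReduce 1 heartbeats carry far more information. The sketch only shows that scheduling, monitoring, and job history all live in a single JobTracker process.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaskTracker:
    """Hypothetical stand-in for a TaskTracker: runs tasks, reports progress."""
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def heartbeat(self):
        # Progress report sent to the JobTracker.
        return {"tracker": self.name, "free_slots": self.free_slots}


class ToyJobTracker:
    """Hypothetical stand-in for a JobTracker: one process, three duties."""

    def __init__(self):
        self.pending = []   # (job, task) pairs waiting for a slot
        self.status = {}    # (job, task) -> tracker name, i.e. monitoring state
        self.history = []   # names of completed jobs

    def submit(self, job, tasks):
        self.pending.extend((job, t) for t in tasks)

    def on_heartbeat(self, tracker):
        free = tracker.heartbeat()["free_slots"]
        # (1) Scheduling: match pending tasks with the tracker's free slots.
        while free > 0 and self.pending:
            task = self.pending.pop(0)
            tracker.running.append(task)
            tracker.free_slots -= 1
            free -= 1
            # (2) Monitoring and bookkeeping: remember where the task runs.
            self.status[task] = tracker.name

    def complete(self, tracker, task):
        tracker.running.remove(task)
        tracker.free_slots += 1
        del self.status[task]
        job = task[0]
        still_pending = any(j == job for j, _ in self.pending)
        still_running = any(j == job for j, _ in self.status)
        # (3) History: record the job once none of its tasks remain.
        if not still_pending and not still_running:
            self.history.append(job)
```

In this toy model every task of every job passes through the single `ToyJobTracker` object, which mirrors why the real JobTracker becomes the bottleneck discussed in the sections that follow.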

Scalability:

MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.

YARN is designed to scale up to 10,000 nodes and 100,000 tasks.

Availability:

High availability (HA) is usually achieved by replicating the state needed for another daemon to take over the work needed to provide the service, in the event of the service daemon failing.

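The JobTracker's in-memory state is complex and changes constantly (each task status is updated every few seconds), which makes HA hard to support. In YARN, by contrast, the ResourceManager (RM), NodeManager (NM), and ApplicationMaster (AM) each support HA.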
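
To make the cost of replicating fast-changing state concrete, here is a minimal sketch of an active/standby pair. It is not how Hadoop implements HA: `ToyDaemon`, `ReplicatedService`, and `updates_per_second` are hypothetical names, replication is modeled as a plain synchronous copy, and the 4-second interval in the usage note is an assumed value standing in for "every few seconds".

```python
class ToyDaemon:
    """Hypothetical daemon holding the state a service needs."""

    def __init__(self):
        self.state = {}


class ReplicatedService:
    """Active/standby pair: every state change is copied to the standby."""

    def __init__(self):
        self.active = ToyDaemon()
        self.standby = ToyDaemon()
        self.replication_writes = 0

    def update(self, key, value):
        self.active.state[key] = value
        # Replicate so the standby could take over at any moment.
        self.standby.state[key] = value
        self.replication_writes += 1

    def failover(self):
        # The standby becomes active; a fresh, empty standby replaces it.
        self.active, self.standby = self.standby, ToyDaemon()


def updates_per_second(num_tasks, interval_seconds):
    """Rate of status updates if every task reports once per interval."""
    return num_tasks / interval_seconds
```

Under these assumptions, `updates_per_second(40000, 4)` gives a feel for how many replicated writes per second a single JobTracker-like daemon would have to sustain at the task count quoted in the Scalability section.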

Utilization:

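In MapReduce 1, each TaskTracker is configured with a fixed allocation of fixed-size slots, divided into map slots (which can run only map tasks) and reduce slots (which can run only reduce tasks). As a result, MRv1 can end up with map slots free but no reduce slots available, so reduce tasks have to wait. In addition, a slot that is too large wastes resources, while a slot that is too small may cause the task to fail.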

In YARN, each NodeManager manages a pool of resources. Resources are fine-grained, so an application simply requests the resources it needs.
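
The contrast between fixed slots and a fine-grained pool can be sketched as follows. This is a simplified model, not the YARN scheduler: `mrv1_can_run` and `ToyNodeManager` are hypothetical names, and memory and vcores are the only resources modeled.

```python
def mrv1_can_run(task_kind, free_map_slots, free_reduce_slots):
    """MRv1-style check: a task may only use a free slot of its own kind."""
    if task_kind == "map":
        return free_map_slots > 0
    return free_reduce_slots > 0


class ToyNodeManager:
    """Hypothetical NodeManager owning one undivided pool of resources."""

    def __init__(self, memory_mb, vcores):
        self.memory_mb = memory_mb
        self.vcores = vcores

    def allocate(self, memory_mb, vcores):
        # Grant exactly what was requested if the pool can still cover it.
        if memory_mb <= self.memory_mb and vcores <= self.vcores:
            self.memory_mb -= memory_mb
            self.vcores -= vcores
            return True
        return False
```

With four free map slots and no free reduce slots, `mrv1_can_run("reduce", 4, 0)` is false even though the node has idle capacity. A `ToyNodeManager(8192, 4)` has no such split: it can serve a 3072 MB request and a 4096 MB request of any task type back to back, and refuses only when the remaining pool is too small.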

Multitenancy:

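YARN's biggest advantage is that resource management has been separated out from the rest of Hadoop, so it can support distributed applications other than MapReduce. For example, Spark can use YARN as its cluster manager.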
