The Hadoop shuffle process

Date: 2021-01-12


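Hadoop's shuffle is split into a map-side shuffle and a reduce-side shuffle.

1. Map-side shuffle

As shown in the figure above, the map task first calls the InputFormat's getRecordReader method (createRecordReader in the newer org.apache.hadoop.mapreduce API) to obtain a RecordReader, which reads the input file. The records are read into memory and passed to the map method, which emits its output through context.write(). An OutputCollector gathers the emitted data and stores it in the circular buffer.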
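
The flow described here (reader, then map, then context.write(), then collector, then circular buffer) can be sketched in plain Java without any Hadoop dependency. This is only a toy model under stated assumptions: LineReader, Collector, RingBuffer and run are hypothetical names invented for illustration, not Hadoop classes, and the buffer stores whole strings rather than serialized bytes as the real map-side buffer does. The drain() method is a stand-in for whatever consumes the buffer downstream.

```java
import java.util.ArrayList;
import java.util.List;

public class MapSideShuffleSketch {

    /** Hypothetical stand-in for a RecordReader: hands out input lines one at a time. */
    static class LineReader {
        private final String[] lines;
        private int next = 0;
        LineReader(String[] lines) { this.lines = lines; }
        boolean hasNext() { return next < lines.length; }
        String nextLine() { return lines[next++]; }
    }

    /** Hypothetical fixed-capacity circular buffer of "key<TAB>value" records. */
    static class RingBuffer {
        private final String[] slots;
        private int head = 0; // index of the oldest record
        private int size = 0; // number of records currently stored
        RingBuffer(int capacity) { slots = new String[capacity]; }
        void put(String record) {
            if (size == slots.length) throw new IllegalStateException("buffer full");
            slots[(head + size) % slots.length] = record; // write position wraps around
            size++;
        }
        /** Removes and returns all records, oldest first. */
        List<String> drain() {
            List<String> out = new ArrayList<>();
            while (size > 0) {
                out.add(slots[head]);
                head = (head + 1) % slots.length;
                size--;
            }
            return out;
        }
    }

    /** Hypothetical collector: the object that a context.write()-style call hands records to. */
    static class Collector {
        private final RingBuffer buffer;
        Collector(RingBuffer buffer) { this.buffer = buffer; }
        void write(String key, int value) { buffer.put(key + "\t" + value); }
    }

    /** Word-count style map step: emits one (word, 1) pair per token. */
    static void map(String line, Collector context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) context.write(word, 1);
        }
    }

    /** Runs reader -> map -> collector -> buffer and returns the buffered records. */
    static List<String> run(String[] lines, int capacity) {
        RingBuffer buffer = new RingBuffer(capacity);
        Collector collector = new Collector(buffer);
        LineReader reader = new LineReader(lines);
        while (reader.hasNext()) map(reader.nextLine(), collector);
        return buffer.drain();
    }

    public static void main(String[] args) {
        for (String record : run(new String[] {"a b", "b c"}, 8)) {
            System.out.println(record);
        }
    }
}
```

Running main prints the four collected records in the order they were written (a, b, b, c, each with the value 1). The capacity of 8 is an arbitrary choice for the example; this sketch simply throws if the buffer fills up.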