Some Customizations in MapReduce: A Summary

Date: 2020-01-24

Tags: mapreduce, customization summary. Category: Hadoop


1. Counters: they let developers inspect how a job ran, and its various metrics, from a global point of view.

Get a counter: Counter myCounter = context.getCounter("group name", "counter name");

Set the counter's initial value: myCounter.setValue(initialValue);

Increment it: myCounter.increment(1);
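To make the get / set / increment cycle concrete without a cluster, here is a minimal plain-Java stand-in. CounterSketch and the "Records" / "malformed" names are hypothetical and only mimic the idea of counters addressed by group and name; this is not Hadoop's Counter class.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for grouped counters; this is NOT the Hadoop Counter class.
final class CounterSketch {
    // Key is "group:name", value is the current count.
    private final Map<String, Long> values = new HashMap<>();

    // Overwrite the counter with an explicit value.
    void setValue(String group, String name, long value) {
        values.put(group + ":" + name, value);
    }

    // Add delta to the counter, creating it if it does not exist yet.
    void increment(String group, String name, long delta) {
        values.merge(group + ":" + name, delta, Long::sum);
    }

    // Read the counter; a counter that was never touched reads as 0.
    long getValue(String group, String name) {
        return values.getOrDefault(group + ":" + name, 0L);
    }

    public static void main(String[] args) {
        CounterSketch counters = new CounterSketch();
        counters.setValue("Records", "malformed", 0L);
        // Hypothetical input lines; a line without a comma counts as malformed.
        for (String line : new String[] {"a,1", "broken", "b,2", ""}) {
            if (!line.contains(",")) {
                counters.increment("Records", "malformed", 1L);
            }
        }
        System.out.println(counters.getValue("Records", "malformed")); // prints 2
    }
}
```

In a real job the framework aggregates the counts from all tasks, which is what gives the global view described above.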

2. Combiners (local reduction)

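Each map task can produce a large amount of output. A combiner merges that output once on the map side, which reduces the amount of data sent to the reducers and so cuts network traffic.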

A combiner can only merge the output of its own local map task. It cannot run across map tasks, so a reduce phase is still needed.

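A combiner is optional, because for some logic (computing an average, for example) the result with a combiner differs from the result without one.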

job.setCombinerClass(MyReduce.class);
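The point about results differing is easy to check by hand. The sketch below uses plain Java and made-up values for a single key spread over two map tasks (none of this is Hadoop API): summing the per-map sums gives the same total, while averaging the per-map averages does not.

```java
import java.util.Arrays;
import java.util.List;

final class CombinerSketch {
    // Sum of all values, as a summing reducer would compute it.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Arithmetic mean of all values, as an averaging reducer would compute it.
    static double average(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    public static void main(String[] args) {
        // Hypothetical values for one key: map task 1 emits 1, 2, 3; map task 2 emits 10.
        long directSum = sum(Arrays.asList(1L, 2L, 3L, 10L));
        long combinedSum = sum(Arrays.asList(
                sum(Arrays.asList(1L, 2L, 3L)), sum(Arrays.asList(10L))));
        System.out.println("sum: " + directSum + " vs " + combinedSum); // sum: 16 vs 16

        double directAvg = average(Arrays.asList(1.0, 2.0, 3.0, 10.0));
        double combinedAvg = average(Arrays.asList(
                average(Arrays.asList(1.0, 2.0, 3.0)), average(Arrays.asList(10.0))));
        System.out.println("average: " + directAvg + " vs " + combinedAvg); // average: 4.0 vs 6.0
    }
}
```

So reusing the reducer class as the combiner, as in the setCombinerClass call above, is only safe when the reduce logic behaves like the sum here.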

3. Partitioner (partitioning)

1. The default partitioner in MapReduce is HashPartitioner.

2. Custom partitioner

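import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

class KpiPartitioner extends Partitioner<Text, KpiWritable> {
    // Keys with exactly 11 characters go to partition 0, all others to partition 1.
    @Override
    public int getPartition(Text key, KpiWritable value, int numPartitions) {
        return (key.toString().length() == 11) ? 0 : 1;
    }
}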

Then add the following to the main method:

job.setPartitionerClass(KpiPartitioner.class);

job.setNumReduceTasks(2);
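For comparison, here are the two partitioning rules side by side in plain Java, so they can be tried without a cluster. PartitionSketch and the sample keys are hypothetical. hashPartition uses the arithmetic of Hadoop's HashPartitioner (clear the sign bit of hashCode, then take the remainder), and lengthPartition repeats the rule from KpiPartitioner.

```java
final class PartitionSketch {
    // Default-style rule: non-negative hash code modulo the number of partitions.
    static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Custom rule from the text: 11-character keys go to partition 0, the rest to 1.
    static int lengthPartition(String key) {
        return key.length() == 11 ? 0 : 1;
    }

    public static void main(String[] args) {
        // Hypothetical keys: two 11-character strings and one shorter string.
        String[] keys = {"13800138000", "84138413", "13912345678"};
        for (String key : keys) {
            System.out.println(key
                    + " hash=" + hashPartition(key, 2)
                    + " length=" + lengthPartition(key));
        }
    }
}
```

The custom rule only ever returns 0 or 1, which is why the job is configured with two reduce tasks.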

4. Sorting and grouping

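1. When sorting happens in the map and reduce phases, only k2 is compared; v2 takes no part in the comparison. To make v2 count in the sort, combine k2 and v2 into a new class and use that class as k2.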

2. Grouping is also done by k2.

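import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.WritableComparator;

class NewGroup implements RawComparator<NewKey> {
    /**
     * Compares the specified byte sequences of two byte arrays.
     * b1: the first array to compare
     * b2: the second array to compare
     * s1: start position in the first array
     * s2: start position in the second array
     * l1, l2: lengths to compare
     */
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Only the first 8 bytes of each serialized key are compared.
        return WritableComparator.compareBytes(b1, s1, 8, b2, s2, 8);
    }

    @Override
    public int compare(NewKey o1, NewKey o2) {
        // TODO Auto-generated method stub
        return 0;
    }
}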

Then, in main:

job.setGroupingComparatorClass(NewGroup.class);
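The idea behind the composite key and the grouping comparator can be sketched in plain Java. PairKey and its two long fields are assumptions for illustration (the fields of NewKey are not shown here): sorting compares the whole key, so the old v2 takes part, while grouping looks only at the first field, much as NewGroup compares only the first 8 bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondarySortSketch {
    // Hypothetical composite key: first plays the old k2, second plays the old v2.
    static final class PairKey implements Comparable<PairKey> {
        final long first;
        final long second;

        PairKey(long first, long second) {
            this.first = first;
            this.second = second;
        }

        // Sort order: by first, then by second.
        @Override
        public int compareTo(PairKey other) {
            int byFirst = Long.compare(first, other.first);
            return byFirst != 0 ? byFirst : Long.compare(second, other.second);
        }
    }

    // Sort by the full key, then group on the first field only.
    static Map<Long, List<Long>> sortAndGroup(List<PairKey> keys) {
        List<PairKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        Map<Long, List<Long>> groups = new LinkedHashMap<>();
        for (PairKey key : sorted) {
            groups.computeIfAbsent(key.first, k -> new ArrayList<>()).add(key.second);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<PairKey> keys = Arrays.asList(
                new PairKey(3, 3), new PairKey(1, 2), new PairKey(3, 1),
                new PairKey(1, 1), new PairKey(2, 2));
        System.out.println(sortAndGroup(keys)); // {1=[1, 2], 2=[2], 3=[1, 3]}
    }
}
```

Each group corresponds to one reduce call, and the values inside it arrive already ordered by the second field.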