...
String cacheFilePath = "/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
DistributedCache.addCacheFile(new Path(cacheFilePath).toUri(), job.getConfiguration());
...
...
// Get the cached files from the current job
Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
for (Path path : paths) {
    if (path.toString().contains("cmc_unitparameter")) {
...
MR1 Path: hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000
MR1 Path: hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000
MR2 Path: /data4/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000006/part-m-00000
MR2 Path: /data17/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000002/part-m-00000
MR2 Path: /data23/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000005/part-m-00000

Comparing the two sets of paths, you can see why the distributed cache "stops working" under MR2: the mapper filters on the directory name cmc_unitparameter, which appears in the MR1 path but not in the MR2 one, where only the file name part-m-00000 survives.
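The mismatch can be reproduced without a cluster. The following is a minimal sketch using plain strings copied from the output above; the class name `CachePathCheck` and the method `matchesByDirectory` are hypothetical names introduced only for illustration.

```java
class CachePathCheck {
    // Mirrors the filter used in the mapper: a substring test on the full path string.
    static boolean matchesByDirectory(String path) {
        return path.contains("cmc_unitparameter");
    }

    public static void main(String[] args) {
        // Paths copied from the MR1 and MR2 output above.
        String mr1 = "hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
        String mr2 = "/data4/yarn/local/usercache/root/appcache/application_1394073762364_1884/"
                + "container_1394073762364_1884_01_000006/part-m-00000";

        System.out.println("MR1 matches: " + matchesByDirectory(mr1)); // MR1 matches: true
        System.out.println("MR2 matches: " + matchesByDirectory(mr2)); // MR2 matches: false
    }
}
```

Because the MR2 local path no longer carries the source directory name, any filter keyed on that directory silently skips the file.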
Fixing this is not hard:
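In fact, even in the MR1 days the code above was not quite proper, because it iterated over the entire distributed cache every time. We should use a small trick instead: createSymlink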
...
String cacheFilePath = "/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
Path inPath = new Path(cacheFilePath);
// The name after '#' is the symlink to the file above. Link names for different
// files must be distinct, but are otherwise yours to choose.
String inPathLink = inPath.toUri().toString() + "#" + "DIYFileName";
DistributedCache.addCacheFile(new URI(inPathLink), job.getConfiguration());
...
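To see what the `#` suffix does to the URI itself, here is a small sketch that uses only `java.net.URI` and no Hadoop classes. The class name `CacheLinkUri` and the helper `withLink` are hypothetical; the point is just that the link name travels in the URI fragment, separate from the path.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CacheLinkUri {
    // Appends "#linkName" to the file path so the fragment carries the symlink name.
    static URI withLink(String hdfsPath, String linkName) throws URISyntaxException {
        return new URI(hdfsPath + "#" + linkName);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = withLink("/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000", "DIYFileName");
        System.out.println(uri.getPath());     // /dsap/rawdata/cmc_unitparameter/20140308/part-m-00000
        System.out.println(uri.getFragment()); // DIYFileName
    }
}
```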
Once the symlink is added, the last component of the path is the DIYFileName you just chose:
/data4/yarn/local/usercache/root/appcache/application_1394073762364_1966/container_1394073762364_1966_01_000005/cmcs_paracontrolvalues
/data4/yarn/local/usercache/root/appcache/application_1394073762364_1966/container_1394073762364_1966_01_000005/cmc_unitparameter
// Open the cached file through its link name, relative to the task's working directory
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("DIYFileName")));
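A slightly more defensive way to read the linked file might look like the sketch below. It closes the reader automatically and fixes the charset to UTF-8, which is an assumption, since the original relies on the platform default. The class name `CacheFileReader` and the temporary stand-in file are hypothetical; inside a real task you would pass `Paths.get("DIYFileName")` instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CacheFileReader {
    // Reads every line of the given file; try-with-resources closes the reader.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the symlinked cache file, so the sketch runs outside Hadoop.
        Path standIn = Files.createTempFile("DIYFileName", ".txt");
        try {
            Files.write(standIn, Arrays.asList("k1\tv1", "k2\tv2"), StandardCharsets.UTF_8);
            System.out.println(readLines(standIn).size() + " lines read");
        } finally {
            Files.deleteIfExists(standIn);
        }
    }
}
```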
1. Hadoop multi-table join: a map side join example
http://my.oschina.net/leejun2005/blog/111963
2. Hadoop DistributedCache explained in detail
http://dongxicheng.org/mapreduce-nextgen/hadoop-distributedcache-details/
3. Iterative MapReduce solutions (2): DistributedCache
http://hongweiyi.com/2012/02/iterative-mapred-distcache/
4. Notes on DistributedCache