hadoop jobhistory记录下已运行完的MapReduce做业信息并存放在指定的HDFS目录下,默认状况下是没有启动的,须要配置完后手工启动服务。html
mapred-site.xml添加以下配置node
<property> <name>mapreduce.jobhistory.address</name> <value>hadoop000:10020</value> <description>MapReduce JobHistory Server IPC host:port</description> </property> <property> <name>mapreduce.jobhistory.webapp.address</name> <value>hadoop000:19888</value> <description>MapReduce JobHistory Server Web UI host:port</description> </property> <property> <name>mapreduce.jobhistory.done-dir</name> <value>/history/done</value> </property> <property> <name>mapreduce.jobhistory.intermediate-done-dir</name> <value>/history/done_intermediate</value></property>
启动history-server:web
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
中止history-server:浏览器
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
history-server启动以后,能够经过浏览器访问WEBUI: hadoop000:19888app
在hdfs上会生成两个目录webapp
hadoop fs -ls /history
drwxrwx--- - spark supergroup 0 2014-10-11 15:11 /history/done drwxrwxrwt - spark supergroup 0 2014-10-11 15:16 /history/done_intermediate
mapreduce.jobhistory.done-dir(/history/done): Directory where history files are managed by the MR JobHistory Server(已完成做业信息)
mapreduce.jobhistory.intermediate-done-dir(/history/done_intermediate): Directory where history files are written by MapReduce jobs.(正在运行做业信息)oop
测试:测试
经过hive查询city表观察hdfs文件目录和hadoop000:19888spa
hive> select id, name from city;
观察hdfs文件目录:code
1)历史做业记录是按照年/月/日的形式分别存放在相应的目录(/history/done/2014/10/11/000000);
2)每一个做业有2个不一样的后缀名的记录:jhist和xml
hadoop fs -ls /history/done/2014/10/11/000000
-rwxrwx--- 1 spark supergroup 22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist -rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml
观察WEBUI: hadoop000:19888
在WEBUI中展示了每一个job使用的Map/Reduce的数量、做业提交时间、做业启动时间、做业完成时间、Job ID、提交人User、队列等信息;
点击【job_1413011730351_0002】弹出页面显示相似信息:Aggregation is not enabled. Try the nodemanager at ......
解决方法: yarn-site.xml添加以下配置
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
重启yarn便可。
参考CDH文档:http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0/hadoop-project-dist/hadoop-common/ClusterSetup.html