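Continued from the previous article: https://my.oschina.net/zhzhenqin/blog/781670
The point of installing Tez on YARN is to give Hive or Pig an execution engine.
Hive supports several execution engines out of the box, including MapReduce, Tez, and Spark (supported in SparkSQL). Switching Hive over to Tez is therefore simple: just set the following in hive-site.xml: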
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
With hive.execution.engine set to tez, start Hive and run a query:
hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 6.19 s
--------------------------------------------------------------------------------
OK
1000000
Time taken: 6.611 seconds, Fetched: 1 row(s)
As you can see, my userinfo table contains 1,000,000 records, and a single count takes 6.19 s. Now switch the engine to mr:
set hive.execution.engine=mr;
Run the count on userinfo again:
hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104152022_c7e6c5bd-d456-4ec7-b895-c81a369aab27
Total jobs = 1
Launching Job 1 out of 1
Starting Job = job_1478229439699_0010, Tracking URL = http://localhost:8088/proxy/application_1478229439699_0010/
Kill Command = /Users/zhenqin/software/hadoop/bin/hadoop job -kill job_1478229439699_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-04 15:20:28,323 Stage-1 map = 0%, reduce = 0%
2016-11-04 15:20:34,587 Stage-1 map = 100%, reduce = 0%
2016-11-04 15:20:40,796 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1478229439699_0010
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 215 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1000000
Time taken: 19.46 seconds, Fetched: 1 row(s)
hive>
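As you can see, Tez is nearly 3 times faster than MapReduce here (6.611 s versus 19.46 s in total). In addition, when Hive runs on the Tez engine it shows a dynamic ==>> progress bar, whereas with mr the log only prints the map and reduce completion percentages. The Tez output is also much cleaner.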
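To make the "nearly 3 times" figure concrete, here is a minimal Python sketch that pulls the "Time taken" value out of the two summary lines shown above and divides one by the other. The helper names (`time_taken_seconds`, `speedup`) are made up for this illustration and are not part of Hive.

```python
import re

# Final summary lines copied from the two Hive runs shown above.
TEZ_SUMMARY = "Time taken: 6.611 seconds, Fetched: 1 row(s)"
MR_SUMMARY = "Time taken: 19.46 seconds, Fetched: 1 row(s)"


def time_taken_seconds(summary: str) -> float:
    """Extract the wall-clock seconds from a Hive 'Time taken' line."""
    match = re.search(r"Time taken: ([0-9.]+) seconds", summary)
    if match is None:
        raise ValueError(f"no 'Time taken' value in: {summary!r}")
    return float(match.group(1))


def speedup(slow_summary: str, fast_summary: str) -> float:
    """Ratio of the slower run's time to the faster run's time."""
    return time_taken_seconds(slow_summary) / time_taken_seconds(fast_summary)


if __name__ == "__main__":
    # Hypothetical helper output, based only on the two timings above.
    print(f"Tez speedup over mr: {speedup(MR_SUMMARY, TEZ_SUMMARY):.2f}x")
```

For these two runs the ratio works out to about 2.94. This is a single measurement on one small table, so treat it as an indication rather than a benchmark.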
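In the many complex SQL statements I tested, Tez was considerably faster than MapReduce, and how much faster depends on the complexity of the SQL. Simple selects do not show off Tez's advantage. When translating SQL internally, Tez can chain Map and Reduce stages in arbitrary combinations, such as Map, Reduce, Reduce, whereas MR is restricted to Map->Reduce->Map->Reduce. That is why Tez's advantage is so clear on complex SQL.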
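The difference between the two stage layouts can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not how Hive or Tez actually plan queries. It assumes every MapReduce job is exactly one map phase followed by at most one reduce phase, that consecutive jobs hand data over through HDFS, and that Tez runs the whole chain as a single DAG with no such handoff. `Plan`, `plan_as_mapreduce`, and `plan_as_tez` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Plan:
    jobs: int           # separately launched jobs (for Tez: DAGs)
    identity_maps: int  # pass-through map stages added only to fit Map->Reduce
    hdfs_handoffs: int  # intermediate results written to HDFS between jobs


def _validate(stages: Sequence[str]) -> None:
    if not stages or stages[0] != "map":
        raise ValueError("a plan must start with a map stage")
    for stage in stages:
        if stage not in ("map", "reduce"):
            raise ValueError(f"unknown stage: {stage!r}")


def plan_as_mapreduce(stages: Sequence[str]) -> Plan:
    """Fit a chain of stages into rigid Map->Reduce jobs (simplified model)."""
    _validate(stages)
    jobs = 1
    identity_maps = 0
    previous = stages[0]
    for stage in stages[1:]:
        if previous == "reduce":
            # A finished reduce ends its job, so the next stage starts a new one.
            jobs += 1
            if stage == "reduce":
                # Reduce directly after reduce needs a pass-through map first.
                identity_maps += 1
        previous = stage
    return Plan(jobs=jobs, identity_maps=identity_maps, hdfs_handoffs=jobs - 1)


def plan_as_tez(stages: Sequence[str]) -> Plan:
    """Run the same chain as one DAG (simplified model)."""
    _validate(stages)
    return Plan(jobs=1, identity_maps=0, hdfs_handoffs=0)


if __name__ == "__main__":
    chain = ["map", "reduce", "reduce"]
    print("mr :", plan_as_mapreduce(chain))
    print("tez:", plan_as_tez(chain))
```

In this model a Map, Reduce, Reduce chain costs MapReduce two jobs, one pass-through map, and one HDFS handoff, while Tez keeps it in a single DAG. Each extra reduce stage adds another job on the MapReduce side, which fits the observation that the gap grows with query complexity.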
Once the Tez Timeline mentioned in the previous article has been configured, every Tez DAG job shows up in the UI.