Hive on Tez: Performance Comparison of the Tez and MapReduce Engines

Date: 2019-11-10

This continues from the previous article: https://my.oschina.net/zhzhenqin/blog/781670

The point of installing Tez on YARN is to give Hive or Pig an execution engine.

Hive supports several execution engines: MapReduce, Tez, and Spark (Hive on Spark). Switching Hive over to Tez is therefore simple; just set the following in hive-site.xml:

<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>

With hive.execution.engine set to tez, start Hive and run a query:

hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 6.19 s     
--------------------------------------------------------------------------------
OK
1000000
Time taken: 6.611 seconds, Fetched: 1 row(s)

As you can see, my userinfo table holds 1,000,000 rows, and one count over it takes 6.19 s of DAG execution time (6.611 s in total). Now switch the engine to mr:

set hive.execution.engine=mr;

Run the count on userinfo again:

hive> select count(*) as c from userinfo;
Query ID = zhenqin_20161104152022_c7e6c5bd-d456-4ec7-b895-c81a369aab27
Total jobs = 1
Launching Job 1 out of 1
Starting Job = job_1478229439699_0010, Tracking URL = http://localhost:8088/proxy/application_1478229439699_0010/
Kill Command = /Users/zhenqin/software/hadoop/bin/hadoop job  -kill job_1478229439699_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-04 15:20:28,323 Stage-1 map = 0%,  reduce = 0%
2016-11-04 15:20:34,587 Stage-1 map = 100%,  reduce = 0%
2016-11-04 15:20:40,796 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_1478229439699_0010
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 215 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1000000
Time taken: 19.46 seconds, Fetched: 1 row(s)
hive>

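As you can see, Tez is nearly 3 times faster than MapReduce here (6.611 s versus 19.46 s). In addition, when Hive runs on the Tez engine it shows a live ==>> progress bar, whereas with mr the log only prints the map and reduce progress percentages. The Tez log output is also much cleaner.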
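To make the "nearly 3 times" figure concrete, here is a small sketch that pulls the seconds out of the two "Time taken" lines above and divides them. The helper names (`time_taken_seconds`, `speedup`) are my own and are not part of Hive; the two log lines are copied from the sessions above.

```python
import re

# "Time taken" lines copied verbatim from the two Hive sessions above.
TEZ_LOG = "Time taken: 6.611 seconds, Fetched: 1 row(s)"
MR_LOG = "Time taken: 19.46 seconds, Fetched: 1 row(s)"


def time_taken_seconds(log_line: str) -> float:
    """Extract the wall-clock seconds from a Hive 'Time taken' line."""
    match = re.search(r"Time taken: ([0-9.]+) seconds", log_line)
    if match is None:
        raise ValueError(f"no 'Time taken' field in: {log_line!r}")
    return float(match.group(1))


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


tez_s = time_taken_seconds(TEZ_LOG)
mr_s = time_taken_seconds(MR_LOG)
ratio = speedup(mr_s, tez_s)
print(f"Tez: {tez_s:.3f}s  MR: {mr_s:.3f}s  speedup: {ratio:.2f}x")
```

This is a single run on a single small table, so the ratio is only a rough indication, not a benchmark.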

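In the many complex SQL queries I tested, Tez was considerably faster than MapReduce every time; how much faster depends on the complexity of the SQL. Simple select statements and the like do not show off Tez's advantage. Internally, Tez can translate SQL into any combination of Map, Reduce, Reduce stages, whereas MR can only do Map->Reduce->Map->Reduce, so Tez's advantage is obvious on complex SQL.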
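The difference in stage layout can be sketched with a toy model. This is not how Hive or Tez actually plan a query; it only lists the stages for a hypothetical query that needs one scan followed by N reduce steps, under the assumption that MR must wrap every reduce step in its own Map->Reduce job while Tez can chain the reducers in one DAG. The function names `mr_plan` and `tez_plan` are made up for this illustration.

```python
from typing import List


def mr_plan(reduce_steps: int) -> List[str]:
    """Toy model: each reduce step becomes its own Map->Reduce job."""
    if reduce_steps < 1:
        raise ValueError("need at least one reduce step")
    stages: List[str] = []
    for _ in range(reduce_steps):
        stages += ["Map", "Reduce"]
    return stages


def tez_plan(reduce_steps: int) -> List[str]:
    """Toy model: one Map vertex followed by chained Reduce vertices."""
    if reduce_steps < 1:
        raise ValueError("need at least one reduce step")
    return ["Map"] + ["Reduce"] * reduce_steps


# A hypothetical query with two reduce steps, e.g. an aggregate then a sort.
print("MR :", " -> ".join(mr_plan(2)))
print("Tez:", " -> ".join(tez_plan(2)))
```

With a single reduce step, as in the count above, both plans are just Map -> Reduce, which matches the observation that simple queries do not show the structural difference.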

Once the Tez Timeline mentioned in the previous article is configured, every Tez DAG job will show up in the UI.