本文讨论 Hive explode 关键字使用, 并使用一个简单案例来验证结果.java
Hive 支持 array 和 map 类型, 可是如何统计 array 或 map 里的值, 一直没有找到好的方法. Pig 有行转列关键字 flatten. 查阅了不少 Hive 资料, 找到了 explode 关键字. 谨以此例来验证 Hive explode 功能.python
hive> create table if not exists explode_array > ( > userId string, > userName string, > tags array<string> > ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > COLLECTION ITEMS TERMINATED BY ','; OK Time taken: 0.942 seconds
数据样例:日志
00001 zhzhenqin 80,90 00002 hello java, 女 00003 world java,python,90
查询:code
hive> select * from explode_array limit 10; OK 00001 zhzhenqin ["80","90"] 00002 hello ["java"," 女"] 00003 world ["java","python","90"] Time taken: 0.041 seconds, Fetched: 3 row(s)
使用 explode 查询:string
hive> select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId; Total jobs = 1 ... 省略部分日志 Total MapReduce CPU Time Spent: 0 msec OK 00001 zhzhenqin 80 00001 zhzhenqin 90 00002 hello java 00002 hello 女 00003 world java 00003 world python 00003 world 90 Time taken: 20.104 seconds, Fetched: 7 row(s)
二次嵌套查询, 并统计:it
hive> select user_tag.tagId, count(*) as count from (select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId) as user_tag group by user_tag.tagId order by count DESC; Total jobs = 2 ... 省略日志 Total MapReduce CPU Time Spent: 0 msec OK java 2 90 2 python 1 80 1 女 1 Time taken: 39.994 seconds, Fetched: 5 row(s)
该例子是使用 array 类型, 咱们的用户和标签在 Hive 存储的是 map 类型, Map 的 key 为 tagid, value 为 weight. explode 也是支持 map 类型的.table
explode 在转 array 时, 输出一列; 转 map 时, 是输出2列, key 和 value 当作 2列输出.select
下面演示 Map 类型的 explode 用法:map
建表语句, 以及导入的数据 create table if not exists explode_map ( userId string, userName string, tags map<string, int> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':'; 数据 00001 zhzhenqin 80:1,90:2 00002 hello java:10,女:2 00003 world java:1,python:3,90:1
查询验证:方法
hive> select * from explode_map; OK 00001 zhzhenqin {"80":1,"90":2} 00002 hello {"java":10,"女":2} 00003 world {"java":1,"python":3,"90":1} Time taken: 0.04 seconds, Fetched: 3 row(s)
使用 explode 关键字查询:
hive>select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight; 00001 zhzhenqin 80 1 00001 zhzhenqin 90 2 00002 hello java 10 00002 hello 女 2 00003 world java 1 00003 world python 3 00003 world 90 1
内嵌查询及统计:
hive> select user_tag.tagId, count(*) as count from (select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight) as user_tag group by user_tag.tagId order by count DESC; java 2 90 2 女 1 python 1 80 1