在数据库SQL处理中,经常有行转列(Pivot)和列转行(Unpivot)的数据处理需求。本文以示例说明在Data Lake Analytics中,如何使用SQL的一些技巧,达到行转列(Pivot)和列转行(Unpivot)的目的。另外,DLA支持函数式表达式的处理逻辑、丰富的JSON数据处理函数和UNNEST的SQL语法,结合这些功能,可以实现很是丰富、强大的SQL数据处理语义和能力,本文也以JSON数据列展开为示例,说明在DLA中使用这种SQL的技巧。数据库
test_pivot表内容:json
+------+----------+---------+--------+ | id | username | subject | source | +------+----------+---------+--------+ | 1 | 张三 | 语文 | 60 | | 2 | 李四 | 数学 | 70 | | 3 | 王五 | 英语 | 80 | | 4 | 王五 | 数学 | 75 | | 5 | 王五 | 语文 | 57 | | 6 | 李四 | 语文 | 80 | | 7 | 张三 | 英语 | 100 | +------+----------+---------+--------+
SQL语句:数据结构
SELECT username, max(CASE WHEN subject = '语文' THEN source END) AS `语文`, max(CASE WHEN subject = '数学' THEN source END) AS `数学`, max(CASE WHEN subject = '英语' THEN source END) AS `英语` FROM test_pivot GROUP BY username ORDER BY username;
结果:app
+----------+--------+--------+--------+ | username | 语文 | 数学 | 英语 | +----------+--------+--------+--------+ | 张三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+
该方法思路上分为两个步骤:
第一步,经过map_agg函数把两个列的多行的值,映射为map;
第二步,经过map的输出,达到多列输出的目的。函数
第一步SQL:spa
SELECT username, map_agg(subject, source) kv FROM test_pivot GROUP BY username ORDER BY username;
第一步输出:code
+----------+-----------------------------------+ | username | kv | +----------+-----------------------------------+ | 张三 | {语文=60, 英语=100} | | 李四 | {数学=70, 语文=80} | | 王五 | {数学=75, 语文=57, 英语=80} | +----------+-----------------------------------+
能够看到map_agg的输出效果。orm
最终,该方法的SQL:element
SELECT username, if(element_at(kv, '语文') = null, null, kv['语文']) AS `语文`, if(element_at(kv, '数学') = null, null, kv['数学']) AS `数学`, if(element_at(kv, '英语') = null, null, kv['英语']) AS `英语` FROM ( SELECT username, map_agg(subject, source) kv FROM test_pivot GROUP BY username ) t ORDER BY username;
结果:字符串
+----------+--------+--------+--------+ | username | 语文 | 数学 | 英语 | +----------+--------+--------+--------+ | 张三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+
test_unpivot表内容:
+----------+--------+--------+--------+ | username | 语文 | 数学 | 英语 | +----------+--------+--------+--------+ | 张三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+
SQL语句:
SELECT username, subject, source FROM ( SELECT username, '语文' AS subject, `语文` AS source FROM test_unpivot WHERE `语文` is not null UNION SELECT username, '数学' AS subject, `数学` AS source FROM test_unpivot WHERE `数学` is not null UNION SELECT username, '英语' AS subject, `英语` AS source FROM test_unpivot WHERE `英语` is not null ) ORDER BY username;
结果:
+----------+---------+--------+ | username | subject | source | +----------+---------+--------+ | 张三 | 语文 | 60 | | 张三 | 英语 | 100 | | 李四 | 语文 | 80 | | 李四 | 数学 | 70 | | 王五 | 英语 | 80 | | 王五 | 语文 | 57 | | 王五 | 数学 | 75 | +----------+---------+--------+
SQL语句:
SELECT t1.username, t2.subject, t2.source FROM test_unpivot t1 CROSS JOIN UNNEST ( array['语文', '数学', '英语'], array[`语文`, `数学`, `英语`] ) t2 (subject, source) WHERE t2.source is not null
结果:
+----------+---------+--------+ | username | subject | source | +----------+---------+--------+ | 张三 | 语文 | 60 | | 张三 | 英语 | 100 | | 李四 | 语文 | 80 | | 李四 | 数学 | 70 | | 王五 | 语文 | 57 | | 王五 | 数学 | 75 | | 王五 | 英语 | 80 | +----------+---------+--------+
JSON数据的表达能力很是灵活,所以在数据库和SQL中,经常须要处理JSON数据,经常碰到稍复杂的需求,就是将JSON数据中的某些属性字段,进行展开转换,转成行、列的关系型表达。
下面以多个示例说明。
SQL示例:
SELECT t.m, t.n FROM ( SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data ) CROSS JOIN unnest(map_data) AS t(m, n);
结果:
+------+------+ | m | n | +------+------+ | foo | 1 | | bar | 2 | +------+------+
SQL示例:
SELECT json_extract(t.a, '$.a') AS a, json_extract(t.a, '$.b') AS b FROM ( SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS package_array ) CROSS JOIN UNNEST(package_array) AS t(a);
结果:
+------+------+ | a | b | +------+------+ | 1 | 2 | | 3 | 4 | +------+------+
SQL示例:
SELECT t.m AS _col1, t.n AS _col2 FROM ( SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS array_1, cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') AS array<JSON>) AS array_2 ) CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);
结果:
+---------------+-----------------+ | _col1 | _col2 | +---------------+-----------------+ | {"a":1,"b":2} | {"a":5,"b":6} | | {"a":3,"b":4} | {"a":7,"b":8} | | NULL | {"a":9,"b":10} | | NULL | {"a":11,"b":12} | +---------------+-----------------+
SQL示例:
SELECT json_extract(t.m, '$.a') AS _col1, json_extract(t.m, '$.b') AS _col2, json_extract(t.n, '$.a') AS _col3, json_extract(t.n, '$.b') AS _col4 FROM ( SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS array_1, cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') AS array<JSON>) AS array_2 ) CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);
结果:
+-------+-------+-------+-------+ | _col1 | _col2 | _col3 | _col4 | +-------+-------+-------+-------+ | 1 | 2 | 5 | 6 | | 3 | 4 | 7 | 8 | | NULL | NULL | 9 | 10 | | NULL | NULL | 11 | 12 | +-------+-------+-------+-------+
SQL示例:
SELECT count(*) AS cnt, package_name FROM ( SELECT t.a AS package_name FROM ( SELECT transform(packages_map_array, x -> Element_at(x, 'packageName')) AS package_array FROM ( SELECT cast(Json_extract(data_json, '$.packages') AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array FROM ( SELECT json_parse(data) AS data_json FROM ( SELECT '{ "packages": [ { "appName": "铁路12306", "packageName": "com.MobileTicket", "versionName": "4.1.9", "versionCode": "194" }, { "appName": "QQ飞车", "packageName": "com.tencent.tmgp.speedmobile", "versionName": "1.11.0.13274", "versionCode": "1110013274" }, { "appName": "掌阅", "packageName": "com.chaozh.iReaderFree", "versionName": "7.11.0", "versionCode": "71101" } ] }' AS data ) ) ) ) AS x (package_array) CROSS JOIN UNNEST(package_array) AS t (a) ) GROUP BY package_name ORDER BY cnt DESC;
结果:
+------+------------------------------+ | cnt | package_name | +------+------------------------------+ | 1 | com.MobileTicket | | 1 | com.tencent.tmgp.speedmobile | | 1 | com.chaozh.iReaderFree | +------+------------------------------+
原文连接 本文为云栖社区原创内容,未经容许不得转载。