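Yesterday a developer came to us DBAs and asked us to write a Hive SQL query.
Requirement:
There is a table t made up mainly of two columns: the airport name, airport, and the airport's position, distant. The goal is to get the names of every pair of airports whose distance from each other is less than 100.
The logic of this SQL is not very hard; the tricky part is removing the duplicate rows.
I simulated the table in MySQL (Hive's syntax is close to standard SQL) and inserted three rows, where a, b and c stand for three airport names. The structure is as follows: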
mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `airport` varchar(10) DEFAULT NULL,
  `distant` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> select * from t;
+---------+---------+
| airport | distant |
+---------+---------+
| a       |     130 |
| b       |     140 |
| c       |     150 |
+---------+---------+
3 rows in set (0.00 sec)
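Use != to filter out the comparison of an airport with itself, and use the abs function to take the absolute value, which gives the pairs of airports whose positions differ by less than 100: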
mysql> select t1.airport, t2.airport from t t1,t t2 where t1.airport != t2.airport and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| b       | a       |
| c       | a       |
| a       | b       |
| c       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
6 rows in set (0.00 sec)
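The self-join above can be reproduced outside MySQL. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL/Hive; the table and the three rows mirror the post, while the variable names `pairs` and `unordered` are only illustrative. It counts how many rows come back and how many distinct unordered pairs they represent.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (airport TEXT, distant INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 130), ("b", 140), ("c", 150)],
)

# Same predicate as the query in the post: skip self-pairs, keep close pairs.
pairs = conn.execute(
    """
    SELECT t1.airport, t2.airport
    FROM t t1, t t2
    WHERE t1.airport != t2.airport
      AND abs(t1.distant - t2.distant) < 100
    """
).fetchall()

# Treat (a, b) and (b, a) as the same pair by ignoring order.
unordered = {frozenset(p) for p in pairs}

print(len(pairs), len(unordered))
```

Every unordered pair is returned twice, once per ordering, which is the duplication the rest of the post deals with.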
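But here is the problem: (b,a) and (a,b), (c,a) and (a,c), (c,b) and (b,c) count as duplicates for our purposes. One row from each pair is enough to know which two airports are involved. So how do we get rid of the duplicates?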
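Neither distinct nor group by seemed to help here. In the end I asked a seasoned SQL expert, who pointed me to the hex() function, which converts a string to its hexadecimal representation; Hive has a corresponding function. It works like this: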
mysql> select t1.airport,hex(t1.airport), t2.airport,hex(t2.airport) from t t1,t t2 where t1.airport != t2.airport and abs(t1.distant-t2.distant) < 100;
+---------+-----------------+---------+-----------------+
| airport | hex(t1.airport) | airport | hex(t2.airport) |
+---------+-----------------+---------+-----------------+
| b       | 62              | a       | 61              |
| c       | 63              | a       | 61              |
| a       | 61              | b       | 62              |
| c       | 63              | b       | 62              |
| a       | 61              | c       | 63              |
| b       | 62              | c       | 63              |
+---------+-----------------+---------+-----------------+
6 rows in set (0.00 sec)
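To see why the hex values are usable as a sort key, here is a small sketch in plain Python. It assumes the names are UTF-8 encoded and uses `bytes.hex()` to mimic what hex() produces for a string; `names` and `hexed` are illustrative names, not part of the original query.

```python
names = ["a", "b", "c"]

# Hex-encode the UTF-8 bytes of each name, upper-cased like MySQL's output.
hexed = {n: n.encode("utf-8").hex().upper() for n in names}
print(hexed)

# Sorting by the hex string gives the same order as sorting by the raw bytes,
# because every byte maps to exactly two hex digits.
by_hex = sorted(names, key=lambda n: hexed[n])
by_bytes = sorted(names, key=lambda n: n.encode("utf-8"))
print(by_hex == by_bytes)
```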
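This lets us remove the duplicates by comparing airport 1 with airport 2: of the two orderings of a pair, only one satisfies hex(t1.airport) < hex(t2.airport), so only that row is kept.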
mysql> select t1.airport, t2.airport from t t1,t t2 where t1.airport != t2.airport and hex(t1.airport) < hex(t2.airport) and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| a       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
3 rows in set (0.00 sec)
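The deduplicating query can be checked the same way. This sketch again assumes SQLite via Python's sqlite3 module in place of MySQL/Hive (SQLite also provides hex() and abs()); the ORDER BY is added only to make the result order deterministic and is not part of the original query.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (airport TEXT, distant INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 130), ("b", 140), ("c", 150)],
)

# The extra hex(...) < hex(...) predicate keeps one ordering of each pair.
rows = conn.execute(
    """
    SELECT t1.airport, t2.airport
    FROM t t1, t t2
    WHERE t1.airport != t2.airport
      AND hex(t1.airport) < hex(t2.airport)
      AND abs(t1.distant - t2.distant) < 100
    ORDER BY t1.airport, t2.airport
    """
).fetchall()

print(rows)
```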
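One last simplification: the < comparison already excludes rows where the two airports are the same, so the != condition can be dropped. The result is as follows: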
mysql> select t1.airport, t2.airport from t t1,t t2 where hex(t1.airport) < hex(t2.airport) and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| a       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
3 rows in set (0.00 sec)
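As a sanity check that dropping != changes nothing, the sketch below runs both variants and compares them. It assumes SQLite via Python's sqlite3 module instead of MySQL/Hive; the `base` template and the names `with_ne` and `without_ne` are purely illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (airport TEXT, distant INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 130), ("b", 140), ("c", 150)],
)

# Query template; only the pair-selection predicate differs between runs.
base = (
    "SELECT t1.airport, t2.airport FROM t t1, t t2 "
    "WHERE {pred} AND abs(t1.distant - t2.distant) < 100 "
    "ORDER BY t1.airport, t2.airport"
)

with_ne = conn.execute(
    base.format(pred="t1.airport != t2.airport AND hex(t1.airport) < hex(t2.airport)")
).fetchall()
without_ne = conn.execute(
    base.format(pred="hex(t1.airport) < hex(t2.airport)")
).fetchall()

# A strict "<" can never hold between a value and itself, so "!=" is redundant.
print(with_ne == without_ne)
```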
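The SQL is not complicated and involves few joins or subqueries. Until now, whenever I needed to remove duplicates, distinct or group by did the job; this time they did not seem to apply, so I am writing the approach down. Feedback is welcome.
Reference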
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_hex