Alibaba's programmers are nothing special: stumped by a simple SQL query

Date: 2020-05-09


(Forgive the clickbait title. Spend a few minutes reading; it may be useful to you.)

Background

I recently ran into a "magical" problem at work. It may be useful to others, hence this post.

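The problem, roughly: I have two tables, TableA and TableB. TableA has on the order of a million rows (existing business data), and TableB has only a few rows (a new business scenario whose data has not yet grown). Semantically TableA.columnA = TableB.columnA, and columnA is indexed, yet the query was extremely slow, taking around 5-6 seconds, which was clearly not what I expected.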

Below I use a concrete example that simulates the SQL query involved.

Reproducing the scenario

The user_info table. To keep the scenario as simple as possible, I mocked only three of its columns.

mysql> desc user_info;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | int(11)      | NO   | PRI | NULL    | auto_increment |
| uid   | varchar(64)  | NO   | MUL | NULL    |                |
| name  | varchar(255) | YES  |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

The user_score table, whose uid has the same meaning as user_info.uid:

mysql> desc user_score;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| uid   | varchar(64) | NO   | MUL | NULL    |                |
| score | float       | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

The data looks like this, all very ordinary.

mysql> select * from user_score limit 2;
+----+--------------------------------------+-------+
| id | uid                                  | score |
+----+--------------------------------------+-------+
|  5 | 111111111                            |   100 |
|  6 | 55116d58-be26-4eb7-8f7e-bd2d49fbb968 |   100 |
+----+--------------------------------------+-------+
2 rows in set (0.00 sec)

mysql> select * from user_info limit 2;
+----+--------------------------------------+-------------+
| id | uid                                  | name        |
+----+--------------------------------------+-------------+
|  1 | 111111111                            | tanglei     |
|  2 | 55116d58-be26-4eb7-8f7e-bd2d49fbb968 | hudsonemily |
+----+--------------------------------------+-------------+
2 rows in set (0.00 sec)

mysql> select count(*) from user_score
    -> union
    -> select count(*) from user_info;
+----------+
| count(*) |
+----------+
|        4 |
|  3000003 |
+----------+
2 rows in set (1.39 sec)

The indexes are:

mysql> show index from user_score;
+------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user_score |          0 | PRIMARY   |            1 | id          | A         |           4 |     NULL | NULL   |      | BTREE      |         |               |
| user_score |          1 | index_uid |            1 | uid         | A         |           4 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

mysql> show index from user_info;
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user_info |          0 | PRIMARY   |            1 | id          | A         |     2989934 |     NULL | NULL   |      | BTREE      |         |               |
| user_info |          1 | index_uid |            1 | uid         | A         |     2989934 |     NULL | NULL   |      | BTREE      |         |               |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

The business query: given user_score.id, look up the associated user_info rows (please ignore for now whether this particular scenario makes sense). The natural SQL is:

mysql> select * from user_score us
    -> inner join user_info ui on us.uid = ui.uid
    -> where us.id = 5;
+----+-----------+-------+---------+-----------+---------+
| id | uid       | score | id      | uid       | name    |
+----+-----------+-------+---------+-----------+---------+
|  5 | 111111111 |   100 |       1 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685399 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685400 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685401 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685402 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685403 | 111111111 | tanglei |
+----+-----------+-------+---------+-----------+---------+
6 rows in set (1.18 sec)

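Please ignore the data itself: I first mocked 1 million rows and then imported them two more times, so there are some duplicates. With 3 million rows the query took 1.18 seconds, when by rights it should be much faster. As usual, let's see what explain says.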

mysql> explain
    -> select * from user_score us
    -> inner join user_info ui on us.uid = ui.uid
    -> where us.id = 5;
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type  | possible_keys     | key     | key_len | ref   | rows    | Extra       |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
|  1 | SIMPLE      | us    | const | PRIMARY,index_uid | PRIMARY | 4       | const |       1 | NULL        |
|  1 | SIMPLE      | ui    | ALL   | NULL              | NULL    | NULL    | NULL  | 2989934 | Using where |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
2 rows in set (0.00 sec)

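The user_info table is not using its index and scans nearly 3 million rows. That is the symptom, but why?

Think about it for a moment: if you ran into this, how would you investigate?

At the time I flailed about to no effect, trying many different ways of writing the SQL.

For example, swapping the order of the joined tables (driving table / driven table):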

mysql> explain select * from user_info ui inner join user_score us on us.uid = ui.uid where us.id = 5;
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type  | possible_keys     | key     | key_len | ref   | rows    | Extra       |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
|  1 | SIMPLE      | us    | const | PRIMARY,index_uid | PRIMARY | 4       | const |       1 | NULL        |
|  1 | SIMPLE      | ui    | ALL   | NULL              | NULL    | NULL    | NULL  | 2989934 | Using where |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
2 rows in set (0.00 sec)

Or using a subquery:

mysql> explain select * from user_info where uid in  (select uid from user_score where id = 5);
+----+-------------+------------+-------+-------------------+---------+---------+-------+---------+-------------+
| id | select_type | table      | type  | possible_keys     | key     | key_len | ref   | rows    | Extra       |
+----+-------------+------------+-------+-------------------+---------+---------+-------+---------+-------------+
|  1 | SIMPLE      | user_score | const | PRIMARY,index_uid | PRIMARY | 4       | const |       1 | NULL        |
|  1 | SIMPLE      | user_info  | ALL   | NULL              | NULL    | NULL    | NULL  | 2989934 | Using where |
+----+-------------+------------+-------+-------------------+---------+---------+-------+---------+-------------+
2 rows in set (0.00 sec)

In the end, still no luck. Yet a plain single-table query does use the index.

mysql> select * from user_info where uid = '111111111';
+---------+-----------+---------+
| id      | uid       | name    |
+---------+-----------+---------+
|       1 | 111111111 | tanglei |
| 3685399 | 111111111 | tanglei |
| 3685400 | 111111111 | tanglei |
| 3685401 | 111111111 | tanglei |
| 3685402 | 111111111 | tanglei |
| 3685403 | 111111111 | tanglei |
+---------+-----------+---------+
6 rows in set (0.01 sec)

mysql> explain select * from user_info where uid = '111111111';
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-----------------------+
| id | select_type | table     | type | possible_keys | key       | key_len | ref   | rows | Extra                 |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | user_info | ref  | index_uid     | index_uid | 194     | const |    6 | Using index condition |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-----------------------+
1 row in set (0.01 sec)

Solving the problem

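I tried changing the search condition, for example filtering on uid directly in the join query, and the index still was not used. I nearly gave up. Before asking the DBA for help, I took a look at the tables' create statements.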

mysql> show create table user_info;
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table     | Create Table                                                                                                                                                                                                                                                 |
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user_info | CREATE TABLE `user_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `uid` varchar(64) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_uid` (`uid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=3685404 DEFAULT CHARSET=utf8 |
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table user_score;
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                                                                                                                                             |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user_score | CREATE TABLE `user_score` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `uid` varchar(64) NOT NULL,
  `score` float DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_uid` (`uid`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

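There was every reason to suspect that the mismatched character sets (utf8 on user_info, utf8mb4 on user_score) were defeating the index. So I changed the small table's character set to match the large table's (don't do this casually in a real production environment) and tested again.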
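
The post does not show the statement used for this change. One way to do it, as a sketch, is ALTER TABLE ... CONVERT TO CHARACTER SET, which rewrites the table's character columns and changes its default character set. Converting from utf8mb4 down to utf8 assumes the table holds no 4-byte characters, and on a large table the rebuild itself can be expensive.

```sql
-- Align the small table with the large one (test environment only).
ALTER TABLE user_score CONVERT TO CHARACTER SET utf8;

-- Confirm the new table definition.
SHOW CREATE TABLE user_score;

-- To restore the original state afterwards:
-- ALTER TABLE user_score CONVERT TO CHARACTER SET utf8mb4;
```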

mysql> select * from user_score us
    -> inner join user_info ui on us.uid = ui.uid
    -> where us.id = 5;
+----+-----------+-------+---------+-----------+---------+
| id | uid       | score | id      | uid       | name    |
+----+-----------+-------+---------+-----------+---------+
|  5 | 111111111 |   100 |       1 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685399 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685400 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685401 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685402 | 111111111 | tanglei |
|  5 | 111111111 |   100 | 3685403 | 111111111 | tanglei |
+----+-----------+-------+---------+-----------+---------+
6 rows in set (0.00 sec)

mysql> explain
    -> select * from user_score us
    -> inner join user_info ui on us.uid = ui.uid
    -> where us.id = 5;
+----+-------------+-------+-------+-------------------+-----------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys     | key       | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+-------------------+-----------+---------+-------+------+-------+
|  1 | SIMPLE      | us    | const | PRIMARY,index_uid | PRIMARY   | 4       | const |    1 | NULL  |
|  1 | SIMPLE      | ui    | ref   | index_uid         | index_uid | 194     | const |    6 | NULL  |
+----+-------------+-------+-------+-------------------+-----------+---------+-------+------+-------+
2 rows in set (0.00 sec)

Sure enough, it worked.

Digging into the root cause

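The underlying reason is the rule found in all the MySQL coding guidelines circulating online: "do not let an indexed column take part in a computation." In this case, had I known about explain extended + show warnings (I had no idea explain accepted an extended keyword), the penny would probably have dropped much sooner. (In MySQL 5.7 and later, plain explain already returns the extended information, and MySQL 8.0 removed the extended keyword.)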

Let's see it in action. (Ah, I have to change the character set back first!)

mysql> explain extended select * from user_score us  inner join user_info ui on us.uid = ui.uid where us.id = 5;
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+----------+-------------+
| id | select_type | table | type  | possible_keys     | key     | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+----------+-------------+
|  1 | SIMPLE      | us    | const | PRIMARY,index_uid | PRIMARY | 4       | const |       1 |   100.00 | NULL        |
|  1 | SIMPLE      | ui    | ALL   | NULL              | NULL    | NULL    | NULL  | 2989934 |   100.00 | Using where |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                              |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select '5' AS `id`,'111111111' AS `uid`,'100' AS `score`,`test`.`ui`.`id` AS `id`,`test`.`ui`.`uid` AS `uid`,`test`.`ui`.`name` AS `name` from `test`.`user_score` `us` join `test`.`user_info` `ui` where (('111111111' = convert(`test`.`ui`.`uid` using utf8mb4))) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

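The indexed column takes part in a computation: ui.uid is wrapped in convert(... using utf8mb4), so every row has to be converted before it can be compared, which forces a full table scan. How could that be fast?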
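
When altering a table is not an option, one possible workaround is to make the conversion explicit on the small table's side, so that the large table's indexed column is compared as-is. This is only a sketch: it has not been run against the data in this post, and it assumes every us.uid value can be represented in utf8. Check the plan with explain before relying on it.

```sql
-- Convert the value coming from the small table (utf8mb4) to the
-- character set of the large table's indexed column (utf8), so that
-- ui.uid itself is not wrapped in convert().
SELECT us.id, us.uid, us.score, ui.id AS user_info_id, ui.name
FROM user_score us
INNER JOIN user_info ui
        ON ui.uid = CONVERT(us.uid USING utf8)
WHERE us.id = 5;
```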

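As for why this happened: for historical reasons, the original table from the old business scenario uses the "fake" utf8 (at most 3 bytes per character), while the new table for the new business uses the real one, utf8mb4.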
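
The difference between the two character sets can be checked directly. The sketch below assumes a MySQL 5.6 server like the one used in this post (newer 8.0 releases report utf8 under the name utf8mb3). It reads the per-character byte limits from information_schema, then reproduces the key_len values that explain reports for the two index_uid indexes, on the assumption that key_len for a NOT NULL varchar column is its maximum byte length plus 2 length bytes.

```sql
-- Maximum bytes per character for each character set:
-- utf8 has maxlen 3, utf8mb4 has maxlen 4.
SELECT character_set_name, maxlen
FROM information_schema.character_sets
WHERE character_set_name IN ('utf8', 'utf8mb4');

-- key_len for an index on a NOT NULL varchar(64) column:
-- 64 characters * bytes per character + 2 bytes for the length prefix.
SELECT 64 * 3 + 2 AS key_len_utf8,      -- 194, as explain shows for user_info.index_uid
       64 * 4 + 2 AS key_len_utf8mb4;   -- 258, as explain shows for user_score.index_uid
```

A key_len that differs between two supposedly identical varchar(64) columns is therefore an early hint that their character sets differ.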

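When the new table was designed, nobody compared its character set with the existing database's. In fact, different tables in the same database turned out to have different character sets; whoever created each table picked one according to personal preference. This shows how much development conventions matter.
Although I knew an indexed column must not take part in a computation, here both columns have the same type, varchar(64), and a conversion still happened during the query. A character set mismatch between columns should therefore be treated the same as a type mismatch.
For a case like this, would a fail-fast approach be better, refusing the join outright as soon as a mismatch is detected?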
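
A check like the following could catch such mismatches before they reach production. It is a sketch that assumes the schema is named test, as in the show warnings output above, and that columns meant to be joined share a name, which is a convention rather than a guarantee.

```sql
-- List same-named columns whose character sets differ between tables.
-- Non-character columns have a NULL character set and drop out of the
-- comparison automatically.
SELECT a.column_name,
       a.table_name AS table_a, a.character_set_name AS charset_a,
       b.table_name AS table_b, b.character_set_name AS charset_b
FROM information_schema.columns a
JOIN information_schema.columns b
  ON  a.table_schema = b.table_schema
  AND a.column_name  = b.column_name
  AND a.table_name   < b.table_name
WHERE a.table_schema = 'test'
  AND a.character_set_name <> b.character_set_name;
```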

A question to leave you with

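Can you explain the following? Why do the two queries perform so differently? Pay attention to the SQL execution order, how the query optimizer works, and the Using join buffer (Block Nested Loop) note in the plans. I recommend reading the official MySQL manual to understand what happens underneath.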

mysql> select * from user_info ui
    -> inner join user_score us on us.uid = ui.uid
    -> where us.uid = '111111111';
+---------+-----------+---------+----+-----------+-------+
| id      | uid       | name    | id | uid       | score |
+---------+-----------+---------+----+-----------+-------+
|       1 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685399 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685400 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685401 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685402 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685403 | 111111111 | tanglei |  5 | 111111111 |   100 |
+---------+-----------+---------+----+-----------+-------+
6 rows in set (1.14 sec)

mysql> select * from user_info ui
    -> inner join user_score us on us.uid = ui.uid
    -> where ui.uid = '111111111';
+---------+-----------+---------+----+-----------+-------+
| id      | uid       | name    | id | uid       | score |
+---------+-----------+---------+----+-----------+-------+
|       1 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685399 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685400 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685401 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685402 | 111111111 | tanglei |  5 | 111111111 |   100 |
| 3685403 | 111111111 | tanglei |  5 | 111111111 |   100 |
+---------+-----------+---------+----+-----------+-------+
6 rows in set (0.00 sec)

mysql> explain
    -> select * from user_info ui
    -> inner join user_score us on us.uid = ui.uid
    -> where us.uid = '111111111';
+----+-------------+-------+------+---------------+-----------+---------+-------+---------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key       | key_len | ref   | rows    | Extra                                              |
+----+-------------+-------+------+---------------+-----------+---------+-------+---------+----------------------------------------------------+
|  1 | SIMPLE      | us    | ref  | index_uid     | index_uid | 258     | const |       1 | Using index condition                              |
|  1 | SIMPLE      | ui    | ALL  | NULL          | NULL      | NULL    | NULL  | 2989934 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+-----------+---------+-------+---------+----------------------------------------------------+
2 rows in set (0.00 sec)

mysql> explain
    -> select * from user_info ui
    -> inner join user_score us on us.uid = ui.uid
    -> where ui.uid = '111111111';
+----+-------------+-------+------+---------------+-----------+---------+-------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key       | key_len | ref   | rows | Extra                                              |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+----------------------------------------------------+
|  1 | SIMPLE      | ui    | ref  | index_uid     | index_uid | 194     | const |    6 | Using index condition                              |
|  1 | SIMPLE      | us    | ALL  | index_uid     | NULL      | NULL    | NULL  |    4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+----------------------------------------------------+
2 rows in set (0.01 sec)


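Note: the tests in this post were run on MySQL 5.6. The examples exist only to illustrate the problem, and the SQL is not exemplary (for instance, avoid select * where you can), so please don't copy it (I take no responsibility if you do). Writing this post took quite a bit of time, what with creating the database and loading mock data, so if you found it useful, please like and share it. The question above is left for discussion; feel free to comment with your thoughts.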
