(Forgive the clickbait title. Spend a few minutes on this and it may help you.)
I recently ran into a "magical" problem at work. It may be useful to others too, hence this article.
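The problem, roughly: I have two tables, TableA and TableB. TableA has on the order of millions of rows (existing business data), while TableB has only a few rows (a new business scenario whose data has not grown yet). Semantically, TableA.columnA = TableB.columnA, and columnA is indexed. Yet the join query was painfully slow, typically taking 5-6 seconds, which clearly did not match expectations.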
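Let me illustrate with a concrete example that simulates the SQL query scenario.
The user_info table. To keep the scenario as simple as possible, I mocked only three of its columns.
mysql> desc user_info;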
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| uid | varchar(64) | NO | MUL | NULL | |
| name | varchar(255) | YES | | NULL | |
+-------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
The user_score table, whose uid column has the same meaning as user_info.uid:
mysql> select * from user_score limit 2;
+----+--------------------------------------+-------+
| id | uid | score |
+----+--------------------------------------+-------+
| 5 | 111111111 | 100 |
| 6 | 55116d58-be26-4eb7-8f7e-bd2d49fbb968 | 100 |
+----+--------------------------------------+-------+
2 rows in set (0.00 sec)
mysql> select * from user_info limit 2;
+----+--------------------------------------+-------------+
| id | uid | name |
+----+--------------------------------------+-------------+
| 1 | 111111111 | tanglei |
| 2 | 55116d58-be26-4eb7-8f7e-bd2d49fbb968 | hudsonemily |
+----+--------------------------------------+-------------+
2 rows in set (0.00 sec)
mysql> select count(*) from user_score
-> union
-> select count(*) from user_info;
+----------+
| count(*) |
+----------+
| 4 |
| 3000003 |
+----------+
2 rows in set (1.39 sec)
mysql> show index from user_score;
+------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user_score | 0 | PRIMARY | 1 | id | A | 4 | NULL | NULL | | BTREE | | |
| user_score | 1 | index_uid | 1 | uid | A | 4 | NULL | NULL | YES | BTREE | | |
+------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
mysql> show index from user_info;
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user_info | 0 | PRIMARY | 1 | id | A | 2989934 | NULL | NULL | | BTREE | | |
| user_info | 1 | index_uid | 1 | uid | A | 2989934 | NULL | NULL | | BTREE | | |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
Given a user_score.id, we need to join to fetch the corresponding user_info record (ignore for now whether this particular business scenario is reasonable). The natural SQL is:
mysql> select * from user_score us
-> inner join user_info ui on us.uid = ui.uid
-> where us.id = 5;
+----+-----------+-------+---------+-----------+---------+
| id | uid | score | id | uid | name |
+----+-----------+-------+---------+-----------+---------+
| 5 | 111111111 | 100 | 1 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685399 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685400 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685401 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685402 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685403 | 111111111 | tanglei |
+----+-----------+-------+---------+-----------+---------+
6 rows in set (1.18 sec)
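Ignore the duplicate rows in the result: I first mocked 1 million rows and then imported them two more times, so some of the data is repeated. With 3 million rows, the query took 1.18 seconds. It should be much faster than that. As usual, let's see what explain says.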
mysql> explain
-> select * from user_score us
-> inner join user_info ui on us.uid = ui.uid
-> where us.id = 5;
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
| 1 | SIMPLE | us | const | PRIMARY,index_uid | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | ui | ALL | NULL | NULL | NULL | NULL | 2989934 | Using where |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
2 rows in set (0.00 sec)
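It turns out the user_info table does not use its index at all and scans nearly 3 million rows. That is the symptom, but why?
Pause and think for a moment: if you ran into this, how would you troubleshoot it?
At the time I flailed around, trying many different ways of writing the SQL, all to no avail.
For example, swapping the join order (driving table / driven table):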
mysql> explain select * from user_info ui inner join user_score us on us.uid = ui.uid where us.id = 5;
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
| 1 | SIMPLE | us | const | PRIMARY,index_uid | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | ui | ALL | NULL | NULL | NULL | NULL | 2989934 | Using where |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+-------------+
2 rows in set (0.00 sec)
Or using a subquery:
mysql> explain select * from user_info where uid in (select uid from user_score where id = 5);
+----+-------------+------------+-------+-------------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+-------------------+---------+---------+-------+---------+-------------+
| 1 | SIMPLE | user_score | const | PRIMARY,index_uid | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | user_info | ALL | NULL | NULL | NULL | NULL | 2989934 | Using where |
+----+-------------+------------+-------+-------------------+---------+---------+-------+---------+-------------+
2 rows in set (0.00 sec)
Still no luck. Yet a direct single-table query does use the index.
mysql> select * from user_info where uid = '111111111';
+---------+-----------+---------+
| id | uid | name |
+---------+-----------+---------+
| 1 | 111111111 | tanglei |
| 3685399 | 111111111 | tanglei |
| 3685400 | 111111111 | tanglei |
| 3685401 | 111111111 | tanglei |
| 3685402 | 111111111 | tanglei |
| 3685403 | 111111111 | tanglei |
+---------+-----------+---------+
6 rows in set (0.01 sec)
mysql> explain select * from user_info where uid = '111111111';
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-----------------------+
| 1 | SIMPLE | user_info | ref | index_uid | index_uid | 194 | const | 6 | Using index condition |
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+-----------------------+
1 row in set (0.01 sec)
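I also tried changing the search condition, for example filtering on uid directly in the join query, and the index still was not used. I almost gave up. Before asking a DBA for help, I took a look at the CREATE TABLE statements of the two tables.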
mysql> show create table user_info;
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user_info | CREATE TABLE `user_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uid` varchar(64) NOT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_uid` (`uid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=3685404 DEFAULT CHARSET=utf8 |
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show create table user_score;
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user_score | CREATE TABLE `user_score` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uid` varchar(64) NOT NULL,
`score` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_uid` (`uid`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
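There is every reason to suspect that the character set mismatch (utf8 on user_info, utf8mb4 on user_score) is what defeats the index. So I changed the small table's character set to match the big table (do not do this casually in a real production environment) and tested again.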
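The article does not show the statements used for this step. As a minimal sketch, the character sets of the two join columns could be compared through information_schema, and the small table converted with ALTER TABLE ... CONVERT TO CHARACTER SET. The schema name test is an assumption here; adjust it to your own environment.

```sql
-- Compare the character set and collation of the join column on both tables.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'test'                      -- assumed schema name
  AND TABLE_NAME IN ('user_info', 'user_score')
  AND COLUMN_NAME = 'uid';

-- Convert the small table (its default charset and its character columns) to utf8.
-- Characters outside utf8's 3-byte range cannot be stored after this change,
-- so try it on a test copy first.
ALTER TABLE user_score CONVERT TO CHARACTER SET utf8;
```

Converting the small table is the cheap direction for an experiment. In a long-lived system, moving the old table up to utf8mb4 is usually the more future-proof choice, but on a table with millions of rows that is a much heavier operation and needs planning.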
mysql> select * from user_score us
-> inner join user_info ui on us.uid = ui.uid
-> where us.id = 5;
+----+-----------+-------+---------+-----------+---------+
| id | uid | score | id | uid | name |
+----+-----------+-------+---------+-----------+---------+
| 5 | 111111111 | 100 | 1 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685399 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685400 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685401 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685402 | 111111111 | tanglei |
| 5 | 111111111 | 100 | 3685403 | 111111111 | tanglei |
+----+-----------+-------+---------+-----------+---------+
6 rows in set (0.00 sec)
mysql> explain
-> select * from user_score us
-> inner join user_info ui on us.uid = ui.uid
-> where us.id = 5;
+----+-------------+-------+-------+-------------------+-----------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------+-----------+---------+-------+------+-------+
| 1 | SIMPLE | us | const | PRIMARY,index_uid | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | ui | ref | index_uid | index_uid | 194 | const | 6 | NULL |
+----+-------------+-------+-------+-------------------+-----------+---------+-------+------+-------+
2 rows in set (0.00 sec)
Sure enough, it worked.
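Digging into the root cause, this is exactly what the various MySQL rules and conventions found online warn about: "do not let indexed columns take part in computation." In this case, had I known about explain extended + show warnings (I had never realized explain could take an extended keyword), the penny might have dropped much sooner. (From MySQL 5.7 on, plain explain produces the extended information by default, and MySQL 8.0 removed the extended keyword altogether.)
Here is what it shows. (Ah, I have to change the character set back first!!!)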
mysql> explain extended select * from user_score us inner join user_info ui on us.uid = ui.uid where us.id = 5;
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+----------+-------------+
| 1 | SIMPLE | us | const | PRIMARY,index_uid | PRIMARY | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | ui | ALL | NULL | NULL | NULL | NULL | 2989934 | 100.00 | Using where |
+----+-------------+-------+-------+-------------------+---------+---------+-------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select '5' AS `id`,'111111111' AS `uid`,'100' AS `score`,`test`.`ui`.`id` AS `id`,`test`.`ui`.`uid` AS `uid`,`test`.`ui`.`name` AS `name` from `test`.`user_score` `us` join `test`.`user_info` `ui` where (('111111111' = convert(`test`.`ui`.`uid` using utf8mb4))) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
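The indexed column takes part in a computation: the rewritten query wraps ui.uid in convert(... using utf8mb4), so every row's uid has to be converted before it can be compared, the index on the column cannot be used, and the whole table is scanned. How could that possibly be fast?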
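When altering either table is not an option, one workaround is to put the conversion on the small table's side of the comparison, so that the indexed user_info.uid column is compared as-is. This is a sketch that the original case did not test; check the resulting plan with explain in your own environment before relying on it.

```sql
-- Convert the value coming from the small utf8mb4 table down to utf8,
-- leaving the indexed utf8 column ui.uid untouched so index_uid stays usable.
-- Caveat: characters outside utf8's 3-byte range cannot be represented in utf8
-- and would not survive the conversion.
EXPLAIN
SELECT us.id, us.uid, us.score, ui.id AS info_id, ui.name
FROM user_score us
INNER JOIN user_info ui
        ON ui.uid = CONVERT(us.uid USING utf8)
WHERE us.id = 5;
```

The intent is that the ui row of the plan shows an index lookup on index_uid rather than type ALL. Treat this as a stopgap: aligning the character sets of the two columns removes the problem at its source.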
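As for why this happened in the first place: for historical reasons, the original table from the old business scenario uses the "fake" utf8 (MySQL's utf8 stores at most 3 bytes per character), while the new table for the new business uses the "real" utf8mb4.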
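The explain output in this article carries a small clue to the two character sets. The index on user_info.uid is reported with key_len 194, while the plans further down report key_len 258 for the index on user_score.uid. For a NOT NULL varchar(64) column, key_len is 64 times the character set's maximum bytes per character, plus 2 bytes for the length prefix. A quick sketch to check the arithmetic:

```sql
-- key_len of a NOT NULL varchar(64) index = 64 * (max bytes per character) + 2
SELECT 64 * 3 + 2 AS utf8_key_len,       -- 194, as shown for user_info.index_uid
       64 * 4 + 2 AS utf8mb4_key_len;    -- 258, as shown for user_score.index_uid

-- MAXLEN is the maximum number of bytes per character for each character set.
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb4');
```

On recent MySQL 8.0 releases the 3-byte character set may be listed under the name utf8mb3 instead of utf8.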
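Both uid columns are declared varchar(64), yet a conversion still happened during the query. A character set mismatch between columns should therefore be treated the same as a column type mismatch.
If MySQL followed a fail-fast philosophy, would it be better to refuse the join outright when it detects such a mismatch (the way a strictly typed language rejects a comparison between values of different types)?
Can you explain the following? Why do the two queries below perform so differently? Pay attention to the SQL execution order, how the query optimizer works, and the Using join buffer (Block Nested Loop) note in the plans. I recommend reading the official MySQL manual to understand the principles behind this.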
mysql> select * from user_info ui
-> inner join user_score us on us.uid = ui.uid
-> where us.uid = '111111111';
+---------+-----------+---------+----+-----------+-------+
| id | uid | name | id | uid | score |
+---------+-----------+---------+----+-----------+-------+
| 1 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685399 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685400 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685401 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685402 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685403 | 111111111 | tanglei | 5 | 111111111 | 100 |
+---------+-----------+---------+----+-----------+-------+
6 rows in set (1.14 sec)
mysql> select * from user_info ui
-> inner join user_score us on us.uid = ui.uid
-> where ui.uid = '111111111';
+---------+-----------+---------+----+-----------+-------+
| id | uid | name | id | uid | score |
+---------+-----------+---------+----+-----------+-------+
| 1 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685399 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685400 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685401 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685402 | 111111111 | tanglei | 5 | 111111111 | 100 |
| 3685403 | 111111111 | tanglei | 5 | 111111111 | 100 |
+---------+-----------+---------+----+-----------+-------+
6 rows in set (0.00 sec)
mysql> explain
-> select * from user_info ui
-> inner join user_score us on us.uid = ui.uid
-> where us.uid = '111111111';
+----+-------------+-------+------+---------------+-----------+---------+-------+---------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-----------+---------+-------+---------+----------------------------------------------------+
| 1 | SIMPLE | us | ref | index_uid | index_uid | 258 | const | 1 | Using index condition |
| 1 | SIMPLE | ui | ALL | NULL | NULL | NULL | NULL | 2989934 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+-----------+---------+-------+---------+----------------------------------------------------+
2 rows in set (0.00 sec)
mysql> explain
-> select * from user_info ui
-> inner join user_score us on us.uid = ui.uid
-> where ui.uid = '111111111';
+----+-------------+-------+------+---------------+-----------+---------+-------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+----------------------------------------------------+
| 1 | SIMPLE | ui | ref | index_uid | index_uid | 194 | const | 6 | Using index condition |
| 1 | SIMPLE | us | ALL | index_uid | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+----------------------------------------------------+
2 rows in set (0.01 sec)
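Note: the tests in this article were run on MySQL 5.6. The examples exist only to illustrate the problem, and the SQL is not exemplary (for instance, avoid select * where you can), so please do not copy it (I take no responsibility if you do). Writing this took a fair amount of time: creating the DB, loading the mock data, and so on. If you found it useful, please like and share it. The question above is left open for discussion, and you are welcome to leave a comment with your thoughts.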
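For readers who want to reproduce the case, here is a minimal setup sketch. The table definitions follow the show create table output above. The schema name and the data-loading statements are assumptions, since the article does not show how the mock data was generated.

```sql
-- Hypothetical reproduction setup; only the table definitions come from the article.
CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE user_info (
  id   INT NOT NULL AUTO_INCREMENT,
  uid  VARCHAR(64) NOT NULL,
  name VARCHAR(255) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY index_uid (uid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE user_score (
  id    INT NOT NULL AUTO_INCREMENT,
  uid   VARCHAR(64) NOT NULL,
  score FLOAT DEFAULT NULL,
  PRIMARY KEY (id),
  KEY index_uid (uid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Seed one known row in the big table.
INSERT INTO user_info (uid, name) VALUES ('111111111', 'tanglei');

-- Each run of this statement doubles the row count;
-- about 20 runs yields roughly one million rows.
INSERT INTO user_info (uid, name)
SELECT UUID(), CONCAT('user_', id) FROM user_info;

-- A handful of rows in the small table, reusing uids that exist in user_info.
INSERT INTO user_score (uid, score)
SELECT uid, 100 FROM user_info ORDER BY id LIMIT 4;
```

With this in place, running explain on the join from the article should let you observe the difference between mismatched and matching character sets on your own server version.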