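In day-to-day development, most of us probably use LEFT JOIN and JOIN in much the same way: given two tables A and B, use JOIN when you want data from both tables, and use LEFT JOIN when you want every row from one table and the other table's data is optional. (Of course this description is not strictly accurate, but it matches how I use them in everyday business development.)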
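From the article MYSQL LEFT JOIN 详解 (MySQL LEFT JOIN explained) we already know that with LEFT JOIN we pick the driving table ourselves (the left table), whereas with JOIN the MySQL optimizer picks the driving table.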
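So when we write a LEFT JOIN statement, will MySQL optimize it into a JOIN statement?
And if it does, when does that happen?
As it turns out, this is exactly a production problem I ran into. Let's take a look together.
We had the following slow SQL statement in production (since fixed). It took more than 0.5 seconds to execute.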
select
count(distinct `order`.`order_id`)
from `order` force index(shop_id)
left join `order_extend`
on `order`.`order_id` = `order_extend`.`order_id`
where `order`.`create_time` >= "2020-08-01 00:00:00"
and `order`.`create_time` <= "2020-08-01 23:59:59"
and `order`.`shop_id` = 328449726569069326
and `order`.`status` = 1
and `order_extend`.`shop_id` = 328449726569069326
and `order_extend`.`status` = 1
The EXPLAIN output is as follows:
+----+-------------+--------------+------------+--------+------------------+----------+---------+------------------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------------+--------+------------------+----------+---------+------------------------+------+-------------+
| 1 | SIMPLE | order_extend | NULL | ref | order_id,shop_id | shop_id | 8 | const | 3892 | Using where |
| 1 | SIMPLE | order | NULL | eq_ref | shop_id | shop_id | 16 | example.order.order_id | 1 | Using where |
+----+-------------+--------------+------------+--------+------------------+----------+---------+------------------------+------+-------------+
2 rows in set, 1 warning (0.00 sec)
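From the EXPLAIN output, together with the MySQL join algorithms we covered earlier, the driving table is order_extend and the loop runs about 3892 times, which is neither a lot nor a little. The driven table is accessed with type eq_ref, so each lookup should not be slow. The problem therefore seems to lie in the 3892 iterations, and we just need to find a way to bring that number down.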
Wait! Why is order_extend the driving table? I clearly wrote LEFT JOIN, so the driving table should be order. Why has it become order_extend? Did MySQL optimize the query internally?
Following this line of thought: since the driving table changed, this SQL statement must have been turned into a JOIN statement.
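Let's analyze this statement the way we would analyze a JOIN statement. (PS: this requires some understanding of how MySQL executes a JOIN internally. If you are not familiar with it, please read this article first → MYSQL 链接查询算法 (MySQL join query algorithms).)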
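MySQL chose order_extend as the driving table, which means the optimizer estimates that, under the WHERE conditions, order_extend yields fewer rows. MySQL prefers the smaller row set as the driving table.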
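Why a smaller driving table helps can be sketched with a toy index nested-loop join in Python. Everything here is an illustrative assumption rather than MySQL internals: the function name `index_nested_loop_join`, the made-up rows, and the use of a plain dict as a stand-in for a unique index on the join column.

```python
# Toy index nested-loop join: the driving table is scanned once, and every
# driving row triggers exactly one index probe into the driven table.
def index_nested_loop_join(driving_rows, driven_index, key):
    """Join driving_rows against driven_index (a dict keyed by the join column).

    Returns (joined_rows, probes), where probes counts index lookups.
    """
    joined, probes = [], 0
    for row in driving_rows:
        probes += 1  # one lookup per driving row
        match = driven_index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined, probes


# Hypothetical data: 1000 orders, of which only 10 have an extension row.
orders = [{"order_id": i, "status": 1} for i in range(1000)]
extends = [{"order_id": i, "ext": "x"} for i in range(0, 1000, 100)]

# Dicts play the role of unique indexes on order_id.
orders_by_id = {r["order_id"]: r for r in orders}
extends_by_id = {r["order_id"]: r for r in extends}

rows_big_first, probes_big_first = index_nested_loop_join(orders, extends_by_id, "order_id")
rows_small_first, probes_small_first = index_nested_loop_join(extends, orders_by_id, "order_id")

print(probes_big_first, probes_small_first)  # 1000 10
```

Both join orders produce the same 10 joined rows, but driving from the smaller side needs 10 probes instead of 1000. This is the effect the optimizer is after when it is free to reorder an inner join.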
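Let's apply the WHERE conditions above to each table separately in a standalone count query, to see roughly how many rows each table involves.
So as not to distort the analysis, we use EXPLAIN, so that everything stays an estimate and mimics what MySQL sees while planning.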
mysql> explain select
count(distinct `order`.`order_id`)
from `order` force index(shop_id)
where `order`.`create_time` >= "2020-08-01 00:00:00"
and `order`.`create_time` <= "2020-08-01 23:59:59"
and `order`.`shop_id` = 328449726569069326
and `order`.`status` = 1;
+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+-------------+
| 1 | SIMPLE | order | NULL | ref | PRIMARY,shop_id,create_time... | shop_id | 8 | const | 320372 | Using where |
+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select
count(distinct `order_extend`.`order_id`)
from `order_extend`
where `order_extend`.`shop_id` = 328449726569069326
and `order_extend`.`status` = 1;
+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | order_extend | NULL | ref | order_id,shop_id | shop_id | 8 | const | 3892 | 10.00 | Using where |
+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
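As you can see, under these WHERE conditions order_extend is estimated to examine only 3892 rows, while order is estimated to examine 320372 rows, so choosing order_extend as the driving table is perfectly reasonable.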
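So why does the order table scan so many rows? There cannot be that many rows for the single day 2020-08-01. At this point it is natural to suspect the forced index: the query forces the shop_id index, which may not be the best index. Let's remove force index(shop_id) and try again.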
mysql> explain select
count(distinct `order`.`order_id`)
from `order`
where `order`.`create_time` >= "2020-08-01 00:00:00"
and `order`.`create_time` <= "2020-08-01 23:59:59"
and `order`.`shop_id` = 328449726569069326
and `order`.`status` = 1;
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+--------------------------+
| 1 | SIMPLE | order | NULL | ref | create_time | create_time | 8 | const | <3892 | 10.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
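As you can see, when the shop_id index is not forced, MySQL uses the create_time index and examines far fewer rows. Suppose it is 100 rows: as a rough upper bound, the join then loops only 100 times and examines at most 100 x 3892 rows, whereas the forced-index version loops 3892 times and can examine up to 3892 x 300000 rows. (These products are worst-case figures that ignore index lookups on the join column; what matters is the difference in magnitude.)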
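The rough estimate above can be worked through in a few lines of Python. This is only the article's back-of-the-envelope model (driving rows multiplied by rows examined per iteration), not MySQL's actual cost model. The helper name `nested_loop_rows` is made up for illustration, and 100 is the article's assumed row count, not a measured one.

```python
# Rough nested-loop estimate: rows read from the driving table multiplied by
# the rows examined in the other table for each driving row.
def nested_loop_rows(driving_rows, rows_per_iteration):
    return driving_rows * rows_per_iteration


# With force index(shop_id): 3892 iterations, about 300000 rows each.
forced_index = nested_loop_rows(3892, 300_000)
# Without the hint: an assumed 100 iterations, 3892 rows each.
without_hint = nested_loop_rows(100, 3892)

print(forced_index)                  # 1167600000
print(without_hint)                  # 389200
print(forced_index // without_hint)  # 3000
```

Under this simplified model the forced index costs about 3000 times more row examinations, which is why removing the hint is the whole fix.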
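So the cause of this slow SQL is now clear: forcing the shop_id index made MySQL examine more rows, and all we need to do is remove the forced index. Most of the time MySQL picks the right index by itself, so be very careful when forcing an index.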
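The slow SQL is solved, so let's return to the question from the beginning of the article: can a LEFT JOIN be optimized into a JOIN?
The answer is yes. So when does this happen?
Let's revisit the content of the article MYSQL LEFT JOIN 详解.
For ease of reading, part of it is pasted here.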
mysql> select * from goods left join goods_category on goods.category_id = goods_category.category_id;
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 1 | 男鞋1 | 1 | 1 | 鞋 |
| 2 | 男鞋2 | 1 | 1 | 鞋 |
| 3 | 男鞋3 | 3 | 3 | 羽绒服 |
| 4 | T恤1 | 2 | 2 | T恤 |
| 5 | T恤2 | 2 | 2 | T恤 |
+----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)
mysql> select * from goods left join goods_category on goods.category_id = goods_category.category_id;
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 1 | 男鞋1 | 1 | 1 | 鞋 |
| 2 | 男鞋2 | 1 | 1 | 鞋 |
| 3 | 男鞋3 | 4 | NULL | NULL |
| 4 | T恤1 | 2 | 2 | T恤 |
| 5 | T恤2 | 2 | 2 | T恤 |
+----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)
mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and g.goods_name = 'T恤1');
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 1 | 男鞋1 | 1 | NULL | NULL |
| 2 | 男鞋2 | 1 | NULL | NULL |
| 3 | 男鞋3 | 4 | NULL | NULL |
| 4 | T恤1 | 2 | 2 | T恤 |
| 5 | T恤2 | 2 | NULL | NULL |
+----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)
mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and c.category_name = 'T恤');
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 1 | 男鞋1 | 1 | NULL | NULL |
| 2 | 男鞋2 | 1 | NULL | NULL |
| 3 | 男鞋3 | 4 | NULL | NULL |
| 4 | T恤1 | 2 | 2 | T恤 |
| 5 | T恤2 | 2 | 2 | T恤 |
+----------+------------+-------------+-------------+---------------+
5 rows in set (0.00 sec)
mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id) where c.category_name = '鞋';
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 1 | 男鞋1 | 1 | 1 | 鞋 |
| 2 | 男鞋2 | 1 | 1 | 鞋 |
+----------+------------+-------------+-------------+---------------+
2 rows in set (0.00 sec)
mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id) where g.goods_name = 'T恤1';
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 4 | T恤1 | 2 | 2 | T恤 |
+----------+------------+-------------+-------------+---------------+
1 row in set (0.00 sec)
mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and g.goods_name = 'T恤2') where g.goods_name = 'T恤1';
+----------+------------+-------------+-------------+---------------+
| goods_id | goods_name | category_id | category_id | category_name |
+----------+------------+-------------+-------------+---------------+
| 4 | T恤1 | 2 | NULL | NULL |
+----------+------------+-------------+-------------+---------------+
1 row in set (0.00 sec)
We can see that when the WHERE clause contains a condition on the driven table, the result is the same as that of a JOIN: no NULL-padded rows appear.
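From this we can infer when LEFT JOIN is optimized into JOIN: when the WHERE clause contains a NULL-rejecting condition on the driven table (a condition that cannot be true when the driven table's columns are NULL), LEFT JOIN is equivalent to JOIN.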
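This is not hard to understand. LEFT JOIN returns every row of the driving table, but a WHERE condition on the driven table filters out the NULL-padded rows, so the result is the same as that of a JOIN. MySQL therefore rewrites the LEFT JOIN as a JOIN, which leaves the optimizer free to choose the driving table itself.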
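The equivalence can be checked with a small, self-contained Python sketch. It uses SQLite (via the standard `sqlite3` module) purely as a stand-in to show the result-set semantics; it says nothing about MySQL's planner. The table contents are an assumption reconstructed from the query output pasted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goods (goods_id INTEGER PRIMARY KEY, goods_name TEXT, category_id INTEGER);
CREATE TABLE goods_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
INSERT INTO goods VALUES (1,'男鞋1',1),(2,'男鞋2',1),(3,'男鞋3',4),(4,'T恤1',2),(5,'T恤2',2);
INSERT INTO goods_category VALUES (1,'鞋'),(2,'T恤'),(3,'羽绒服');
""")


def run(join_kind):
    """Run the same query with the given join keyword and return all rows."""
    sql = f"""
        SELECT g.goods_id, g.goods_name, c.category_name
        FROM goods g {join_kind} goods_category c ON g.category_id = c.category_id
        WHERE c.category_name = '鞋'
        ORDER BY g.goods_id
    """
    return conn.execute(sql).fetchall()


left_rows = run("LEFT JOIN")
inner_rows = run("JOIN")

# The WHERE condition on the driven table discards the NULL-padded row for
# goods_id 3, so both joins return exactly the same rows.
print(left_rows == inner_rows)  # True
print(left_rows)
```

Because the two result sets are identical, replacing the outer join with an inner join cannot change the answer, which is what makes the rewrite safe.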
Let's write a test case to verify our conclusion.
CREATE TABLE `A` (
`id` int(11) auto_increment,
`a` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB;
delimiter ;;
create procedure idata()
begin
declare i int;
set i=1;
while(i<=100)do
insert into A (`a`) values(i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
CREATE TABLE `B` (
`id` int(11) auto_increment,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b` (`b`)
) ENGINE=InnoDB;
delimiter ;;
drop procedure if exists idata;;
create procedure idata()
begin
declare i int;
set i=1;
while(i<=100)do
insert into B (`b`) values(i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
We created two identical tables with 100 rows each. Now let's run some LEFT JOIN statements.
mysql> explain select * from A left join B on A.id = B.id where A.a <= 100;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
| 1 | SIMPLE | A | NULL | index | a | a | 5 | NULL | 100 | 100.00 | Using where; Using index |
| 1 | SIMPLE | B | NULL | eq_ref | PRIMARY | PRIMARY | 4 | example2.A.id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> explain select * from A left join B on A.id = B.id where A.a <= 100 and B.b <= 50;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
| 1 | SIMPLE | B | NULL | range | PRIMARY,b | b | 5 | NULL | 50 | 100.00 | Using where; Using index |
| 1 | SIMPLE | A | NULL | eq_ref | PRIMARY,a | PRIMARY | 4 | example2.B.id | 1 | 100.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> explain select * from A left join B on A.id = B.id where A.a <= 100 and B.b <= 100;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
| 1 | SIMPLE | A | NULL | index | PRIMARY,a | a | 5 | NULL | 100 | 100.00 | Using where; Using index |
| 1 | SIMPLE | B | NULL | eq_ref | PRIMARY,b | PRIMARY | 4 | example2.A.id | 1 | 100.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
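From the plans above: once a WHERE condition is added on table B, the driving table can switch if B is estimated to scan fewer rows. This shows that the LEFT JOIN statement was optimized into a JOIN statement.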
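Above we analyzed a slow SQL problem. The analysis touched on quite a few topics, and I hope you will study them carefully.
Along the way we reached one conclusion: when the WHERE clause contains a NULL-rejecting condition on the driven table, MySQL optimizes the LEFT JOIN statement into a JOIN statement.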
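Which conditions count as NULL-rejecting can be explored with a short Python sketch. The idea is to evaluate a predicate on a row whose driven-table column is NULL and see whether WHERE would keep it. SQLite is used here only to evaluate SQL three-valued logic; the helper name `is_null_rejecting` and the single column `b` are illustrative assumptions, and MySQL's own check works on the query structure rather than by running the predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def is_null_rejecting(predicate):
    """Return True if `predicate` is not TRUE when column b is NULL.

    A NULL-padded row from a LEFT JOIN has b = NULL. WHERE keeps a row only
    when the predicate evaluates to TRUE (1), so NULL or 0 means the padded
    row is rejected.
    """
    value = conn.execute(
        f"SELECT ({predicate}) FROM (SELECT NULL AS b)"
    ).fetchone()[0]
    return value is None or value == 0


predicates = [
    "b <= 50",               # comparison with NULL is unknown: rejects
    "b = 1",                 # same: rejects
    "b IS NOT NULL",         # false for NULL: rejects
    "b IS NULL",             # true for NULL: keeps the padded row
    "COALESCE(b, 0) <= 50",  # NULL replaced by 0: keeps the padded row
]
results = {p: is_null_rejecting(p) for p in predicates}

for predicate, rejecting in results.items():
    print(f"{predicate:<22} null-rejecting: {rejecting}")
```

Only the first three predicates throw away the NULL-padded rows, so only they make a LEFT JOIN equivalent to a JOIN. A condition such as `b IS NULL` depends on the padded rows, so the outer join has to be kept.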