A Brief Look at MySQL Subqueries and Their Optimization

Date: 2019-11-07


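DBAs and developers who have worked with Oracle or other relational databases tend to assume that the database already optimizes subqueries and picks a sensible driving table, and they carry that assumption over to MySQL. Unfortunately, MySQL's handling of subqueries can be a big disappointment. We have hit several such cases in our production systems, for example: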

SELECT i_id,
       sum(i_sell) AS i_sell
FROM table_data
WHERE i_id IN
    (SELECT i_id
     FROM table_data
     WHERE Gmt_create >= '2011-10-07 00:00:00')
GROUP BY i_id;

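(Note: the business logic of this SQL can be pictured as follows: first find the 100 books newly sold on 10-07, then look up the sales of those 100 books over the whole year.)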

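This SQL performs poorly because of a weakness in how the MySQL optimizer handles subqueries: in older MySQL versions (5.5 and earlier) the optimizer rewrites the subquery. Normally we would expect evaluation from the inside out: compute the subquery result first, then use it to drive the outer table. MySQL instead scans every row of the outer table and passes each row into the subquery to be matched against it, so performance suffers when the outer table is large.
For the query above, table_data holds about 700,000 rows and the subquery returns many rows, a large share of them duplicates, so close to 700,000 correlated lookups are needed. With that many lookups the SQL ran for several hours without finishing, so we needed to rewrite it: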

SELECT t2.i_id,
       SUM(t2.i_sell) AS sold
FROM
  (SELECT DISTINCT i_id
   FROM table_data
   WHERE gmt_create >= '2011-10-07 00:00:00') t1,
                                              table_data t2
WHERE t1.i_id = t2.i_id
GROUP BY t2.i_id;

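We turned the subquery into a join and added DISTINCT inside the derived table to reduce the number of times t1 is joined to t2.
After the rewrite, the execution time dropped to under 100 ms.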
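
To picture why the original IN form is slow: MySQL 5.5 and earlier evaluate `outer_expr IN (SELECT ...)` roughly as a correlated EXISTS, pushing the outer column into the subquery's WHERE clause. The following is a sketch of that logical equivalent for the first query, not the optimizer's literal output; the aliases `o` and `s` are introduced here for illustration only.

```sql
-- Logical equivalent of the IN form as a correlated subquery.
-- For every row of the outer table (alias o), the inner SELECT is
-- evaluated again with o.i_id pushed into its WHERE clause.
SELECT o.i_id,
       SUM(o.i_sell) AS i_sell
FROM table_data o
WHERE EXISTS
    (SELECT 1
     FROM table_data s
     WHERE s.gmt_create >= '2011-10-07 00:00:00'
       AND s.i_id = o.i_id)
GROUP BY o.i_id;
```

EXPLAIN labels a subquery evaluated this way as DEPENDENT SUBQUERY, which is the marker to look for when diagnosing this pattern.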
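MySQL's subquery optimization has never been very friendly and has long drawn criticism in the industry; it is also one of the problems I run into most often in SQL tuning. We would like the subquery to be evaluated first and its result used to drive the outer table, but the opposite happens: the subquery is not executed first. The rest of this article walks through a complete real-world case, including the analysis and the tuning process, to build a better understanding of MySQL subqueries.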

I. Case:

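Users reported that the database was responding slowly and that many business updates were stuck. Logging in to the database, we found a SQL statement that had been running for a long time: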

| 10437 | usr0321t9m9 | 10.242.232.50:51201 | oms | Execute | 1179 | Sending

The SQL is:

SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN
         (SELECT orderdto1_.tradeoid
          FROM a2 orderdto1_
          WHERE orderdto1_.proname LIKE '%??%'
            OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

II. Symptom: other updates are blocked

UPDATE a1
SET tradesign='DAB67634-795C-4EAC-B4A0-78F0D531D62F',
    markColor=' #CD5555',
    memotime='2012-09- 22',
    markPerson='??'
WHERE tradeoid IN ('gy2012092204495100032');

To restore the application as quickly as possible, we killed the long-running SQL, after which the application returned to normal.
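
The article does not show the exact statements used to find and kill the query. A minimal sketch of one way to do it is given below; the 60-second threshold and the 80-character preview are arbitrary choices made for this example, while the thread id 10437 is the one from the process list row above.

```sql
-- List running statements, longest first.
-- The 60-second threshold is an arbitrary example value.
SELECT id, user, host, db, command, time, state,
       LEFT(info, 80) AS sql_text
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep'
  AND time > 60
ORDER BY time DESC;

-- Terminate only the running statement of thread 10437,
-- leaving its connection open.
KILL QUERY 10437;

-- Alternatively, terminate the whole connection.
-- KILL CONNECTION 10437;
```

KILL QUERY is the gentler option when the application reuses pooled connections; KILL CONNECTION also ends the session.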

III. Analyze the execution plan:

db@3306: EXPLAIN
SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN
         (SELECT orderdto1_.tradeoid
          FROM a2 orderdto1_
          WHERE orderdto1_.proname LIKE '%??%'
            OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ALL  | NULL          | NULL | NULL    | NULL | 27454 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | ALL  | NULL          | NULL | NULL    | NULL | 40998 | Using where                 |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+

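Starting from the execution plan, we optimize step by step.
First, look at the second row of the plan, the subquery part: orderdto1_ is read with a full table scan, so let us see whether a suitable index can be added: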

A. Use a covering index:

db@3306: ALTER TABLE a2 ADD INDEX ind_a2 (proname, procode, tradeoid);
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

Adding the composite index exceeds the maximum key length.

B. Check the table's column definitions:

db@3306: DESC a2;
+---------------------+---------------+------+-----+---------+-------+
| FIELD               | TYPE          | NULL | KEY | DEFAULT | Extra |
+---------------------+---------------+------+-----+---------+-------+
| OID                 | VARCHAR(50)   | NO   | PRI | NULL    |       |
| TRADEOID            | VARCHAR(50)   | YES  |     | NULL    |       |
| PROCODE             | VARCHAR(50)   | YES  |     | NULL    |       |
| PRONAME             | VARCHAR(1000) | YES  |     | NULL    |       |
| SPCTNCODE           | VARCHAR(200)  | YES  |     | NULL    |       |

C. Check the maximum and average length of the column:

db@3306: SELECT MAX(LENGTH(PRONAME)), avg(LENGTH(PRONAME)) FROM a2;
+----------------------+----------------------+
| MAX(LENGTH(PRONAME)) | avg(LENGTH(PRONAME)) |
+----------------------+----------------------+
|                   95 |              24.5588 |
+----------------------+----------------------+
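
One detail worth keeping in mind here: LENGTH() counts bytes, while the N in VARCHAR(N) counts characters. A maximum of 95 bytes therefore means at most 95 characters, so a shorter column definition is safe as long as it stays above that. A small sketch of a check in characters follows; the limit 156 is the target length used in the next step, and the column aliases are made up for this example.

```sql
-- CHAR_LENGTH counts characters; LENGTH counts bytes.
-- rows_too_long counts the rows that would not fit into VARCHAR(156).
SELECT MAX(CHAR_LENGTH(proname)) AS max_chars,
       SUM(CHAR_LENGTH(proname) > 156) AS rows_too_long
FROM a2;
```

If rows_too_long is not 0, shortening the column either fails or truncates data, depending on the server's sql_mode, so the check is best run before the ALTER.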

D. Shrink the column length:

ALTER TABLE a2 MODIFY COLUMN PRONAME VARCHAR(156);
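
The 1000-byte limit and the effect of the shorter column can be checked with a little arithmetic. This sketch assumes a 3-byte character set (MySQL's `utf8`), which the article does not state, and takes the column sizes from the table definition in step B. For index keys MySQL reserves the maximum number of bytes per character, plus 2 length bytes for each VARCHAR and 1 byte for each nullable column. The aliases key_bytes_before and key_bytes_after are made up for this example.

```sql
-- Assumption: utf8, 3 bytes per character (not stated in the article).
SELECT (1000 * 3 + 2 + 1)   -- proname  VARCHAR(1000), nullable
     + (50 * 3 + 2 + 1)     -- procode  VARCHAR(50),   nullable
     + (50 * 3 + 2 + 1)     -- tradeoid VARCHAR(50),   nullable
       AS key_bytes_before,
       (156 * 3 + 2 + 1)    -- proname  VARCHAR(156),  nullable
     + (50 * 3 + 2 + 1)     -- procode  VARCHAR(50),   nullable
     + (50 * 3 + 2 + 1)     -- tradeoid VARCHAR(50),   nullable
       AS key_bytes_after;
```

Under these assumptions key_bytes_before is 3309, far above the limit, and key_bytes_after is 777, which matches the key_len that EXPLAIN reports for ind_a2 in the plan that follows.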

With the column shortened, the index ind_a2 can be created, and we analyze the execution plan again:

db@3306: EXPLAIN
SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN
         (SELECT orderdto1_.tradeoid
          FROM a2 orderdto1_
          WHERE orderdto1_.proname LIKE '%??%'
            OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;


+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
| id | select_type        | table      | type  | possible_keys   | key             | key_len | ref                     | rows  | Extra                       |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ref   | ind_tradestatus | ind_tradestatus | 345     | const,const,const,const |  8962 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | index | NULL            | ind_a2          | 777     | NULL                    | 41005 | Using where; Using index    |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+

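Performance still does not improve. The key problem is that the number of rows examined across the two tables has barely dropped (8962 × 41005, roughly 367 million row combinations in the worst case), so the index added above has little effect. Now look at what the subquery on a2 returns on its own:

db@3306:
SELECT orderdto1_.tradeoid
FROM a2 orderdto1_
WHERE orderdto1_.proname LIKE '%??%'
  OR orderdto1_.procode LIKE '%??%';

Empty set (0.05 sec)

The result set is empty, so the result of the a2 query should be used as the driving table.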

IV. Rewrite the subquery:

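The tests above confirm that the ordinary MySQL subquery form performs poorly. This is an inherent weakness of MySQL subqueries, so the SQL needs to be rewritten as a join: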

SELECT tradedto0_.*
FROM a1 tradedto0_ ,
  (SELECT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
     OR orderdto1_.procode LIKE '%??%')t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;
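
One caveat with the join form: IN returns each a1 row at most once, whereas a plain join returns it once for every matching a2 row. The sketch below is a hedged variant that keeps the IN semantics, assuming a2 can hold several rows with the same tradeoid (the article does not say whether it does). It adds DISTINCT to the derived table, as the first example in this article does, and uses explicit JOIN syntax.

```sql
-- Variant of the rewrite that deduplicates the derived table, so each
-- a1 row is returned at most once, as with the original IN subquery.
SELECT tradedto0_.*
FROM a1 tradedto0_
JOIN
  (SELECT DISTINCT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
      OR orderdto1_.procode LIKE '%??%') t2
  ON tradedto0_.tradeoid = t2.tradeoid
WHERE tradedto0_.tradestatus='1'
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND tradedto0_.orderCompany LIKE '0002%'
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC
LIMIT 15;
```

If tradeoid is unique in a2, the DISTINCT is redundant and the form shown in the article returns the same rows.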

V. Check the execution plan:

db@3306: EXPLAIN
SELECT tradedto0_.*
FROM a1 tradedto0_ ,
  (SELECT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
     OR orderdto1_.procode LIKE '%??%')t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
| id | select_type | table      | type  | possible_keys | key    | key_len | ref  | rows  | Extra                                               |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
|  1 | PRIMARY     | NULL       | NULL  | NULL          | NULL   | NULL    | NULL | NULL  | Impossible WHERE noticed after reading const tables |
|  2 | DERIVED     | orderdto1_ | index | NULL          | ind_a2 | 777     | NULL | 41005 | Using where; Using index                            |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+

VI. Execution time:

db@3306:
SELECT tradedto0_.*
FROM a1 tradedto0_ ,
  (SELECT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
     OR orderdto1_.procode LIKE '%??%')t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

Empty set (0.03 sec)

The execution time dropped to the millisecond range (0.03 sec).

VII. Summary:

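1. MySQL subqueries have an obvious weakness in their execution plans, so they need to be rewritten.
See also:
a. MySQL subqueries encountered in a production database: http://hidba.org/?p=412
b. Builtin InnoDB, subquery blocking updates: http://hidba.org/?p=456
2. In table design, do not casually declare large varchar(N) columns; they can make it impossible to use an index.
See also:
a. JDBC memory management, the impact of varchar2(4000): http://hidba.org/?p=31
b. Limits on large columns in InnoDB: http://hidba.org/?p=144
c. Some optimization suggestions for large text and blob columns in InnoDB: http://hidba.org/?p=551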

VIII. References:

[1] MySQL subqueries encountered in a production database, http://hidba.org/?p=412

[2] A brief look at MySQL subqueries, http://hidba.org/?p=624

[3] Weaknesses of MySQL subqueries, http://hidba.org/?p=260