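DBAs and developers who have worked with Oracle or other relational databases tend to assume that the database already optimizes subqueries and will pick a sensible driving table, and they carry that assumption over to MySQL. Unfortunately, MySQL's handling of subqueries may badly disappoint you. We have run into several such cases on our production systems, for example: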
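SELECT i_id, SUM(i_sell) AS i_sell FROM table_data WHERE i_id IN (SELECT i_id FROM table_data WHERE gmt_create >= '2011-10-07 00:00:00') GROUP BY i_id;
(Note: the business logic of this SQL can be pictured as follows: first find the 100 books newly sold on 10-07, then look up how those 100 books sold over the whole year.)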
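The performance problem with this SQL comes from a weakness in how the MySQL optimizer handles subqueries: it rewrites the subquery. Normally we would expect inside-out evaluation, where the subquery result is computed first and then used to drive the outer table. Instead, MySQL first scans every row of the outer table and passes each row into the subquery to be correlated with it. If the outer table is large, performance suffers.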
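To make the rewrite concrete, the sketch below shows the correlated form that the IN query is roughly equivalent to after the optimizer's transformation. It illustrates the IN-to-EXISTS behavior documented for MySQL 5.5 and earlier; it is not the optimizer's literal internal output, and the aliases `o` and `s` are made up for the example.

```sql
-- Roughly what gets evaluated: a correlated EXISTS that is re-run
-- for every row of the outer table (aliases o and s are illustrative).
SELECT o.i_id, SUM(o.i_sell) AS i_sell
FROM table_data o
WHERE EXISTS (
    SELECT 1
    FROM table_data s
    WHERE s.gmt_create >= '2011-10-07 00:00:00'
      AND s.i_id = o.i_id   -- outer value pushed into the subquery
)
GROUP BY o.i_id;
```

Written this way, it is easier to see why the cost grows with the size of the outer table: the inner SELECT runs once per outer row.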
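For the IN query in this example, table_data holds about 700,000 rows, and the subquery returns many rows, a large share of them duplicates, so close to 700,000 correlated lookups are needed. With that many lookups the SQL ran for several hours without finishing, so we need to rewrite it: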
SELECT t2.i_id, SUM(t2.i_sell) AS sold FROM (SELECT DISTINCT i_id FROM table_data WHERE gmt_create >= '2011-10-07 00:00:00') t1, table_data t2 WHERE t1.i_id = t2.i_id GROUP BY t2.i_id;
We turned the subquery into a join and added DISTINCT inside the derived table, which reduces the number of times t1 is joined against t2.
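The same rewrite can also be written with explicit JOIN syntax, and prefixing it with EXPLAIN is a quick way to check that the inner query now appears as a DERIVED table rather than a DEPENDENT SUBQUERY. This is only a sketch; the plan you actually get depends on your MySQL version and on the indexes available.

```sql
-- Same logic as the comma-join rewrite, with an explicit INNER JOIN.
-- EXPLAIN prints the plan instead of running the query.
EXPLAIN
SELECT t2.i_id, SUM(t2.i_sell) AS sold
FROM (SELECT DISTINCT i_id
      FROM table_data
      WHERE gmt_create >= '2011-10-07 00:00:00') t1
INNER JOIN table_data t2 ON t1.i_id = t2.i_id
GROUP BY t2.i_id;
```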
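Users reported that the database was responding slowly and that many business updates were stuck. Logging into the database, we found a long-running SQL statement: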
| 10437 | usr0321t9m9 | 10.242.232.50:51201 | oms | Execute | 1179 | Sending
The SQL was:
SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
UPDATE a1 SET tradesign='DAB67634-795C-4EAC-B4A0-78F0D531D62F', markColor=' #CD5555', memotime='2012-09-22', markPerson='??' WHERE tradeoid IN ('gy2012092204495100032');
To restore the application as quickly as possible, we killed the long-running SQL, after which the application returned to normal.
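For readers who want to repeat this step, here is a minimal sketch of how such a statement can be located and terminated with standard MySQL commands. The 600-second threshold is an arbitrary value chosen for the example, and 10437 is the thread Id taken from the process list entry above.

```sql
-- List running statements with their full text; Time is the number
-- of seconds the thread has spent in its current state.
SHOW FULL PROCESSLIST;

-- Alternative: filter for long runners (the 600 s threshold is illustrative).
SELECT ID, USER, HOST, DB, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep' AND TIME > 600
ORDER BY TIME DESC;

-- Terminate the offending thread by its Id.
KILL 10437;
```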
db@3306 :explain SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ALL  | NULL          | NULL | NULL    | NULL | 27454 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | ALL  | NULL          | NULL | NULL    | NULL | 40998 | Using where                 |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
Starting from this execution plan, we optimized step by step:
db@3306 :alter table a2 add index ind_a2(proname,procode,tradeoid);
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
Adding the composite index exceeded the maximum key length, so we looked at the column definitions:
db@3306 :DESC a2;
+---------------------+---------------+------+-----+---------+-------+
| Field               | Type          | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------+-------+
| OID                 | VARCHAR(50)   | NO   | PRI | NULL    |       |
| TRADEOID            | VARCHAR(50)   | YES  |     | NULL    |       |
| PROCODE             | VARCHAR(50)   | YES  |     | NULL    |       |
| PRONAME             | VARCHAR(1000) | YES  |     | NULL    |       |
| SPCTNCODE           | VARCHAR(200)  | YES  |     | NULL    |       |
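The column sizes above account for the ERROR 1071. As a rough worked check, assume the columns use MySQL's 3-byte utf8 character set (an assumption: the table's character set is not shown here, although it is consistent with the key_len of 777 that appears in a later plan). An index has to reserve the maximum byte length of each indexed column:

```sql
-- Assumption: utf8 columns, so up to 3 bytes are reserved per character.
SELECT (1000 + 50 + 50) * 3 AS requested_key_bytes,  -- proname + procode + tradeoid
       1000                 AS max_key_bytes;        -- limit quoted in ERROR 1071
-- requested_key_bytes = 3300, far above the 1000-byte limit
```

By the same arithmetic, a PRONAME column of 156 characters would need (156 + 50 + 50) * 3 = 768 bytes of character data, which fits within the limit.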
Check how long the values stored in PRONAME actually are:
db@3306 :SELECT MAX(LENGTH(PRONAME)),avg(LENGTH(PRONAME)) FROM a2;
+----------------------+----------------------+
| MAX(LENGTH(PRONAME)) | avg(LENGTH(PRONAME)) |
+----------------------+----------------------+
|                   95 |              24.5588 |
+----------------------+----------------------+
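The longest stored value is only 95 bytes, so the column can be shortened, which lets the composite index ind_a2 fit within the key length limit:
ALTER TABLE a2 MODIFY COLUMN PRONAME VARCHAR(156);
Then analyze the execution plan again: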
db@3306 :explain SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
| id | select_type        | table      | type  | possible_keys   | key             | key_len | ref                     | rows  | Extra                       |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ref   | ind_tradestatus | ind_tradestatus | 345     | const,const,const,const |  8962 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | index | NULL            | ind_a2          | 777     | NULL                    | 41005 | Using where; Using index    |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
Performance still did not improve. The key point is that the number of rows examined across the two tables is still very large (8962 * 41005): the subquery is still a DEPENDENT SUBQUERY, so the new index has little effect. Now look at what the subquery on a2 returns by itself:
db@3306 : SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%';
Empty set (0.05 sec)
The result set is empty, so the result set from a2 should be used as the driving table.
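The tests above confirm that the ordinary MySQL subquery form performs poorly. This is an inherent weakness in how MySQL handles subqueries, so the SQL needs to be rewritten as a join: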
SELECT tradedto0_.*
FROM a1 tradedto0_,
     (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%') t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
db@3306 :explain SELECT tradedto0_.*
FROM a1 tradedto0_,
     (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%') t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
| id | select_type | table      | type  | possible_keys | key    | key_len | ref  | rows  | Extra                                               |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
|  1 | PRIMARY     | NULL       | NULL  | NULL          | NULL   | NULL    | NULL |  NULL | Impossible WHERE noticed after reading const tables |
|  2 | DERIVED     | orderdto1_ | index | NULL          | ind_a2 | 777     | NULL | 41005 | Using where; Using index                            |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
db@3306 : SELECT tradedto0_.*
FROM a1 tradedto0_,
     (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%') t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
Empty set (0.03 sec)
The execution time dropped to milliseconds.
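One caveat with this kind of rewrite: IN returns each a1 row at most once, while a join returns it once per matching a2 row. If a2 can hold several matching rows for the same tradeoid (an assumption about the data, not something shown above), adding DISTINCT to the derived table, as in the first example, keeps the two forms equivalent. A sketch of that variant, written with explicit JOIN syntax:

```sql
-- DISTINCT guarantees one row per tradeoid in the derived table,
-- so each a1 row is returned at most once, as with the original IN.
SELECT tradedto0_.*
FROM a1 tradedto0_
INNER JOIN (
    SELECT DISTINCT orderdto1_.tradeoid
    FROM a2 orderdto1_
    WHERE orderdto1_.proname LIKE '%??%'
       OR orderdto1_.procode LIKE '%??%'
) t2 ON tradedto0_.tradeoid = t2.tradeoid
WHERE tradedto0_.tradestatus = '1'
  AND tradedto0_.undefine4 = '1'
  AND tradedto0_.invoicetype = '1'
  AND tradedto0_.tradestep = '0'
  AND tradedto0_.orderCompany LIKE '0002%'
ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC
LIMIT 15;
```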
[1] MySQL subqueries encountered in a production database http://hidba.org/?p=412
[2] A brief discussion of MySQL subqueries http://hidba.org/?p=624
[3] The weakness of MySQL subqueries http://hidba.org/?p=260