MySQL-join的初步优化

时间 2019-11-06

标签 mysql join 初步优化栏目 MySQL 繁體版

原文原文链接

强调一下关于 MySQL5.7 group 的问题ψ(*｀ー´)ψphp

一、MySQL中MAX函数与Group By一块儿使用的注意事项
条件：同一个user_role
业务：根据权限分组查询user_id最大的数据html

MySQL 5.5 版本 mysql> select * from user_role; +----+---------+---------+ | id | user_id | role_id | +----+---------+---------+ | 3 | 3 | 3 | | 4 | 4 | 4 | | 5 | 0 | 0 | | 11 | 17 | 1 | | 13 | 19 | 3 | | 16 | 22 | 2 | | 18 | 24 | 2 | | 19 | 25 | 3 | | 22 | 31 | 1 | +----+---------+---------+ 9 rows in set (0.00 sec) 以下就是咱们平时的SQL写法 mysql> select id,role_id,max(user_id) from user_role group by role_id; +----+---------+--------------+ | id | role_id | max(user_id) | +----+---------+--------------+ | 5 | 0 | 0 | | 11 | 1 | 31 | | 16 | 2 | 24 | | 3 | 3 | 25 | | 4 | 4 | 4 | +----+---------+--------------+ 5 rows in set (0.00 sec)

可是会很容易的发现实际上数据并不对好比咱们查询id为11的数据，user_id并非31mysql

mysql> select * from user_role where id = 11; +----+---------+---------+ | id | user_id | role_id | +----+---------+---------+ | 11 | 17 | 1 | +----+---------+---------+ 1 row in set (0.00 sec) 因此这里须要注意一下！！！SQL改成以下方式： select id,role_id,user_id from ( select * from user_role order by user_id desc ) as a group by role_id +----+---------+---------+ | id | role_id | user_id | +----+---------+---------+ | 5 | 0 | 0 | | 22 | 1 | 31 | | 18 | 2 | 24 | | 19 | 3 | 25 | | 4 | 4 | 4 | +----+---------+---------+ 5 rows in set (0.00 sec) 测试id = 22 数据 mysql> select * from user_role where id = 22; +----+---------+---------+ | id | user_id | role_id | +----+---------+---------+ | 22 | 31 | 1 | +----+---------+---------+ 1 row in set (0.00 sec)

切换为MySQL 5.7 测试 !!!∑(ﾟДﾟノ)ノ而后发现又是一个坑（*゜Д゜）σ凸←自爆按钮c++

mysql> select id,role_id,user_id from ( -> select * from user_role order by user_id desc -> ) as a group by role_id; +----+---------+---------+ | id | role_id | user_id | +----+---------+---------+ | 5 | 0 | 0 | | 11 | 1 | 17 | | 16 | 2 | 22 | | 3 | 3 | 3 | | 4 | 4 | 4 | +----+---------+---------+ 5 rows in set (0.00 sec) 数据不对呀(〃＞皿＜) 实际上MySQL 5.7 对于SQL进行了改写 mysql> explain select id,role_id,user_id from (select * from user_role order by user_id desc ) as a group by role_id; +----+-------------+-----------+------------+------+------+------+----------+---------------------------------+ | id | select_type | table | partitions | type | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+------+------+----------+---------------------------------+ | 1 | SIMPLE | user_role | NULL | ALL | NULL | 9 | 100.00 | Using temporary; Using filesort | +----+-------------+-----------+------------+------+------+------+----------+---------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show warnings \G; *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `mysql12`.`user_role`.`id` AS `id`,`mysql12`.`user_role`.`role_id` AS `role_id`,`mysql12`.`user_role`.`user_id` AS `user_id` from `mysql12`.`user_role` group by `mysql12`.`user_role`.`role_id` 1 row in set (0.00 sec)

官方的解释：https://bugs.mysql.com/bug.php?id=80131算法

那么可实行的方法：sql

一、 select u.id,u.role_id,u.user_id from user_role u, (select max(user_id) max_user_id,role_id from user_role group by role_id ) as a where a.max_user_id = u.user_id and a.role_id = u.role_id +----+---------+---------+ | id | role_id | user_id | +----+---------+---------+ | 4 | 4 | 4 | | 5 | 0 | 0 | | 18 | 2 | 24 | | 19 | 3 | 25 | | 22 | 1 | 31 | +----+---------+---------+ 5 rows in set (0.00 sec) 二、 select u.id,u.role_id,u.user_id from user_role u left join user_role a on u.role_id = a.role_id and u.user_id < a.user_id where a.user_id is null; 推介1把容易理解，相对来讲效率高一些 关于MySQL5.7 group 与 order 问题 还有解决方法就是 mysql> select id,role_id,user_id from (select * from user_role order by user_id desc limit 0,100 ) as a group by role_id; +----+---------+---------+ | id | role_id | user_id | +----+---------+---------+ | 5 | 0 | 0 | | 22 | 1 | 31 | | 18 | 2 | 24 | | 19 | 3 | 25 | | 4 | 4 | 4 | +----+---------+---------+ 5 rows in set (0.00 sec) 必需要加上limit 才能够

回到题目中，根据题目的意思实际上咱们须要分为两条SQL 经过 union all 和在一块儿(这是简单版本的)数据库

explain select a.name, b.income FROM customers1 a, ( select city,gender,min(monthsalary * 12 + yearbonus) as income from customers1 ignore index(idx_gender_city_monthsalary) group by city,gender ) b where a.city = b.city and a.gender = b.gender and (a.monthsalary * 12 + a.yearbonus) = b.income select a.name, b.income FROM customers1 a, ( select city,gender,max(monthsalary * 12 + yearbonus) as income from customers1 ignore index(idx_gender_city_monthsalary) group by city,gender ) b where a.city = b.city and a.gender = b.gender and (a.monthsalary * 12 + a.yearbonus) = b.income 优化的方式很简单能够在以前建立以下索引 alter table customers1 add index idx_gender_city_name_monthsalary_yearbonus(gender, city, name, monthsalary, yearbonus); alter table customers1 add index idx_gender_city_monthsalary_yearbonus(gender, city, monthsalary, yearbonus);

固然有同窗可能就会疑惑(*•ω•)，就是这优化就是对于对应的SQL语句而后建立联合索引嘛？缓存

若是从必定要把这条SQL优化来讲是那么回事，可是最主要的其实仍是咱们应该怎么去创建这个索引，这是咱们所须要关注的；在整个优化的过程当中咱们实际上主要都是尽可能使用覆盖索引去帮助咱们优化SQL;函数

实际上索引的创建最主要的几个点从咱们的案例中诠释的：...等等我仍是先再从新声明一下关于MySQL对于索引的选择 ლ(⁰⊖⁰ლ)最近有学员仍是不懂呀：mysql对于索引的选择主要是看where条件上的字段，它并不会优先考虑你须要获取的是那些字段列oop

好比：

show index from customers1; +------------+------------+---------------------------------------+--------------+-------------+-----------+-------------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type | +------------+------------+---------------------------------------+--------------+-------------+-----------+-------------+------------+ | customers1 | 0 | PRIMARY | 1 | id | A | 577859 | BTREE | | customers1 | 1 | idx_monthsalary_yearbonus_birthdate | 1 | monthsalary | A | 450702 | BTREE | | customers1 | 1 | idx_monthsalary_yearbonus_birthdate | 2 | yearbonus | A | 555676 | BTREE | | customers1 | 1 | idx_monthsalary_yearbonus_birthdate | 3 | birthdate | A | 577859 | BTREE | | customers1 | 1 | idx_gender_city_monthsalary_yearbonus | 1 | gender | A | 1 | BTREE | | customers1 | 1 | idx_gender_city_monthsalary_yearbonus | 2 | city | A | 21 | BTREE | | customers1 | 1 | idx_gender_city_monthsalary_yearbonus | 3 | monthsalary | A | 577859 | BTREE | | customers1 | 1 | idx_gender_city_monthsalary_yearbonus | 4 | yearbonus | A | 577859 | BTREE | +------------+------------+---------------------------------------+--------------+-------------+-----------+-------------+------------+ 8 rows in set (0.01 sec) explain select count(monthsalary) from customers1; +----+-------------+------------+-------+-------------------------------------+---------+--------+----------+-------------+ | id | select_type | table | type | key | key_len | rows | filtered | Extra | +----+-------------+------------+-------+-------------------------------------+---------+--------+----------+-------------+ | 1 | SIMPLE | customers1 | index | idx_monthsalary_yearbonus_birthdate | 13 | 577859 | 100.00 | Using index | +----+-------------+------------+-------+-------------------------------------+---------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) explain select count(monthsalary) from customers1 where photo = "xxx"; +----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | customers1 | NULL | ALL | NULL | NULL | NULL | NULL | 577859 | 10.00 | Using where | +----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)

如上的两条SQL，会发现第一条SQL使用到了idx_monthsalary_yearbonus_birthdate索引，可是第二个确没有使用到这个索引仅仅只是加了where条件，由于对于MySQL来讲索引的最大做用是用来涮选数据，count(monthsalary) 是须要获取的数据，第二个没有使用到是由于MySQL须要过滤一部分数据就须要根据where过滤，而idx_monthsalary_yearbonus_birthdate中并不包含这个字段因此不能使用

注意！！！关于使用idx_gender_city_monthsalary_yearbonus以后的状况

explain select a.name, b.income FROM customers1 a, ( select city,gender,min(monthsalary * 12 + yearbonus) as income from customers1 group by gender,city ) b where a.gender = b.gender and a.city = b.city and (a.monthsalary * 12 + a.yearbonus) = b.income\G; *************************** 1. row *************************** id: 1 select_type: PRIMARY table: a partitions: NULL type: ALL possible_keys: idx_gender_city_monthsalary_yearbonus key: NULL key_len: NULL ref: NULL rows: 577859 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: <derived2> partitions: NULL type: ref possible_keys: <auto_key0> key: <auto_key0> key_len: 40 ref: mysql12.a.city,mysql12.a.gender,func rows: 10 filtered: 100.00 Extra: Using where; Using index *************************** 3. row *************************** id: 2 select_type: DERIVED table: customers1 partitions: NULL type: index possible_keys: idx_gender_city_monthsalary_yearbonus key: idx_gender_city_monthsalary_yearbonus key_len: 43 ref: NULL rows: 577859 filtered: 100.00 Extra: Using index 3 rows in set, 1 warning (0.04 sec)

咱们会发现MySQL仅仅只是使用了一次idx_gender_city_monthsalary_yearbonus，可是在外部链接中实际上where上是存在着索引中的这些字段，那应该用的到索引呀？∑(´△｀)？！

注意：这里并非由于与最后的(a.monthsalary * 12 + a.yearbonus) = b.income 这部分的计算而是另外的缘由，咱们能够经过show warnings \G;查看一下MySQL对于咱们执行的SQL语句进行分析

show warnings \G; *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `mysql12`.`a`.`name` AS `name`,`b`.`income` AS `income` from `mysql12`.`customers1` `a` join (/* select#2 */ select `mysql12`.`customers1`.`c ity` AS `city`,`mysql12`.`customers1`.`gender` AS `gender`,min(((`mysql12`.`customers1`.`monthsalary` * 12) + `mysql12`.`customers1`.`yearbonus`)) AS `income` from `mysql12 `.`customers1` group by `mysql12`.`customers1`.`gender`,`mysql12`.`customers1`.`city`) `b` where ((`b`.`city` = `mysql12`.`a`.`city`) and (`b`.`gender` = `mysql12`.`a`.`gen der`) and (((`mysql12`.`a`.`monthsalary` * 12) + `mysql12`.`a`.`yearbonus`) = `b`.`income`)) 1 row in set (0.01 sec) ERROR: No query specified 美化一下ヾ(=･ω･=)o SELECT `mysql12`.`a`.`name` AS `name`,`b`.`income` AS `income` FROM `mysql12`.`customers1` `a` JOIN ( SELECT `mysql12`.`customers1`.`city` AS `city`, `mysql12`.`customers1`.`gender` AS `gender`, min((`mysql12`.`customers1`.`monthsalary` * 12) + `mysql12`.`customers1`.`yearbonus`) AS `income` FROM `mysql12`.`customers1` GROUP BY `mysql12`.`customers1`.`gender`,`mysql12`.`customers1`.`city` ) `b` WHERE `b`.`city` = `mysql12`.`a`.`city` AND `b`.`gender` = `mysql12`.`a`.`gender` AND ((`mysql12`.`a`.`monthsalary` * 12) + `mysql12`.`a`.`yearbonus`) = `b`.`income`

从上能够看出MySQL对于咱们的SQL重写以后，把本来的条件顺序置换了成了一个独立列字段，在MySQL中对于独立字段是不会使用索引的以下例子：

explain select * from customers1 where gender - 1 = 0; +----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | customers1 | NULL | ALL | NULL | NULL | NULL | NULL | 577859 | 100.00 | Using where | +----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) explain select * from customers1 where gender = 0; +----+-------------+------------+------+---------------------------------------+---------------------------------------+---------+-------+--------+----------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | +----+-------------+------------+------------+------+---------------------------------------+---------------------------------------+---------+-------+--------+----------+- ------+ | 1 | SIMPLE | customers1 | ref | idx_gender_city_monthsalary_yearbonus | idx_gender_city_monthsalary_yearbonus | 1 | const | 288929 | 100.00 | +----+-------------+------------+------+---------------------------------------+---------------------------------------+---------+-------+--------+----------+ 1 row in set, 1 warning (0.00 sec)

因此就出现了如上的问题,对于子查询的查询，这是MySQL版本变革以后的处理

2. 5.5与5.7版本之间子查询的区别

在MySQL的5.6/5.7的版本中对于子查询作出了相应的优化处理，在MySQL5.5及以前的版本中对于子查询仅仅只是一个功能而已，性能差在开发中尽量的避免。

由于在5.5以前对于子查询的查询方式是先查询的外层的数据表，而后再去查询内表也就是经过外部表驱动内表，而在5.6之后对于子查询进行了优化MySQL内部的优化器把子查询改写成关联查询。

测试表 yd_admin_user 数据量 49条 desc yd_admin_user; +-----------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------+--------------+------+-----+---------+----------------+ | user_id | smallint(5) | NO | PRI | NULL | auto_increment | | user_name | varchar(60) | NO | | | | | password | varchar(60) | NO | | | | | role_id | smallint(5) | YES | | 0 | | | real_name | varchar(30) | YES | | | | | mobile | char(20) | YES | | | | | email | varchar(100) | YES | | | | +-----------+--------------+------+-----+---------+----------------+ 7 rows in set (0.00 sec) role数据量13条 desc role; +----------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------+--------------+------+-----+---------+----------------+ | id | smallint(5) | NO | PRI | NULL | auto_increment | | name | varchar(20) | YES | | | | | department_ids | varchar(200) | YES | | | | | action_list | text | YES | | NULL | | | do_list | text | YES | | NULL | | | add_time | int(10) | YES | | 0 | | +----------------+--------------+------+-----+---------+----------------+ 6 rows in set (0.00 sec) 不用salary以及customers是由于数据量太多了，导出导入太麻烦(ﾟДﾟ*)ﾉ

业务就对于两个表进行联查经过子查询的方式找出有权限的数据

可能查询方面不和理论(ー`´ー)，将就一下；可是仍是能够说明问题的(・ω<) ﾃﾍﾍﾟﾛ

在MySQL5.5版本中

explain select * from yd_admin_user where user_id in (select id from role where id = 1); +----+--------------------+---------------+-------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------------+-------+---------------+---------+---------+-------+------+-------------+ | 1 | PRIMARY | yd_admin_user | ALL | NULL | NULL | NULL | NULL | 49 | Using where | | 2 | DEPENDENT SUBQUERY | role | const | PRIMARY,id | PRIMARY | 2 | const | 1 | Using index | +----+--------------------+---------------+-------+---------------+---------+---------+-------+------+-------------+ 2 rows in set (0.07 sec)

MySQL5.7版本

explain select * from yd_admin_user where user_id in (select id from role where id = 1); +----+-------------+---------------+-------+---------------+---------+---------+-------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------+-------+---------------+---------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | yd_admin_user | const | PRIMARY | PRIMARY | 2 | const | 1 | 100.00 | NULL | | 1 | SIMPLE | role | const | PRIMARY,id | PRIMARY | 2 | const | 1 | 100.00 | Using index | +----+-------------+---------------+-------+---------------+---------+---------+-------+------+----------+-------------+ 2 rows in set, 1 warning (0.00 sec) show warnings \G; *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select '1' AS `user_id`,'admin' AS `user_name`,'14e1b600b1fd579f47433b88e8d85291' AS `password`,'1' AS `role_id`,'9' AS `d_id`,'1515330851' AS `last_login`,'127.0.0.1' AS `last_ip`,'Admin' AS `real_name`,'0' AS `add_time`,'0' AS `is_disable`,'13632483323' AS `mobile`,'' AS `email`,'超级管理员' AS `job`,'0' AS `is_responsible` from `mysql12`.`role` join `mysql12`.`yd_admin_user` where 1 1 row in set (0.00 sec)

最主要的是你能够看看，在查询的时候MySQL经过explain分析之不一样版本对于数据表的查询状况。

在官方网站中有如上内容对于join的解释：咱们根据前面的两条SQL也作一个解释

5.5 版本：

table	type
yd_admin_user	All
role	const

for each row in yd_admin_user{ 循环yd_admin_user for each row in role matching const { 循环 role 根据 const 匹配 if row satisfies join conditions and where id 知足 join条件和where就返回 } } send to client

5.7 版本：

table	type
yd_admin_user	const
role	const

role_data 先循环role 获取到where的结果 for each row in role matching const { role_data = if row satisfies where id, send to client } 再去根据结果经过join条件查询yd_admin_user的数据 for each in role_data{ for each row in yd_admin_user matching const{ if row satisfies join conditions , send to client } }

3. 问题解答环境

过....

show warnings 这是MySQL5.7的小操做能够查看explain执行以后优化器从新的SQL

4. join算法

SQL中对于join的实现主要是经过Nest额的 Loop join算法处理的，其余数据库多是使用hash join以及sort merge join。NLJ实际上就是经过驱动表的结构及做为循环基础数据，而后讲该结果集中的数据做为过滤条件一条条第到下一个表中查询数据，最后合并结构。若是还有第三个表参与join，则把前面两个表的join结果集做为循环基础数据，再一次经过循环查询条件到第三个表中查询数据，以此往下推

优化的思路：尽量减小 Join 语句中的 Nested Loop 的循环总次数；如何减小 Nested Loop 的循环总次数？最有效的办法只有一个，那就是让驱动表的结果集尽量的小，这也正是在本章第二节中的优化基本原则之一“永远用小结果集驱动大的结果集”。

优先优化 Nested Loop 的内层循环；
保证 Join 语句中被驱动表上 Join 条件字段已经被索引；
当没法保证被驱动表的 Join 条件字段被索引且内存资源充足的前提下，不要太吝惜 JoinBuffer 的设置；

1.1 Nested-Loop Join算法解释

官网join算法 https://www.docs4dev.com/docs/zh/mysql/5.7/reference/nested-loop-joins.html

Simple Nested-Loop Join

以下图，r为驱动表，s为匹配表，能够看到从r中分别取出r一、r二、......、rn去匹配s表的左右列，而后再合并数据，对s表进行了rn次访问，对数据库开销大

Index Nested-Loop Join（索引嵌套）：

这个要求非驱动表（匹配表s）上有索引，能够经过索引来减小比较，加速查询。在查询时，驱动表（r）会根据关联字段的索引进行查找，挡在索引上找到符合的值，再回表进行查询，也就是只有当匹配到索引之后才会进行回表查询。若是非驱动表（s）的关联健是主键的话，性能会很是高，若是不是主键，要进行屡次回表查询，先关联索引，而后根据二级索引的主键ID进行回表操做，性能上比索引是主键要慢。

Block Nested-Loop Join：

若是有索引，会选取第二种方式进行join，但若是join列没有索引，就会采用Block Nested-Loop Join。能够看到中间有个join buffer缓冲区，是将驱动表的全部join相关的列都先缓存到join buffer中，而后批量与匹配表进行匹配，将第一种屡次比较合并为一次，下降了非驱动表（s）的访问频率。默认状况下join_buffer_size=256K，在查找的时候MySQL会将全部的须要的列缓存到join buffer当中，包括select的列，而不是仅仅只缓存关联列。在一个有N个JOIN关联的SQL当中会在执行时候分配N-1个join buffer。

join_buffer_size 官方解释：https://www.docs4dev.com/docs/zh/mysql/5.7/reference/server-system-variables.html

5. 链接驱动表选择

经过来讲就是explain分析以后结果的第一个表就是驱动表

选择方式：

MySQL会先获取数据表的结构根据结构进行排序小的在为驱动表
而后看链接的顺序
根据where进行判断，看where上是否含有某一个表中的字段