SQL join queries: the difference between a direct join and subqueries

Date: 2020-01-15


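A colleague on the operations team recently asked for statistics on the system's users and their orders. Without giving it much thought, we wrote a statistics query that joins the user table (users) directly to the trip tables and groups the trips, but the SQL ran astonishingly slowly.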

-- Status codes: '2' = 已认证 (verified), '3' = 已驳回 (rejected), anything else = 待认证 (pending).
-- '本公司名' and '本公司昵称' are placeholders for our own company's name and short name.
explain select users.`mobile_num`, concat(users.`lastName`, users.`firstName`) as userName, users.`company`,
  (case `users`.`idPhotoCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `idPhotoCheckStatus`,
  (case `users`.`driverLicenseCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `driverLicenseCheckStatus`,
  (case `users`.`companyCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `companyCheckStatus`,
  (case `users`.`unionCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `unionCheckStatus`,
  count(passenger_trip.id) as ptrip_num
from users
left join passenger_trip on passenger_trip.userId = users.id and passenger_trip.status != 'cancel'
left join driver_trip on driver_trip.`userId` = users.`id` and driver_trip.`status` != 'cancel'
where company != '本公司名' and company != '本公司昵称'
-- The grouping mentioned in the text; the exact column is an assumption.
group by users.id

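My first reaction was that the database had hung: the user table holds about 100,000 rows and the trip tables about 100,000 rows as well, so it could not possibly be this slow! I looked at the plan with explain and checked the indexes on the join columns, and found nothing unusual: this is the most common kind of related-table query, and it is naturally written with join.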
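Checking the indexes on the join columns can be done along these lines. This is only a minimal sketch: the index names `idx_passenger_trip_userId` and `idx_driver_trip_userId` are hypothetical, and the `create index` statements would only be needed if `show index` revealed that `userId` is not indexed.

```sql
-- List the existing indexes on the two trip tables.
show index from passenger_trip;
show index from driver_trip;

-- Hypothetical: add an index on the join column if none exists yet.
create index idx_passenger_trip_userId on passenger_trip (userId);
create index idx_driver_trip_userId on driver_trip (userId);
```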

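Then it occurred to me: 100,000 × 100,000, after the Cartesian product, is that not a filter over ten billion rows?! So I tried writing the query a different way.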

-- Status codes: '2' = 已认证 (verified), '3' = 已驳回 (rejected), anything else = 待认证 (pending).
explain select users.`mobile_num`, concat(users.`lastName`, users.`firstName`) as userName, users.`company`,
  (case `users`.`idPhotoCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `idPhotoCheckStatus`,
  (case `users`.`driverLicenseCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `driverLicenseCheckStatus`,
  (case `users`.`companyCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `companyCheckStatus`,
  (case `users`.`unionCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `unionCheckStatus`,
  -- Correlated subqueries: each count is computed per user, so the two trip tables never multiply each other's rows.
  (select count(passenger_trip.id) from passenger_trip where passenger_trip.userId = users.id and passenger_trip.status != 'cancel') as ptrip_num,
  (select count(driver_trip.id) from driver_trip where driver_trip.userId = users.id and driver_trip.status != 'cancel') as dtrip_num
from users
where company != '本公司名' and company != '本公司昵称'

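To my surprise this was many times faster than the direct join: execution went from never finishing in any observable time to returning within 10 seconds. I then checked the execution plan.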

I then adjusted the SQL further:

-- Status codes: '2' = 已认证 (verified), '3' = 已驳回 (rejected), anything else = 待认证 (pending).
explain select users.`mobile_num`, concat(users.`lastName`, users.`firstName`) as userName, users.`company`,
  (case `users`.`idPhotoCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `idPhotoCheckStatus`,
  (case `users`.`driverLicenseCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `driverLicenseCheckStatus`,
  (case `users`.`companyCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `companyCheckStatus`,
  (case `users`.`unionCheckStatus` when '2' then '已认证' when '3' then '已驳回' else '待认证' end) as `unionCheckStatus`,
  -- Users without trips have no matching row in the derived tables, so map NULL to 0.
  coalesce(ptrip.ptrip_num, 0) as ptrip_num, coalesce(dtrip.dtrip_num, 0) as dtrip_num
from users
left join
  (select count(passenger_trip.id) as ptrip_num, passenger_trip.`userId` from passenger_trip where passenger_trip.status != 'cancel' group by passenger_trip.`userId`) as ptrip
  on ptrip.userId = users.id
left join
  (select count(driver_trip.id) as dtrip_num, driver_trip.`userId` from driver_trip where driver_trip.status != 'cancel' group by driver_trip.`userId`) as dtrip
  on dtrip.userId = users.id
where company != '本公司名' and company != '本公司昵称'

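This one returned within 5 seconds, which is what one would normally expect: filtering data on the order of 100,000 rows should come back within a few seconds!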

The reason for the difference is actually simple: the clauses of a SQL statement are logically processed in a definite order.

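from: the source tables, including any joins, are combined into an initial result set.
where: the result set is filtered, and the rows that are needed form a new result set.
group by: the new result set is grouped.
having: the wanted groups are kept.
select: the output columns are chosen.
order by: once all other conditions have been applied, the result is sorted last.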
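The order above can be annotated on a concrete query. The following is a minimal sketch that reuses the article's `passenger_trip` table; the threshold `5` is arbitrary, and the numbers mark the logical processing order only, since the optimizer may choose a different physical plan.

```sql
select   userId, count(id) as ptrip_num   -- 5. select: choose the output columns
from     passenger_trip                   -- 1. from: build the initial result set
where    status != 'cancel'               -- 2. where: filter the rows
group by userId                           -- 3. group by: group the remaining rows
having   count(id) > 5                    -- 4. having: keep only the wanted groups
order by ptrip_num desc;                  -- 6. order by: sort last
```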

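With the first form, the direct join, the filtering and counting run on the joined result: each user's passenger trips are paired with each of that user's driver trips, and wherever the join cannot narrow the candidates the work grows toward the 100,000 × 100,000 = 10 billion combinations estimated above.
The latter two forms run the subqueries first, each a query on the order of 100,000 rows, and then relate the results to the 100,000-row main table once, so the amount of data handled is clearly far smaller than with the first form.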
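The row multiplication behind the first form can be reproduced on toy data. This is a minimal sketch meant for a scratch database, not the production schema: the tables are cut down to the columns used in the joins, and the sample rows and the 'done' status value are made up for illustration.

```sql
-- Toy data: one user with 2 passenger trips and 3 driver trips.
create table users (id int primary key);
create table passenger_trip (id int primary key, userId int, status varchar(16));
create table driver_trip (id int primary key, userId int, status varchar(16));

insert into users values (1);
insert into passenger_trip values (1, 1, 'done'), (2, 1, 'done');
insert into driver_trip values (1, 1, 'done'), (2, 1, 'done'), (3, 1, 'done');

-- Direct join: every passenger trip pairs with every driver trip of the same user,
-- so the intermediate result has 2 * 3 = 6 rows and the count is inflated.
select users.id, count(passenger_trip.id) as ptrip_num
from users
left join passenger_trip on passenger_trip.userId = users.id
left join driver_trip on driver_trip.userId = users.id
group by users.id;
-- ptrip_num = 6, not 2

-- Pre-aggregated derived tables: each trip table is reduced to one row per user
-- before it is joined, so nothing is multiplied.
select users.id, ptrip.ptrip_num, dtrip.dtrip_num
from users
left join (select userId, count(id) as ptrip_num from passenger_trip group by userId) as ptrip
  on ptrip.userId = users.id
left join (select userId, count(id) as dtrip_num from driver_trip group by userId) as dtrip
  on dtrip.userId = users.id;
-- ptrip_num = 2, dtrip_num = 3
```

Besides the extra work, the direct join also returns a wrong count here, which is a second reason to aggregate each trip table separately.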
