A Performance Analysis of Optimizing a group by + order by Query

Date: 2019-11-06

Original article: the author's blog, https://mengkang.net/1302.html

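Recently, a ranking query on a log table turned out to be extremely slow. The problem was eventually solved, and this post collects what I learned about indexes and about how MySQL executes the query. A few puzzles remain unsolved at the end, and I hope readers can help answer them.

The main topics covered are:

Using data to show how the scanned-row count in the slow log is actually computed
Deriving optimization options from how group by is executed
Implementation details of sorting
Debugging the MySQL source with gdb

Background

I need to compute the TOP 10 most-visited articles for the current month and for the current week. The log table is defined as follows: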

CREATE TABLE `article_rank` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `aid` int(11) unsigned NOT NULL,
  `pv` int(11) unsigned NOT NULL DEFAULT '1',
  `day` int(11) NOT NULL COMMENT 'date, e.g. 20171016',
  PRIMARY KEY (`id`),
  KEY `idx_day_aid_pv` (`day`,`aid`,`pv`),
  KEY `idx_aid_day_pv` (`aid`,`day`,`pv`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

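Preparation

To verify my guesses clearly, I installed a debug build of MySQL in a virtual machine and turned on slow query logging to collect scanned-row counts.

Installation

Download the source
Compile and install
Create the mysql user
Initialize the database
Initialize the mysql configuration file
Change the password

If you are interested, my blog has a step-by-step installation guide: https://mengkang.net/1335.html

Enabling the slow log

Edit the configuration file and add the following under the [mysqld] section: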

slow_query_log=1
slow_query_log_file=xxx
long_query_time=0
log_queries_not_using_indexes=1

Performance analysis

Finding the problem

Suppose I need the 10 most-viewed articles over the 5 days from 2018-12-20 to 2018-12-24. The SQL is shown below; first, look at what explain reports:

mysql> explain select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows   | filtered | Extra                                                     |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | article_rank | NULL       | range | idx_day_aid_pv,idx_aid_day_pv | idx_day_aid_pv | 4       | NULL | 404607 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+

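By default the optimizer picks the idx_day_aid_pv index. The Extra column shows that with idx_day_aid_pv the query uses a covering index (Using index), but it also needs a temporary table (Using temporary) and a sort (Using filesort).

Now look at the corresponding entry in the slow log: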

# Time: 2019-03-17T03:02:27.984091Z
# User@Host: root[root] @ localhost []  Id:     6
# Query_time: 56.959484  Lock_time: 0.000195 Rows_sent: 10  Rows_examined: 1337315
SET timestamp=1552791747;
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
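When comparing many slow-log entries it can help to pull the numbers out programmatically. The sketch below is only an illustration: parse_slow_log_header is a hypothetical helper, not part of MySQL, and it assumes the header line has the same shape as the entry above.

```python
import re

# Hypothetical helper (not part of MySQL): extract the numeric fields
# from a slow-log "# Query_time: ..." header line.
HEADER_FIELD = re.compile(r"(\w+):\s*([\d.]+)")

def parse_slow_log_header(line: str) -> dict:
    """Map each 'Name: value' pair to an int or float."""
    fields = {}
    for name, value in HEADER_FIELD.findall(line):
        fields[name] = float(value) if "." in value else int(value)
    return fields

header = ("# Query_time: 56.959484  Lock_time: 0.000195 "
          "Rows_sent: 10  Rows_examined: 1337315")
stats = parse_slow_log_header(header)
print(stats["Rows_examined"], stats["Query_time"])
```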

Why is the number of rows examined 1337315?

Let's query two numbers: the number of rows matching the condition, and the number of rows after group by.

mysql> select count(*) from article_rank where day>=20181220 and day<=20181224;
+----------+
| count(*) |
+----------+
|   785102 |
+----------+

mysql> select count(distinct aid) from article_rank where day>=20181220 and day<=20181224;
+---------------------+
| count(distinct aid) |
+---------------------+
|              552203 |
+---------------------+

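It turns out that the number of rows matching the condition (785102) + the number of rows after group by (552203) + the limit value (10) = the Rows_examined value in the slow log (1337315).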
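The sum can be checked directly. This is just the arithmetic from the three measurements above, with variable names chosen for illustration:

```python
matching_rows = 785102   # count(*) for the day range
grouped_rows = 552203    # count(distinct aid) for the day range
limit_rows = 10          # limit 10

rows_examined = matching_rows + grouped_rows + limit_rows
print(rows_examined)  # 1337315
```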

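To explain this, we have to understand exactly how the SQL above is executed.

Execution flow analysis

Index example

To make this easier to follow, here is a small mock-up of the data in the idx_day_aid_pv index, laid out according to the index ordering: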

day	aid	pv	id
20181220	1	23	1234
20181220	3	2	1231
20181220	4	1	1212
20181220	7	2	1221
20181221	1	5	1257
20181221	10	1	1251
20181221	11	8	1258

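Because the leftmost column of idx_day_aid_pv is day, summing pv per article between 20181220 and 20181224 requires scanning the entire index range for 20181220~20181224. Within that range, aid is ordered only inside each day, not across the whole range.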
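The effect of the index ordering can be sketched with a sorted list of tuples. This is only a model of the idea, not how InnoDB stores the index; range_scan is a hypothetical helper and the rows are the mock data from the table above.

```python
from bisect import bisect_left, bisect_right

# Mock index entries (day, aid, pv, id), sorted like idx_day_aid_pv.
index = [
    (20181220, 1, 23, 1234),
    (20181220, 3, 2, 1231),
    (20181220, 4, 1, 1212),
    (20181220, 7, 2, 1221),
    (20181221, 1, 5, 1257),
    (20181221, 10, 1, 1251),
    (20181221, 11, 8, 1258),
]

def range_scan(entries, day_from, day_to):
    """Return the contiguous slice with day_from <= day <= day_to."""
    lo = bisect_left(entries, (day_from,))
    hi = bisect_right(entries, (day_to, float("inf")))
    return entries[lo:hi]

rows = range_scan(index, 20181220, 20181224)
aids = [aid for _day, aid, _pv, _id in rows]
print(aids)                  # aid 1 shows up again after aid 7
print(aids == sorted(aids))  # False: aid is not globally ordered
```

Since the same aid can reappear on a later day, the grouping step cannot simply accumulate consecutive rows; it needs somewhere to keep every aid seen so far.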

Viewing the optimizer trace

# enable optimizer_trace
set optimizer_trace='enabled=on';
# run the sql
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
# view the trace
select trace from `information_schema`.`optimizer_trace`\G

The final execution part of the trace is excerpted below:

{
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "creating_tmp_table": {
          "tmp_table_info": {
            "table": "intermediate_tmp_table",
            "row_length": 20,
            "key_length": 4,
            "unique_constraint": false,
            "location": "memory (heap)",
            "row_limit_estimate": 838860
          }
        }
      },
      {
        "converting_tmp_table_to_ondisk": {
          "cause": "memory_table_size_exceeded",
          "tmp_table_info": {
            "table": "intermediate_tmp_table",
            "row_length": 20,
            "key_length": 4,
            "unique_constraint": false,
            "location": "disk (InnoDB)",
            "record_format": "fixed"
          }
        }
      },
      {
        "filesort_information": [
          {
            "direction": "desc",
            "table": "intermediate_tmp_table",
            "field": "num"
          }
        ],
        "filesort_priority_queue_optimization": {
          "limit": 10,
          "rows_estimate": 1057,
          "row_size": 36,
          "memory_available": 262144,
          "chosen": true
        },
        "filesort_execution": [
        ],
        "filesort_summary": {
          "rows": 11,
          "examined_rows": 552203,
          "number_of_tmp_files": 0,
          "sort_buffer_size": 488,
          "sort_mode": "<sort_key, additional_fields>"
        }
      }
    ]
  }
}
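For longer traces it can be convenient to reduce the JSON to the few facts of interest. The sketch below works on an abbreviated copy of the trace above (most keys dropped for brevity); summarize is a hypothetical helper written for this illustration.

```python
import json

# Abbreviated copy of the trace above; only a few keys are kept.
trace_text = """
{"join_execution": {"select#": 1, "steps": [
  {"creating_tmp_table": {"tmp_table_info":
     {"location": "memory (heap)", "row_length": 20}}},
  {"converting_tmp_table_to_ondisk": {
     "cause": "memory_table_size_exceeded",
     "tmp_table_info": {"location": "disk (InnoDB)", "row_length": 20}}},
  {"filesort_priority_queue_optimization": {"limit": 10, "chosen": true},
   "filesort_summary": {"rows": 11, "examined_rows": 552203}}
]}}
"""

def summarize(trace: dict) -> list:
    """Collect (step name, key detail) pairs in execution order."""
    summary = []
    for step in trace["join_execution"]["steps"]:
        for name, body in step.items():
            if "tmp_table_info" in body:
                summary.append((name, body["tmp_table_info"]["location"]))
            elif name == "filesort_summary":
                summary.append((name, body["examined_rows"]))
    return summary

summary = summarize(json.loads(trace_text))
for item in summary:
    print(item)
```

Read this way, the trace says: a temporary table was created in memory, it was converted to an on-disk table because the memory limit was exceeded, and the sort then examined 552203 rows.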

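Analyzing the temporary table's fields

More details on debugging mysql with gdb: https://mengkang.net/1336.html

Debugging with gdb confirms that the fields of the temporary table are aid and num: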

Breakpoint 1, trace_tmp_table (trace=0x7eff94003088, table=0x7eff94937200) at /root/newdb/mysql-server/sql/sql_tmp_table.cc:2306
warning: Source file is more recent than executable.
2306      trace_tmp.add("row_length",table->s->reclength).
(gdb) p table->s->reclength
$1 = 20
(gdb) p table->s->fields
$2 = 2
(gdb) p (*(table->field+0))->field_name
$3 = 0x7eff94010b0c "aid"
(gdb) p (*(table->field+1))->field_name
$4 = 0x7eff94007518 "num"
(gdb) p (*(table->field+0))->row_pack_length()
$5 = 4
(gdb) p (*(table->field+1))->row_pack_length()
$6 = 15
(gdb) p (*(table->field+0))->type()
$7 = MYSQL_TYPE_LONG
(gdb) p (*(table->field+1))->type()
$8 = MYSQL_TYPE_NEWDECIMAL
(gdb)

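The output above confirms the field types: aid is MYSQL_TYPE_LONG and takes 4 bytes; num is MYSQL_TYPE_NEWDECIMAL and takes 15 bytes. The DECIMAL type of num matches the MySQL documentation:

The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL), and a DOUBLE value for approximate-value arguments (FLOAT or DOUBLE). (Before MySQL 5.0.3, SUM() and AVG() return DOUBLE for all numeric arguments.)

However, the printed lengths of the two fields add up to 19, while tmp_table_info.row_length in the optimizer trace (table->s->reclength in gdb) is 20. Other experiments also show that table->s->reclength equals the sum of the lengths of all fields in the table->field array plus 1.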
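Put as arithmetic, the observed row length and the row_limit_estimate value in the trace fit together as follows. This sketch only restates the numbers from gdb, the trace and the 16MB default limit; the "+ 1" is the empirical observation above, not something taken from the MySQL source.

```python
field_lengths = {"aid": 4, "num": 15}  # row_pack_length() values from gdb
row_length = sum(field_lengths.values()) + 1  # extra byte seen in experiments
memory_limit = 16777216                # 16MB in-memory temporary table limit

row_limit_estimate = memory_limit // row_length
print(row_length, row_limit_estimate)  # 20 838860
```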

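Summary of the execution flow

1. Try to hold the group by data in an in-memory (MEMORY engine, heap) temporary table, and find that memory is insufficient;
2. Create a temporary table with two fields, aid and num (sum(pv) as num);
3. Read one row from the idx_day_aid_pv index and insert it into the temporary table. The rule is: if the aid is not present yet, insert the row; if it is present, add the row's pv to num;
4. Loop over all rows of idx_day_aid_pv between 20181220 and 20181224, performing step 3 for each;
5. Sort the temporary table by num using a priority queue;
6. Take the 10 rows left in the heap (the priority queue's heap) and return them directly as the result set; no lookup back into the table is needed.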
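Steps 3 and 4 above amount to an insert-or-accumulate loop. A minimal sketch, assuming a plain dict as a stand-in for the temporary table (the real table is a MEMORY or InnoDB table keyed on aid) and using the mock index rows from earlier:

```python
def group_by_with_tmp_table(index_rows):
    """index_rows: iterable of (day, aid, pv) in idx_day_aid_pv order."""
    tmp_table = {}  # stand-in for the temporary table: aid -> num
    for _day, aid, pv in index_rows:
        if aid in tmp_table:
            tmp_table[aid] += pv   # aid already present: accumulate
        else:
            tmp_table[aid] = pv    # first time this aid is seen: insert
    return tmp_table

rows = [
    (20181220, 1, 23), (20181220, 3, 2), (20181220, 4, 1),
    (20181220, 7, 2), (20181221, 1, 5), (20181221, 10, 1),
    (20181221, 11, 8),
]
tmp = group_by_with_tmp_table(rows)
print(tmp)  # {1: 28, 3: 2, 4: 1, 7: 2, 10: 1, 11: 8}
```

Every index row costs one lookup in the temporary table, which is cheap in memory but much more expensive once the table has been converted to disk.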

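A closer look at the priority-queue sort:

1. Take the first 10 rows from the (unsorted) temporary table and build a min-heap of 10 elements from their num and aid, so that the smallest num sits at the top of the heap.
2. Take the next row and compare its num with the value at the top of the heap. If it is greater than the top, replace the top with it and re-heapify.
3. Repeat step 2 until row 552203 has been compared.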
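The same three steps can be sketched with Python's heapq module. This illustrates the technique only, not MySQL's filesort code; top_n is a hypothetical helper and n is 3 instead of 10 to keep the example small.

```python
import heapq

def top_n(rows, n):
    """rows: iterable of (aid, num). Return the n largest by num, descending."""
    heap = []  # min-heap of (num, aid); heap[0] is the smallest num kept
    for aid, num in rows:
        if len(heap) < n:
            heapq.heappush(heap, (num, aid))     # step 1: fill the heap
        elif num > heap[0][0]:
            heapq.heapreplace(heap, (num, aid))  # step 2: replace the top
    return [(aid, num) for num, aid in sorted(heap, reverse=True)]

tmp_rows = [(1, 28), (3, 2), (4, 1), (7, 2), (10, 1), (11, 8)]
print(top_n(tmp_rows, 3))  # [(1, 28), (11, 8), (7, 2)]
```

Because the heap entry carries both num and aid, the result can be produced from the heap alone, which corresponds to the sort mode <sort_key, additional_fields> in the trace.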

Optimization

Option 1: use the idx_aid_day_pv index

# Query_time: 4.406927  Lock_time: 0.000200 Rows_sent: 10  Rows_examined: 1337315
SET timestamp=1552791804;
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

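The rows examined are 1337315 in both cases, so why is execution more than 12 times faster (4.4 s versus 57.0 s)?

Index example

Again, to make this easier to follow, here is a small mock-up of the data in the idx_aid_day_pv index, laid out according to the index ordering: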

aid	day	pv	id
1	20181220	23	1234
1	20181221	5	1257
3	20181220	2	1231
3	20181222	22	1331
3	20181224	13	1431
4	20181220	1	1212
7	20181220	2	1221
10	20181221	1	1251
11	20181221	8	1258

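When group by needs no temporary table

Why is this so much faster than the original query (SQL1)? One reason is that aid is guaranteed to be ordered in the idx_aid_day_pv index, so group by itself does not create a temporary table; a temporary table is only needed for the sort. How can we verify this? The execution plans below show it.

With the idx_day_aid_pv index: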

mysql> explain select aid,sum(pv) as num from article_rank force index(idx_day_aid_pv) where day>=20181220 and day<=20181224 group by aid order by null limit 10;
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows   | filtered | Extra                                     |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+
|  1 | SIMPLE      | article_rank | NULL       | range | idx_day_aid_pv,idx_aid_day_pv | idx_day_aid_pv | 4       | NULL | 404607 |   100.00 | Using where; Using index; Using temporary |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+

Note that I used order by null above to force the group by result not to be sorted. Without order by null, the SQL above would show Using filesort.

With the idx_aid_day_pv index:

mysql> explain select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by null limit 10;
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | article_rank | NULL       | index | idx_day_aid_pv,idx_aid_day_pv | idx_aid_day_pv | 12      | NULL |   10 |    11.11 | Using where; Using index |
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+

Viewing the optimizer trace

# enable optimizer_trace
set optimizer_trace='enabled=on';
# run the sql
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
# view the trace
select trace from `information_schema`.`optimizer_trace`\G

The final execution part of the trace is excerpted below:

{
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "creating_tmp_table": {
          "tmp_table_info": {
            "table": "intermediate_tmp_table",
            "row_length": 20,
            "key_length": 0,
            "unique_constraint": false,
            "location": "memory (heap)",
            "row_limit_estimate": 838860
          }
        }
      },
      {
        "filesort_information": [
          {
            "direction": "desc",
            "table": "intermediate_tmp_table",
            "field": "num"
          }
        ],
        "filesort_priority_queue_optimization": {
          "limit": 10,
          "rows_estimate": 552213,
          "row_size": 24,
          "memory_available": 262144,
          "chosen": true
        },
        "filesort_execution": [
        ],
        "filesort_summary": {
          "rows": 11,
          "examined_rows": 552203,
          "number_of_tmp_files": 0,
          "sort_buffer_size": 352,
          "sort_mode": "<sort_key, rowid>"
        }
      }
    ]
  }
}

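The execution flow is as follows:

1. Create a temporary table with two fields, aid and num (sum(pv) as num);
2. Read one row from the idx_aid_day_pv index and check it against the condition. If day is outside the range (20181220~20181224), read the next row; if day is inside the range, accumulate pv (this does not happen in the temporary table);
3. Read the next row from idx_aid_day_pv. If its aid is the same as that of the row from step 2 and it matches the condition, accumulate pv (again not in the temporary table). If its aid is different, write the accumulated result into the temporary table;
4. Repeat steps 2 and 3 until the whole idx_aid_day_pv index has been scanned;
5. Sort the temporary table by num using a priority queue;
6. Use the rowids of the top 10 entries to look the rows up in the temporary table and return the result set.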
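Steps 2 to 4 can be sketched as a single pass that keeps one running sum. This is a simplified model under the assumption that rows arrive in idx_aid_day_pv order; stream_group_by is a hypothetical helper and the rows are the mock index data from above.

```python
def stream_group_by(index_rows, day_from, day_to):
    """index_rows: (aid, day, pv) in idx_aid_day_pv order -> list of (aid, num)."""
    tmp_table = []                     # filled once per finished group
    current_aid, running, matched = None, 0, False
    for aid, day, pv in index_rows:
        if aid != current_aid:         # aid changed: flush the previous group
            if matched:
                tmp_table.append((current_aid, running))
            current_aid, running, matched = aid, 0, False
        if day_from <= day <= day_to:  # rows outside the range are read but skipped
            running += pv
            matched = True
    if matched:                        # flush the last group
        tmp_table.append((current_aid, running))
    return tmp_table

index_rows = [
    (1, 20181220, 23), (1, 20181221, 5), (3, 20181220, 2),
    (3, 20181222, 22), (3, 20181224, 13), (4, 20181220, 1),
    (7, 20181220, 2), (10, 20181221, 1), (11, 20181221, 8),
]
print(stream_group_by(index_rows, 20181220, 20181224))
```

Compared with the insert-or-accumulate approach, the temporary table is written once per aid and never searched during grouping. The price is that every row of the index is read, including rows whose day is out of range.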

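A closer look at the priority-queue sort:

1. Take the first 10 rows from the (unsorted) temporary table and build a min-heap of 10 elements from their num and rowid, so that the smallest num sits at the top of the heap.
2. Take the next row and compare its num with the value at the top of the heap. If it is greater than the top, replace the top with it and re-heapify.
3. Repeat step 2 until row 552203 has been compared.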
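The difference from the earlier sort is that the heap holds only num and a rowid, so the winners have to be fetched from the temporary table afterwards. A minimal sketch, assuming the list position stands in for the rowid (top_n_by_rowid is a hypothetical helper, and n is 3 rather than 10):

```python
import heapq

# (aid, num) rows of the temporary table; the list position plays the rowid.
tmp_table = [(1, 28), (3, 37), (4, 1), (7, 2), (10, 1), (11, 8)]

def top_n_by_rowid(table, n):
    heap = []  # min-heap of (num, rowid)
    for rowid, (_aid, num) in enumerate(table):
        if len(heap) < n:
            heapq.heappush(heap, (num, rowid))
        elif num > heap[0][0]:
            heapq.heapreplace(heap, (num, rowid))
    # Final step: look each winner up in the temporary table by rowid.
    return [table[rowid] for _num, rowid in sorted(heap, reverse=True)]

print(top_n_by_rowid(tmp_table, 3))  # [(3, 37), (1, 28), (11, 8)]
```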

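Feasibility of this option

Experiments show that when I add one row dated 20181219, the index scan still reads that row even though it does not match the condition. Because I only loaded 5 days of data (20181220~20181224) for this experiment, the number of rows to scan happens to equal the number of rows in the whole table.

So what if the table holds 10 days of data instead of 5, or even 365 days? Is this option still viable? First simulate 10 days of data: append 5 more days after the existing dates, with the same number of rows as now, 785102.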

drop procedure if exists idata;
delimiter ;;
create procedure idata()
begin
  declare i int;
  declare aid int;
  declare pv int;
  declare post_day int;
  set i=1;
  while(i<=785102)do
    set aid = round(rand()*500000);
    set pv = round(rand()*100);
    set post_day = 20181225 + i%5;
    insert into article_rank (`aid`,`pv`,`day`) values(aid, pv, post_day);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

# Query_time: 9.151270  Lock_time: 0.000508 Rows_sent: 10  Rows_examined: 2122417
SET timestamp=1552889936;
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

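The rows examined here are 2122417 because the index scan has to traverse the entire index, and the index has as many rows as the table, which now also contains the 785102 rows I just inserted (2 × 785102 + 552203 + 10 = 2122417).

After doubling the data volume, the query time has clearly doubled as well (9.15 s versus 4.41 s). So this optimization is not stable.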

Option 2: raise the temporary table size limit

The default limit on the temporary table size is 16MB:

mysql> show global variables like '%table_size';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| max_heap_table_size | 16777216 |
| tmp_table_size      | 16777216 |
+---------------------+----------+

https://dev.mysql.com/doc/ref...
https://dev.mysql.com/doc/ref...
max_heap_table_size
This variable sets the maximum size to which user-created MEMORY tables are permitted to grow. The value of the variable is used to calculate MEMORY table MAX_ROWS values. Setting this variable has no effect on any existing MEMORY table, unless the table is re-created with a statement such as CREATE TABLE or altered with ALTER TABLE or TRUNCATE TABLE. A server restart also sets the maximum size of existing MEMORY tables to the global max_heap_table_size value.

tmp_table_size
The maximum size of internal in-memory temporary tables. This variable does not apply to user-created MEMORY tables.
The actual limit is determined from whichever of the values of tmp_table_size and max_heap_table_size is smaller. If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk temporary table. The internal_tmp_disk_storage_engine option defines the storage engine used for on-disk temporary tables.

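In other words, the temporary table limit here is 16MB: the effective limit for an internal in-memory temporary table is the smaller of tmp_table_size and max_heap_table_size, so both have to be raised.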
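The rule quoted from the documentation can be written as a one-line function. in_memory_limit is a hypothetical name used only for this illustration:

```python
def in_memory_limit(tmp_table_size: int, max_heap_table_size: int) -> int:
    """Effective size limit for an internal in-memory temporary table."""
    return min(tmp_table_size, max_heap_table_size)

MB = 1024 * 1024
print(in_memory_limit(16 * MB, 16 * MB))  # 16777216 (the defaults)
print(in_memory_limit(32 * MB, 16 * MB))  # 16777216 (raising only one has no effect)
print(in_memory_limit(32 * MB, 32 * MB))  # 33554432
```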

So we raise both to 32MB and then run the original SQL:

set tmp_table_size=33554432;
set max_heap_table_size=33554432;

# Query_time: 5.910553  Lock_time: 0.000210 Rows_sent: 10  Rows_examined: 1337315
SET timestamp=1552803869;
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

Option 3: use the SQL_BIG_RESULT hint

This tells the optimizer that the result set is large, so the temporary table goes straight to disk.

# Query_time: 6.144315  Lock_time: 0.000183 Rows_sent: 10  Rows_examined: 2122417
SET timestamp=1552802804;
select SQL_BIG_RESULT aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

The rows examined are 2 × the number of rows matching the condition (785102) + the number of rows after group by (552203) + the limit value (10) = 2122417.
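Side by side, the two Rows_examined values for the original query differ only in how many times the matching rows are read. The breakdown below restates the observed numbers; rows_examined is a hypothetical helper, and the decomposition is the author's empirical reading of the slow log, not something confirmed from the MySQL source.

```python
MATCHING, GROUPS, LIMIT = 785102, 552203, 10

def rows_examined(base_rows_read: int) -> int:
    """Base-table rows read + grouped rows sorted + rows returned."""
    return base_rows_read + GROUPS + LIMIT

print(rows_examined(MATCHING))      # 1337315 without the hint
print(rows_examined(2 * MATCHING))  # 2122417 with SQL_BIG_RESULT
```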

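It is also worth mentioning that when I doubled the data volume, the query time with this approach barely changed, because the number of rows scanned stays the same. The measured time was 6.197484 seconds.

Summary

Option 1: the optimization is unstable. It performs best when the table holds exactly the rows in the queried range and the in-memory temporary table limit is not exceeded. The smaller the share of the table that the queried range covers, the less the optimization helps;
Option 2: requires raising the in-memory temporary table limit, which is feasible; but once the temporary table grows beyond 32MB, the limit has to be raised again;
Option 3: explicitly puts the temporary table on disk. It scans the matching rows one extra time, yet the overall response time is only about 0.2 seconds slower than Option 2 (6.14 s versus 5.91 s). Given the data volume here, I consider this difference acceptable.

So on balance, Option 3 is the most suitable choice.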

Questions and puzzles

# SQL1
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;
# SQL2
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

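1. While SQL1 runs, the sort carries all fields (<sort_key, additional_fields>) and needs no lookup back into the table at the end, so why does the total number of rows examined only add up when the extra 10 is included?
2. SQL1 and SQL2 both produce 552203 rows after group by, so why does only SQL1 run out of memory? What other details are involved?
3. In both traces, creating_tmp_table.tmp_table_info.row_limit_estimate is 838860. It is derived from the 16MB memory limit of the temporary table and the 20 bytes needed per row, so the table can hold at most floor(16777216/20) = 838860 rows. Yet only 785102 rows are fed into the temporary table (552203 after grouping), both below that limit. Why does memory still run out?
4. After adding SQL_BIG_RESULT to SQL1, the number of rows scanned in the base table doubles. What is the logic behind this? And why does merely skipping the attempt to write to the in-memory temporary table make a roughly tenfold difference in performance?
5. The source code shows that many of the row counts in the trace are not actual counts. Since the trace records a real execution, why does it not output the real rows scanned, capacities and so on? For example, for filesort_priority_queue_optimization.rows_estimate in SQL1, the calculation rule I saw through gdb is shown in Figure 1 of the appendix.
6. Is there a tool that can count the I/O operations performed while a SQL statement executes?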

Appendix