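How to Design More Efficient Indexes in MySQL

Passion and substance in one place: search WeChat for 【三太子敖丙】 to follow a different kind of programmer.

This article is collected in the GitHub repository github.com/JavaFamily, along with a complete list of interview topics from top-tier companies, reference materials, and my article series.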

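Preface

Now that the database series has come this far, I assume you all have a rough understanding of the basic concepts. While reading the comments this week I found a reader's question that I thought was interesting: how do you design an index? How does everyone design their indexes, and how do you make them more efficient?

My first thought was that I have written plenty about indexes, so there is no reason readers would not know how. But when I looked back, it was true: I had only covered the concepts, advantages, and disadvantages of indexes, never how to design them. So that is how this article came about.

This article still repeats a number of concepts I have written about before, but that is to help you better understand the principles behind several kinds of index design in MySQL.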

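Main text

As we know, an index is a tree structure (in InnoDB, a B+Tree whose leaf nodes are chained together as a linked list) that allows data to be retrieved quickly. Almost every database implements indexes today, for example MySQL's B+Tree indexes and MongoDB's B-Tree indexes.

In business development, how well the indexes are designed determines how efficiently the SQL behind each interface executes. Efficient indexes reduce the interface's response time and also reduce cost. The goal we want to achieve is: index design -> lower interface response time -> lower server specifications -> lower cost. In the end it has to come down to cost, because cost is what the boss cares about most.

Today let's talk about indexes in MySQL and how to design them, since only well-used indexes reduce the interface RT and improve the user experience.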

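Indexes in MySQL

The InnoDB engine in MySQL stores indexes in a B+Tree structure, which minimizes the number of disk IOs per lookup. The height of the tree directly affects query performance, and it is usually kept at 3 to 4 levels.

A B+Tree consists of three parts: the root, the branches, and the leaves. The root and branch nodes do not store row data, only keys and pointers to child pages; all data is stored in the leaf nodes, and the leaf nodes are linked to each other with a doubly linked list.

Each leaf node is made up of three parts: the predecessor pointer p_prev, the data, and the successor pointer p_next. The data is ordered (ascending, ASC, by default), so key values on the right side of the B+Tree are always greater than those on the left. The distance from the root to every leaf is the same, which means reaching any leaf node costs the same number of IOs, namely the index tree's Level + 1 IO operations.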
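To get a feel for why the tree stays so shallow, here is a rough back-of-the-envelope sketch in Python. The page size, row size, key size, and pointer size are all assumed values chosen for illustration, not numbers read from a real table, and real InnoDB pages carry extra overhead that is ignored here.

```python
def btree_height(rows, page_size=16384, row_bytes=200, key_bytes=8, pointer_bytes=6):
    """Estimate how many B+Tree levels are needed to index `rows` records.

    All sizes are illustrative assumptions, not measurements.
    """
    rows_per_leaf = page_size // row_bytes               # records that fit in one leaf page
    fanout = page_size // (key_bytes + pointer_bytes)    # child pointers per non-leaf page
    pages = -(-rows // rows_per_leaf)                    # leaf pages, rounded up
    height = 1
    while pages > 1:                                     # add levels until one root page remains
        pages = -(-pages // fanout)
        height += 1
    return height


for n in (1_000_000, 100_000_000, 1_000_000_000):
    print(n, btree_height(n))
```

Under these assumptions each non-leaf page points at more than a thousand children, which is why adding a level multiplies the capacity by roughly three orders of magnitude.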

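You can think of an index in MySQL as a small table that takes up disk space. Creating an index is essentially sorting by the index columns: the sort is first done in the sort buffer (sized by innodb_sort_buffer_size for InnoDB index builds), and if there is too much data to fit, temporary files are used. Most importantly, an index can avoid sort operations (distinct, group by, order by).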

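Clustered index

Tables in MySQL (InnoDB) are IOTs (Index-Organized Tables): rows are stored in primary key order (logically contiguous, physically not), and the primary key id is the clustered index, which stores the entire row. If no primary key is specified explicitly, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index; if there is none, it generates a hidden row_id and uses that as the clustered key. For example, in the table users(id, user_id, user_name, phone, primary key(id)), id is the clustered index and stores the whole row: id, user_id, user_name, phone.

Secondary index

A secondary index (also called an auxiliary index) stores the primary key id in addition to the indexed columns. The index idx_user_name(user_name) on user_name is therefore equivalent to idx_user_name(user_name, id): MySQL automatically appends the primary key id to the end of a secondary index. Anyone familiar with Oracle knows that its indexes store a row_id alongside the indexed columns (it represents the physical location of the row and has four parts: object number + data file number + block number + row number). We can also add the primary key id explicitly when creating a secondary index.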

-- Create an index on the user_name column
mysql> create index idx_user_name on users(user_name);
-- Create an index with the primary key id added explicitly
mysql> create index idx_user_name_id on users(user_name,id);
-- Compare the statistics of the two indexes
mysql> select a.space as tbl_spaceid, a.table_id, a.name as table_name, row_format, space_type,  b.index_id , b.name as index_name, n_fields, page_no, b.type as index_type  from information_schema.INNODB_TABLES a left join information_schema.INNODB_INDEXES b  on a.table_id =b.table_id where a.name = 'test/users';
+-------------+----------+------------+------------+------------+----------+------------------+----------+------
| tbl_spaceid | table_id | table_name | row_format | space_type | index_id | index_name       | n_fields | page_no | index_type |
+-------------+----------+------------+------------+------------+----------+------------------+----------+------
|         518 |     1586 | test/users | Dynamic    | Single     |     1254 | PRIMARY          |        9 |       4 |          3 |
|         518 |     1586 | test/users | Dynamic    | Single     |     4003 | idx_user_name    |        2 |       5 |          0 |
|         518 |     1586 | test/users | Dynamic    | Single     |     4004 | idx_user_name_id |        2 |      45 |          0 |
mysql> select index_name, last_update, stat_name, stat_value, stat_description from mysql.innodb_index_stats where index_name in ('idx_user_name','idx_user_name_id');
+------------------+---------------------+--------------+------------+-----------------------------------+
| index_name       | last_update         | stat_name    | stat_value | stat_description                  |
+------------------+---------------------+--------------+------------+-----------------------------------+ 
| idx_user_name    | 2021-01-02 17:14:48 | n_leaf_pages |       1358 | Number of leaf pages in the index |
| idx_user_name    | 2021-01-02 17:14:48 | size         |       1572 | Number of pages in the index      |
| idx_user_name_id | 2021-01-02 17:14:48 | n_leaf_pages |       1358 | Number of leaf pages in the index |
| idx_user_name_id | 2021-01-02 17:14:48 | size         |       1572 | Number of pages in the index      |

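Compare the results for the two indexes: n_fields is the number of columns in the index, n_leaf_pages is the number of leaf pages, and size is the total number of pages. The numbers show that the secondary index really does contain the primary key id, and that the two indexes are structurally identical.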

Index_name	n_fields	n_leaf_pages	size
idx_user_name	2	1358	1572
idx_user_name_id	2	1358	1572

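Going back to the table

The above shows that a secondary index contains the primary key id. Filtering through a secondary index column may still require going back to the table, that is, an extra lookup in the clustered index. For example, the business needs to query the users table by user_name, and the interface's SQL is: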

select  user_id, user_name, phone from users where user_name = 'Laaa';

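As we know, the index idx_user_name is really a small table idx_user_name(user_name, id). If a query only reads columns that are in the index, scanning the index is enough to get the data and there is no need to go back to the table, as in the following SQL statements: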

SQL 1: select id, user_name from users where user_name = 'Laaa';

SQL 2: select id from users where user_name = 'Laaa';

mysql> explain select id, user_name from users where user_name = 'Laaa';
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+-------
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+-------
|  1 | SIMPLE      | users | NULL       | ref  | idx_user_name | idx_user_name | 82      | const |    1 |   100.00 | Using index |
mysql> explain select id from users where user_name = 'Laaa';
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+-------
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+-------
|  1 | SIMPLE      | users | NULL       | ref  | idx_user_name | idx_user_name | 82      | const |    1 |   100.00 | Using index |

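In the execution plans of SQL 1 and SQL 2, Extra=Using index means a covering index scan is used and there is no need to go back to the table. Now look at the business SQL from above again: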

select user_id, user_name, phone from users where user_name = 'Laaa';

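As you can see, the user_id and phone columns after select are not in the index idx_user_name, so the rows have to be looked up in the table through the primary key id. Internally MySQL processes this in two stages:

Section 1: select id from users where user_name = 'Laaa';  -- returns id = 100101

Section 2: select user_id, user_name, phone from users where id = 100101;

The operation in Section 2 is what we call going back to the table: using the primary key id from the secondary index to look up the row in the original table.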
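The two stages can be mimicked with a toy model in Python. This is only a sketch of the idea, not how InnoDB is implemented: the clustered index is a dict keyed by primary key, the secondary index is a sorted list of (user_name, id) pairs, and the sample rows and the helper name select_by_name are made up for illustration.

```python
from bisect import bisect_left

# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    100101: {"user_id": 7, "user_name": "Laaa", "phone": "555-0101"},
    100102: {"user_id": 8, "user_name": "Mbbb", "phone": "555-0102"},
    100103: {"user_id": 9, "user_name": "Laaa", "phone": "555-0103"},
}
# Secondary index idx_user_name: sorted (user_name, id) entries.
idx_user_name = sorted((row["user_name"], pk) for pk, row in clustered.items())


def select_by_name(name, columns):
    """Return (rows, table_lookups) for: select <columns> from users where user_name = name."""
    covered = set(columns) <= {"id", "user_name"}    # can the index alone answer the query?
    rows, lookups = [], 0
    pos = bisect_left(idx_user_name, (name,))        # first index entry with this user_name
    while pos < len(idx_user_name) and idx_user_name[pos][0] == name:
        _, pk = idx_user_name[pos]
        if covered:
            source = {"id": pk, "user_name": name}   # Section 1 only: covering index scan
        else:
            source = dict(clustered[pk], id=pk)      # Section 2: go back to the table by id
            lookups += 1
        rows.append(tuple(source[c] for c in columns))
        pos += 1
    return rows, lookups


print(select_by_name("Laaa", ["id", "user_name"]))
print(select_by_name("Laaa", ["user_id", "user_name", "phone"]))
```

The first call never touches the clustered dict, while the second pays one extra lookup per matching row, which is exactly the cost that a covering index avoids.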

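Index height

MySQL indexes are B+Trees, so even with hundreds of millions of rows the index does not get very tall; it usually stays at around 3 to 4 levels. Let's work out the height of the index idx_user_name. From the output above we know index_id = 4003 and page_no = 5 (the root page), so the offset of its PAGE_LEVEL field is page_no x innodb_page_size + 64 = 5 x 16384 + 64 = 81984. We can inspect it with hexdump: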

$ hexdump -s 81984 -n 10 /usr/local/var/mysql/test/users.ibd
0014040 00 02 00 00 00 00 00 00 0f a3                  
001404a

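The first two bytes, 00 02, are the PAGE_LEVEL of the root page, so the root sits at level 2 and the height of idx_user_name is 3 (PAGE_LEVEL + 1). The following eight bytes, ending in 0f a3, are the index id; 0x0fa3 is 4003 in decimal, which is exactly the index_id.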
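The arithmetic and the byte decoding can be double-checked with a few lines of Python. This sketch assumes the default 16 KB innodb_page_size and the standard InnoDB page layout, in which a 2-byte PAGE_LEVEL is followed by an 8-byte index id, both big-endian; the helper names are made up for illustration.

```python
import struct

INNODB_PAGE_SIZE = 16384   # assumed default innodb_page_size
PAGE_LEVEL_OFFSET = 64     # 38-byte file header + 26 bytes into the page header


def page_level_offset(page_no, page_size=INNODB_PAGE_SIZE):
    """Byte offset of the PAGE_LEVEL field of the given page inside the .ibd file."""
    return page_no * page_size + PAGE_LEVEL_OFFSET


def decode_level_and_index_id(raw):
    """Decode 10 bytes: a 2-byte big-endian PAGE_LEVEL followed by an 8-byte index id."""
    level, index_id = struct.unpack(">HQ", raw)
    return level, index_id


raw = bytes.fromhex("00 02 00 00 00 00 00 00 0f a3")   # the 10 bytes shown by hexdump
level, index_id = decode_level_and_index_id(raw)
print("offset:", page_level_offset(5))
print("PAGE_LEVEL:", level, "height:", level + 1, "index_id:", index_id)
```

Reading the two fields together is a handy sanity check: if the decoded index id does not match the index_id from INNODB_INDEXES, the offset points at the wrong page.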

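Data scan methods

Full table scan

A full table scan walks the leaves of the entire B+Tree from left to right to read all of the table's data. The IO cost is high, it is slow, lock waits are heavy, and it hurts MySQL's concurrency.

For OLAP workloads that need to scan and return large amounts of data, the sequential IO of a full table scan is actually more efficient.

Index scan

Generally an index is smaller than the table, so less data is scanned, less IO is consumed, execution is fast, there are almost no lock waits, and MySQL's concurrency improves.

For OLTP systems, the ideal is that every SQL statement hits a suitable index.

The main differences are the amount of data scanned and the IO pattern: a full table scan is sequential IO, while an index scan is random IO. MySQL optimizes for this; for example, the change buffer reduces the random IO caused by modifying secondary index pages.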

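Index optimization cases

Pagination query optimization

The business needs to query transaction records by time range. The interface's original SQL is: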

select  * from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20;

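The table trade_info has the index idx_status_create_time(status, create_time), which, as analyzed above, is equivalent to the index (status, create_time, id). With a typical limit m, n pagination, the further you page the slower it gets: the larger m is, the more rows have to be scanned to reach position m, and the IO cost grows. Here we can optimize with a covering scan of the secondary index: first fetch only the ids, which is a covering index scan that never goes back to the table, then join those ids against the original table trade_info. The rewritten SQL is: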

select a.* from trade_info a,
(select id from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20) as b  -- this step is a covering index scan, no table lookup needed
where a.id = b.id order by a.id desc;

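Many people only know that writing it this way is more efficient without knowing why the rewrite works. Understanding how indexes behave is especially important for writing high-quality SQL.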
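If you want to convince yourself that the rewrite returns the same page of rows, a small experiment is enough. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up sample data and a smaller offset; it only checks that the results match and says nothing about MySQL's execution plan or performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table trade_info (id integer primary key, status int, create_time text, memo text)")
conn.execute("create index idx_status_create_time on trade_info(status, create_time)")
# Hypothetical sample data: 500 rows spread over 7 days, status alternating 0/1.
sample = [(i, i % 2, "2020-10-%02d 12:00:00" % (1 + i % 7), "memo-%d" % i) for i in range(1, 501)]
conn.executemany("insert into trade_info values (?, ?, ?, ?)", sample)

cond = "status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59'"

# Original form: fetch full rows while paging.
original = conn.execute(
    "select * from trade_info where " + cond + " order by id desc limit 20 offset 100"
).fetchall()

# Rewritten form: page over ids only, then join back to fetch the full rows.
rewritten = conn.execute(
    "select a.* from trade_info a join (select id from trade_info where " + cond +
    " order by id desc limit 20 offset 100) b on a.id = b.id order by a.id desc"
).fetchall()

print(original == rewritten, len(rewritten))
```

On a real MySQL instance, comparing the EXPLAIN output of the two forms is the way to confirm that the inner query is answered from the index alone.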

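Divide and conquer is always a good idea

The marketing system has a batch of expired coupons to invalidate. The core SQL is:

-- About 5 million rows need to be updated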
update coupons set status = 1 where status =0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

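In Oracle, updating 5 million rows is fast because multiple CPU cores can work on it. MySQL needs more care: a single SQL statement can only use one CPU core. If the statement is complex or slow, it blocks the SQL requests behind it, the number of active connections spikes, MySQL CPU hits 100%, and the corresponding interfaces time out. On top of that, in a master-slave replication setup with read/write splitting, if updating 5 million rows takes 5 minutes on the Master, then once the binlog reaches the Slave it also needs 5 minutes to apply it, so the Slave lags by 5 minutes. During that window reads from the Slave return stale data, which can lead to problems such as duplicate orders.

Optimization approach: first get the minimum and maximum id matching the where condition, then update in batches of 1000 rows each. This completes the update quickly and also keeps replication from falling behind.

The optimization is as follows:

First get the minimum and maximum id within the range to be updated (the table never does physical deletes, so the ids are contiguous):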

mysql> explain select min(id) min_id, max(id) max_id from coupons where status =0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59'; 
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+---
| id | select_type | table | partitions | type  | possible_keys          | key                    | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+---
|  1 | SIMPLE      | users | NULL       | range | idx_status_create_time | idx_status_create_time | 6       | NULL | 180300 |   100.00 | Using where; Using index |

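Extra=Using where; Using index means the index idx_status_create_time is used and all the required data can be found in the index, so no table lookup is needed.

Then loop over the update, committing every 1000 rows. The main code is as follows:

current_id = min_id;
while current_id <= max_id do
  -- updating 1000 rows by primary key id is fast; the status and create_time filters are kept so other rows in the id range are not touched
  update coupons set status = 1 where id >= current_id and id < current_id + 1000 and status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';
  commit;
  current_id += 1000;
done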
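The batching logic is easy to get subtly wrong at the boundaries, so here is a minimal Python sketch of it. The function names id_batches and batch_statements are made up for illustration, and the statements are only built as strings; actually running them against MySQL (one commit per batch) is left to whatever client library you use.

```python
def id_batches(min_id, max_id, batch_size=1000):
    """Yield half-open [start, end) id ranges that cover min_id..max_id exactly once."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)
        yield start, end
        start = end


def batch_statements(min_id, max_id, batch_size=1000):
    """Build one UPDATE per batch; the original filters keep unrelated rows untouched."""
    return [
        "update coupons set status = 1 "
        "where id >= %d and id < %d and status = 0 "
        "and create_time >= '2020-10-01 00:00:00' "
        "and create_time <= '2020-10-07 23:59:59'" % (start, end)
        for start, end in id_batches(min_id, max_id, batch_size)
    ]


for stmt in batch_statements(1, 2500):
    print(stmt)
```

Half-open ranges mean no id is updated twice and the last batch stops exactly at max_id, even when min_id equals max_id.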

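These two cases show that we should make full use of the fact that a secondary index contains the primary key id: first fetch the ids through a covering index scan, which never goes back to the table, and then join or operate by id, which is efficient. Combined with MySQL's characteristics, the divide-and-conquer approach gets the job done efficiently and also avoids the inconsistent business data that replication lag would cause.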

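MySQL index design

Once you are familiar with how indexes behave, you can design high-quality indexes during business development and reduce interface response time.

Prefix index

For InnoDB tables that use the REDUNDANT or COMPACT row format, the index key prefix length is limited to 767 bytes. A prefix index on a TEXT or VARCHAR column hits this limit once it exceeds 191 characters, assuming the utf8mb4 character set, where each character takes up to 4 bytes.

The innodb_large_prefix parameter (available in MySQL 5.7 and earlier) turns support for longer index key prefixes on or off. Even with it set to OFF the index can still be created, but with a warning. The limit exists mainly because a very large index means a lot of IO, so even if the MySQL optimizer picks that index it will not be very efficient.

-- With innodb_large_prefix=OFF the 767-byte limit applies; the index is still created, but with a warning.
mysql> create index idx_nickname on users(nickname);    -- `nickname` varchar(255)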
Records: 0  Duplicates: 0  Warnings: 1
mysql> show warnings;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |

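Early in a product's life, in order to ship features quickly, column lengths are often defined generously. For example, the nickname column of the users table is defined as varchar(128), and some interface needs to query by nickname. After the system has been running for a while, it turns out that the longest nickname in users is 30 characters. At that point a prefix index can be created to shorten the index and improve performance.

-- Execution plan with `nickname` varchar(128) DEFAULT NULL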
mysql> explain select * from users where nickname = 'Laaa';
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+--------
| id | select_type | table | partitions | type | possible_keys | key          | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+--------
|  1 | SIMPLE      | users | NULL       | ref  | idx_nickname  | idx_nickname | 515     | const |    1 |   100.00 | NULL  |

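key_len=515: the table and column both use the utf8mb4 character set, so each character takes up to 4 bytes; a variable-length type adds 2 bytes and a nullable column adds 1 more byte, giving 128 x 4 + 2 + 1 = 515 bytes. When creating a prefix index, the prefix length does not have to be the maximum length of the data currently in the column; it should be the length that captures most of the column's selectivity, and reaching about 90% or more is usually enough. For example, if an email column stores values like xxxx@yyy.com, the prefix length only needs to cover the longest xxxx part.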
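One way to pick the prefix length is to measure, for each candidate length, how many distinct prefixes there are compared with the number of distinct full values. The Python sketch below does this on a handful of made-up email addresses; the function name shortest_prefix and the sample data are assumptions for illustration, and on a real table you would compute the same ratio with count(distinct left(col, n)).

```python
def shortest_prefix(values, target=0.9, max_len=None):
    """Smallest prefix length whose distinct count reaches `target` of the full column's."""
    full_distinct = len(set(values))
    longest = max(len(v) for v in values)
    limit = max_len or longest
    for n in range(1, limit + 1):
        if len({v[:n] for v in values}) >= target * full_distinct:
            return n
    return limit


# Hypothetical sample values.
emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.org",
    "carol@example.com",
]
print(shortest_prefix(emails))
```

For these five values a short prefix already separates every address, so indexing the full column would only add bytes without improving selectivity.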

-- Create a prefix index with a prefix length of 30
mysql> create index idx_nickname_part on users(nickname(30));
-- Check the execution plan
mysql> explain select * from users where nickname = 'Laaa';
+----+-------------+-------+------------+------+--------------------------------+-------------------+---------+-
| id | select_type | table | partitions | type | possible_keys                  | key               | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+--------------------------------+-------------------+---------+-
|  1 | SIMPLE      | users | NULL       | ref  | idx_nickname_part,idx_nickname | idx_nickname_part | 123     | const |    1 |   100.00 | Using where |

As you can see, the optimizer chose the prefix index, with a key length of 123, that is 30 x 4 + 2 + 1 = 123 bytes, less than a quarter of the original size.
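The key_len arithmetic is simple enough to wrap in a small helper. This Python sketch follows the rule described above (bytes per character, plus 2 for a variable-length type, plus 1 for a nullable column); the function name key_len and its defaults are assumptions for illustration.

```python
def key_len(chars, bytes_per_char=4, variable_length=True, nullable=True):
    """Expected EXPLAIN key_len in bytes for a single string column.

    Defaults assume utf8mb4 (up to 4 bytes per character), a VARCHAR column
    (2 length bytes) that allows NULL (1 flag byte).
    """
    return chars * bytes_per_char + (2 if variable_length else 0) + (1 if nullable else 0)


full = key_len(128)       # index on the whole varchar(128) column
prefix = key_len(30)      # prefix index nickname(30)
print(full, prefix, round(prefix / full, 3))
print(767 // 4)           # most utf8mb4 characters that fit in a 767-byte key prefix
```

The same helper explains the 191-character figure: 767 bytes divided by 4 bytes per character, rounded down.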

A prefix index reduces the size of the index, but it cannot eliminate sorting.

mysql> explain select gender,count(*) from users where nickname like 'User100%' group by nickname limit 10;
+----+-------------+-------+------------+-------+--------------------------------+--------------+---------+-----
| id | select_type | table | partitions | type  | possible_keys                  | key          | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+--------------------------------+--------------+---------+-----
|  1 | SIMPLE      | users | NULL       | range | idx_nickname_part,idx_nickname | idx_nickname | 515     | NULL |  899 |   100.00 | Using index condition |
-- Extra = Using index condition means the index is used and rows still have to be fetched from the table, but no sort takes place.
mysql> explain select gender,count(*) from users where nickname like  'User100%' group by nickname limit 10;
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------
| id | select_type | table | partitions | type  | possible_keys     | key               | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------
|  1 | SIMPLE      | users | NULL       | range | idx_nickname_part | idx_nickname_part | 123     | NULL |  899 |   100.00 | Using where; Using temporary |
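-- Extra = Using where; Using temporary means that although the index is used, rows have to be fetched from the table and a temporary table is needed for the grouping, so a sort-like operation takes place.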

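Composite index

When a single-column index cannot filter the data well, you can combine it with other columns from the where condition into a composite index to filter better and reduce the number of IO scans. For example, the business needs to query transaction records by time period with the following SQL: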

select  * from trade_info where status = 1 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

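Based on past experience with composite indexes, the developers followed the rule that the column with many distinct values and good selectivity should lead the composite index. By that rule the composite index idx_create_time_status looks efficient: create_time has one value per second, so it has lots of distinct values and very good selectivity, whereas status has only 6 discrete values. So they believed this was fine. But that rule only holds for equality filters, not when a range condition is involved. For example, idx_user_id_status(user_id, status) is fine, but for a composite index that includes a range on create_time the rule no longer applies. Let's look at the difference between the two column orders, idx_status_create_time and idx_create_time_status.

-- Create the two composite indexes with different column orders
mysql> create index idx_status_create_time on trade_info(status, create_time);
mysql> create index idx_create_time_status on trade_info(create_time,status);
-- Check the execution plan of the SQL
mysql> explain select * from trade_info where status = 1 and create_time >='2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';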
+----+-------------+-------+------------+-------+-----------------------------------------------+---------------
| id | select_type | table | partitions | type  | possible_keys                                 | key                    | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+-------+------------+-------+-----------------------------------------------+---------------
|  1 | SIMPLE      | trade_info | NULL       | range | idx_status_create_time,idx_create_time_status | idx_status_create_time | 6       | NULL | 98518 |   100.00 | Using index condition |

The execution plan shows that when both composite indexes exist, the MySQL optimizer chooses idx_status_create_time. Why not idx_create_time_status? Let's use optimizer_trace to follow the optimizer's decision.

-- Enable optimizer_trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on;
-- Run the SQL statement
mysql> select * from trade_info where status = 1 and create_time >='2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';
-- View the trace result
mysql> SELECT trace FROM information_schema.OPTIMIZER_TRACE\G;

Comparing the statistics for the two indexes gives the following:

Composite index	Type	Rows	Index columns used for filtering	Chosen	Cause
idx_status_create_time	Index Range Scan	98518	status AND create_time	True	Lower cost
idx_create_time_status	Index Range Scan	98518	create_time	False	Higher cost

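The MySQL optimizer is cost-based. The cost mainly consists of IO_COST and CPU_COST, and MySQL's CBO (Cost-Based Optimizer) always picks the plan with the lowest cost as the final execution plan. From the analysis above, the CBO chose the composite index idx_status_create_time because both status and create_time in that index take part in filtering, so the cost is lower. With idx_create_time_status only create_time takes part in filtering and status is ignored, so the CBO effectively reduces it to a single-column index idx_create_time, whose selectivity is not as good as that of the composite index idx_status_create_time.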

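Composite index design principles

Put the range-query column last in the composite index, for example idx_status_create_time.
The more frequently a column is filtered on and the better its selectivity, the more it belongs at the front of the composite index; this applies to equality lookups, for example idx_user_id_status.

These two principles do not contradict each other; they complement each other.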
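The effect of column order can be illustrated with a toy model: treat each composite index as a sorted list of tuples and count how many entries fall inside the contiguous range that has to be read. This Python sketch uses made-up data and made-up helper names, and it only models the range boundaries, not the optimizer's real cost calculation.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: 4200 rows, 6 status values, each row on one of 7 days.
rows = [(i % 6, "2020-10-%02d" % (1 + i % 7), i) for i in range(4200)]  # (status, create_time, id)

# Each composite index is modelled as a sorted list with the primary key appended.
idx_status_create_time = sorted((s, t, pk) for s, t, pk in rows)
idx_create_time_status = sorted((t, s, pk) for s, t, pk in rows)


def scanned_status_first(status, t_lo, t_hi):
    """Entries read when status (equality) leads and create_time (range) follows."""
    lo = bisect_left(idx_status_create_time, (status, t_lo))
    hi = bisect_right(idx_status_create_time, (status, t_hi, float("inf")))
    return hi - lo


def scanned_time_first(t_lo, t_hi):
    """Entries read when create_time (range) leads: status cannot narrow the range."""
    lo = bisect_left(idx_create_time_status, (t_lo,))
    hi = bisect_right(idx_create_time_status, (t_hi, float("inf")))
    return hi - lo


print(scanned_status_first(1, "2020-10-01", "2020-10-03"))
print(scanned_time_first("2020-10-01", "2020-10-03"))
```

With status leading, the range contains only the matching rows; with create_time leading, every row in the time window is read and status has to be checked entry by entry.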

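Index skip scan

Normally, if the table users has the composite index idx_status_create_time, we all know that a query on create_time alone will not use the index, so a separate single-column index idx_create_time would also be needed. Anyone who has used Oracle knows it can do an Index Skip Scan in this case. MySQL 8.0 (from 8.0.13) implements a similar index skip scan, and you can see skip_scan=on among the optimizer switches.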

| optimizer_switch             |use_invisible_indexes=off,skip_scan=on,hash_join=on |

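It suits composite indexes whose leading column has few distinct values and whose trailing column has many. If the leading column has many distinct values, the MySQL CBO will not choose a skip scan; it depends on the data distribution of the index columns.

mysql> explain select id, user_id, status, phone from users where create_time >='2021-01-02 23:01:00' and create_time <= '2021-01-03 23:01:00';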
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----
|  1 | SIMPLE      | users | NULL       | range  | idx_status_create_time          | idx_status_create_time | NULL    | NULL | 15636 |    11.11 | Using where; Using index for skip scan|

The index skip scan feature can also be turned off with optimizer_switch='skip_scan=off'.
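Conceptually, a skip scan runs one small range scan per distinct value of the leading column. The Python sketch below emulates that idea on a sorted list of (status, create_time, id) tuples; the function name skip_scan, the integer status values, and the sample data are assumptions for illustration, and this is not MySQL's actual implementation.

```python
from bisect import bisect_left, bisect_right


def skip_scan(index, t_lo, t_hi):
    """Emulate a skip scan over a sorted (status, create_time, id) index.

    For each distinct leading value, do a range scan on create_time,
    then jump straight to the next distinct status.
    """
    matches, pos = [], 0
    while pos < len(index):
        status = index[pos][0]                                   # next distinct leading value
        lo = bisect_left(index, (status, t_lo))                  # range start within this status
        hi = bisect_right(index, (status, t_hi, float("inf")))   # range end within this status
        matches.extend(index[lo:hi])
        pos = bisect_left(index, (status + 1,))                  # skip the rest of this status
    return matches


# Hypothetical data: 3 status values, 5 days.
index = sorted((i % 3, "2021-01-%02d" % (1 + i % 5), i) for i in range(30))
found = skip_scan(index, "2021-01-02", "2021-01-03")
expected = [e for e in index if "2021-01-02" <= e[1] <= "2021-01-03"]
print(found == expected, len(found))
```

The number of range scans equals the number of distinct leading values, which is why the technique only pays off when the leading column has few of them.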

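Summary

This article introduced indexes in MySQL, including clustered and secondary indexes. A secondary index contains the primary key id, which is used to go back to the table, and covering index scans can be used to optimize SQL further.

It also covered how to design MySQL indexes better, including prefix indexes, column order in composite indexes, and the index skip scan introduced in MySQL 8.0. As we all know, indexes speed up data retrieval and reduce IO cost, but they take up disk space: they trade space for time. Updates also cause frequent page merges and splits in the index, which hurts index performance. In real business development, designing the right indexes for the business scenario is very important. That's all for today; I hope it helps.

I'm 敖丙 (Ao Bing). The more you know, the more you realize you don't know. Thanks for the likes, favorites, and comments, and see you next time.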

