How Data Volume Affects Index Use in WHERE IN Queries

Date: 2019-11-10

A question that comes up often on forums and in interviews: in MySQL, does WHERE IN use an index?

To settle the question, I ran some tests and found that the number of rows in the table affects whether the index is used. Let's take a look.

The tests use MySQL 5.7 with the default InnoDB storage engine and the default B+ tree index type. EXPLAIN is used to check whether a query hits the index.

First, create a table:

create table staffs(
    id int primary key auto_increment,
    name varchar(24) not null default '' comment 'name',
    age int not null default 0 comment 'age',
    pos varchar(20) not null default '' comment 'position',
    add_time timestamp not null default current_timestamp comment 'hire date'
)charset utf8 comment 'employee records';

1. Case one: a small amount of data

Insert three rows first:

insert into staffs(name,age,pos,add_time) values('z3',22,'manager',now());
insert into staffs(name,age,pos,add_time) values('July',23,'dev',now());
insert into staffs(name,age,pos,add_time) values('2000',23,'dev',now());

1.1 Effect on a single-column index, using name as the example

alter table staffs add index idx_staffs_name(name);

mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys   | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_name | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

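As you can see, the index is not hit (type is ALL and key is NULL). rows is 3, and after the server layer filters the data returned by the storage engine, 66.67% of it remains: the storage engine returns 3 rows, the MySQL server layer filters out 1 and keeps 2, so filtered is 66.67%. (For details on EXPLAIN, see my earlier post: http://www.javashuo.com/article/p-nawevcyl-ds.html)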
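The rows and filtered columns of EXPLAIN are both estimates, and the MySQL manual describes rows × filtered as the number of rows passed on to the next table in a join. A small sketch of that arithmetic for the plan above; the function names are hypothetical helpers, not part of MySQL:

```python
def rows_after_filter(rows: int, filtered: float) -> float:
    """Estimated rows left after the server-layer WHERE filter.

    rows     -- EXPLAIN's estimate of rows read from the storage engine
    filtered -- EXPLAIN's percentage of those rows expected to match
    """
    return rows * filtered / 100.0


def filtered_percent(kept: int, rows: int) -> float:
    """Percentage of examined rows that survive the filter, rounded like EXPLAIN."""
    return round(100.0 * kept / rows, 2)


# Values from the full-scan plan above: 3 rows read, 2 of them kept.
print(round(rows_after_filter(3, 66.67)))  # 2
print(filtered_percent(2, 3))              # 66.67
```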

1.2 Effect on a composite index

Prepare the index for the test:

alter table staffs drop index idx_staffs_name;
alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

1.2.1 Effect on the leftmost column of the composite index

mysql> explain select * from staffs where name = 'z3';
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.04 sec)

So with =, the leftmost-prefix rule lets the query use the index, while with IN the index is not used.

1.2.2 Effect on a middle column of the composite index

mysql> explain select * from staffs where name = 'z3' and age = 22;
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from staffs where name = 'z3' and age in (22, 23);
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

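Likewise, with = on both columns the composite index is used column by column (key_len grows from 74 to 78). But when the second column uses IN, even the first column is dragged down and no index is used at all.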
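The key_len column shows how many bytes of the index a plan uses, which reveals how many columns of the composite index take part. A sketch of the arithmetic behind 74 and 78, assuming the table's utf8 charset is MySQL's 3-byte utf8 (utf8mb3), a VARCHAR key part carries a 2-byte length prefix, an INT takes 4 bytes, and no NULL-flag byte is added because every column is NOT NULL. The helper functions are hypothetical:

```python
# Assumed bytes per character for a few MySQL charsets.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4, "latin1": 1}


def varchar_key_len(n_chars: int, charset: str = "utf8", nullable: bool = False) -> int:
    """Bytes for a VARCHAR key part: max chars x bytes/char + 2-byte length (+1 if nullable)."""
    return n_chars * BYTES_PER_CHAR[charset] + 2 + (1 if nullable else 0)


def int_key_len(nullable: bool = False) -> int:
    """Bytes for an INT key part: 4 (+1 if nullable)."""
    return 4 + (1 if nullable else 0)


name_len = varchar_key_len(24)  # name varchar(24) not null
age_len = int_key_len()         # age int not null
print(name_len)                 # 74: only name is used
print(name_len + age_len)       # 78: name and age are used
```

By the same arithmetic, a plan that also used pos varchar(20) would show 74 + 4 + 62 = 140.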

2. Case two: a large amount of data

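To insert a large amount of data quickly and then build the index, drop the original table and create an identical one without any indexes, so no time is spent updating indexes during the inserts. The rows are inserted with a stored procedure.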

DELIMITER $$
    CREATE PROCEDURE test_insert()
    BEGIN
        declare i int;
        set i = 1 ;
        WHILE (i < 10000) DO
            -- name is 'a' + i; age is a random integer in [20, 100]
            INSERT INTO staffs(`name`,`age`,`pos`) VALUES(CONCAT('a', i), FLOOR(20 + RAND() * 81), 'dev');
            set i = i + 1;
        END WHILE;
        commit;
END$$
DELIMITER ;

CALL test_insert();

Query OK, 0 rows affected (8 min 7.84 sec)

Inserting 9,999 rows took more than 8 minutes, which is rather slow.
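The slowness is plausibly due to autocommit: each INSERT inside the loop is committed, and flushed to the log, on its own. That is an assumption about the server settings, which the post does not show. A common alternative is to send fewer, larger statements. Below is a hedged sketch that builds multi-row INSERT statements for similar data (name 'a' + i, pos 'dev', and a random age between 20 and 100, which is an assumed range). The function name and batch size are hypothetical, and the statements still have to be executed with a MySQL client or driver:

```python
import random


def build_batch_inserts(total: int = 9999, batch_size: int = 1000, seed: int = 0) -> list[str]:
    """Build multi-row INSERT statements for rows i = 1..total.

    Values are generated here rather than taken from user input, so plain
    string formatting is safe; real user data should use driver placeholders.
    """
    rng = random.Random(seed)
    statements = []
    for start in range(1, total + 1, batch_size):
        end = min(start + batch_size, total + 1)
        values = ",".join(
            f"('a{i}',{rng.randint(20, 100)},'dev')" for i in range(start, end)
        )
        statements.append(f"INSERT INTO staffs(name,age,pos) VALUES {values};")
    return statements


stmts = build_batch_inserts()
print(len(stmts))  # 10 statements for 9999 rows
```

Ten round trips instead of 9,999 also means ten commits instead of 9,999 when autocommit is on.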

2.1 Effect on a single-column index, using name as the example

Create the index as before (the commands are the same as above and are omitted to save space; likewise below), then run the query.

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys   | key             | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_name | idx_staffs_name | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The index is hit (type range): an estimated 2 rows, with filtered at 100%.

2.2 Effect on a composite index

As before, drop the single-column index first and create the composite index.

2.2.1 Effect on the leftmost column of the composite index

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The index is hit.

mysql> explain select * from staffs where name in ('a1', 'a2000') and age = 23;
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

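Adding another condition after the IN column still hits the index; key_len 78 shows that both name and age are used.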

2.2.2 Effect on a middle column of the composite index

mysql> explain select * from staffs where name = 'a1' and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select * from staffs where name in ('a1', 'a2000') and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    4 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

IN on the middle column makes no difference either; the index is still hit.
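One way to read rows = 4 in the last plan: when IN lists cover consecutive leading columns of a composite index, MySQL's range optimizer can turn each combination of values into its own equality range, so 2 names × 2 ages give 4 point lookups into idx_staffs_nameAgePos. A sketch that enumerates those (name, age) prefixes; the helper is hypothetical, and the real optimizer also applies limits such as eq_range_index_dive_limit when estimating long lists:

```python
from itertools import product


def equality_ranges(*in_lists):
    """Each combination of IN values on consecutive leading index columns
    becomes one equality range (a point lookup on that index prefix)."""
    return list(product(*in_lists))


# name IN ('a1', 'a2000') AND age IN (22, 23) on idx_staffs_nameAgePos(name, age, pos)
ranges = equality_ranges(["a1", "a2000"], [22, 23])
for r in ranges:
    print(r)
# ('a1', 22)
# ('a1', 23)
# ('a2000', 22)
# ('a2000', 23)
print(len(ranges))  # 4
```

For name = 'a1' AND age IN (22, 23), the same helper gives two ranges, ('a1', 22) and ('a1', 23), in line with rows = 2 in that plan.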

3. Summary

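3.1 With little data, equality (=) conditions use the composite index column by column in leftmost-prefix order, but IN conditions used neither the single-column index nor the composite index. A likely reason is that MySQL judges the table too small to be worth using the index, so a full table scan is actually faster.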

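3.2 With a large amount of data, IN used the single-column index in every test here, and the composite index was also used column by column in order, whether IN was on the leftmost column or a middle one.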

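3.3 Of course, the IN lists here are short. If the list were very long, so that the matching rows made up a large share of the table, the optimizer would probably skip the index and do a full table scan as well.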
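To make 3.1 and 3.3 concrete, here is a deliberately simplified cost model for an IN list on an indexed column. It is not MySQL's actual cost model, and the constants are made up for illustration: a full scan pays a small cost per table row, while the index path pays a B+ tree descent per IN value plus a larger per-row cost, because SELECT * needs a lookup back into the clustered index for every match. The function name is hypothetical:

```python
# Toy cost model, for illustration only.
# The constants are hypothetical and are NOT MySQL's cost-model defaults.
SEQ_ROW_COST = 1.0     # cost of reading one row during a full table scan
DESCENT_COST = 2.0     # cost of one B+ tree descent per IN value
LOOKUP_ROW_COST = 4.0  # per matching row: index entry + lookup into the clustered index


def choose_plan_for_in(table_rows: int, in_values: int, rows_per_value: float = 1.0) -> str:
    """Return the cheaper plan for an IN list under the toy model."""
    full_scan = table_rows * SEQ_ROW_COST
    index_range = in_values * (DESCENT_COST + rows_per_value * LOOKUP_ROW_COST)
    return "index range" if index_range < full_scan else "full scan"


print(choose_plan_for_in(3, 2))        # full scan   (tiny table, as in section 1)
print(choose_plan_for_in(9999, 2))     # index range (as in section 2)
print(choose_plan_for_in(9999, 2000))  # full scan   (very long IN list, as in 3.3)
```

Only the shape of the trade-off matters here: the index cost grows with the number of IN values and matching rows, while the full-scan cost grows with the table size. The model does not cover how MySQL costs equality (ref) lookups, so it says nothing about the = queries in section 1.2.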

3.4 I have not yet tested the case where the IN condition contains a subquery, nor any version earlier than 5.7. Here is a passage quoted from this blogger:

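Versions before 5.5 indeed do not use the index, but from 5.5 on MySQL optimizes this. In MySQL 5.5, released in 2010, the optimizer can optimize the IN operator automatically: indexed columns can use the index, while columns without an index still require a full table scan.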

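For example, in versions before 5.5 (everything below refers to pre-5.5 versions), the execution plan for select * from a where id in (select id from b); does not first fetch all the ids from table b and then compare them with the ids in table a. Instead, MySQL rewrites the IN subquery into a correlated EXISTS subquery, so the statement is actually equivalent to: select * from a where exists(select * from b where b.id=a.id);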

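A correlated EXISTS subquery works like this: take each row of table a in turn and compare it with table b on the condition a.id=b.id, checking whether that row's id exists in table b; if it does, return that row of a.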
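The two readings in the quote return the same rows but do different amounts of work. A sketch in Python, with in-memory lists of dicts standing in for tables a and b (the function names and sample data are hypothetical): the first function collects b's ids once and probes them, which is how the SQL reads; the second runs the correlated EXISTS as a nested loop, scanning b once for every row of a, as the quote describes for pre-5.5 MySQL.

```python
def in_subquery_materialized(a_rows, b_rows):
    """Collect b's ids first, then keep each row of a whose id is among them."""
    b_ids = {row["id"] for row in b_rows}
    return [row for row in a_rows if row["id"] in b_ids]


def exists_correlated(a_rows, b_rows):
    """For each row of a, scan b for a row with b.id = a.id (nested loop)."""
    return [row for row in a_rows if any(b["id"] == row["id"] for b in b_rows)]


a = [{"id": i, "val": f"a{i}"} for i in range(1, 6)]
b = [{"id": i} for i in (2, 4, 4, 9)]

print([r["id"] for r in exists_correlated(a, b)])  # [2, 4]
```

Both functions return each matching row of a once even though id 4 appears twice in b, which matches the semantics of IN and EXISTS. The nested loop performs up to len(a) × len(b) comparisons, which is why this rewrite hurts when table a is large.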
