数据量对where in语句的索引影响

咱们常常在论坛和面试中遇到这个问题,mysql中,where in会不会用到索引?mysql

为了完全搞明白这个问题,作了一些测试,发现记录数大小对是否命中索引有影响,咱们来看一看。面试

使用的mysql版本是5.7,数据库引擎为默认的innoDB,索引类型是默认的B+树索引,用explain执行计划确认是否命中索引。sql

咱们建立一个表数据库

create table staffs(
    id int primary key auto_increment,
    name varchar(24) not null default '' comment '姓名',
    age int not null default 0 comment '年龄',
    pos varchar(20) not null default '' comment '职位',
    add_time timestamp not null default current_timestamp comment '入职时间'
)charset utf8 comment '员工记录表';

1, 咱们测试第一种状况,数据量少的状况

先插入三条数据数组

insert into staffs(name,age,pos,add_time) values('z3',22,'manager',now());
insert into staffs(name,age,pos,add_time) values('July',23,'dev',now());
insert into staffs(name,age,pos,add_time) values('2000',23,'dev',now());

1.1 对单列索引的影响,以name为例

alter table staffs add index idx_staffs_name(name);
mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys   | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_name | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

能够看到,没有命中索引,行数为3,server层对存储引擎返回的数据作过滤以后剩余66.67%,也就是说,存储引擎返回了3条记录,mysql的server层过滤掉1条,剩下2条,filtered的值为66.67%. (explain详见以前的博文: http://www.javashuo.com/article/p-nawevcyl-ds.htmlbash

1.2 对联合索引的影响

准备索引测试

alter table staffs drop index idx_staffs_name;
alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

1.2.1 对联合索引最左字段的影响

mysql> explain select * from staffs where name = 'z3';
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.04 sec)

能够看到,用 = 查询时,因为最左原则,用到了索引,而用in查询时,没有用到索引。优化

1.2.2 对联合索引中间字段的影响

mysql> explain select * from staffs where name = 'z3' and age = 22;
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from staffs where name = 'z3' and age in (22, 23);
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

一样的,当使用 = 查询时,依次使用了联合索引,而第二个字段用 in 查询时,连第一个字段都被拖累,没有使用索引。spa

 

2,数据量大的状况

为了快速插入大量数据并建立索引,咱们先把原来的那张表drop掉,再建一张同样的表,不带任何索引,这样就不会耗费更新索引的时间。这边用存储过程插入。.net

DELIMITER $$
    CREATE PROCEDURE test_insert()
    BEGIN
        declare i int;
        set i = 1 ;
        WHILE (i < 10000) DO
            INSERT INTO staffs(`name`,`age`,`pos`) VALUES(CONCAT('a', i), FLOOR(20 + RAND() * (100 - i + 1)),'dev');	 
            set i = i + 1;
        END WHILE;
        commit;
END$$
DELIMITER ;

CALL test_insert();
Query OK, 0 rows affected (8 min 7.84 sec)

9999条数据耗时8分多钟,仍是有点慢的。

 

2.1 对单列索引的影响,以name为例

按照以前的动做,创建索引(命令和上面同样,为了节约篇幅,这里就不放出来了,下同),再查询。

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys   | key             | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_name | idx_staffs_name | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

命中索引,2条记录,准确率100%.

1.2 对联合索引的影响

一样先删除单列索引,建立联合索引。

1.2.1 对联合索引最左字段的影响

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

命中索引。

mysql> explain select * from staffs where name in ('a1', 'a2000') and age = 23;
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

in字段后面再加条件也能够命中。

1.2.2 对联合索引中间字段的影响

mysql> explain select * from staffs where name = 'a1' and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)
mysql> explain select * from staffs where name in ('a1', 'a2000') and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    4 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

对中间字段也没有影响,一样能够命中索引。

 

3, 总结

3.1 当数据量少时,会按照联合索引的顺序依次使用索引,反而不会使用单列索引,可能的缘由是,mysql认为数据量过小,直接走全表查询,全表扫描反而更快。

3.2 当数据量大时,单列索引必定会使用。联合索引也会按顺序依次使用。

3.3 固然这里in条件里面的数值长度不大,若是是一个很长数组,致使返回的结果占全表记录数量较大时,应该也不会使用索引而走全表查询。

3.4 这里尚未测试,当in条件里面是一个子查询时的状况。同时,这里没有对5.7如下版本作测试。这里引用一段这位博主的话

若是是 5.5 以前的版本确实不会走索引的,在 5.5 以后的版本,MySQL 作了优化。MySQL 在 2010 年发布 5.5 版本中,优化器对 in 操做符能够自动完成优化,针对创建了索引的列可使用索引,没有索引的列仍是会走全表扫描。

好比,5.5 以前的版本(如下都是 5.5 之前的版本)。select * from a where id in (select id from b); 这条 sql 语句它的执行计划其实并非先查询出 b 表的全部 id,而后再与 a 表的 id 进行比较。mysql 会把 in 子查询转换成 exists 相关子查询,因此它实际等同于这条 sql 语句:select * from a where exists(select * from b where b.id=a.id);

而 exists 相关子查询的执行原理是:循环取出 a 表的每一条记录与 b 表进行比较,比较的条件是 a.id=b.id。看 a 表的每条记录的 id 是否在 b 表存在,若是存在就行返回 a 表的这条记录。

相关文章
相关标签/搜索