mysql-mysql16: Indexes

Date: 2019-11-30

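1. What is an index

Recap: data is stored on disk, so querying it unavoidably involves I/O operations.

In MySQL an index is also called a "key". It is a data structure that the storage engine uses to find records quickly.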

primary key
unique key
index key

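Note that foreign key is not used to speed up queries, so it is out of scope here. Of the three keys above, the first two add a constraint on top of speeding up queries (primary key: non-null and unique; unique key: unique), whereas index key imposes no constraint and only speeds up queries.

An index is a data structure, similar to the table of contents of a book. To look something up you consult the table of contents first and then jump to the data, instead of paging through the whole book.

2. The essence of an index

An index works by repeatedly narrowing the range that can contain the wanted data until only the final result is left, and by turning random events into sequential ones. In other words, with an index in place we can always use the same search method to locate data.

3. Impact of indexes

When the table already holds a large amount of data, creating an index is slow.
Once the index is built, queries that can use it become much faster, but writes become slower, because every insert, update and delete must also maintain the index.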

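4. B+ tree

https://images2017.cnblogs.com/blog/1036857/201709/1036857-20170912011123500-158121126.png

Only the leaf nodes hold the real data. The root and branch nodes hold only "virtual" data: keys and pointers that guide the search.

The number of lookups is determined by the height of the tree: the fewer levels, the fewer lookups.

The size of a disk block is fixed, so the amount of data it can hold is fixed too. How do we keep the tree as shallow as possible? Store small data items in each disk block, so that one block holds more entries and each level fans out more widely.

Question: which field of a table should we index to keep the tree as low as possible? >>> The primary key id field.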
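The reasoning above can be turned into rough arithmetic. The sketch below assumes a simplified model in which every node has the same fan-out, so the height is about the logarithm of the row count taken to the base of the fan-out. The fan-out values 1000 and 100 are illustrative assumptions (small keys versus larger keys), not measured figures, and 3000000 is simply a row count of the same order as the exercise table used later.

```sql
# Rough tree height: ceil(log base fan-out of row count).
# The fan-out values 1000 and 100 are assumptions chosen for illustration only.
select
    ceil(log(1000, 3000000)) as levels_small_key,  # 3
    ceil(log(100, 3000000))  as levels_large_key;  # 4
```

Under these assumptions, a tenfold smaller fan-out costs one extra level, that is, one extra block read per lookup.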

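4.1 Clustered index (primary key)

The clustered index is in effect the table's primary key. InnoDB always organizes a table around a clustered index: if no primary key is defined, it uses the first unique index whose columns are all NOT NULL, and if there is none it generates a hidden row-ID index. Let's first review the storage engines.

How many files on disk correspond to a MyISAM table? (Three: .frm, .MYD and .MYI.)

How many files on disk correspond to an InnoDB table? (Two: .frm and .ibd.) The .frm file holds only the table structure and cannot hold indexes, which means that InnoDB keeps both the indexes and the data in the .ibd table data file. (This applies to versions before MySQL 8.0, which no longer uses .frm files.)

Characteristic: the leaf nodes store complete rows, one record after another.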

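4.2 Secondary index (unique, index)

Secondary index: queries do not always filter on id. They may filter on name, password or other fields, and in that case the clustered index cannot speed them up. Indexes have to be created on those other fields, and these are called secondary indexes.

Characteristic: the leaf nodes store the primary key value of the row that the indexed value belongs to. For example, with an index on the name field a leaf node stores {value of name: primary key value of the row containing that name}.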

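4.3 Covering index

select name from user where name='jason';

The statement above is served by a covering index: everything we asked for is already found in the leaf nodes of the secondary index.

4.4 Non-covering index

select age from user where name='jason';

The statement above is not covered by the index. The query does hit the indexed field name, but it asks for the age field, so the primary key found in the secondary index must then be used to look up the row in the clustered index.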
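One way to see the difference between 4.3 and 4.4 is to ask MySQL for the query plan with EXPLAIN. The sketch below uses a hypothetical table user_demo, since the user table in the statements above is never defined in this article. The column names and sample rows are assumptions made for illustration.

```sql
# Hypothetical table for illustration; names and rows are assumptions.
create table user_demo(
    id int primary key,
    name varchar(20),
    age int,
    index idx_name(name)
) engine=innodb;

insert into user_demo values (1,'jason',18),(2,'egon',20),(3,'tank',22);

# name (and the primary key id) live in idx_name itself,
# so the row in the clustered index does not have to be read
explain select name from user_demo where name='jason';

# age is not stored in idx_name, so each match needs a second
# lookup in the clustered index by primary key
explain select age from user_demo where name='jason';
```

For the first statement the Extra column would normally be expected to show "Using index", which is how MySQL reports a covering index. For the second it should not. On a table this small the optimizer is free to choose a different plan, so treat the output as indicative only.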

Index testing exercise

#1. Prepare the table
create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);

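#2. Create a stored procedure that inserts records in bulk
# Switch the statement delimiter to $$ so the semicolons inside the body do not end the definition
delimiter $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,'jason','male',concat('jason',i,'@oldboy'));
        set i=i+1;
    end while;
END$$
# Restore the semicolon as the statement delimiter
delimiter ;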

#3. Inspect the stored procedure
show create procedure auto_insert1\G

#4. Call the stored procedure
call auto_insert1();

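# With no index on the table at all
select * from s1 where id=30000;
# Use count() so that printing the rows does not distort the timing
select count(id) from s1 where id = 30000;
select count(id) from s1 where id = 1;

# Make id the primary key
alter table s1 add primary key(id);  # slow: the index is built over every existing row

select count(id) from s1 where id = 1;  # orders of magnitude faster than before the index existed
select count(id) from s1 where name = 'jason';  # still slow: name has no index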
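Timing queries by eye is noisy, so it can help to check the query plan as well. The sketch below assumes the state reached at this point in the exercise: s1 has a primary key on id and no other index.

```sql
# Lookup by primary key: the key column would be expected to show PRIMARY
explain select count(id) from s1 where id = 1;

# Filter on an unindexed column: key is expected to be NULL and
# rows close to the size of the table (a full scan)
explain select count(id) from s1 where name = 'jason';
```

In the EXPLAIN output, key names the index chosen (NULL if none) and rows is the optimizer's estimate of how many rows it must examine. The exact values depend on the MySQL version and table statistics.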


# Range queries
# Having an index on a column does not guarantee that every query on that column is fast
select count(id) from s1 where id > 1;  # much slower than id = 1: almost every row matches
select count(id) from s1 where id >1 and id < 3;
select count(id) from s1 where id > 1 and id < 10000;
select count(id) from s1 where id != 3;

alter table s1 drop primary key;  # drop the primary key so the name column can be studied on its own
select count(id) from s1 where name = 'jason';  # slow again

create index idx_name on s1(name);  # create an index on the name column of s1
select count(id) from s1 where name = 'jason';  # still very slow!
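# Recall how a B+ tree works: the indexed values need high selectivity,
# but every name in this table is 'jason', so the values cannot be told apart.
# The tree is effectively built as "a single stick".
select count(id) from s1 where name = 'xxx';
# This one is fast: with a single stick, the first comparison fails and there is no need to go any further
select count(id) from s1 where name like 'xxx';
select count(id) from s1 where name like 'xxx%';
select count(id) from s1 where name like '%xxx';  # slow: a leading wildcard defeats leftmost-prefix matching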
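The leftmost-match behaviour of LIKE can also be checked in the query plan. This sketch assumes the state at this point in the exercise: idx_name exists on s1(name) and the primary key has been dropped.

```sql
# Fixed prefix: the index can be searched for values starting with 'xxx',
# so key is expected to be idx_name
explain select count(id) from s1 where name like 'xxx%';

# Leading wildcard: there is no prefix to search on,
# so the index cannot be used to narrow the range
explain select count(id) from s1 where name like '%xxx';
```

An index on a string column is ordered from the first character onward, which is why only a pattern with a fixed prefix can use it to skip straight to the matching range.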

# Low-selectivity columns should not be indexed
drop index idx_name on s1;
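"Selectivity" can be given a number: the count of distinct values divided by the count of rows. The closer the ratio is to 1, the better the column separates rows. The query below is a sketch of that check on the exercise table. It scans the whole table, so it is slow on three million rows.

```sql
# Ratio of distinct values to total rows for two candidate columns
select
    count(distinct name)  / count(*) as name_selectivity,
    count(distinct email) / count(*) as email_selectivity
from s1;
```

With the data generated above, every name is 'jason' and every email is different, so name_selectivity should come out close to 0 and email_selectivity should be 1.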

# Create an ordinary index on the id column
create index idx_id on s1(id);
select count(id) from s1 where id = 3;  # fast
select count(id) from s1 where id*12 = 3;  # slow: never put the indexed column inside a calculation
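The usual fix is to move the arithmetic to the constant side, so that the indexed column stands alone in the comparison. The sketch below assumes idx_id still exists on s1(id) and uses 36 instead of 3 so that the condition has an integer solution.

```sql
# Expression on the column: MySQL has to evaluate id * 12 for every entry,
# so it cannot jump to a position in idx_id
explain select count(id) from s1 where id * 12 = 36;

# Same condition with 36 / 12 worked out beforehand: a plain lookup in idx_id
explain select count(id) from s1 where id = 3;
```

For the first statement the plan may still list idx_id, but only as a scan over the whole index. For the second, rows should drop to about 1.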

drop index idx_id on s1;
select count(id) from s1 where name='jason' and gender = 'male' and id = 3 and email = 'xxx';
# For a chain of ANDs like the one above, the MySQL optimizer picks a usable index with high selectivity
# (the order of the conditions in the WHERE clause does not matter), narrows the candidate rows with it,
# and then checks the remaining conditions
create index idx_name on s1(name);
select count(id) from s1 where name='jason' and gender = 'male' and id = 3 and email = 'xxx';  # no speed-up

drop index idx_name on s1;
# Indexing low-selectivity columns such as name or gender does not speed up queries

create index idx_id on s1(id);
select count(id) from s1 where name='jason' and gender = 'male' and id = 3 and email = 'xxx';  # fast: id narrows the data down to a single row right away
select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';  # slow: the id condition still matches a huge number of rows, and each must be checked against the other columns

drop index idx_id on s1;

create index idx_email on s1(email);
select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';  # fast: the email condition alone pins down the result

Composite index

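select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';
# If all four columns above had high selectivity, an index on any one of them would speed up the query
# Index on email, but the query does not use email
select count(id) from s1 where name='jason' and gender = 'male' and id > 3;
# Index on name, but the query does not use name
select count(id) from s1 where gender = 'male' and id > 3;
# Index on gender, but the query does not use gender
select count(id) from s1 where id > 3;

# The problem: every column ends up with its own index, a given query may still miss the one it needs, and the build time is paid four times
create index idx_all on s1(email,name,gender,id);  # leftmost-prefix rule: put the most selective column on the left
select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';  # faster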
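The leftmost-prefix rule means that idx_all can be searched only by conditions that start from its first column, email. The sketch below assumes idx_all(email,name,gender,id) is the only index left on s1 and compares three plans.

```sql
# List the indexes on s1 and the column order inside idx_all
show index from s1;

# Leftmost column present: idx_all can be searched by email
explain select count(id) from s1 where email = 'xxx';

# Leftmost two columns present: both conditions can narrow the search
explain select count(id) from s1 where email = 'xxx' and name = 'jason';

# email missing: there is no usable prefix, so idx_all cannot narrow the search
explain select count(id) from s1 where name = 'jason' and gender = 'male';
```

For the last statement the plan may still list idx_all, because the index happens to contain every column the query touches, but in that case it is read from start to end rather than searched.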