mysql索引分析

时间 2019-11-22

标签 mysql 索引分析栏目 MySQL 繁體版

原文原文链接

＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝ 
 链接
 mysql -h 192.168.100.176 -u VISITSTAT -pZFiwjg2Nrr
 
 数据库
 use visit_stat;
 
 建立索引
 alter table entry_page_attraction_analyze add index index_attraction(data_date);
 alter table entry_page_conversion_analyze add index index_conversion(data_date);
 alter table entry_page_flow_analyze add index index_flow(data_date);
 alter table entry_page_kpi_summary add index index_kpi(data_date);
 alter table entry_page_visitor_analyze add index index_visitor(data_date);
＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝




索引场景分析:
    某个表有(C1,C2,C3,C4),如下只能使用联合索引的C1,C2,C3部分
    A.where C1=x and C2=x and C4>x and C3=x
    B.where C1=x and C2=x and C4=x order by C3
    C.where C1=x and C4=x group by C3,C2
    D.where C1=? and C5=? order by C3,C2
    E.where C1=? and C2=? and C5=? order by C2,C3
    
    A.where C1=x and C2=x and C4>x and C3>x
    Mysql内部会优化sql语句,优化结果:where C1=x and C2=x and C3=x and C4>x ,因此都能用上
    
    B.where C1=x and C2=x and C4=x order by C3  C1能,C2能,C3能,  C4用不上
    
    D.where C1=? and C5=? order by C3,C2    C1能,C2能,C3能
    
解决思路B-Tree索引的左前缀匹配原则
    Myisam,Innodb默认使用B-Tree索引
    B-Tree索引:排好序的能够快速查找的数据结构[排好序,可快速查找,能够范围查找]
    Hash索引:在memory表里,默认是hash索引,时间复杂度是o(1)
    [hash值随机,在磁盘上放置也是随机,没法快速定位,没法范围优化]
    
单个索引意义不大.开发中大部分的时候使用的是多列索引号

多列索引号要想利用上必须知足最左前缀原则
分析index(A,B,C)
where A=3   A能
where A=3 and B=5    A能,B能
where A=3 and B=5 and C=4    A能,B能,C能
where B=3 or C=4    没有使用索引[违反了最左前缀原则]
where A=3 or C=4    A能,B不能,C不能[中间断层了]
where A=3 and B>10 and C=4    A能,B能,C不能[B是范围,没法定位C]
where A=3 and B like 'foo%' and C=10    A能,B能,C不能,[同上]




实战:
mysql> explain select name,age from person where  name LIKE '%pttxs%' and age >= 49\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: person
         type: index
possible_keys: NULL
          key: name_age
      key_len: 128
          ref: NULL
         rows: 4114293
        Extra: Using where; Using index
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> explain select name,age from person where  name LIKE 'pttxs%' and age >= 49\G;
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: person
         type: range
possible_keys: name_age
          key: name_age
      key_len: 128
          ref: NULL
         rows: 1016160
        Extra: Using where; Using index
1 row in set (0.00 sec)

其一.这里分析了两句select查询,差异在于1.row是name LIKE '%pttxs%'    2.row name LIKE 'pttxs%',
显然1.row第一个没有用到索引,从possible_keys: NULL能够看出
其二.从rows能够看出1. row>>>>rows: 4114293    2.row>>>rows: 1016160可见1. row形成全表扫描了


分析2.
mysql> explain select name,age from person where  name LIKE 'pttxs%' and age >= 49 order by phone \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: person
         type: ALL
possible_keys: name_age
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4114293
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

ERROR:
No query specified
注意:这里phone没有创建索引,没有排好序, Using filesort说明在文件或者内存中进行了2次排序,并且没有利用索引

建索引的几大原则

1.最左前缀匹配原则，很是重要的原则，mysql会一直向右匹配直到遇到范围查询(>、<、between、like)就中止匹配，好比a = 1 and b = 2 and c > 3 and d = 4 若是创建(a,b,c,d)顺序的索引，d是用不到索引的，若是创建(a,b,d,c)的索引则均可以用到，a,b,d的顺序能够任意调整。
2.=和in能够乱序，好比a = 1 and b = 2 and c = 3 创建(a,b,c)索引能够任意顺序，mysql的查询优化器会帮你优化成索引能够识别的形式
3.尽可能选择区分度高的列做为索引,区分度的公式是count(distinct col)/count(*)，表示字段不重复的比例，比例越大咱们扫描的记录数越少，惟一键的区分度是1，而一些状态、性别字段可能在大数据面前区分度就是0，那可能有人会问，这个比例有什么经验值吗？使用场景不一样，这个值也很难肯定，通常须要join的字段咱们都要求是0.1以上，即平均1条扫描10条记录
4.索引列不能参与计算，保持列“干净”，好比from_unixtime(create_time) = ’2014-05-29’就不能使用到索引，缘由很简单，b+树中存的都是数据表中的字段值，但进行检索时，须要把全部元素都应用函数才能比较，显然成本太大。因此语句应该写成create_time = unix_timestamp(’2014-05-29’);
5.尽可能的扩展索引，不要新建索引。好比表中已经有a的索引，如今要加(a,b)的索引，那么只须要修改原来的索引便可mysql

列子:sql

select
count(*)
from
task
where
status=2
and operator_id=20839
and operate_time>1371169729
and operate_time<1371174603
and type=2;数据库

根据最左匹配原则，最开始的sql语句的索引应该是status、operator_id、type、operate_time的联合索引；
其中status、operator_id、type的顺序能够颠倒，因此我才会说，把这个表的全部相关查询都找到，会综合分析；数据结构

好比还有以下查询
select * from task where status = 0 and type = 12 limit 10;
select count(*) from task where status = 0 ;
那么索引创建成(status,type,operator_id,operate_time)就是很是正确的，由于能够覆盖到全部状况。这个就是利用了索引的最左匹配的原则函数