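MySQL 5.6.3 introduced tracing for SQL statements. The trace shows in more detail how the optimizer arrived at a particular execution plan, much like Oracle's 10053 event. To use it, enable the setting, run the SQL statement once, and then query the INFORMATION_SCHEMA.OPTIMIZER_TRACE table. Note that this table is temporary: it can only be queried from the current session, and each query returns the trace of the most recently executed SQL statement.
Related variables: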
mysql> show variables like '%trace%';
+------------------------------+----------------------------------------------------------------------------+
| Variable_name                | Value                                                                      |
+------------------------------+----------------------------------------------------------------------------+
| optimizer_trace              | enabled=off,one_line=off                                                   |
| optimizer_trace_features     | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |
| optimizer_trace_limit        | 1                                                                          |
| optimizer_trace_max_mem_size | 16384                                                                      |
| optimizer_trace_offset       | -1                                                                         |
+------------------------------+----------------------------------------------------------------------------+
5 rows in set (0.02 sec)
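The following commands enable tracing:
SET optimizer_trace='enabled=on'; # enable tracing
SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000; # maximum trace memory; size it to your workload, optional
SET END_MARKERS_IN_JSON=ON; # add end-marker comments to the JSON output; default is OFF
SET optimizer_trace_limit = 1;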
MySQL chooses the wrong index: a detailed walk through the OPTIMIZER_TRACE format
http://blog.csdn.net/melody_mr/article/details/48950601
I. Table structure:
CREATE TABLE t_audit_operate_log (
Fid bigint(16) AUTO_INCREMENT,
Fcreate_time int(10) unsigned NOT NULL DEFAULT '0',
Fuser varchar(50) DEFAULT '',
Fip bigint(16) DEFAULT NULL,
Foperate_object_id bigint(20) DEFAULT '0',
PRIMARY KEY (Fid),
KEY indx_ctime (Fcreate_time),
KEY indx_user (Fuser),
KEY indx_objid (Foperate_object_id),
KEY indx_ip (Fip)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Run the query:
mysql> explain select count(*) from t_audit_operate_log where Fuser='XX@XX.com' and Fcreate_time>=1407081600 and Fcreate_time<=1407427199\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t_audit_operate_log
type: ref
possible_keys: indx_ctime,indx_user
key: indx_user
key_len: 153
ref: const
rows: 2007326
Extra: Using where
The optimizer picked an unsuitable index, which is far from ideal, so we name the index explicitly instead:
mysql> explain select count(*) from t_audit_operate_log use index(indx_ctime) where Fuser='XX@XX.com' and Fcreate_time>=1407081600 and Fcreate_time<=1407427199\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t_audit_operate_log
type: range
possible_keys: indx_ctime
key: indx_ctime
key_len: 5
ref: NULL
rows: 670092
Extra: Using where
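In actual execution time, the latter was faster than the former by nearly 10
Question: oddly, why does the optimizer not choose the indx_ctime index, and instead pick indx_user, which clearly scans more rows?
Row counts matched through each of the two indexes, comparing how selective the two conditions are: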
select count(*) from t_audit_operate_log where Fuser='XX@XX.com';
+----------+
| count(*) |
+----------+
| 1238382 |
+----------+
select count(*) from t_audit_operate_log where Fcreate_time>=1407254400 and Fcreate_time<=1407427199;
+----------+
| count(*) |
+----------+
| 198920 |
+----------+
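Clearly, indx_ctime is the better index here, yet MySQL chose indx_user. Why?
To find out, we dig further with OPTIMIZER_TRACE.
II. The OPTIMIZER_TRACE procedure
Using this example, here is a brief description of the OPTIMIZER_TRACE procedure.
How to view the OPTIMIZER_TRACE:
1.set optimizer_trace='enabled=on'; --- enable tracing
2.set optimizer_trace_max_mem_size=1000000; --- set the trace size
3.set end_markers_in_json=on; --- add end-marker comments to the trace
4.select * from information_schema.optimizer_trace\G --- read the trace, after running the statement to be traced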
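The steps above can also be scripted. The following is a minimal sketch, assuming a Python DB-API 2.0 cursor that is already connected to MySQL 5.6.3 or later; the function name `capture_optimizer_trace` and its parameters are made up for illustration. It deliberately sets end_markers_in_json to off, because the end-marker comments are not valid JSON and would break `json.loads`. It also assumes the traced statement returns a result set (a SELECT or EXPLAIN).

```python
import json


def capture_optimizer_trace(cursor, statement, max_mem_size=1000000):
    """Run `statement` with optimizer tracing enabled and return the parsed trace.

    `cursor` is assumed to be a DB-API 2.0 cursor on an open MySQL session.
    Tracing is per session, so every call must go through the same cursor.
    """
    cursor.execute("SET optimizer_trace='enabled=on'")
    cursor.execute("SET optimizer_trace_max_mem_size=%d" % int(max_mem_size))
    # End markers are comments inside the JSON; keep them off for parsing.
    cursor.execute("SET end_markers_in_json=off")
    try:
        cursor.execute(statement)
        cursor.fetchall()  # drain the result set of the traced statement
        cursor.execute("SELECT TRACE FROM information_schema.optimizer_trace")
        row = cursor.fetchone()
    finally:
        cursor.execute("SET optimizer_trace='enabled=off'")
    return json.loads(row[0]) if row else None
```

If the returned trace looks cut off, the MISSING_BYTES_BEYOND_MAX_MEM_SIZE column of the same table reports how much was truncated, and `max_mem_size` can be raised accordingly.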
III. Another similar problem
Single-table scan: an example of fetching data from an index via ref versus range
http://blog.163.com/li_hx/blog/static/183991413201461853637715/
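IV. How to resolve the problem
When a single table has several indexes, versions before MySQL 5.6.20 need the index to be forced by hand to get the best result.
Note: original article at http://blog.csdn.net/xj626852095/article/details/52767963
I recently ran into a SELECT statement in production for which EXPLAIN picks the same index either way; the index covers two columns.
For example: select * from t1 where a='xxx' and b>='123123', with index a_b(a,b)
By default EXPLAIN shows the access type ref, whereas with force index a_b it uses range, and range actually performs better.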
-- Please paste the full query execution plan
| 1 | SIMPLE | subscribe_f8 | ref | PRIMARY,uid | uid | 8 | const | 13494670 | Using where; Using index |
After force index:
| 1 | SIMPLE | subscribe_f8 | range | uid | uid | 12 | NULL | 13494674 | Using where; Using index |
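-- The two plans differ very little
Only the type changes from ref to range. key_len is 8 before forcing the index and 12 after; 12 is really the reasonable value.
-- Does your version support the EXPLAIN FORMAT=JSON command? If so, try it; it reports more detailed cost values.
-- What does SHOW CREATE TABLE show?
The detailed execution plan was sent over; see "Execution plan result 1".
Execution plan result 1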
select uid_from,create_time from subscribe_f8 where uid=12345678 and create_time > '2013-09-08 09:54:07.0' order by create_time asc limit 5000 |
{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `subscribe_f8`.`uid_from` AS `uid_from`,`subscribe_f8`.`create_time` AS `create_time` from `subscribe_f8` where ((`subscribe_f8`.`uid` = 12345678) and (`subscribe_f8`.`create_time` > '2013-09-08 09:54:07.0')) order by `subscribe_f8`.`create_time` limit 5000"
          }
        ]
      }
    },
    {
      ......
      {
        "considered_execution_plans": [
          {
            "plan_prefix": [
            ],
            "table": "`subscribe_f8`",
            "best_access_path": {
              "considered_access_paths": [
                { "access_type": "ref", "index": "PRIMARY", "rows": 1.36e7, "cost": 3.01e6, "chosen": true },
                { "access_type": "ref", "index": "uid", "rows": 1.36e7, "cost": 2.77e6, "chosen": true },
                { "access_type": "range", "rows": 1.02e7, "cost": 5.46e6, "chosen": false }
              ]
            },
            "cost_for_plan": 2.77e6,
            "rows_for_plan": 1.36e7,
            "chosen": true
          }
        ]
      },
      ...
}
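In a full trace the access-path comparison is buried several levels deep, so it can help to pull it out programmatically. The sketch below is illustrative only: the helper names `iter_access_paths` and `rank_by_cost` are made up here, and the input is just the considered_access_paths fragment of the trace above, wrapped in a minimal valid JSON object.

```python
import json

# Fragment of the trace shown above, reduced to the access-path comparison.
TRACE_FRAGMENT = """
{
  "best_access_path": {
    "considered_access_paths": [
      {"access_type": "ref", "index": "PRIMARY", "rows": 1.36e7, "cost": 3.01e6, "chosen": true},
      {"access_type": "ref", "index": "uid", "rows": 1.36e7, "cost": 2.77e6, "chosen": true},
      {"access_type": "range", "rows": 1.02e7, "cost": 5.46e6, "chosen": false}
    ]
  }
}
"""


def iter_access_paths(node):
    """Yield every entry found under any "considered_access_paths" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "considered_access_paths":
                yield from value
            else:
                yield from iter_access_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_access_paths(item)


def rank_by_cost(trace):
    """Return the considered access paths, cheapest estimated cost first."""
    return sorted(iter_access_paths(trace), key=lambda path: path["cost"])


ranked = rank_by_cost(json.loads(TRACE_FRAGMENT))
for path in ranked:
    # The range entry carries no "index" key in this trace, hence the default.
    print(path["access_type"], path.get("index", "-"), path["rows"], path["cost"])
```

Sorted this way, ref on uid comes first and range comes last, at roughly twice the estimated cost, which is the comparison the optimizer acted on.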
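Analysis: in this case the execution plan indicates that ref is the better choice, yet in actual execution the statement runs more efficiently when range is forced. In general ref is more efficient than range, so MySQL prefers ref (this is a heuristic rule). Whether ref or range is finally used, however, is decided by comparing cost estimates. Cost estimation is an approximation: some of its inputs are themselves estimates and not very precise, which introduces error.
If the index selectivity (the fraction of rows matched) is low (say, below 10%), ref is likely to outperform range. Conversely, if the selectivity is high, ref is not necessarily better than range, but estimation error can still lead the plan to the wrong conclusion that ref beats range. Going further, if the selectivity is very high (well above 10%; this is a rough figure, not a precise one), and especially if the data is stored in sequential order, an index scan may perform worse than a full table scan even though the index exists.
Additional note: although the SQL in this example uses a LIMIT clause, that does not affect how the ref and range costs are computed and compared.
Finding out more:
-- How many rows does this query return? What percentage of all rows in the table is that?
Without the LIMIT, the rows in that time range account for 88% of the rows for that uid and 40% of all rows in the table.
Further analysis: the more detailed plan (execution plan result 1) shows a cost of 2.77e6 for ref and 5.46e6 for range, so the optimizer naturally concludes that ref is better than range. In reality, though, the index selectivity is so high that using the index no longer pays off (the optimizer does not know this), so 'force index (uid)' actually gives better execution performance. That is the explanation for this behavior.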
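The percentages and the trace estimates can be put side by side with a little arithmetic. This is a rough sketch that assumes the 88% and 40% figures both describe the same set of rows (those matching the uid and the time range); the variable names are chosen for illustration only.

```python
# Figures reported above for the query without LIMIT.
matched_share_of_uid = 0.88     # matching rows / rows for that uid
matched_share_of_table = 0.40   # matching rows / all rows in the table

# If both figures describe the same rows, the uid alone covers this share of the table.
uid_share_of_table = matched_share_of_table / matched_share_of_uid

# Row estimates taken from the optimizer trace (ref on uid versus range).
estimated_ref_rows = 1.36e7
estimated_range_rows = 1.02e7
estimated_matched_share_of_uid = estimated_range_rows / estimated_ref_rows

print(round(uid_share_of_table, 3))              # 0.455
print(round(estimated_matched_share_of_uid, 2))  # 0.75
```

Under that assumption the uid value alone matches about 45% of the table, far above the rough 10% threshold mentioned earlier, and the optimizer's estimate for the range (75% of the uid's rows) is in the same region as the measured 88%.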
In-depth code analysis: the best_access_path() function compares the costs of the various access paths, so the choice among ref, range, and even a full table scan is computed and compared there. Part of the comment in that code is excerpted below; it conveys some of the reasoning.
/*
  Don't test table scan if it can't be better.
  Prefer key lookup if we would use the same key for scanning.

  Don't do a table scan on InnoDB tables, if we can read the used
  parts of the row from any of the used index.
  This is because table scans uses index and we would not win
  anything by using a table scan. The only exception is INDEX_MERGE
  quick select. We can not say for sure that INDEX_MERGE quick select
  is always faster than ref access. So it's necessary to check if
  ref access is more expensive.

  We do not consider index/table scan or range access if:

  1a) The best 'ref' access produces fewer records than a table scan
      (or index scan, or range access), and
  1b) The best 'ref' executed for all partial row combinations, is
      cheaper than a single scan. The rationale for comparing

      COST(ref_per_partial_row) * E(#partial_rows)
         vs
      COST(single_scan)

      is that if join buffering is used for the scan, then scan will
      not be performed E(#partial_rows) times, but
      E(#partial_rows)/E(#partial_rows_fit_in_buffer). At this point
      in best_access_path() we don't know this ratio, but it is
      somewhere between 1 and E(#partial_rows). To avoid
      overestimating the total cost of scanning, the heuristic used
      here has to assume that the ratio is 1. A more fine-grained
      cost comparison will be done later in this function.
  (2) This doesn't hold: the best way to perform table scan is to
      perform 'range' access using index IDX, and the best way to
      perform 'ref' access is to use the same index IDX, with the
      same or more key parts.
      (note: it is not clear how this rule is/should be extended to
      index_merge quick selects)
  (3) See above note about InnoDB.
  (4) NOT ("FORCE INDEX(...)" is used for table and there is 'ref'
      access path, but there is no quick select)
      If the condition in the above brackets holds, then the only
      possible "table scan" access method is ALL/index (there is no
      quick select). Since we have a 'ref' access path, and FORCE
      INDEX instructs us to choose it over ALL/index, there is no
      need to consider a full table scan.
*/
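Rules (1a) and (1b) in the comment can be paraphrased as a small predicate. This is only a simplified sketch of what the comment says, not the MySQL implementation: the function and parameter names are hypothetical, rules (2) to (4) are left out, and using the range row estimate as the scan row count is an assumption.

```python
def ref_rules_out_scan(ref_rows, scan_rows, ref_cost_per_partial_row,
                       partial_rows, single_scan_cost):
    """Return True when rules (1a) and (1b) both hold.

    In that case the scan or range alternative would not be costed at all.
    """
    # (1a) the best ref access produces fewer rows than the scan
    fewer_rows = ref_rows < scan_rows
    # (1b) ref repeated for every partial row is still cheaper than one scan
    cheaper = ref_cost_per_partial_row * partial_rows < single_scan_cost
    return fewer_rows and cheaper


# Numbers from the trace in "Execution plan result 1". It is a single-table
# query, so there is one partial row.
skip_scan = ref_rules_out_scan(
    ref_rows=1.36e7,
    scan_rows=1.02e7,
    ref_cost_per_partial_row=2.77e6,
    partial_rows=1,
    single_scan_cost=5.46e6,
)
print(skip_scan)  # False
```

With the traced numbers, rule (1a) fails because ref is estimated to return more rows than range. That matches the trace, where the range path was costed and then rejected rather than skipped outright.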