MySQL Statistics: Study Notes

Statistics concepts

 

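MySQL statistics are information about tables and indexes that the database gathers by sampling, for example the number of rows in a table, the number of pages in the clustered index, and the cardinality of columns. When MySQL generates an execution plan, it uses the index statistics to estimate costs and picks the plan with the lowest cost. MySQL supports only a limited set of index statistics, and how they are collected depends on the storage engine. The official MySQL documentation says very little about the concept itself, but anyone who has worked with other database systems should have no trouble with it. Unlike in some other databases, statistics in MySQL cannot be deleted manually. Versions before MySQL 8.0 have no histograms.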
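As a quick illustration of what these statistics look like in practice, the sketch below reads the optimizer's estimates for one table. The schema MyDB and table test are the names used in the examples later in this article; substitute your own. All values returned are estimates, not exact counts.

```sql
-- Per-index cardinality estimates as the optimizer sees them
SHOW INDEX FROM MyDB.test;

-- Estimated row count and data/index sizes for the same table
SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'MyDB' AND TABLE_NAME = 'test';
```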

 

Statistics parameters

 

The InnoDB storage engine has 7 statistics parameters (8 in some versions), as shown below:

 

MySQL 5.6.41 has 8 parameters:

 

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_sample_pages            | 8           |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+
8 rows in set (0.00 sec)

 

MySQL 8.0.18 has 7 parameters:

 

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+

 

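The following is a rough summary of what these parameters do.

Parameter                             Meaning
innodb_stats_auto_recalc              Whether statistics are recalculated automatically. Recalculation is triggered once more than 10% of the rows have been modified.
innodb_stats_include_delete_marked    Whether delete-marked records are included when statistics are recalculated.
innodb_stats_method                   How NULL values are treated when statistics are collected.
innodb_stats_on_metadata              Whether statistics are updated when metadata is accessed.
innodb_stats_persistent               Whether statistics are persisted to disk.
innodb_stats_sample_pages             Deprecated; replaced by innodb_stats_transient_sample_pages.
innodb_stats_persistent_sample_pages  Number of pages sampled for persistent statistics.
innodb_stats_transient_sample_pages   Number of pages sampled for transient (non-persistent) statistics.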

 

 

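Parameter innodb_stats_auto_recalc

innodb_stats_auto_recalc controls whether statistics are recalculated automatically. When more than 10% of the rows in a table have been modified, the statistics are recalculated. Note that recalculation runs asynchronously in the background, so it may be delayed rather than triggered immediately (details follow further down). If innodb_stats_auto_recalc is disabled, you need to run ANALYZE TABLE to keep the statistics accurate. Regardless of the global setting, even with innodb_stats_auto_recalc=OFF, adding a new index to a table recalculates the statistics for all of its indexes and writes them to the innodb_index_stats table.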

 

 

 

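The test below verifies that, with the system variable innodb_stats_auto_recalc=OFF, creating an index triggers a recalculation of the statistics for all indexes on the table.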

 

mysql> set global innodb_stats_auto_recalc=off;
Query OK, 0 rows affected (0.00 sec)
 
mysql> show variables like 'innodb_stats_auto_recalc%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_stats_auto_recalc | OFF   |
+--------------------------+-------+
1 row in set (0.00 sec)
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_diff_pfx01 |          2 |           1 | DB_ROW_ID                         |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
 
mysql> create index ix_test_name on test(name);
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_diff_pfx01 |          2 |           1 | DB_ROW_ID                         |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | size         |          1 |        NULL | Number of pages in the index      |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_diff_pfx01 |          1 |           1 | name                              |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_diff_pfx02 |          2 |           1 | name,DB_ROW_ID                    |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | ix_test_name    | 2019-10-28 22:02:07 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

 

In another test, the global variable innodb_stats_auto_recalc was ON but the table option was set to STATS_AUTO_RECALC=0. Creating a new index again recalculated the statistics for all indexes.

 

mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.01 sec)
 
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY    | 2019-10-30 15:49:00 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
 
mysql> CREATE INDEX ix_test_name ON test(name);
Query OK, 0 rows affected (1.41 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> select * from mysql.innodb_index_stats 
    -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name   | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | n_diff_pfx01 |          0 |           1 | id                                |
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | PRIMARY      | 2019-10-30 15:54:22 | size         |          1 |        NULL | Number of pages in the index      |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx01 |        999 |          17 | name                              |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx02 |        999 |          17 | name,id                           |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | n_leaf_pages |         17 |        NULL | Number of leaf pages in the index |
| MyDB          | test       | ix_test_name | 2019-10-30 15:54:22 | size         |         18 |        NULL | Number of pages in the index      |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)
 
mysql> 

 

 

The official documentation describes the recalculation delay as follows:

 

Because of the asynchronous nature of automatic statistics recalculation, which occurs in the background, statistics may not be recalculated instantly after running a DML operation that affects more than 10% of a table, even when innodb_stats_auto_recalc is enabled. Statistics recalculation can be delayed by few seconds in some cases. If up-to-date statistics are required immediately, run ANALYZE TABLE to initiate a synchronous (foreground) recalculation of statistics
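When current statistics are needed right away, the synchronous route described in the quote can be sketched as follows. The schema and table names (MyDB, test) are the ones used in this article's tests; the check on last_update is just one convenient way to confirm that the recalculation ran.

```sql
-- Force a synchronous (foreground) recalculation
ANALYZE TABLE MyDB.test;

-- last_update should now reflect the time of the ANALYZE TABLE call
SELECT index_name, stat_name, stat_value, last_update
FROM mysql.innodb_index_stats
WHERE database_name = 'MyDB' AND table_name = 'test';
```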

 

 

Parameter innodb_stats_include_delete_marked

Controls whether delete-marked records are included when statistics are recalculated.

innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

 

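One recommendation about innodb_stats_include_delete_marked that circulates online is quoted below. I lack the experience to judge whether it is sound; personally I would keep the default (OFF) unless a specific scenario really calls for it.

·         Enable innodb_stats_include_delete_marked so that statistics also cover rows deleted by uncommitted transactions.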

 

 

By default, InnoDB reads uncommitted data when calculating statistics. In the case of an uncommitted transaction that deletes rows from a table, delete-marked records are excluded when calculating row estimates and index statistics, which can lead to non-optimal execution plans for other transactions that are operating on the table concurrently using a transaction isolation level other than READ UNCOMMITTED. To avoid this scenario, innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

When innodb_stats_include_delete_marked is enabled, ANALYZE TABLE considers delete-marked records when recalculating statistics.

innodb_stats_include_delete_marked is a global setting that affects all InnoDB tables, and it is only applicable to persistent optimizer statistics.

innodb_stats_include_delete_marked was introduced in MySQL 5.6.34.
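If you do decide to try the setting, a minimal sketch looks like this. It assumes a server of version 5.6.34 or later, an account allowed to set global variables, and the test table from this article; because the variable only affects persistent statistics, the table is re-analyzed afterwards.

```sql
-- Global, dynamic setting; affects all InnoDB tables
SET GLOBAL innodb_stats_include_delete_marked = ON;

-- Recalculate persistent statistics so the new setting is reflected for this table
ANALYZE TABLE MyDB.test;

-- Confirm the current value
SHOW VARIABLES LIKE 'innodb_stats_include_delete_marked';
```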

 

 

 

   

 

 

 

Parameter innodb_stats_method

 

Specifies how InnoDB index statistics collection code should treat NULLs. Possible values are NULLS_EQUAL (default), NULLS_UNEQUAL and NULLS_IGNORED

 

·         nulls_equal: all NULL values are treated as identical, that is, they form a single value group.

·         nulls_unequal: NULL values are not treated as identical. Each NULL forms its own value group of size 1.

·         nulls_ignored: NULL values are ignored.
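One way to see the effect of these settings is to analyze the same data under two methods and compare the reported cardinality. The sketch below uses a hypothetical table, nulls_demo, that is not part of the original tests. On a table this small the difference in the Cardinality column may be slight, so treat it as a starting point rather than a benchmark.

```sql
-- Hypothetical demo table: column c holds two non-NULL values and three NULLs
CREATE TABLE nulls_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c  INT NULL,
  INDEX ix_c (c)
) ENGINE=InnoDB;

INSERT INTO nulls_demo (c) VALUES (1), (2), (NULL), (NULL), (NULL);

-- Treat every NULL as its own value group, then recalculate
SET GLOBAL innodb_stats_method = 'nulls_unequal';
ANALYZE TABLE nulls_demo;
SHOW INDEX FROM nulls_demo;

-- Restore the default and compare the Cardinality column for ix_c
SET GLOBAL innodb_stats_method = 'nulls_equal';
ANALYZE TABLE nulls_demo;
SHOW INDEX FROM nulls_demo;
```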

 

 

 

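For details, see "InnoDB and MyISAM Index Statistics Collection" in the official documentation. There is also a system variable, myisam_stats_method, that controls how NULL values are treated in the statistics for MyISAM tables.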

 

 

mysql> show variables like 'myisam_stat%';
+---------------------+---------------+
| Variable_name       | Value         |
+---------------------+---------------+
| myisam_stats_method | nulls_unequal |
+---------------------+---------------+
1 row in set (0.00 sec)

 

 

 

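Parameter innodb_stats_on_metadata

Before MySQL 5.6.6, innodb_stats_on_metadata was enabled by default (default value ON). Whenever you queried tables in the information_schema metadata database (for example information_schema.TABLES or information_schema.TABLE_CONSTRAINTS) or ran statements such as SHOW TABLE STATUS or SHOW INDEX, InnoDB would also randomly sample index pages of every table in the other databases to refresh information_schema.STATISTICS before returning the query result. With many large tables this takes a long time, and it pulls rarely accessed data into the InnoDB buffer pool, polluting it. Disabling the parameter speeds up access to the schema tables and also makes execution plans for queries on InnoDB tables more stable. For these reasons the default has been OFF since MySQL 5.6.6.

Note that this option only takes effect when optimizer statistics are configured to be non-persistent. When it is enabled, InnoDB updates the non-persistent statistics.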
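Checking and disabling the option takes two statements. This is only a sketch; it assumes an account with the privileges needed to set global system variables.

```sql
-- Check the current setting (OFF by default since MySQL 5.6.6)
SHOW VARIABLES LIKE 'innodb_stats_on_metadata';

-- Keep metadata queries from triggering statistics updates
SET GLOBAL innodb_stats_on_metadata = OFF;
```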

 

 

The official documentation describes it as follows:

innodb_stats_on_metadata

Property             Value
Command-Line Format  --innodb-stats-on-metadata[={OFF|ON}]
System Variable      innodb_stats_on_metadata
Scope                Global
Dynamic              Yes
Type                 Boolean
Default Value        OFF

 

This option only applies when optimizer statistics are configured to be non-persistent. Optimizer statistics are not persisted to disk when innodb_stats_persistent is disabled or when individual tables are created or altered with STATS_PERSISTENT=0. For more information, see Section 14.8.11.2, “Configuring Non-Persistent Optimizer Statistics Parameters”.

 

When innodb_stats_on_metadata is enabled, InnoDB updates non-persistent statistics when metadata statements such as SHOW TABLE STATUS or when accessing the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables. (These updates are similar to what happens for ANALYZE TABLE.) When disabled,InnoDB does not update statistics during these operations. Leaving the setting disabled can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

To change the setting, issue the statement SET GLOBAL innodb_stats_on_metadata=mode, where mode is either ON or OFF (or 1 or 0). Changing the setting requires privileges sufficient to set global system variables (see Section 5.1.8.1, “System Variable Privileges”) and immediately affects the operation of all connections

 

 

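Parameter innodb_stats_persistent

This parameter controls whether statistics are persisted. When it is enabled, statistics are saved to the innodb_table_stats and innodb_index_stats tables in the mysql database. MySQL has used persistent statistics by default since MySQL 5.6.6, that is, innodb_stats_persistent=ON by default ("Persistent optimizer statistics were introduced in MySQL 5.6.2 and were made the default in MySQL 5.6.6"). With this parameter set, statistics no longer need to be collected on the fly; doing so under high concurrency can hurt performance and can cause execution plans to vary.

In addition, the table options STATS_PERSISTENT, STATS_AUTO_RECALC and STATS_SAMPLE_PAGES can be used to override the system variables. They can be specified in CREATE TABLE or ALTER TABLE statements, and values set on a table take precedence over the global variables. For example: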

 

 
mysql> ALTER TABLE test STATS_PERSISTENT=1;
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
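To confirm which table-level options are in force, or to hand control back to the global variables, something like the following should work. It is a sketch against the same test table; DEFAULT tells InnoDB to follow the corresponding server-wide setting again.

```sql
-- Table-level statistics options appear at the end of the CREATE TABLE statement
SHOW CREATE TABLE test;

-- Revert the table to the server-wide settings
ALTER TABLE test STATS_PERSISTENT=DEFAULT, STATS_AUTO_RECALC=DEFAULT;
```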

 

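Persistent statistics are stored in mysql.innodb_index_stats and mysql.innodb_table_stats. The columns of these two tables are as follows:

innodb_table_stats

Column name               Description
database_name             Database name
table_name                Table name, partition name, or subpartition name
last_update               Timestamp of the last statistics update
n_rows                    Number of rows in the table
clustered_index_size      Number of pages in the clustered index
sum_of_other_index_sizes  Number of pages in the non-clustered indexes

innodb_index_stats

Column name       Description
database_name     Database name
table_name        Table name, partition name, or subpartition name
index_name        Index name
last_update       Timestamp of the last update
stat_name         Name of the statistic
stat_value        Value of the statistic named in stat_name
sample_size       Number of pages sampled
stat_description  Description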

 

 

 

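Non-persistent optimizer statistics are stored in memory and are lost when the server shuts down. They are also updated periodically by certain operations and under certain conditions. But where exactly in memory are they kept?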

 

Optimizer statistics are not persisted to disk when innodb_stats_persistent=OFF or when individual tables are created or altered with STATS_PERSISTENT=0. Instead, statistics are stored in memory, and are lost when the server is shut down. Statistics are also updated periodically by certain operations and under certain conditions.

 

My understanding is that this refers to in-memory tables (MEMORY tables); a brief note on this follows below.

 

 

 

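Parameter innodb_stats_persistent_sample_pages

When innodb_stats_persistent is ON, this parameter sets the number of index pages sampled when ANALYZE TABLE updates cardinality values. The default is 20 pages. Too small a value makes the statistics inaccurate; too large a value makes the analysis slow.

When creating a table, you can specify per table the number of sampled pages, whether statistics are persisted to disk, and whether they are recalculated automatically, as shown below: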

 

CREATE TABLE `test` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY  (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
  STATS_PERSISTENT=1,
  STATS_AUTO_RECALC=1,
  STATS_SAMPLE_PAGES=25;
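For tables that do not set their own STATS_SAMPLE_PAGES, the server-wide value applies. The sketch below raises it and then checks how many pages were actually sampled per index. The value 64 is an arbitrary illustration, not a recommendation, and a table created with STATS_SAMPLE_PAGES=25 as above would keep using 25.

```sql
-- Raise the server-wide sample size for persistent statistics (default 20)
SET GLOBAL innodb_stats_persistent_sample_pages = 64;

-- Recalculate, then inspect how many pages were sampled for each index prefix
ANALYZE TABLE test;

SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE table_name = 'test' AND stat_name LIKE 'n_diff_pfx%';
```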

 

 

Parameter innodb_stats_sample_pages

Deprecated. Replaced by innodb_stats_transient_sample_pages.

 

 

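Parameter innodb_stats_transient_sample_pages

innodb_stats_transient_sample_pages controls the number of pages sampled; the default is 8. It can be changed at runtime.

innodb_stats_transient_sample_pages affects sampling when innodb_stats_persistent=0. Points to note:

1. If the value is too small, the estimates will be inaccurate.

2. If the value is too large, disk reads increase.

3. Different values can produce quite different execution plans, because the statistics differ.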
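The trade-off in the list above can be explored on a single table without touching the server-wide persistence setting. The following is a sketch; the value 16 is an arbitrary example, and the test table is the one used elsewhere in this article.

```sql
-- Use non-persistent statistics for this table only
ALTER TABLE test STATS_PERSISTENT=0;

-- Sample more pages per index for transient statistics (default 8)
SET GLOBAL innodb_stats_transient_sample_pages = 16;

-- Recalculate and look at the resulting cardinality estimates
ANALYZE TABLE test;
SHOW INDEX FROM test;
```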

 

 

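There is one more parameter, information_schema_stats_expiry, which works as follows:

·         In MySQL 8.0, the information in the STATISTICS and TABLES tables of INFORMATION_SCHEMA is cached to improve query performance. The information_schema_stats_expiry parameter sets the expiry time of the cached data; the default is 86400 seconds. A query on these two tables first looks in the cache; if nothing is cached or the cached data has expired, the latest data is fetched from the storage engine. To get the latest data, set information_schema_stats_expiry to 0 or run ANALYZE TABLE.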
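A minimal sketch of bypassing the cache for a single session on MySQL 8.0 follows. The schema and table names are the ones from this article's tests.

```sql
-- Bypass the cached statistics for this session only
SET SESSION information_schema_stats_expiry = 0;

SELECT TABLE_NAME, TABLE_ROWS, UPDATE_TIME
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'MyDB' AND TABLE_NAME = 'test';

-- Go back to the current global value
SET SESSION information_schema_stats_expiry = DEFAULT;
```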

 

 

 

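Viewing statistics

Statistics are either persistent (PERSISTENT) or non-persistent (TRANSIENT). Where is each kind stored?

·         Persistent statistics

        Stored in mysql.innodb_index_stats and mysql.innodb_table_stats.

·         Non-persistent statistics

        Before MySQL 8.0, they are exposed through information_schema.STATISTICS and information_schema.TABLES. From MySQL 8.0 on, they can be found in INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.STATISTICS and INNODB_INDEXES.

        The official documentation says non-persistent statistics are kept in memory; as I understand it, that means in-memory tables (MEMORY tables).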

 

 

 

 

The following queries show the persistent statistics. To read the data in mysql.innodb_index_stats, you need to understand what stat_name and stat_value mean:

 

 

select * from mysql.innodb_index_stats 
where table_name = 'test';
 
 
select * from mysql.innodb_index_stats 
where database_name='MyDB' and table_name = 'test';

 

 

 

 

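When stat_name=size, stat_value is the total number of pages in the index (Number of pages in the index).

When stat_name=n_leaf_pages, stat_value is the number of leaf pages in the index (Number of leaf pages in the index).

When stat_name=n_diff_pfxNN, stat_value is the number of distinct values in the indexed columns, as explained here:

  * In n_diff_pfxNN, NN is a number (for example 01, 02). stat_value gives the number of distinct values over the leading columns of the index, counted in the order of the index definition. When NN is 01, stat_value is the number of distinct values in the first column of the index; when NN is 02, it is the number of distinct combinations of the first and second columns, and so on. In addition, when stat_name = n_diff_pfxNN, the stat_description column shows a comma-separated list of the columns that were counted. For a non-unique secondary index, InnoDB appends the primary key columns, which is why the earlier test output contains rows such as name,id.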
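Putting the two statistics tables together gives a rough selectivity figure for each index prefix. The query below is a sketch that joins them for the test table in MyDB; the column aliases (columns_covered, distinct_values, selectivity) are my own naming, and both inputs are sampled estimates, so the ratio is approximate.

```sql
-- Estimated distinct values per index prefix, relative to the estimated row count
SELECT i.index_name,
       i.stat_name,
       i.stat_description AS columns_covered,
       i.stat_value       AS distinct_values,
       t.n_rows,
       ROUND(i.stat_value / NULLIF(t.n_rows, 0), 4) AS selectivity
FROM mysql.innodb_index_stats AS i
JOIN mysql.innodb_table_stats AS t
  ON  t.database_name = i.database_name
  AND t.table_name    = i.table_name
WHERE i.database_name = 'MyDB'
  AND i.table_name    = 'test'
  AND i.stat_name LIKE 'n_diff_pfx%'
ORDER BY i.index_name, i.stat_name;
```

A selectivity close to 1 means the prefix is nearly unique; a value close to 0 means many rows share each value.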

 

 

 

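MySQL histograms

MySQL 8.0 introduced histograms. Histogram data is exposed through the information_schema.column_statistics table; each row holds the histogram for one column, stored in JSON format. A new parameter, histogram_generation_max_mem_size, configures how much memory may be used to build a histogram.

A histogram is an accurate representation of the distribution of numerical data. In an RDBMS, a histogram is an approximation of the distribution of the data within a particular column.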

 

 

mysql> show variables like 'histogram_generation_max_mem_size';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| histogram_generation_max_mem_size | 20000000 |
+-----------------------------------+----------+
1 row in set (0.01 sec)
 
mysql> 
 
mysql> desc information_schema.column_statistics;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | NO   |     | NULL    |       |
| TABLE_NAME  | varchar(64) | NO   |     | NULL    |       |
| COLUMN_NAME | varchar(64) | NO   |     | NULL    |       |
| HISTOGRAM   | json        | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
 
mysql> 

 

 

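MySQL has two kinds of histograms: singleton and equi-height. In a singleton histogram each bucket stores one value and its cumulative frequency. In an equi-height histogram each bucket stores the lower and upper bounds, the cumulative frequency, and the number of distinct values in the bucket. MySQL chooses the type automatically: if the column has no more distinct values than the requested number of buckets, a singleton histogram is built, otherwise an equi-height one. The type can therefore sometimes be influenced by choosing a suitable number of buckets.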

 

 

 

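Creating and dropping histograms

Is histogram data generated automatically? No. MySQL does not build histograms when an index is created; you have to run a statement such as ANALYZE TABLE [table] UPDATE HISTOGRAM ... manually to generate histograms for columns of a table. By default this information is replicated to replicas.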

 

 

 

ANALYZE TABLE tbl_name UPDATE HISTOGRAM ON col_name [, col_name] [WITH N BUCKETS];

ANALYZE TABLE tbl_name DROP HISTOGRAM ON col_name [, col_name];

 

ANALYZE TABLE test UPDATE HISTOGRAM ON create_date,name WITH 16 BUCKETS;

 

 

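Note: the BUCKETS value is optional. The valid range is 1 to 1024, and the default is 100 when it is not specified.

In the test below, we first drop all existing histogram data and then generate a histogram with the following SQL.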

 

 

ANALYZE TABLE test UPDATE HISTOGRAM ON name;
 
SELECT SCHEMA_NAME
      ,TABLE_NAME
      ,COLUMN_NAME
      ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
      ,HISTOGRAM->>'$."sampling-rate"'  AS SAMPLING_RATE
      ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
      ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
      ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE  TABLE_NAME = 'test';

 

 


 

 

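The actual number of buckets is not always the default of 100. For example, after I deleted rows until only 49 remained and then built the histogram, the bucket count was 49, so the value also depends on the data in the table. With a larger amount of data, the default is 100.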

 

 


 

 

In addition, as the test below shows, a BUCKETS value above 1024 fails with ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'.

 

 

mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1024 BUCKETS;
+-----------+-----------+----------+-------------------------------------------------+
| Table     | Op        | Msg_type | Msg_text                                        |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status   | Histogram statistics created for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.13 sec)
 
mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1025 BUCKETS;
ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'
mysql> 

 

 


 

 

 

 

Dropping a histogram

-- Drop the histogram statistics on a column

ANALYZE TABLE test DROP HISTOGRAM ON create_date;

 

 

mysql> ANALYZE TABLE test DROP HISTOGRAM ON name;
+-----------+-----------+----------+-------------------------------------------------+
| Table     | Op        | Msg_type | Msg_text                                        |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status   | Histogram statistics removed for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.10 sec)

 

 

Viewing histogram information

    Histogram data is stored as JSON, and the raw JSON is not easy to read. The following queries make it more readable.

 

 

SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) 
FROM information_schema.column_statistics 
WHERE TABLE_NAME='test'\G
 
 
SELECT SCHEMA_NAME
     ,TABLE_NAME
     ,COLUMN_NAME
     ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
     ,HISTOGRAM->>'$."sampling-rate"'  AS SAMPLING_RATE
     ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
     ,HISTOGRAM->>'$."histogram-type"' AS HISTOGRAM_TYPE
     ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
     ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE  TABLE_NAME = 'test';
 
 
SELECT FROM_BASE64(SUBSTRING_INDEX(v, ':', -1)) value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'MyDB' and table_name = 'test' and column_name = 'name';
 
 
 
SELECT v value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'MyDB' and table_name = 'test' and column_name = 'name';

 

 

 

 

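Updating statistics

Non-persistent statistics are also updated automatically. According to the official documentation, this happens in the following cases: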

 

Non-persistent optimizer statistics are updated when:
 
Running ANALYZE TABLE.
 
Running SHOW TABLE STATUS, SHOW INDEX, or querying the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables with the innodb_stats_on_metadata option enabled.
The default setting for innodb_stats_on_metadata is OFF. Enabling innodb_stats_on_metadata may reduce access speed for schemas that have a large number of tables or indexes, and reduce stability of execution plans for queries that involve InnoDB tables. innodb_stats_on_metadata is configured globally using a SET statement.
SET GLOBAL innodb_stats_on_metadata=ON
Note
innodb_stats_on_metadata only applies when optimizer statistics are configured to be non-persistent (when innodb_stats_persistent is disabled).
 
Starting a mysql client with the --auto-rehash option enabled, which is the default. The auto-rehash option causes all InnoDB tables to be opened, and the open table operations cause statistics to be recalculated.
To improve the start up time of the mysql client and to updating statistics, you can turn off auto-rehash using the --disable-auto-rehash option. The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users.
 
A table is first opened.
 
InnoDB detects that 1 / 16 of table has been modified since the last time statistics were updated.

 

 

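In short:

1. ANALYZE TABLE is run.

2. With innodb_stats_on_metadata=ON, SHOW TABLE STATUS or SHOW INDEX is run, or the TABLES or STATISTICS tables in INFORMATION_SCHEMA are queried.

3. The mysql client connects with --auto-rehash enabled.

4. A table is opened for the first time.

5. 1/16 of the table has been modified since the statistics were last updated.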

 

 

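How persistent statistics are updated automatically was covered above. The other option is to update them manually.

1. Update the statistics manually. Note that a read lock is taken while the statement runs:

ANALYZE TABLE TABLE_NAME;

2. If the statistics are still inaccurate after the update, consider increasing the number of pages sampled for the table. There are two ways to do this:

1) The global variable innodb_stats_persistent_sample_pages, which defaults to 20.

2) A per-table sample size:

ALTER TABLE TABLE_NAME STATS_SAMPLE_PAGES=100;

In testing, the maximum value of STATS_SAMPLE_PAGES here is 65535; anything larger produces an error.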

 

mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65535;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65536;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '65536' at line 1
mysql>

 

 

 

References:

 

https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html

https://dev.mysql.com/doc/refman/8.0/en/index-statistics.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-performance-optimizer-statistics.html

https://www.percona.com/blog/2019/10/29/column-histograms-on-percona-server-and-mysql-8-0/  (key reference)

http://chinaunix.net/uid-31396856-id-5787793.html

https://mysqlserverteam.com/histogram-statistics-in-mysql/

https://mp.weixin.qq.com/s/698g5lm9CWqbU0B_p0nLMw?
