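Counting the rows in a table is a common requirement, but query performance can differ considerably depending on how the table is designed and how the query is written. The experiments below test this. (When you run the tests yourself, watch out for caching, otherwise it will skew the results.)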
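As one way to act on the caching caveat, the statements below sketch how to check whether the query cache could be answering repeated counts. They assume MySQL 5.7 or earlier; the query cache was removed in MySQL 8.0, where the first statement returns an empty set and SQL_NO_CACHE has no effect. nums_2 is one of the test tables created in the preparation steps.

```sql
-- Show the query cache settings (empty result on MySQL 8.0, which has no query cache).
SHOW VARIABLES LIKE 'query_cache%';

-- On 5.7 and earlier, SQL_NO_CACHE written as a keyword (not inside a comment)
-- keeps this statement's result out of the query cache.
SELECT SQL_NO_CACHE COUNT(*) FROM nums_2;
```

This only addresses the query cache. The InnoDB buffer pool also warms up as tables are read, so running each query more than once and comparing the later runs is a reasonable precaution.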
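To support the tests that follow, first prepare a few test tables and some data. To make the measurements meaningful, the test tables should hold a fairly large amount of data so that query times are not too small to compare. We can reuse the consecutive-number generation technique from earlier posts, as follows: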
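/* Create the consecutive-number table */
CREATE TABLE nums(id INT PRIMARY KEY);

/* Optimized stored procedure that generates consecutive numbers */
DELIMITER $$
CREATE PROCEDURE `sp_createNum`(cnt INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    TRUNCATE TABLE nums;
    /* Seed the table with the first number */
    INSERT INTO nums SELECT i;
    WHILE i < cnt DO
    BEGIN
        /* Copy the existing rows shifted by i, doubling the row count on each pass */
        INSERT INTO nums SELECT id + i FROM nums WHERE id + i <= cnt;
        SET i = i * 2;
    END;
    END WHILE;
END$$
DELIMITER ;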
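Generate the data. This test uses 10 million rows.
/* Call the stored procedure */
mysql> call sp_createNum(10000000);
Query OK, 1611392 rows affected (32.07 sec)
Inserting the rows one at a time in a loop takes far longer; you can test that yourself. For reference, see the article "A consecutive-integer generation method that is 16,800 times faster".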
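The reported "1611392 rows affected" is only the last INSERT of the loop: the row count doubles on each pass up to 8388608, and the final pass adds the remaining 10000000 - 8388608 = 1611392 rows. A quick sanity check, sketched here rather than taken from the original post, confirms that the sequence is complete:

```sql
-- id is the primary key, so the values are distinct; if cnt equals
-- max_id - min_id + 1 the sequence has no gaps.
-- Expect cnt = 10000000, min_id = 1, max_id = 10000000.
SELECT COUNT(*) AS cnt, MIN(id) AS min_id, MAX(id) AS max_id
FROM nums;
```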
Create three InnoDB tables, as follows:
The nums_1 table has only a string primary key column.
/* Create nums_1, a table whose only column is a string primary key */
mysql> create table nums_1 (p1 varchar(32) primary key ) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* Load the data, converting id to a string with the md5 function */
mysql> insert into nums_1 select md5(id) from nums;
Query OK, 10000000 rows affected (1 min 12.63 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
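The nums_2 table has five columns: the string primary key p1, the integer id, the NOT NULL column c1, and the nullable columns c2 and c3.
c1 and c2 hold identical content and differ only in their constraint (c1 is NOT NULL, c2 is nullable). c3 differs from c1 and c2 in that values beginning with aa in c1 are NULL in c3; everything else is the same.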
/* Create table nums_2 */
mysql> create table nums_2(p1 varchar(32) primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (1.03 sec)

/* Load the data; c3 becomes NULL where the value starts with aa */
mysql> insert into nums_2(id,p1,c1,c2,c3) select id,md5(id),left(md5(id),10),left(md5(id),10),if(left(md5(id),10) like 'aa%',null,left(md5(id),10)) from nums;
Query OK, 10000000 rows affected (5 min 6.68 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
The nums_3 table has exactly the same content as nums_2; the only difference is the primary key, which in nums_3 is the integer id.
/* Create table nums_3 */
mysql> create table nums_3(p1 varchar(32) ,id int primary key ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* The content is identical, so load it directly from nums_2 */
mysql> insert into nums_3 select * from nums_2;
Query OK, 10000000 rows affected (3 min 18.81 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
Then create a MyISAM table whose structure and content also match nums_2; only the engine is MyISAM.
/* Create the MyISAM table nums_4 */
mysql> create table nums_4(p1 varchar(32) not null primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=MyISAM;
Query OK, 0 rows affected (0.00 sec)

/* Load the data directly from nums_2 */
mysql> insert into nums_4 select * from nums_2;
Query OK, 10000000 rows affected (3 min 16.78 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
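There are several ways to find out how many rows a table holds:
For an approximate row count, query the statistics; section 2.1 covers the specific methods.
For an exact row count, use count(primary key column), count(*), or count(1) [the 1 can be replaced by any constant].
If you only need a rough idea of how much data a table holds, especially for a very large table where you just want the order of magnitude (millions, tens of millions, or hundreds of millions of rows), you can query the statistics directly in any of the following ways:
Query the index information, where Cardinality is the approximate row count (look at the value on the PRIMARY row; for a composite primary key, look at the Cardinality of the last column).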
mysql> show index from nums_2;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| nums_2 |          0 | PRIMARY  |            1 | p1          | A         |     9936693 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
Check the table status, where Rows is the approximate row count.
mysql> show table status like 'nums_2';
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name   | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| nums_2 | InnoDB |      10 | Dynamic    | 9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)
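Query the STATISTICS or TABLES table in information_schema directly. The content is similar to the index information or table status above; TABLE_ROWS holds the approximate row count.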
mysql> select * from information_schema.tables where table_schema='testdb' and table_name like 'nums_2';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_2     | BASE TABLE | InnoDB |      10 | Dynamic    |    9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)
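Note: for InnoDB tables these values are estimates, whereas for a MyISAM table TABLE_ROWS is the exact row count, as the nums_4 result shows: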
mysql> select * from information_schema.tables where table_schema='testdb' and table_name like 'nums_4';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME         | CHECK_TIME          | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_4     | BASE TABLE | MyISAM |      10 | Dynamic    |   10000000 |             75 |   759686336 | 281474976710655 |    854995968 |         0 |           NULL | 2020-04-04 19:20:23 | 2020-04-04 19:21:45 | 2020-04-04 19:23:45 | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)
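Since only a few of those columns matter for a row-count estimate, a narrower query is easier to read. The sketch below assumes the same testdb schema and the four test tables created earlier; it is an illustration, not a query from the original post.

```sql
-- Optionally refresh the InnoDB statistics first; TABLE_ROWS is still an estimate afterwards.
ANALYZE TABLE nums_2;

-- Compare the estimated (InnoDB) and exact (MyISAM) row counts side by side.
SELECT TABLE_NAME, ENGINE, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'testdb'
  AND TABLE_NAME IN ('nums_1', 'nums_2', 'nums_3', 'nums_4')
ORDER BY TABLE_NAME;
```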
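The values that section 2.1 returns for InnoDB tables are statistical estimates rather than exact values, and most real-world work needs an exact count. The following methods return an exact value and work for tables of every storage engine.
count(primary key)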
mysql> select count(p1) from nums_2;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.60 sec)
count(1)
The 1 can be any constant, for example count(2) or count('a').
mysql> select count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.45 sec)
count(*)
mysql> select count(*) from nums_2;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.52 sec)
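Performance comparison of count(primary key), count(1), count(*), count(NOT NULL column), and count(nullable column)
To get the exact row count of a MyISAM table, count(primary key), count(1), and count(*) are equally efficient: each returns the exact result directly, in almost 0 s.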
mysql> select count(p1) from nums_4;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (0.00 sec)

mysql> select count(1) from nums_4;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from nums_4;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)
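The execution plans are identical as well, and they show that the count is not computed by scanning the primary key or any other index.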
mysql> explain select count(*) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(p1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)
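Summary:
For a whole-table count on MyISAM, efficiency is count(primary key) = count(1) = count(*).
When counting only part of the data, the result cannot be read directly from the stored statistics, so the timings are roughly as follows: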
mysql> select count(p1) from nums_4 where p1 like 'aa%';
+-----------+
| count(p1) |
+-----------+
|     39208 |
+-----------+
1 row in set (0.14 sec)

mysql> select count(1) from nums_4 where p1 like 'aa%';
+----------+
| count(1) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)

mysql> select count(*) from nums_4 where p1 like 'aa%';
+----------+
| count(*) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)
The execution plans are in fact all the same:
mysql> explain select count(1) from nums_4 where p1 like 'aa%';
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | nums_4 | NULL       | range | PRIMARY       | PRIMARY | 98      | NULL | 42603 |   100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
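Summary: when a MyISAM table counts only part of its data, the result cannot be read off directly; the data still has to be scanned, and the different forms perform about the same.
Because the InnoDB engine has to support MVCC, it cannot persist a single whole-table row count, so every query has to traverse the rows and count them. Different ways of writing the query do differ in efficiency, though, and the tests below compare them along several dimensions.
Compare query efficiency using count(primary key), count(1), and count(*).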
mysql> select count(p1) from nums_2 ;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.68 sec)

mysql> select count(1) from nums_2 ;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.37 sec)

mysql> select count(*) from nums_2 ;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.38 sec)
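This simple comparison shows that, in terms of query performance, count(primary key) < count(1) ≈ count(*); that is, count(primary key) is the slowest.
Yet the execution plans all look like this: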
mysql> explain select count(p1) from nums_2;
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | nums_2 | NULL       | index | NULL          | PRIMARY | 98      | NULL | 9936693 |   100.00 | Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
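The efficiency still differs because the counting itself is done differently: count(primary key) has to read the column value of every row and check it for NULL before counting it, whereas count(1) and count(*) count rows without evaluating a column value. According to the MySQL manual, InnoDB processes count(*) and count(1) in the same way.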
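nums_2 and nums_3 hold the same content; the difference is that the primary key of nums_3 is the integer id column. Now compare query performance when the primary key column differs: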
mysql> select /* SQL_NO_CACHE */count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (2.02 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.69 sec)
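The test shows that tables with the same content but different primary keys perform differently, and that the query is more efficient when the primary key (index) column type is smaller.
Note: if you add an index on the id column of nums_2, the query will use the id index. The primary key index (the clustered index) is keyed on varchar(32) and also stores the full rows, whereas id is an int, so the index on id is smaller and scanning it costs less I/O.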
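The note above can be tried out with a sketch like the following. The index name idx_id is a hypothetical choice, not from the original post, and the plan you get may vary with the MySQL version and current statistics.

```sql
-- Add a secondary index on the integer id column (idx_id is a hypothetical name).
ALTER TABLE nums_2 ADD INDEX idx_id (id);

-- Check which index the optimizer now picks for the whole-table count.
EXPLAIN SELECT COUNT(*) FROM nums_2;

-- Drop the index again so the remaining tests run against the original table layout.
ALTER TABLE nums_2 DROP INDEX idx_id;
```

Building the index on 10 million rows takes a while, so it is worth doing this after the other timings have been recorded.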
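For this reason, an auto-increment id is recommended as the primary key in MySQL (the advantages go beyond counting rows; I will cover them when there is a chance).
Of the tables from the preparation step, nums_1 has only the string primary key column, while nums_3 has more columns and is therefore the larger table. (Their primary keys also differ: nums_1 uses the string p1, nums_3 the integer id.) The query efficiency compares as follows: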
mysql> select /* SQL_NO_CACHE */count(1) from nums_1;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.61 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.67 sec)
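Query times are for reference only; they depend on the machine.
So table size also affects query efficiency: the smaller the table, the faster the query.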
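To see how different the two tables actually are in size, the statistics tables used earlier can be queried again. This is a sketch assuming the same testdb schema; the lengths reported for InnoDB are approximate.

```sql
-- Compare the on-disk data and index sizes of the two tables, in MB.
SELECT TABLE_NAME,
       ROUND(DATA_LENGTH  / 1024 / 1024) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'testdb'
  AND TABLE_NAME IN ('nums_1', 'nums_3');
```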
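In nums_3, column c2 allows NULL but contains no NULL values, while column c3 allows NULL and does contain NULL values. Now count the c2 and c3 columns of nums_3 separately and look at the results (add indexes first to improve query performance).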
mysql> select count(c2) from nums_3 ;
+-----------+
| count(c2) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.69 sec)

mysql> select count(c3) from nums_3 ;
+-----------+
| count(c3) |
+-----------+
|   9960792 |
+-----------+
1 row in set (1.73 sec)
Because c3 contains NULL values, counting c3 skips the rows where c3 is NULL.
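The NULL-skipping behaviour is easy to reproduce on a tiny table. The sketch below uses a hypothetical scratch table, count_demo, that is not part of the original test setup.

```sql
-- count_demo is a hypothetical scratch table with one NULL in column c.
CREATE TEMPORARY TABLE count_demo (id INT PRIMARY KEY, c VARCHAR(10));
INSERT INTO count_demo VALUES (1, 'aa'), (2, NULL), (3, 'bb');

-- COUNT(*) counts rows; COUNT(c) counts only the rows where c IS NOT NULL.
SELECT COUNT(*)            AS all_rows,
       COUNT(c)            AS non_null_c,
       COUNT(*) - COUNT(c) AS null_c
FROM count_demo;
-- all_rows = 3, non_null_c = 2, null_c = 1

DROP TEMPORARY TABLE count_demo;
```

The same arithmetic applies to the results above: 10000000 - 9960792 = 39208 rows of nums_3 have a NULL c3, which matches the 39208 rows whose p1 begins with aa in the MyISAM test.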
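The tests above compared how efficiently MyISAM and InnoDB tables are counted with the different forms of count; the conclusions are the ones given in the summary of each comparison.
The tables from the preparation step support many more tests, and interested readers can try them for themselves (one more reminder: watch out for caching). You can also follow the WeChat official account 【数据库干货铺】 to join the technical discussion group. Thank you.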